History log of /src/sys/net
Revision  Date  Author  Comments
 1.47 12-Oct-2025  thorpej Some platforms have rules for retrieving the MAC address for an interface
beyond what properties exist. For example, a local address may be
present in a device tree property, but a system-wide property may indicate
that it should not be used (in favor of e.g. a singular system MAC address -
LOOKIN' AT YOU, SUNW!).

So, the ether-get-mac-address device call is introduced to handle this
situation. Consult it before the standard properties, and if it succeeds,
use its result.
 1.46 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.45 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.44 11-Sep-2020  roy branches: 1.44.6; 1.44.8;
Implement address agnostic Neighbor Detection.

This is heavily based on IPv6 Neighbor Detection and allows per protocol
timers which also facilitate Neighbor Unreachability Detection.
 1.43 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.42 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
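
The collection step described in 1.42 can be pictured roughly as follows. This is only a sketch under stated assumptions: the names (percpu_stats_sketch, stat_inc_sketch, stats_collect_sketch) and the counter set are hypothetical and are not the if_stat API, and the serialization against interrupts that a real kernel needs is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter indices; the real if_stat counters are not shown here. */
enum { STAT_IPACKETS, STAT_OPACKETS, STAT_NSTATS };

/* One block of counters per CPU, so the hot path never shares a cache line. */
struct percpu_stats_sketch {
	uint64_t counters[STAT_NSTATS];
};

/* Hot path: bump a counter in the calling CPU's private block. */
void
stat_inc_sketch(struct percpu_stats_sketch *cpu, int which)
{
	assert(cpu != NULL && which >= 0 && which < STAT_NSTATS);
	cpu->counters[which]++;
}

/* Export path: sum every CPU's block into one flat array for user space. */
void
stats_collect_sketch(const struct percpu_stats_sketch *all, size_t ncpu,
    uint64_t out[STAT_NSTATS])
{
	assert(all != NULL && out != NULL);
	for (int i = 0; i < STAT_NSTATS; i++)
		out[i] = 0;
	for (size_t c = 0; c < ncpu; c++)
		for (int i = 0; i < STAT_NSTATS; i++)
			out[i] += all[c].counters[i];
}
```

The point of the split is that updates touch only CPU-local memory, while the comparatively rare export pays the cost of walking every CPU.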
 1.41 20-Jan-2020  thorpej Remove FDDI support.
 1.40 19-Jan-2020  thorpej Remove Token Ring support.
 1.39 19-Jan-2020  thorpej Remove HIPPI support and the esh(4) driver that uses it. There have not
been any users of HIPPI for some time, and it is unlikely to be resurrected.
 1.38 06-Sep-2018  maxv branches: 1.38.6;
Remove the network ATM code.
 1.37 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.36 16-Feb-2018  knakahara branches: 1.36.2; 1.36.4;
Currently, it is not necessary to install rss_config.h. Pointed out by msaitoh@n.o.
 1.35 16-Feb-2018  knakahara Introduce very simple Receive Side Scaling (RSS) utility.

ok by msaitoh@n.o.
 1.34 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.33 16-Feb-2017  knakahara branches: 1.33.6;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.32 27-Oct-2012  alnsn branches: 1.32.14; 1.32.18; 1.32.22;
Add bpfjit and enable it for amd64.
 1.31 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.30 02-Aug-2012  matt branches: 1.30.2;
Export <net/bpf_jit.h> and add to the set lists.
 1.29 22-Aug-2010  rmind branches: 1.29.8;
Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.28 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.27 30-May-2009  hannken branches: 1.27.2; 1.27.4;
No need to include bsd.subdir.mk as bsd.kinc.mk already includes it.
 1.26 26-May-2009  pooka Install agr ioctl header and stop putting our hand under the sys skirt
in ifconfig.
 1.25 05-May-2008  ad branches: 1.25.14;
Back out previous. It broke the build.
 1.24 04-May-2008  ad Don't install sys/net/zlib.h.
 1.23 23-Apr-2008  thorpej branches: 1.23.2;
Add subroutines to support collating per-cpu-gathered network statistics.
 1.22 13-Jan-2007  isaki branches: 1.22.36; 1.22.40; 1.22.42;
Install <net/if_pflog.h>.
 1.21 11-Jan-2007  mouse Hook srt into the rest of the kernel build machinery, so it works to
just uncomment the pseudo-device line (which arguably should go into
other ports' GENERICs too, and at some point may).

OKed by perry.
 1.20 23-Nov-2006  rpaulo branches: 1.20.2; 1.20.4;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.19 18-Jun-2006  uwe branches: 1.19.4; 1.19.6;
Do not install net/if_pppvar.h, net/if_slvar.h and net/if_stripvar.h.
The former two are no longer necessary as slstats is no more
and pppstats now uses an ioctl instead of rummaging through kmem.
The latter has nothing interesting for the userland, but uses
struct bintime that I'm about to hide under #ifdef _KERNEL.

A bunch of remaining <net/if_*.h> headers are pretty useless to the
userland too, but ... someone else's yak to shave...
 1.18 11-Dec-2005  christos branches: 1.18.4; 1.18.8; 1.18.14; 1.18.16;
merge ktrace-lwp.
 1.17 08-Jan-2005  cube branches: 1.17.8; 1.17.10;
Install net/if_tap.h.
 1.16 22-Jun-2004  itojun fix "includes" for pfvar.h
 1.15 22-Jun-2004  itojun foundation for PF
 1.14 13-Oct-2003  dyoung Complete replacement of the old 802.11 layer with the new.
 1.13 26-Nov-2002  lukem branches: 1.13.6;
Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.12 05-Oct-2001  bjh21 Install net/ieee1394.h the same way we install all the other
link-layer-specific headers.
 1.11 17-Aug-2001  augustss branches: 1.11.2;
Install if_bridgevar.h.
 1.10 29-Apr-2001  martin branches: 1.10.2;
Add an in-kernel PPPoE (ppp over ethernet, RFC 2516) implementation,
based on the existing net/if_spppsubr.c stuff.

While there are completely userland (bpf based) implementations available,
those have a vastly larger per packet overhead thus causing major CPU
overhead and higher latency. On an i386 base router, running a 486DX at 50MHz
my line (768kBit/s downstream) was limited to something (varying) between 10
and 20 kByte/s effective download rate. With this implementation I get full
bandwidth (~85kByte/s).

This is client side only. Arguably the right way to add full PPPoE support
(including server side) would be a variation of the ppp line discipline and
appropriate modifications to pppd. I promise every help I can give to anyone
doing that - but I needed this really fast. Besides, on low memory NAT boxes
with typically a single PPPoE connection, this implementation is more
lightweight than a pppd based one, which nicely fits my needs.
 1.9 12-Dec-2000  thorpej branches: 1.9.2;
Put the BPF DLT_* constants into their own header file so that things
that reference them don't have to slurp in all of the BPF headers.

Define a new generic RAWAF type that is like DLT_RAW, but isn't specific
to IP (the macro takes an AF_* constant as an argument to generate the
actual type).
 1.8 28-Sep-2000  enami Install if_vlanvar.h.
 1.7 19-Apr-2000  itojun branches: 1.7.4;
add net/if_stf.h and netinet/ip_encap.h (almost no one will include them though)
 1.6 23-Jan-2000  chopps Add beginnings of ieee 802.11 generic stuff
 1.5 01-Jul-1999  itojun branches: 1.5.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.4 04-Apr-1999  explorer Install if_sppp.h in include/net/
 1.3 22-Mar-1999  bad branches: 1.3.2;
Add if_token.h to INCS.
 1.2 02-Oct-1998  hwr branches: 1.2.4;
Also install if_gre.h in /usr/include/net/
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.2.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.3.2.1 04-Apr-1999  explorer branches: 1.3.2.1.2; 1.3.2.1.4;
Pull up recent changes to if_sppp*.[ch] (i4b code) with RCS id fixes
 1.3.2.1.4.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.3.2.1.4.1 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.3.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.3.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.5.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 31-Dec-2000  jhawk Pull up revision 1.8 (requested by bouyer):
Add support for 802.1Q virtual LANs.
 1.9.2.4 11-Dec-2002  thorpej Sync with HEAD.
 1.9.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.9.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.9.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.10.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.11.2.1 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.13.6.4 17-Jan-2005  skrll Sync with HEAD.
 1.13.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.6.1 03-Aug-2004  skrll Sync with HEAD
 1.17.10.3 26-Feb-2007  yamt sync with head.
 1.17.10.2 30-Dec-2006  yamt sync with head.
 1.17.10.1 21-Jun-2006  yamt sync with head.
 1.17.8.1 19-Nov-2007  bouyer Pull up following revision(s) (requested by tron in ticket #1864):
distrib/sets/lists/comp/mi: revision 1.990 via patch
sys/net/Makefile: revision 1.22 via patch
Install <net/if_pflog.h>.
 1.18.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.18.14.1 22-Jun-2006  chap Complete a sync sys/ with head.
 1.18.8.1 26-Jun-2006  yamt sync with head.
 1.18.4.1 09-Sep-2006  rpaulo sync with head
 1.19.6.1 10-Dec-2006  yamt sync with head.
 1.19.4.2 01-Feb-2007  ad Sync with head.
 1.19.4.1 12-Jan-2007  ad Sync with head.
 1.20.4.1 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.20.2.1 14-Oct-2007  xtraeme Pull up following revision(s) (requested by tron in ticket #932):
distrib/sets/lists/comp/mi: revision 1.990 (via patch)
sys/net/Makefile: revision 1.22 (via patch)

Install <net/if_pflog.h>.
 1.22.42.1 18-May-2008  yamt sync with head.
 1.22.40.1 02-Jun-2008  mjf Sync with HEAD.
 1.22.36.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.23.2.3 09-Oct-2010  yamt sync with head
 1.23.2.2 11-Aug-2010  yamt sync with head.
 1.23.2.1 20-Jun-2009  yamt sync with head
 1.25.14.1 23-Jul-2009  jym Sync with HEAD.
 1.27.4.2 05-Mar-2011  rmind sync with head
 1.27.4.1 03-Jul-2010  rmind sync with head
 1.27.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.27.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.29.8.1 30-Oct-2012  yamt sync with head
 1.30.2.2 03-Dec-2017  jdolecek update from HEAD
 1.30.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.32.22.1 21-Apr-2017  bouyer Sync with HEAD
 1.32.18.1 20-Mar-2017  pgoyette Sync with HEAD
 1.32.14.1 28-Aug-2017  skrll Sync with HEAD
 1.33.6.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.36.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.36.4.1 10-Jun-2019  christos Sync with HEAD
 1.36.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.36.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.38.6.2 29-Feb-2020  ad Sync with head.
 1.38.6.1 25-Jan-2020  ad Sync with head.
 1.44.8.1 31-May-2021  cjep sync with head
 1.44.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.6 10-Feb-1994  mycroft Clean up deleted files.
 1.5 10-Feb-1994  mycroft Deprecate af.h.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4 10-Feb-1994  mycroft Clean up deleted files.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.258 20-Oct-2024  mlelstv MBUFTRACE
 1.257 19-Aug-2024  ozaki-r bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.256 19-Aug-2024  ozaki-r bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
 1.255 15-Aug-2024  riastradh bpf(4): KNF whitespace fixes. No functional change intended.

Preparation for:

kern/58596: bpf(4) MP-safety issues
 1.254 15-Aug-2024  riastradh bpf(4): Sort includes. No functional change intended.

Preparation for:

kern/58596: bpf(4) MP-safety issues
 1.253 15-Aug-2024  rin bpf: Mark bpfread_filtops FILTEROP_MPSAFE

Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).

I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).

Some MP-safety problems have also been reported for bpf(4) in
PR kern/58596, but they will take some time to fix.
For now, commit this part in advance.

OK ozaki-r@
 1.252 31-Jul-2023  christos branches: 1.252.6;
Don't call versioned stuff "old". Follow the naming convention for versioning
and name them after the last version of the OS they appeared on.
 1.251 08-Feb-2023  gutteridge bpf.c: support loopback writes when BIOCSHDRCMPLT is set

Following changes in r. 1.249 "bpf: support sending packets on loopback
interfaces", also allow for this to succeed when the "header complete"
flag is set, which is the practice of some tools, e.g., tcpreplay and
Scapy. With this change, both of those example tools now work, e.g.,
Scapy passes "L3bpfSocket - send and sniff on loopback" in its test
suite.

There are several ways of addressing this issue; this commit is
intended to be the most conservative and consistent with the previous
changes. (E.g., FreeBSD instead has special handling of this condition
in its if_loop.c.)
 1.250 07-Feb-2023  gutteridge bpf.c: fix a few typos and grammatical issues in comments
 1.249 30-Nov-2022  ozaki-r branches: 1.249.2;
bpf: support sending packets on loopback interfaces

Previously sending packets on a loopback interface via bpf failed
because the packets are treated as AF_UNSPEC by bpf and the loopback
interface couldn't handle such packets.

This fix enables user programs to prepend a protocol family (AF_INET or
AF_INET6) to a payload. bpf interprets it and treats a packet as so,
not just AF_UNSPEC. The protocol family is encoded as 4 bytes, host byte
order as per DLT_NULL in the specification(*).

(*) https://www.tcpdump.org/linktypes.html

Proposed on tech-net and tech-kern
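
A user program could build such a frame along the following lines. This is a minimal sketch, assuming a helper named null_frame_build that is hypothetical and belongs to neither bpf(4) nor libpcap; it only shows the layout described above (a 4-byte protocol family in host byte order, then the payload) and leaves the write(2) to the bpf descriptor out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/*
 * Hypothetical helper: prepend the protocol family as a 4-byte value in
 * host byte order (as DLT_NULL specifies), followed by the raw payload.
 * Returns the frame length, or 0 if the output buffer is too small.
 */
size_t
null_frame_build(uint8_t *out, size_t outlen, uint32_t af,
    const uint8_t *payload, size_t paylen)
{
	assert(out != NULL && (payload != NULL || paylen == 0));

	if (outlen < sizeof(af) || paylen > outlen - sizeof(af))
		return 0;

	/* Host byte order: copy the value as-is, no htonl(). */
	memcpy(out, &af, sizeof(af));
	if (paylen != 0)
		memcpy(out + sizeof(af), payload, paylen);
	return sizeof(af) + paylen;
}
```

Passing AF_INET6 instead of AF_INET is the only change needed for an IPv6 payload; the header stays 4 bytes either way.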
 1.248 19-Nov-2022  yamt bpf: refresh bd_pid in a few more places as well

This made "netstat -B" show hostapd and wpa_supplicant for me.

kingcrab# netstat -B
Active BPF peers
PID   Int     Recv  Drop  Capt  Flags  Bufsize  Comm
433   urtwn0   102     0     2  I-RSH   524288  hostapd
211   urtwn0   102     0     4  I-RS-    32768  dhcpd
670   bwfm0    295     0     2  I-RSH   524288  wpa_supplicant
kingcrab#
 1.247 03-Sep-2022  riastradh bpf(4): Reject bogus timeout values before arithmetic overflows.

Reported-by: syzbot+fbd86bdf579944b64a98@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=60d46fd4863952897cbf67c6b1bcc8b20ec7bde6

XXX pullup-8
XXX pullup-9
 1.246 15-Mar-2022  riastradh bpf(4): Handle null bf_insn on free.

This is not guaranteed by bpf_setf to be nonnull.

Reported-by: syzbot+de1ec9471dfc2f283dda@syzkaller.appspotmail.com
 1.245 12-Mar-2022  riastradh bpf(4): Nix KM_NOSLEEP and prune dead branch.

https://syzkaller.appspot.com/bug?id=0fa7029d5565d9670a24c364d44bd116c76d7e7f
 1.244 12-Mar-2022  riastradh bpf(4): Clamp read timeout to INT_MAX ticks to avoid overflow.

Reported-by: syzbot+c543d35064d3492b9091@syzkaller.appspotmail.com
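
The general shape of such a clamp can be sketched as below. This is not the code from bpf.c: the function name timeout_to_ticks_sketch, its parameters, and the choice of returning -1 for bogus input are all assumptions made for illustration. The idea it shows is checking the operands before multiplying, so the overflow never happens in the first place.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical conversion of a (seconds, microseconds) timeout into clock
 * ticks at hz ticks per second. Returns -1 for bogus input and clamps the
 * result to INT_MAX rather than letting the arithmetic overflow.
 */
int
timeout_to_ticks_sketch(int64_t sec, int64_t usec, int hz)
{
	int64_t ticks;

	assert(hz > 0 && hz <= 1000000);

	/* Reject values no sane caller would pass. */
	if (sec < 0 || usec < 0 || usec >= 1000000)
		return -1;

	/* Check before multiplying: sec * hz must not exceed INT_MAX. */
	if (sec > INT_MAX / hz)
		return INT_MAX;

	/* Safe now: both terms fit comfortably in 64 bits. Round up. */
	ticks = sec * hz + (usec * hz + 999999) / 1000000;
	return ticks > INT_MAX ? INT_MAX : (int)ticks;
}
```

With hz = 100, a timeout of 1.5 seconds comes out as 150 ticks, while an absurdly large seconds value simply saturates at INT_MAX.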
 1.243 26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.242 16-Sep-2021  andvar fix typos in word "successful".
 1.241 14-Jul-2021  yamaguchi unset IFF_PROMISC at bpf_detach()

Setting "d->bd_promisc = 0" meant that bpf_detach() did not call
ifpromisc(ifp, 0). Currently, there is no reason for
this behavior, so it is removed.
In addition to the change, the workaround for it in vlan(4)
is also removed.
 1.240 09-Jun-2021  martin Add a bpf_register_track_event() function (and deregister equivalent)
that allows a driver to track listeners attaching/detaching from tap
points.

This is useful for drivers that would have to do extra work for some
taps and can not easily decide (at the driver level) if the work would
be needed further up the stack.

An example is providing radiotap headers for IEEE 802.11 frames.
 1.239 18-Dec-2020  thorpej branches: 1.239.4;
Use sel{record,remove}_knote().
 1.238 02-Aug-2020  maxv branches: 1.238.2;
Use a more informative panic message.
 1.237 11-Jun-2020  roy bpf(4): Add ioctls BIOCSETWF and BIOCLOCK

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underlying parameters
of the bpf(4) device. An example might be the setting of bpf(4) filter
programs or attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets.
Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor can
essentially be used as a raw socket, regardless of consumer's UID.
Write filters give users the ability to constrain which packets can be sent
through the bpf(4) descriptor.

Taken from OpenBSD.
 1.236 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.235 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.234 01-Feb-2020  riastradh Fix wrong memory order and switch bpf to atomic_load/store_*.
 1.233 19-Jan-2020  thorpej Stop including strip.h (it's no longer generated).
 1.232 29-Nov-2019  ryo branches: 1.232.2;
bpf can send a packet greater than MCLBYTES (JumboFrame) using multiple mbufs.
 1.231 13-Sep-2019  maxv As I suspected, the KASSERT I added yesterday can fire if we try to process
zero-sized packets. Skip them to prevent a type confusion that can trigger
random page faults later.

Reported-by: syzbot+3e447ebdcb2bcfa402ac@syzkaller.appspotmail.com
 1.230 12-Sep-2019  maxv Add KASSERT to catch bugs. Something tells me it could easily fire.
 1.229 10-Jul-2019  maxv branches: 1.229.2;
Fix info leak: use kmem_zalloc, because we align the buffers, and the
otherwise uninitialized padding bytes get copied to userland in bpf_read().
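
A userland analogue of the problem and the fix, as a rough sketch: records are stored at aligned offsets, so the gaps between them are never written, and only a zeroing allocator guarantees those gaps hold nothing stale. The names ALIGN_SKETCH and pack_two_records are hypothetical, the 8-byte alignment is an assumption, and calloc(3) stands in for kmem_zalloc(9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to a multiple of 8 (assumed alignment, for illustration). */
#define ALIGN_SKETCH(n)	(((size_t)(n) + 7u) & ~(size_t)7u)

/*
 * Hypothetical packer: copy two records into one buffer, each starting at
 * an aligned offset. The bytes between the end of a record and the next
 * aligned offset are padding that this function never writes, so the
 * buffer is allocated zeroed to keep old heap contents out of them.
 * The caller frees the returned buffer.
 */
unsigned char *
pack_two_records(const void *a, size_t alen, const void *b, size_t blen,
    size_t *totlen)
{
	size_t off_b = ALIGN_SKETCH(alen);
	size_t total = off_b + ALIGN_SKETCH(blen);
	unsigned char *buf;

	assert(a != NULL && b != NULL && totlen != NULL);

	buf = calloc(1, total != 0 ? total : 1);	/* zeroed, unlike malloc */
	if (buf == NULL)
		return NULL;
	memcpy(buf, a, alen);
	memcpy(buf + off_b, b, blen);
	*totlen = total;
	return buf;
}
```

If the whole buffer is later copied out to a less trusted consumer, the zeroed padding is what keeps that copy from disclosing unrelated memory.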
 1.228 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
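
The truncation hazard that motivates the rename can be reproduced in a few lines. This is an illustrative sketch only: uimin_sketch and MIN_SKETCH are stand-ins written for this example, not the libkern definitions, and the result assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same shape as a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; it may evaluate its arguments more than once. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * 64-bit arguments are converted to unsigned int at the call, silently
 * dropping the upper bits (assuming a 32-bit unsigned int).
 */
uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	return uimin_sketch(a, b);
}

/* The macro compares the 64-bit values as they are. */
uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

For a = 2^32 + 1 and b = 5, the function variant sees a as 1 and returns 1, whereas the macro variant returns 5. A name like uimin makes that narrowing visible at the call site.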
 1.227 25-Jul-2018  msaitoh Initialize some members in a mbuf which is on stack.
 1.226 26-Jun-2018  msaitoh branches: 1.226.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction was misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.225 25-Jun-2018  msaitoh Removal of bpf_tap().
 1.224 14-May-2018  ozaki-r Protect packet input routines with KERNEL_LOCK and splsoftnet

if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.

if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect non-MP-safe
codes (e.g., carp_input and agr_input).

Pointed out by mlelstv@
 1.223 25-Jan-2018  ozaki-r branches: 1.223.2;
Abandon unnecessary softint

The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.222 15-Dec-2017  ozaki-r Make softint and callout MP-safe
 1.221 12-Dec-2017  ozaki-r Fix panic in callout_halt (fix typo)

Reported by wiz@
 1.220 30-Nov-2017  christos add fo_name so we can identify the fileops in a simple way.
 1.219 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.218 25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
 1.217 19-Oct-2017  ozaki-r Turn on D_MPSAFE flag of bpf_cdevsw that is already MP-safe

Pointed out by k-goda@IIJ
 1.216 20-Feb-2017  ozaki-r branches: 1.216.4; 1.216.6;
Reinit a pslist entry before inserting it to a pslist again

Fix PR kern/51984
Tested by nonaka@
 1.215 19-Feb-2017  christos typo
 1.214 13-Feb-2017  ozaki-r Update comments to reflect bpf MP-ification
 1.213 09-Feb-2017  ozaki-r Make bpf MP-safe

By the change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing the lock requires big changes and it's a future work.

Another known issue is that we need to retain some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
 1.212 01-Feb-2017  ozaki-r Reduce return points
 1.211 01-Feb-2017  ozaki-r Kill tsleep/wakeup and use cv
 1.210 01-Feb-2017  ozaki-r Make bpf_gstats percpu
 1.209 01-Feb-2017  ozaki-r Use pslist(9) instead of queue(9) for psz/psref

As usual some member variables of struct bpf_d and bpf_if remain to avoid
breaking kvm(3) users (netstat and fstat).
 1.208 01-Feb-2017  ozaki-r Use kmem(9) instead of malloc/free
 1.207 01-Feb-2017  ozaki-r Make global variables static
 1.206 25-Jan-2017  ozaki-r Use bpf_ops for bpf_mtap_softint

By doing so we don't need to care whether a kernel enables bpfilter or not.
 1.205 24-Jan-2017  ozaki-r Defer bpf_mtap in Rx interrupt context to softint

bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run them in softint as well as bpf_mtap of most drivers
(see if_percpuq_softint and if_input).

To this end, bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the mechanism, we can move bpf_mtap processing to softint
without changing target drivers much while it adds some overhead
on CPU and memory. Once target drivers are changed to softint-based,
we should return to normal bpf_mtap.

Proposed on tech-kern and tech-net
 1.204 23-Jan-2017  ozaki-r Make bpf_setf static
 1.203 19-Jul-2016  pgoyette branches: 1.203.2;
Fix regression introduced in tests/net/bpf and tests/net/bpfilter

The rump code needs to call devsw_attach() in order to assign a dev_major
for bpf; it then uses this to create rumps /dev/bpf node. Unfortunately,
this leaves the devsw attached, so when the bpf module tries to initialize
itself, it gets an EEXIST error and fails.

So, once rump has figured what the dev_major should be, call devsw_detach()
to remove the devsw. Then, when the module initialization code calls
devsw_attach() it will succeed.
 1.202 17-Jul-2016  pgoyette Now that we're only calling devsw_attach() in the modular driver, it
is not ok for the driver/module to already exist. So don't ignore
EEXIST.
 1.201 17-Jul-2016  pgoyette Don't initialize variables that no longer exist in built-in module.
 1.200 17-Jul-2016  pgoyette Don't try to call devsw_attach() for built-in driver code.
 1.199 20-Jun-2016  knakahara branches: 1.199.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.198 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.197 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.196 07-Jun-2016  pgoyette Create separate modules for i2c_bitbang and bpf_filter so these files
can be included in kernels which need them without also duplicating
them in other modules. Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
 1.195 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.194 01-Feb-2016  christos Do less work under the kernel lock, otherwise dhcpcd aborting causes us
to deadlock.
 1.193 16-Dec-2015  christos don't free mbuf twice.
XXX: pullup 7.
 1.192 14-Oct-2015  christos PR/49386: Ryota Ozaki: Add a mutex for bpf creation/removal to avoid races.
Add M_CANFAIL to malloc.
 1.191 30-May-2015  joerg Improve wording.
 1.190 29-Dec-2014  ozaki-r Remove unnecessary variable bc
 1.189 13-Sep-2014  rmind branches: 1.189.2;
PR/49190: bpf_deliver: set scratch memory store in bpf_args_t.
 1.188 05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.187 07-Aug-2014  ozaki-r branches: 1.187.2;
Use NULL instead of 0 for pointers
 1.186 28-Jul-2014  alnsn Enable net.bpf.jit only if MODULAR and BPFJIT. Tweak a warning about postponed
jit activation.
 1.185 25-Jul-2014  dholland Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.184 10-Jul-2014  christos initialize args the same way we do in filter.
 1.183 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.182 16-Mar-2014  dholland branches: 1.182.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.181 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.180 05-Dec-2013  christos It is silly to kill the system when an interface failed to clear promiscuous
mode. Some return EINVAL when they are dying, but others like USB return EIO.
Downgrade to a DIAGNOSTIC printf. Same should be done for the malloc/NOWAIT,
but this is rarely hit.
 1.179 16-Nov-2013  rmind bpf_deliver: convert to bpf_filter_ext().
 1.178 15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.177 18-Sep-2013  rmind Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.176 09-Sep-2013  christos PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
XXX: Pullup -6
 1.175 30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.174 29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.173 27-Oct-2012  alnsn branches: 1.173.2;
Add bpfjit and enable it for amd64.
 1.172 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.171 15-Aug-2012  alnsn branches: 1.171.2;
Fix two bugs introduced by recent commit.

- When handling contiguous buffer in _bpf_tap(), pass its real size
rather than 0 to avoid reading packet data as mbuf struct on
out-of-bounds loads.
- Correctly pass pktlen and buflen arguments from bpf_deliver() to
bpf_filter() to avoid reading mbuf struct as packet data.
JIT case is still broken.

Also, test pointers against NULL.
 1.170 02-Aug-2012  rmind Build fix for some ports.
 1.169 01-Aug-2012  rmind Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.168 16-Dec-2011  christos branches: 1.168.2; 1.168.6; 1.168.8;
make comment reflect reality
 1.167 15-Dec-2011  christos don't leak mbufs.
 1.166 30-Aug-2011  bouyer branches: 1.166.2; 1.166.6;
Provide netbsd32 compat for bpf. Beside the ioctls, the structure
returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.

This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...

Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
 1.165 10-Jun-2011  christos setting things once is enough.
 1.164 30-Mar-2011  christos branches: 1.164.2;
lib/44807: something broken in stat(2), return that we are a character
device in st_mode.
 1.163 30-Mar-2011  bouyer Allocate buffers with (M_WAITOK | M_CANFAIL) instead of M_NOWAIT.
M_NOWAIT cause dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
 1.162 22-Jan-2011  christos undo previous. Read the diff wrong.
 1.161 22-Jan-2011  christos fix comment
 1.160 02-Jan-2011  christos branches: 1.160.2; 1.160.4;
kern/44310: Alexander Nasonov: write to /dev/bpf truncates size_t to int
 1.159 08-Dec-2010  pooka linkset no more
 1.158 14-Apr-2010  pooka Add a little comment on how bpf can be made unloadable, per pointer from ad.
 1.157 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.156 13-Mar-2010  christos branches: 1.156.2;
add BIOC{G,S}FEEDBACK which allows one to receive injected outgoing packets
via bpf.
 1.155 26-Jan-2010  pooka branches: 1.155.2;
Include sys/atomic.h now that it's used but gets stealth-included
only on some archs.
 1.154 25-Jan-2010  pooka Make bpf dynamically loadable.
 1.153 19-Jan-2010  pooka Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.152 17-Jan-2010  pooka Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *". Buys us oo times the type-safety for 0 times
the price.
(no functional change)
 1.151 15-Jan-2010  pooka * remove just-for-kicks locking
* KNF
* remove outdated comment (quite a funny one to read in 2010, though)
 1.150 20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.149 09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.148 23-Nov-2009  rmind Remove some unnecessary includes of the sys/user.h header.
 1.147 05-Oct-2009  christos add the error from ifpromisc to the panic.
 1.146 11-Apr-2009  christos Fix locking as Andy explained. Also fill in uid and gid like sys_pipe did.
 1.145 11-Apr-2009  christos Fix PR/37878 and PR/37550: Provide stat(2) for all devices and don't use
fbadop_stat.
 1.144 04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.143 11-Mar-2009  mrg like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.
 1.142 11-Jan-2009  christos branches: 1.142.2;
merge christos-time_t
 1.141 15-Jun-2008  christos branches: 1.141.4; 1.141.6;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.140 21-May-2008  ad branches: 1.140.2;
Acquire kernel_lock in the bpf fileops.
 1.139 24-Apr-2008  ad branches: 1.139.2; 1.139.4;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.138 20-Apr-2008  scw Pull in a couple of fixes from FreeBSD, the first of which addresses a
failure of wpa_supplicant(8) to re-key promptly, as reported in
http://mail-index.netbsd.org/tech-net/2008/04/18/msg000459.html

- Make bpf's read timeout work more correctly with select/poll.

- A fix for catchpacket() which delays calling bpf_wakeup() until
the state has been updated.
 1.137 26-Mar-2008  christos branches: 1.137.2; 1.137.4;
- put const back, no reason to modify the prototype.
1. Please don't cast function pointers to (void *), use the full function
prototype cast; this is for archs where a function pointer is not a regular
pointer.
2. Compare pointers to NULL not 0.
 1.136 24-Mar-2008  yamt merge yamt-lazymbuf branch.
 1.135 21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.134 01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.133 20-Feb-2008  matt branches: 1.133.2; 1.133.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.132 20-Dec-2007  dyoung Use LIST_FOREACH().
 1.131 05-Dec-2007  pooka branches: 1.131.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.130 11-Jul-2007  xtraeme branches: 1.130.6; 1.130.8; 1.130.14; 1.130.16;
Replace a simple lock with a mutex and make it static (as it's only used
on this file). Ok by ad@.
 1.129 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.128 30-May-2007  christos Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.127 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.126 04-Mar-2007  christos branches: 1.126.2; 1.126.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.125 16-Nov-2006  christos branches: 1.125.4;
__unused removal on arguments; approved by core.
 1.124 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.123 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.122 28-Aug-2006  christos branches: 1.122.2; 1.122.4;
add missing initializer
 1.121 04-Aug-2006  martin Fix typo in comment
 1.120 26-Jul-2006  christos Patch from Dheeraj S, inspired by the following FreeBSD change:

Rather than calling microtime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.

This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.

It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.

(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)
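
Illustration (not part of the commit): a minimal userland sketch of the
"one timestamp per matched packet" idea described above. The names
struct listener, tap(), record(), match_all() and match_none() are
hypothetical stand-ins, and gettimeofday() stands in for the kernel's
microtime(); the real bpf code is structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for a bpf descriptor with an attached filter. */
struct listener {
	int (*match)(const unsigned char *pkt, size_t len);
	struct timeval last_ts;   /* timestamp of the last captured packet */
	unsigned long captured;   /* number of packets captured so far */
};

/* Hypothetical filters used only to exercise the sketch. */
static int match_all(const unsigned char *pkt, size_t len)
{ (void)pkt; (void)len; return 1; }
static int match_none(const unsigned char *pkt, size_t len)
{ (void)pkt; (void)len; return 0; }

/* Plays the role of catchpacket(): it receives the timestamp from its caller. */
static void
record(struct listener *l, const struct timeval *tv)
{
	l->last_ts = *tv;
	l->captured++;
}

/*
 * Plays the role of the tap routine: the clock is read lazily, on the
 * first match, and the same value is handed to every matching listener.
 * Returns 1 if at least one listener matched, 0 otherwise.
 */
static int
tap(struct listener *ls, size_t n, const unsigned char *pkt, size_t len)
{
	struct timeval tv;
	int gottime = 0;

	assert(ls != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!ls[i].match(pkt, len))
			continue;
		if (!gottime) {
			gettimeofday(&tv, NULL);   /* once per matched packet */
			gottime = 1;
		}
		record(&ls[i], &tv);
	}
	return gottime;
}
```

With this shape a packet that no listener wants never pays for a clock
read, and all listeners that do capture it see an identical timestamp.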
 1.119 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.118 27-Jun-2006  tron Make this build with GCC 4.x.
 1.117 14-May-2006  elad branches: 1.117.4;
integrate kauth.
 1.116 10-May-2006  mrg quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
 1.115 26-Dec-2005  rpaulo branches: 1.115.4; 1.115.6; 1.115.8; 1.115.10; 1.115.12;
Kill BPF_KERN_FILTER. Seems like it died with the new pppd import.
No replies from tech-kern@, but who introduced this option 8 years ago
(Christos) said it's ok to remove it.
 1.114 24-Dec-2005  perry Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.113 14-Dec-2005  rpaulo Correct typo in comments.
 1.112 11-Dec-2005  christos merge ktrace-lwp.
 1.111 05-Sep-2005  rpaulo Use ANSI function declarations everywhere and a consistent indentation on
them.
 1.110 04-Aug-2005  rpaulo Implemented the kernel part of BPF statistics and BPF peers, net.bpf.stats
and net.bpf.peers sysctls respectively.

A new structure was added to describe the external (user viewable)
representation of a BPF file; a new entry was added to the bpf_d
structure to store the PID of the calling process; a simple_lock was added
to protect the insert/removal from the net.bpf.peers sysctl handler.

This idea came from FreeBSD (Christian S.J. Peron) but while it is
implemented with sysctl's it differs a bit.

Reviewed by: christos@ and atatat@ (who gave me the tip for the net.bpf.peers
sysctl helper function).
 1.109 22-Jun-2005  peter branches: 1.109.2;
Missing m_freem() in bpf_write. PR/29138.
 1.108 20-Jun-2005  atatat Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.107 26-Feb-2005  perry nuke trailing whitespace
 1.106 12-Feb-2005  christos pass the flag to fdclone.
 1.105 30-Nov-2004  christos branches: 1.105.4; 1.105.6;
Clonify bpf. I am not changing /dev/bpfX -> /dev/bpf until all userland
programs have been fixed.
 1.104 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.103 19-Aug-2004  christos - ansify
- remove unnecessary casts
- change caddr_t to void *
- no functional change.
 1.102 05-Aug-2004  enami Don't refuse to attach an interface even if it is down so that one can
capture the very first packet when an interface is up.
 1.101 06-Jun-2004  dyoung Per Matt Thomas' and Darren Reed's suggestions:

Add bpf_deliver prototype.

Rename bpf_measure to m_length and move it to sys/sys/mbuf.h. I
make m_length an inline function in the header file to preserve
its performance characteristics, for better or for worse.

Optimize m_length: use the length in m_pkthdr.len, if M_PKTHDR.

In bpf_deliver, zero the on-stack mbuf before we do anything else
with it.
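
Illustration (not part of the commit): a minimal userland sketch of the
m_length optimization described above, assuming a simplified mbuf. The
names struct xmbuf, X_PKTHDR and x_length() are hypothetical; the real
struct mbuf and M_PKTHDR live in sys/sys/mbuf.h and differ in layout.

```c
#include <assert.h>
#include <stddef.h>

#define X_PKTHDR 0x1   /* hypothetical flag: first mbuf carries a packet header */

/* Hypothetical, simplified stand-in for struct mbuf. */
struct xmbuf {
	struct xmbuf *m_next;        /* next buffer in the chain */
	unsigned int  m_len;         /* bytes of data in this buffer */
	int           m_flags;       /* X_PKTHDR or 0 */
	unsigned int  m_pkthdr_len;  /* total packet length, valid if X_PKTHDR */
};

/*
 * Return the total length of the chain. If the head carries a packet
 * header, trust its recorded length; otherwise walk the chain and sum
 * the per-buffer lengths.
 */
static inline unsigned int
x_length(const struct xmbuf *m)
{
	unsigned int len = 0;

	assert(m != NULL);
	if (m->m_flags & X_PKTHDR)
		return m->m_pkthdr_len;
	for (; m != NULL; m = m->m_next)
		len += m->m_len;
	return len;
}
```

The shortcut is only as good as the header: it assumes the recorded
packet length has been kept consistent with the chain.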
 1.100 29-May-2004  darrenr back out previous change - these diffs aren't what I'd tested.
 1.99 29-May-2004  darrenr add mmap(2) interface to bpf(4) devices, along with BIOCMMAPINFO ioctl call
for applications to interact with the bpf device for the purpose of using
mmap to examine captured data.
 1.98 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.97 19-May-2004  darrenr reapply a change that got undone with more recent changes to bpf to wakeup
any sleepers _after_ the device info has been updated, not before.
 1.96 30-Apr-2004  dyoung Add bpf_mtap2, which taps a packet whose head is in a void *buffer
and whose tail is in an mbuf chain.
 1.95 20-Apr-2004  darrenr If we timeout waiting for data on the bpf device, allow data in the current
storage buffer (bd_sbuf) to indicate that there is data present.
 1.94 15-Apr-2004  darrenr Add a count of the number of packets that match the bpf filter applied to a
particular device. In doing this, make a new bpf_stat structure with
members that are u_long rather than u_int, matching the counters in the bpf_d.
The original bpf_stat is now bpf_stat_old, and the original ioctl
is preserved as BIOCGSTATSOLD.
 1.93 14-Apr-2004  darrenr * from bpf 1.2a1, use the IO_NDELAY flag in bpfread() to indicate whether or
not a read operation should be allowed to sleep. This allows the use of
bd_rtout with a value of "-1" to be eliminated (signed comparison and
assignment to an unsigned long.)
* in 1.91, a change was introduced that had bpfpoll() returning POLLRDNORM
set when the timeout expired. This impacted poorly on performance as well
as causing select to return an fd available for reading when it wasn't.
Change the behaviour here to only allow the possibility of POLLIN being
returned as active in the event of a timeout.
 1.92 11-Apr-2004  darrenr from freebsd's kern/36219, the if expression in deciding whether or not
to return something checks the value of bd_state in the wrong place.
 1.91 10-Apr-2004  darrenr Fix bpf so that select will return for a timeout (from FreeBSD.)

Fix the behaviour of BIOCIMMEDIATE (fix from LBL BPF code via FreeBSD.)

In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf (based on
similar change FreeBSD but fixes BIOC*SEESENT issue with that.)

Copy the implementation of BIOCSSEESENT, BIOCGSEESENT by FreeBSD.

Review Assistance: Guy Harris

PRs: kern/8674, kern/12170
 1.90 24-Mar-2004  atatat branches: 1.90.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.89 22-Jan-2004  jonathan Make bpf_maxbufsize writable via sysctl, as written by Andrew Brown.
 1.88 21-Jan-2004  jonathan Fix an Emacs finger-glitch (missing semicolon#).
 1.87 21-Jan-2004  jonathan Update bpf buffer parameters, as per recent discussion on tech-net.

Increase the default bpf buffer size used by naive apps that don't do
BIOCSBLEN, from 8k to 32k. The former value of 8192 is too small to
hold a normal jumbo Ethernet frame (circa 9k), 16k is a little small
for Large-jumbo (~16k) frames supported by newer gigabit
Ethernet/10Gbe, so (somewhat arbitrarily) increase the default to 32k.

Increase the upper limit to which BIOCSBLEN can raise bpf buffer-size
drastically, to 1 Mbyte. State-of-the-art for packet capture circa
1999 was around 256k; savvy NetBSD developers now use 1 Mbyte.
Note that libpcap has been updated to do binary-search on BIOCSBLEN
values up to 1 Mbyte.

Work is in progress to make both values sysctl'able. Source comments
note that consensus on tech-net is that we should find some heuristic
to set the boot-time default values dynamically, based on system memory.
 1.86 22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.85 21-Sep-2003  jdolecek cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.84 13-Aug-2003  wrstuden Include correct file for defopt.
 1.83 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.82 29-Jun-2003  fvdl branches: 1.82.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.81 28-Jun-2003  darrenr From OpenBSD 1.33-1.34:
When using bpf(4) in immediate mode, and using kevent(2) to receive
notification of packet arrival, the usermode application isn't notified
until a second packet arrives.

This is because KNOTE() calls filt_bpfread() before bd_slen has been
updated with the newly arrived packet length, so it looks like there
is no data there.

Moving the bpf_wakeup() call for immediate mode to after bd_slen is set
fixes it.

From: wayne@epipe.com.au in pr 3175
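
Illustration (not part of the commit): a minimal userland sketch of the
ordering fix described above, assuming a simplified descriptor. The
names struct xdesc, x_filt_read(), x_wakeup() and x_catchpacket() are
hypothetical stand-ins for bpf_d, filt_bpfread(), bpf_wakeup() and
catchpacket().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the bpf descriptor. */
struct xdesc {
	size_t slen;        /* bytes currently in the store buffer */
	int    immediate;   /* nonzero: notify readers on every packet */
	int    ready_seen;  /* set when a notification observed data */
};

/* Readiness check run at notification time: is there data to read? */
static int
x_filt_read(const struct xdesc *d)
{
	return d->slen > 0;
}

/* The notification evaluates readiness at the moment it is delivered. */
static void
x_wakeup(struct xdesc *d)
{
	if (x_filt_read(d))
		d->ready_seen = 1;
}

/*
 * Account for the new packet first, then notify. Notifying before the
 * length update would make the readiness check see an empty buffer.
 */
static void
x_catchpacket(struct xdesc *d, size_t totlen)
{
	assert(d != NULL);
	d->slen += totlen;
	if (d->immediate)
		x_wakeup(d);
}
```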
 1.80 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.79 19-Jun-2003  itojun avoid panic in malloc() under extremely low memory situation.
OpenBSD problem report 2235, 2236, 2640. fix by Otto Moerbeek.
 1.78 13-Mar-2003  dsl Check that the process/process group id passed to TIOCSPGRP is in the session
of the current process.
 1.77 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.76 26-Nov-2002  christos si_ -> sel_
 1.75 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.74 25-Sep-2002  thorpej Don't include <sys/map.h>.
 1.73 24-Sep-2002  itojun backout recent changes, for PR 18392.
bpf_mtap() gets called with not-well-initialized mbuf, so we need to go through
it without touching m->m_pkthdr.len and such. it's part of our bpf_mtap() API
(at least today).
 1.72 19-Sep-2002  atatat Add a missing semi-colon.
 1.71 19-Sep-2002  darrenr For the trivial case where the packet is only in one mbuf, call bpf_tap()
(idea from FreeBSD) - alternative to changing bpf_filter() to be aware of
kernel calling convention where 0 is passed as the length for mbufs.
 1.70 19-Sep-2002  darrenr If M_PKTHDR is set we can use m_pkthdr.len instead of the for loop.
 1.69 15-Sep-2002  thorpej In bpf_setdlt(), preserve the promiscuous mode setting of the
descriptor.

From David Young <dyoung@ojctech.com>, slight change by me.
 1.68 11-Sep-2002  itojun KNF - return is not a function.
 1.67 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.66 28-Aug-2002  onoe Define new kernel interface bpfattach2() to register another data link
type for the driver, which will be used for 802.11 drivers.
Also add 2 APIs to get a list of available DLTs and use one for them.
BIOCGDLTLIST (struct bpf_dltlist)
BIOCSDLT (u_int)
 1.65 06-Jun-2002  wrstuden defparam BPF_BUFSIZE
 1.64 23-Mar-2002  darrenr branches: 1.64.2;
If someone is poll'ing to write to bpf, assume that it can always be done
and include POLLOUT and POLLWRNORM in the returned events flag set.
Derived from FreeBSD.
 1.63 12-Nov-2001  lukem add RCSIDs
 1.62 10-Sep-2001  bjh21 Add MI Econet support. This is lacking any interfaces to higher-layer
protocols, and lacking any timeouts, but it basically works, doing four-way
handshakes in both directions and incoming Machine Peek operations.

Oh, and Econet is Acorn's ancient, proprietary 500kbit/s networking
technology.
 1.61 13-Apr-2001  thorpej branches: 1.61.2; 1.61.4;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.60 29-Dec-2000  thorpej branches: 1.60.2;
Fix non-blocking BPF reads, from Guy Harris, kern/11836.
 1.59 12-Dec-2000  thorpej Use <net/dlt.h> to get the DLT_* constants. Also change bpfattach()
and bpf_change_type() to take just a pointer to the ifnet, rather than
a pointer to the ifnet and a pointer to a member of the ifnet (the bpf
pointer).

We'll let this ride on the Dec 12 1.5N version bump.
 1.58 04-Jul-2000  thorpej Move ifpromisc() to if.c
 1.57 28-May-2000  jhawk branches: 1.57.2;
Ensure that all callers of pfind() can deal with pfind(0) returning
a real procp* rather than NULL.
 1.56 28-May-2000  matt Fix bpf output on fddi to actually work. Make it compatible with ULTRIX
and Tru64.
 1.55 12-May-2000  jonathan branches: 1.55.2;
Make BPF_BUFSIZE overridable: 8192 is smaller than MTU of some devices.
TODO: defopt, or make sysctl'able (c.f. FreeBSD).
 1.54 12-Apr-2000  chs remove support for sunos and ancient BSDs.
 1.53 30-Mar-2000  augustss Kill some more register declarations.
 1.52 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.51 02-Feb-2000  enami Revoke bpf device on detach.
 1.50 02-Feb-2000  enami Since we are allowed to wait, no need to check the return value.
 1.49 02-Feb-2000  enami Remove duplicated forward declarations.
 1.48 31-Jan-2000  thorpej Implement bpfdetach().
 1.47 11-May-1999  thorpej branches: 1.47.2;
* Add the ability to change the data link type on the fly.
* Define two more data link types: NetBSD PPP-over-serial and NetBSD
PPP-over-Ethernet. (Different PPP encaps have different header formats!)
 1.46 04-Dec-1998  bouyer branches: 1.46.2; 1.46.6;
Init the descriptors at boot time rather than at interface attach time.
Now that we have pcmcia hot-plug, it's not the same. Fixes kern/3189.
 1.45 05-Nov-1998  jonathan Increase compiled-in default bpf buffer size from 4096 to 8192.
(the libpcap API provides no way to resize the in-kernel buffer, and
4096 is too small to capture maximum-sized FDDI frames.)
 1.44 18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning).
 1.43 06-Aug-1998  perry Sigh. "consts in prototypes can be quite a drag..."
fix last two fixes one more time, this time dealing with ugly
prototype issues, including the fact that the bcopy returns nothing,
but memcpy returns a void *. Never mind that we don't use it...
 1.42 06-Aug-1998  perry Fix botched prototype decl in last fix.
 1.41 06-Aug-1998  perry Convert bcopy,bzero to memcpy,memset
This was semi-nontrivial, since a function pointer to bcopy gets used
in this file.
Note #1: The catchpacket routine, which takes a function pointer to
bpf_mcpy or memcpy, should probably be converted to take a
flag that just says which is used, so memcpy can be inlined.
Note #2: The code is heavily #ifdef'ed to run on older operating
systems. We probably want to clean that cruft out, unless
someone is planning a new release of the code at LBL (doubtful.)
 1.40 30-Apr-1998  thorpej Implement two new BPF ioctls: BPFGHDRCMPLT and BPFSHDRCMPLT, to get/set
the "header already complete" flag. This allows BPF writers to spoof
layer 2 source addresses (providing the layer 2 in use supports it) in
applications where this is necessary. From Greg Smith <greg@nas.nasa.gov>.
 1.39 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.38 12-Oct-1997  mycroft Do *not* free the mbuf chain we just created.
 1.37 09-Oct-1997  christos GC bd_sig
 1.36 09-Oct-1997  christos Sync with bpf-1.2a1
- whitespace
- add rcsid; our sccsid is newer than the one on 1.2a1.
- change prototype to add mtu
- change size_t to u_int for consistency.
- add alignment stuff in bpf_movein
- add more consistency checks bpf_movein
- use one uiomove and then bcopy the data in bpf_movein
- update the comment for the panic when ifpromisc fails.
- separate the case when we have non blocking I/O and
no data and return EWOULDBLOCK
- check for other errors and return them
- pass the mtu to bpf_movein
- Add the BPF_KERN_FILTER junk, just so that we keep up with the code
- remove BIOCSRSIG, BIOCGRSIG; SIGIO does this well.
- don't add the SIOCGIFADDR stuff (it is bogus)
- Check for malloc return for consistency.
- comment should say poll
- change formatting to match the current code.
- save and restore the pcount and flags in case we fail to set the
interface into promiscuous mode.
- fix spelling typo.
 1.35 17-Mar-1997  scottr branches: 1.35.4;
if_arc.h is in net, not netinet.
 1.34 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.33 21-Feb-1997  thorpej Don't let the read timeout get inadvertently rounded down to 0.
From John Hawkinson <jhawk@mit.edu>, PR #2531.
 1.32 13-Oct-1996  christos branches: 1.32.4;
backout previous kprintf change
 1.31 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.30 07-Sep-1996  mycroft Implement poll(2).
 1.29 14-Jun-1996  cgd avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.
 1.28 22-May-1996  mycroft Remove duplicate definition of bpf_setif().
 1.27 07-May-1996  thorpej Kill a couple of unnecessary calls to strlen().
 1.26 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.25 30-Mar-1996  christos Eliminate need for and remove net_conf.h
 1.24 13-Feb-1996  christos Net prototypes
 1.23 27-Sep-1995  thorpej Enhancements to the bpf from Stu Grossman <grossman@cygnus.com>:
* grok FIONBIO, FIOASYNC, and TIOC{G,S}PGRP
* add BIOC{G,S}RSIG; get/set the signal to be delivered
to the process or process group upon packet reception.
Defaults to SIGIO.
 1.22 13-Aug-1995  mycroft Don't pass through SIOCGIFADDR, per Steve McCanne.
 1.21 12-Aug-1995  mycroft splnet --> splsoftnet
 1.20 23-Jul-1995  mycroft For outgoing packets, always allocate a header mbuf and fill it in.
 1.19 22-Apr-1995  cgd copy routines should take size_t lengths for prototype consistency.
don't assume that tick is >= 1000; loses badly on alpha (div. by zero)
only try unaligned copies if NetBSD's UNALIGNED_ACCESS symbol is defined.
various misc type size cleanups, mostly short -> int16_t.
 1.18 22-Mar-1995  mycroft Fix panic when an interface in promiscuous mode goes down and the BPF user
tries to turn off promiscuous mode. From Lon Willett.
 1.17 23-Feb-1995  glass preliminary arcnet support. uses lame but RFC address resolution
 1.16 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.15 15-Jul-1994  cgd don't use inline, use __inline, like cdefs intends (so it can kill it if nongcc
 1.14 29-Jun-1994  cgd branches: 1.14.2;
this is what cdefs.h is for
 1.13 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.12 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.11 25-Jan-1994  deraadt new from mccanne. be afraid.
 1.10 12-Jan-1994  mycroft Get the pkthdr.len calculation right.
 1.9 12-Jan-1994  deraadt writing out of bpf; use a hdr mbuf and set the pkthdr.len as well.
(rarpd now works with if_ep.c!)
 1.8 18-Dec-1993  mycroft Canonicalize all #includes.
 1.7 23-Nov-1993  cgd defines change
 1.6 15-Nov-1993  deraadt add bpfilterattach(), as in magnum
 1.5 18-May-1993  cgd branches: 1.5.4;
make kernel select interface be one-stop shopping & clean it all up.
 1.4 09-Apr-1993  glass fixes stupid piece of bpf code that duplicates cdefs.h's handling of
'inline' in such a way as to cause stupid warnings.
 1.3 05-Apr-1993  deraadt selwakeup() takes a "pid_t" rather than "struct proc *" now.
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.5.4.5 03-Dec-1993  mycroft Patch from Andrew Moore <alm@netcom.com> to make sure the ether type field is
correct when sending raw packets.
 1.5.4.4 27-Nov-1993  mycroft Remove remaining sleep()s.
 1.5.4.3 23-Nov-1993  cgd defines change
 1.5.4.2 09-Oct-1993  mycroft Add dummy bpfilterattach() to make autoconfig happy.
 1.5.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.14.2.1 15-Jul-1994  cgd updates from trunk. basically, C language errors.
 1.32.4.3 12-Mar-1997  is Merge in changes from The Trunk
 1.32.4.2 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.32.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.35.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.46.6.1 21-Jun-1999  thorpej Sync w/ -current.
 1.46.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.47.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.47.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.47.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.47.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.55.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.57.2.1 25-Jan-2001  jhawk Pull up revision 1.60 (requested by thorpej):
Fix non-blocking BPF reads. Fixes PR kern/11836.
 1.60.2.9 11-Dec-2002  thorpej Sync with HEAD.
 1.60.2.8 11-Nov-2002  nathanw Catch up to -current
 1.60.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.60.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.60.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.60.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.60.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.60.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.60.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.61.4.3 01-Oct-2001  fvdl Catch up with -current.
 1.61.4.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.61.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.61.2.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.61.2.6 02-Oct-2002  jdolecek do not need the (void *) cast for kn_hook anymore
 1.61.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.61.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.61.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.61.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.61.2.1 08-Sep-2001  thorpej Add kqueue support.
 1.64.2.3 29-Aug-2002  gehenna catch up with -current.
 1.64.2.2 20-Jun-2002  gehenna catch up with -current.
 1.64.2.1 16-May-2002  gehenna Add the character device switch.
Replace the direct-access to devsw table with calling devsw APIs.
 1.82.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.82.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.82.2.7 18-Dec-2004  skrll Sync with HEAD.
 1.82.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.82.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.82.2.3 12-Aug-2004  skrll Sync with HEAD.
 1.82.2.2 03-Aug-2004  skrll Sync with HEAD
 1.82.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.90.2.2 28-May-2004  tron Pull up revision 1.98 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.90.2.1 21-Apr-2004  jmc Pullup rev 1.91-1.95 (requested by darrenr in ticket #167)

Reduce bpf buffer to 32k from 1M to reduce kernel memory usage from userland
binaries.
Fix bpf so that select will return for a timeout.
Fix the behaviour of BIOCIMMEDIATE.
In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf.
Various other bpf fixes, including PR#8674, PR#12170
 1.105.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.105.4.1 29-Apr-2005  kent sync with -current
 1.109.2.9 24-Mar-2008  yamt sync with head.
 1.109.2.8 17-Mar-2008  yamt sync with head.
 1.109.2.7 27-Feb-2008  yamt sync with head.
 1.109.2.6 21-Jan-2008  yamt sync with head
 1.109.2.5 07-Dec-2007  yamt sync with head
 1.109.2.4 03-Sep-2007  yamt sync with head.
 1.109.2.3 30-Dec-2006  yamt sync with head.
 1.109.2.2 21-Jun-2006  yamt sync with head.
 1.109.2.1 07-Jul-2005  yamt de-constify mbuf.
 1.115.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.115.10.4 11-May-2006  elad sync with head
 1.115.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.115.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.115.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.115.8.3 03-Sep-2006  yamt sync with head.
 1.115.8.2 11-Aug-2006  yamt sync with head
 1.115.8.1 24-May-2006  yamt sync with head.
 1.115.6.1 01-Jun-2006  kardel Sync with head.
 1.115.4.1 09-Sep-2006  rpaulo sync with head
 1.117.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.122.4.2 10-Dec-2006  yamt sync with head.
 1.122.4.1 22-Oct-2006  yamt sync with head
 1.122.2.1 18-Nov-2006  ad Sync with head.
 1.125.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.126.4.1 11-Jul-2007  mjf Sync with head.
 1.126.2.5 15-Jul-2007  ad Sync with head.
 1.126.2.4 15-Jul-2007  ad Sync with head.
 1.126.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.126.2.2 09-Jun-2007  ad Sync with head.
 1.126.2.1 10-Apr-2007  ad Changes to select/poll:

- Make them MP safe and decouple from the proc locks.
- selwakeup: don't call p_find, or traverse per-proc LWP lists (ouch).
- selwakeup: don't lock the sleep queue unless we need to.
 1.130.16.2 26-Dec-2007  ad Sync with head.
 1.130.16.1 08-Dec-2007  ad Sync with head.
 1.130.14.2 27-Dec-2007  mjf Sync with HEAD.
 1.130.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.130.8.2 23-Mar-2008  matt sync with HEAD
 1.130.8.1 09-Jan-2008  matt sync with HEAD
 1.130.6.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.131.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.133.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.133.6.4 29-Jun-2008  mjf Sync with HEAD.
 1.133.6.3 02-Jun-2008  mjf Sync with HEAD.
 1.133.6.2 03-Apr-2008  mjf Sync with HEAD.
 1.133.6.1 29-Mar-2008  mjf - etc/devfsd.conf: Add some rules to give nodes like /dev/tty and
/dev/null better default modes, i.e. 0666.

- sbin/init: Run devfsd -s before going to multiuser.

- sys/arch: Provide arm32, i386, sparc with a mem_init() function to request
device nodes for /dev/null, /dev/zero, etc.

- sys/dev: Convert rnd, wd, agp, raid, cd, sd, wsdisplay, wskbd, wsmouse,
wsmux, tty, bpf, swap to devfs New World Order.

- sys/fs/devfs: Make the visibility attribute of device nodes configurable.
Also provide a function to mount a devfs on boot.

- sys/kern: Add a new boot flag, -n. This disables devfs support. Unless
the -n flag is specified the kernel will mount a devfs file
system on boot.
 1.133.2.1 24-Mar-2008  keiichi sync with head.
 1.137.4.3 17-Jun-2008  yamt sync with head.
 1.137.4.2 04-Jun-2008  yamt sync with head
 1.137.4.1 18-May-2008  yamt sync with head.
 1.137.2.3 28-Dec-2008  christos back to usecs now for source compatibility
 1.137.2.2 01-Nov-2008  christos Sync with head.
 1.137.2.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.139.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.139.2.3 11-Aug-2010  yamt sync with head.
 1.139.2.2 11-Mar-2010  yamt sync with head
 1.139.2.1 04-May-2009  yamt sync with head.
 1.140.2.1 18-Jun-2008  simonb Sync with head.
 1.141.6.3 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.6.2 05-Apr-2011  riz branches: 1.141.6.2.2;
Pull up following revision(s) (requested by bouyer in ticket #1587):
sys/net/bpf.c: revision 1.163
Allocate buffers with (M_WAITOK | M_CANFAIL) instead of M_NOWAIT.
M_NOWAIT causes dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
 1.141.6.1 04-Apr-2009  snj branches: 1.141.6.1.6;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.141.6.2.2.1 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.6.1.6.1 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.141.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.142.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.155.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.156.2.4 12-Jun-2011  rmind sync with head
 1.156.2.3 21-Apr-2011  rmind sync with head
 1.156.2.2 05-Mar-2011  rmind sync with head
 1.156.2.1 30-May-2010  rmind sync with head
 1.160.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.160.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.164.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.166.6.1 18-Feb-2012  mrg merge to -current.
 1.166.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.166.2.2 30-Oct-2012  yamt sync with head
 1.166.2.1 17-Apr-2012  yamt sync with head
 1.168.8.1 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.168.6.1 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.168.2.1 11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.171.2.3 03-Dec-2017  jdolecek update from HEAD
 1.171.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.171.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.173.2.1 18-May-2014  rmind sync with head
 1.182.2.1 10-Aug-2014  tls Rebase.
 1.187.2.1 21-Sep-2014  snj Pull up following revision(s) (requested by rmind in ticket #106):
sys/net/bpf.c: revision 1.189
PR/49190: bpf_deliver: set scratch memory store in bpf_args_t.
 1.189.2.8 28-Aug-2017  skrll Sync with HEAD
 1.189.2.7 05-Feb-2017  skrll Sync with HEAD
 1.189.2.6 05-Oct-2016  skrll Sync with HEAD
 1.189.2.5 09-Jul-2016  skrll Sync with HEAD
 1.189.2.4 19-Mar-2016  skrll Sync with HEAD
 1.189.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.189.2.2 06-Jun-2015  skrll Sync with HEAD
 1.189.2.1 06-Apr-2015  skrll Sync with HEAD
 1.199.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.199.2.3 26-Jul-2016  pgoyette Rename LOCALCOUNT_INITIALIZER to DEVSW_MODULE_INIT. This better describes
what we're doing, and why.
 1.199.2.2 19-Jul-2016  pgoyette Instead of repeatedly typing the conditional initialization of the
.d_localcount members in the various {b,c}devsw, define an initializer
macro and use it. This also removes the need for defining new symbols
for each 'struct localcount'.

As suggested by riastradh@
 1.199.2.1 17-Jul-2016  pgoyette Adapt some modular drivers to the localcount(9) world. We're still
not actually using the localcount stuff, but we need to differentiate
between built-in vs loaded drivers and allocate a "struct localcount"
only for loaded drivers.
 1.203.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.216.6.9 04-Aug-2023  martin Apply patch, requested by ozaki-r in ticket #1885:

sys/net/bpf.c (apply patch)

bpf: allow to read with no filter (regressed at revision 1.213,
fixed differently in -current)
 1.216.6.8 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1802):

sys/net/bpf.c: revision 1.247 (manually merged)

bpf(4): Reject bogus timeout values before arithmetic overflows.
 1.216.6.7 04-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1323):

sys/net/bpf.c: revision 1.229

Fix info leak: use kmem_zalloc, because we align the buffers, and the
otherwise uninitialized padding bytes get copied to userland in bpf_read().
 1.216.6.6 15-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #826):

sys/net/if_bridge.c: revision 1.155
sys/net/if.c: revision 1.421
sys/net/bpf.c: revision 1.224
sys/net/if.c: revision 1.422
sys/net/if.c: revision 1.423

Use if_is_mpsafe (NFC)

Protect packet input routines with KERNEL_LOCK and splsoftnet
if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.
if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect
non-MP-safe codes (e.g., carp_input and agr_input).

Pointed out by mlelstv@

Protect if_deferred_start_softint with KERNEL_LOCK if the interface isn't
MP-safe
 1.216.6.5 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #526):
sys/net/bpfdesc.h: revision 1.45
sys/net/bpf.c: revision 1.223
Abandon unnecessary softint
The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.216.6.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.216.6.3 21-Dec-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #454):
sys/net/bpf.c: revision 1.222
Make softint and callout MP-safe
 1.216.6.2 21-Dec-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #446):
sys/net/bpf.c: revision 1.221
Fix panic in callout_halt (fix typo)
Reported by wiz@
 1.216.6.1 25-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #329):
sys/net/bpf.c: revision 1.217
Turn on D_MPSAFE flag of bpf_cdevsw that is already MP-safe
Pointed out by k-goda@IIJ
 1.216.4.2 29-Apr-2017  pgoyette Remove more unnecessary #include for sys/localcount.h
 1.216.4.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.223.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.223.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.223.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.223.2.1 21-May-2018  pgoyette Sync with HEAD
 1.226.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.226.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.226.2.1 10-Jun-2019  christos Sync with HEAD
 1.229.2.4 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1886):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.229.2.3 04-Aug-2023  martin Apply patch, requested by ozaki-r in ticket #1708:

sys/net/bpf.c (apply patch)

bpf: allow to read with no filter (regressed at revision 1.213,
fixed differently in -current)
 1.229.2.2 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1605):

sys/net/bpf.c: revision 1.247 (manually merged)

bpf(4): Reject bogus timeout values before arithmetic overflows.
 1.229.2.1 16-Oct-2019  martin Pull up following revision(s) (requested by maxv in ticket #335):

sys/net/bpf.c: revision 1.230
sys/net/bpf.c: revision 1.231

Add KASSERT to catch bugs. Something tells me it could easily fire.

-

As I suspected, the KASSERT I added yesterday can fire if we try to process
zero-sized packets. Skip them to prevent a type confusion that can trigger
random page faults later.
 1.232.2.2 29-Feb-2020  ad Sync with head.
 1.232.2.1 25-Jan-2020  ad Sync with head.
 1.238.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.239.4.2 01-Aug-2021  thorpej Sync with HEAD.
 1.239.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.249.2.3 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #858):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.249.2.2 22-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #784):

sys/net/bpf.c: revision 1.253

bpf: Mark bpfread_filtops FILTEROP_MPSAFE

Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).

I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).

However, some MP-safety problems have been reported for
bpf(4) in PR kern/58596, and they will take some time to fix.

At the moment, commit this part in advance.
OK ozaki-r@
 1.249.2.1 24-Feb-2023  martin Pull up following revision(s) (requested by gutteridge in ticket #103):

sys/net/bpf.c: revision 1.251

bpf.c: support loopback writes when BIOCSHDRCMPLT is set

Following changes in r. 1.249 "bpf: support sending packets on loopback
interfaces", also allow for this to succeed when the "header complete"
flag is set, which is the practice of some tools, e.g., tcpreplay and
Scapy. With this change, both of those example tools now work, e.g.,
Scapy passes "L3bpfSocket - send and sniff on loopback" in its test
suite.

There are several ways of addressing this issue; this commit is
intended to be the most conservative and consistent with the previous
changes. (E.g., FreeBSD instead has special handling of this condition
in its if_loop.c.)
 1.252.6.1 02-Aug-2025  perseant Sync with HEAD
 1.82 23-Aug-2023  rin bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64

It cannot fit within 18 bytes, of course ;)

As we had never provided working bpf(4) implementation for LP64
userland on mips, just use natural structure size here.
 1.81 17-Aug-2023  christos add new for libpcap.
 1.80 31-Jul-2023  christos put back compat names, should be removed from the sanitizers
 1.79 31-Jul-2023  christos Don't call versioned stuff "old". Follow the naming convention for versioning
and name them after the last version of the OS they appeared on.
 1.78 20-Jun-2022  yamaguchi branches: 1.78.4;
bpf(4): added support for VLAN hardware offloading of ethernet devices
 1.77 09-Jun-2021  martin Add a bpf_register_track_event() function (and deregister equivalent)
that allows a driver to track listeners attaching/detaching from tap
points.

This is useful for drivers that would have to do extra work for some
taps and cannot easily decide (at the driver level) if the work would
be needed further up the stack.

An example is providing radiotap headers for IEEE 802.11 frames.
 1.76 09-Jun-2021  martin Add a (FreeBSD compatible) bpf_peers_present() predicate to allow
testing for active listeners on a tap.
 1.75 11-Jun-2020  roy branches: 1.75.6;
bpf(4): Add ioctls BIOCSETWF and BIOCLOCK

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underlying parameters
of the bpf(4) device. An example might be the setting of bpf(4) filter
programs or attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets.
Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor can
essentially be used as a raw socket, regardless of consumer's UID.
Write filters give users the ability to constrain which packets can be sent
through the bpf(4) descriptor.

Taken from OpenBSD.
 1.74 26-Feb-2019  msaitoh Whitespace change.
 1.73 03-Sep-2018  christos Add definitions from libpcap-1.9.0
 1.72 26-Jun-2018  msaitoh branches: 1.72.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.71 25-Jun-2018  msaitoh Removal of bpf_tap().
 1.70 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.69 25-Jan-2017  ozaki-r branches: 1.69.12;
Use bpf_ops for bpf_mtap_softint

By doing so we don't need to care whether a kernel enables bpfilter or not.
 1.68 24-Jan-2017  ozaki-r Defer bpf_mtap in Rx interrupt context to softint

bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run them in softint as well as bpf_mtap of most drivers
(see if_percpuq_softint and if_input).

To this end, bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the mechanism, we can move bpf_mtap processing to softint
without changing target drivers much while it adds some overhead
on CPU and memory. Once target drivers are changed to softint-based,
we should return to normal bpf_mtap.

Proposed on tech-kern and tech-net
 1.67 05-Sep-2015  dholland branches: 1.67.2; 1.67.4;
Uses _IOR/_IOW/etc. and thus needs sys/ioccom.h. PR 41200
 1.66 19-Nov-2014  christos branches: 1.66.2;
Add BPF_MOD/BPF_XOR, sync DLT entries and document unused bpf instructions.
From libpcap-1.6.2
 1.65 24-Jun-2014  rmind - Improve the comments in bpf.h and KNF a little.
- Rename bpf_ctx_t member noinit to preinited (reflects the meaning better).
 1.64 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.63 15-Nov-2013  rmind branches: 1.63.2;
- Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.62 18-Sep-2013  rmind Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.61 30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.60 29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought of as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.59 15-Mar-2012  christos branches: 1.59.2; 1.59.4;
add {__BEGIN,__END}_DECLS
 1.58 30-Aug-2011  bouyer branches: 1.58.2; 1.58.6; 1.58.8;
Provide netbsd32 compat for bpf. Beside the ioctls, the structure
returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.

This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...

Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
 1.57 05-Dec-2010  christos make bpf_validate available in userland.
 1.56 05-Dec-2010  christos constify
 1.55 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.54 13-Mar-2010  christos branches: 1.54.2;
add BIOC{G,S}FEEDBACK which allows one to receive injected outgoing packets
via bpf.
 1.53 25-Jan-2010  pooka branches: 1.53.2;
Make bpf dynamically loadable.
 1.52 19-Jan-2010  pooka Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.51 17-Jan-2010  pooka Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *". Buys us oo times the type-safety for 0 times
the price.
(no functional change)
 1.50 13-Jan-2009  christos restore binary compatibility on 64 bit systems.
 1.49 11-Jan-2009  christos merge christos-time_t
 1.48 10-Dec-2005  elad branches: 1.48.70; 1.48.72; 1.48.76; 1.48.86;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.47 05-Dec-2005  rpaulo Make the bpf_maxbufsize a constant when bpfilter pseudo-device is not
present in the kernel config, thus fixing the build. Problem reported
by Havard Eidnes. Solution proposed by Christos, thanks.
 1.46 30-Nov-2005  rpaulo PR 32198: bpf_validate() needs to do more checks, from Otto Moerbeek/OpenBSD
via Guy Harris.
Problems like out-of-bounds read/write in filter machine operations
were fixed.
 1.45 30-Nov-2005  rpaulo Replace u_intXX_t by their C99 counterparts.
 1.44 30-Nov-2005  rpaulo Fix typo in comment found by Guy Harris (PR 32198).
 1.43 04-Aug-2005  rpaulo Implemented the kernel part of BPF statistics and BPF peers, net.bpf.stats
and net.bpf.peers sysctls respectively.

A new structure was added to describe the external (user viewable)
representation of a BPF file; a new entry was added to the bpf_d
structure to store the PID of the calling process; a simple_lock was added
to protect the insert/removal from the net.bpf.peers sysctl handler.

This idea came from FreeBSD (Christian S.J. Peron) but while it is
implemented with sysctl's it differs a bit.

Reviewed by: christos@ and atatat@ (who gave me the tip for the net.bpf.peers
sysctl helper function).
 1.42 26-Feb-2005  perry branches: 1.42.4;
nuke trailing whitespace
 1.41 19-Aug-2004  christos branches: 1.41.4; 1.41.6;
Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.40 19-Aug-2004  christos - ansify
- remove unnecessary casts
- change caddr_t to void *
- no functional change.
 1.39 29-May-2004  darrenr back out previous change - these diffs aren't what I'd tested.
 1.38 29-May-2004  darrenr add mmap(2) interface to bpf(4) devices, along with BIOCMMAPINFO ioctl call
for applications to interact with the bpf device for the purpose of using
mmap to examine captured data.
 1.37 30-Apr-2004  dyoung Add bpf_mtap2, which taps a packet whose head is in a void *buffer
and whose tail is in an mbuf chain.
 1.36 15-Apr-2004  darrenr don't use u_long in an ioctl, rather, u_int64_t so there are no long 32/64
bit compatibility problems. bump kernel version like it should have been.
 1.35 15-Apr-2004  darrenr Add a count of the number of packets that match the bpf filter applied to a
particular device. In doing this, make a new bpf_stat structure with
members that are u_long rather than u_int, matching the counters in the bpf_d.
The original bpf_stat is now bpf_stat_old, and the original ioctl
is preserved as BIOCGSTATSOLD.
 1.34 10-Apr-2004  darrenr Fix bpf so that select will return for a timeout (from FreeBSD.)

Fix the behaviour of BIOCIMMEDIATE (fix from LBL BPF code via FreeBSD.)

In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf (based on
similar change FreeBSD but fixes BIOC*SEESENT issue with that.)

Copy the implementation of BIOCSSEESENT, BIOCGSEESENT by FreeBSD.

Review Assistance: Guy Harris

PRs: kern/8674, kern/12170
 1.33 22-Jan-2004  jonathan branches: 1.33.2;
Make bpf_maxbufsize writable via sysctl, as written by Andrew Brown.
 1.32 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.31 21-Sep-2002  thorpej branches: 1.31.6;
Nuke the old SunOS-style ioctl defns.
 1.30 28-Aug-2002  onoe Define new kernel interface bpfattach2() to register another data link
type for the driver, which will be used for 802.11 drivers.
Also add 2 APIs to get a list of available DLTs and use one for them.
BIOCGDLTLIST (struct bpf_dltlist)
BIOCSDLT (u_int)
 1.29 14-Dec-2001  thorpej branches: 1.29.8;
Use __sh__ instead of __sh3__.
 1.28 12-Dec-2000  thorpej branches: 1.28.2; 1.28.4;
Use <net/dlt.h> to get the DLT_* constants. Also change bpfattach()
and bpf_change_type() to take just a pointer to the ifnet, rather than
a pointer to the ifnet and a pointer to a member of the ifnet (the bpf
pointer).

We'll let this ride on the Dec 12 1.5N version bump.
 1.27 11-Nov-2000  thorpej Pull in <sys/time.h>, since we use timevals here.
 1.26 02-Nov-2000  eeh Fix sparc64 LP64 issues.
 1.25 31-Jan-2000  thorpej branches: 1.25.4;
Implement bpfdetach().
 1.24 13-Sep-1999  itojun branches: 1.24.2;
Merge in NetBSD/sh3 from cvs.kame.net repository.

Tree structure:
- sys/arch/sh3: sh3 generic code
As commented, in-chip device drivers are put into sys/arch/sh3/dev.
- sys/arch/evbsh3: sh3 evaluation boards (pure sh3 CPU, no fancy external HW)
- sys/arch/mmeye: Brains mmEye, www.brains.co.jp
MI source code includes couple of #ifdef for sh3-coff support.
(sh3 uses coff or elf)

Needs some more improvements, especially in sys/arch/sh3/conf/files.sh3,
to compile the tree (due to last minute tree structure change).
 1.23 11-May-1999  thorpej * Add the ability to change the data link type on the fly.
* Define two more data link types: NetBSD PPP-over-serial and NetBSD
PPP-over-Ethernet. (Different PPP encaps have different header formats!)
 1.22 25-Jul-1998  explorer branches: 1.22.10;
define DLT_HDLC
 1.21 14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.20 30-Apr-1998  thorpej Implement two new BPF ioctls: BPFGHDRCMPLT and BPFSHDRCMPLT, to get/set
the "header already complete" flag. This allows BPF writers to spoof
layer 2 source addresses (providing the layer 2 in use supports it) in
applications where this is necessary. From Greg Smith <greg@nas.nasa.gov>.
 1.19 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.18 10-Oct-1997  christos Add definitions for bpf_int and bpf_u_int; these are not used in the kernel,
but libpcap expects them if we advertize our current BPF_VERSION.
 1.17 09-Oct-1997  christos sync with bpf-1.2a1
- fix whitespace
- add rcsid
- add BPF_RELEASE define
- add BIOCSTCPF BIOCSUDPF
 1.16 03-Oct-1997  christos - Add some new data link types from libpcap-0.4a3
- bpf_filter() does exist in userland
 1.15 13-Dec-1996  mikel branches: 1.15.10;
add ATM data-link type; reqd. for libpcap.
 1.14 02-May-1996  cgd On new architectures and on the alpha, define SIZEOF_BPF_HDR to be
sizeof(struct bpf_hdr). On machines that we currently support that
can use the old definition (which just covers the size of the data in
struct bpf_hdr), use it even though it's a hack. (This was changed
for the 'new architectures' case so as to be fail-safe; BPF may
waste a few bytes of space per captured packet on new architectures,
but now at least it's more likely to work.)
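
The difference between the two definitions can be illustrated with a minimal sketch. The structure below is hypothetical (demo_hdr and its field names are assumptions for illustration, not the real struct bpf_hdr); it only shows how the size of the member data can differ from the natural, padded structure size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capture header; NOT the real struct bpf_hdr. */
struct demo_hdr {
	uint32_t ts_sec;	/* timestamp, seconds */
	uint32_t ts_usec;	/* timestamp, microseconds */
	uint32_t caplen;	/* length of the captured portion */
	uint32_t datalen;	/* original length of the packet */
	uint16_t hdrlen;	/* length of this header, padding included */
};

/* "Old" style: only the bytes actually covered by the members. */
#define DEMO_HDR_DATA_SIZE \
	(offsetof(struct demo_hdr, hdrlen) + sizeof(uint16_t))

/* "Fail-safe" style: natural structure size, tail padding included. */
#define DEMO_HDR_NATURAL_SIZE	sizeof(struct demo_hdr)

/* Bytes spent per captured packet when the natural size is used. */
size_t
demo_hdr_wasted_bytes(void)
{
	return DEMO_HDR_NATURAL_SIZE - DEMO_HDR_DATA_SIZE;
}
```

On common ABIs the member data of this hypothetical layout covers 18 bytes, while the natural size is rounded up to the structure's alignment; the difference is the "few bytes of space per captured packet" the entry mentions.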
 1.13 13-Feb-1996  christos Net prototypes
 1.12 27-Sep-1995  thorpej Enhancements to the bpf from Stu Grossman <grossman@cygnus.com>:
* grok FIONBIO, FIOASYNC, and TIOC{G,S}PGRP
* add BIOC{G,S}RSIG; get/set the signal to be delivered
to the process or process group upon packet reception.
Defaults to SIGIO.
 1.11 22-Apr-1995  cgd copy routines should take size_t lengths for prototype consistency.
don't assume that tick is >= 1000; loses badly on alpha (div. by zero)
only try unaligned copies if NetBSD's UNALIGNED_ACCESS symbol is defined.
various misc type size cleanups, mostly short -> int16_t.
 1.10 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.9 06-Mar-1995  mycroft Make this more type-safe for the Alpha. From the libpcap 0.0 distribution.
(Needs more work.)
 1.8 13-Jan-1995  jtc Protect from multiple inclusion with _NET_BPF_H_, for PR #679.
 1.7 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.3 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.15.10.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.22.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.24.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.24.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.24.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.25.4.1 03-Nov-2000  tv Pullup 1.26 [eeh]:
Fix sparc64 LP64 issues.
 1.28.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.28.4.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.28.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.2.3 18-Oct-2002  nathanw Catch up to -current.
 1.28.2.2 17-Sep-2002  nathanw Catch up to -current.
 1.28.2.1 08-Jan-2002  nathanw Catch up to -current.
 1.29.8.1 29-Aug-2002  gehenna catch up with -current.
 1.31.6.7 11-Dec-2005  christos Sync with head.
 1.31.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.31.6.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.31.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.31.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.31.6.2 25-Aug-2004  skrll Sync with HEAD.
 1.31.6.1 03-Aug-2004  skrll Sync with HEAD
 1.33.2.1 21-Apr-2004  jmc Pullup rev 1.34-1.36 (requested by darrenr in ticket #167)

Reduce bpf buffer to 32k from 1M to reduce kernel memory usage from userland
binaries.
Fix bpf so that select will return for a timeout.
Fix the behaviour of BIOCIMMEDIATE.
In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf.
Various other bpf fixes, including PR#8674, PR#12170
 1.41.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.41.4.1 29-Apr-2005  kent sync with -current
 1.42.4.1 21-Jun-2006  yamt sync with head.
 1.48.86.1 19-Jan-2009  skrll Sync with HEAD.
 1.48.76.3 11-Aug-2010  yamt sync with head.
 1.48.76.2 11-Mar-2010  yamt sync with head
 1.48.76.1 04-May-2009  yamt sync with head.
 1.48.72.3 30-Dec-2008  christos need to burn more numbers since sizeof(timeval50) == sizeof(timeval) on 64
bit archs.
 1.48.72.2 28-Dec-2008  christos back to usecs now for source compatibility
 1.48.72.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.48.70.1 17-Jan-2009  mjf Sync with HEAD.
 1.53.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.54.2.2 05-Mar-2011  rmind sync with head
 1.54.2.1 30-May-2010  rmind sync with head
 1.58.8.1 12-Jun-2012  riz Pull up following revision(s) (requested by abs in ticket #312):
sys/net/bpf.h: revision 1.59
add {__BEGIN,__END}_DECLS
 1.58.6.1 05-Apr-2012  mrg sync to latest -current.
 1.58.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.58.2.1 17-Apr-2012  yamt sync with head
 1.59.4.1 18-May-2014  rmind sync with head
 1.59.2.2 03-Dec-2017  jdolecek update from HEAD
 1.59.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.63.2.1 10-Aug-2014  tls Rebase.
 1.66.2.2 05-Feb-2017  skrll Sync with HEAD
 1.66.2.1 22-Sep-2015  skrll Sync with HEAD
 1.67.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.67.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.69.12.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.69.12.3 28-Jul-2018  pgoyette Sync with HEAD
 1.69.12.2 25-Jun-2018  pgoyette Sync with HEAD
 1.69.12.1 22-Apr-2018  pgoyette Sync with HEAD
 1.72.2.1 10-Jun-2019  christos Sync with HEAD
 1.75.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.78.4.1 13-Sep-2023  martin Pull up following revision(s) (requested by rin in ticket #373):

sys/net/bpf.h: revision 1.82

bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64

It cannot fit within 18 bytes, of course ;)

As we had never provided working bpf(4) implementation for LP64
userland on mips, just use natural structure size here.
 1.2 01-Mar-1998  fvdl Remove extraneous files from Lite2 merge.
 1.1 01-Mar-1998  fvdl branches: 1.1.1;
Initial revision
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.73 02-Sep-2024  christos merge changes from libpcap-1.10.5
 1.72 17-Aug-2023  christos branches: 1.72.6;
define symbols that new libpcap needs
 1.71 07-Jun-2016  pgoyette Create separate modules for i2c_bitbang and bpf_filter so these files
can be included in kernels which need them without also duplicating
them in other modules. Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
 1.70 11-Feb-2015  alnsn Fix the build.
 1.69 11-Feb-2015  alnsn It's not enough to check that a class of the last instruction is BPF_RET.
The opcodes in bpf_validate() must match opcodes understood by bpf_filter().

Found by afl-fuzz http://lcamtuf.coredump.cx/afl/.
 1.68 19-Nov-2014  christos branches: 1.68.2;
Add BPF_MOD/BPF_XOR, sync DLT entries and document unused bpf instructions.
From libpcap-1.6.2
 1.67 07-Jul-2014  alnsn Arithmetic overflow when calculating variable offsets (BPF_LD+BPF_IND
instructions) should be handled uniformly for contiguous buffers and mbufs.
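
As a rough illustration of the kind of check involved, the sketch below validates a variable-offset load without letting the offset arithmetic wrap. The helper name and parameters are hypothetical and are not taken from bpf_filter.c; this is one way to order the comparisons, not necessarily the way the commit does it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a load of `size` bytes at
 * offset x + k fits inside a buffer of `buflen` bytes. The comparisons
 * are ordered so that no unsigned subtraction or addition can wrap.
 */
bool
demo_ind_load_ok(uint32_t x, uint32_t k, uint32_t size, uint32_t buflen)
{
	if (size > buflen)
		return false;		/* load is larger than the buffer */
	if (k > buflen - size)
		return false;		/* constant offset alone is too large */
	if (x > buflen - size - k)
		return false;		/* x + k would pass the end, or wrap */
	return true;
}
```

A naive test such as `x + k + size <= buflen` would accept x = 0xffffffff, k = 2 because the sum wraps to a small value; the ordering above rejects it regardless of whether the data lives in a contiguous buffer or an mbuf chain.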
 1.66 05-Jul-2014  alnsn Implement error checking in m_xbyte() and check for errors after m_xbyte() call.
Reuse (len - k) expression in m_xword() and m_xhalf() to give an optimization
hint to a compiler.

When m_xbyte() didn't exist, bpf_filter() handled out-of-bounds BPF_B loads
correctly because "return 0" inside MINDEX() was aborting filter programs.
After the change that added m_xbyte() zero values were passed to A or X
registers instead of aborting a filter program.
 1.65 25-Jun-2014  alnsn Check "preinited" argument of bpf_set_extmem().
 1.64 24-Jun-2014  rmind - bpf_validate_ext: fix memword validation in BPF_ST/BPF_STX case.
- bpf_set_extmem: check the number of words against BPF_MAX_MEMWORDS.
 1.63 24-Jun-2014  rmind - Improve the comments in bpf.h and KNF a little.
- Rename bpf_ctx_t member noinit to preinited (reflects the meaning better).
 1.62 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.61 15-Nov-2013  rmind branches: 1.61.2;
- Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.60 05-Oct-2013  rmind bpf_filter: re-use some code. No functional change intended.
 1.59 19-Sep-2013  rmind bpf_validate_ext: allow COP to modify the memstore.
 1.58 18-Sep-2013  rmind Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.57 30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.56 29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought of as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.55 27-Oct-2012  alnsn branches: 1.55.2;
Add bpfjit and enable it for amd64.
 1.54 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.53 15-Aug-2012  alnsn branches: 1.53.2;
MINDEX() macro has 'return 0;' statement. It doesn't set *err to
1 before return when invoked from m_xword() and m_xhalf() functions.
The caller doesn't set it to 1 either. So, set *err to 1 before
invoking MINDEX().
 1.52 02-Aug-2012  rmind bpf_filter: remove unnecessary memset(), add a comment.
 1.51 01-Aug-2012  rmind Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.50 29-Dec-2011  alnsn Apply same bounds checks for BPF_LD|BPF_B|BPF_IND as for
BPF_LD|BPF_H|BPF_IND and BPF_LD|BPF_W|BPF_IND.

From FreeBSD r48548, the original r45574 had a typo.
 1.49 29-Dec-2011  christos PR/45751: Alexander Nasonov: No overflow check in BPF_LD|BPF_ABS
 1.48 14-Jul-2011  drochner branches: 1.48.2; 1.48.6;
back out previous - this should be unnecessary on NetBSD due to
the extra validation introduced in rev.1.42 (and pulled up to netbsd-5)
 1.47 14-Jul-2011  drochner clear the packet filter's scratch memory before running the filter
program, otherwise kernel memory can be leaked, from Guy Harris
per PR kern/45142
 1.46 19-Feb-2011  christos delint.
 1.45 19-Feb-2011  enami Fix userland build.
 1.44 19-Feb-2011  christos Use kmem instead of malloc. Requested by rmind.
 1.43 19-Feb-2011  matt Use __CTASSERT
 1.42 19-Feb-2011  christos Avoid stack memory disclosure by keeping track during filter validation time
of initialized memory. Idea taken from linux.
 1.41 05-Dec-2010  mrg branches: 1.41.2; 1.41.4;
revert another part of bpf_filter 1.38 that broke the check for divide
by zero while validating the bpf program.

originally spotted by skrll@, and broke the month-old atf test for
this exact problem: net_bpf_t_div-by-zero_div_by_zero.
 1.40 05-Dec-2010  mrg consider BPF_ABS, BPF_IND and BPF_MSH as they used to be in rev 1.37.

this fixes dhclient, and i'm told dhcpcd as well.


this patch from skrll@netbsd.org, tested by me.
 1.39 05-Dec-2010  mrg apply the smallest hack to allow this to build without warnings again.
 1.38 05-Dec-2010  christos make bpf_validate available in userland.
 1.37 05-Dec-2010  christos constify
 1.36 21-Apr-2010  drochner the correct check for BPF_K is with BPF_SRC for BPF_ALU ops, from
Guy Harris per PR kern/43185
fixes possible division-by-zero crashes by evil filter expressions
like "len / 0 = 1"
pullup candidate
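
A validation-time check of the kind described might look like the following sketch. The DEMO_ macros mirror the classic BPF opcode encoding; the function name is hypothetical and the code is not the actual bpf_validate() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode fields, mirroring the classic BPF instruction encoding. */
#define DEMO_BPF_CLASS(code)	((code) & 0x07)
#define DEMO_BPF_ALU		0x04
#define DEMO_BPF_OP(code)	((code) & 0xf0)
#define DEMO_BPF_DIV		0x30
#define DEMO_BPF_SRC(code)	((code) & 0x08)
#define DEMO_BPF_K		0x00

/*
 * Hypothetical validator fragment: reject an ALU division whose
 * divisor is the constant operand k and whose k is zero.
 */
bool
demo_div_insn_ok(uint16_t code, uint32_t k)
{
	if (DEMO_BPF_CLASS(code) != DEMO_BPF_ALU)
		return true;	/* not an ALU instruction */
	if (DEMO_BPF_OP(code) != DEMO_BPF_DIV)
		return true;	/* not a division */
	if (DEMO_BPF_SRC(code) == DEMO_BPF_K && k == 0)
		return false;	/* constant divisor of zero */
	return true;		/* divisor is nonzero, or comes from X */
}
```

The source-operand bit is what distinguishes a constant divisor from the X register for ALU instructions, which is why the test is made on that field. A divisor taken from X cannot be judged at validation time and has to be checked when the filter runs.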
 1.35 20-Aug-2008  joerg branches: 1.35.4; 1.35.10; 1.35.12; 1.35.14; 1.35.16;
As the scratch memory is only ever copied to or from A and X, make it
unsigned as well.
 1.34 02-Jan-2008  christos branches: 1.34.6; 1.34.10; 1.34.12; 1.34.16;
PR/37663: Guy Harris: bpf_validate rejects valid programs that use the multiply instruction
 1.33 27-Jan-2007  cbiere branches: 1.33.20; 1.33.26; 1.33.32;
Use be16dec() and be32dec() instead of reimplementing them.
 1.32 04-Oct-2006  oster branches: 1.32.2; 1.32.4;
It is not sufficient for MINDEX to just 'return 0' if the MINDEX macro
is going to be used from within m_xhalf() and m_xword(). In using
MINDEX in those cases, we must set *err to '1' *before* calling MINDEX
just in case MINDEX does decide to 'return', and causes the function
to return 0 with an un-set err value. A consequence of this fix is
that we can cleanup a couple of (now) unneeded goto's. Problem found
by inspection whilst searching for the cause of a different panic.

Also: pavel@ noted the following:
if (merr != 0)
return 0;
was missing from after a call to m_xhalf(), so fix that too.

src/regress/sys/net/bpf/out-of-bounds now passes the regression test.

Ok'ed by pavel@.
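
The pattern described above can be sketched as follows. DEMO_CHECK_INDEX and demo_xbyte are hypothetical stand-ins that work on a flat buffer; the real MINDEX macro and m_xhalf()/m_xword() operate on mbuf chains and are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a macro that may return from its caller. */
#define DEMO_CHECK_INDEX(len, k)	\
	do {				\
		if ((k) >= (len))	\
			return 0;	\
	} while (0)

/*
 * Fetch one byte at index k. *err is set to 1 BEFORE the macro runs,
 * so an early return from inside the macro still reports failure.
 */
uint32_t
demo_xbyte(const uint8_t *buf, size_t len, size_t k, int *err)
{
	*err = 1;			/* assume failure first */
	DEMO_CHECK_INDEX(len, k);	/* may return 0 right here */
	*err = 0;			/* reached only on success */
	return buf[k];
}
```

Because the error flag is set pessimistically, the caller can always distinguish "byte value 0" from "index out of range", even when the hidden return inside the macro fires.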
 1.31 14-May-2006  christos branches: 1.31.8; 1.31.10;
XXX: GCC uninitialized.
 1.30 27-Feb-2006  drochner branches: 1.30.2; 1.30.6;
fix bpf_validate():
a missing "break" caused any bpf filter containing
a division to be rejected
 1.29 07-Feb-2006  wiz Add a /* CONSTCOND */ for lint.
 1.28 14-Dec-2005  rpaulo branches: 1.28.2; 1.28.4; 1.28.6;
Fix previous commit: ABS, IND and MSH are valid codes.
 1.27 13-Dec-2005  rpaulo In bpf_validate(), get rid of bpf_maxbufsize test as there are other
clients of bpf_filter(), like if_ppp, that are not limited by
bpf_maxbufsize. The same check is done at the run time, so there is no
problem created.

Noticed by Guy Harris in private email.
 1.26 05-Dec-2005  rpaulo Oops, the previous revision had a wrong pre-processor #if clause.
 1.25 05-Dec-2005  rpaulo Make the bpf_maxbufsize a constant when bpfilter pseudo-device is not
present in the kernel config, thus fixing the build. Problem reported
by Havard Eidnes. Solution proposed by Christos, thanks.
 1.24 30-Nov-2005  rpaulo PR 32198: bpf_validate() needs to do more checks, from Otto Moerbeek/OpenBSD
via Guy Harris.
Problems like out-of-bounds read/write in filter machine operations
were fixed.
 1.23 30-Nov-2005  rpaulo More KNF. C99 uintXX_t types.
 1.22 30-Nov-2005  rpaulo KNF. ANSIfy. de-P().
 1.21 26-Feb-2005  perry branches: 1.21.2; 1.21.4; 1.21.12;
nuke trailing whitespace
 1.20 07-Aug-2003  agc branches: 1.20.8; 1.20.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 15-Nov-2001  lukem branches: 1.19.16;
don't need <sys/types.h> when including <sys/param.h>
 1.18 12-Nov-2001  lukem add RCSIDs
 1.17 22-Jul-2000  matt branches: 1.17.2; 1.17.4;
Add a missing include when using this in user space.
 1.16 12-Apr-2000  chs remove sunos stuff.
 1.15 30-Mar-2000  augustss Kill some more register declarations.
 1.14 09-Oct-1997  christos branches: 1.14.16;
bpf_filter.c:
- update copyright
- add their rcsid
- initialize some variables later for consistency
with the current code.
- change char to u_char to match the current code.
 1.13 07-Jul-1997  phil branches: 1.13.2;
Provide better filter validation. PR 3366.
 1.12 13-Feb-1996  christos Net prototypes
 1.11 22-Apr-1995  cgd copy routines should take size_t lengths for prototype consistency.
don't assume that tick is >= 1000; loses badly on alpha (div. by zero)
only try unaligned copies if NetBSD's UNALIGNED_ACCESS symbol is defined.
various misc type size cleanups, mostly short -> int16_t.
 1.10 01-Apr-1995  mycroft Fix bogus buffer indexing when a value is split across a mbuf boundary,
as suggested by Greg Ansley. Also, redefine MINDEX() slightly to avoid
duplicating code.
 1.9 28-Mar-1995  jtc KERNEL -> _KERNEL
 1.8 06-Mar-1995  mycroft Undo an #include ordering change.
 1.7 06-Mar-1995  mycroft Make this more type-safe for the Alpha. From the libpcap 0.0 distribution.
(Needs more work.)
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.13.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.14.16.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.19.16.5 11-Dec-2005  christos Sync with head.
 1.19.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.16.1 03-Aug-2004  skrll Sync with HEAD
 1.20.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.8.1 29-Apr-2005  kent sync with -current
 1.21.12.1 23-Oct-2006  ghen Pull up following revision(s) (requested by oster in ticket #1539):
sys/net/bpf_filter.c: revision 1.32
It is not sufficient for MINDEX to just 'return 0' if the MINDEX macro
is going to be used from within m_xhalf() and m_xword(). In using
MINDEX in those cases, we must set *err to '1' *before* calling MINDEX
just in case MINDEX does decide to 'return', and causes the function
to return 0 with an un-set err value. A consequence of this fix is
that we can cleanup a couple of (now) unneeded goto's. Problem found
by inspection whilst searching for the cause of a different panic.
Also: pavel@ noted the following:
if (merr != 0)
return 0;
was missing from after a call to m_xhalf(), so fix that too.
src/regress/sys/net/bpf/out-of-bounds now passes the regression test.
Ok'ed by pavel@.
 1.21.4.4 21-Jan-2008  yamt sync with head
 1.21.4.3 26-Feb-2007  yamt sync with head.
 1.21.4.2 30-Dec-2006  yamt sync with head.
 1.21.4.1 21-Jun-2006  yamt sync with head.
 1.21.2.1 23-Oct-2006  ghen Pull up following revision(s) (requested by oster in ticket #1539):
sys/net/bpf_filter.c: revision 1.32
It is not sufficient for MINDEX to just 'return 0' if the MINDEX macro
is going to be used from within m_xhalf() and m_xword(). In using
MINDEX in those cases, we must set *err to '1' *before* calling MINDEX
just in case MINDEX does decide to 'return', and causes the function
to return 0 with an un-set err value. A consequence of this fix is
that we can cleanup a couple of (now) unneeded goto's. Problem found
by inspection whilst searching for the cause of a different panic.
Also: pavel@ noted the following:
if (merr != 0)
return 0;
was missing from after a call to m_xhalf(), so fix that too.
src/regress/sys/net/bpf/out-of-bounds now passes the regression test.
Ok'ed by pavel@.
 1.28.6.2 01-Jun-2006  kardel Sync with head.
 1.28.6.1 22-Apr-2006  simonb Sync with head.
 1.28.4.1 09-Sep-2006  rpaulo sync with head
 1.28.2.2 01-Mar-2006  yamt sync with head.
 1.28.2.1 18-Feb-2006  yamt sync with head.
 1.30.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.30.2.1 24-May-2006  yamt sync with head.
 1.31.10.1 22-Oct-2006  yamt sync with head
 1.31.8.2 01-Feb-2007  ad Sync with head.
 1.31.8.1 18-Nov-2006  ad Sync with head.
 1.32.4.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.32.2.2 13-Jun-2010  riz Pull up following revision(s) (requested by drochner in ticket #1393):
sys/net/bpf_filter.c: revision 1.36
the correct check for BPF_K is with BPF_SRC for BPF_ALU ops, from
Guy Harris per PR kern/43185
fixes possible division-by-zero crashes by evil filter expressions
like "len / 0 = 1"
pullup candidate
 1.32.2.1 03-Feb-2008  riz Pull up following revision(s) (requested by christos in ticket #1032):
sys/net/bpf_filter.c: revision 1.34
PR/37663: Guy Harris: bpf_validate rejects valid programs that use the
multiply instruction
 1.33.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.33.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.33.20.1 09-Jan-2008  matt sync with HEAD
 1.34.16.1 19-Oct-2008  haad Sync with HEAD.
 1.34.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.34.10.2 11-Aug-2010  yamt sync with head.
 1.34.10.1 04-May-2009  yamt sync with head.
 1.34.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.35.16.2 05-Mar-2011  rmind sync with head
 1.35.16.1 30-May-2010  rmind sync with head
 1.35.14.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.35.12.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.35.10.1 22-Mar-2011  bouyer Pull up following revision(s) (requested by spz in ticket #1571):
sys/net/bpf_filter.c: revision 1.36, 1.42 -> 1.46 via patch
Avoid stack memory disclosure by keeping track during filter validation time
of initialized memory. Idea taken from linux.
Use __CTASSERT
Use kmem instead of malloc. Requested by rmind.
Fix userland build.
delint.
the correct check for BPF_K is with BPF_SRC for BPF_ALU ops, from
Guy Harris per PR kern/43185
fixes possible division-by-zero crashes by evil filter expressions
like "len / 0 = 1"
pullup candidate
 1.35.4.2 20-Mar-2011  bouyer Pull up following revision(s) (requested by spz in ticket #1571):
sys/net/bpf_filter.c: revision 1.42 - 1.46 via patch
Avoid stack memory disclosure by keeping track during filter validation time
of initialized memory. Idea taken from linux.
Use __CTASSERT
Use kmem instead of malloc. Requested by rmind.
Fix userland build.
delint.
 1.35.4.1 20-May-2010  snj branches: 1.35.4.1.2;
Pull up following revision(s) (requested by drochner in ticket #1381):
sys/net/bpf_filter.c: revision 1.36
the correct check for BPF_K is with BPF_SRC for BPF_ALU ops, from
Guy Harris per PR kern/43185
fixes possible division-by-zero crashes by evil filter expressions
like "len / 0 = 1"
 1.35.4.1.2.1 20-Mar-2011  bouyer Pull up following revision(s) (requested by spz in ticket #1571):
sys/net/bpf_filter.c: revision 1.42 - 1.46 via patch
Avoid stack memory disclosure by keeping track during filter validation time
of initialized memory. Idea taken from linux.
Use __CTASSERT
Use kmem instead of malloc. Requested by rmind.
Fix userland build.
delint.
 1.41.4.1 05-Mar-2011  bouyer Sync with HEAD
 1.41.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.48.6.1 18-Feb-2012  mrg merge to -current.
 1.48.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.48.2.2 30-Oct-2012  yamt sync with head
 1.48.2.1 17-Apr-2012  yamt sync with head
 1.53.2.3 03-Dec-2017  jdolecek update from HEAD
 1.53.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.53.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.55.2.1 18-May-2014  rmind sync with head
 1.61.2.1 10-Aug-2014  tls Rebase.
 1.68.2.2 09-Jul-2016  skrll Sync with HEAD
 1.68.2.1 06-Apr-2015  skrll Sync with HEAD
 1.72.6.1 02-Aug-2025  perseant Sync with HEAD
 1.2 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.1 01-Aug-2012  rmind branches: 1.1.2;
Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.1.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.3 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.2 02-Aug-2012  rmind branches: 1.2.2;
Add struct bpf_insn tag.
 1.1 01-Aug-2012  rmind Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.2.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.8 25-Jun-2018  msaitoh Removal of bpf_tap().
 1.7 25-Jan-2017  ozaki-r branches: 1.7.12;
Use bpf_ops for bpf_mtap_softint

By doing so we don't need to care whether a kernel enables bpfilter or not.
 1.6 30-Jan-2012  matt branches: 1.6.6; 1.6.24; 1.6.28; 1.6.32;
Use proper ANSI prototypes for foo() -> foo(void)
Caught when compiling with -Wold-style-definition
 1.5 05-Apr-2010  joerg branches: 1.5.8; 1.5.12;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.4 25-Jan-2010  pooka branches: 1.4.2; 1.4.4; 1.4.6;
Make bpf dynamically loadable.
 1.3 19-Jan-2010  pooka fix pasto in previous
 1.2 19-Jan-2010  pooka slap dis wit summah dat RCSId
 1.1 19-Jan-2010  pooka Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.4.6.1 30-May-2010  rmind sync with head
 1.4.4.3 11-Aug-2010  yamt sync with head.
 1.4.4.2 11-Mar-2010  yamt sync with head
 1.4.4.1 25-Jan-2010  yamt file bpf_stub.c was added on branch yamt-nfs-mp on 2010-03-11 15:04:26 +0000
 1.4.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.5.12.1 18-Feb-2012  mrg merge to -current.
 1.5.8.1 17-Apr-2012  yamt sync with head
 1.6.32.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.28.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.24.1 05-Feb-2017  skrll Sync with HEAD
 1.6.6.1 03-Dec-2017  jdolecek update from HEAD
 1.7.12.1 25-Jun-2018  pgoyette Sync with HEAD
 1.50 19-Aug-2024  ozaki-r bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.49 19-Aug-2024  ozaki-r bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
 1.48 09-Jun-2021  martin branches: 1.48.10; 1.48.16;
Add a bpf_register_track_event() function (and deregister equivalent)
that allows a driver to track listeners attaching/detaching from tap
points.

This is useful for drivers that would have to do extra work for some
taps and can not easily decide (at the driver level) if the work would
be needed further up the stack.

An example is providing radiotap headers for IEEE 802.11 frames.
 1.47 11-Jun-2020  roy branches: 1.47.6;
bpf(4): Add ioctls BIOCSETWF and BIOCLOCK

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underlying parameters
of the bpf(4) device. An example might be the setting of bpf(4) filter
programs or attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets.
Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor can
essentially be used as a raw socket, regardless of consumer's UID.
Write filters give users the ability to constrain which packets can be sent
through the bpf(4) descriptor.

Taken from OpenBSD.
 1.46 26-Jun-2018  msaitoh branches: 1.46.6;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.45 25-Jan-2018  ozaki-r branches: 1.45.2;
Abandon unnecessary softint

The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.44 09-Feb-2017  ozaki-r branches: 1.44.6;
Make bpf MP-safe

By the change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing the lock requires big changes and it's a future work.

Another known issue is that we need to retain some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
 1.43 01-Feb-2017  ozaki-r Kill tsleep/wakeup and use cv
 1.42 01-Feb-2017  ozaki-r Use pslist(9) instead of queue(9) for psz/psref

As usual some member variables of struct bpf_d and bpf_if remain to avoid
breaking kvm(3) users (netstat and fstat).
 1.41 01-Feb-2017  ozaki-r Use kmem(9) instead of malloc/free
 1.40 24-Jan-2017  ozaki-r Defer bpf_mtap in Rx interrupt context to softint

bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run them in softint as well as bpf_mtap of most drivers
(see if_percpuq_softint and if_input).

To this end, bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the mechanism, we can move bpf_mtap processing to softint
without changing target drivers much while it adds some overhead
on CPU and memory. Once target drivers are changed to softint-based,
we should return to normal bpf_mtap.

Proposed on tech-kern and tech-net
 1.39 23-Jan-2017  ozaki-r Make bpf_setf static
 1.38 15-Nov-2013  rmind branches: 1.38.6; 1.38.10; 1.38.14;
- Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.37 28-Oct-2012  alnsn branches: 1.37.2;
Comment bd_jitcode member.
 1.36 27-Oct-2012  alnsn Add bpfjit and enable it for amd64.
 1.35 27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.34 01-Aug-2012  rmind branches: 1.34.2;
Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.33 30-Aug-2011  bouyer branches: 1.33.2;
Provide netbsd32 compat for bpf. Beside the ioctls, the structure
returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.

This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...

Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
 1.32 13-Mar-2010  christos add BIOC{G,S}FEEDBACK which allows one to receive injected outgoing packets
via bpf.
 1.31 21-Jan-2010  dyoung branches: 1.31.2;
Spelling fix: correspoding -> corresponding.
 1.30 11-Apr-2009  christos Fix PR/37878 and PR/37550: Provide stat(2) for all devices and don't use
fbadop_stat.
 1.29 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.28 24-Apr-2008  ad branches: 1.28.2; 1.28.10; 1.28.16;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.27 20-Feb-2008  matt branches: 1.27.6; 1.27.8;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.26 09-Jul-2007  ad branches: 1.26.8;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.25 04-Mar-2007  christos branches: 1.25.2; 1.25.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.24 10-Dec-2005  elad branches: 1.24.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.23 04-Aug-2005  rpaulo Implemented the kernel part of BPF statistics and BPF peers, net.bpf.stats
and net.bpf.peers sysctls respectively.

A new structure was added to describe the external (user viewable)
representation of a BPF file; a new entry was added to the bpf_d
structure to store the PID of the calling process; a simple_lock was added
to protect the insert/removal from the net.bpf.peers sysctl handler.

This idea came from FreeBSD (Christian S.J. Peron) but while it is
implemented with sysctl's it differs a bit.

Reviewed by: christos@ and atatat@ (who gave me the tip for the net.bpf.peers
sysctl helper function).
 1.22 17-Mar-2005  kleink branches: 1.22.2;
A couple of <sys/select.h>-related changes:
* Factor out struct selinfo and its header dependencies into its own header,
<sys/selinfo.h>, to avoid namespace pollution.
* Include <sys/selinfo.h> in user-visible headers where necessary.
 1.21 30-Nov-2004  christos branches: 1.21.4; 1.21.6;
Clonify bpf. I am not changing /dev/bpfX -> /dev/bpf until all userland
programs have been fixed.
 1.20 29-May-2004  darrenr back out previous change - these diffs aren't what I'd tested.
 1.19 29-May-2004  darrenr add mmap(2) interface to bpf(4) devices, along with BIOCMMAPINFO ioctl call
for applications to interact with the bpf device for the purpose of using
mmap to examine captured data.
 1.18 15-Apr-2004  darrenr Add a count of the number of packets that match the bpf filter applied to a
particular device. In doing this, make a new bpf_stat structure with
members that are u_long rather than u_int, matching the counters in the bpf_d.
The original bpf_stat is now bpf_stat_old, and the original ioctl
is preserved as BIOCGSTATSOLD.
 1.17 10-Apr-2004  darrenr Fix bpf so that select will return for a timeout (from FreeBSD.)

Fix the behaviour of BIOCIMMEDIATE (fix from LBL BPF code via FreeBSD.)

In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf (based on
similar change FreeBSD but fixes BIOC*SEESENT issue with that.)

Copy the implementation of BIOCSSEESENT, BIOCGSEESENT by FreeBSD.

Review Assistance: Guy Harris

PRs: kern/8674, kern/12170
 1.16 07-Aug-2003  agc branches: 1.16.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15 30-Apr-1998  thorpej branches: 1.15.48;
Implement two new BPF ioctls: BPFGHDRCMPLT and BPFSHDRCMPLT, to get/set
the "header already complete" flag. This allows BPF writers to spoof
layer 2 source addresses (providing the layer 2 in use supports it) in
applications where this is necessary. From Greg Smith <greg@nas.nasa.gov>.
 1.14 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.13 09-Oct-1997  christos GC bd_sig
 1.12 09-Oct-1997  christos - add their rcsid
- add ifdef to match current code
 1.11 27-Sep-1995  thorpej branches: 1.11.14;
Enhancements to the bpf from Stu Grossman <grossman@cygnus.com>:
* grok FIONBIO, FIOASYNC, and TIOC{G,S}PGRP
* add BIOC{G,S}RSIG; get/set the signal to be delivered
to the process or process group upon packet reception.
Defaults to SIGIO.
 1.10 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 23-Nov-1993  cgd defines change
 1.6 09-Sep-1993  davidg branches: 1.6.2;
added include of select.h to bpfdesc.h because it now has a reference to
struct selinfo.
 1.5 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.4 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.3 06-Apr-1993  deraadt commit damnit!
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.2.1 23-Nov-1993  cgd defines change
 1.11.14.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.15.48.7 11-Dec-2005  christos Sync with head.
 1.15.48.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.48.5 01-Apr-2005  skrll Sync with HEAD.
 1.15.48.4 18-Dec-2004  skrll Sync with HEAD.
 1.15.48.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.48.2 18-Sep-2004  skrll Sync with HEAD.
 1.15.48.1 03-Aug-2004  skrll Sync with HEAD
 1.16.2.1 21-Apr-2004  jmc Pullup rev 1.17-1.18 (requested by darrenr in ticket #167)

Reduce bpf buffer to 32k from 1M to reduce kernel memory usage from userland
binaries.
Fix bpf so that select will return for a timeout.
Fix the behaviour of BIOCIMMEDIATE.
In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf.
Various other bpf fixes, including PR#8674, PR#12170
 1.21.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.21.4.1 29-Apr-2005  kent sync with -current
 1.22.2.3 27-Feb-2008  yamt sync with head.
 1.22.2.2 03-Sep-2007  yamt sync with head.
 1.22.2.1 21-Jun-2006  yamt sync with head.
 1.24.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.25.4.1 11-Jul-2007  mjf Sync with head.
 1.25.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.26.8.1 23-Mar-2008  matt sync with HEAD
 1.27.8.1 18-May-2008  yamt sync with head.
 1.27.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.28.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.28.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.28.2.3 11-Aug-2010  yamt sync with head.
 1.28.2.2 11-Mar-2010  yamt sync with head
 1.28.2.1 04-May-2009  yamt sync with head.
 1.31.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.33.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.33.2.1 30-Oct-2012  yamt sync with head
 1.34.2.3 03-Dec-2017  jdolecek update from HEAD
 1.34.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.37.2.1 18-May-2014  rmind sync with head
 1.38.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.38.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.38.6.2 28-Aug-2017  skrll Sync with HEAD
 1.38.6.1 05-Feb-2017  skrll Sync with HEAD
 1.44.6.1 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #526):
sys/net/bpfdesc.h: revision 1.45
sys/net/bpf.c: revision 1.223
Abandon unnecessary softint
The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.45.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.46.6.1 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1886):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.47.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.48.16.1 02-Aug-2025  perseant Sync with HEAD
 1.48.10.1 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #858):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.48 01-Feb-2020  riastradh Fix wrong memory order and switch bpf to atomic_load/store_*.
 1.47 20-Jan-2019  alnsn branches: 1.47.6;
Add missing include.
 1.46 29-Jul-2016  alnsn branches: 1.46.14; 1.46.16;
Don't trigger BJ_ASSERT(false) on invalid BPF_Jxxx opcode in jmp_to_op().

This change helps survive AFL fuzzing without calling bpf_validate() first.

Also change alu_to_op() function to have a similar interface.
 1.45 29-May-2016  alnsn branches: 1.45.2;
Adapt to the new version of sljit@r313.
 1.44 29-Dec-2015  alnsn Replace the nsaveds() function with #define NSAVEDS 3. No functional change.

Patch from Michael McConville.
 1.43 14-Feb-2015  alnsn Copyright year.
 1.42 14-Feb-2015  alnsn In some implementations pc->k is signed. Cast it to uint32_t before comparing.
 1.41 14-Feb-2015  alnsn Properly track initialisation of registers for BPF_JMP instructions.
 1.40 13-Feb-2015  alnsn Don't emit wrapped-around reads. They're dead code but dead code elimination
logic isn't smart enough to figure it out.

Found by afl fuzzer http://lcamtuf.coredump.cx/afl/.
 1.39 12-Feb-2015  alnsn Fix bugs found by afl fuzzer http://lcamtuf.coredump.cx/afl/.
 1.38 15-Jan-2015  christos rename variable to avoid conflict with "div"
 1.37 08-Dec-2014  justin Help gcc by initialising variable
 1.36 20-Nov-2014  alnsn branches: 1.36.2;
Implement BPF_MOD.
 1.35 20-Nov-2014  alnsn Implement BPF_ALU+BPF_MOD-BPF_K when pc->k is a power of 2. Get rid of divt
and divw arguments in emit_moddiv(), they're accessible via the pc argument.
 1.34 20-Nov-2014  alnsn Follow argument convention of other emit_xxx() functions.
 1.33 19-Nov-2014  christos Add BPF_MOD/BPF_XOR (untested, needs work)
 1.32 26-Jul-2014  alnsn branches: 1.32.2;
Don't use saved EREG registers because sljit 0.91 can generate
bogus code on amd64. The A and X registers are saved on the stack.

The most recent version of sljit fixes bogus code generation but
it's not backward compatible with sljit 0.91.
 1.31 24-Jul-2014  alnsn For P[X+0] load, don't emit wrap around check and copy X instead of emitting X+0.
 1.30 22-Jul-2014  alnsn Two tweaks: don't use a temporary register to dereference the err argument
after xcall and don't generate ((tmp1 & 0xf) << 2) twice in emit_msh().
 1.29 22-Jul-2014  alnsn Don't use scratch registers for X and to restore A after BPF_COPX call.
 1.28 13-Jul-2014  alnsn Refactor BPF_COPX code. New version doesn't load buf and buflen after copx call.
 1.27 13-Jul-2014  alnsn Don't use BJ_TMP2REG for 32bit packet reads. Assign this register to (buf+X)
in BPF_LD+BPF_IND and save one instruction.
 1.26 12-Jul-2014  alnsn emit_xcall: check overflow by comparing X with (UINT32_MAX - pk->k), restore
the A register after checking that xcall succeeded.
 1.25 12-Jul-2014  alnsn Initialise status to avoid -Wuninitialized warning.
 1.24 12-Jul-2014  alnsn Some small changes: add missing error checks; move sjump initialisation away
from optimize(); +BJ_HINT_PKT, -BJ_HINT_IND; tweak comments.
 1.23 11-Jul-2014  alnsn Handle overflow in BPF_LD+BPF_IND for mbuf chains and make two minor changes:
move sljit_emit_return() to generate_insn_code() and use a different register
for checking errors after xcall.
 1.22 08-Jul-2014  alnsn Most filter programs in the kernel need 3 scratch registers.
 1.21 05-Jul-2014  alnsn Review some SLJIT_MOV instructions with respect to width.
 1.20 04-Jul-2014  alnsn Add optimization hints. They replace nscratches and ncopfuncs and improve
readability.
 1.19 01-Jul-2014  alnsn Move the main loop in bpfjit_generate_code() to a new function and make a few
small changes.
 1.18 25-Jun-2014  alnsn Default initialize external memwords.

This change doesn't affect performance of valid bpf kernel programs
because bpf_filter_ext() checks that all memwords are initialized
explicitly.
 1.17 25-Jun-2014  alnsn New jitcode takes two arguments.
 1.16 25-Jun-2014  alnsn Use SLJIT_MOV_P to copy extmem pointer.
 1.15 25-Jun-2014  rmind bpfjit_generate_code: emit the instruction correctly.
 1.14 24-Jun-2014  rmind - Improve the comments in bpf.h and KNF a little.
- Rename bpf_ctx_t member noinit to preinited (reflects the meaning better).
 1.13 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.12 17-Jun-2014  alnsn Update code to the latest sljit version.
 1.11 23-May-2014  alnsn Enable ABC optimization when one branch returns 0.
 1.10 23-May-2014  alnsn Loads at offsets UINT32_MAX or greater are unreachable.
 1.9 23-May-2014  alnsn Implement unconditional jump to "return 0" for loads at UINT32_MAX+1 or greater.
 1.8 22-May-2014  alnsn Some small changes in preparation for a bigger change.

- typedef for ABC variables and MAX_ABC_LENGTH constant,
- cast pc->k to uint32_t in more places,
- whitespaces.
 1.7 15-May-2014  alnsn Refactor bpfjit code.

- Implement Array Bounds Check Elimination for packet bytes.
- Track initialization of registers and memwords.
- Remove "bj_" prefix from struct members.
- Shorten "BPFJIT_" prefix to "BJ_".
- Other small improvements.
 1.6 15-Dec-2013  pooka branches: 1.6.2;
XXXgcc Wuninitialized kludge
 1.5 15-Nov-2013  rmind Fix the bpfjit build.
 1.4 15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.3 20-Sep-2013  rmind bpfjit: replace malloc with kmem, KNF a little, add RCS ID.
 1.2 10-Nov-2012  alnsn branches: 1.2.2; 1.2.4;
Add RCSID and fix -Wsign-compare warnings.
 1.1 27-Oct-2012  alnsn branches: 1.1.2;
Add bpfjit and enable it for amd64.
 1.1.2.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 27-Oct-2012  yamt file bpfjit.c was added on branch yamt-pagecache on 2012-10-30 17:22:42 +0000
 1.2.4.1 18-May-2014  rmind sync with head
 1.2.2.4 03-Dec-2017  jdolecek update from HEAD
 1.2.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.2.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.2.2.1 10-Nov-2012  tls file bpfjit.c was added on branch tls-maxphys on 2012-11-20 03:02:46 +0000
 1.6.2.1 10-Aug-2014  tls Rebase.
 1.32.2.1 16-Feb-2015  martin Pull up following revision(s) (requested by alnsn in ticket #519):
sys/net/bpfjit.c: revision 1.39-1.41
Fix bugs found by afl fuzzer http://lcamtuf.coredump.cx/afl/.
-
Don't emit wrapped-around reads. They're dead code but dead code elimination
logic isn't smart enough to figure it out.
-
Properly track initialisation of registers for BPF_JMP instructions.
 1.36.2.4 05-Oct-2016  skrll Sync with HEAD
 1.36.2.3 09-Jul-2016  skrll Sync with HEAD
 1.36.2.2 19-Mar-2016  skrll Sync with HEAD
 1.36.2.1 06-Apr-2015  skrll Sync with HEAD
 1.45.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.46.16.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.46.16.1 10-Jun-2019  christos Sync with HEAD
 1.46.14.1 26-Jan-2019  pgoyette Sync with HEAD
 1.47.6.1 29-Feb-2020  ad Sync with head.
 1.4 25-Jun-2014  alnsn Fix copyright years.
 1.3 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.2 15-Nov-2013  rmind branches: 1.2.2;
- Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.1 27-Oct-2012  alnsn branches: 1.1.2; 1.1.4; 1.1.6;
Add bpfjit and enable it for amd64.
 1.1.6.1 18-May-2014  rmind sync with head
 1.1.4.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.4.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.4.1 27-Oct-2012  tls file bpfjit.h was added on branch tls-maxphys on 2012-11-20 03:02:46 +0000
 1.1.2.3 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 27-Oct-2012  yamt file bpfjit.h was added on branch yamt-pagecache on 2012-10-30 17:22:42 +0000
 1.2.2.1 10-Aug-2014  tls Rebase.
 1.27 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.26 28-Feb-2018  ozaki-r branches: 1.26.40;
Remove an obsolete assertion too (fix build)

bif_refs was removed when migrated to use psref.
 1.25 28-Feb-2018  ozaki-r Sweep obsolete BRIDGE_MPSAFE (it's always on now)
 1.24 09-Mar-2017  ozaki-r Remove unnecessary splnet
 1.23 10-Jun-2016  ozaki-r branches: 1.23.2; 1.23.4;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the
diff of the upcoming change.

No functional change.
 1.22 19-Apr-2016  ozaki-r Apply psref(9) to bridge(4)

Note that there is an issue that ioctls for an interface and a destruction
of the interface can run in parallel and it causes race conditions on
bridge as well (it rarely happens). The issue will be addressed in the
interface common code (if.c).
 1.21 11-Apr-2016  ozaki-r Fix usage of pslist(9)

Pointed out by riastradh@.
 1.20 11-Apr-2016  ozaki-r Use pslist(9) in bridge(4)

This adds missing memory barriers to list operations for pserialize.
 1.19 15-Feb-2016  ozaki-r Simplify bridge(4)

Thanks to the introduction of softint-based if_input, the entire bridge code now
never runs in hardware interrupt context. So we can simplify the code.

- Remove spin mutexes
- They were needed because some code of bridge could run in
hardware interrupt context
- We now need only an adaptive mutex for each shared object
(a member list and a forwarding table)
- Remove pktqueue
- bridge_input is already in softint, using another softint
(for bridge_forward) is useless
- Packet distribution should be done at device drivers
 1.18 31-Dec-2014  ozaki-r Use pserialize in bridge

This change enables lockless accesses to bridge member lists.
See locking notes in a comment to know how pserialize and
mutexes are used.

This change also provides support for softint-based interrupt
handling; pserialize readers can run in both HW interrupt and
softint contexts.

As usual, pserialize is used only when NET_MPSAFE is on.
 1.17 14-Jul-2014  ozaki-r branches: 1.17.4;
Make bridge MPSAFE

- Introduce BRIDGE_MPSAFE
- It's enabled only when NET_MPSAFE is defined
in if.h or the kernel config
- Add iflist and rtlist mutex locks
- Locking iflist is performance sensitive,
so it's not used when !BRIDGE_MPSAFE
- Add bif object reference counting
- It enables fine-grained locking for bridge member lists
by allowing a bif to be touched without holding a lock
- bridge_release_member is added to decrement the
reference count
- A condition variable is added to do bridge_delete_member
gracefully
- Add if_bridgeif to ifnet
- It's a shortcut to a bif object of a bridge member
- It reduces a bif lookup cost and so lock contention on iflist
- Make bridgestp MPSAFE too
 1.16 18-Jun-2014  ozaki-r Make local functions static

This change revealed that some functions are unused. Remove some and
comment out the others.

No functional change.
 1.15 17-Jun-2014  ozaki-r Restructure ether_input and bridge_input

The network stack of NetBSD is well organized and
layered. A packet reception is processed from a
lower layer to an upper layer one by one. However,
ether_input and bridge_input are not structured so.
bridge_input is called inside ether_input.

The new structure replaces ifnet#if_input of a bridge
member with bridge_input when the member is attached.
So a packet goes straight on a packet reception via
a bridge, bridge_input => ether_input => ip_input.

The change is part of a patch of Lloyd Parkes submitted
in PR 48104. Unlike the patch, the change doesn't
intend to change the behavior of the packet processing.
Another patch will fix PR 48104.
 1.14 18-Jan-2009  mrg branches: 1.14.24; 1.14.38;
Fix multiple problems:

* A sign extension error creating the bridge ID corrupted the
priority (always making it the maximum).
* Do not catch STP packets on an interface for which STP is not
enabled -- it's a violation of the spec, and causes STP to fail on
neighboring bridges.
* An optimization to bstp_input() -- some information is already
known when we call it.

contributed anonymously.
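
The first item is a classic pattern: if address bytes pass through a signed type while a 64-bit bridge ID is assembled, any byte with its high bit set is sign-extended and its upper bits spill into the priority field. The sketch below illustrates how such a bug can arise and how unsigned bytes avoid it. It is not the actual bridgestp.c code; the function names and the layout (16-bit priority above a 48-bit MAC address) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy variant: each byte is widened through a signed 8-bit type. */
static uint64_t
bridge_id_buggy(uint16_t prio, const uint8_t mac[6])
{
	uint64_t id = (uint64_t)prio << 48;

	for (int i = 0; i < 6; i++) {
		int8_t b = (int8_t)mac[i];	/* stand-in for a signed byte */

		/* A negative b becomes 0xffff...ffXX before the shift. */
		id |= (uint64_t)b << (8 * (5 - i));
	}
	return id;
}

/* Fixed variant: bytes stay unsigned, so no stray high bits appear. */
static uint64_t
bridge_id_fixed(uint16_t prio, const uint8_t mac[6])
{
	uint64_t id = (uint64_t)prio << 48;

	assert(mac != NULL);
	for (int i = 0; i < 6; i++)
		id |= (uint64_t)mac[i] << (8 * (5 - i));
	return id;
}
```

With a MAC address whose first byte is 0x80 and a priority of 0x8000, the buggy variant yields 0xffff in the top 16 bits, while the fixed variant keeps 0x8000.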
 1.13 25-Dec-2007  perry branches: 1.13.10; 1.13.18;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.12 26-Aug-2007  dyoung branches: 1.12.2; 1.12.8; 1.12.10; 1.12.14;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.11 04-Mar-2007  christos branches: 1.11.2; 1.11.10; 1.11.14;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 16-Nov-2006  christos branches: 1.10.4;
__unused removal on arguments; approved by core.
 1.9 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.8 15-Apr-2006  christos branches: 1.8.8; 1.8.10;
Coverity CID 2728: Add KASSERT before NULL deref.
 1.7 11-Dec-2005  christos branches: 1.7.4; 1.7.6; 1.7.8; 1.7.10; 1.7.12;
merge ktrace-lwp.
 1.6 26-Feb-2005  perry branches: 1.6.4;
nuke trailing whitespace
 1.5 28-Nov-2003  keihan branches: 1.5.8; 1.5.10;
s/netbsd.org/NetBSD.org/g
 1.4 16-Sep-2003  jdc Adapt to account for bridge_enqueue()'s extra parameter.
 1.3 03-Feb-2003  thorpej branches: 1.3.2;
Test callout_pending(), not callout_active(), and eliminate now-unnecessary
callout_deactivate() calls.
 1.2 12-Nov-2001  lukem add RCSIDs
 1.1 17-Aug-2001  thorpej branches: 1.1.2; 1.1.4;
Add support for building Ethernet bridges, based on Jason Wright's
bridge driver from OpenBSD, although the bridge code has been *heavily*
modified by me (the 802.1D code remains mostly unchanged from the
original).
 1.1.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.1.4.1 17-Aug-2001  thorpej file bridgestp.c was added on branch kqueue on 2001-08-25 06:16:56 +0000
 1.1.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.1.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.1.2.1 17-Aug-2001  nathanw file bridgestp.c was added on branch nathanw_sa on 2001-08-24 00:12:05 +0000
 1.3.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.5.8.1 29-Apr-2005  kent sync with -current
 1.6.4.4 21-Jan-2008  yamt sync with head
 1.6.4.3 03-Sep-2007  yamt sync with head.
 1.6.4.2 30-Dec-2006  yamt sync with head.
 1.6.4.1 21-Jun-2006  yamt sync with head.
 1.7.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.7.10.1 19-Apr-2006  elad sync with head.
 1.7.8.1 24-May-2006  yamt sync with head.
 1.7.6.1 22-Apr-2006  simonb Sync with head.
 1.7.4.1 09-Sep-2006  rpaulo sync with head
 1.8.10.2 10-Dec-2006  yamt sync with head.
 1.8.10.1 22-Oct-2006  yamt sync with head
 1.8.8.1 18-Nov-2006  ad Sync with head.
 1.10.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.11.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.11.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.11.2.1 09-Oct-2007  ad Sync with head.
 1.12.14.1 02-Jan-2008  bouyer Sync with HEAD
 1.12.10.1 26-Dec-2007  ad Sync with head.
 1.12.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.12.2.1 09-Jan-2008  matt sync with HEAD
 1.13.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.13.10.1 04-May-2009  yamt sync with head.
 1.14.38.1 10-Aug-2014  tls Rebase.
 1.14.24.2 03-Dec-2017  jdolecek update from HEAD
 1.14.24.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.4.5 28-Aug-2017  skrll Sync with HEAD
 1.17.4.4 09-Jul-2016  skrll Sync with HEAD
 1.17.4.3 22-Apr-2016  skrll Sync with HEAD
 1.17.4.2 19-Mar-2016  skrll Sync with HEAD
 1.17.4.1 06-Apr-2015  skrll Sync with HEAD
 1.23.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.23.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.26.40.1 02-Aug-2025  perseant Sync with HEAD
 1.23 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.22 06-Aug-2016  pgoyette branches: 1.22.52;
Catch up with the renaming of module ppp --> if_ppp and avoid warning
messages at boot (or module load) time.
 1.21 06-Aug-2016  pgoyette Modularize the ppp driver, and adjust dependencies of the compressor
modules.

For now, this is still included as a built-in module in GENERIC kernels.
 1.20 29-Nov-2008  cube branches: 1.20.26; 1.20.44;
Fix handling of ppp compressor modules, from Andrew Doran's input.
- ref count each compressor
- allow {un,}registration of several modules at once
- use RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules init functions
are run before pseudo-devices attach (reported by Nick Hudson).
 1.19 25-Nov-2008  cube Rework the way PPP compressors are handled and allow them to be
automatically loaded when needed.
 1.18 15-Jun-2008  christos branches: 1.18.2; 1.18.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.17 20-Feb-2008  matt branches: 1.17.6; 1.17.8; 1.17.10; 1.17.12; 1.17.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.16 16-Nov-2006  christos branches: 1.16.24;
__unused removal on arguments; approved by core.
 1.15 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.14 11-Dec-2005  thorpej branches: 1.14.20; 1.14.22;
ANSI function decls and application of static.
 1.13 11-Dec-2005  christos merge ktrace-lwp.
 1.12 07-Aug-2003  agc branches: 1.12.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 15-Nov-2001  lukem branches: 1.11.16;
don't need <sys/types.h> when including <sys/param.h>
 1.10 12-Nov-2001  lukem add RCSIDs
 1.9 18-Jul-2001  thorpej bzero -> memset
 1.8 25-Aug-2000  thorpej branches: 1.8.2; 1.8.4;
Don't use MALLOC() for variable-sized allocations.
 1.7 12-Mar-1997  christos branches: 1.7.22;
Update to ppp-2.3b4; from Paul Mackerras
 1.6 13-Oct-1996  christos backout previous kprintf change
 1.5 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.4 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.3 03-Mar-1996  thorpej Remove extra RCS id.
 1.2 13-Feb-1996  christos Net prototypes
 1.1 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.7.22.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.4.1 03-Aug-2001  lukem update to -current
 1.8.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.8.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.8.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.11.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.16.1 03-Aug-2004  skrll Sync with HEAD
 1.12.16.3 27-Feb-2008  yamt sync with head.
 1.12.16.2 30-Dec-2006  yamt sync with head.
 1.12.16.1 21-Jun-2006  yamt sync with head.
 1.14.22.2 10-Dec-2006  yamt sync with head.
 1.14.22.1 22-Oct-2006  yamt sync with head
 1.14.20.1 18-Nov-2006  ad Sync with head.
 1.16.24.1 23-Mar-2008  matt sync with HEAD
 1.17.14.1 18-Jun-2008  simonb Sync with head.
 1.17.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.17.10.1 04-May-2009  yamt sync with head.
 1.17.8.1 17-Jun-2008  yamt sync with head.
 1.17.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.17.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.18.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.18.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.20.44.1 05-Oct-2016  skrll Sync with HEAD
 1.20.26.1 03-Dec-2017  jdolecek update from HEAD
 1.22.52.1 02-Aug-2025  perseant Sync with HEAD
 1.8 27-May-2021  christos Simplify; no need to special case the small buffer zero src_len.
lla_snprintf1 never returns -1.
 1.7 27-May-2021  christos Don't use the stack, print to the buffer directly (this was one of the
biggest stack users).
 1.6 30-Apr-2019  kre branches: 1.6.14; 1.6.16;
Add the missing add. (Return to the earlier state, done differently.)

When dl_print() was converted to use lla_snprintf() the offset to
the LLA in dl_addr.dl_data was forgotten (dl_data contains both
the interface name and the LL addr, we want the latter, not the former).

When there is no data (src_len == 0), still null terminate the output buffer
(provided there is space in it for the \0).
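
The two points in this message can be sketched together: the link-layer address starts after the interface name inside the data area, and the output buffer is terminated even when there is no address to print. The struct and function names below are hypothetical and simplified; they are not the real struct dl_addr or dl_print() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a link-level address record. */
struct dl_sketch {
	uint8_t nlen;		/* interface name length */
	uint8_t alen;		/* link-layer address length */
	char data[24];		/* name bytes, followed by address bytes */
};

/* The address begins nlen bytes into data: this is the "missing add". */
static const uint8_t *
dl_sketch_lladdr(const struct dl_sketch *dl)
{
	assert((size_t)dl->nlen + dl->alen <= sizeof(dl->data));
	return (const uint8_t *)dl->data + dl->nlen;
}

/* Print the address as colon-separated hex; always terminate if len > 0. */
static size_t
dl_sketch_print(char *buf, size_t len, const struct dl_sketch *dl)
{
	const uint8_t *lla = dl_sketch_lladdr(dl);
	size_t off = 0;

	if (len > 0)
		buf[0] = '\0';	/* covers the alen == 0 case */
	for (size_t i = 0; i < dl->alen && off < len; i++) {
		int n = snprintf(buf + off, len - off, "%s%02x",
		    i == 0 ? "" : ":", lla[i]);

		if (n < 0)
			break;
		off += (size_t)n;
	}
	return off;
}
```

Without the nlen offset, the formatter would print the bytes of the interface name instead of the address.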
 1.5 29-Apr-2019  christos match definition of hexdigits[] to the declaration in <sys/systm.h>
 1.4 29-Apr-2019  roy Move lla_snprintf from if_arp.c to dl_print.c
 1.3 06-Apr-2016  christos branches: 1.3.16; 1.3.20;
pretty-print link addresses.
 1.2 02-Dec-2014  christos branches: 1.2.2;
missed _
 1.1 02-Dec-2014  christos - split struct dladdr out of struct sockaddr_dl
- add routines to print struct sockaddr_dl and struct dladdr
- make if_dl.h idempotent
 1.2.2.3 22-Apr-2016  skrll Sync with HEAD
 1.2.2.2 06-Apr-2015  skrll Sync with HEAD
 1.2.2.1 02-Dec-2014  skrll file dl_print.c was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.3.20.1 10-Jun-2019  christos Sync with HEAD
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 06-Apr-2016  jdolecek file dl_print.c was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.6.16.1 31-May-2021  cjep sync with head
 1.6.14.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.25 02-Sep-2024  christos merge changes from libpcap-1.10.5
 1.24 17-Aug-2023  christos branches: 1.24.6;
Use the version from libpcap-1.10.4
 1.23 28-May-2022  andvar fix various typos in comments, mainly origional->original,
extened->extended, incomming->incoming.
 1.22 05-Dec-2021  msaitoh s/preceed/preced/ in comment.
 1.21 05-Dec-2021  msaitoh s/accomodate/accommodate/ in comment.
 1.20 01-Oct-2019  christos sync with libpcap-1.9.1
 1.19 03-Sep-2018  christos sync with libpcap-1.9.0
 1.18 08-Feb-2018  dholland branches: 1.18.2; 1.18.4;
Typos.
 1.17 24-Jan-2017  christos Sync with libpcap-1.8.1
 1.16 31-Mar-2015  christos branches: 1.16.2; 1.16.4;
update with new entries from libpcap-1.7.2
 1.15 19-Nov-2014  christos branches: 1.15.2;
Add BPF_MOD/BPF_XOR, sync DLT entries and document unused bpf instructions.
From libpcap-1.6.2
 1.14 07-Apr-2013  kardel recover DLT_HIPPI and DLT_HDLC from before for if_hippisubr.c and hd64570.c
 1.13 06-Apr-2013  christos update from libpcap
 1.12 21-Dec-2011  christos branches: 1.12.6;
PR/45730: David Holland: Avoid having 2 copies of bpf.h in /usr/include.
This adds the missing entries from libpcap to make libpcap compile with
our bpf.h.
 1.11 27-Feb-2006  drochner branches: 1.11.104; 1.11.108;
add missing DLTs from the libpcap-0.9.4 distribution
 1.10 10-Dec-2005  elad branches: 1.10.2; 1.10.4; 1.10.6;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.9 28-Sep-2004  dyoung branches: 1.9.12;
Add several new DLTs. From tcpdump.org.
 1.8 22-Jun-2004  itojun prepare PF-related hooks. reviewed by matt, perry, christos
 1.7 16-Nov-2003  dyoung Add data-link type DLT_IEEE802_11_RADIO to wi and atw. DLT_IEEE802_11_RADIO
lets you monitor radio stats like received signal strength, which
diversity antenna was used, channel/frequency, modulation, and data
rate.
 1.6 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.5 17-Apr-2003  salo branches: 1.5.2;
depreceated->deprecated
 1.4 28-Aug-2002  onoe Define new DLT: DLT_IEEE802_11, DLT_PRISM_HEADER, and DLT_AIRONET_HEADER
from tcpdump.org
 1.3 10-Sep-2001  bjh21 branches: 1.3.10;
Add MI Econet support. This is lacking any interfaces to higher-layer
protocols, and lacking any timeouts, but it basically works, doing four-way
handshakes in both directions and incoming Machine Peek operations.

Oh, and Econet is Acorn's ancient, proprietary 500kbit/s networking
technology.
 1.2 29-Apr-2001  martin branches: 1.2.2; 1.2.4;
Add an in-kernel PPPoE (ppp over ethernet, RFC 2516) implementation,
based on the existing net/if_spppsubr.c stuff.

While there are completely userland (bpf based) implementations available,
those have a vastly larger per packet overhead thus causing major CPU
overhead and higher latency. On an i386-based router, running a 486DX at 50MHz
my line (768kBit/s downstream) was limited to something (varying) between 10
and 20 kByte/s effective download rate. With this implementation I get full
bandwidth (~85kByte/s).

This is client side only. Arguably the right way to add full PPPoE support
(including server side) would be a variation of the ppp line discipline and
appropriate modifications to pppd. I promise every help I can give to anyone
doing that - but I needed this really fast. Besides, on low memory NAT boxes
with typically a single PPPoE connection, this implementation is more
lightweight than a pppd based one, which nicely fits my needs.
 1.1 12-Dec-2000  thorpej branches: 1.1.2; 1.1.4;
Put the BPF DLT_* constants into their own header file so that things
that reference them don't have to slurp in all of the BPF headers.

Define a new generic RAWAF type that is like DLT_RAW, but isn't specific
to IP (the macro takes an AF_* constant as an argument to generate the
actual type).
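
One way a macro can fold an AF_* constant into a link type is to reserve a tag in the upper 16 bits and carry the address family in the lower 16. The sketch below shows that scheme; the SK_ names and the tag value are assumptions for illustration and are not copied from dlt.h.

```c
#include <assert.h>
#include <stdint.h>

#define SK_RAWAF_MASK	0x02240000u	/* assumed tag in the upper 16 bits */
#define SK_RAWAF_AF(t)	((t) & 0x0000ffffu)
#define SK_IS_RAWAF(t)	(((t) & 0xffff0000u) == SK_RAWAF_MASK)

/* Hypothetical helper: build a raw link type for one address family. */
static uint32_t
sk_rawaf(unsigned int af)
{
	assert(af <= 0xffffu);	/* the family must fit in the low 16 bits */
	return SK_RAWAF_MASK | af;
}
```

A consumer can then test SK_IS_RAWAF() on a link type and recover the address family with SK_RAWAF_AF() to pick the right decoder.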
 1.1.4.3 17-Sep-2002  nathanw Catch up to -current.
 1.1.4.2 21-Sep-2001  nathanw Catch up to -current.
 1.1.4.1 21-Jun-2001  nathanw Catch up to -current.
 1.1.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.1.2.1 12-Dec-2000  bouyer file dlt.h was added on branch thorpej_scsipi on 2000-12-13 15:50:27 +0000
 1.2.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.2.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.2.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.3.10.1 29-Aug-2002  gehenna catch up with -current.
 1.5.2.5 11-Dec-2005  christos Sync with head.
 1.5.2.4 19-Oct-2004  skrll Sync with HEAD
 1.5.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.1 03-Aug-2004  skrll Sync with HEAD
 1.9.12.1 21-Jun-2006  yamt sync with head.
 1.10.6.1 22-Apr-2006  simonb Sync with head.
 1.10.4.1 09-Sep-2006  rpaulo sync with head
 1.10.2.1 01-Mar-2006  yamt sync with head.
 1.11.108.1 18-Feb-2012  mrg merge to -current.
 1.11.104.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.104.1 17-Apr-2012  yamt sync with head
 1.12.6.2 03-Dec-2017  jdolecek update from HEAD
 1.12.6.1 23-Jun-2013  tls resync from head
 1.15.2.2 05-Feb-2017  skrll Sync with HEAD
 1.15.2.1 06-Apr-2015  skrll Sync with HEAD
 1.16.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.18.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18.4.1 10-Jun-2019  christos Sync with HEAD
 1.18.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.24.6.1 02-Aug-2025  perseant Sync with HEAD
 1.1 12-Oct-2025  thorpej Some platforms have rules for retrieving the MAC address for an interface
beyond what properties exist. For example, a local address may be
present in a device tree property, but a system-wide property may indicate
that it should not be used (in favor of e.g. a singular system MAC address -
LOOKIN' AT YOU, SUNW!).

So, the ether-get-mac-address device call is introduced to handle this
situation. Consult it before the standard properties, and if it succeeds,
use its result.
 1.2 12-Oct-2025  thorpej Regen for ether_calls,v 1.1.
 1.1 12-Oct-2025  thorpej Some platforms have rules for retrieving the MAC address for an interface
beyond what properties exist. For example, a local address may be
present in a device tree property, but a system-wide property may indicate
that it should not be used (in favor of e.g. a singular system MAC address -
LOOKIN' AT YOU, SUNW!).

So, the ether-get-mac-address device call is introduced to handle this
situation. Consult it before the standard properties, and if it succeeds,
use its result.
 1.1 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023ad_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.9 15-Sep-2024  andvar s/chanks/chunks/ and s/chekcsum/checksum/ in comment.
 1.8 03-Sep-2021  andvar branches: 1.8.10;
fix typos in comments, mainly s/extention/extension/ and s/sufficent/sufficient/
 1.7 27-Mar-2020  jdolecek replace the conditional m_pullup() on start of ether_sw_offload_tx()
with a KASSERT(), to make it clear no mbuf manipulation is ever done here
 1.6 15-Dec-2018  rin branches: 1.6.2; 1.6.6;
Improve wording in comments: replace "chain" with "queue" for
sequence of mbuf's connected by m_nextpkt, in order to avoid
confusion with those connected by m_next.

No binary changes.
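
The distinction the comments now draw can be sketched with a simplified buffer type: m_next links the segments of one packet (a chain), while m_nextpkt links whole packets (a queue). The struct pkt_buf type and both helpers below are hypothetical stand-ins, not the kernel's struct mbuf API, and the sketch assumes only the first segment of a packet uses the next-packet link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf. */
struct pkt_buf {
	struct pkt_buf *next;		/* next segment of the same packet */
	struct pkt_buf *nextpkt;	/* first segment of the next packet */
	size_t len;			/* bytes held in this segment */
};

/* Walk a chain: total bytes of one packet. */
static size_t
chain_bytes(const struct pkt_buf *m)
{
	size_t total = 0;

	for (const struct pkt_buf *seg = m; seg != NULL; seg = seg->next) {
		/* Sketch assumption: only the head segment links packets. */
		assert(seg == m || seg->nextpkt == NULL);
		total += seg->len;
	}
	return total;
}

/* Walk a queue: number of packets. */
static size_t
queue_packets(const struct pkt_buf *m)
{
	size_t n = 0;

	for (; m != NULL; m = m->nextpkt)
		n++;
	return n;
}
```

Code that processes a queue typically has an outer loop over the next-packet link and an inner loop over the segment link, which is why mixing the two terms is confusing.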
 1.5 15-Dec-2018  rin Replace panic with rate-limited LOG_ERR message when we encounter
invalid ether frame with non-zero csum flags.

Requested by thorpej.
 1.4 13-Dec-2018  rin Panic rather than silently dropping packets when TX offload options are
enabled for unsupported frame types.
 1.3 13-Dec-2018  rin Also take care of non-DIAGNOSTIC case.
 1.2 13-Dec-2018  rin Fix (bridge && !inet6) build.
 1.1 12-Dec-2018  rin PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.6.6.3 08-Apr-2020  martin Merge changes from current as of 20200406
 1.6.6.2 10-Jun-2019  christos Sync with HEAD
 1.6.6.1 15-Dec-2018  christos file ether_sw_offload.c was added on branch phil-wifi on 2019-06-10 22:09:45 +0000
 1.6.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.6.2.1 15-Dec-2018  pgoyette file ether_sw_offload.c was added on branch pgoyette-compat on 2018-12-26 14:02:04 +0000
 1.8.10.1 02-Aug-2025  perseant Sync with HEAD
 1.1 12-Dec-2018  rin branches: 1.1.2; 1.1.6;
PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.1.6.2 10-Jun-2019  christos Sync with HEAD
 1.1.6.1 12-Dec-2018  christos file ether_sw_offload.h was added on branch phil-wifi on 2019-06-10 22:09:45 +0000
 1.1.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.1.2.1 12-Dec-2018  pgoyette file ether_sw_offload.h was added on branch pgoyette-compat on 2018-12-26 14:02:04 +0000
 1.22 22-Nov-2021  msaitoh Add LLDP and MACSec.
 1.21 22-Nov-2021  msaitoh Modify comment:

s/Netbios/NetBIOS/
s/PPPOE/PPPoE/
 1.20 22-Nov-2021  msaitoh s/repsonse/response/ in comment.
 1.19 01-Jan-2020  ryo Add the ETHERTYPE_QINQ for 802.1ad VLAN stacking
 1.18 23-Sep-2012  chs branches: 1.18.38; 1.18.42;
add entries for AOE and FCOE.
 1.17 10-Dec-2005  elad branches: 1.17.110; 1.17.120;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.16 07-Jan-2005  yamt branches: 1.16.10;
add ETHERTYPE_SLOWPROTOCOLS. (0x8809)
 1.15 23-Jul-2004  mycroft Add ETHERTYPE_PAE.
 1.14 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 10-Feb-2002  thorpej branches: 1.13.16;
Add the Ethertype for 802.3x flow control packets.
 1.12 18-Oct-2001  matt Add ETHERTYPEs for MPLS (Unicast & Multicast).
 1.11 11-Jun-2001  wiz branches: 1.11.2;
Fix various misspellings of compatible/compatibility.
 1.10 29-Sep-1999  ad branches: 1.10.14;
Define ETHERTYPE_VLAN: IEEE 802.1Q VLAN tagging.
 1.9 21-May-1999  augustss Make this file syntactically correct again.
 1.8 20-May-1999  tsarna Add a *bunch* of types (file went from ~4K to ~16K!)
 1.7 20-Mar-1999  thorpej branches: 1.7.4;
Define the PPPoE Discovery and PPPoE ethertypes.
 1.6 13-Oct-1998  kim Put back ETHERTALK_AT (but I did convert *all* code to ETHERTYPE_ATALK),
so if vendors (or something) used it, it is still found. Also added short
comments for each alias to explain why they are there.
 1.5 13-Oct-1998  kim Use ETHERTYPE_ATALK instead of ETHERTYPE_AT. The former seems more common.
Our other constants also use "ATALK".

Added many new ETHERTYPE constants to sys/net/ethertypes.h, including the
ones from libpcap and tcpdump "ethertype.h" files.
 1.4 09-Sep-1998  thorpej Add/move some Ethertypes, PR #5997, Heiko W.Rupp.
 1.3 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.2 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.1 06-Mar-1997  is branches: 1.1.2;
file ethertypes.h was initially added on branch is-newarp.
 1.1.2.1 06-Mar-1997  is Factor out the ETHERTYPE_XXX definitions. They are needed as
- Ethernet protocol type numbers
- ARP protocol type numbers, as per "Assigned Numbers".
This way we don't need to pull in all the Ethernet include file into the
ARP code.
 1.7.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.10.14.3 28-Feb-2002  nathanw Catch up to -current.
 1.10.14.2 22-Oct-2001  nathanw Catch up to -current.
 1.10.14.1 21-Jun-2001  nathanw Catch up to -current.
 1.11.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.11.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.16.3 11-Dec-2005  christos Sync with head.
 1.13.16.2 17-Jan-2005  skrll Sync with HEAD.
 1.13.16.1 03-Aug-2004  skrll Sync with HEAD
 1.16.10.1 21-Jun-2006  yamt sync with head.
 1.17.120.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.17.110.1 30-Oct-2012  yamt sync with head
 1.18.42.1 07-Jul-2020  martin Pull up following revision(s) (requested by jmcneill in ticket #980):

sys/dev/pci/if_aq.c: revision 1.4
sys/dev/pci/if_aq.c: revision 1.5
sys/arch/amd64/conf/GENERIC: revision 1.553
sys/dev/pci/files.pci: revision 1.419
sys/arch/amd64/conf/XEN3_DOM0: revision 1.170
sys/dev/pci/if_aq.c: revision 1.9
share/man/man4/Makefile: revision 1.693
sys/dev/pci/pcidevs: revision 1.1411
share/man/man4/aq.4: revision 1.1
share/man/man4/aq.4: revision 1.3
sys/arch/i386/conf/ALL: revision 1.479
share/man/man4/aq.4: revision 1.4
sys/dev/pci/if_aq.c: revision 1.10
sys/dev/pci/files.pci: revision 1.421
sys/dev/pci/if_aq.c: revision 1.11
sys/dev/pci/if_aq.c: revision 1.12
sys/dev/pci/if_aq.c: revision 1.13
sys/dev/pci/if_aq.c: revision 1.14
sys/dev/pci/if_aq.c: revision 1.15
sys/dev/pci/if_aq.c: revision 1.16
sys/dev/pci/pcidevs: revision 1.1408
sys/arch/amd64/conf/ALL: revision 1.135
sys/net/ethertypes.h: revision 1.19
sys/arch/i386/conf/GENERIC: revision 1.1218
distrib/sets/lists/man/mi: revision 1.1668
sys/dev/pci/if_aq.c: revision 1.1
sys/dev/pci/if_aq.c: revision 1.2
sys/dev/pci/pcidevs: revision 1.1395
sys/dev/pci/if_aq.c: revision 1.3
sys/arch/evbarm/conf/GENERIC64: revision 1.125

Add the ETHERTYPE_QINQ for 802.1ad VLAN stacking

add Aquantia AQC 10G network adapters
add support for Aquantia AQC series 10G network adapters.

this driver is based on the FreeBSD version https://github.com/Aquantia/aqtion-freebsd ,
but drastically rewritten for NetBSD.

add aq(4)

Add Aquantia AQC100, AQC100S and D100.

add support VLAN HW filter

set/clear IFF_OACTIVE flag only on txring 0

make counters per queue

support internal PHY temperature sensor

Found by kUBSan:
- Use unsigned to avoid undefined behavior in aq_hw_init().
- Cast to unsigned to avoid undefined behavior in aq_set_mac_addr().

fix descriptions of register map in comment

return the ifmedia active status correctly even while the link is not up after attach.
pointed out by msaitoh@. thanks.

On FIBRE devices, there are times when linkstat interrupt doesn't occur?
reported from Andrius V. thanks.
- use polling instead of linkstat interrupt when FIBRE
- add AQ_FORCE_POLL_LINKSTAT options (not by default)

sort product table, and tabify

add support AQC100S and D100.
not tested, but they are probably the same as the AQC100.
 1.18.38.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32 24-Sep-2021  knakahara Fix build failure for i386 INSTALL_XEN3PAE_DOMU, sorry.
 1.31 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.30 30-Jan-2021  jmcneill branches: 1.30.4; 1.30.6;
Add symmetric toeplitz implementation with integration for NICs, from OpenBSD.
 1.29 27-Sep-2020  roy branches: 1.29.2;
vether: Implement a virtual ethernet interface

The vether interface simulates a normal Ethernet interface by encapsulating
standard network frames with an Ethernet header, specifically for use as
a member in a bridge(4).

To use vether the administrator needs to configure an address onto the
interface so that packets can be routed to it. An Ethernet header will
be prepended and, if the vether interface is a member of a bridge(4),
the frame will show up there.

Taken from OpenBSD.
 1.28 13-Sep-2020  roy nd needs arp or inet6. inet is not enough.
 1.27 11-Sep-2020  roy Implement address agnostic Neighbor Detection.

This is heavily based on IPv6 Neighbor Detection and allows per protocol
timers which also facilitate Neighbor Unreachability Detection.
 1.26 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.25 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
 1.24 20-Jan-2020  thorpej Remove FDDI support.
 1.23 19-Jan-2020  thorpej Remove Token Ring support.
 1.22 19-Jan-2020  thorpej Remove HIPPI support and the esh(4) driver that uses it. There have not
been any users of HIPPI for some time, and it is unlikely to be resurrected.
 1.21 19-Jan-2020  thorpej Remove the strip(4) - Starmode Radio IP - pseudo-device driver. It is
long since obsolete.
 1.20 12-Dec-2018  rin branches: 1.20.6;
PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.19 23-Sep-2018  maxv Remove ISDN from the kernel. It has remained unmaintained for a long time,
is of poor quality, and is now an obstacle to MP-ification. It was removed
ten years ago from FreeBSD for the same reason.

This retires a big user of the mbuf API, and will ease maintenance of the
kernel.
 1.18 06-Sep-2018  maxv Remove the network ATM code.
 1.17 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.16 27-Feb-2018  maxv branches: 1.16.2; 1.16.4;
Remove the Econet code. It was part of acorn26, which was removed a
month ago.
 1.15 16-Feb-2018  knakahara Introduce very simple Receive Side Scaling (RSS) utility.

ok by msaitoh@n.o.
 1.14 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.13 16-Feb-2017  knakahara branches: 1.13.6; 1.13.12;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.12 02-Feb-2017  ozaki-r Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.11 16-Sep-2016  pgoyette branches: 1.11.2;
Move kern_ctf.c into the dtrace_fbt module (the only place it is used)
rather than including in kernels with KDTRACE_HOOKS defined. Update
the dtrace_fbt module to depend on the zlib module.

Bump kernel version to avoid module mismatch.

Welcome to 7.99.38 !
 1.10 10-Aug-2016  knakahara follow renaming ifmpls to mpls.

This fixes i386 ALL build.
 1.9 05-Apr-2016  pgoyette branches: 1.9.2;
Update dependency: zlib is only needed for the swcrypto device, not for
any other component of opencrypto.
 1.8 26-Nov-2015  ozaki-r Fix build dependency of if_llatbl.c

if_llatbl.c is required if inet or inet6 is enabled. Depending on ether
doesn't suit the NDP case.
 1.7 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD, as is lltable/llentry.
 1.6 01-Jun-2015  roy Back out prior

gimpy1@ we don't #include driver .h in netbsd32
 1.5 31-May-2015  roy Revert prior change, optionally include PPPOE and SPPP support again.
Fix compat_netbsd32 module building by enforcing both.
 1.4 31-May-2015  roy Revert prior as it's no longer needed.
 1.3 31-May-2015  roy Allow sppp to be #if NSPPP > 0
 1.2 02-Dec-2014  christos - split struct dladdr out of struct sockaddr_dl
- add routines to print struct sockaddr_dl and struct dladdr
- make if_dl.h idempotent
 1.1 12-Oct-2014  uebayasi branches: 1.1.2;
Move net definitions.
 1.1.2.7 28-Aug-2017  skrll Sync with HEAD
 1.1.2.6 05-Feb-2017  skrll Sync with HEAD
 1.1.2.5 05-Oct-2016  skrll Sync with HEAD
 1.1.2.4 22-Apr-2016  skrll Sync with HEAD
 1.1.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.2 22-Sep-2015  skrll Sync with HEAD
 1.1.2.1 06-Apr-2015  skrll Sync with HEAD
 1.9.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.9.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.11.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.12.2 03-Dec-2017  jdolecek update from HEAD
 1.13.12.1 16-Feb-2017  jdolecek file files.net was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.13.6.2 26-Feb-2018  snj Pull up following revision(s) (requested by knakahara in ticket #567):
distrib/sets/lists/comp/mi: 1.2182-1.2183
sys/dev/pci/if_wm.c: 1.564
sys/dev/pci/ixgbe/ixgbe.c: 1.122
sys/dev/pci/ixgbe/ixgbe_rss.h: 1.3
sys/dev/pci/ixgbe/ixv.c: 1.78
sys/net/Makefile: 1.35-1.36
sys/net/files.net: 1.15
sys/net/rss_config.c: 1.1
sys/net/rss_config.h: 1.1
Introduce very simple Receive Side Scaling (RSS) utility.
ok by msaitoh@n.o.
--
Apply RSS utility to wm(4).
ok by msaitoh@n.o.
--
Apply RSS utility to ixg(4) and ixv(4).
ok by msaitoh@n.o.
--
Fix build failure, sorry.
--
Currently, it is not necessary to install rss_config.h. Pointed out by msaitoh@n.o.
 1.13.6.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.16.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.4.1 10-Jun-2019  christos Sync with HEAD
 1.16.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.16.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.16.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.20.6.2 29-Feb-2020  ad Sync with head.
 1.20.6.1 25-Jan-2020  ad Sync with head.
 1.29.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.30.6.1 31-May-2021  cjep sync with head
 1.30.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.6 29-Aug-2011  jmcneill build pf module with WARNS=3, and remove the need for -Wno-shadow
 1.5 14-Sep-2009  degroote Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@
 1.4 18-Jun-2008  yamt merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.3 11-Dec-2005  christos branches: 1.3.70; 1.3.72; 1.3.74; 1.3.76; 1.3.78;
merge ktrace-lwp.
 1.2 01-Jun-2005  yamt -Wno-shadow for some pf files.
IMO there is no point to fix them in our tree.
 1.1 22-Jun-2004  itojun branches: 1.1.2;
foundation for PF
 1.1.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.2.1 22-Jun-2004  skrll file files.pf was added on branch ktrace-lwp on 2004-08-03 10:54:11 +0000
 1.3.78.1 18-Jun-2008  simonb Sync with head.
 1.3.76.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.3.74.2 16-Sep-2009  yamt sync with head
 1.3.74.1 04-May-2009  yamt sync with head.
 1.3.72.1 19-Apr-2008  yamt Peter Postma's work-in-progress pf import from OpenBSD 4.2.
updated to -current by me.
 1.3.70.1 29-Jun-2008  mjf Sync with HEAD.
 1.535 12-Jun-2025  ozaki-r if: protect if_link_state_change_process with IFNET_LOCK

This change avoids race conditions between if_link_state_change handlers
and other operations on a target interface such as if_ioctl.
 1.534 05-Jun-2025  ozaki-r if: remove unused ifa_ifwithaf()
 1.533 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.532 05-Jun-2025  ozaki-r if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
 1.531 16-Dec-2024  ozaki-r if: add counts of packet drops on if_percpuq to if_iqdrops

So packets dropped on if_percpuq appear in ifconfig -v.
 1.530 29-Jun-2024  riastradh branches: 1.530.2;
if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.529 24-Feb-2023  riastradh branches: 1.529.2;
sys/net/if.c: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
 1.528 25-Nov-2022  msaitoh branches: 1.528.2;
KNF. No functional change.
 1.527 24-Oct-2022  msaitoh Make ifq_drops in struct ifqueue and struct ifaltq 64 bit.
 1.526 20-Sep-2022  knakahara Remove routes on address removal if the routes reference the address. Implemented by ozaki-r@n.o.

A route whose gateway is on a connected route can become invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destroyed, say, by in_purgeaddr.

If the same address is assigned again in such a state, there can be two
different ifaddr objects with the same address. Until recently this was not a
big problem because we could send packets anyway. However, after MP-ification
of the network stack, we can't send packets because we strictly check whether
rt_ifa (i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.
 1.525 03-Sep-2022  thorpej Garbage-collect everything related to struct domain::dom_ifqueues
(except dom_ifqueues itself, until the next kernel version bump).
It's no longer used now that nothing uses the legacy netisr mechanism.
 1.524 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.523 02-Sep-2022  thorpej Re-factor how pktq_barrier() is issued by if_detach().

Rather than explicitly referencing ip_pktq and ip6_pktq in if_detach(),
instead add all pktqueues to a global list. This list is then used in
the new pktq_ifdetach() function to issue a barrier on all pktqueues.

Note that the performance of this list is not critical; it will seldom
be accessed (when pktqueues are created/destroyed and when network
interfaces are detached), and so a simple synchronization strategy using
a rwlock is sufficient.
 1.522 02-Sep-2022  thorpej if_detach(): Drain the protocol input queues before the pr_purgeif()
calls; pktq_barrier() doesn't remove packets from the queue, it waits
for the packets enqueued before the barrier to drain. This, in turn,
may cause the protocols to gain additional references to the interface
that's detaching. By draining the queues first, we ensure that no
additional references will be taken after calling pr_purgeif().
 1.521 02-Sep-2022  thorpej pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.
 1.520 21-Aug-2022  skrll Sprinkle more const. NFC.
 1.519 21-Aug-2022  skrll Sprinkle const. NFC.
 1.518 21-Aug-2022  skrll Style / whitespace.
 1.517 20-Aug-2022  riastradh ifnet(9): Make sure to use if_timer and if_watchdog at IPL_NET.
 1.516 20-Aug-2022  riastradh ifnet(9): On if_deactivate, don't make null if_slowtimo nonnull.

Fixes crash on detach.
 1.515 20-Aug-2022  riastradh ifnet(9): Kernel lock for struct ifnet::if_timer.
 1.514 20-Aug-2022  riastradh ifnet(9): Add sysctl net.interfaces.ifN.watchdog.trigger.

For interfaces that use if_watchdog, this forces it to be called at
the next tick.
 1.513 20-Aug-2022  riastradh ifnet(9): Defer if_watchdog (a.k.a. if_slowtimo) to workqueue.

This is necessary to allow mii_down, and the *_init/stop routines that
call it, to sleep waiting for MII callouts on other CPUs.

Mark the workqueue and callout MP-safe; only take the kernel lock
around the callback.

No kernel bump despite change to struct ifnet because the change is
ABI-compatible and using the callout outside net/if.c has never been
kosher.
 1.512 17-Aug-2022  rillig if.c: fix typo in comment
 1.511 29-Jul-2022  skrll Fix a typo in a comment.
 1.510 29-Jul-2022  skrll KNF a comment
 1.509 11-Jul-2022  skrll KNF two comments.
 1.508 11-Jul-2022  skrll Grammar in a comment.
 1.507 08-Jul-2022  skrll alredy -> already
 1.506 07-Jul-2022  riastradh ifioctl(9): Don't touch ifconf or ifreq until command is validated.

sys_ioctl validates the data pointer according to the command's size
and direction. But userland may ioctl commands other than
OSIOCGIFCONF or OOSIOCGIFCONF -- and if userland passes an IOC_VOID
command, the argument is passed through verbatim and may be null.

Reported-by: syzbot+19b1bf83e5481273eafc@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=f4c91a7dcd31901c80d91af6ed01456faf0a7286

Reported-by: syzbot+442c033feb784d055185@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=4a3a4b92dbe9695046ff17a5474cef52aed23e0b

Reported-by: syzbot+4c87d0cdf7025741ea7a@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=3e5f42c998e43ad42da40dec3c7873e6aae187e4
 1.505 22-May-2022  andvar fix various small typos, mainly in comments.
 1.504 11-May-2022  andvar fix various typos in comments.
 1.503 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.502 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
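
The release pattern described in this log entry can be sketched in portable C11. This is an illustration only, not the kernel code: it uses <stdatomic.h> fences in place of the membar_*() calls, and struct obj, obj_init, obj_hold, obj_rele and the freed flag are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a structure from the tree. */
struct obj {
	atomic_uint refcnt;
	int foo;
	bool freed;	/* stands in for actually freeing the memory */
};

static void
obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	o->freed = false;
}

static void
obj_hold(struct obj *o)
{
	/* Taking an extra reference needs no ordering of its own. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference; returns true if this was the last one. */
static bool
obj_rele(struct obj *o)
{
	unsigned old;

	/*
	 * Release fence: everything this thread did to *o is ordered
	 * before the decrement (the role of membar_exit/membar_release).
	 */
	atomic_thread_fence(memory_order_release);
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old > 0);
	if (old != 1)
		return false;	/* not last: someone else will free */

	/*
	 * Acquire fence: the teardown below is ordered after every other
	 * thread's release (the role of membar_enter/membar_acquire).
	 */
	atomic_thread_fence(memory_order_acquire);
	o->freed = true;	/* free_stuff(obj) would go here */
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 runs the teardown, and the fence pair is what makes the other threads' earlier stores visible to it.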
 1.501 31-Dec-2021  riastradh sys/net: Document if_mcast_op with comment and refuse other commands.

Meant only for multicast addition/deletion operations, nothing else.
 1.500 31-Dec-2021  riastradh sys/net: Document if_flags_set with a comment.
 1.499 31-Dec-2021  riastradh sys/net: Assert IFNET_LOCKED in if_ioctl, if_init, and if_stop.

Exception: Not for SIOCADDMULTI/SIOCDELMULTI, for which it is the
driver's responsibility to take internal locks. Typically this is
already done via struct ethercom::ec_lock.
 1.498 31-Dec-2021  riastradh sys: Use if_ioctl wrapper function.
 1.497 31-Dec-2021  riastradh sys/net: New functions if_ioctl, if_init, and if_stop.

These are wrappers, suitable for inserting appropriate kasserts
regarding the API's locking contract, for the corresponding functions
in struct ifnet.

Since these are intended to commit configuration changes to the
interface, which may involve resetting the device, the caller should
hold IFNET_LOCK. However, I can't straightforwardly prove that all
callers do yet, so the assertion is disabled for now.
 1.496 30-Sep-2021  yamaguchi net: obsolete ifnet::if_link_state_changed
that was used for updating link-state of vlan I/F

The obsoleted function is replaced with
ifnet::if_linkstate_hooks
 1.495 30-Sep-2021  yamaguchi carp: Register carp_carpdev_state to link-state change hook
 1.494 30-Sep-2021  yamaguchi lagg: Register lagg_linkstate_changed to link-state change hook
 1.493 30-Sep-2021  yamaguchi bridge: Register bridge_calc_link_state to link-state change hook
 1.492 30-Sep-2021  yamaguchi Provide a hook point called at change of link state
 1.491 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) can not be used on the same interface so that
if_agrprivate and if_lagg are not used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Neither agr(4) nor lagg(4) is running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
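
The three states listed above can be read as a small classification function. The sketch below is illustrative only: struct ifnet_sketch, enum agg_user and agg_user_of are hypothetical names, and the constant is a stand-in for the kernel's IFT_IEEE8023ADLAG whose numeric value is assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for IFT_IEEE8023ADLAG; the value is assumed for illustration. */
#define SKETCH_IFT_IEEE8023ADLAG 0xa1

/* Reduced, hypothetical view of the two ifnet fields the log entry names. */
struct ifnet_sketch {
	void *if_lagg;
	unsigned char if_type;
};

enum agg_user { AGG_NONE, AGG_AGR, AGG_LAGG };

/* Decide which aggregation driver, if any, owns the interface. */
static enum agg_user
agg_user_of(const struct ifnet_sketch *ifp)
{
	assert(ifp != NULL);
	if (ifp->if_lagg == NULL)
		return AGG_NONE;	/* state 1: neither driver attached */
	if (ifp->if_type != SKETCH_IFT_IEEE8023ADLAG)
		return AGG_AGR;		/* state 2: agr(4) */
	return AGG_LAGG;		/* state 3: lagg(4) */
}
```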
 1.490 21-Sep-2021  christos remove extra changes
 1.489 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.488 16-Sep-2021  andvar fix various typos, mainly in comments.
 1.487 01-Jul-2021  blymn Back out fix for kern_pmf.c calling a null if_stop and apply a fix
suggested by Jared McNeill which sets if_stop to a stub function
which means that more than just the pmf is protected from the NULL call.
 1.486 29-Jun-2021  riastradh Make if_stats_init, if_attach, if_initialize return void.

percpu_alloc can't fail.


Author: Maya Rashish <maya@NetBSD.org>
Committer: Taylor R Campbell <riastradh@NetBSD.org>
 1.485 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.484 15-Oct-2020  roy branches: 1.484.6; 1.484.8;
net: remove IFEF_NO_LINK_STATE_CHANGE

This flag was only set for virtual interfaces.
All virtual interfaces have a means of knowing if they are going to work
or not and as such now support link state changes.

If we want this flag back, it should be used as an indicator that
the interface does not support link state changes that userland can use
so it can make a decision on what to do when the link state is UNKNOWN.
 1.483 27-Sep-2020  roy bridge: When an interface joins then mark addresses on it as tentative

The exact flow is: detach addresses, join the bridge, and then mark the
detached addresses as tentative.
This ensures that Duplicate Address Detection for the joining interface
is performed across all members of the bridge.
 1.482 27-Sep-2020  roy bridge: Calculate link state as the best link state of any member

If any member is LINK_STATE_UP then it's LINK_STATE_UP.
Otherwise if any member is LINK_STATE_UNKNOWN then it's LINK_STATE_UNKNOWN.
Otherwise it's LINK_STATE_DOWN.
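
The "best link state of any member" rule can be sketched as follows. This is not the bridge(4) code: the enum values and the function name best_link_state are hypothetical, and treating a bridge with no members as down is an assumption that simply follows the final "otherwise" clause.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel's LINK_STATE_* constants. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };

/* Return the best state among n members: UP beats UNKNOWN beats DOWN. */
static enum link_state
best_link_state(const enum link_state *members, size_t n)
{
	enum link_state best = LS_DOWN;

	assert(members != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (members[i] == LS_UP)
			return LS_UP;		/* nothing can beat UP */
		if (members[i] == LS_UNKNOWN)
			best = LS_UNKNOWN;	/* better than DOWN */
	}
	return best;
}
```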
 1.481 26-Sep-2020  roy net: Add a callback to ifnet to notify of link state changes
 1.480 26-Sep-2020  roy net: Fix the setting of if_link_state

Link state changes are not dependent on the interface being up, but we also
need to guard against more link state changes being scheduled when the
interface is being detached.

We do this by clearing the link queue but keeping if_link_scheduled = true.
We can check for this in both if_link_state_change() and
if_link_state_change_work() to abort early as there is no point in doing
anything if the interface is being detached because if_down() is called
in if_detach() after the workqueue has been drained to the same overall
effect.
 1.479 16-Jul-2020  msaitoh Don't accept negative value.

Reported-by: syzbot+e71a77402d6668f1868d@syzkaller.appspotmail.com
 1.478 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.477 05-May-2020  jdolecek adjust comment - sosetopt() path doesn't take IFNET_LOCK()
 1.476 05-May-2020  jdolecek add a NOMPSAFE comment for if_mcast_op(), it is called from context
which doesn't hold IFNET_LOCK() in some cases, and calls if_ioctl

this needs to be sorted out for NET_MPSAFE
 1.475 05-May-2020  jdolecek remove struct ifnet if_mcastop, it's not used by anything
 1.474 18-Apr-2020  thorpej In _if_down(), release the link state change lock before calling
workqueue_wait(). Add a comment explaining how the locking here
works.

PR kern/55018.
 1.473 21-Feb-2020  joerg branches: 1.473.4;
Explicitly cast pointers to uintptr_t before casting to enums. They are
not necessarily the same size. Don't cast pointers to bool, check for
NULL instead.
 1.472 07-Feb-2020  thorpej IPL_SOFTNET -> IPL_NET in previous.
 1.471 06-Feb-2020  thorpej Perform link state change processing on a work queue, rather than in a
softint.
 1.470 01-Feb-2020  riastradh Switch sys/net to percpu_create.
 1.469 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
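
The idea of per-CPU counters that are summed only on export can be sketched in plain C. This does not use the kernel's percpu(9) or if_stat interfaces: every name below (SKETCH_NCPU, struct sketch_if_stats, sketch_stat_inc, sketch_stat_collect) is hypothetical, and a fixed-size array indexed by CPU number stands in for real per-CPU storage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NCPU 4	/* assumed CPU count for the illustration */

enum sketch_stat { SKETCH_IPACKETS, SKETCH_OPACKETS, SKETCH_NSTATS };

/* One row of counters per CPU, so updates never touch shared counters. */
struct sketch_if_stats {
	uint64_t percpu[SKETCH_NCPU][SKETCH_NSTATS];
};

/* Hot path: bump the counter belonging to the current CPU only. */
static void
sketch_stat_inc(struct sketch_if_stats *s, unsigned cpu, enum sketch_stat which)
{
	assert(cpu < SKETCH_NCPU && which < SKETCH_NSTATS);
	s->percpu[cpu][which]++;
}

/* Export path: sum every CPU's row into one flat array of totals. */
static void
sketch_stat_collect(const struct sketch_if_stats *s,
    uint64_t out[SKETCH_NSTATS])
{
	memset(out, 0, sizeof(out[0]) * SKETCH_NSTATS);
	for (unsigned cpu = 0; cpu < SKETCH_NCPU; cpu++)
		for (int i = 0; i < SKETCH_NSTATS; i++)
			out[i] += s->percpu[cpu][i];
}
```

The cost of aggregation is paid only when user-space asks for the statistics, which is the trade-off the log entry describes.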
 1.468 20-Jan-2020  thorpej Remove FDDI support.
 1.467 19-Jan-2020  thorpej Remove Token Ring support.
 1.466 17-Dec-2019  christos branches: 1.466.2;
Protect network ioctls from non-authorized users. (Ilja Van Sprundel)
 1.465 14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, i.e. we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
 1.464 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.463 06-Oct-2019  uwe xc_barrier - convenience function to xc_broadcast() a nop.

Make the intent more clear and also avoid a bunch of (xcfunc_t)nullop
casts that gcc 8 -Wcast-function-type is not happy about.
 1.462 25-Sep-2019  ozaki-r Make panic messages more informative
 1.461 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storage runs short, percpu(9) enlarges it by allocating new
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
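
Why the extra pointer helps can be sketched in userland C. This is a minimal sketch under assumptions: sketch_rtcache, sketch_percpu and sketch_percpu_grow are hypothetical stand-ins, and the growth step only mimics the "allocate larger area, copy, destroy old area" behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache; not the kernel's struct route. */
struct sketch_rtcache {
	int generation;
};

/* Hypothetical per-CPU area; each slot holds only a pointer to a cache. */
struct sketch_percpu {
	struct sketch_rtcache **slot;
	size_t nslots;
};

/*
 * Grow the area: allocate a larger one, copy the old contents over, then
 * destroy the old one. Returns 0 on success, -1 on allocation failure.
 */
static int
sketch_percpu_grow(struct sketch_percpu *pc, size_t nslots)
{
	struct sketch_rtcache **n;

	assert(nslots >= pc->nslots);
	n = calloc(nslots, sizeof(*n));
	if (n == NULL)
		return -1;
	if (pc->nslots != 0)
		memcpy(n, pc->slot, pc->nslots * sizeof(*n));
	for (size_t i = pc->nslots; i < nslots; i++) {
		n[i] = calloc(1, sizeof(*n[i]));
		if (n[i] == NULL) {
			/* Undo the caches allocated so far in this call. */
			while (i-- > pc->nslots)
				free(n[i]);
			free(n);
			return -1;
		}
	}
	free(pc->slot);	/* pointers INTO the old area dangle from here on */
	pc->slot = n;
	pc->nslots = nslots;
	return 0;
}
```

A caller that read `pc->slot[0]` before the area grew still holds a valid pointer afterwards, because only the slot array moved; the cache object it points to did not.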

Reviewed by ozaki-r@ and yamaguchi@
 1.460 13-Sep-2019  msaitoh if_flags is neither int nor short. It's unsigned short.
 1.459 20-Aug-2019  roy if: announce flag changes other than up or down

For example toggling promiscuous mode or disabling ARP.

XXX Pullup -9
 1.458 15-Aug-2019  ozaki-r Restore if_ioctl on error of ifc_destroy

Otherwise subsequent ioctls won't work.

Patch from Harold Gutch on PR kern/54434 (tweaked a bit by me)
 1.457 25-Jul-2019  knakahara branches: 1.457.2;
micro-optimization for if_snd_is_used()
 1.456 04-Jul-2019  ozaki-r Add support for a network interface description.

ioctl(2):
- Add SIOCGIFDESCR/SIOCSIFDESCR commands to get/set the description.

This makes it possible to attach a memo to an interface, like "Home network" or "Remote VPN".

From t-kusaba@IIJ
 1.455 21-May-2019  msaitoh KNF. No functional change.
 1.454 17-May-2019  msaitoh The max subtype of the ifmedia word is 31. It's too small for Ethernet now.
We currently use it up to 30. We should extend the limit to be able to use
speeds of more than 10Gbps. Our ifmedia(4) is inconvenient and has some problems,
so we should redesign the interface, but it's too late for netbsd-9 to do it.
So, we keep the data structure size and modify the structure a bit. The
strategy is almost the same as FreeBSD's. Many bits of IFM_OMASK for Ethernet
have not been used, so use some of them for Ethernet's subtype.

The differences against FreeBSD are:
- We use NetBSD style compat code (i.e. no SIOCGIFXMEDIA).
- FreeBSD's IFM_ETH_XTYPE's bit location is from 11 to "14" even though
IFM_OMASK is from 8 to "15". We use _IFM_ETH_XTMASK from bit 13 to "15".
- FreeBSD changed the meaning of IFM_TYPE_MATCH(). I think we should
not do it. We keep it unchanged and add a new IFM_TYPE_SUBTYPE_MATCH()
macro for matching both TYPE and SUBTYPE.
- Added up to 400GBASE-SR16.

New layout of the media word is as follows (from ifmedia_h):

* if_media Options word:
* Bits    Use
* ----    -------
* 0-4     Media subtype   MAX SUBTYPE == 255 for ETH and 31 for others
* 5-7     Media type
* 8-15    Type specific options
* 16-18   Mode (for multi-mode devices)
* 19      (Reserved for Future Use)
* 20-27   Shared (global) options
* 28-31   Instance
*
*    3                   2                   1
*  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
* +-------+---------------+-+-----+---------------+-----+---------+
* |       |               |R|     |               |     |         |
* | IMASK |     GMASK     |F|MMASK+-----+  OMASK  |NMASK|  TMASK  |
* |       |               |U|     |XTMSK|         |     |         |
* +-------+---------------+-+-----+-----+---------+-----+---------+
*  <----->                   <--->                 <--->
* IFM_INST()              IFM_MODE()             IFM_TYPE()
*
*                             IFM_SUBTYPE(other than ETH)<------->
*
*                                  <---> IFM_SUBTYPE(ETH)<------->
*
*
*          <------------->         <------------->
*                       IFM_OPTIONS()
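
How an Ethernet subtype can exceed 31 with this layout is easiest to see in code. The following is a minimal sketch, assuming the subtype is treated as an 8-bit index whose low 5 bits live in bits 0-4 and whose high 3 bits live in bits 13-15. The SK_* macro names and the pack/unpack helpers are hypothetical; they are not the macros from the header, which may represent the value differently.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the layout above; the names are hypothetical. */
#define SK_TMASK	0x0000001fu	/* bits 0-4: low part of the subtype */
#define SK_ETH_XTMASK	0x0000e000u	/* bits 13-15: extended Ethernet subtype */
#define SK_ETH_XSHIFT	13
#define SK_LOW_BITS	5

/* Spread an 8-bit Ethernet subtype index (0-255) over the two fields. */
static uint32_t
sk_eth_subtype_pack(unsigned idx)
{
	assert(idx <= 255);
	return (idx & SK_TMASK) |
	    (((uint32_t)idx >> SK_LOW_BITS) << SK_ETH_XSHIFT);
}

/* Recover the 8-bit index from a media word, ignoring all other fields. */
static unsigned
sk_eth_subtype_unpack(uint32_t word)
{
	return (word & SK_TMASK) |
	    (((word & SK_ETH_XTMASK) >> SK_ETH_XSHIFT) << SK_LOW_BITS);
}
```

Indices up to 31 produce exactly the same word as before, which is what keeps the data structure size and the existing subtype values unchanged.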
 1.453 17-May-2019  ozaki-r Implement an aggressive psref leak detector

It is yet another psref leak detector that can tell where a leak occurs,
while the simpler version that is already committed only reports that a leak
occurred.

Investigating psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.

The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.

The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.

Proposed on tech-kern
 1.452 15-May-2019  ozaki-r Store IFF_ALLMULTI in ec_flags instead of if_flags to avoid data races

IFF_ALLMULTI is set/unset to if_flags via if_mcast_op. To avoid data races on
if_flags, IFNET_LOCK was added for if_mcast_op. Unfortunately it produces
a deadlock so we want to remove added IFNET_LOCK by avoiding the data races by
another approach.

This fix introduces ec_flags to struct ethercom and stores IFF_ALLMULTI to it.
ec_flags is protected by ETHER_LOCK and thus IFNET_LOCK is no longer necessary
for if_mcast_op. Note that the fix is applied only to MP-safe drivers, for
which the data races matter.

In the kernel, IFF_ALLMULTI is set by a driver and used by the driver itself.
So changing the storing place doesn't break anything. One exception is
ioctl(SIOCGIFFLAGS); we have to include IFF_ALLMULTI in a result if needed to
export the flag as well as before.

An upcoming commit will remove IFNET_LOCK.

PR kern/54189
 1.451 20-Apr-2019  pgoyette Typos in comments. NFCI.
 1.450 16-Apr-2019  msaitoh Rename ifreqo2n() and ifreqn2o() to IFREQO2N_43() and IFREQN2O_43():
- ifreqo2n() and ifreqn2o() are for COMPAT_43, so add _43 to the name.
- Uppercase to make it clear those are macros.
 1.449 15-Apr-2019  christos Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks!
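
The class of bug fixed here can be sketched in userland C. This is an illustration only: sk_reply and sk_fill_reply are hypothetical, and memcpy stands in for the copy to user space. It is not the SIOCGIFCONF code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reply record; on typical ABIs padding follows name[]. */
struct sk_reply {
	char name[5];
	long value;
};

/*
 * Fill a reply on the stack and copy it out whole. Zeroing the struct first
 * means that bytes the code never assigns (padding, the unused tail of
 * name[]) do not carry stale stack contents to the reader.
 */
static void
sk_fill_reply(unsigned char *dst, const char *name, long value)
{
	struct sk_reply r;

	memset(&r, 0, sizeof(r));
	strncpy(r.name, name, sizeof(r.name) - 1);
	r.value = value;
	memcpy(dst, &r, sizeof(r));	/* stands in for copyout() */
}
```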
 1.448 11-Apr-2019  msaitoh Remove inclusion of compat/sys/socket.h. It's not required anymore.
 1.447 23-Mar-2019  pgoyette Replace compile-time checking for vlan code with a module hook.

Should resolve the errors reported on irc when booting a kernel which
has agr without vlan:


[ 1.0000000] WARNING: module error: built-in module if_agr can't find builtin dependency `if_vlan'
[ 1.0000000] WARNING: module error: built-in module if_agr prerequisite if_vlan failed, error 2
 1.446 01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.445 29-Jan-2019  pgoyette Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
 1.444 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.443 21-Dec-2018  msaitoh Add SIOCSETHERCAP. It's used to change ec_capenable.
 1.442 12-Dec-2018  rin PR kern/53562

Handle TX offload in software when a packet is sent via
bridge_output(). We can send it as is in the following
exceptional cases:

For unicast:

(1) When the destination interface is the same as source.

(2) When the destination supports all TX offload options
specified in a packet.

For multicast/broadcast:

(3) When all the members of the bridge support the specified
TX offload options.

For (3), add sc_csum_flags_tx flag to bridge softc, which is
logical AND b/w capabilities of TX offload options in member
interface (ifp->if_csum_flags_tx). The flag is updated when a
member is (i) added to or (ii) removed from a bridge, or (iii)
if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).
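
The rule for case (3) can be sketched as follows. This is a minimal sketch under assumptions: the SK_CSUM_* bits and both helper functions are hypothetical, and returning 0 for a bridge with no members is a choice made for the sketch, not something the commit states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offload capability bits; not the kernel's M_CSUM_* values. */
#define SK_CSUM_IPV4	0x01u
#define SK_CSUM_TCPV4	0x02u
#define SK_CSUM_TSOV4	0x04u

/*
 * Recompute the bridge-wide TX offload flags as the AND of every member's
 * flags. An empty bridge yields 0 (assumption of this sketch).
 */
static uint32_t
sk_bridge_csum_flags(const uint32_t *member_flags, size_t nmembers)
{
	uint32_t flags;

	if (nmembers == 0)
		return 0;
	flags = member_flags[0];
	for (size_t i = 1; i < nmembers; i++)
		flags &= member_flags[i];
	return flags;
}

/* A packet can go out as is only if every option it requests is offered. */
static int
sk_can_send_as_is(uint32_t pkt_flags, uint32_t offered)
{
	return (pkt_flags & ~offered) == 0;
}
```

Recomputing the AND on the three events listed above keeps the per-packet check down to a single mask test.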

Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
TSO[46] is enabled for that interface.

OK msaitoh thorpej
 1.441 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.440 30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
 1.439 30-Oct-2018  ozaki-r Use atomic operations for ifa_refcnt
 1.438 30-Oct-2018  ozaki-r Remove a wrong assertion in ifaref

Doing ifaref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.
 1.437 18-Oct-2018  knakahara fix panic when do ifconfig -vlanif and ifconfig vlanif again. advised by ozaki-r@.

e.g. do the following commands.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlanif wm0
# ifconfig vlan0 -vlanif wm0
# ifconfig vlan0 vlan 100 vlanif wm0
====================

ATF net/if_vlan does this type of test, however it cannot detect this bug
because shmif(4)'s ifp->if_hwdl is always NULL, as shmif(4)'s ethernet
address has the U/L bit set.
See: https://nxr.netbsd.org/xref/src/sys/net/if_ethersubr.c#997
 1.436 07-Sep-2018  christos Flip the order of free'ing things to avoid crash (from ozaki-r). Tested
with a month's uptime. Used to crash once a week.
 1.435 06-Sep-2018  maxv Remove the network ATM code.
 1.434 27-Aug-2018  ozaki-r Restore splx removed accidentally at v1.406

Pointed out by k-goda@IIJ
 1.433 10-Aug-2018  knakahara fix if_snd_is_used(), ifp->if_snd is also used by if.c::if_transmit().
 1.432 10-Aug-2018  msaitoh - Fix a bug that drop counter shows incorrect value like
"net.inet.ip.ifq.drops = 72059810241052672"
- Change pktq's length sysctl to uint64_t.
 1.431 06-Aug-2018  msaitoh Change pktq's drops count sysctl from CTLTYPE_INT to CTLTYPE_QUAD.
 1.430 09-Jul-2018  christos Calling rtinit(sa_family = AF_LINK, RTM_DELETE, 0) is guaranteed not to
work. Remove bogus call leaving a KASSERT behind.
 1.429 03-Jul-2018  ozaki-r Fix net.inet6.ip6.ifq node doesn't exist

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.

sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.428 26-Jun-2018  msaitoh branches: 1.428.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.427 01-Jun-2018  ozaki-r Make sure to remove all AF_LINK addresses in if_detach
 1.426 01-Jun-2018  ozaki-r Make sure to not change if_hwdl once set
 1.425 31-May-2018  ozaki-r Relax a lock check in if_mcast_op unless NET_MPSAFE

It seems that there remain some paths that don't satisfy the constraint that is
required only if NET_MPSAFE. So don't check it by default.

One known path is nd6_rtrequest => in6_addmulti => if_mcast_op, which is not
easy to address.
 1.424 24-May-2018  msaitoh Print "NET_MPSAFE enabled" if it's enabled.
 1.423 14-May-2018  ozaki-r Protect if_deferred_start_softint with KERNEL_LOCK if the interface isn't MP-safe
 1.422 14-May-2018  ozaki-r Protect packet input routines with KERNEL_LOCK and splsoftnet

if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.

if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect non-MP-safe
codes (e.g., carp_input and agr_input).

Pointed out by mlelstv@
 1.421 14-May-2018  ozaki-r Use if_is_mpsafe (NFC)
 1.420 12-Apr-2018  christos disentangle a bit more the compat ioctl code.
 1.419 30-Jan-2018  ozaki-r branches: 1.419.2;
Destroy ifq_lock at the end of if_detach

It still can be used in if_detach.
 1.418 10-Jan-2018  ozaki-r Check MP-safety in ifa_insert and ifa_remove only for IFEF_MPSAFE drivers

Eventually the assertions should pass for all drivers, however, at this point
it's too eager.

Fix PR kern/52895
 1.417 26-Dec-2017  ozaki-r Suppress the assertion of IFNET_LOCK in if_mcast_op if MROUTING

MROUTING doesn't deal with IFNET_LOCK yet.

Reported by kardel@
 1.416 15-Dec-2017  ozaki-r Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
 1.415 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.414 14-Dec-2017  ozaki-r Reorder some destruction routines in if_detach

- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
 1.413 11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.412 11-Dec-2017  ozaki-r Rename IFNET_LOCK to IFNET_GLOBAL_LOCK

IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
 1.411 08-Dec-2017  ozaki-r Revert "Make if_timer MP-safe if IFEF_MPSAFE"

Because it has decreased the performance of wm. And also I found that
wm_watchdog doesn't work well with if_watchdog framework at all. Sharing one
counter (if_timer) with multiple instances (hardware multi-queues) can't detect
a single (or some) stall of them because other instances reset the counter even
if the stalled one wants the watchdog to fire.

Interfaces without IFEF_MPSAFE work safely with the original if_watchdog thanks
to KERNEL_LOCK. OTOH, interfaces with IFEF_MPSAFE shouldn't use if_watchdog and
should implement their own watchdog timer that works with multiple instances.
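
A watchdog "that works with multiple instances" can be sketched with one countdown per queue instead of one shared counter. This is only an illustration: sk_watchdog and its helpers are hypothetical and are not taken from wm(4) or any other driver.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NQUEUES 4	/* hypothetical number of hardware queues */

/* Hypothetical per-queue watchdog state: 0 means disarmed. */
struct sk_watchdog {
	int timer[SK_NQUEUES];
};

/* Arm (or re-arm) one queue's timer, e.g. when it makes progress. */
static void
sk_wd_arm(struct sk_watchdog *wd, size_t q, int ticks)
{
	assert(q < SK_NQUEUES && ticks > 0);
	wd->timer[q] = ticks;
}

/*
 * Called once per tick. Counts every armed queue down independently and
 * returns the index of the first queue whose timer just expired, or -1.
 */
static int
sk_wd_tick(struct sk_watchdog *wd)
{
	int stalled = -1;

	for (size_t q = 0; q < SK_NQUEUES; q++) {
		if (wd->timer[q] > 0 && --wd->timer[q] == 0 && stalled < 0)
			stalled = (int)q;
	}
	return stalled;
}
```

Because a busy queue re-arms only its own timer, it can no longer hide a stalled neighbour, which is the failure mode of the shared if_timer described above.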
 1.410 08-Dec-2017  ozaki-r Fix build of kernels without ether

By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.

PR kern/52790
 1.409 07-Dec-2017  ozaki-r Get rid of outdated comments
 1.408 07-Dec-2017  ozaki-r Ensure to call if_addr_init with holding if_ioctl_lock
 1.407 07-Dec-2017  ozaki-r Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH

At that point no one else modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
 1.406 06-Dec-2017  ozaki-r Make if_link_queue MP-safe if IFEF_MPSAFE

if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.

Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.

Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
 1.405 06-Dec-2017  ozaki-r Make if_timer MP-safe if IFEF_MPSAFE

if_timer, a counter used by if_watchdog (if_slowtimo), can be modified in
if_watchdog and if_start and/or interrupt handlers of some device drivers. All
such accesses were serialized by KERNEL_LOCK. If IFEF_MPSAFE is enabled,
KERNEL_LOCK of if_start (and perhaps interrupt handlers) is omitted and if_timer
becomes racy.

Fix the race condition by protecting if_timer by a spin mutex. if_watchdog_reset
and if_watchdog_stop are introduced to ensure to take the mutex on accessing
if_timer. Interface with IFEF_MPSAFE enabled must use the functions.

In addition, if_watchdog callout is now set CALLOUT_MPSAFE if IFEF_MPSAFE. It
means that if_watchdog implemented by a driver must be MP-safe if the driver is
set IFEF_MPSAFE.

Currently wm(4) is the only interface with IFEF_MPSAFE that implements
if_watchdog and accesses if_timer in if_start and interrupt handlers. wm is
changed to use the functions. (Its watchdog handler (wm_watchdog) is already
MP-safe.)

These contracts will be written somewhere in a further commit.

Note that the spin mutex is now ifp->if_snd.ifq_lock to avoid adding another
spin mutex to each interface. For now reusing it isn't problematic (see the
comment to know why) though if that does matter in the future, feel free to
replace it with a new spin mutex. It's easy to do.
 1.404 06-Dec-2017  knakahara unify processing to check nesting count for some tunnel protocols.
 1.403 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock on if_up and if_down

One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
 1.402 06-Dec-2017  ozaki-r Fix locking against myself on ifpromisc

vlan_unconfig_locked could be called with holding if_ioctl_lock.
 1.401 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock when calling if_flags_set
 1.400 22-Nov-2017  ozaki-r Fix and make consistent of usages of psz/psref in ifconf variants
 1.399 22-Nov-2017  ozaki-r Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE

If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.

This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.

Proposed on tech-kern@ and tech-net@
 1.398 19-Nov-2017  christos remove useless cast, initialize family.
 1.397 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.396 23-Oct-2017  msaitoh if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is
bad because resource shortage really occurs when a lot of pseudo interfaces
are created. To avoid this problem, don't panic and change the return type of
if_initialize() and if_attach() to int. Callers can recover from the error
cleanly by checking the return value.
 1.395 27-Jun-2017  roy Introduce if_get_bylla to find an interface with the active
local link address.
 1.394 01-Jun-2017  chs branches: 1.394.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.393 19-May-2017  ozaki-r Allow CARP to call the link_state_change handler immediately

If the handler is delayed because of the indirection call via softint,
some operations are executed in reverse and may cause unexpected
behaviors. For example, due to the issue a GARP packet wasn't sent on
a transition from the BACKUP state to the MASTER state; this happened
because IN_IFF_DETACHED flag wasn't cleared on arpannounce, which
had been cleared in the link_state_change handler.

This fixes an issue reported by sborrill@ on tech-net:
http://mail-index.netbsd.org/tech-net/2017/03/14/msg006283.html
 1.392 06-Apr-2017  ozaki-r Prepare netipsec for rump-ification

- Include "opt_*.h" only if _KERNEL_OPT is defined
- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
- Setup sysctls in ipsec_attach explicitly instead of using SYSCTL_SETUP
- Call ip6flow_invalidate_all in key_spdadd only if in6_present
- It's possible that a rump kernel loads the ipsec library but not
the inet6 library
 1.391 06-Apr-2017  ozaki-r Revert "Make sure to hold if_ioctl_lock when calling ifp->if_ioctl"

As per pgoyette@ and riastradh@ requests; we shouldn't decide to
hold a lock based on if the lock is held or not.
 1.390 05-Apr-2017  ozaki-r Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
 1.389 28-Mar-2017  ozaki-r Avoid touching a mbuf after enqueuing it
 1.388 24-Mar-2017  ozaki-r Remove extra semicolon
 1.387 16-Mar-2017  ozaki-r Simplify ifunit, if_get and if_get_byindex by reusing other functions

Inspired by kre@'s comment
 1.386 16-Mar-2017  ozaki-r Fix panic on ifconfig <number>

Pointed out by s-yamaguchi@IIJ
 1.385 14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.384 14-Mar-2017  ozaki-r Replace DIAGNOSTIC + panic with KASSERT
 1.383 09-Mar-2017  knakahara ifp->if_transmit() must free mbuf even if error occurred.

Add missing m_freem(m) to if_nulltransmit().
Below ifp->if_transmit() implementations are already added m_freem(m) properly.
- wm(4)
- ixg(4)
- ixv(4)
- pppoe(4)
- gif(4)
- l2tp(4)

pointed out by ozaki-r@n.o, thanks.
 1.382 07-Mar-2017  ozaki-r Add missing splnet to if_deferred_start_common

if_start should run in splnet to avoid running interrupt handlers.
 1.381 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.380 17-Feb-2017  ozaki-r Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.379 16-Feb-2017  knakahara support interface name which includes digit.
 1.378 15-Feb-2017  ozaki-r Avoid if_dl and if_sadl to be NULL

Calling if_deactivate_sadl and then if_sadl_setrefs exposes NULL-ed if_dl
and if_sadl to users for a moment. It's harmful because users expect that
they're always non-NULL. Fix it.

Note that a race condition still remains; if_dl and if_sadl aren't updated
atomically so a user can see different data from if_dl and if_sadl.
Fortunately nothing uses both if_dl and if_sadl at the same time, so the race
condition doesn't hurt anybody for now. (In the first place exposing one
data with two ways is problematic?)
 1.377 10-Feb-2017  christos make attach and detach locking symmetric (detaching cloners failed)
 1.376 09-Feb-2017  ozaki-r Make bpf MP-safe

By the change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing the lock requires big changes and it's a future work.

Another known issue is that we need to keep some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
 1.375 25-Jan-2017  christos fix locking against myself in module autoload; module autoload calls
if_clone_attach which takes the lock again.
 1.374 24-Jan-2017  ozaki-r Restore splnet for if_slowtimo

if_slowtimo (== if_watchdog) still requires splnet for most drivers.

Pointed out by nonaka@
 1.373 23-Jan-2017  ozaki-r Replace some splnet with splsoftnet
 1.372 20-Jan-2017  ozaki-r Protect if_clone data with if_clone_mtx

To this end, carpattach needs to be delayed from RUMP_COMPONENT_NET to
RUMP_COMPONENT_NET_IF on rump_server. Otherwise mutex_enter via carpattach
for if_clone_mtx is called before mutex_init for it in ifinit1.
 1.371 10-Jan-2017  ozaki-r branches: 1.371.2;
Add softnet_lock to if_link_state_change_si

Fix
panic: lock error: Mutex: mutex_vector_exit: assertion failed:
MUTEX_OWNER(mtx->mtx_owner) == curthread
at callout_halt <= arp_dad_stop <= in_if_link_down.
 1.370 10-Jan-2017  ozaki-r Enable some sysctl knobs on rump kernels for ifmcstat
 1.369 26-Dec-2016  christos pfil(9) improvements to handle address changes:

Add:
PFIL_IFADDR call on interface reconfig (mbuf is ioctl #)
PFIL_IFNET call on interface attach/detach (mbuf is PFIL_IFNET_*)

from rmind@
 1.368 15-Dec-2016  ozaki-r Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce codes
- We can provide the same behavior between drivers
- Where/When if_ipackets is counted up
- Note that some drivers still update packet statistics in their own
way (periodical update)
- Moved bpf_mtap run in softint
- This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.367 13-Dec-2016  ozaki-r Constify ifp of if_is_deactivated
 1.366 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.365 09-Dec-2016  christos This spams 100's of times during boot!
 1.364 08-Dec-2016  ozaki-r Introduce deferred if_start framework

The framework provides a means to schedule if_start that will be executed
in softint later. It intends to be used to avoid calling if_start,
especially bpf_mtap, in hardware interrupt.

It adds a dedicated softint to a driver if the driver requests to use the
framework via if_deferred_start_init. The driver can schedule deferred
if_start by if_schedule_deferred_start.

Proposed and discussed on tech-kern and tech-net
 1.363 06-Dec-2016  ozaki-r Fix memory leak of struct if_percpuq on interface destruction
 1.362 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
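
The search-then-delete pattern can be sketched on a plain linked list. This is a minimal sketch under assumptions: the sk_* types and functions are hypothetical, the list stands in for the radix tree, and sk_search_matched only borrows its name from the idea of rn_search_matched without claiming its signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical route entry on a singly linked list. */
struct sk_route {
	int ifindex;
	struct sk_route *next;
};

typedef int (*sk_match_fn)(const struct sk_route *, void *);

/* Prepend a new entry; returns NULL on allocation failure. */
static struct sk_route *
sk_route_push(struct sk_route **headp, int ifindex)
{
	struct sk_route *rt = malloc(sizeof(*rt));

	if (rt == NULL)
		return NULL;
	rt->ifindex = ifindex;
	rt->next = *headp;
	*headp = rt;
	return rt;
}

/* Return the first entry the callback accepts; never modifies the list. */
static struct sk_route *
sk_search_matched(struct sk_route *head, sk_match_fn match, void *arg)
{
	for (struct sk_route *rt = head; rt != NULL; rt = rt->next)
		if (match(rt, arg))
			return rt;
	return NULL;
}

/* Unlink one entry; done by the caller, outside any traversal. */
static void
sk_route_unlink(struct sk_route **headp, struct sk_route *victim)
{
	for (struct sk_route **pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			return;
		}
	}
}

static int
sk_match_ifindex(const struct sk_route *rt, void *arg)
{
	return rt->ifindex == *(const int *)arg;
}

/* Delete every route of one interface: search, delete, search again. */
static int
sk_purge_ifindex(struct sk_route **headp, int ifindex)
{
	struct sk_route *rt;
	int n = 0;

	while ((rt = sk_search_matched(*headp, sk_match_ifindex,
	    &ifindex)) != NULL) {
		sk_route_unlink(headp, rt);
		free(rt);
		n++;
	}
	return n;
}
```

The callback only answers "does this entry match?", so the container is never modified while it is being walked; the price is restarting the search after each deletion.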
 1.361 05-Nov-2016  pgoyette Move if_43.c back into the shared Makefile.sysio where it really
belongs.

Update the code to invoke the two routines compat_cvtcmd() and
compat_ifioctl() through indirect pointers. Initialize those
pointers in sys/net/if.c and update them in the compat module's
initialization code.

Addresses the issue pointed out in PR kern/51598
 1.360 28-Oct-2016  ozaki-r Fix the position of IFADDR_ENTRY_DESTROY

It must be called after all readers left, i.e, after pserialize_perform.
 1.359 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.358 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is data shared between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.357 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.356 22-Jul-2016  knakahara Reduce KERNEL_LOCK use, since ifq_lock is now used by default.

if_snd is always excluded by ifq_lock now. So, the KERNEL_LOCK in if_transmit()
which serializes packet output processing is not needed now.
 1.355 22-Jul-2016  knakahara Toward NET_MPSAFE-on in future, if_snd uses if_snd->ifq_lock by default.

That can reduce confusing difference between NET_MPSAFE on and off.
 1.354 07-Jul-2016  ozaki-r branches: 1.354.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.353 05-Jul-2016  knakahara fix evbsh3 build
 1.352 04-Jul-2016  knakahara make encap_lock_{enter,exit} interruptible.
 1.351 04-Jul-2016  ozaki-r Tweak p2p_rtrequest as well for ifaddr initialization change

We need to set lo0ifp to rt->rt_ifp if the interface is RTF_LOCAL.

Fix PR kern/51301.
 1.350 01-Jul-2016  ozaki-r Make sure to free all interface addresses in if_detach

Addresses of an interface (struct ifaddr) have a (reverse) pointer of an
interface object (ifa->ifa_ifp). If the addresses are surely freed when
their interface is destroyed, the pointer is always valid and we don't
need a tweak of replacing the pointer to if_index like mbuf.

To guarantee that assumption, the following changes are required:
- Deactivate the interface near the very start of if_detach. This prevents
in6_unlink_ifa from saving multicast addresses (wrongly)
- Invalidate rtcache(s) and clear a rtentry referencing an address on
RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt of DAD timers to ensure stopping
such timers in if_detach
 1.349 01-Jul-2016  ozaki-r Add debug helper function for interface addresses

It checks whether all addresses of an interface being destroyed
are freed (no reference remains) at the end of if_detach.
 1.348 28-Jun-2016  ozaki-r Introduce if_is_deactivated

Checking ifp->if_output == if_nulloutput is too implicit.

No functional change.
 1.347 27-Jun-2016  knakahara fix spelling mistake pointed out by roy@n.o
 1.346 27-Jun-2016  knakahara reduce link state changing softint if it is not required

ok by ozaki-r@n.o
 1.345 22-Jun-2016  knakahara fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is nop. Currently, that means
IFQ_ENQUEUE() of some paths such as bridge_enqueue() is called parallel
wrongly.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing is always required KERNEL_LOCK currently.
 1.344 21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.343 20-Jun-2016  knakahara apply if_start_lock() to L2 callers which call ifp->if_start() of device drivers
 1.342 20-Jun-2016  ozaki-r Do psref_target_destroy after purging packets

Because purging packets may try to send packets, which still requires the psref.
 1.341 16-Jun-2016  riastradh Fix error branches of if_sdl_sysctl.

Can't release the psref if we didn't even find the interface!
 1.340 16-Jun-2016  ozaki-r Use if_get_byindex instead of if_byindex for MP-safe
 1.339 16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.338 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in an mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists only for
the transition period, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
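
The index-instead-of-pointer idea in 1.338 can be illustrated with a small user-space sketch. It is not the kernel code: the table, the struct pkt type and all function names below are hypothetical stand-ins (for ifindex2ifnet, the mbuf field and m_get_rcvif), and the psref(9)/pserialize(9) protection that the real APIs provide is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF 4				/* hypothetical table size */

struct ifnet { int if_index; const char *if_xname; };
struct pkt { int rcvif_index; };	/* stands in for the mbuf field */

static struct ifnet *iftable[MAXIF];	/* stands in for ifindex2ifnet */

static void
if_table_attach(struct ifnet *ifp)
{
	assert(ifp->if_index >= 0 && ifp->if_index < MAXIF);
	iftable[ifp->if_index] = ifp;
}

static void
if_table_detach(int idx)
{
	assert(idx >= 0 && idx < MAXIF);
	iftable[idx] = NULL;
}

/* Returns NULL when the receiving interface has gone away. */
static struct ifnet *
pkt_get_rcvif(const struct pkt *p)
{
	if (p->rcvif_index < 0 || p->rcvif_index >= MAXIF)
		return NULL;
	return iftable[p->rcvif_index];
}
```

A packet that outlives its interface then sees NULL from the lookup rather than following a stale pointer.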
 1.337 31-May-2016  ozaki-r Optimize if_get_byindex by adding __predict_true
 1.336 16-May-2016  ozaki-r Replace ifnet_lock with if_get and if_put

ifnet_lock is a dedicated method to safely destroy an interface over running
ioctl operations. Replace it with a more generic mechanism using psref(9).
 1.335 16-May-2016  ozaki-r Introduce if_get, if_get_byindex and if_put

The new API makes it possible to obtain an ifnet object protected by psref(9).
It is intended to be used where an obtained ifnet object is used over
sleepable operations.
 1.334 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.333 02-May-2016  skrll Typo in comment
 1.332 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.331 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.330 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.329 11-Apr-2016  ozaki-r Don't use radix tree API directly
 1.328 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist but
are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be seen in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.327 23-Mar-2016  knakahara add drop count which means the sum of struct if_percpuq's per-CPU queues.

ok by ozaki-r@n.o
 1.326 07-Mar-2016  ozaki-r Add missing percpu_putref to error path
 1.325 19-Feb-2016  roy Implement a queue for if_link_state_change() calls to fix a race condition
introduced in the prior patch.

The queue has capacity to store 8 link state changes, if it overflows then
the oldest state change is lost, but the oldest DOWN state change is
preserved to ensure any subsequent UP state changes reflect properly.

Because there are only 3 states to queue, the queue itself is implemented
by storing 2-bit numbers in a bigger one.
To increase the size of the queue, just increase the size of the backing
store to a bigger number.
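
The packing idea in 1.325 (2-bit items stored in a bigger integer) can be sketched in user-space C as follows. This is a minimal illustration, not the code in if.c: the names, the "state + 1" encoding that reserves 0 for an empty slot, and the 16-bit backing store are assumptions, and the rule that preserves the oldest DOWN change on overflow is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

enum { LQ_BITS = 2, LQ_MASK = 3, LQ_CAP = 8 };	/* 8 slots * 2 bits = 16 bits */

/* Hypothetical state values; a slot stores state + 1, and 0 means empty. */
enum { ST_UNKNOWN = 0, ST_DOWN = 1, ST_UP = 2 };

/* Number of queued items; slot 0 (the low bits) is the oldest. */
static int
lq_len(uint16_t q)
{
	int n = 0;

	while (n < LQ_CAP && ((q >> (n * LQ_BITS)) & LQ_MASK) != 0)
		n++;
	return n;
}

/* Append a state; when the queue is full the oldest entry is dropped. */
static uint16_t
lq_push(uint16_t q, int state)
{
	int n = lq_len(q);

	assert(state >= ST_UNKNOWN && state <= ST_UP);
	if (n == LQ_CAP) {
		q >>= LQ_BITS;
		n--;
	}
	return q | (uint16_t)((state + 1) << (n * LQ_BITS));
}

/* Remove and return the oldest state, or -1 if the queue is empty. */
static int
lq_pop(uint16_t *q)
{
	int item = *q & LQ_MASK;

	if (item == 0)
		return -1;
	*q >>= LQ_BITS;
	return item - 1;
}
```

With this layout, widening the backing store (for example to uint32_t, with LQ_CAP raised to 16) is all that is needed for a longer queue, which matches the remark in the log message.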
 1.324 15-Feb-2016  ozaki-r Run if_link_state_change in softint

if_link_state_change can execute the network stack that is expected to
not run in hardware interrupt (at least now), however network drivers
may call it in hardware interrupt. Avoid that by introducing a new
softint for if_link_state_change.

The original patch is provided by mlelstv@ and tweaked a bit by me.

Should fix PR kern/50602.
 1.323 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.322 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.321 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.320 04-Jan-2016  ozaki-r Fix the destruction of the afdata lock

Pointed out by mlelstv@
 1.319 20-Nov-2015  ozaki-r Remove an ifnet object from the global list before destructing it
 1.318 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD as well as lltable/llentry.
 1.317 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
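
The discipline described in 1.317 can be shown with a tiny user-space sketch. The struct and function names are hypothetical and the freeing condition is simplified (the real rtfree has more conditions); the point is only that lookups hand out a counted reference and that callers drop it through one function instead of touching the counter.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_sketch { int refcnt; int freed; };

/* Lookup-style function: hands back the entry with a reference taken. */
static struct rtentry_sketch *
rt_lookup_sketch(struct rtentry_sketch *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; callers never decrement refcnt. */
static void
rt_free_sketch(struct rtentry_sketch *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;	/* a real implementation would release memory */
}
```

As the log message notes, such a count only keeps the entry from being freed; it does not stop concurrent updates.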
 1.316 29-Jun-2015  ozaki-r Remove ifnet_addrs

We can assume that ifnet_addrs[ifp->if_index] is always the same as
ifp->if_dl, so we can replace ifnet_addrs[ifp->if_index] with ifp->if_dl
and remove ifnet_addrs entirely.

ok martin@
 1.315 18-May-2015  martin Implement SIOCIFGCLONERS for netbsd32, so ifconfig -C works.
 1.314 22-Apr-2015  roy This comment is no longer IPv6 specific.
 1.313 22-Apr-2015  roy Fix a copy n pasta error with prior.
 1.312 22-Apr-2015  roy Move INET6 specific in6_if_{up,down}() and in6_if_link_{up,down}()
into agnostic domain functions.
 1.311 21-Apr-2015  pooka Attach PF_INET6 pktq sysctls only when inet6 is present.

More modular initialization would be nicer, but at least this patch
prevents "sysctl -a" from crashing when INET6 is defined but inet6 has
not been attached.
 1.310 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.309 07-Apr-2015  roy Move in6if_do_dad() to if_do_dad() as the routine is not INET6 specific
and could equally be used by INET.
 1.308 16-Jan-2015  ozaki-r Introduce defflag for NET_MPSAFE
 1.307 15-Dec-2014  ozaki-r Introduce if_initialize and if_register as an alternative to if_attach

if_attach initializes an ifnet object and registers it to the system
(e.g., ifnet_list), however, if_attach doesn't complete the
initialization and the rest of it will be done by if_alloc_sadl
that is normally directly called by device drivers or called via
functions like ether_ifattach. So there is a race between
if_attach and if_alloc_sadl (A half-baked ifnet object may be
accessed, for example, via ioctl between them).

The aim of this fix is to register an initializing ifnet object
after completing its initializations. To this end, this fix
separates if_attach into an initialization part (if_initialize)
and a registration part (if_register) and call the latter after
if_alloc_sadl (ether_ifattach). So a typical usage of the two
new APIs is like this:

    if_initialize(ifp); // was if_attach
    ether_ifattach(ifp, enaddr);
    if_register(ifp);

Nonetheless, changing every driver to do so at once isn't
feasible. So we keep if_attach working as it used to be and
will change only some drivers that we need at this point.
Once we know the fix really works well, we'll change all
the others.

Some more information of the fix can be found here:
http://mail-index.netbsd.org/tech-kern/2014/12/10/msg018242.html

No objection on tech-kern and tech-net.
 1.306 14-Dec-2014  martin Avoid a race when the ifp->if_slowtimo pointer is changed while we are
running in if_slowtimo already. Suggested by Masao Uebayashi
in PR kern/49462.
 1.305 11-Dec-2014  martin Avoid scheduling more slow timeouts while we are in the process of detaching
the interface: set if_slowtimo to NULL before doing the callout_halt()
and test for that in the callout. Fixes PR kern/49462.
 1.304 08-Dec-2014  ozaki-r Tweak ifconf (retry)

The tweak makes the code intention clear and further changes easy.

No functional change.

The first trial broke SIOCGIFCONF (PR 49437). So as not to repeat the mistake,
t_ifconf was added. It should warn if something goes wrong on ifconf.
 1.303 02-Dec-2014  ozaki-r Revert "Pull if_drain routine out of m_reclaim"

The commit broke dlopen()'d rumpnet on platforms where ld.so does not
override weak aliases (e.g. musl, Solaris, potentially OS X, ...).

Requested by pooka@.
 1.302 01-Dec-2014  ozaki-r Make more functions static

No functional change.
 1.301 01-Dec-2014  christos PR/49437: jmcneill: revert broken changes that broke SIOCGIFCONF (mdnsd uses it)
 1.300 28-Nov-2014  ozaki-r branches: 1.300.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.299 27-Nov-2014  ozaki-r Pull if_drain routine out of m_reclaim

It's if-specific and should be in if.c.

No functional change.
 1.298 26-Nov-2014  ozaki-r Tweak ifconf variants

The tweaks make the code intention clear and make further changes easy.

No functional change.
 1.297 26-Nov-2014  ozaki-r Change if_slowtimo_ch to a pointer

One benefit of doing so is to reduce memory used for struct callout;
we can avoid allocating struct callout for interfaces that don't
use callout.

Requested by uebayasi@.
 1.296 26-Nov-2014  ozaki-r Create if_slowtimo (if_watchdog) callout for each interface

This change is to obviate the need to run if_slowtimo callbacks that
may sleep inside IFNET_FOREACH. And also by this change we can turn
on MPSAFE of callouts individually.

Discussed with uebayasi@ and riastradh@.
 1.295 26-Nov-2014  ozaki-r Rename if_watchdog to if_slowtimo

if_watchdog callbacks do a little more than what "watchdog" suggests.

Discussed with uebayasi@ (the idea originally from openbsd-tech).
 1.294 26-Nov-2014  ozaki-r Make if_slowtimo static
 1.293 17-Nov-2014  pooka Make ifconfig destroy work if INET6 is present but not attached
 1.292 07-Nov-2014  christos PR/49373: Ryota Ozaki: Running if_clone_create and if_clone_destroy in
parallel causes panic
XXX: Pullup 7.
 1.291 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.290 09-Aug-2014  rtr branches: 1.290.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.289 31-Jul-2014  ozaki-r Define IFADDR_FOREACH_SAFE for on-the-fly element removal in a loop

We have to use it when we purge an address element in an ifaddr loop.

This change restores the original behavior that was accidentally degraded.
 1.288 30-Jul-2014  ozaki-r Call etherinit from ifinit1 only when it is required

This unbreaks the builds of kernels that don't build if_ethersubr.c.
 1.287 29-Jul-2014  ozaki-r Use IFADDR_FOREACH
 1.286 28-Jul-2014  ozaki-r Add a mutex for global variables of if_ethersubr.c

To initialize the mutex, we introduce etherinit that is called from ifinit1.
 1.285 01-Jul-2014  ozaki-r Lock IFQ operations when NET_MPSAFE

- Introduce NET_MPSAFE
- not defined by default
- Add ifq_lock to protect ifnet#if_snd
- Initialize ifq_lock and lock IFQ operations
when NET_MPSAFE

When NET_MPSAFE isn't defined, this modification
doesn't change its behavior and adds trivial
performance overheads.

Discussed with matt@ on tech-net
 1.284 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.283 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.282 16-Jun-2014  ozaki-r Move sysctl_pktq_{maxlen,count} to pktqueue.c and make them global

They will be used by bridge.

ok rmind@
 1.281 13-Jun-2014  rmind if_detach: drain ip6_pktq as well.
 1.280 10-Jun-2014  joerg Introduce new sysctls for obtaining interface-specific addresses:
- net.sdl for the active link-layer address (the MAC)
- net.ether.multicast for the Ethernet multicast addresses
- net.inet6.multicast for the IPv6 multicast groups
- net.inet6.multicast_kludge for temporarily removed multicast groups

Use these sysctls for replacing the kmem grovelling in ifmcstat(8).
 1.279 09-Jun-2014  rmind Implement pktq_set_maxlen() and let sysctl net.inet.{ip,ip6}.ifq.maxlen be
changed on the fly again.
 1.278 07-Jun-2014  he Include <netinet/in.h> before <netinet/in_var.h> to avoid build failure
for the COMPUTEX7750 kernel of evbsh3-eb.
Also, don't reference ip_pktq if INET isn't defined (found by the same
kernel).
 1.277 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.276 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.275 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.274 18-May-2014  rmind - Move ifnet_list (and lo0ifp while here) under #ifdef _KERNEL.
- Make ifindex2ifnet, if_indexlim and some other variables static.
- Move if_index generation into its own function.
- if_alloc/if_free: replace malloc with kmem.
 1.273 26-Apr-2014  pooka Decouple sockets linkage from interface code by making ifioctl() a pointer.
 1.272 25-Feb-2014  pooka branches: 1.272.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.271 03-Jan-2014  pooka missed one inet6 check
 1.270 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.269 19-Oct-2013  mrg adjust previous; old_link_state is also used in INET6.
 1.268 19-Oct-2013  martin Ifdef a variable like its use
 1.267 06-Oct-2013  christos remove unrelated diff.
 1.266 05-Oct-2013  christos Add SIOCGIFINDEX from Ty Sarna and Matthew Sporleder.
 1.265 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.264 20-Jun-2013  roy branches: 1.264.2;
Move the detaching and making tentative addresses out of in6_if_up
and into in6_if_link_up.

This fixes a possible panic where link is up but not the interface.
Note that a better solution would be to listen to the routing socket
in the kernel, but I don't know how to do that.

Reachable Router tests for IFF_UP as well.
 1.263 11-Jun-2013  roy When an interface link state changes to down, mark all attached IPv6
addresses as detached.
Likewise, when the link state changes to up, mark all detached IPv6
addresses as tentative and start DAD on them.

Advertised router reachability now checks that link state is not down.
This means that when an interface link state changes, the default IPv6
router may change as well.
 1.262 10-Mar-2013  christos allow cloners as modules.
 1.261 01-Nov-2012  msaitoh Fix a bug where SIOCZIFDATA clears if_lastchange to zero.
Update if_lastchange with getnanotime().
 1.260 03-Feb-2012  christos branches: 1.260.2; 1.260.6;
PR/45764, PR/45914
Part 2:
Arrange so that the pointers that we free (ifp->if_afdata, dom->dom_ifqueues[i])
are set to NULL.
While I am here, add a continue.
 1.259 28-Dec-2011  dyoung Fix ifpromisc() regression: if ifpromisc(ifp, 1) is called, do set
IFF_PROMISC whether ifp is IFF_UP or not, but do not call ifp->if_ioctl
unless ifp is IFF_UP.
 1.258 27-Nov-2011  jakllsch branches: 1.258.2;
We need a cv_destroy() here too. Fixes LOCKDEBUG panic on interface detachment.
 1.257 16-Nov-2011  dyoung Before freeing an ifnet_lock, destroy its mutex. Should help with
kern/43294.
 1.256 28-Oct-2011  dyoung branches: 1.256.2;
Userland may not change the IFF_CANTCHANGE flags, however, the kernel
may, so make sure if_flags_set() takes care of them. Fixes a regression
in ifpromisc().
 1.255 25-Oct-2011  dyoung Document the ifioctl locking in comments.

Add a missing percpu_free(9) call.
 1.254 19-Oct-2011  dyoung Fix userland compilation: pull the ifioctl lock-related data members
into a struct ifnet_lock that the ifnet has a pointer to. In a
non-_KERNEL environment, don't #include <sys/percpu.h> et cetera, and
don't define the struct ifnet_lock but *do* declare it.
 1.253 19-Oct-2011  dyoung Extract subroutines ifioctl_enter() and ifioctl_exit().
 1.252 19-Oct-2011  dyoung Start to untangle the ifnet ioctls mess.

Add ifnet functions, if_mcast_op(), if_flags_set(), and if_addr_init()
for adding/deleting multicast addresses, modifying the if_flags,
and initializing local/remote addresses. Make ifpromisc() use
if_flags_set(). Protocols and network drivers should use these
instead of ifp->if_ioctl() calls. Subsequent commits will
replace ifp->if_ioctl(SIOCADDMULTI| SIOCDELMULTI| SIOCSIFDSTADDR|
SIOCINITIFADDR| SIOCSIFFLAGS) calls with calls to the new functions.

Use a mutex(9) to synchronize ifp->if_ioctl() calls originating in
userland. Also synchronize ifp->if_ioctl() calls with ifnet detachment
and reclamation.
 1.251 12-Aug-2011  dyoung Define if_free() for ixg(4) to use.
 1.250 18-Jan-2011  rmind NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.249 15-Nov-2010  pooka branches: 1.249.2;
Implement ifconfig linkstr as proposed on tech-net.
 1.248 06-Nov-2010  christos remove unused variables.
 1.247 06-Nov-2010  christos PR/44054: Onno van der Linden: Stacksmashing in handling of ioctl OOSIO*
parameter.
 1.246 02-Nov-2010  christos PR/44030: Onno van der Linden: ifreqn2o gets called with the parameters the
wrong way around in /sys/net/if.c
 1.245 23-Sep-2010  christos prevent integer oveflow. From Maksymilian Arciemowicz
 1.244 12-Jun-2010  skrll Correct the argument order of ifreqn2o conversion.

Fixes PR/42585.
 1.243 02-Jun-2010  dyoung Prevent if_detach() from crashing while it walks the routing table
to find and unlink routes that reference the detached ifnet: make
if_rt_walktree() return ERESTART whenever it has deleted a route.
Whenever rt_walktree() returns ERESTART, if_detach() restarts it.

I believe that this fix resembles one by Jonathan Kollasch or by someone
else, which has languished in a PR for too long. Sorry!

Tested by me and by Jeff Rizzo.

XXX It's supposed to be safe for rn_walktree() to apply to the routing
XXX table a routine that may delete routes. Why isn't it safe in
XXX practice?
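
The restart pattern in 1.243 can be sketched outside the kernel like this. Everything here is a hypothetical stand-in: a flat array replaces the radix tree, SK_ERESTART replaces ERESTART, and the function names are invented. The walker stops as soon as it has deleted one entry, and the caller simply starts the walk again until a full pass deletes nothing.

```c
#include <assert.h>
#include <stddef.h>

#define SK_ERESTART (-3)	/* hypothetical stand-in for ERESTART */

struct route_sk { int ifindex; int live; };

/* Delete the first live route that references ifindex, then ask to restart. */
static int
walk_once(struct route_sk *tbl, int n, int ifindex)
{
	for (int i = 0; i < n; i++) {
		if (tbl[i].live && tbl[i].ifindex == ifindex) {
			tbl[i].live = 0;
			return SK_ERESTART;
		}
	}
	return 0;
}

/* Restart the walk after every deletion; returns the number removed. */
static int
purge_routes(struct route_sk *tbl, int n, int ifindex)
{
	int removed = 0;

	assert(tbl != NULL && n >= 0);
	while (walk_once(tbl, n, ifindex) == SK_ERESTART)
		removed++;
	return removed;
}
```

Restarting costs extra passes, but the walker never continues from a position that a deletion may have invalidated.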
 1.242 28-Jan-2010  mbalmer branches: 1.242.2; 1.242.4;
fix language
 1.241 13-Nov-2009  joerg Simplify ifreq_setaddr:
- Drop the INET6 block. The commands are never given to this function
and truncating the sockaddr is arguably not the desired result anyway.
- Clear the address before copying. This fixes SIOCGIFNETMASK and possible
other ioctls for users that don't check sa_len. This includes
COMPAT_43 and Linux emulation.

OK dyoung@
 1.240 26-Oct-2009  cegger buildfix: only declare sysctl_net_ifq_setup() if INET or INET6 is defined
 1.239 03-Oct-2009  elad Move default network interface policy back to the subsystem.
 1.238 19-Sep-2009  skrll Initialise index_gen_mtx before use.
 1.237 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.236 15-Sep-2009  jakllsch When working with address preferences, sockaddr_externalize() both
addresses before comparing them.

This allows IPv6 link-local addresses (which have an embedded scope id)
to have a preference set on them.

ok dyoung
 1.235 11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.234 13-Aug-2009  dyoung Use sysctl(9) to expose to userland each interface transmission
queue's maximum length, current length, and number of drops. E.g.,

% sysctl net.interfaces.bnx0
net.interfaces.bnx0.sndq.len = 0
net.interfaces.bnx0.sndq.maxlen = 509
net.interfaces.bnx0.sndq.drops = 0

Let userland adjust the maximum queue length.

While I'm here, add a 64-bit generation number, if_index_gen, to
ifnet; the pair [ifp->if_index, ifp->if_index_gen] can serve to
identify an ifnet for the lifetime of the system. I will use this
in an upcoming change.

Ok matt@.
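
Why the pair [if_index, if_index_gen] from 1.234 identifies an ifnet for the lifetime of the system can be seen in a short sketch. The struct, the counter and the function names are hypothetical; the only assumption carried over from the log message is that the generation number is 64 bits wide and is never reused, even when an index is.

```c
#include <assert.h>
#include <stdint.h>

struct if_id { unsigned index; uint64_t gen; };

static uint64_t index_gen;	/* bumped on every attach, never reused */

/* Hand out an identity for an interface attached at the given index. */
static struct if_id
if_id_alloc(unsigned index)
{
	struct if_id id;

	assert(index_gen != UINT64_MAX);	/* the counter must not wrap */
	id.index = index;
	id.gen = index_gen++;
	return id;
}

static int
if_id_equal(struct if_id a, struct if_id b)
{
	return a.index == b.index && a.gen == b.gen;
}
```

Two interfaces that occupy the same index at different times therefore compare as different.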
 1.233 12-Feb-2009  christos PR/40603: Christoph Badura: unprivileged users can add and delete interface
link addresses. Fixed by centralizing the test as suggested. Will pull up
to 5.0 once submitter tests the fix.
 1.232 11-Jan-2009  christos branches: 1.232.2;
merge christos-time_t
 1.231 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
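
The check described above can be sketched as follows. This is an
illustration only, not the code in ether_ioctl(); the function name
sk_ether_addr_acceptable() and the constant SK_ETHER_ADDR_LEN are
hypothetical. It relies on the fact that the low-order bit of the first
octet of an Ethernet address marks multicast (and therefore broadcast)
addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_ADDR_LEN 6	/* an Ethernet address is 6 octets */

/* Return true if addr may be used as a station (unicast) address. */
static bool
sk_ether_addr_acceptable(const uint8_t *addr)
{
	uint8_t any = 0;
	size_t i;

	assert(addr != NULL);

	/* Group bit set: multicast or broadcast, so refuse it. */
	if (addr[0] & 0x01)
		return false;

	/* OR all octets together to detect 00:00:00:00:00:00. */
	for (i = 0; i < SK_ETHER_ADDR_LEN; i++)
		any |= addr[i];
	return any != 0;
}
```

With this sketch, 02:de:ad:be:ef:02 is accepted, while
ff:ff:ff:ff:ff:ff, 01:00:5e:00:00:01, and the all-zeroes address are
refused.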

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.230 24-Oct-2008  dyoung branches: 1.230.2; 1.230.4;
Do not gratuitously cast to void *.
 1.229 24-Oct-2008  dyoung Undo a change in my last commit that was not supposed to be committed.
 1.228 24-Oct-2008  dyoung Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.227 18-Jun-2008  yamt branches: 1.227.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.226 15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.225 13-May-2008  dyoung branches: 1.225.2;
Cosmetic; reduce excessive parenthesization.
 1.224 13-May-2008  dyoung Let us call ioctl(SIOC[ADG]LIFADDR) with a link-layer address on
an AF_LINK socket, only, to be consistent with SIOC[ADG]LIFADDR
behavior on AF_INET and AF_INET6 sockets. Let us create AF_LINK
sockets for this purpose. Note that most operations on AF_LINK
sockets are not implemented.
 1.223 11-May-2008  dyoung Add kernel support for adding/removing link-layer addresses using
SIOCALIFADDR and SIOCDLIFADDR, respectively. Corresponding
ifconfig(8) changes are coming soon.
 1.222 29-Apr-2008  ad branches: 1.222.2;
kern/38502 ifconfig wi0 hangs

Don't acquire the socket lock for PRU_CONTROL.
 1.221 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.220 24-Apr-2008  martin branches: 1.220.2;
Make it compile if !COMPAT_OSOCK
 1.219 24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.218 29-Feb-2008  dyoung branches: 1.218.2; 1.218.4;
Cosmetic: shorten staircases. Join some lines.
 1.217 07-Feb-2008  martin branches: 1.217.2; 1.217.6;
Make it compile w/o INET6
 1.216 07-Feb-2008  xtraeme Remove neticp (network info commpage) stuff that dyoung added
accidentally to make this build again.
 1.215 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.214 23-Jan-2008  dyoung Fix more fall-out from extracting ifioctl_common(): don't return
ENETRESET from ifioctl().
 1.213 22-Jan-2008  dyoung Functional: return ENETRESET from ifioctl_common(), if SIOCSIFCAP
changed anything.

Cosmetic: shorten staircase.
 1.212 22-Jan-2008  dyoung Add missing break statement.
 1.211 22-Jan-2008  dyoung Take two steps toward adding and deleting link-layer addresses.

1 Extract subroutine if_dl_create() from if_alloc_sadl().
if_dl_create() allocates a link-layer ifaddr.

2 Extract subroutine ifioctl_common() from ifioctl(). ifioctl_common()
will be the basis for an ifnet "superclass" whose functions
drivers may inherit. Very simple drivers may set ifnet->if_ioctl
= ifioctl_common. More sophisticated drivers will set ifnet->if_ioctl
= driver_ioctl. driver_ioctl() will call ifioctl_common() to
re-use the common code.
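
The "superclass" arrangement in step 2 can be sketched with a function
pointer. Everything below is a hypothetical miniature (the sk_ names,
the two made-up commands, and the error code chosen for unknown
commands are assumptions), not the kernel's struct ifnet or its ioctl
numbers:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_ifnet;
typedef int (*sk_ioctl_t)(struct sk_ifnet *, unsigned long, void *);

struct sk_ifnet {
	sk_ioctl_t	if_ioctl;	/* per-interface ioctl routine */
	int		if_mtu;
};

enum { SK_SETMTU = 1, SK_DRIVER_PRIVATE = 2 };	/* made-up commands */

/* "Superclass": handles the commands that every interface shares. */
static int
sk_ifioctl_common(struct sk_ifnet *ifp, unsigned long cmd, void *data)
{
	assert(ifp != NULL);
	switch (cmd) {
	case SK_SETMTU:
		ifp->if_mtu = *(int *)data;
		return 0;
	default:
		return ENOTTY;	/* not a command we understand */
	}
}

/* "Subclass": handles its own command and inherits everything else. */
static int
sk_driver_ioctl(struct sk_ifnet *ifp, unsigned long cmd, void *data)
{
	if (cmd == SK_DRIVER_PRIVATE)
		return 0;
	return sk_ifioctl_common(ifp, cmd, data);
}
```

A very simple interface would set if_ioctl to sk_ifioctl_common
directly; a more sophisticated one would set it to sk_driver_ioctl and
fall through to the common routine for anything it does not override.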
 1.210 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.209 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.208 06-Dec-2007  dyoung branches: 1.208.4;
Fix a serious regression: insert new ifaddrs at the end of if_addrlist,
not at the front, because the first ifaddr on the list has special
significance (grrr).
 1.207 06-Dec-2007  dyoung Add ifa_insert() and ifa_remove() that add/remove an ifaddr to/from
an interface and increase/decrease its reference count.
 1.206 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
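
The shape of such a subroutine can be sketched as below. The types and
names (sk_ifaddr, sk_ifnet, sk_if_purgeaddrs) are hypothetical, and a
plain singly linked list stands in for the kernel's address list:

```c
#include <assert.h>
#include <stddef.h>

struct sk_ifaddr {
	int			family;	/* address family of this entry */
	int			purged;	/* set by the callback in this sketch */
	struct sk_ifaddr	*next;
};

struct sk_ifnet {
	struct sk_ifaddr	*addrs;
};

/* Apply purgeaddr to every address on ifp that belongs to family. */
static void
sk_if_purgeaddrs(struct sk_ifnet *ifp, int family,
    void (*purgeaddr)(struct sk_ifaddr *))
{
	struct sk_ifaddr *ifa, *next;

	assert(ifp != NULL && purgeaddr != NULL);
	for (ifa = ifp->addrs; ifa != NULL; ifa = next) {
		next = ifa->next;	/* read before the callback runs */
		if (ifa->family == family)
			(*purgeaddr)(ifa);
	}
}

/* Example callback: only records that the address was visited. */
static void
sk_mark_purged(struct sk_ifaddr *ifa)
{
	ifa->purged = 1;
}
```

Each protocol would pass its own purge routine, so the loop itself is
written once instead of once per address family.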
 1.205 05-Dec-2007  dyoung Use IFADDR_EMPTY().
 1.204 04-Dec-2007  dyoung Use IFADDR_FOREACH().
 1.203 01-Nov-2007  dyoung branches: 1.203.2; 1.203.4;
Change a few malloc(9) + memset(3) pairs to malloc(..., ...|M_ZERO).
 1.202 11-Oct-2007  dyoung branches: 1.202.2;
In ifreq_setaddr(), use the right buffer sizes for compat v.
non-compat commands.
 1.201 13-Sep-2007  gdt branches: 1.201.2;
Add a define for the ifru_space union member.

Copy the entire sockaddr to the buffer to be written to user space,
according to its length, not just the part that fits in struct
sockaddr.

This fixes the 'bad MAC address' problem in dhclient.
 1.200 11-Sep-2007  gdt Fix bug in SIOCGIFCONF where the wrong length was calculated for
sockaddrs bigger than struct sockaddr. Tightly bind decrementing
available space and using it, avoiding incorrect accounting in an
error case. Document invariants. Document calling convention for
SIOCGIFCONF. Simplify by removing code to handle sockaddrs that don't
fit in struct ifreq; with sockaddr_storage this can no longer occur.
Add several KASSERTs.

This commit resolves the problem with racoon failing to list
interfaces.

Proposed on tech-net@ with no objections.
 1.199 01-Sep-2007  dyoung Fix compilation if !defined(INET6). Thanks, Geoff Wing, for the
bug report & patch.
 1.198 31-Aug-2007  dyoung Per discussion in 30 May 2007 on tech-net, add accessors for
ifreq->ifr_addr, ifreq_getaddr() and ifreq_setaddr().
 1.197 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.196 20-Aug-2007  skd branches: 1.196.2;
Clean up net compat ioctls, and clean up handling of wireless ioctls.
 1.195 07-Aug-2007  dyoung branches: 1.195.2;
In if_alloc_sadl(), use sockaddr_dl_init() and satocsdl(). Introduce
variable 'mask' for the netmask, and use it instead of assigning
to 'sdl' twice.

In ifa_ifwithnet(), use satocsdl().
 1.194 19-Jul-2007  dyoung branches: 1.194.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.193 09-Jul-2007  ad branches: 1.193.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.192 09-Jun-2007  dyoung Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.191 01-Jun-2007  christos - fix unused variable when none of the compat options are defined.
- remove debugging
 1.190 01-Jun-2007  enami Fix some bugs in ifconf():
- maintain space left correctly. the pointer is advanced by the size
of struct ifreq when length of address is small.
- single sizeof operator is enough to take the size of struct.
- the type of `sz' must be a signed type since it is/was compared against
the variable which may become negative.
- no need to traverse rest of interfaces once we got an error. note that
the latter `break' statement was inside inner loop.
 1.189 31-May-2007  christos provide the minimum ifreq size (when sockaddr is empty)
 1.188 30-May-2007  christos Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.187 29-May-2007  xtraeme Initialize oifr to fix build with COMPAT_40.
 1.186 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.185 18-Mar-2007  dyoung KNF: compare pointers with NULL instead of 0, and do not "test
truth" of either integers or pointers, so that it's clear
what's going on. Remove superfluous () from return statements.
bcmp -> memcmp, bcopy -> memcpy.

Misc. cosmetic: join some lines, remove a few empty lines, remove
spaces from type casts. Don't open-code IFNET_FOREACH(). Shorten
some staircases.
 1.184 18-Mar-2007  dyoung The departure of IPv6 interfaces does not agree with pf. The pfil
hooks that signal the interface's departure run before IPv6 sends
messages to indicate that it is leaving its multicast groups; when
pf filters the departure messages, it does not recognize the output
interface, so it complains at the departure of gre65, for example:

pf_test6: kif == NULL, if_xname gre65

I have changed if_detach() so that it calls pr_usrreq(PRU_PURGEIF)
before pfil_run_hooks(PFIL_IFNET_DETACH), instead of the other way
around. That quiets the pf_test6: messages.
 1.183 04-Mar-2007  christos branches: 1.183.2; 1.183.4; 1.183.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.182 22-Feb-2007  dyoung Cosmetic: use TAILQ_EMPTY, TAILQ_FOREACH.
 1.181 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.180 03-Dec-2006  dyoung branches: 1.180.2;
Fix spelling, s/straglers/stragglers/.
 1.179 02-Dec-2006  dyoung In if_rt_walktree(), make absolutely certain not to leave a dangling
pointer, rt_ifp, from an rtentry to an interface that we are going
to destroy.
 1.178 20-Nov-2006  dyoung "Reform" TAILQ usage:

Obey the TAILQ abstraction while removing ifaddrs from an interface
in if_detach; just restart the loop after removing one or more
ifaddrs from the interface.
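
The restart idiom can be sketched with the TAILQ macros from
<sys/queue.h>. The structure and function names here are hypothetical,
and the sketch removes a single element per pass; it is meant to show
the pattern, not the code in if_detach():

```c
#include <sys/queue.h>
#include <assert.h>
#include <stddef.h>

struct sk_ifaddr {
	int			family;
	TAILQ_ENTRY(sk_ifaddr)	ifa_list;
};
TAILQ_HEAD(sk_ifaddrhead, sk_ifaddr);

/* Remove every address of the given family; return how many went. */
static int
sk_remove_family(struct sk_ifaddrhead *head, int family)
{
	struct sk_ifaddr *ifa;
	int removed = 0;

	assert(head != NULL);
again:
	TAILQ_FOREACH(ifa, head, ifa_list) {
		if (ifa->family != family)
			continue;
		TAILQ_REMOVE(head, ifa, ifa_list);
		removed++;
		/* The iterator is stale now, so start over from the head. */
		goto again;
	}
	return removed;
}
```

Restarting costs extra passes over the list, but it never touches the
link fields of an element that has already been unlinked.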

Convert a bunch of for (ifa = TAILQ_FIRST(); ifa; ifa = TAILQ_NEXT())
loops to TAILQ_FOREACH().

Remove some superfluous parentheses while I am here.
 1.177 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.176 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.
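
The two address predicates in item 3 can be sketched as plain
functions. The names below are hypothetical and the addresses are taken
in host byte order; the kernel macros may differ in both respects. The
ranges are 169.254/16 for link-local unicast and the three RFC 1918
blocks (10/8, 172.16/12, 192.168/16) for private addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
static uint32_t
sk_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}

/* True for 169.254.0.0/16, link-local unicast. */
static bool
sk_in_linklocal(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* True for the RFC 1918 blocks 10/8, 172.16/12 and 192.168/16. */
static bool
sk_in_private(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x0a000000u ||
	    (addr & 0xfff00000u) == 0xac100000u ||
	    (addr & 0xffff0000u) == 0xc0a80000u;
}
```

A source-address selection policy could use predicates like these to
rank candidate addresses by scope.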

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.175 27-Oct-2006  christos Use strncpy to copy out interface names so that the trailing part of the
buffer is zeroed, and check for overflow.
 1.174 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.173 22-Oct-2006  christos fix typo.
 1.172 22-Oct-2006  christos use strlcpy instead of strncpy or bcopy to copy the interface name.
 1.171 22-Oct-2006  pooka be appropriately const poisonous
 1.170 13-Oct-2006  hannken More __unused (COMPAT_OSOCK not defined).
 1.169 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.168 25-Aug-2006  matt branches: 1.168.2; 1.168.4;
One step closer to loadable domains. Store pointers to a domain's soft
interrupt queues so if_detach can remove packets to removed interfaces from
them. This eliminates a lot of conditional ugly code in if.c
 1.167 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.166 18-May-2006  liamjfoy Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.165 14-May-2006  elad integrate kauth.
 1.164 01-May-2006  dyoung Remove needless "link state changed to DOWN/UP" message.
 1.163 11-Dec-2005  thorpej branches: 1.163.4; 1.163.6; 1.163.8; 1.163.10; 1.163.12;
ANSI function decls and application of static.
 1.162 11-Dec-2005  christos merge ktrace-lwp.
 1.161 24-Sep-2005  christos It is now ``later''. Follow cgd's 1993 wish and move struct osockaddr
and struct omsghdr to a compat header.
 1.160 19-Jul-2005  gdt Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.159 22-Jun-2005  dyoung branches: 1.159.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.158 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.157 02-May-2005  yamt split IFCAP_CSUM_xxx to IFCAP_CSUM_xxx_Rx and IFCAP_CSUM_xxx_Tx.
 1.156 31-Mar-2005  christos fix compiling with -DALTQ
 1.155 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.154 26-Feb-2005  perry branches: 1.154.2;
nuke trailing whitespace
 1.153 24-Jan-2005  matt branches: 1.153.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.152 23-Jan-2005  matt Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.151 09-Jan-2005  yamt branches: 1.151.2;
ifioctl: don't use superuser privilege unless it's needed.
 1.150 04-Dec-2004  peter Fix a typo in Bill Studenmund's name.
 1.149 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.148 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.147 07-Oct-2004  tron Back out the last two revisions:
1.) There is objection against this change by at least one developer.
2.) These changes cause repeatable system lockups and crashes for
at least four people.
 1.146 06-Oct-2004  itojun use ifunit()
 1.145 06-Oct-2004  itojun call dom_ifattach[] at consistent state. before this commit, dom_ifattach[]
was called after interface attach is completely done for non-cloning interface,
and from within if_attach() for cloning interface (which was wrong).
 1.144 27-Jul-2004  yamt - rename PFIL_NEWIF to PFIL_IFNET, and handle interface detach events
as well.
- use it for pf(4).

mostly from Peter Postma. PR/26403.
 1.143 22-Jun-2004  itojun prepare PF-related hooks. reviewed by matt, perry, christos
 1.142 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.141 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.140 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.139 24-Mar-2004  atatat branches: 1.139.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.138 29-Jan-2004  drochner make it compile with !(INET || INET6)
 1.137 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
now that we have introduced dynamically-created interfaces. from kame
 1.136 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.135 28-Nov-2003  keihan s/netbsd.org/NetBSD.org/g
 1.134 11-Nov-2003  drochner fix interface address list traversal in if_detach():
The code was assuming that interface addresses are removed one-by-one.
With IPv6 and multicasts, removal of one address can remove other
addresses as side effect, which caused accesses of free()d memory.
 1.133 10-Nov-2003  jonathan Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
 1.132 13-Oct-2003  dyoung Use new 802.11 header files.
 1.131 01-Oct-2003  itojun fix out-of-bounds access to ifindex2ifnet[]. found by iij seil team.
 1.130 14-Aug-2003  itojun correct number range handling. David Young
 1.129 14-Aug-2003  itojun fix INT_MAX check in if_clone_lookup
 1.128 14-Aug-2003  itojun correct if_clone_lookup(). based on diff from Quentin Garnier
 1.127 09-Aug-2003  christos Fix problem with OSIOCIFCONF where it tried to copyout addresses that
did not fit in struct osockaddr. Fixes Linux emulation issue where bogus
addresses were returned for the interfaces [AF_LINK, AF_INET6]. While
I am here, change ioctl, so if the ifconf buffer passed is NULL, then it
computes how much space is needed and returns it in ifc_len.
 1.126 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.125 15-Jul-2003  itojun decnetintrq is still in fddi/tokenring (is it needed?).
 1.124 06-Jul-2003  dyoung Prepare to consolidate 802.11 media handling (which is handled in
code duplicated by each driver, now) into the 802.11 framework.
 1.123 29-Jun-2003  fvdl branches: 1.123.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.122 29-Jun-2003  ichiro missing ')'
 1.121 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.120 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.119 19-May-2003  christos add COMPAT_ULTRIX where necessary. Thanks gimpy!
 1.118 16-May-2003  itojun AF_LINK sockaddr has to be attached to ifp->if_addrlist until the end,
as much of the code assumes that TAILQ_FIRST(ifp->if_addrlist) is non-null.
 1.117 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.116 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.115 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.114 27-Sep-2002  onoe Add check suser() for SIOCS80211BSSID, SIOCS80211CHANNEL.
 1.113 26-Aug-2002  thorpej Fix signed/unsigned comparison warnings from GCC 3.3.
 1.112 26-Jul-2002  wiz Spell '[Rr]ight' correctly. From Jim Bernard.
 1.111 13-Jun-2002  itojun typo
 1.110 08-Jun-2002  itojun need to protect if_attachdomain() too
 1.109 08-Jun-2002  itojun protect dom_ifattach by splnet
 1.108 30-May-2002  itojun improve nd6_setmtu(), to warn too-small MTU on SIOCSIFMTU. sync w/kame
 1.107 27-May-2002  itojun re-scan all ifnet after domaininit() for if_afdata initialization.
 1.106 27-May-2002  itojun framework to add af-dependent data structure to struct ifnet.
as discussed at bsd-api-discuss. sync w/kame
 1.105 23-May-2002  matt Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing all numeric ifname by treating them as an ifindex
which is used to look up the interface.
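
The all-numeric name rule can be sketched as a small parser. The
function name sk_name_to_ifindex() and its limit parameter are
hypothetical, and this is not the code in ifunit(); it only shows how a
name made entirely of digits can be told apart from an ordinary
interface name:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * If name consists only of decimal digits, store its value in *indexp
 * and return true. Otherwise return false so that the caller can fall
 * back to a lookup by name. Values above limit are rejected.
 */
static bool
sk_name_to_ifindex(const char *name, unsigned long limit,
    unsigned long *indexp)
{
	unsigned long v = 0;
	const char *p;

	assert(name != NULL && indexp != NULL);
	assert(limit <= USHRT_MAX);	/* keeps v * 10 + 9 far from overflow */

	if (*name == '\0')
		return false;
	for (p = name; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return false;	/* not all-numeric: look up by name */
		v = v * 10 + (unsigned long)(*p - '0');
		if (v > limit)
			return false;
	}
	*indexp = v;
	return true;
}
```

A caller would then use the resulting number to index whatever table
maps interface indexes to interfaces, after checking that the slot is
in use.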
 1.104 12-May-2002  matt branches: 1.104.2; 1.104.4;
Eliminate more commons.
 1.103 17-Mar-2002  simonb Make the 'ifnet' variable an extern and declare it in if.c.
 1.102 09-Feb-2002  atatat (1) Make if_index "wrap" at USHRT_MAX instead of going above it so
that other parts of the kernel won't lose gratuitously. There are
places where it's assumed that it won't grow that large.

(2) Avoid accidental reuse of occupied slots in the ifindex2ifnet[]
table.
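
Both points can be sketched together as an allocator that wraps before
exceeding USHRT_MAX and skips slots that are still in use. The function
name, the occupied[] array, and the choice never to hand out index 0
are assumptions made for this illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Pick the next free interface index after last. occupied[] must have
 * USHRT_MAX + 1 entries. Returns 0 if every slot is taken.
 */
static unsigned short
sk_next_ifindex(unsigned short last, const bool *occupied)
{
	unsigned int tries;
	unsigned short idx = last;

	assert(occupied != NULL);
	for (tries = 0; tries < USHRT_MAX; tries++) {
		/* Wrap to 1 instead of growing past USHRT_MAX. */
		idx = (idx == USHRT_MAX) ? 1 : (unsigned short)(idx + 1);
		if (!occupied[idx])
			return idx;	/* never reuse an occupied slot */
	}
	return 0;
}
```

The loop visits each of the USHRT_MAX usable indexes at most once, so
it terminates even when the table is full.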
 1.101 02-Dec-2001  abs Add an #if defined(INET) ... around if_detach_queues's declaration to match the
one around its definition.
 1.100 27-Nov-2001  augustss Make it compile in the absence of networks. Closes PR 14274 (mine).
 1.99 12-Nov-2001  lukem add RCSIDs
 1.98 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.97 17-Sep-2001  thorpej branches: 1.97.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.96 02-Aug-2001  itojun branches: 1.96.2;
fix logic to free up ifqueue on if_detach(). prev pointer was incorrectly set.
 1.95 29-Jul-2001  itojun make sure to cleanup software interrupt queues (like ipintrq)
on interface detach, otherwise we will have a dangling pointer
from m->m_pkthdr.rcvif.
 1.94 28-Jul-2001  itojun indent fix
 1.93 24-Jul-2001  itojun clear ifindex2ifnet[] on if_detach.
 1.92 18-Jul-2001  thorpej bzero -> memset
 1.91 14-Jun-2001  itojun branches: 1.91.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.90 07-Jun-2001  mrg make ifioctl() compat lkm friendly.
 1.89 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
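
The idea in 1.89 of pre-computing and caching the pseudo-header sum, then finishing the checksum later, can be sketched roughly as follows. This is an illustration only, assuming IPv4 and byte-wise big-endian summing; `sum_bytes()`, `fold()`, `pseudo_sum()`, and `finish_sum()` are hypothetical helpers, not the functions used in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range, as big-endian 16-bit words, to a running sum. */
static uint32_t
sum_bytes(uint32_t acc, const uint8_t *p, size_t len)
{
	while (len > 1) {
		acc += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		acc += (uint32_t)p[0] << 8;	/* odd byte, zero padded */
	return acc;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t
fold(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Step 1: sum the IPv4 pseudo-header once; the result can be cached. */
static uint16_t
pseudo_sum(const uint8_t src[4], const uint8_t dst[4], uint8_t proto,
    uint16_t len)
{
	uint32_t acc = 0;

	acc = sum_bytes(acc, src, 4);
	acc = sum_bytes(acc, dst, 4);
	acc += proto;
	acc += len;
	return fold(acc);
}

/*
 * Step 2, possibly much later: add the transport segment to the
 * cached value and take the ones-complement of the folded result.
 */
static uint16_t
finish_sum(uint16_t cached, const uint8_t *seg, size_t len)
{
	return (uint16_t)~fold(sum_bytes(cached, seg, len));
}
```

Because ones-complement addition is associative, finishing from the cached value gives the same result as summing pseudo-header and segment in a single pass, so the second step can be left to whichever of hardware or software ends up doing it.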
 1.88 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.87 10-Apr-2001  thorpej Add a PFIL_HOOKS filtering point to every network interface.
 1.86 03-Mar-2001  thorpej branches: 1.86.2;
Add some missing ALTQ initialization, pointed out by
Kenjiro Cho <kjc@csl.sony.co.jp>.
 1.85 20-Feb-2001  itojun add SIOC[SG]LIFPHYADDR ioctl. greatly simplify tunnel address settings.
sync with kame. old ioctls are supplied but not recommended for new code.
 1.84 29-Jan-2001  thorpej if_alloc_sadl(): if the interface already has a link name, free
it before assigning a new one. This is useful for interfaces
that may change their link names in the course of their existence.
 1.83 17-Jan-2001  itojun configure sdl_alen properly
 1.82 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.81 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.80 18-Dec-2000  thorpej Add SIOCGIFDLT, which will fetch the data link type (DLT_* constant)
for a given network interface.
 1.79 18-Dec-2000  thorpej Commit to the ALTQ glue.
 1.78 14-Dec-2000  thorpej Change an if_qflush() to an IFQ_PURGE() to deal with ALTQ correctly.
 1.77 13-Dec-2000  thorpej In if_qflush(), use IFQ_PURGE() rather than an open-coded version.
 1.76 12-Dec-2000  thorpej Only allow superuser to change 802.11 power params.
 1.75 11-Oct-2000  thorpej Change the if_reset vector to if_init, and add an if_stop. if_stop
also takes an argument indicating whether or not the interface should
also be disabled (i.e. power removed, resources freed, etc.)
 1.74 07-Oct-2000  itojun repair SIOCGIFP{DST,SRC}ADDR.
 1.73 04-Oct-2000  itojun ifp->if_ioctl may be NULL, so check it for SIOCSIFPHY*.
 1.72 04-Oct-2000  thorpej Make sure we're super-user for SIOCSIFPHYADDR, SIOCDIFPHYADDR,
and SIOCSIFPHYADDR_IN6.
 1.71 01-Oct-2000  thorpej Change the behavior of ifpromisc() slightly. If interface is not IFF_UP,
attempting to enable promisc would result in ENETDOWN. Change this to
allow the interface to always be placed in promiscuous mode, regardless
of IFF_UP. When the interface does come up, the IFF_PROMISC flag will
be consulted, and this matches the behavior that disabling promiscuous
mode has.
 1.70 29-Sep-2000  mellon - Figure out how long if list buffer needs to be if it's too short (fixes
PR#10968).
 1.69 21-Jul-2000  onoe add following two ioctls to handle WEP key for IEEE 802.11 wireless
LAN drivers: SIOCS80211NWKEY and SIOCG80211NWKEY.
 1.68 20-Jul-2000  pk Missing increment on ifp->if_pcount.
 1.67 20-Jul-2000  thorpej Add a SIOCGIFCLONERS ioctl, which fetches a list of network
interface cloners from the kernel.
 1.66 19-Jul-2000  onoe moved the privilege check for SIOCS80211NWID from each driver to ifioctl().
it also fixes the problem that a non-privileged user can change nwid
for wi and ray drivers.
 1.65 04-Jul-2000  thorpej Move ifpromisc() to if.c
 1.64 04-Jul-2000  thorpej Oops, restrict SIOCIF{CREATE,DESTROY} to super-user.
 1.63 02-Jul-2000  thorpej Add the notion of "cloning" of network pseudo-interface (e.g. `gif').
This allows them to be created and destroyed on the fly via ifconfig(8),
rather than specifying the count in the kernel configuration file.
 1.62 26-Apr-2000  bouyer branches: 1.62.4;
ifa_ifwithnet(): for the netatalk case, don't blindly return the first match
but try to find an exact match first. Closes kern/9957.
 1.61 30-Mar-2000  augustss Kill some more register declarations.
 1.60 30-Mar-2000  simonb Delete redundant decls of if_slowtimo and if_null{output,input,start,
ioctl,reset,watchdog,drain} - they're in <net/if.h>.
 1.59 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.58 22-Mar-2000  itojun remove if_withname, which was merged in by mistake during KAME merge.
 1.57 06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.56 06-Feb-2000  thorpej In if_detach(), call PRU_PURGEIF for *every* protocol within a domain
that has a usrreq entry point. Each protocol may have its own PCB
tables that need to be purged of references to the interface.
 1.55 05-Feb-2000  itojun fix route cleanup on interface removal. (not sure why -Wall did not catch it)
 1.54 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.53 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.52 29-Sep-1999  thorpej branches: 1.52.2;
const poison ifunit().
 1.51 24-Aug-1999  bouyer Fix ifa_ifwithnet() for the netatalk case: netatalk uses blocks of addresses
which can't be handled by netmask, and ifa_ifwithnet() didn't find the
interface associated with an address if it was in the same block but not with
the same prefix. This prevented 'route add' and atalkd from working properly
with some network configs.
This has been discussed on tech-net some weeks ago.
 1.50 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.49 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.48 10-Dec-1998  christos branches: 1.48.2; 1.48.4; 1.48.6;
defopt COMPAT_43
 1.47 26-Jun-1998  thorpej branches: 1.47.6;
defopt COMPAT_SVR4
 1.46 25-Jun-1998  thorpej defopt COMPAT_LINUX
 1.45 14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.44 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.43 28-Jan-1998  thorpej Use offsetof() from libkern.h
 1.42 02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.41 29-Aug-1997  thorpej Garbage-collect.
 1.40 29-Aug-1997  thorpej Bring changes from marc-pcmcia branch down to the trunk.
 1.39 17-Mar-1997  thorpej branches: 1.39.4;
BSD/OS-style network interface media selection, implemented by
Jonathan Stone and myself. Many thanks to Matt Thomas for providing
the information necessary to implement this interface, and for helping
to shake out the bugs.
 1.38 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.37 13-Jun-1996  cgd branches: 1.37.4;
implement SIOCGIFMTU in a generic manner, by pulling the MTU out of
each netif's if_data structure. There's no point in making each
driver implement this ioctl.
 1.36 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.35 07-May-1996  thorpej branches: 1.35.4;
Kill a couple of unnecessary calls to strlen().
 1.34 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.33 22-Apr-1996  christos - Fix fencepost error in ifconf() where if space = n * sizeof(struct ifreq),
only n - 1 interfaces would be obtained. This bug is present in the Lite2
sources too.
- Support COMPAT_SVR4 in ifconf()
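
The class of fencepost error described in 1.33 can be sketched as below. The log does not show the original loop, so this is an assumed reconstruction of the general pattern: `struct rec` is a hypothetical stand-in for `struct ifreq`, and both functions are illustrative only.

```c
#include <stddef.h>

struct rec {			/* hypothetical stand-in for struct ifreq */
	char name[16];
	int value;
};

/*
 * Off by one: with "space > sizeof" the loop stops while exactly one
 * record still fits, so a buffer sized for n records yields n - 1.
 */
static size_t
records_that_fit_buggy(size_t space)
{
	size_t n = 0;

	while (space > sizeof(struct rec)) {
		space -= sizeof(struct rec);
		n++;
	}
	return n;
}

/* Corrected: a record fits whenever at least sizeof bytes remain. */
static size_t
records_that_fit(size_t space)
{
	size_t n = 0;

	while (space >= sizeof(struct rec)) {
		space -= sizeof(struct rec);
		n++;
	}
	return n;
}
```

The two versions differ only when the buffer is an exact multiple of the record size, which is precisely the case a caller produces when it sizes the buffer for a known number of interfaces.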
 1.32 12-Mar-1996  mrg oops; back out previous change and add comment describing what the weird goto does.
 1.31 12-Mar-1996  mrg eliminate stupid use of "goto next;" where next was: "next: continue;"
 1.30 12-Mar-1996  mrg test for null ifa_dstaddr before using it. (pr#2183 from chuck cranor)
 1.29 05-Mar-1996  thorpej Handle more than 10 interfaces of a given type (well, up to `if99', anyhow).
From Neil McRae, PR #1992.
 1.28 27-Feb-1996  mycroft Emulate OSIOCGIFADDR, et al, if COMPAT_LINUX is defined.
 1.27 27-Feb-1996  mycroft Handle OSIOCGIFCONF if COMPAT_LINUX is defined.
 1.26 26-Feb-1996  mrg two more local addr changes, all done differently now (idea from charles)
 1.25 21-Feb-1996  christos Close PR/2105: if.c does not compile without COMPAT_43 due to missing casts.
 1.24 13-Feb-1996  christos Net prototypes
 1.23 12-Aug-1995  mycroft splnet --> splsoftnet
 1.22 12-Jun-1995  mycroft Make sure to initialize ifnet correctly.
 1.21 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.20 22-Apr-1995  cgd be more careful when rounding sockaddr_dl sizes. also, one u_short * ->
u_int16_t * conversion.
 1.19 09-Mar-1995  mycroft ifconf() takes a u_long, not an int.
 1.18 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.17 26-Jul-1994  cgd kill vax code, at ragge's request.
 1.16 29-Jun-1994  cgd branches: 1.16.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.15 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.14 05-May-1994  mycroft Remove now-bogus cast.
 1.13 05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.12 29-Apr-1994  cgd change timeout/untimeout/wakeup/sleep/tsleep args to void *
 1.11 10-Feb-1994  mycroft Deprecate af.h.
 1.10 02-Feb-1994  hpeyerl Multicast is no longer optional
 1.9 18-Dec-1993  mycroft Canonicalize all #includes.
 1.8 18-Dec-1993  mycroft Canonicalize all #includes.
 1.7 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.6 27-Aug-1993  mycroft branches: 1.6.2;
+ #if defined(INET) && NETHER > 0
+ #endif
Around the:
/* XXX -- Temporary fix before changing 10 ethernet drivers */
so you can compile a kernel without INET and ETHERNET support.
 1.5 14-Aug-1993  deraadt ppp from paul mackerras
 1.4 27-Jun-1993  andrew ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.2.3 08-Nov-1993  mycroft Fix some #includes I munged in the last commit.
 1.6.2.2 08-Nov-1993  mycroft Remove references to af.h.
 1.6.2.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.16.2.1 14-Aug-1994  mycroft update from trunk (to remove ancient vax stuff)
 1.35.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.37.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.39.4.3 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.39.4.2 29-Aug-1997  thorpej Update from trunk.
 1.39.4.1 30-Jul-1997  marc set up the interface send queue when the if is attached, not when the
if layer is initialized. Otherwise, interfaces attached after the
autoconfiguration won't be set up properly.
 1.47.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.48.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.48.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.48.4.2 02-Aug-1999  thorpej Update from trunk.
 1.48.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.48.2.2 30-Apr-2000  he Pull up revision 1.62 (via patch, requested by bouyer):
Fix ifa_ifwithnet() for the netatalk case to properly return the
best match and not the first match. Makes netatalk work again
on networks without AppleTalk routers. Fixes PR#9957.
 1.48.2.1 24-Aug-1999  he Pull up revision 1.51:
Fix a problem in ifa_ifwithnet() for netatalk, making atalkd
work in more configurations. (bouyer)
 1.52.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.52.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.52.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.52.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.52.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.52.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.52.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.62.4.6 31-Dec-2000  jhawk Pull up revisions 1.63-1.64, 1.67 (requested by bouyer):
Support cloning of network pseudo-interfaces.
 1.62.4.5 07-Oct-2000  itojun pullup 1.73 -> 1.74 (approved by releng-1-5)
repair SIOCGIFP{DST,SRC}ADDR.
 1.62.4.4 06-Oct-2000  itojun pullup (approved by releng-1-5)
move privilege check for SIOCSIFPHY* from in{,6}_control to ifioctl.
fix privilege check mistakes (which allows non-root user to modify gif
physical address in some cases). sync with kame.
> cvs rdiff -r1.62 -r1.63 syssrc/sys/netinet/in.c
> cvs rdiff -r1.34 -r1.35 syssrc/sys/netinet6/in6.c
> cvs rdiff -r1.71 -r1.73 syssrc/sys/net/if.c
 1.62.4.3 02-Oct-2000  mellon Pull up 1.69-1.70 - kernel portion of fix for PR#10968 (jhawk approved)
 1.62.4.2 21-Jul-2000  onoe Pullup 802.11 stuff (approved by jhawk)
- add support for nwkey to ifconfig
basesrc/sbin/ifconfig/ifconfig.c 1.88
basesrc/sbin/ifconfig/ifconfig.8 1.39
syssrc/sys/dev/ic/awi.c 1.26
syssrc/sys/dev/ic/awi_wep.c 1.3
syssrc/sys/dev/ic/awivar.h 1.12
syssrc/sys/dev/pcmcia/if_wi.c 1.26
syssrc/sys/net/if.c 1.69
syssrc/sys/net/if_ieee80211.h 1.5
 1.62.4.1 21-Jul-2000  onoe Pull up 802.11 stuff (approved by jhawk)
- check privilege for SIOCS80211NWID
syssrc/sys/dev/ic/awi.c 1.23, 1.25
syssrc/sys/net/if.c 1.66
 1.86.2.16 11-Nov-2002  nathanw Catch up to -current
 1.86.2.15 18-Oct-2002  nathanw Catch up to -current.
 1.86.2.14 27-Aug-2002  nathanw Catch up to -current.
 1.86.2.13 01-Aug-2002  nathanw Catch up to -current.
 1.86.2.12 15-Jul-2002  nathanw Whitespace.
 1.86.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.86.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.86.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.86.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.86.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.86.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.86.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.86.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.86.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.86.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.86.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.91.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.91.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.91.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.91.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.91.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.91.2.1 03-Aug-2001  lukem update to -current
 1.96.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.97.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.104.4.4 12-Mar-2004  jmc Pullup rev 1.131 (requested by briggs in ticket #1560)

Fix out-of-bounds access to ifindex2ifnet[].
 1.104.4.3 10-Sep-2003  tron Pull up revisions 1.128-1.130 via patch (requested by itojun in ticket #1406):
correct if_clone_lookup(). based on diff from Quentin Garnier
fix INT_MAX check in if_clone_lookup
correct number range handling. David Young
 1.104.4.2 19-Jun-2003  grant Apply patch (requested by itojun in ticket #1290):

AF_LINK sockaddr has to be attached to ifp->if_addrlist until the end,
as much of the code assumes that TAILQ_FIRST(ifp->if_addrlist) is
non-null.
 1.104.4.1 01-Nov-2002  tron Pull up revision 1.105 (requested by martin in ticket #32):
Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing with all-numeric ifnames by treating them as an ifindex
which is used to look up the interface.
 1.104.2.3 29-Aug-2002  gehenna catch up with -current.
 1.104.2.2 20-Jun-2002  gehenna catch up with -current.
 1.104.2.1 30-May-2002  gehenna Catch up with -current.
 1.123.2.12 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.123.2.11 01-Apr-2005  skrll Sync with HEAD.
 1.123.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.123.2.9 04-Feb-2005  skrll Sync with HEAD.
 1.123.2.8 24-Jan-2005  skrll Sync with HEAD.
 1.123.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.123.2.6 18-Dec-2004  skrll Sync with HEAD.
 1.123.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.123.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.123.2.3 03-Aug-2004  skrll Sync with HEAD
 1.123.2.2 02-Jul-2003  wrstuden Check in lwp-ification changes needed to get the evbarm/IQ80321 kernel
to compile.

only question I have is over the:
l->l_proc->p_stats->p_ru.ru_msgsnd++;
command at line 245 of dev/kttcp.c. Should we be doing per-lwp or
per-proc accounting?
 1.123.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.139.2.2 19-Nov-2006  bouyer Pull up following revision(s) (requested by adrianp in ticket #10759):
sys/net/if.c: revision 1.172 via patch
sys/net/if.c: revision 1.173 via patch
sys/net/if.c: revision 1.175 via patch
Avoid kernel memory disclosure in if_clone_list().
 1.139.2.1 28-May-2004  tron branches: 1.139.2.1.2; 1.139.2.1.4;
Pull up revision 1.142 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.139.2.1.4.1 19-Nov-2006  bouyer Pull up following revision(s) (requested by adrianp in ticket #10759):
sys/net/if.c: revision 1.172 via patch
sys/net/if.c: revision 1.173 via patch
sys/net/if.c: revision 1.175 via patch
Avoid kernel memory disclosure in if_clone_list().
 1.139.2.1.2.1 19-Nov-2006  bouyer Pull up following revision(s) (requested by adrianp in ticket #10759):
sys/net/if.c: revision 1.172 via patch
sys/net/if.c: revision 1.173 via patch
sys/net/if.c: revision 1.175 via patch
Avoid kernel memory disclosure in if_clone_list().
 1.151.2.1 29-Apr-2005  kent sync with -current
 1.153.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.154.2.3 27-Oct-2006  ghen Pull up following revision(s) (requested by christos in ticket #1572):
sys/net/if.c: revision 1.175
Use strncpy to copy out interface names so that the trailing part of the
buffer is zeroed, and check for overflow.
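
The reason revision 1.175 prefers strncpy here can be shown with a small sketch: standard strncpy pads the rest of the destination with NUL bytes, so no stale buffer contents are copied out after the name. `copy_name_out()` is a hypothetical helper written for this illustration and is not the code in if_clone_list().

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an interface name into a fixed-size field that will be
 * handed to user space.  Returns 0 on success, or -1 if the name
 * (plus its terminating NUL) does not fit.
 */
static int
copy_name_out(char *dst, size_t dstlen, const char *name)
{
	if (strlen(name) >= dstlen)
		return -1;		/* overflow check */

	/*
	 * strncpy() writes exactly dstlen bytes: the name, then NUL
	 * padding.  Whatever was in dst before is fully overwritten.
	 */
	strncpy(dst, name, dstlen);
	return 0;
}
```

A strlcpy-style copy would terminate the string but leave the bytes after the terminator untouched, which matters when the whole fixed-size field is copied out.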
 1.154.2.2 26-Oct-2006  ghen Pull up following revision(s) (requested by christos in ticket #1563):
sys/net/if.c: revision 1.172
sys/net/if.c: revision 1.173
use strlcpy instead of strncpy or bcopy to copy the interface name.
fix typo.
 1.154.2.1 15-Aug-2005  tron branches: 1.154.2.1.2;
Pull up revision 1.160 (requested by gdt in ticket #661):
Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.154.2.1.2.2 27-Oct-2006  ghen Pull up following revision(s) (requested by christos in ticket #1572):
sys/net/if.c: revision 1.175
Use strncpy to copy out interface names so that the trailing part of the
buffer is zeroed, and check for overflow.
 1.154.2.1.2.1 26-Oct-2006  ghen Pull up following revision(s) (requested by christos in ticket #1563):
sys/net/if.c: revision 1.172
sys/net/if.c: revision 1.173
use strlcpy instead of strncpy or bcopy to copy the interface name.
fix typo.
 1.159.2.11 17-Mar-2008  yamt sync with head.
 1.159.2.10 11-Feb-2008  yamt sync with head.
 1.159.2.9 04-Feb-2008  yamt sync with head.
 1.159.2.8 21-Jan-2008  yamt sync with head
 1.159.2.7 07-Dec-2007  yamt sync with head
 1.159.2.6 15-Nov-2007  yamt sync with head.
 1.159.2.5 27-Oct-2007  yamt sync with head.
 1.159.2.4 03-Sep-2007  yamt sync with head.
 1.159.2.3 26-Feb-2007  yamt sync with head.
 1.159.2.2 30-Dec-2006  yamt sync with head.
 1.159.2.1 21-Jun-2006  yamt sync with head.
 1.163.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.163.10.4 11-May-2006  elad sync with head
 1.163.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.163.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.163.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.163.8.3 03-Sep-2006  yamt sync with head.
 1.163.8.2 11-Aug-2006  yamt sync with head
 1.163.8.1 24-May-2006  yamt sync with head.
 1.163.6.1 01-Jun-2006  kardel Sync with head.
 1.163.4.1 09-Sep-2006  rpaulo sync with head
 1.168.4.2 10-Dec-2006  yamt sync with head.
 1.168.4.1 22-Oct-2006  yamt sync with head
 1.168.2.2 12-Jan-2007  ad Sync with head.
 1.168.2.1 18-Nov-2006  ad Sync with head.
 1.180.2.3 24-Mar-2007  yamt sync with head.
 1.180.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.180.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.183.6.1 29-Mar-2007  reinoud Pullup to -current
 1.183.4.1 11-Jul-2007  mjf Sync with head.
 1.183.2.7 12-Oct-2007  ad Sync with head.
 1.183.2.6 09-Oct-2007  ad Sync with head.
 1.183.2.5 20-Aug-2007  ad Sync with HEAD.
 1.183.2.4 15-Jul-2007  ad Sync with head.
 1.183.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.183.2.2 09-Jun-2007  ad Sync with head.
 1.183.2.1 10-Apr-2007  ad Sync with head.
 1.193.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.193.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.194.4.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.194.4.5 04-Nov-2007  jmcneill Sync with HEAD.
 1.194.4.4 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.194.4.3 02-Oct-2007  joerg Sync with HEAD.
 1.194.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.194.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.195.2.2 07-Aug-2007  dyoung In if_alloc_sadl(), use sockaddr_dl_init() and satocsdl(). Introduce
variable 'mask' for the netmask, and use it instead of assigning
to 'sdl' twice.

In ifa_ifwithnet(), use satocsdl().
 1.195.2.1 07-Aug-2007  dyoung file if.c was added on branch matt-mips64 on 2007-08-07 04:14:38 +0000
 1.196.2.3 23-Mar-2008  matt sync with HEAD
 1.196.2.2 09-Jan-2008  matt sync with HEAD
 1.196.2.1 06-Nov-2007  matt sync with HEAD
 1.201.2.1 14-Oct-2007  yamt sync with head.
 1.202.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.203.4.2 26-Dec-2007  ad Sync with head.
 1.203.4.1 08-Dec-2007  ad Sync with head.
 1.203.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.203.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.203.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.208.4.2 23-Jan-2008  bouyer Sync with HEAD.
 1.208.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.217.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.217.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.217.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.217.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.217.2.1 24-Mar-2008  keiichi sync with head.
 1.218.4.3 17-Jun-2008  yamt sync with head.
 1.218.4.2 18-May-2008  yamt sync with head.
 1.218.4.1 19-Apr-2008  yamt Peter Postma's work-in-progress pf import from OpenBSD 4.2.
updated to -current by me.
 1.218.2.5 28-Nov-2008  christos handle old IFDATAREQ; this makes ifconfig work and now my machine
comes up multi-user with new libc/new kernel old binaries.
 1.218.2.4 10-Nov-2008  christos resolve conflicts.
 1.218.2.3 09-Nov-2008  christos merge with head.
 1.218.2.2 01-Nov-2008  christos Sync with head.
 1.218.2.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.220.2.7 09-Oct-2010  yamt sync with head
 1.220.2.6 11-Aug-2010  yamt sync with head.
 1.220.2.5 11-Mar-2010  yamt sync with head
 1.220.2.4 16-Sep-2009  yamt sync with head
 1.220.2.3 19-Aug-2009  yamt sync with head.
 1.220.2.2 04-May-2009  yamt sync with head.
 1.220.2.1 16-May-2008  yamt sync with head.
 1.222.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.225.2.1 18-Jun-2008  simonb Sync with head.
 1.227.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.230.4.6 05-Jan-2012  sborrill Pull up the following revision(s) (requested by obache in ticket #1708):
sys/net/if.c: revision 1.246

PR/44030: ifreqn2o gets called with the parameters the wrong way around.
Reverts fix for PR 42585 (ticket #1416) as the root cause of the crash is
addressed by PR 44054 (pullup #1541).
 1.230.4.5 08-Aug-2011  riz Pull up following revision(s) (requested by sborrill in ticket #1643):
sys/net/if.c: revision 1.243
Prevent if_detach() from crashing while it walks the routing table
to find and unlink routes that reference the detached ifnet: make
if_rt_walktree() return ERESTART whenever it has deleted a route.
Whenever rt_walktree() returns ERESTART, if_detach() restarts it.
I believe that this fix resembles one by Jonathan Kollasch or by someone
else, which has languished in a PR for too long. Sorry!
Tested by me and by Jeff Rizzo.
XXX It's supposed to be safe for rn_walktree() to apply to the routing
XXX table a routine that may delete routes. Why isn't it safe in
XXX practice?
 1.230.4.4 16-Feb-2011  bouyer Pull up following revision(s) (requested by chs in ticket #1541):
sys/compat/common/if_43.c: revision 1.3 via patch
sys/net/if.c: revision 1.247 via patch
PR/44054: Onno van der Linden: Stacksmashing in handling of ioctl OOSIO*
parameter.
can't map the old and the new SIO calls the way we did before because the
numbers have changed. Instead provide a switch. Keep the old code there,
to handle cases we did not handle in the first switch, but this is a hack
and should be removed.
 1.230.4.3 12-Jun-2010  riz branches: 1.230.4.3.2;
Pull up following revision(s) (requested by skrll in ticket #1416):
sys/net/if.c: revision 1.244
Correct the argument order of ifreqn2o conversion.
Fixes PR/42585.
 1.230.4.2 28-Nov-2009  bouyer Pull up following revision(s) (requested by joerg in ticket #1148):
sys/net/if.c: revision 1.241
Simplify ifreq_setaddr:
- Drop the INET6 block. The commands are never given to this function
and truncating the sockaddr is arguably not the desired result anyway.
- Clear the address before copying. This fixes SIOCGIFNETMASK and possibly
other ioctls for users that don't check sa_len. This includes
COMPAT_43 and Linux emulation.
OK dyoung@
 1.230.4.1 24-Feb-2009  snj branches: 1.230.4.1.2; 1.230.4.1.4;
Pull up following revision(s) (requested by christos in ticket #459):
sys/net/if.c: revision 1.233
PR/40603: Christoph Badura: unprivileged users can add and delete interface
link addresses. Fixed by centralizing the test as suggested. Will pull up
to 5.0 once submitter tests the fix.
 1.230.4.3.2.1 08-Aug-2011  riz Pull up following revision(s) (requested by sborrill in ticket #1643):
sys/net/if.c: revision 1.243
Prevent if_detach() from crashing while it walks the routing table
to find and unlink routes that reference the detached ifnet: make
if_rt_walktree() return ERESTART whenever it has deleted a route.
Whenever rt_walktree() returns ERESTART, if_detach() restarts it.
I believe that this fix resembles one by Jonathan Kollasch or by someone
else, which has languished in a PR for too long. Sorry!
Tested by me and by Jeff Rizzo.
XXX It's supposed to be safe for rn_walktree() to apply to the routing
XXX table a routine that may delete routes. Why isn't it safe in
XXX practice?
 1.230.4.1.4.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.230.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.230.4.1.2.1 08-Aug-2011  riz Pull up following revision(s) (requested by sborrill in ticket #1643):
sys/net/if.c: revision 1.243
Prevent if_detach() from crashing while it walks the routing table
to find and unlink routes that reference the detached ifnet: make
if_rt_walktree() return ERESTART whenever it has deleted a route.
Whenever rt_walktree() returns ERESTART, if_detach() restarts it.
I believe that this fix resembles one by Jonathan Kollasch or by someone
else, which has languished in a PR for too long. Sorry!
Tested by me and by Jeff Rizzo.
XXX It's supposed to be safe for rn_walktree() to apply to the routing
XXX table a routine that may delete routes. Why isn't it safe in
XXX practice?
 1.230.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.230.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.232.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.242.4.2 05-Mar-2011  rmind sync with head
 1.242.4.1 03-Jul-2010  rmind sync with head
 1.242.2.3 06-Nov-2010  uebayasi Sync with HEAD.
 1.242.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.242.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.249.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.256.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.256.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.256.2.1 17-Apr-2012  yamt sync with head
 1.258.2.1 18-Feb-2012  mrg merge to -current.
 1.260.6.4 03-Dec-2017  jdolecek update from HEAD
 1.260.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.260.6.2 23-Jun-2013  tls resync from head
 1.260.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.260.2.1 19-Nov-2012  riz Pull up following revision(s) (requested by msaitoh in ticket #669):
sys/net/if.c: revision 1.261
Fix a bug where SIOCZIFDATA resets if_lastchange to zero.
Update if_lastchange with getnanotime().
 1.264.2.4 18-May-2014  rmind sync with head
 1.264.2.3 28-Aug-2013  rmind sync with head
 1.264.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.264.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.272.2.1 10-Aug-2014  tls Rebase.
 1.290.2.2 19-Apr-2019  martin Pull up following revision(s) via patch (requested by christos in ticket #1689):

sys/compat/linux/common/linux_socket.c: revision 1.145
sys/net/if.c: revision 1.449
sys/compat/linux32/common/linux32_socket.c: revision 1.30
sys/compat/common/uipc_syscalls_40.c: revision 1.19

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks!

-

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks! This is the compat code part
pointed out by ozaki-r@
 1.290.2.1 11-Nov-2014  martin branches: 1.290.2.1.2; 1.290.2.1.6;
Pull up following revision(s) (requested by ozaki-r in ticket #205):
sys/net/if.c: revision 1.292
PR/49373: Ryota Ozaki: Running if_clone_create and if_clone_destroy in
parallel causes panic
XXX: Pullup 7.
 1.290.2.1.6.1 19-Apr-2019  martin Pull up following revision(s) via patch (requested by christos in ticket #1689):

sys/compat/linux/common/linux_socket.c: revision 1.145
sys/net/if.c: revision 1.449
sys/compat/linux32/common/linux32_socket.c: revision 1.30
sys/compat/common/uipc_syscalls_40.c: revision 1.19

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks!

-

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks! This is the compat code part
pointed out by ozaki-r@
 1.290.2.1.2.1 19-Apr-2019  martin Pull up following revision(s) via patch (requested by christos in ticket #1689):

sys/compat/linux/common/linux_socket.c: revision 1.145
sys/net/if.c: revision 1.449
sys/compat/linux32/common/linux32_socket.c: revision 1.30
sys/compat/common/uipc_syscalls_40.c: revision 1.19

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks!

-

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks! This is the compat code part
pointed out by ozaki-r@
 1.300.2.12 28-Aug-2017  skrll Sync with HEAD
 1.300.2.11 05-Feb-2017  skrll Sync with HEAD
 1.300.2.10 05-Dec-2016  skrll Sync with HEAD
 1.300.2.9 05-Oct-2016  skrll Sync with HEAD
 1.300.2.8 09-Jul-2016  skrll Sync with HEAD
 1.300.2.7 29-May-2016  skrll Sync with HEAD
 1.300.2.6 22-Apr-2016  skrll Sync with HEAD
 1.300.2.5 19-Mar-2016  skrll Sync with HEAD
 1.300.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.300.2.3 22-Sep-2015  skrll Sync with HEAD
 1.300.2.2 06-Jun-2015  skrll Sync with HEAD
 1.300.2.1 06-Apr-2015  skrll Sync with HEAD
 1.354.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.354.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.354.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.354.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.354.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.354.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.371.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.394.2.19 17-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1576):

sys/net/if.c: revision 1.479
sys/compat/common/uipc_syscalls_40.c: revision 1.23
sys/compat/linux/common/linux_socket.c: revision 1.150
sys/compat/linux32/common/linux32_socket.c: revision 1.31

Don't accept negative value.
 1.394.2.18 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.394.2.17 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1339):

sys/net/if.c: revision 1.458
tests/net/if/t_ifconfig.sh: revision 1.21

Restore if_ioctl on error of ifc_destroy

Otherwise subsequent ioctls won't work.

Patch from Harold Gutch on PR kern/54434 (tweaked a bit by me)
tests: check if ifconfig (ioctl) works after a failure of ifconfig destroy

This is a test for PR kern/54434.
 1.394.2.16 19-Apr-2019  martin Pull up following revision(s) (requested by christos in ticket #1233):

sys/compat/linux/common/linux_socket.c: revision 1.145
sys/net/if.c: revision 1.449
sys/compat/linux32/common/linux32_socket.c: revision 1.30
sys/compat/common/uipc_syscalls_40.c: revision 1.19

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks!

-

Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
disclosure. From Andy Nguyen, many thanks! This is the compat code part
pointed out by ozaki-r@
 1.394.2.15 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifaref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use the passed ifa as is if the caller wants that.

-

Use rt_update framework on updating a rtentry
 1.394.2.14 27-Aug-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #991):

sys/net/if.c: revision 1.434

Restore splx removed accidentally at v1.406
Pointed out by k-goda@IIJ
 1.394.2.13 13-Jul-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #915):

sys/net/if.c: revision 1.424

Print "NET_MPSAFE enabled" if it's enabled.
 1.394.2.12 13-Jul-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #911):

sys/kern/init_main.c: revision 1.498
sys/rump/net/lib/libnet/net_component.c: revision 1.10
sys/net/if.h: revision 1.264
sys/net/if.c: revision 1.429

Fix missing net.inet6.ip6.ifq node

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.
sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.394.2.11 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #844):

sys/net/if.c: revision 1.425

Relax a lock check in if_mcast_op unless NET_MPSAFE

It seems that there remain some paths that don't satisfy the constraint that is
required only if NET_MPSAFE. So don't check it by default.

One known path is nd6_rtrequest => in6_addmulti => if_mcast_op, which is not
easy to address.
 1.394.2.10 15-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #826):

sys/net/if_bridge.c: revision 1.155
sys/net/if.c: revision 1.421
sys/net/bpf.c: revision 1.224
sys/net/if.c: revision 1.422
sys/net/if.c: revision 1.423

Use if_is_mpsafe (NFC)

Protect packet input routines with KERNEL_LOCK and splsoftnet
if_input, i.e., ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.
if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect
non-MP-safe code (e.g., carp_input and agr_input).

Pointed out by mlelstv@

Protect if_deferred_start_softint with KERNEL_LOCK if the interface isn't MP-safe
 1.394.2.9 28-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #595):
sys/net/if.c: revision 1.398
sys/net/rtsock.c: revision 1.231
remove useless cast, initialize family.
Avoid using a zero family mask.
 1.394.2.8 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.394.2.7 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held in these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check whether an already-queued work is being enqueued again, which is not allowed
 1.394.2.6 13-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #487):
sys/net/if.c: revision 1.417
Suppress the assertion of IFNET_LOCK in if_mcast_op if MROUTING
MROUTING doesn't deal with IFNET_LOCK yet.
Reported by kardel@
 1.394.2.5 13-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #486):
sys/net/if.c: revision 1.418
Check MP-safety in ifa_insert and ifa_remove only for IFEF_MPSAFE drivers
Eventually the assertions should pass for all drivers, however, at this point
it's too eager.
Fix PR kern/52895
 1.394.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in that case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.394.2.3 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is bad
because resource shortage really does occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return type of
if_initialize() and if_attach() to int. Caller functions can recover from the
error cleanly by checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.394.2.2 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #407):
sys/compat/linux32/common/linux32_socket.c: revision 1.28
sys/net/if.c: revision 1.400
sys/netipsec/key.c: revision 1.243
sys/compat/linux/common/linux_socket.c: revision 1.139
sys/netinet/ip_carp.c: revision 1.93
sys/netinet6/in6.c: revision 1.252
sys/netinet6/in6.c: revision 1.253
sys/netinet6/in6.c: revision 1.254
sys/net/if_spppsubr.c: revision 1.173
sys/net/if_spppsubr.c: revision 1.174
sys/compat/common/uipc_syscalls_40.c: revision 1.14
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Fix usage of FOREACH macro
key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
Fix and make consistent the usage of psz/psref in ifconf variants
Remove unnecessary goto because there is no cleanup code to share (NFC)
Tweak a condition; we don't need to care about ifacount being negative
Fix a race condition of in6_ifinit
in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.
Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
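The fix described above amounts to doing the "is this the first address?" check and the insertion inside one critical section. A minimal userspace sketch of that idea follows. It assumes a C11 atomic_flag spinlock as a stand-in for in6_ifaddr_lock; struct iface, iface_setup, iface_add_addr, and init_calls are hypothetical names used for illustration only and are not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface and its address bookkeeping. */
struct iface {
	atomic_flag lock;	/* plays the role of in6_ifaddr_lock */
	int naddrs;		/* number of addresses assigned so far */
	int init_calls;		/* how often first-address setup ran */
};

static void
iface_setup(struct iface *ifp)
{
	assert(ifp != NULL);
	atomic_flag_clear(&ifp->lock);
	ifp->naddrs = 0;
	ifp->init_calls = 0;
}

/*
 * Check "is this the first address?" and record the new address while
 * holding the same lock, so two concurrent callers cannot both observe
 * naddrs == 0.
 */
static void
iface_add_addr(struct iface *ifp)
{
	assert(ifp != NULL);
	while (atomic_flag_test_and_set(&ifp->lock))
		;	/* spin until the lock is free */
	if (ifp->naddrs == 0)
		ifp->init_calls++;	/* analogous to calling if_addr_init */
	ifp->naddrs++;			/* analogous to ifa_insert */
	atomic_flag_clear(&ifp->lock);
}
```

If the check and the increment were in separate critical sections, two callers could both see a count of zero and both run the first-address setup.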
 1.394.2.1 01-Jul-2017  snj Pull up following revision(s) (requested by roy in ticket #77):
sys/net/if.h: revision 1.240
sys/netinet/if_arp.c: revision 1.253
sys/net/if.c: revision 1.395
Introduce if_get_bylla to find an interface with the active
local link address.
--
Use if_get_bylla() instead of just looking at the lla of the interface
the address belongs to.
This allows any ARP message we received from another interface to
be correctly dropped.
While here, move the protocol length check higher up the food chain.
 1.419.2.22 22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
 1.419.2.21 18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.419.2.20 14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.419.2.19 13-Jan-2019  pgoyette Remove the HOOK2 versions of the MODULE_HOOK macros. There were
only a few uses, and using them led to some lack of clarity in the
code. Instead, we now use two separate hooks, with names that
make it clear(er) what we're doing.

This also positions us to start unraveling some of the rtsock_50
mess, which will need (at least) five hooks.
 1.419.2.18 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.419.2.17 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.419.2.16 20-Oct-2018  pgoyette Sync with head
 1.419.2.15 30-Sep-2018  pgoyette Sync with HEAD
 1.419.2.14 29-Sep-2018  pgoyette In MODULE_HOOK_CALL_DECL we don't need to provide the actual argument
list for calling the hook function, nor do we need to provide the
default value (for when the hook has not been set).
 1.419.2.13 21-Sep-2018  pgoyette Clean-up some pre-existing function-pointer code (related to if_43)
to use the new MP-safe mechanism.
 1.419.2.12 20-Sep-2018  pgoyette The uipc_syscalls_40 compat routine doesn't have a ``struct lwp *l''
argument - adjust hook parameter lists accordingly.
 1.419.2.11 20-Sep-2018  pgoyette Use the MP-safe hooks mechanism for the uipc_syscalls_40 and _50
routines.
 1.419.2.10 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.419.2.9 28-Jul-2018  pgoyette Sync with HEAD
 1.419.2.8 25-Jun-2018  pgoyette Sync with HEAD
 1.419.2.7 21-May-2018  pgoyette Sync with HEAD
 1.419.2.6 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.419.2.5 12-Apr-2018  pgoyette Merge christos's recent changes on HEAD
 1.419.2.4 08-Mar-2018  pgoyette Handle ifconf() compat vectors
 1.419.2.3 07-Mar-2018  pgoyette Remove redundant assignment
 1.419.2.2 06-Mar-2018  pgoyette Allocate and initialize the vector for compat_ifconf()
 1.419.2.1 06-Mar-2018  pgoyette Untangle some networking compat code so we can build a kernel with
networking and MODULAR, but without any actual COMPAT_* code (ie,
assuming that all the compat stuff can be added later via modules).
 1.428.2.3 21-Apr-2020  martin Sync with HEAD
 1.428.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.428.2.1 10-Jun-2019  christos Sync with HEAD
 1.457.2.4 17-Jul-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #1018):

sys/net/if.c: revision 1.479
sys/compat/common/uipc_syscalls_40.c: revision 1.23
sys/compat/linux/common/linux_socket.c: revision 1.150
sys/compat/linux32/common/linux32_socket.c: revision 1.31

Don't accept negative value.
 1.457.2.3 17-Dec-2019  martin Pull up following revision(s) (requested by christos in ticket #569):

sys/dev/usb/if_umb.c: revision 1.10
sys/net/if.c: revision 1.466
sys/dev/ic/ath.c: revision 1.129

Protect network ioctls from non-authorized users. (Ilja Van Sprundel)
 1.457.2.2 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.457.2.1 19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #98):

sys/net/if.c: revision 1.458
tests/net/if/t_ifconfig.sh: revision 1.21

Restore if_ioctl on error of ifc_destroy

Otherwise subsequent ioctls won't work.

Patch from Harold Gutch on PR kern/54434 (tweaked a bit by me)
tests: check if ifconfig (ioctl) works after a failure of ifconfig destroy

This is a test for PR kern/54434.
 1.466.2.2 29-Feb-2020  ad Sync with head.
 1.466.2.1 25-Jan-2020  ad Sync with head.
 1.473.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.484.8.1 31-May-2021  cjep sync with head
 1.484.6.2 01-Aug-2021  thorpej Sync with HEAD.
 1.484.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.528.2.2 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded loops and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns the first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.528.2.1 14-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1136):

sys/net/route.c: revision 1.238
sys/net/route.c: revision 1.239
sys/net/if.c: revision 1.535

route: do ifa_rtrequest() before rt_addaddr()

ifa_rtrequest() could change a given rtentry in the routing table.


route: lower the priority of the workqueues

PRI_SOFTNET makes the kthread of a workqueue SCHED_RR which can monopolize
a CPU if there are many rtentries to free in rt_free_work. So lower the
priority of the workqueues to PRI_USER which is the scheduling class for
time-sharing.

Also change rt_timer_wq as well just in case.


if: protect if_link_state_change_process with IFNET_LOCK

This change avoids race conditions between if_link_state_change handlers
and other operations on a target interface such as if_ioctl.
 1.529.2.1 11-Nov-2023  thorpej branches: 1.529.2.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.529.2.1.2.5 16-Nov-2023  thorpej - Rename if_transmit() -> if_transmit_default()
- In if_enqueue(), handle the ALTQ-is-enabled case by creating a sort of
chimera from ifq_put_slow() and if_transmit_default(), mainly to avoid
having to repeatedly take and release the ifq lock.
 1.529.2.1.2.4 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.529.2.1.2.3 15-Nov-2023  thorpej Protect the ALTQ state that's exposed to the ifqueue with the ifq->ifq_lock.
This requires exposing some implementation details to ALTQ, which is guarded
by an __IFQ_PRIVATE define.
 1.529.2.1.2.2 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.529.2.1.2.1 14-Nov-2023  thorpej New network interface output queue API.
 1.530.2.1 02-Aug-2025  perseant Sync with HEAD
 1.308 05-Jun-2025  ozaki-r if: remove unused ifa_ifwithaf()
 1.307 05-Jun-2025  ozaki-r if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded loops and make their intent clear.
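As a rough illustration of the kind of open-coded loop such a helper replaces, here is a minimal userspace sketch. struct addr, struct iface, and first_addr are simplified, hypothetical stand-ins; the real kernel types, and the psz/psref protection of the psref variant, are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for struct ifaddr / struct ifnet. */
struct addr {
	int family;		/* address family of this entry */
	struct addr *next;	/* next address on the same interface */
};

struct iface {
	struct addr *addrs;	/* head of the per-interface address list */
};

/*
 * Return the first address of the given family on the interface,
 * or NULL if there is none.
 */
static struct addr *
first_addr(const struct iface *ifp, int family)
{
	struct addr *a;

	assert(ifp != NULL);
	for (a = ifp->addrs; a != NULL; a = a->next) {
		if (a->family == family)
			return a;
	}
	return NULL;
}
```

Callers that previously walked the list themselves can call one function whose name states the intent.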
 1.306 22-Sep-2024  andvar s/remvoed/removed/ in comment.
 1.305 09-Oct-2023  riastradh branches: 1.305.2; 1.305.4;
net/if.h: Explain the IFF_ALLMULTI situation.

No functional change intended.
 1.304 25-Nov-2022  knakahara branches: 1.304.2;
Support explicit unnumbered interface.

Currently, NetBSD supports implicit unnumbered interfaces by setting
the same IP address on two interfaces. However, such an interface is not
treated as unnumbered when the IP address of one of the interfaces is
being changed or has been changed. That behavior can be harmful for some
routing daemons.
 1.303 24-Oct-2022  msaitoh Make ifq_drops in struct ifqueue and struct ifaltq 64 bit.
 1.302 18-Sep-2022  martin Typo in comment
 1.301 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.300 20-Aug-2022  riastradh ifnet(9): Defer if_watchdog (a.k.a. if_slowtimo) to workqueue.

This is necessary to allow mii_down and the *_init/stop routines that
call it to sleep waiting for MII callouts on other CPUs.

Mark the workqueue and callout MP-safe; only take the kernel lock
around the callback.

No kernel bump despite change to struct ifnet because the change is
ABI-compatible and using the callout outside net/if.c has never been
kosher.
 1.299 28-Jul-2022  skrll Trailing whitespace
 1.298 20-Jun-2022  yamaguchi bpf(4): added support for VLAN hardware offloading of ethernet devices
 1.297 20-Jun-2022  yamaguchi Handle frames whose VLAN ID is 0 as non-VLAN frames
even if the VLAN tag is stripped by hardware offloading
 1.296 31-Dec-2021  riastradh sys/net: New functions if_ioctl, if_init, and if_stop.

These are wrappers, suitable for inserting appropriate kasserts
regarding the API's locking contract, for the corresponding functions
in struct ifnet.

Since these are intended to commit configuration changes to the
interface, which may involve resetting the device, the caller should
hold IFNET_LOCK. However, I can't straightforwardly prove that all
callers do yet, so the assertion is disabled for now.
 1.295 30-Sep-2021  yamaguchi net: obsolete ifnet::if_link_state_changed
that was used for updating link-state of vlan I/F

The obsoleted function is replaced with
ifnet::if_linkstate_hooks
 1.294 30-Sep-2021  yamaguchi Provide a hook point called at change of link state
 1.293 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Both agr(4) and lagg(4) are not running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
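The three states listed above can be sketched as a small classifier. This is an illustrative userspace model only: struct iface is a reduced, hypothetical view of struct ifnet, classify_aggr is an invented name, and SKETCH_IFT_IEEE8023ADLAG is a local stand-in for the constant that the kernel takes from <net/if_types.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in; the kernel uses the constant from <net/if_types.h>. */
#define SKETCH_IFT_IEEE8023ADLAG 0xa1

enum aggr_state { AGGR_NONE, AGGR_AGR, AGGR_LAGG };

/* Hypothetical reduced view of struct ifnet. */
struct iface {
	void *if_lagg;	/* shared by agr(4) and lagg(4) */
	int if_type;
};

/* Map the (if_lagg, if_type) pair onto the three documented states. */
static enum aggr_state
classify_aggr(const struct iface *ifp)
{
	assert(ifp != NULL);
	if (ifp->if_lagg == NULL)
		return AGGR_NONE;	/* state 1: neither is running */
	if (ifp->if_type == SKETCH_IFT_IEEE8023ADLAG)
		return AGGR_LAGG;	/* state 3: lagg(4) */
	return AGGR_AGR;		/* state 2: agr(4) */
}
```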
 1.292 09-Aug-2021  andvar fix various typos in compatibility, mainly in comments.
 1.291 29-Jun-2021  riastradh Make if_stats_init, if_attach, if_initialize return void.

percpu_alloc can't fail.


Author: Maya Rashish <maya@NetBSD.org>
Committer: Taylor R Campbell <riastradh@NetBSD.org>
 1.290 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.289 15-Oct-2020  roy branches: 1.289.6; 1.289.8;
net: remove IFEF_NO_LINK_STATE_CHANGE

This flag was only set for virtual interfaces.
All virtual interfaces have a means of knowing if they are going to work
or not and as such now support link state changes.

If we want this flag back, it should be used as an indicator that
the interface does not support link state changes that userland can use
so it can make a decision on what to do when the link state is UNKNOWN.
 1.288 27-Sep-2020  roy bridge: When an interface joins then mark addresses on it as tentative

The exact flow is: detach addresses, join the bridge, and then mark the detached
addresses as tentative.
This ensures that Duplicate Address Detection for the joining interface
is performed across all members of the bridge.
 1.287 26-Sep-2020  roy net: Add a callback to ifnet to notify of link state changes
 1.286 26-Sep-2020  roy net: Fix the setting of if_link_state

Link state changes are not dependent on the interface being up, but we also
need to guard against more link state changes being scheduled when the
interface is being detached.

We do this by clearing the link queue but keeping if_link_scheduled = true.
We can check for this in both if_link_state_change() and
if_link_state_change_work() to abort early as there is no point in doing
anything if the interface is being detached because if_down() is called
in if_detach() after the workqueue has been drained to the same overall
effect.
 1.285 22-Sep-2020  roy ifconfig: Report link state even if media is not supported

For AF_LINK addrs from getifaddrs(2), ifa_data is struct if_data.
This in turn holds ifi_link_state which we can use to report
link status if the interface does not support media where it's normally
reported.

Based on OpenBSD.
 1.284 28-Aug-2020  ozaki-r net: introduce IFQ_ENQUEUE_ISR to assemble packet queuing routines (NFCI)
 1.283 05-May-2020  jdolecek remove struct ifnet if_mcastop, it's not used by anything
 1.282 14-Feb-2020  thorpej Remove the conditional __IF_STATS_PERCPU.
 1.281 06-Feb-2020  thorpej Perform link state change processing on a work queue, rather than in a
softint.
 1.280 01-Feb-2020  thorpej Make if_stats completely opaque to user-space.
 1.279 01-Feb-2020  thorpej Flip the switch to the per-cpu implementation in <net/if_stats.h>. Leave
the conditional in place for a time in case serious problems are discovered,
so that the Old Way can be re-enabled quickly. After some time, the Old
Way will be removed completely.
 1.278 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
 1.277 19-Sep-2019  knakahara branches: 1.277.2;
Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
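The hazard and the fix can be modelled in userspace. The sketch below is not percpu(9) itself: grow_area is a hypothetical model of "allocate a larger area, copy, destroy the old one", and struct cache and struct slot are invented stand-ins for a rtcache and its per-CPU slot. It shows that an object reached through a pointer stored in the area keeps its address when the area is replaced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload standing in for a route cache. */
struct cache {
	int value;
};

/* Toy per-CPU area entry that holds only a pointer to the payload. */
struct slot {
	struct cache *cachep;
};

/*
 * Model of storage growth: allocate a larger area, copy the old
 * contents, and destroy the old area. Pointers INTO the old area
 * become invalid; pointers STORED in it are copied intact.
 * Returns NULL (leaving the old area untouched) if allocation fails.
 */
static struct slot *
grow_area(struct slot *old, size_t oldn, size_t newn)
{
	struct slot *fresh;

	assert(old != NULL && newn >= oldn);
	fresh = calloc(newn, sizeof(*fresh));
	if (fresh == NULL)
		return NULL;
	if (oldn > 0)
		memcpy(fresh, old, oldn * sizeof(*old));
	free(old);
	return fresh;
}
```

Had struct cache been embedded directly in struct slot, any pointer to it taken before the growth would dangle afterwards, which is the situation the commit avoids.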
 1.276 13-Sep-2019  msaitoh if_flags is neither int nor short. It's unsigned short.
 1.275 10-Aug-2019  rmind Add the ifnet_t::if_npf_private field. Bump the kernel version.
Fixes PR/54098.
 1.274 04-Jul-2019  ozaki-r branches: 1.274.2;
Add support for a network interface description.

ioctl(2):
- Add SIOCGIFDESCR/SIOCSIFDESCR commands to get/set the description.

This makes it possible to attach a memo to an interface, like "Home network" or "Remote VPN".

From t-kusaba@IIJ
 1.273 24-Jun-2019  skrll Fix 'unknown' spellos
 1.272 10-May-2019  msaitoh Remove extra parentheses. No functional change.
 1.271 10-May-2019  msaitoh Add missing parentheses for IFQ_CLASSIFY macro's argument.
 1.270 10-May-2019  msaitoh Modify comment to make the data structure clear. No functional change.
 1.269 23-Mar-2019  pgoyette Replace compile-time checking for vlan code with a module hook.

Should resolve the errors reported on irc when booting a kernel which
has agr without vlan:


[ 1.0000000] WARNING: module error: built-in module if_agr can't find builtin dependency `if_vlan'
[ 1.0000000] WARNING: module error: built-in module if_agr prerequisite if_vlan failed, error 2
 1.268 05-Feb-2019  msaitoh Remove NOTRAILERS from IFFBITS.
 1.267 05-Feb-2019  msaitoh Remove very old IFF_NOTRAILERS flag.
 1.266 18-Oct-2018  knakahara Fix a panic when doing ifconfig -vlanif and then ifconfig vlanif again. Advised by ozaki-r@.

e.g. do the following commands.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlanif wm0
# ifconfig vlan0 -vlanif wm0
# ifconfig vlan0 vlan 100 vlanif wm0
====================

ATF net/if_vlan does this type of test; however, it cannot detect this bug
because shmif(4)'s ifp->if_hwdl is always NULL, as shmif(4)'s Ethernet
address has the U/L bit set.
See: https://nxr.netbsd.org/xref/src/sys/net/if_ethersubr.c#997
 1.265 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.264 03-Jul-2018  ozaki-r Fix net.inet6.ip6.ifq node doesn't exist

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.

sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.263 21-Jun-2018  knakahara branches: 1.263.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if NET_MPSAFE is not defined, it is acquired in
ipintr(); otherwise (NET_MPSAFE defined) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.262 12-Jun-2018  ozaki-r Check if ether_ifdetach is called without INET_LOCK
 1.261 01-May-2018  maxv Move if_name() from net_osdep.h to if.h. net_osdep.h is now unused and can
be removed - the other BSDs did the same.

Discussed with Kengo (if.h suggested by him).
 1.260 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.259 12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of an update of an rtentry.
 1.258 15-Jan-2018  maxv branches: 1.258.2;
Add a KASSERT in IFQ_CLASSIFY, we really need to make sure the given
mbuf is the top of the chain.
 1.257 18-Dec-2017  ozaki-r Note that IFNET_LOCK must not be held in softint
 1.256 15-Dec-2017  ozaki-r Write a guideline for converting an interface to IFEF_MPSAFE

Requested by skrll@
 1.255 15-Dec-2017  ozaki-r Describe which lock is used to protect each member variable of struct ifnet

Requested by skrll@
 1.254 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.253 11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.252 11-Dec-2017  ozaki-r Rename IFNET_LOCK to IFNET_GLOBAL_LOCK

IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
 1.251 08-Dec-2017  ozaki-r Revert "Make if_timer MP-safe if IFEF_MPSAFE"

Because it has decreased the performance of wm. And also I found that
wm_watchdog doesn't work well with if_watchdog framework at all. Sharing one
counter (if_timer) with multiple instances (hardware multi-queues) can't detect
a single (or some) stall of them because other instances reset the counter even
if the stalled one wants the watchdog to fire.

Interfaces without IFEF_MPSAFE work safely with the original if_watchdog thanks
to KERNEL_LOCK. OTOH, interfaces with IFEF_MPSAFE shouldn't use if_watchdog and
should implement their own watchdog timer that works with multiple instances.
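One way to picture a watchdog "that works with multiple instances" is a countdown per hardware queue instead of a single shared counter. The sketch below is a userspace model under that assumption; NQUEUES, struct txq_watchdog, wd_reset, and wd_tick are hypothetical names and do not come from wm(4) or any other driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NQUEUES 2	/* hypothetical number of hardware Tx queues */

/* One countdown per queue instead of a single shared if_timer. */
struct txq_watchdog {
	int timer[NQUEUES];	/* 0 means "not armed" */
};

/* Arm or re-arm the watchdog of one queue (e.g. on Tx progress). */
static void
wd_reset(struct txq_watchdog *wd, size_t q, int ticks)
{
	assert(wd != NULL && q < NQUEUES);
	wd->timer[q] = ticks;
}

/*
 * Periodic tick: count down every armed queue and report whether any
 * of them expired. A busy queue re-arming its own timer cannot hide
 * a stalled neighbour.
 */
static bool
wd_tick(struct txq_watchdog *wd)
{
	bool fired = false;
	size_t q;

	assert(wd != NULL);
	for (q = 0; q < NQUEUES; q++) {
		if (wd->timer[q] > 0 && --wd->timer[q] == 0)
			fired = true;
	}
	return fired;
}
```

With a single shared counter, the re-arm by the busy queue would have pushed the deadline out for the stalled queue as well.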
 1.250 08-Dec-2017  ozaki-r Fix build of kernels without ether

By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.

PR kern/52790
 1.249 06-Dec-2017  ozaki-r Make if_timer MP-safe if IFEF_MPSAFE

if_timer, a counter used by if_watchdog (if_slowtimo), can be modified in
if_watchdog and if_start and/or interrupt handlers of some device drivers. All
such accesses were serialized by KERNEL_LOCK. If IFEF_MPSAFE is enabled,
KERNEL_LOCK of if_start (and perhaps interrupt handlers) is omitted and if_timer
becomes racy.

Fix the race condition by protecting if_timer by a spin mutex. if_watchdog_reset
and if_watchdog_stop are introduced to ensure to take the mutex on accessing
if_timer. Interface with IFEF_MPSAFE enabled must use the functions.

In addition, if_watchdog callout is now set CALLOUT_MPSAFE if IFEF_MPSAFE. It
means that if_watchdog implemented by a driver must be MP-safe if the driver is
set IFEF_MPSAFE.

Currently the only interface with IFEF_MPSAFE that implements if_watchdog and accesses
if_timer in if_start and interrupt handlers is wm(4). wm is changed to
use the functions. (Its watchdog handler (wm_watchdog) is already MP-safe.)

These contracts will be written somewhere in a further commit.

Note that the spin mutex is now ifp->if_snd.ifq_lock to avoid adding another
spin mutex to each interface. For now reusing it isn't problematic (see the
comment to know why), though if that does matter in the future, feel free to
replace it with a new spin mutex. It's easy to do.
 1.248 06-Dec-2017  knakahara unify processing to check nesting count for some tunnel protocols.
 1.247 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock on if_up and if_down

One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
 1.246 06-Dec-2017  ozaki-r Fix locking against myself on ifpromisc

vlan_unconfig_locked could be called with holding if_ioctl_lock.
 1.245 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock when calling if_flags_set
 1.244 22-Nov-2017  ozaki-r Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE

If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.

This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.

Proposed on tech-kern@ and tech-net@
 1.243 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
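The pattern can be sketched as follows: the compile-time switch is tested in one place, and call sites use a macro that either takes the lock or expands to nothing. All names here (SKETCH_NET_MPSAFE, the SKETCH_* macros, sketch_klock and friends) are hypothetical and chosen for illustration; the toy "lock" is just a depth counter so the effect is observable in userspace.

```c
#include <assert.h>

/* Toy lock: counts how many times the "kernel lock" is currently held. */
static int sketch_klock_depth;

static void
sketch_klock(void)
{
	sketch_klock_depth++;
}

static void
sketch_kunlock(void)
{
	assert(sketch_klock_depth > 0);
	sketch_klock_depth--;
}

/* One place decides whether the lock is taken; callers stay #ifdef-free. */
#ifdef SKETCH_NET_MPSAFE
#define SKETCH_KLOCK_UNLESS_MPSAFE()	do { } while (0)
#define SKETCH_KUNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
#define SKETCH_KLOCK_UNLESS_MPSAFE()	sketch_klock()
#define SKETCH_KUNLOCK_UNLESS_MPSAFE()	sketch_kunlock()
#endif

/* Example call site: report the lock depth seen inside the section. */
static int
sketch_depth_inside(void)
{
	int depth;

	SKETCH_KLOCK_UNLESS_MPSAFE();
	depth = sketch_klock_depth;
	SKETCH_KUNLOCK_UNLESS_MPSAFE();
	return depth;
}
```

Searching for the macro names then finds every place where the lock is still taken conditionally.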
 1.242 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the same
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.241 23-Oct-2017  msaitoh if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is bad because
resource shortages really do occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic, and change the return type of if_initialize()
and if_attach() to int. Callers can then recover from the error cleanly by
checking the return value.
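A caller-side error path of the kind described might look like the sketch below. It is a userspace model, not driver code: struct softc, sketch_if_initialize, and sketch_attach are hypothetical, and the simulate_failure parameter exists only so the failure branch can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical driver state; not taken from any real driver. */
struct softc {
	void *rings;	/* resource allocated before interface setup */
	int attached;	/* nonzero once attach has fully succeeded */
};

/* Stand-in for an int-returning if_initialize(): 0 or an errno value. */
static int
sketch_if_initialize(int simulate_failure)
{
	return simulate_failure ? ENOMEM : 0;
}

static int
sketch_attach(struct softc *sc, int simulate_failure)
{
	int error;

	assert(sc != NULL);
	sc->attached = 0;
	sc->rings = malloc(64);
	if (sc->rings == NULL)
		return ENOMEM;

	error = sketch_if_initialize(simulate_failure);
	if (error != 0) {
		/* Undo what was set up so far, then report the error. */
		free(sc->rings);
		sc->rings = NULL;
		return error;
	}

	sc->attached = 1;
	return 0;
}
```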
 1.240 27-Jun-2017  roy Introduce if_get_bylla to find an interface with the active
local link address.
 1.239 19-May-2017  ozaki-r branches: 1.239.2;
Allow CARP to call the link_state_change handler immediately

If the handler is delayed because of the indirection call via softint,
some operations are executed in reverse and may cause unexpected
behaviors. For example, due to the issue a GARP packet wasn't sent on
a transition from the BACKUP state to the MASTER state; this happened
because IN_IFF_DETACHED flag wasn't cleared on arpannounce, which
had been cleared in the link_state_change handler.

This fixes an issue reported by sborrill@ on tech-net:
http://mail-index.netbsd.org/tech-net/2017/03/14/msg006283.html
 1.238 06-Apr-2017  ozaki-r Revert "Make sure to hold if_ioctl_lock when calling ifp->if_ioctl"

As per pgoyette@ and riastradh@ requests; we shouldn't decide to
hold a lock based on if the lock is held or not.
 1.237 05-Apr-2017  ozaki-r Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
 1.236 14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.235 23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.234 17-Feb-2017  ozaki-r Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.233 22-Dec-2016  ozaki-r branches: 1.233.2;
Remove assertion that the lock isn't held

It's useless in this case, because without it we can know that
the lock is held or not on a next lock acquisition and even more
if LOCKDEBUG is enabled a failure on the acquisition will provide
useful information for debugging while an assertion failure will
provide just the fact that the assertion failed.
 1.232 13-Dec-2016  ozaki-r Constify ifp of if_is_deactivated
 1.231 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.230 08-Dec-2016  ozaki-r Introduce deferred if_start framework

The framework provides a means to schedule if_start that will be executed
in softint later. It intends to be used to avoid calling if_start,
especially bpf_mtap, in hardware interrupt.

It adds a dedicated softint to a driver if the driver requests to use the
framework via if_deferred_start_init. The driver can schedule deferred
if_start by if_schedule_deferred_start.

Proposed and discussed on tech-kern and tech-net
 1.229 22-Nov-2016  ozaki-r Make lortrequest static and rename it to loop_rtrequest

No functional change.
 1.228 08-Oct-2016  joerg Since IFF_MULTICAST's value can't be represented without implicit cast
as signed short, make if_flags unsigned.
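
The representability problem can be sketched in a few lines of C. This is
only an illustration, not the kernel code: the DEMO_ names and both helper
functions are hypothetical, and the sketch assumes the usual 16-bit short.

```c
#include <assert.h>
#include <limits.h>

/* Same bit value as IFF_MULTICAST; the DEMO_ name is hypothetical. */
#define DEMO_IFF_MULTICAST 0x8000

/* Nonzero if value can be represented in a signed short. */
int
demo_fits_in_short(long value)
{
	return value >= SHRT_MIN && value <= SHRT_MAX;
}

/* Nonzero if value can be represented in an unsigned short. */
int
demo_fits_in_ushort(long value)
{
	return value >= 0 && value <= USHRT_MAX;
}
```

With a 16-bit short, demo_fits_in_short(DEMO_IFF_MULTICAST) is false while
demo_fits_in_ushort(DEMO_IFF_MULTICAST) is true, which is why an unsigned
flags field avoids the implicit cast.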
 1.227 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

A Layer 2 routine and a Layer 3 routine share data, namely ifqueue,
and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it are racy
now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.226 21-Sep-2016  roy Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
 1.225 10-Aug-2016  kre On the first day (that being the eighth day of the eighth month,) the
building was completed only to discover that within there lay havoc.

On the second day all just groaned and moaned, and it must be someone
else's problem.

On the third day, St. Martin stepped in and traced the culprit, which
provided inspiration, and a correction was made.

Forevermore all were agog at just how such a trivial thing could do
so much damage...


OK... to be a little less vague. The loopback interface is a truly
"special" thing, and rump knew that - and treated it very specially.
Unfortunately, when the loopback interface is changed, and rump does
not keep up, bad things happen.

This (overall) might, or might not, be the correct fix - but for now
it appears to work. If someone, sometime, finds a better way to
deal with the issues of the loopback interface's true majesty, feel
free to revert this and do it another way.
 1.224 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.223 01-Aug-2016  ozaki-r Revert "Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)"

netstat now uses sysctl instead of kvm(3) to get address information from
the kernel. So we can avoid the issue introduced by the reverted commit
(PR kern/51325) by updating netstat with the latest source code.
 1.222 22-Jul-2016  knakahara Toward NET_MPSAFE-on in future, if_snd uses if_snd->ifq_lock by default.

That can reduce confusing difference between NET_MPSAFE on and off.
 1.221 11-Jul-2016  ozaki-r branches: 1.221.2;
Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)

Reverting the whole change set just messes up many files uselessly
because changes to them (except for if.h) are proper.

- Remove ifa_pslist_entry that breaks kvm(3) users (e.g., netstat -ia)
- Change IFADDR_{READER,WRITER}_* macros to use old IFADDR_* (or just NOP)
for now

Fix PR kern/51325
 1.220 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.219 30-Jun-2016  ozaki-r Get rid of duplicate prototype of ifafree
 1.218 28-Jun-2016  ozaki-r Introduce if_is_deactivated

Checking ifp->if_output == if_nulloutput is too implicit.

No functional change.
 1.217 27-Jun-2016  knakahara fix spelling mistake pointed out by roy@n.o
 1.216 27-Jun-2016  knakahara reduce link state changing softint if it is not required

ok by ozaki-r@n.o
 1.215 22-Jun-2016  knakahara fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is a nop. Currently, that means
IFQ_ENQUEUE() on some paths such as bridge_enqueue() is wrongly called
in parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.214 21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.213 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object obtained
via them can disappear at any time. Thus we need to look up an ifnet
object by if_index every time to be safe.
 1.212 21-Jun-2016  ozaki-r Introduce if_index_t
 1.211 20-Jun-2016  knakahara introduce if_start_lock()

if_start_lock() calls ifp->if_start() holding KERNEL_LOCK if it is required.
 1.210 20-Jun-2016  knakahara fix: i386 build failure
 1.209 20-Jun-2016  knakahara introduce if_output_lock()

if_output_lock() calls ifp->if_output() holding KERNEL_LOCK if it is required.
 1.208 20-Jun-2016  knakahara introduce if_extflags (was if__pad1)
 1.207 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
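
The idea of storing an index instead of a pointer can be sketched as follows.
This is a single-threaded illustration only: every demo_ name is
hypothetical, the table merely stands in for ifindex2ifnet, and the
psref(9)/pserialize(9) protection that the real APIs provide is left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXIF 8	/* hypothetical table size */

struct demo_ifnet {
	unsigned index;
	const char *name;
};

/* The packet records an interface index, not an interface pointer. */
struct demo_mbuf {
	unsigned rcvif_index;
};

/* Stand-in for the interface collection indexed by if_index. */
static struct demo_ifnet *demo_index2ifnet[DEMO_MAXIF];

void
demo_if_register(struct demo_ifnet *ifp)
{
	assert(ifp->index < DEMO_MAXIF);
	demo_index2ifnet[ifp->index] = ifp;
}

void
demo_if_destroy(unsigned index)
{
	if (index < DEMO_MAXIF)
		demo_index2ifnet[index] = NULL;
}

/* Look the interface up on every use; NULL means it is already gone. */
struct demo_ifnet *
demo_m_get_rcvif(const struct demo_mbuf *m)
{
	if (m->rcvif_index >= DEMO_MAXIF)
		return NULL;
	return demo_index2ifnet[m->rcvif_index];
}
```

A packet that outlives its interface then sees NULL from the lookup rather
than following a dangling pointer.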
 1.206 16-May-2016  ozaki-r Replace ifnet_lock with if_get and if_put

ifnet_lock is a dedicated method to safely destroy an interface over running
ioctl operations. Replace it with a more generic mechanism using psref(9).
 1.205 16-May-2016  ozaki-r Introduce if_get, if_get_byindex and if_put

The new API makes it possible to obtain an ifnet object protected by psref(9).
It is intended to be used where an obtained ifnet object is used over
sleepable operations.
 1.204 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.203 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.202 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.201 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.200 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation
 1.199 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (1/3) : add altq_pktattr fields to m_pkthdr

Reviewed by joerg@n.o and tls@n.o, thanks.
 1.198 19-Feb-2016  roy Implement a queue for if_link_state_change() calls to fix a race condition
introduced in the prior patch.

The queue has capacity to store 8 link state changes, if it overflows then
the oldest state change is lost, but the oldest DOWN state change is
preserved to ensure any subsequent UP state changes reflect properly.

Because there are only 3 states to queue, the queue itself is implemented
by storing 2-bit numbers in a bigger one.
To increase the size of the queue, just increase the size of the backing
store to a bigger number.
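
A minimal sketch of such a packed queue, assuming a 16-bit backing store
(8 entries of 2 bits each). The lsq_ names and the state values 1..3 are
hypothetical, and the special handling that preserves the oldest DOWN change
on overflow is deliberately omitted; this version simply drops the oldest
entry.

```c
#include <assert.h>
#include <stdint.h>

enum {
	LSQ_BITS = 2,	/* bits per queued state */
	LSQ_MASK = 0x3,	/* mask for one entry */
	LSQ_CAP = 8	/* 16-bit store / 2 bits per entry */
};

struct lsq {
	uint16_t store;	/* oldest entry lives in the low bits */
	unsigned count;	/* number of queued entries */
};

/* Append a state (1..3); when full, the oldest entry is dropped. */
void
lsq_push(struct lsq *q, unsigned state)
{
	assert(state >= 1 && state <= LSQ_MASK);
	if (q->count == LSQ_CAP) {
		q->store >>= LSQ_BITS;
		q->count--;
	}
	q->store |= (uint16_t)(state << (q->count * LSQ_BITS));
	q->count++;
}

/* Remove and return the oldest state, or -1 if the queue is empty. */
int
lsq_pop(struct lsq *q)
{
	unsigned state;

	if (q->count == 0)
		return -1;
	state = q->store & LSQ_MASK;
	q->store >>= LSQ_BITS;
	q->count--;
	return (int)state;
}
```

Growing the queue in this sketch means switching store to a wider integer
type and raising LSQ_CAP to match.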
 1.197 16-Feb-2016  ozaki-r Remove workaround for GATEWAY

The workaround was introduced because lltable/llentry uses rwlock
but it may be executed in hardware interrupt due to fast forward.
Now we don't run fast forward in hardware interrupt anymore, so
we can remove the workaround.
 1.196 15-Feb-2016  ozaki-r Run if_link_state_change in softint

if_link_state_change can execute the network stack that is expected to
not run in hardware interrupt (at least now), however network drivers
may call it in hardware interrupt. Avoid that by introducing a new
softint for if_link_state_change.

The original patch is provided by mlelstv@ and tweaked a bit by me.

Should fix PR kern/50602.
 1.195 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.194 04-Jan-2016  ozaki-r Fix the destruction of the afdata lock

Pointed out by mlelstv@
 1.193 02-Oct-2015  ozaki-r Fix typo
 1.192 30-Sep-2015  ozaki-r Make GATEWAY (fastforward) work again

With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.

This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.

I hope fastforward runs in softint some day in the future...
 1.191 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD as well as lltable/llentry.
 1.190 18-May-2015  martin Implement SIOCIFGCLONERS for netbsd32, so ifconfig -C works.
 1.189 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.188 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.187 07-Apr-2015  roy Move in6if_do_dad() to if_do_dad() as the routine is not INET6 specific
and could equally be used by INET.
 1.186 03-Apr-2015  msaitoh Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
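
The overflow is easy to reproduce outside the kernel. In the sketch below the
DEMO_ macros are hypothetical stand-ins modeled on the IF_Gbps name; they are
not copied from if.h.

```c
#include <assert.h>
#include <stdint.h>

/* The ULL suffix forces each multiplication to be done in 64 bits. */
#define DEMO_Kbps(x)	((x) * 1000ULL)
#define DEMO_Mbps(x)	(DEMO_Kbps((x) * 1000ULL))
#define DEMO_Gbps(x)	(DEMO_Mbps((x) * 1000ULL))

/* 10 Gbit/s in bits per second, computed without overflow. */
uint64_t
demo_ten_gigabit(void)
{
	return DEMO_Gbps(10);
}

/* What survives if the same value is squeezed into 32 bits. */
uint32_t
demo_ten_gigabit_truncated(void)
{
	return (uint32_t)DEMO_Gbps(10);
}
```

10 Gbit/s is 10000000000 bits per second, which does not fit in 32 bits; the
truncated value is 1410065408.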
 1.185 16-Jan-2015  ozaki-r Remove an outdated snippet for NET_MPSAFE
 1.184 15-Dec-2014  ozaki-r Introduce if_initialize and if_register as an alternative to if_attach

if_attach initializes an ifnet object and registers it to the system
(e.g., ifnet_list), however, if_attach doesn't complete the
initialization and the rest of it will be done by if_alloc_sadl
that is normally directly called by device drivers or called via
functions like ether_ifattach. So there is a race between
if_attach and if_alloc_sadl (A half-baked ifnet object may be
accessed, for example, via ioctl between them).

The aim of this fix is to register an initializing ifnet object
after completing its initializations. To this end, this fix
separates if_attach into an initialization part (if_initialize)
and a registration part (if_register) and call the latter after
if_alloc_sadl (ether_ifattach). So a typical usage of the two
new APIs is like this:

if_initialize(ifp); // was if_attach
ether_ifattach(ifp, enaddr);
if_register(ifp);

Nonetheless, changing every driver to do so at once isn't
feasible. So we keep if_attach working as it used to be and
will change only some drivers that we need at this point.
Once we know the fix really works well, we'll change all
the others.

Some more information of the fix can be found here:
http://mail-index.netbsd.org/tech-kern/2014/12/10/msg018242.html

No objection on tech-kern and tech-net.
 1.183 02-Dec-2014  ozaki-r Revert "Pull if_drain routine out of m_reclaim"

The commit broke dlopen()'d rumpnet on platforms where ld.so does not
override weak aliases (e.g. musl, Solaris, potentially OS X, ...).

Requested by pooka@.
 1.182 01-Dec-2014  ozaki-r Make more functions static

No functional change.
 1.181 28-Nov-2014  ozaki-r branches: 1.181.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.180 27-Nov-2014  ozaki-r Pull if_drain routine out of m_reclaim

It's if-specific and should be in if.c.

No functional change.
 1.179 26-Nov-2014  ozaki-r Change if_slowtimo_ch to a pointer

One benefit of doing so is to reduce the memory used for struct callout;
we can avoid allocating struct callout for interfaces that don't
use callout.

Requested by uebayasi@.
 1.178 26-Nov-2014  ozaki-r Create if_slowtimo (if_watchdog) callout for each interface

This change is to obviate the need to run if_slowtimo callbacks that
may sleep inside IFNET_FOREACH. And also by this change we can turn
on MPSAFE of callouts individually.

Discussed with uebayasi@ and riastradh@.
 1.177 26-Nov-2014  ozaki-r Rename if_watchdog to if_slowtimo

if_watchdog callbacks do a little more than what "watchdog" suggests.

Discussed with uebayasi@ (the idea originally from openbsd-tech).
 1.176 26-Nov-2014  ozaki-r Make if_slowtimo static
 1.175 09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.174 31-Jul-2014  ozaki-r branches: 1.174.2;
Define IFADDR_FOREACH_SAFE for on-the-fly element removal in a loop

We have to use it when we purge an address element in an ifaddr loop.

This change restores the original behavior that was accidentally degraded.
 1.173 31-Jul-2014  ozaki-r Define IFNET_EMPTY() and replace !IFNET_FIRST() with it

No functional change.
 1.172 16-Jul-2014  ozaki-r Kill void * for bridge in struct ifnet

No functional change.
 1.171 14-Jul-2014  ozaki-r Make bridge MPSAFE

- Introduce BRIDGE_MPSAFE
- It's enabled only when NET_MPSAFE is defined
in if.h or the kernel config
- Add iflist and rtlist mutex locks
- Locking iflist is performance sensitive,
so it's not used when !BRIDGE_MPSAFE
- Add bif object reference counting
- It enables fine-grained locking for bridge member lists
by making it possible not to hold a lock while touching a bif
- bridge_release_member is added to decrement the
reference count
- A condition variable is added to do bridge_delete_member
gracefully
- Add if_bridgeif to ifnet
- It's a shortcut to a bif object of a bridge member
- It reduces a bif lookup cost and so lock contention on iflist
- Make bridgestp MPSAFE too
 1.170 01-Jul-2014  ozaki-r Unbreak lib/libc/net/getifaddrs.c

--- getifaddrs.o ---
In file included from /tmp/bracket/build/2014.07.01.10.35.18-i386/src/lib/libc/net/getifaddrs.c:39:0:
/tmp/bracket/build/2014.07.01.10.35.18-i386/src/sys/net/if.h:208:2: error: unknown type name 'kmutex_t'
kmutex_t *ifq_lock;
^
 1.169 01-Jul-2014  ozaki-r Lock IFQ operations when NET_MPSAFE

- Introduce NET_MPSAFE
- not defined by default
- Add ifq_lock to protect ifnet#if_snd
- Initialize ifq_lock and lock IFQ operations
when NET_MPSAFE

When NET_MPSAFE isn't defined, this modification
doesn't change its behavior and adds trivial
performance overheads.

Discussed with matt@ on tech-net
 1.168 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.167 16-Jun-2014  ozaki-r Include pktqueue.h only if _KERNEL
 1.166 16-Jun-2014  ozaki-r Move sysctl_pktq_{maxlen,count} to pktqueue.c and make them global

They will be used by bridge.

ok rmind@
 1.165 18-May-2014  rmind - Move ifnet_list (and lo0ifp while here) under #ifdef _KERNEL.
- Make ifindex2ifnet, if_indexlim and some other variables static.
- Move if_index generation into its own function.
- if_alloc/if_free: replace malloc with kmem.
 1.164 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.163 26-Apr-2014  pooka Decouple sockets linkage from interface code by making ifioctl() a pointer.
 1.162 17-Apr-2014  christos add LRO
 1.161 12-Mar-2014  pooka branches: 1.161.2;
add a mask for valid capabilities

also add a comment stating why capabilities start from 0x80
 1.160 25-Jan-2014  christos add a lint comment
 1.159 28-Oct-2013  christos add an alias for the linux name for the interface index
 1.158 05-Oct-2013  christos fix the source too, not just the doc.
 1.157 05-Oct-2013  christos Add SIOCGIFINDEX from Ty Sarna and Matthew Sporleder.
 1.156 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.155 25-Oct-2012  msaitoh branches: 1.155.2;
Move the prototype definition of ether_input() from if.h to if_ether.h.
 1.154 25-Oct-2011  dyoung branches: 1.154.2; 1.154.8; 1.154.12;
Document the ifioctl locking in comments.

Add a missing percpu_free(9) call.
 1.153 19-Oct-2011  dyoung Fix userland compilation: pull the ifioctl lock-related data members
into a struct ifnet_lock that the ifnet has a pointer to. In a
non-_KERNEL environment, don't #include <sys/percpu.h> et cetera, and
don't define the struct ifnet_lock but *do* declare it.
 1.152 19-Oct-2011  dyoung Start to untangle the ifnet ioctls mess.

Add ifnet functions, if_mcast_op(), if_flags_set(), and if_addr_init()
for adding/deleting multicast addresses, modifying the if_flags,
and initializing local/remote addresses. Make ifpromisc() use
if_flags_set(). Protocols and network drivers should use these
instead of ifp->if_ioctl() calls. Subsequent commits will
replace ifp->if_ioctl(SIOCADDMULTI| SIOCDELMULTI| SIOCSIFDSTADDR|
SIOCINITIFADDR| SIOCSIFFLAGS) calls with calls to the new functions.

Use a mutex(9) to synchronize ifp->if_ioctl() calls originating in
userland. Also synchronize ifp->if_ioctl() calls with ifnet detachment
and reclamation.
 1.151 12-Aug-2011  dyoung Declare if_free().
 1.150 01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.149 18-Jan-2011  rmind branches: 1.149.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.148 15-Nov-2010  pooka branches: 1.148.2;
Implement ifconfig linkstr as proposed on tech-net.
 1.147 20-Oct-2010  pooka Remove XXX comment with the text "going away soon". It was added
in September 1989 -- I think we passed "soon" around last week.
 1.146 17-Jan-2010  pooka branches: 1.146.2; 1.146.4;
Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *". Buys us oo times the type-safety for 0 times
the price.
(no functional change)
 1.145 05-Oct-2009  dyoung Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
 1.144 11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.143 13-Aug-2009  dyoung Use sysctl(9) to expose to userland each interface transmission
queue's maximum length, current length, and number of drops. E.g.,

% sysctl net.interfaces.bnx0
net.interfaces.bnx0.sndq.len = 0
net.interfaces.bnx0.sndq.maxlen = 509
net.interfaces.bnx0.sndq.drops = 0

Let userland adjust the maximum queue length.

While I'm here, add a 64-bit generation number, if_index_gen, to
ifnet; the pair [ifp->if_index, ifp->if_index_gen] can serve to
identify an ifnet for the lifetime of the system. I will use this
in an upcoming change.

Ok matt@.
 1.142 11-Jan-2009  christos merge christos-time_t
 1.141 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
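
The address check described here can be sketched as below. The function name
and DEMO_ETHER_ADDR_LEN are hypothetical and this is not the ether_ioctl()
code; it relies only on the fact that the low-order bit of the first octet
marks a group (multicast or broadcast) Ethernet address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETHER_ADDR_LEN 6

/* Returns 1 if addr may be assigned to an interface, 0 otherwise. */
int
demo_lladdr_acceptable(const uint8_t addr[DEMO_ETHER_ADDR_LEN])
{
	static const uint8_t zero[DEMO_ETHER_ADDR_LEN] = { 0 };

	/* Group bit set: multicast, which includes broadcast. */
	if (addr[0] & 0x01)
		return 0;
	/* 00:00:00:00:00:00 is refused as well. */
	if (memcmp(addr, zero, DEMO_ETHER_ADDR_LEN) == 0)
		return 0;
	return 1;
}
```

The address 02:de:ad:be:ef:02 from the summary passes this check, while
ff:ff:ff:ff:ff:ff and the all-zero address do not.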

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.140 24-Oct-2008  dyoung branches: 1.140.2; 1.140.8;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.139 18-Jun-2008  yamt branches: 1.139.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.138 15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.137 13-May-2008  dyoung branches: 1.137.2;
Let us call ioctl(SIOC[ADG]LIFADDR) with a link-layer address on
an AF_LINK socket, only, to be consistent with SIOC[ADG]LIFADDR
behavior on AF_INET and AF_INET6 sockets. Let us create AF_LINK
sockets for this purpose. Note that most operations on AF_LINK
sockets are not implemented.
 1.136 11-May-2008  dyoung Add kernel support for adding/removing link-layer addresses using
SIOCALIFADDR and SIOCDLIFADDR, respectively. Corresponding
ifconfig(8) changes are coming soon.
 1.135 28-Apr-2008  martin branches: 1.135.2;
Remove clause 3 and 4 from TNF licenses
 1.134 07-Feb-2008  dyoung branches: 1.134.6; 1.134.8; 1.134.10; 1.134.12;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.133 22-Jan-2008  dyoung Take two steps toward adding and deleting link-layer addresses.

1 Extract subroutine if_dl_create() from if_alloc_sadl().
if_dl_create() allocates a link-layer ifaddr.

2 Extract subroutine ifioctl_common() from ifioctl(). ifioctl_common()
will be the basis for an ifnet "superclass" whose functions
drivers may inherit. Very simple drivers may set ifnet->if_ioctl
= ifioctl_common. More sophisticated drivers will set ifnet->if_ioctl
= driver_ioctl. driver_ioctl() will call ifioctl_common() to
re-use the common code.
 1.132 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.131 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.130 06-Dec-2007  dyoung branches: 1.130.4;
Add ifa_insert() and ifa_remove() that add/remove an ifaddr to/from
an interface and increase/decrease its reference count.
 1.129 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
 1.128 05-Dec-2007  dyoung Add IFNET_FIRST(), IFNET_NEXT(), IFADDR_FIRST(), IFADDR_NEXT(),
IFADDR_EMPTY().

Call the IF{NET,ADDR}_FOREACH() macro arguments __ifp and __ifa
instead of ifp and ifa.
 1.127 13-Sep-2007  gdt branches: 1.127.6; 1.127.8;
Add a define for the ifru_space union member.

Copy the entire sockaddr to the buffer to be written to user space,
according to its length, not just the part that fits in struct
sockaddr.

This fixes the 'bad MAC address' problem in dhclient.
 1.126 02-Sep-2007  dyoung Protect userland from ifreq_getaddr() w/ #ifdef _KERNEL.
 1.125 31-Aug-2007  dyoung Per discussion on 30 May 2007 on tech-net, add accessors for
ifreq->ifr_addr, ifreq_getaddr() and ifreq_setaddr().
 1.124 29-May-2007  christos branches: 1.124.2; 1.124.6; 1.124.8;
Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.123 04-Mar-2007  christos branches: 1.123.2; 1.123.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.122 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.121 23-Nov-2006  yamt branches: 1.121.4;
implement ipv6 TSO.
partly from Matthias Scheler. tested by him.
 1.120 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.119 30-Aug-2006  christos branches: 1.119.2; 1.119.4;
fully initialize IF_CLONE_INITIALIZER
 1.118 25-Jun-2006  yamt add a comment on if_agrprivate.
 1.117 23-Jun-2006  drochner remove dependency on "agr" to make "struct ifnet" independent of the
kernel configuration, avoids kernel/userland mismatches, ok by christos
 1.116 18-May-2006  liamjfoy branches: 1.116.4;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.115 16-Mar-2006  christos branches: 1.115.2;
Remove duplicate and slightly different declaration of ether_sprintf, which
really should be in if_ether.h like all the other ether_ functions.
 1.114 11-Dec-2005  thorpej branches: 1.114.4; 1.114.6; 1.114.8; 1.114.10;
ANSI function decls and application of static.
 1.113 11-Dec-2005  christos merge ktrace-lwp.
 1.112 06-Dec-2005  christos make the ALTQ macros statement-line, by wrapping them in do {} while (0)
 1.111 27-Jul-2005  dyoung Add members ifr_buf, ifr_buflen to ifreq for specifying the location
and size of a userland buffer. The kernel shall not copyout more
than ifr_buflen bytes to ifr_buf. For future ioctls that use
ifr_buf and ifr_buflen instead of ifr_data, the kernel can return
a larger struct in the future than when the ioctl is introduced,
without breaking ABI compatibility, provided that the size, order,
and semantics of the fields at the front of the struct do not
change.
 1.110 22-Jun-2005  dyoung branches: 1.110.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.109 19-Jun-2005  peter Use 'pattr' consistently in the IFQ_* macros.
 1.108 02-May-2005  yamt split IFCAP_CSUM_xxx to IFCAP_CSUM_xxx_Rx and IFCAP_CSUM_xxx_Tx.
 1.107 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.106 20-Mar-2005  agc Fix the spelling of Bill Studenmund's name - noticed from the licences
on the Sony PSP as found in:

http://www.scei.co.jp/psp-license/pspnet.txt
 1.105 20-Mar-2005  thorpej Define IFFBITS and IFCAPBITS here in <net/if.h>. Taken from ifconfig.
 1.104 18-Mar-2005  yamt add agr(4), a pseudo network device driver for link aggregation.
 1.103 06-Mar-2005  matt Add beginning of TCP Segment Offload support.
 1.102 28-Feb-2005  jonathan Increase default value for IFQ_MAXLEN from 50 to 256.

The value of 50 dates back to 4.3BSD and 10Mbit interfaces.
Gigabit interfaces are 100x faster, and by observation, when heavy
interrupt mitigation is enabled, gigabit interfaces can enqueue 40 packets
or more in a single hardware interrupt. So IFQ_MAXLEN of 256 is adequate
for at least four gigabit interfaces.

Increasing IFQ_MAXLEN discussed and approved, in principle, circa Apr 2004.
The value is sysctl'able, so the default is no longer so critical,
but (imho) best to tune for high-performance systems by default.
 1.101 26-Feb-2005  perry nuke trailing whitespace
 1.100 24-Jan-2005  matt branches: 1.100.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.99 08-Jan-2005  yamt branches: 1.99.2;
constify broadcastaddr.
 1.98 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.97 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.96 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.95 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
 1.94 28-Nov-2003  keihan s/netbsd.org/NetBSD.org/g
 1.93 10-Nov-2003  jonathan Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
 1.92 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.91 03-Jul-2003  ragge Make IFQ_MAXLEN possible to set as an config-file option.
 1.90 29-Jun-2003  fvdl branches: 1.90.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.89 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.88 30-Apr-2003  bjh21 Expose IF_NAMESIZE for POSIX and X/Open applications.
 1.87 28-Apr-2003  bjh21 Add a new feature-test macro, _NETBSD_SOURCE. If this is defined
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defines _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.

This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.

I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.

Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
 1.86 05-Mar-2003  christos Fix the fallout from potr malloc changes
 1.85 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.84 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.83 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.82 26-Aug-2002  thorpej Fix signed/unsigned comparison warnings from GCC 3.3.
 1.81 09-Aug-2002  soren <net/if.h> needs <sys/socket.h> for struct sockaddr.
PR kern/3377 from der Mouse.
 1.80 23-Jun-2002  itojun g/c last bit of old ipv6 prefix management.
 1.79 11-Jun-2002  pooka s/splimp/splnet/ in comment
 1.78 27-May-2002  itojun re-scan all ifnet after domaininit() for if_afdata initialization.
 1.77 27-May-2002  itojun framework to add af-dependent data structure to struct ifnet.
as discussed at bsd-api-discuss. sync w/kame
 1.76 23-May-2002  matt Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing with all-numeric ifnames by treating them as an ifindex
which is used to look up the interface.
 1.75 17-Mar-2002  simonb branches: 1.75.4; 1.75.6;
Make the 'ifnet' variable an extern and declare it in if.c.
 1.74 17-Sep-2001  thorpej Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.73 14-Jun-2001  itojun branches: 1.73.2; 1.73.4;
fix comment on ifi_lastchange, for 1.4 if_data
 1.72 14-Jun-2001  itojun update comment on if_lastchange
 1.71 11-Jun-2001  wiz Fix various misspellings of compatible/compatibility.
 1.70 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.69 30-May-2001  mrg use _KERNEL_OPT
 1.68 10-Apr-2001  enami fix possible typo in comment.
 1.67 10-Apr-2001  thorpej Add a PFIL_HOOKS filtering point to every network interface.
 1.66 07-Apr-2001  thorpej ether_*() functions belong in if_ether.h, not if.h.
 1.65 17-Jan-2001  itojun branches: 1.65.2;
move forward decl of rt_addrinfo upwards.
 1.64 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.63 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.62 23-Dec-2000  thorpej Fix a silly bug in the ALTQ version of IFQ_DEQUEUE().
 1.61 18-Dec-2000  thorpej Add an "ifr_dlt" alias for the union in struct ifreq.
 1.60 18-Dec-2000  thorpej Always pull in DLT_* constants.
 1.59 18-Dec-2000  thorpej Add a if_dlt member, used so that userland can query the DLT_* of an
interface without having to first attach it to a bpfdesc.
 1.58 18-Dec-2000  thorpej Commit to the ALTQ glue.
 1.57 14-Dec-2000  thorpej Fix braino in IF_PURGE().
 1.56 14-Dec-2000  thorpej Oops, forgot IFQ_POLL() in the ALTQ case.
 1.55 13-Dec-2000  thorpej First step at integrating ALTQ -- IFQ_*() glue macros that select
old-style queueing or ALTQ based on a compile time option.
 1.54 11-Oct-2000  thorpej Change the if_reset vector to if_init, and add an if_stop. if_stop
also takes an argument indicating whether or not the interface should
also be disabled (i.e. power removed, resources freed, etc.)
 1.53 20-Jul-2000  thorpej Add a SIOCGIFCLONERS ioctl, which fetches a list of network
interface cloners from the kernel.
 1.52 04-Jul-2000  thorpej Don't allow IFF_PROMISC to be changed directly by userspace. It
interferes with the reference counting done by ifpromisc(), and is
essentially impossible to get the semantics correct if we allow this
flag to be directly toggled.

No programs should really be affected by this; IFF_PROMISC is basically
useless without bpf, anyway, and bpf still provides a way to set
promiscuous mode on an interface (which uses ifpromisc()).
 1.51 02-Jul-2000  thorpej Add the notion of "cloning" of network pseudo-interface (e.g. `gif').
This allows them to be created and destroyed on the fly via ifconfig(8),
rather than specifying the count in the kernel configuration file.
 1.50 15-May-2000  itojun branches: 1.50.4;
backout previous (packed attribute to struct ifreq)
 1.49 15-May-2000  itojun add packed attribute to struct ifreq. this should avoid unaligned access
while parsing SIOCGIFCONF, on alignment-picky archs.
 1.48 29-Mar-2000  simonb Extern the declarations of ifindex2ifnet and if_index.
 1.47 22-Mar-2000  itojun remove if_withname, which was merged in by mistake during KAME merge.
 1.46 06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.45 06-Mar-2000  kleink Make pre-1.5 compatibility structures being defined conditional on _KERNEL
as well.
 1.44 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.43 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.42 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.41 26-Oct-1999  wrstuden Up the size of the ifa_flags and ifa_refcnt from shorts to ints. Now will
deal correctly with more than 32767 routes out an interface.

Should close PR 7148 regarding problems when ifa_refcnt overflows.

Bump kernel version from 1.4L to 1.4M.
 1.40 29-Sep-1999  thorpej branches: 1.40.2; 1.40.4; 1.40.6;
const poison ifunit().
 1.39 21-Sep-1999  matt Add a ifru_value (unsigned int) as a generic value.
 1.38 03-Jul-1999  kleink Add namespace protection, using XNS5.2 D2.0 as a reference (which effectively
boils down to not making anything but the if_nameindex(3) interfaces available
to _XOPEN_SOURCE).
 1.37 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.36 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.35 27-Mar-1999  aidan branches: 1.35.2; 1.35.4; 1.35.6;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.
 1.34 10-Mar-1999  thorpej Const poison ether_ifattach().
 1.33 10-Mar-1999  thorpej Const poison ether_sprintf().
 1.32 22-May-1998  matt branches: 1.32.6;
Add an if_drain to the ifnet structure (call when the system is low
on mbufs). Add code to m_reclaim to call if_drain in each ifnet
that has one set. Remove register from declarations.
 1.31 14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.30 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.29 02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.28 08-Apr-1997  chuck branches: 1.28.4;
prevent multiple inclusions
 1.27 17-Mar-1997  thorpej BSD/OS-style network interface media selection, implemented by
Jonathan Stone and myself. Many thanks to Matt Thomas for providing
the information necessary to implement this interface, and for helping
to shake out the bugs.
 1.26 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.25 15-Jan-1997  gwr branches: 1.25.2;
fix alignment again for m68k
 1.24 13-Jun-1996  cgd branches: 1.24.2;
add an ifru_mtu member to the union in 'struct ifreq', and add a
#define so that ifr_mtu accesses that. MTU shouldn't be overloaded
with ifr_metric, if only for clarity. Adding an MTU field to the
union hurts nothing (in fact, does not actually _change_ generated
code), and does improve clarity.
 1.23 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.22 26-Feb-1996  mrg two more local addr changes, all done differently now (idea from charles)
 1.21 17-Feb-1996  pk struct ifaliasreq: adapt nomenclature to protocol specific counterparts, ie.
swap `ifra_broadaddr' and `ifra_dstaddr'.
 1.20 13-Feb-1996  christos Net prototypes
 1.19 19-Jun-1995  cgd oops; export that head definition to non-kernel code.
 1.18 19-Jun-1995  cgd define a type for the ifnet queue's head.
 1.17 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.16 07-Apr-1995  mycroft if_start and if_watchdog should return void.
 1.15 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.14 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.13 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.12 19-Oct-1994  cgd fix pr 528; don't define struct if_data inside another structure.
 1.11 26-Jul-1994  cgd kill vax code, at ragge's request.
 1.10 29-Jun-1994  cgd branches: 1.10.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 16-Feb-1994  mycroft IFF_ALLMULTI is not externally settable.
 1.7 10-Feb-1994  mycroft if_init and if_done are not actually used; no point in having them at all.
 1.6 10-Dec-1993  cgd slight fix to last
 1.5 10-Dec-1993  cgd the IFF_MULTICAST constant should always be defined. also,
move IFF_LLC* -> IFF_LINK*; they were misnamed.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.4 10-Dec-1993  cgd LLC -> LINK
 1.3.4.3 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.4.2 03-Nov-1993  mycroft if_init and if_done aren't actually used anywhere; nuke them. if_start and
if_watchdog return void.
 1.3.4.1 29-Oct-1993  mycroft Make if_reset #ifdef vax. (Note: this shifts struct ifnet; rebuild your
kernels from scratch.)
 1.10.2.1 14-Aug-1994  mycroft update from trunk (to remove ancient vax stuff)
 1.24.2.1 18-Jan-1997  thorpej Update from trunk.
 1.25.2.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.28.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.32.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.35.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.35.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.35.4.3 02-Aug-1999  thorpej Update from trunk.
 1.35.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.35.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.35.2.1 11-May-2000  he Pull up revision 1.46 (partial, via patch, requested by jhawk):
Add a driver for ``wi'', Lucent "Orinoco"/Wavelan.
 1.40.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.40.4.1 15-Nov-1999  fvdl Sync with -current
 1.40.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.40.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.40.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.40.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.40.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.50.4.1 31-Dec-2000  jhawk Pull up revision 1.51, 1.53 (requested by bouyer):
Support cloning of network pseudo-interfaces.
 1.65.2.9 11-Nov-2002  nathanw Catch up to -current
 1.65.2.8 27-Aug-2002  nathanw Catch up to -current.
 1.65.2.7 13-Aug-2002  nathanw Catch up to -current.
 1.65.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.65.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.65.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.65.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.65.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.65.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.73.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.73.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.73.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.73.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.75.6.1 01-Nov-2002  tron Pull up revision 1.76 (requested by martin in ticket #32):
Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing with all-numeric ifnames by treating them as an ifindex
which is used to look up the interface.
 1.75.4.4 29-Aug-2002  gehenna catch up with -current.
 1.75.4.3 15-Jul-2002  gehenna catch up with -current.
 1.75.4.2 20-Jun-2002  gehenna catch up with -current.
 1.75.4.1 30-May-2002  gehenna Catch up with -current.
 1.90.2.12 11-Dec-2005  christos Sync with head.
 1.90.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.90.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.90.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.90.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.90.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.90.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.90.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.90.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.90.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.90.2.2 03-Aug-2004  skrll Sync with HEAD
 1.90.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.99.2.1 29-Apr-2005  kent sync with -current
 1.100.2.2 26-Mar-2005  yamt sync with head.
 1.100.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.110.2.9 11-Feb-2008  yamt sync with head.
 1.110.2.8 04-Feb-2008  yamt sync with head.
 1.110.2.7 21-Jan-2008  yamt sync with head
 1.110.2.6 07-Dec-2007  yamt sync with head
 1.110.2.5 27-Oct-2007  yamt sync with head.
 1.110.2.4 03-Sep-2007  yamt sync with head.
 1.110.2.3 26-Feb-2007  yamt sync with head.
 1.110.2.2 30-Dec-2006  yamt sync with head.
 1.110.2.1 21-Jun-2006  yamt sync with head.
 1.114.10.1 19-Apr-2006  elad sync with head.
 1.114.8.4 03-Sep-2006  yamt sync with head.
 1.114.8.3 26-Jun-2006  yamt sync with head.
 1.114.8.2 24-May-2006  yamt sync with head.
 1.114.8.1 01-Apr-2006  yamt sync with head.
 1.114.6.2 01-Jun-2006  kardel Sync with head.
 1.114.6.1 22-Apr-2006  simonb Sync with head.
 1.114.4.1 09-Sep-2006  rpaulo sync with head
 1.115.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.116.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.119.4.1 10-Dec-2006  yamt sync with head.
 1.119.2.2 12-Jan-2007  ad Sync with head.
 1.119.2.1 18-Nov-2006  ad Sync with head.
 1.121.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.121.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.123.4.1 11-Jul-2007  mjf Sync with head.
 1.123.2.2 09-Oct-2007  ad Sync with head.
 1.123.2.1 09-Jun-2007  ad Sync with head.
 1.124.8.3 23-Mar-2008  matt sync with HEAD
 1.124.8.2 09-Jan-2008  matt sync with HEAD
 1.124.8.1 06-Nov-2007  matt sync with HEAD
 1.124.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.124.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.124.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.124.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.127.8.2 26-Dec-2007  ad Sync with head.
 1.127.8.1 08-Dec-2007  ad Sync with head.
 1.127.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.127.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.127.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.130.4.2 23-Jan-2008  bouyer Sync with HEAD.
 1.130.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.134.12.5 11-Mar-2010  yamt sync with head
 1.134.12.4 16-Sep-2009  yamt sync with head
 1.134.12.3 19-Aug-2009  yamt sync with head.
 1.134.12.2 04-May-2009  yamt sync with head.
 1.134.12.1 16-May-2008  yamt sync with head.
 1.134.10.3 17-Jun-2008  yamt sync with head.
 1.134.10.2 18-May-2008  yamt sync with head.
 1.134.10.1 19-Apr-2008  yamt Peter Postma's work-in-progress pf import from OpenBSD 4.2.
updated to -current by me.
 1.134.8.3 09-Nov-2008  christos merge with head.
 1.134.8.2 01-Nov-2008  christos Sync with head.
 1.134.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.134.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.134.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.134.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.135.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.137.2.1 18-Jun-2008  simonb Sync with head.
 1.139.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.140.8.3 24-Dec-2011  matt Make this compile if COMPAT_14 is defined.
 1.140.8.2 13-May-2010  matt Add a spare int field to ifa_msghdr so its length is a multiple of 8.
 1.140.8.1 11-May-2010  matt A few changes that make the route interface and related sysctls 32/64 bit
independent so the netbsd32 userland can use them.
 1.140.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.146.4.1 05-Mar-2011  rmind sync with head
 1.146.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.148.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.149.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.154.12.3 03-Dec-2017  jdolecek update from HEAD
 1.154.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.154.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.154.8.1 16-Apr-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #1289):
sys/net/if.h: revision 1.186
Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
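The overflow this pull-up addresses can be illustrated with a minimal sketch. The macro names GBPS_32 and GBPS_64 below are hypothetical stand-ins, not the definitions from sys/net/if.h, and the narrow variant uses unsigned 32-bit arithmetic so that the wraparound is well defined in C rather than undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins, not the definitions from sys/net/if.h.
 * GBPS_32 keeps every operand 32 bits wide; GBPS_64 widens with 1000ULL.
 */
#define GBPS_32(x) ((uint32_t)((uint32_t)(x) * 1000u * 1000u * 1000u))
#define GBPS_64(x) ((x) * 1000ULL * 1000ULL * 1000ULL)

/* 10 Gbit/s computed in 32-bit unsigned arithmetic wraps modulo 2^32. */
static uint32_t
ten_gbps_narrow(void)
{
	return GBPS_32(10);
}

/* The ULL suffix promotes the whole product to at least 64 bits. */
static uint64_t
ten_gbps_wide(void)
{
	return GBPS_64(10);
}
```

With 32-bit operands the product 10 * 10^9 does not fit, so the narrow result is 10^10 modulo 2^32; the widened form holds the full value.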
 1.154.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.154.2.1 30-Oct-2012  yamt sync with head
 1.155.2.3 18-May-2014  rmind sync with head
 1.155.2.2 28-Aug-2013  rmind sync with head
 1.155.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.161.2.1 10-Aug-2014  tls Rebase.
 1.174.2.1 16-Apr-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #693):
sys/net/if.h: revision 1.186
Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
 1.181.2.12 28-Aug-2017  skrll Sync with HEAD
 1.181.2.11 05-Feb-2017  skrll Sync with HEAD
 1.181.2.10 05-Dec-2016  skrll Sync with HEAD
 1.181.2.9 05-Oct-2016  skrll Sync with HEAD
 1.181.2.8 09-Jul-2016  skrll Sync with HEAD
 1.181.2.7 29-May-2016  skrll Sync with HEAD
 1.181.2.6 22-Apr-2016  skrll Sync with HEAD
 1.181.2.5 19-Mar-2016  skrll Sync with HEAD
 1.181.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.181.2.3 22-Sep-2015  skrll Sync with HEAD
 1.181.2.2 06-Jun-2015  skrll Sync with HEAD
 1.181.2.1 06-Apr-2015  skrll Sync with HEAD
 1.221.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.221.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.221.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.221.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.221.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.221.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.233.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.239.2.8 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.239.2.7 13-Jul-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #911):

sys/kern/init_main.c: revision 1.498
sys/rump/net/lib/libnet/net_component.c: revision 1.10
sys/net/if.h: revision 1.264
sys/net/if.c: revision 1.429

Fix net.inet6.ip6.ifq node doesn't exist

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.
sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.239.2.6 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if NET_MPSAFE is not defined it is
acquired in ipintr(); otherwise (NET_MPSAFE defined) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.239.2.5 14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of a rtentry.
 1.239.2.4 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.239.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is generally harmless, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
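The conditional-locking pattern described in this pull-up can be sketched in userland as follows. All names here (big_lock, BIG_LOCK_UNLESS_MPSAFE, SKETCH_NET_MPSAFE, struct iface) are hypothetical stand-ins chosen for illustration, and a simple counter stands in for the kernel lock; this is an assumption-laden sketch of the idea, not the macros from if.h.

```c
#include <assert.h>

/* A counter stands in for the big kernel lock in this userland sketch. */
static int big_lock_depth;

static void big_lock(void)   { big_lock_depth++; }
static void big_unlock(void) { big_lock_depth--; }

/* Hide the build-time switch behind macros instead of scattering #ifdefs. */
#ifdef SKETCH_NET_MPSAFE
#define BIG_LOCK_UNLESS_MPSAFE()	do { } while (0)
#define BIG_UNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
#define BIG_LOCK_UNLESS_MPSAFE()	big_lock()
#define BIG_UNLOCK_UNLESS_MPSAFE()	big_unlock()
#endif

/* Hypothetical interface with a single per-interface MP-safe flag. */
struct iface {
	int mpsafe;
};

/* Build-time variant: returns the lock depth seen inside the section. */
static int
stack_path_depth(void)
{
	int seen;

	BIG_LOCK_UNLESS_MPSAFE();
	seen = big_lock_depth;
	BIG_UNLOCK_UNLESS_MPSAFE();
	return seen;
}

/* Run-time variant: take the lock only for interfaces not flagged MP-safe. */
static int
iface_ioctl_depth(const struct iface *ifp)
{
	int seen;

	if (!ifp->mpsafe)
		big_lock();
	seen = big_lock_depth;
	if (!ifp->mpsafe)
		big_unlock();
	return seen;
}
```

In this sketch an interface without the flag runs its ioctl path with the lock held, while a flagged interface runs without it and must protect any non-MP-safe callees itself.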
 1.239.2.2 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, they panic. That is bad because
resource shortages really do occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. The caller can then recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
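The error-handling convention introduced by this pull-up (an initialization routine that returns int, and an attach function that frees its resources and returns on failure) can be sketched as below. The types and functions are hypothetical userland stubs invented for illustration; they are not the kernel's if_initialize() or any real driver's attach routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an interface and a driver softc. */
struct ifnet_stub {
	int initialized;
};

struct softc_stub {
	struct ifnet_stub ifp;
	void *dma;		/* some resource acquired before initialization */
};

/* Stub initializer: reports failure through its return value, no panic. */
static int
fake_if_initialize(struct ifnet_stub *ifp, int simulate_failure)
{
	if (simulate_failure)
		return ENOMEM;
	ifp->initialized = 1;
	return 0;
}

/* Attach path: on failure, release what was acquired and return the error. */
static int
example_attach(struct softc_stub *sc, int simulate_failure)
{
	int error;

	sc->dma = malloc(64);
	if (sc->dma == NULL)
		return ENOMEM;

	error = fake_if_initialize(&sc->ifp, simulate_failure);
	if (error != 0) {
		free(sc->dma);
		sc->dma = NULL;
		return error;
	}
	return 0;
}
```

The caller decides what to do with the error, which is what lets creation of many pseudo interfaces fail cleanly instead of bringing the system down.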
 1.239.2.1 01-Jul-2017  snj Pull up following revision(s) (requested by roy in ticket #77):
sys/net/if.h: revision 1.240
sys/netinet/if_arp.c: revision 1.253
sys/net/if.c: revision 1.395
Introduce if_get_bylla to find an interface with the active
local link address.
--
Use if_get_bylla() instead of just looking at the lla of the interface
the address belongs to.
This allows any ARP message we received from another interface to
be correctly dropped.
While here, move the protocol length check higher up the food chain.
 1.258.2.15 20-Oct-2018  pgoyette Sync with head
 1.258.2.14 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.258.2.13 28-Jul-2018  pgoyette Sync with HEAD
 1.258.2.12 25-Jun-2018  pgoyette Sync with HEAD
 1.258.2.11 02-May-2018  pgoyette Synch with HEAD
 1.258.2.10 22-Apr-2018  pgoyette Sync with HEAD
 1.258.2.9 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.258.2.8 08-Mar-2018  pgoyette Handle ifconf() compat vectors
 1.258.2.7 06-Mar-2018  pgoyette Declare it correctly
 1.258.2.6 06-Mar-2018  pgoyette Declare the compat_ifconf vector, not the stub.
 1.258.2.5 06-Mar-2018  pgoyette And we need the oifreq definition here, too
 1.258.2.4 06-Mar-2018  pgoyette Better to add these required headers closer to where they're needed
 1.258.2.3 06-Mar-2018  pgoyette And another required header
 1.258.2.2 06-Mar-2018  pgoyette Include necessary header
 1.258.2.1 06-Mar-2018  pgoyette Move indirect function call vectors to if.h where they can be
found by the code that manipulates them.
 1.263.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.263.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.263.2.1 10-Jun-2019  christos Sync with HEAD
 1.274.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
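The fix repeated throughout this pull-up, keeping only a pointer in the per-CPU storage rather than the object itself, can be illustrated with a userland analogy. Nothing below is the percpu(9) API: struct area, area_grow() and struct cache are hypothetical names, and a reallocated array stands in for per-CPU storage that is replaced when it grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userland analogy only: none of these names are NetBSD APIs. */
struct cache {
	int hits;
};

/* A growable area whose slots hold pointers to objects, not the objects. */
struct area {
	struct cache **slots;
	size_t nslots;
};

/* Growing the area copies the slots and destroys the old storage. */
static int
area_grow(struct area *a, size_t n)
{
	struct cache **fresh = calloc(n, sizeof(*fresh));

	if (fresh == NULL)
		return -1;
	if (a->nslots > 0)
		memcpy(fresh, a->slots, a->nslots * sizeof(*fresh));
	free(a->slots);
	a->slots = fresh;
	a->nslots = n;
	return 0;
}

/* Returns 1 if an object pointer fetched before growth stays usable after. */
static int
pointer_survives_growth(void)
{
	struct area a = { NULL, 0 };
	struct cache *c;
	int ok;

	if (area_grow(&a, 1) != 0)
		return 0;
	a.slots[0] = calloc(1, sizeof(struct cache));
	if (a.slots[0] == NULL) {
		free(a.slots);
		return 0;
	}
	c = a.slots[0];			/* fetch the pointer out of the slot */
	if (area_grow(&a, 64) != 0) {	/* the slot storage moves here */
		free(c);
		free(a.slots);
		return 0;
	}
	c->hits++;			/* still valid: object lives elsewhere */
	ok = (a.slots[0] == c) && (c->hits == 1);
	free(c);
	free(a.slots);
	return ok;
}
```

Had the struct cache been embedded directly in the slot array, the pointer taken before the growth would refer to freed memory afterwards, which is the hazard the commit messages describe for code that sleeps while holding such a reference.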
 1.277.2.1 29-Feb-2020  ad Sync with head.
 1.289.8.1 31-May-2021  cjep sync with head
 1.289.6.2 01-Aug-2021  thorpej Sync with HEAD.
 1.289.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.304.2.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded instances and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.305.4.1 02-Aug-2025  perseant Sync with HEAD
 1.305.2.1 11-Nov-2023  thorpej branches: 1.305.2.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.305.2.1.2.5 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.305.2.1.2.4 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.305.2.1.2.3 15-Nov-2023  thorpej Protect the ALTQ state that's exposed to the ifqueue with the ifq->ifq_lock.
This requires exposing some implementation details to ALTQ, which is guarded
by an __IFQ_PRIVATE define.
 1.305.2.1.2.2 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.305.2.1.2.1 14-Nov-2023  thorpej New network interface output queue API.
 1.1 11-Dec-1998  kenh branches: 1.1.2;
file if_alloc.h was initially added on branch kenh-if-detach.
 1.1.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.23 23-Oct-2017  msaitoh If if_attach() failed in the attach function, return.
 1.22 20-Feb-2008  matt branches: 1.22.54; 1.22.90;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.21 25-Dec-2007  he Convert to using if_set_sadl() instead of arc_storelladdr(), catching
an overlooked setting of ifnet->if_sadl. This follows up the recent
change to net/if.h.
 1.20 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.19 14-Dec-2005  christos branches: 1.19.46; 1.19.52; 1.19.56; 1.19.60;
argument type conflict.
 1.18 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.17 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.16 26-Feb-2005  perry branches: 1.16.4;
nuke trailing whitespace
 1.15 07-Aug-2003  agc branches: 1.15.8; 1.15.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.14 13-May-2002  matt branches: 1.14.10;
Eliminate common.
 1.13 19-Nov-1999  thorpej branches: 1.13.6; 1.13.8;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.12 25-Sep-1999  is branches: 1.12.2; 1.12.8;
Decouple IP mtu for ARCnet devices from interface MTU.
This is important, because for most protocols, link level fragmentation is
used, but with different default effective MTUs. (e.g.: IPv4 default MTU
is 1500 octets, IPv6 default MTU is 9072 octets).
 1.11 27-Aug-1999  is Factor out arc_storelladdr(), and use that instead of arc_ifattach() in
the bah_reset() function.
This makes the last change work without disconnecting all the other interfaces
from the interface list.
 1.10 20-May-1999  thorpej Oops, commit here slipped through the cracks.
 1.9 25-Feb-1999  is branches: 1.9.4;
So... after all, the ATA878.2 copy I had was buggy. The newer revision has
this fixed in the figures (but still not in the text); anyway, the intention
of the ATA is that this is identical to the PHDS specification.
Remove the ...EXC_8782 constant, and change the _EXC_1201 constant to be
a simple ...EXC.
 1.8 16-Jan-1999  is - define protocol type for diagnostics (0x80 as per ANSI 878.1)
- define protocol type for IP version 6
- define length of exceptional length packets for both RFC 1201-style and
ATA 878.2-style fragmentation.
 1.7 09-Feb-1998  perry branches: 1.7.6;
add multiple inclusion protection (and cleanup).
 1.6 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.5 07-Jun-1995  cgd branches: 1.5.8;
update from Ignatios Souvatzis
 1.4 14-Apr-1995  chopps update arc_input() proto to match reality.
 1.3 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.2 02-Mar-1995  chopps add prototypes
 1.1 23-Feb-1995  glass preliminary arcnet support. uses lame but RFC address resolution
 1.5.8.2 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.5.8.1 08-Feb-1997  is Extinguish the link level address from struct arccom, too.
XXX Todo: change this in the hardware driver.
 1.7.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.9.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.12.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.8.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.6.1 20-Jun-2002  nathanw Catch up to -current.
 1.14.10.5 11-Dec-2005  christos Sync with head.
 1.14.10.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.14.10.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.10.2 18-Sep-2004  skrll Sync with HEAD.
 1.14.10.1 03-Aug-2004  skrll Sync with HEAD
 1.15.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.15.8.1 29-Apr-2005  kent sync with -current
 1.16.4.3 27-Feb-2008  yamt sync with head.
 1.16.4.2 21-Jan-2008  yamt sync with head
 1.16.4.1 21-Jun-2006  yamt sync with head.
 1.19.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.19.56.1 26-Dec-2007  ad Sync with head.
 1.19.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.19.46.2 23-Mar-2008  matt sync with HEAD
 1.19.46.1 09-Jan-2008  matt sync with HEAD
 1.22.90.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating a softint). Without this change, the kernel panics. That is bad
because resource shortages really do occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. Caller functions can then recover from
the error cleanly by checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.22.54.1 03-Dec-2017  jdolecek update from HEAD
 1.87 21-Sep-2025  christos Centralize all the "can't handle af%d\n", messages in one place and provide
more context. Now I get ad-nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.86 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.85 03-Sep-2022  thorpej branches: 1.85.8; 1.85.10;
Garbage-collect the remaining vestiges of netisr.
 1.84 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.83 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.82 28-Aug-2020  ozaki-r branches: 1.82.6;
net: introduce IFQ_ENQUEUE_ISR to assemble packet queuing routines (NFCI)
 1.81 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.80 09-May-2018  maxv branches: 1.80.2; 1.80.8;
Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.79 26-Apr-2018  maxv m_copy -> m_copym
 1.78 23-Oct-2017  msaitoh branches: 1.78.2;
If if_attach() failed in the attach function, return.
 1.77 14-Feb-2017  ozaki-r branches: 1.77.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.76 24-Jan-2017  maxv Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.75 11-Jan-2017  ozaki-r branches: 1.75.2;
Get rid of unnecessary header inclusions
 1.74 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.73 28-Apr-2016  ozaki-r branches: 1.73.2;
Constify remaining rtentry of if_output (fix build)
 1.72 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.71 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.70 09-Feb-2016  ozaki-r Fix build
 1.69 13-Oct-2015  roy arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
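
A minimal sketch of the masking described above, assuming a hypothetical helper
`sketch_mask_arp_error()` (the actual callers are not shown in this log).
EWOULDBLOCK is turned into success, on the assumption that it only signals that
the packet is being held while resolution is in progress; any other error code
is passed up unchanged.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate the result of an arpresolve()-style call
 * into the value an output routine hands back to its own caller.
 */
static int
sketch_mask_arp_error(int arp_error)
{
	assert(arp_error >= 0);		/* 0 on success, otherwise an errno value */

	/* Assumption: "would block" is not an error worth reporting upward. */
	if (arp_error == EWOULDBLOCK)
		return 0;
	return arp_error;
}
```

With this shape, an error such as EHOSTUNREACH can reach an application like
ping(8), while the transient "still resolving" case stays invisible.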
 1.68 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.67 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigation of the issue there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.66 05-Jun-2014  rmind branches: 1.66.2; 1.66.4; 1.66.6; 1.66.8;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.65 15-May-2014  msaitoh Put schednetisr() into splnet()/splx() pair.
This might avoid a delay in processing a packet.
 1.64 24-Sep-2012  msaitoh branches: 1.64.2; 1.64.10;
Add missing "\n" in log(9)
 1.63 05-Apr-2010  joerg branches: 1.63.8; 1.63.14; 1.63.18; 1.63.20;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.62 19-Jan-2010  pooka branches: 1.62.2; 1.62.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.61 20-Nov-2009  christos ar_tha() can return NULL; treat this as an error.
 1.60 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}
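
As a compilable instance of the pattern above, here is a sketch under stated
assumptions. The flag values are defined locally under `SK_` names so the
example builds without <net/if.h>, and the action chosen in each case (nothing,
stop, init, reload) is an assumption for illustration, since the commit message
elides the case bodies.

```c
#include <assert.h>

/* Local stand-ins for IFF_UP and IFF_RUNNING (values assumed for this sketch). */
#define SK_IFF_UP	0x0001u
#define SK_IFF_RUNNING	0x0040u

/* Hypothetical actions a driver might take; not taken from any real driver. */
enum sk_action {
	SK_ACT_NONE,	/* down and stopped: nothing to do */
	SK_ACT_STOP,	/* marked down but still running: stop it */
	SK_ACT_INIT,	/* marked up but not running: start it */
	SK_ACT_RELOAD	/* up and running: reprogram h/w state */
};

static enum sk_action
sk_flags_action(unsigned int if_flags)
{
	/* Mask first, so unrelated flag bits cannot affect the dispatch. */
	switch (if_flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_ACT_NONE;
	case SK_IFF_RUNNING:
		return SK_ACT_STOP;
	case SK_IFF_UP:
		return SK_ACT_INIT;
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_ACT_RELOAD;
	}
	assert(!"all four combinations are handled above");
	return SK_ACT_NONE;
}
```

The switch form makes it visible at a glance that all four combinations are
covered, which is harder to verify in a chain of if-else clauses.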

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
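
A check of this kind could look like the following sketch. The function name
`sketch_lladdr_ok()` and the length constant are hypothetical and not taken
from ether_ioctl(); the test relies on the Ethernet convention that the least
significant bit of the first octet marks a group (multicast or broadcast)
address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHER_ADDR_LEN 6

/* Return true if addr is usable as a unicast link-layer address. */
static bool
sketch_lladdr_ok(const uint8_t *addr)
{
	static const uint8_t zero[SKETCH_ETHER_ADDR_LEN] = { 0 };

	assert(addr != NULL);

	/* Group bit set: multicast, which also covers ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* 00:00:00:00:00:00 is refused as well. */
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;

	return true;
}
```

The address from the summary's ifconfig example, 02:de:ad:be:ef:02, passes
this check because its first octet has the group bit clear.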

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.59 20-Feb-2008  matt branches: 1.59.6; 1.59.10; 1.59.16; 1.59.18; 1.59.20; 1.59.22; 1.59.24;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.58 25-Dec-2007  he Convert to using if_set_sadl() instead of arc_storelladdr(), catching
an overlooked setting of ifnet->if_sadl. This follows up the recent
change to net/if.h.
 1.57 19-Oct-2007  ad branches: 1.57.2; 1.57.4; 1.57.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.56 30-Aug-2007  dyoung branches: 1.56.4;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.55 19-Feb-2007  dyoung branches: 1.55.4; 1.55.12; 1.55.16; 1.55.18;
Remove unused #define SIN(). From he@.
 1.54 19-Feb-2007  dyoung Fix fallout from if_output constification. Thanks, Havard Eidnes,
for reporting the problem and testing my patch.
 1.53 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.52 07-Jun-2006  kardel branches: 1.52.12;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.51 11-Dec-2005  thorpej branches: 1.51.4; 1.51.6; 1.51.8; 1.51.14;
ANSI function decls and application of static.
 1.50 11-Dec-2005  christos merge ktrace-lwp.
 1.49 05-Jun-2005  he branches: 1.49.2;
Fix -Wcast-qual warning.
 1.48 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
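
For reference, the indexing trick looks like this when the digit string is
defined once and shared. This is a userland sketch; `sketch_hexdigits` and
`sketch_hexdump()` are hypothetical names, not the HEXDIGITS/hexdigits
definitions in subr_prf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One shared table instead of a string literal repeated at every use. */
static const char sketch_hexdigits[] = "0123456789abcdef";

/*
 * Write the lowercase hex form of buf[0..len) into out, NUL-terminated.
 * out must have room for 2 * len + 1 bytes.
 */
static void
sketch_hexdump(char *out, const uint8_t *buf, size_t len)
{
	assert(out != NULL);

	for (size_t i = 0; i < len; i++) {
		*out++ = sketch_hexdigits[buf[i] >> 4];		/* high nibble */
		*out++ = sketch_hexdigits[buf[i] & 0x0f];	/* low nibble */
	}
	*out = '\0';
}
```

Each nibble is in the range 0 to 15, so the index never leaves the sixteen
digit characters of the table.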
 1.47 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.46 26-Feb-2005  perry nuke trailing whitespace
 1.45 25-Mar-2004  is branches: 1.45.8; 1.45.10;
UCB no longer requires the advertising clause.
 1.44 11-Aug-2003  itojun minor knf
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 02-May-2003  itojun branches: 1.42.2;
KNF
 1.41 19-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.40 11-Sep-2002  itojun KNF - return is not a function.
 1.39 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.38 12-Nov-2001  lukem add RCSIDs
 1.37 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.36 14-Jun-2001  itojun branches: 1.36.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.35 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.34 17-Jan-2001  thorpej branches: 1.34.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.33 18-Dec-2000  thorpej Fill in if_dlt.
 1.32 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.31 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.30 30-Mar-2000  augustss Kill some more register declarations.
 1.29 20-Dec-1999  frueauf Make this compile again:
NEWIP6OUTPUT is no longer defined; reverse the logic to use OLDIP6OUTPUT.
 1.28 25-Sep-1999  is branches: 1.28.2; 1.28.8;
Decouple IP mtu for ARCnet devices from interface MTU.
This is important, because for most protocols, link level fragmentation is
used, but with different default effective MTUs. (e.g.: IPv4 default MTU
is 1500 octets, IPv6 default MTU is 9072 octets).
 1.27 19-Sep-1999  is Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.26 29-Aug-1999  is Move the mtu initialization to arc_storelladdr, so that it will be upped
again when switching link0 on.
XXX This stuff needs to be thought about, especially with the doomming IPv6
support, which uses yet another default mtu.
 1.25 27-Aug-1999  is Don't assume PHDS encoding for DIAGNOSE packets... we have to pass them
raw, if used at all.
 1.24 27-Aug-1999  is Factor out arc_storelladdr(), and use that instead of arc_ifattach() in
the bah_reset() function.
This makes the last change work without disconnecting all the other interfaces
from the interface list.
 1.23 26-Aug-1999  is Only use ifp->if_addrlen after initializing it.
Problem detected by Andreas Johansson.
 1.22 26-Aug-1999  is Eliminate a function call... we know it's exactly one byte here
 1.21 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.20 25-Feb-1999  is branches: 1.20.4;
So... after all, the ATA878.2 copy I had was buggy. The newer revision has
this fixed in the figures (but still not in the text); anyway, the intention
of the ATA is that this is identical to the PHDS specification.
Remove the ...EXC_8782 constant, and change the _EXC_1201 constant to be
a simple ...EXC.
 1.19 16-Jan-1999  is Yet another performance optimization for exceptional length ARCnet packets.
This time in the receive path.
 1.18 16-Jan-1999  is Make the code path for exceptional length packets a bit faster (2 mbuf
operations less) and better readable.
 1.17 05-Jul-1998  jonathan branches: 1.17.6;
defopt INET, NETATALK.
 1.16 02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.15 23-Mar-1997  is branches: 1.15.4;
Fix several bugs related to the new ARP code, and ARCnet ARP support.
Among other things, add the ARPHRD_ARCNET definition, make sure the hardware type is
set on outgoing ARP packets, and make sure we don't send out replies as broadcasts.
 1.14 17-Mar-1997  is Make this compile on port-amiga. Bug report by Bernd Ernesti.
 1.13 16-Mar-1997  is move if_arc.h to sys/net
 1.12 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.11 13-Oct-1996  christos branches: 1.11.4;
backout previous kprintf change
 1.10 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.9 02-Sep-1996  is Add IP multicast support as per RFC 1122 section 3.3.7 to ARCnet.
"The mapping of IP Class D addresses to local addresses is
currently specified for the following types of networks:
[...]
o Any network that supports broadcast but not multicast,
addressing: all IP Class D addresses map to the local
broadcast address."
 1.8 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.7 15-Apr-1996  is Don't even check the not-yet-initialized mbuf pointers for being !=
NULL in the error exit code of arc_output(), else we see random data
and try to m_freem() it, panic'ing the machine.
 1.6 24-Dec-1995  mycroft Various cleanup, mostly by me, submitted by Ignatios Souvatzis.
 1.5 12-Jul-1995  cgd branches: 1.5.2;
fix struct member use, as explained in pr 1164. style police
beat the fix into submission.
 1.4 07-Jun-1995  cgd update from Ignatios Souvatzis
 1.3 14-Apr-1995  chopps change args to arc_input also add check on link address which fixes pr#922. from Ignatios Souvatzis <is@beverly.rhein.de>
 1.2 11-Apr-1995  mycroft Remove some explicit references to loif.
 1.1 23-Feb-1995  glass preliminary arcnet support. uses lame but RFC address resolution
 1.5.2.1 15-Apr-1996  is Fix a bug in the HI part of the ARCnet driver, which would cause the
kernel to panic if the IP layer tried to output at the time the
interface was ifconfig'd down.
 1.11.4.4 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.11.4.3 11-Feb-1997  is Oops, forgot some usages of ((struct arccom *)ifp)->ac_anaddr
 1.11.4.2 08-Feb-1997  is Extinguish the link level address from struct arccom, too.
XXX Todo: change this in the hardware driver.
 1.11.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.15.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.17.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.20.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.28.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.28.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.28.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.28.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.34.2.5 17-Sep-2002  nathanw Catch up to -current.
 1.34.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.34.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.34.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.34.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.36.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.36.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.36.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.42.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.42.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1 03-Aug-2004  skrll Sync with HEAD
 1.45.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.45.8.1 29-Apr-2005  kent sync with -current
 1.49.2.6 27-Feb-2008  yamt sync with head.
 1.49.2.5 21-Jan-2008  yamt sync with head
 1.49.2.4 27-Oct-2007  yamt sync with head.
 1.49.2.3 03-Sep-2007  yamt sync with head.
 1.49.2.2 26-Feb-2007  yamt sync with head.
 1.49.2.1 21-Jun-2006  yamt sync with head.
 1.51.14.1 19-Jun-2006  chap Sync with head.
 1.51.8.1 26-Jun-2006  yamt sync with head.
 1.51.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.51.4.1 09-Sep-2006  rpaulo sync with head
 1.52.12.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.55.18.3 23-Mar-2008  matt sync with HEAD
 1.55.18.2 09-Jan-2008  matt sync with HEAD
 1.55.18.1 06-Nov-2007  matt sync with HEAD
 1.55.16.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.55.16.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.55.12.1 03-Sep-2007  skrll Sync with HEAD.
 1.55.4.2 23-Oct-2007  ad Sync with head.
 1.55.4.1 09-Oct-2007  ad Sync with head.
 1.56.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.57.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.57.4.1 26-Dec-2007  ad Sync with head.
 1.57.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.59.24.1 21-Apr-2010  matt sync to netbsd-5
 1.59.22.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.59.20.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.59.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.59.16.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.59.10.3 11-Aug-2010  yamt sync with head.
 1.59.10.2 11-Mar-2010  yamt sync with head
 1.59.10.1 04-May-2009  yamt sync with head.
 1.59.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.62.4.1 30-May-2010  rmind sync with head
 1.62.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.63.20.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.63.18.3 03-Dec-2017  jdolecek update from HEAD
 1.63.18.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.63.18.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.63.14.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.63.14.1 23-Oct-2012  riz branches: 1.63.14.1.2;
Pull up following revision(s) (requested by msaitoh in ticket #616):
sys/netinet/if_atm.c: revision 1.33
sys/net/if_arcsubr.c: revision 1.64
sys/netinet/ip_mroute.c: revision 1.126
Add missing "\n" in log(9)
 1.63.14.1.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.63.8.1 30-Oct-2012  yamt sync with head
 1.64.10.1 10-Aug-2014  tls Rebase.
 1.64.2.1 18-May-2014  rmind sync with head
 1.66.8.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.66.6.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.66.4.9 28-Aug-2017  skrll Sync with HEAD
 1.66.4.8 05-Feb-2017  skrll Sync with HEAD
 1.66.4.7 05-Oct-2016  skrll Sync with HEAD
 1.66.4.6 29-May-2016  skrll Sync with HEAD
 1.66.4.5 22-Apr-2016  skrll Sync with HEAD
 1.66.4.4 19-Mar-2016  skrll Sync with HEAD
 1.66.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.66.4.2 22-Sep-2015  skrll Sync with HEAD
 1.66.4.1 06-Jun-2015  skrll Sync with HEAD
 1.66.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.73.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.73.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.75.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.77.6.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating softint). Without this change, it panics. It's bad because
resource shortage really occurred when many pseudo interfaces were created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. Caller functions can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.78.2.2 21-May-2018  pgoyette Sync with HEAD
 1.78.2.1 02-May-2018  pgoyette Synch with HEAD
 1.80.8.1 29-Feb-2020  ad Sync with head.
 1.80.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.82.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.85.10.1 02-Aug-2025  perseant Sync with HEAD
 1.85.8.2 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.85.8.1 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.43 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
 1.42 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.41 16-Feb-2021  martin ARP headers only need 2 byte alignment - pointed out by roy.
 1.40 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.39 14-Feb-2021  roy if_arp: Just KASSERT that arphdr is aligned

While here improve readability of checking ARP IEEE1394 matches interface.
 1.38 13-Feb-2021  roy if_arp: Ensure that arphdr is aligned
 1.37 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.36 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.35 03-Feb-2021  roy Whitespace
 1.34 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.33 30-Jun-2018  christos branches: 1.33.12;
Provide an inline to return the data part of the arp packet instead of
open-coding it in multiple places.
 1.32 19-Apr-2018  christos branches: 1.32.2;
s/static inline/static __inline/g for consistency.
 1.31 13-Feb-2018  maxv branches: 1.31.2;
Define ar_* as inlined functions, not as macros. Makes it easier to
understand why ARPHRD_IEEE1394 needs to be handled with care - it doesn't
have ar_tha.
 1.30 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
- the global timer callout with the big locks can be
removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
- it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
- it was a parameter that prevents expiration of active caches
- Removed to simplify the timer logic, but we may be able to
restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.29 15-Apr-2008  thorpej branches: 1.29.48; 1.29.68;
Make ARP stats per-cpu.
 1.28 20-Feb-2008  matt branches: 1.28.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.27 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.26 04-Mar-2007  christos branches: 1.26.16; 1.26.22; 1.26.24; 1.26.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.25 10-Dec-2005  elad branches: 1.25.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.24 07-Aug-2003  agc branches: 1.24.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.23 24-Jun-2002  itojun branches: 1.23.6;
integrate IEEE1394 ARP into generic ARP logic.
XXX there's no check at all in ar_hrd, and we don't set ar_hrd on outgoing.
it seems like a bad thing.
 1.22 12-Jun-2001  wiz branches: 1.22.2; 1.22.14;
receive, not recieve
 1.21 15-Aug-2000  jhawk branches: 1.21.2;
Add kernel counters for arp events, displayable with netstat -s -f arp
 1.20 27-Aug-1999  thorpej branches: 1.20.2;
packed -> __packed__
 1.19 08-May-1999  matt Add ARP hardware type for IEEE 1394 (FireWire)
 1.18 22-Mar-1999  bad branches: 1.18.4;
Add ARPHRD_IEEE802.
 1.17 23-Feb-1999  is Remove zero length array
 1.16 23-Feb-1999  kleink Addendum to rev. 1.15: use of __extension__ here is supported in GCC 2.8.0 and
above only; since this is the only occurrence, fix it locally rather than in
<sys/cdefs.h> as to not remove all the functionality on pre-2.8 systems.
XXX Shouldn't use zero-length arrays at all.
 1.15 21-Feb-1999  kleink Zero-sized arrays are a GNU C extension; from Dave Sainty in PR kern/6271.
 1.14 10-Dec-1998  christos linted comment
 1.13 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.12 08-Sep-1997  mikel eliminate non-comment text after #endifs; from Dave Sainty in PR kern/4091
 1.11 25-Mar-1997  jonathan branches: 1.11.4;
Add ARP hardware type for Ricochet "starmode" radio addresses.
 1.10 23-Mar-1997  is Fix several bugs related to the new ARP code, and ARCnet ARP support.
Among other things, add the ARPHRD_ARCNET definition, make sure the hardware type is
set on outgoing ARP packets, and make sure we don't send out replies as broadcasts.
 1.9 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.8 08-Mar-1995  cgd branches: 1.8.8;
fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 05-Sep-1993  cassidy Add definitions for RARP request and reply.
 1.4 03-Aug-1993  glass more "warning: `/*' within comment" fixes
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.8.8.2 12-Mar-1997  is Merge in changes from The Trunk
 1.8.8.1 11-Feb-1997  is - Add macros, to if_arp.h:struct arphdr, to access an ARP message's variable
fields based on the ar_hln and ar_pln fields.
- Add AR_ARP case to ether_output, using the ar_tha() macro defined above.
 1.11.4.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.18.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.20.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.2 01-Aug-2002  nathanw Catch up to -current.
 1.21.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.22.14.1 15-Jul-2002  gehenna catch up with -current.
 1.22.2.1 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.23.6.4 11-Dec-2005  christos Sync with head.
 1.23.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.23.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.23.6.1 03-Aug-2004  skrll Sync with HEAD
 1.24.16.4 27-Feb-2008  yamt sync with head.
 1.24.16.3 21-Jan-2008  yamt sync with head
 1.24.16.2 03-Sep-2007  yamt sync with head.
 1.24.16.1 21-Jun-2006  yamt sync with head.
 1.25.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.26.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.26.24.1 26-Dec-2007  ad Sync with head.
 1.26.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.26.16.2 23-Mar-2008  matt sync with HEAD
 1.26.16.1 09-Jan-2008  matt sync with HEAD
 1.28.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.29.68.1 22-Sep-2015  skrll Sync with HEAD
 1.29.48.1 03-Dec-2017  jdolecek update from HEAD
 1.31.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.31.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.32.2.1 10-Jun-2019  christos Sync with HEAD
 1.33.12.1 03-Apr-2021  thorpej Sync with HEAD.
 1.22 06-Sep-2018  maxv Remove the network ATM code.
 1.21 28-Apr-2016  ozaki-r branches: 1.21.16; 1.21.18;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.20 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.19 01-Feb-2011  chuck branches: 1.19.14; 1.19.32;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.18 20-Feb-2008  matt branches: 1.18.32; 1.18.38; 1.18.40;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.17 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.16 17-Feb-2007  dyoung branches: 1.16.18; 1.16.24; 1.16.26; 1.16.30;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.15 11-Dec-2005  thorpej branches: 1.15.26;
ANSI function decls and application of static.
 1.14 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.13 26-Feb-2005  perry branches: 1.13.4;
nuke trailing whitespace
 1.12 05-Jul-2001  toshii branches: 1.12.22; 1.12.30; 1.12.32;
Fix typo. s/extention/extension/
 1.11 19-Nov-1999  thorpej branches: 1.11.6;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.10 01-Jul-1999  itojun branches: 1.10.2; 1.10.8;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.9 06-May-1998  bouyer branches: 1.9.10; 1.9.12;
Make ATM_LLC_SETTYPE do the right thing: swap bytes on LE machines,
don't swap on BE machines. The previous revision required a ntohs()
in atm_output(), to work on LE machines. This was broken for BE machines.
 1.8 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.7 09-Nov-1996  chuck branches: 1.7.14;
fix previous byte-order fix the correct way
(from Zdenek Salvet <salvet@horn.ics.muni.cz>)
 1.6 03-Jul-1996  chuck ported ATM to FreeBSD 2.2-960612-SNAP
 1.5 29-Jun-1996  chuck change:
- change asock to rxhand and adjust all for this [esp atm_input]
 1.4 28-Jun-1996  chuck add hook for user to turn on/off raw mode
 1.3 27-Jun-1996  chuck fix/improvement:
- add proto if atm_input
- add native mode atm hooks to if_atmsubr.c (atm_input)
 1.2 26-Jun-1996  chuck [1] add new rxso passing structure to if_atm.h
[2] modify atm_output to handle native mode atm output mbufs
 1.1 22-Jun-1996  chuck network support for ATM networks (ATM == Async Transfer Mode, not
Automatic Teller Machine).

Currently supports PVCs only (no ATM ARP either).
 1.7.14.1 08-May-1998  mycroft Pull up 1.9, per request of bouyer.
 1.9.12.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.9.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.9.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.10.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.32.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.12.30.1 29-Apr-2005  kent sync with -current
 1.12.22.2 11-Dec-2005  christos Sync with head.
 1.12.22.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.13.4.4 27-Feb-2008  yamt sync with head.
 1.13.4.3 21-Jan-2008  yamt sync with head
 1.13.4.2 26-Feb-2007  yamt sync with head.
 1.13.4.1 21-Jun-2006  yamt sync with head.
 1.15.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.16.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.16.26.1 26-Dec-2007  ad Sync with head.
 1.16.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.16.18.2 23-Mar-2008  matt sync with HEAD
 1.16.18.1 09-Jan-2008  matt sync with HEAD
 1.18.40.1 08-Feb-2011  bouyer Sync with HEAD
 1.18.38.1 06-Jun-2011  jruoho Sync with HEAD.
 1.18.32.1 05-Mar-2011  rmind sync with head
 1.19.32.2 29-May-2016  skrll Sync with HEAD
 1.19.32.1 22-Sep-2015  skrll Sync with HEAD
 1.19.14.1 03-Dec-2017  jdolecek update from HEAD
 1.21.18.1 10-Jun-2019  christos Sync with HEAD
 1.21.16.1 30-Sep-2018  pgoyette Sync with HEAD
 1.62 06-Sep-2018  maxv Remove the network ATM code.
 1.61 11-Jan-2017  ozaki-r branches: 1.61.14; 1.61.16;
Get rid of unnecessary header inclusions
 1.60 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

A Layer 2 routine and a Layer 3 routine share data, namely the ifqueue,
so IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.59 10-Jun-2016  ozaki-r branches: 1.59.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit, hide the internals of rcvif operations, and reduce the diff of
the upcoming change.

No functional change.
 1.58 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.57 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.56 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.55 28-Jan-2016  ozaki-r Tidy up

- KNF
- Remove obsolete ifdefs for other OSes
- Remove unnecessary else block

No functional change.
 1.54 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.53 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigation of the issue
there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.52 05-Jun-2014  rmind branches: 1.52.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.51 15-May-2014  msaitoh Put schednetisr() into splnet()/splx() pair.
This might avoid a delay in processing a packet.
 1.50 11-Oct-2012  christos branches: 1.50.2; 1.50.10;
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.49 01-Feb-2011  chuck branches: 1.49.4; 1.49.10; 1.49.14;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.48 05-Apr-2010  joerg branches: 1.48.2; 1.48.4;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.47 19-Jan-2010  pooka branches: 1.47.2; 1.47.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.46 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.45 18-Mar-2009  cegger bcopy -> memcpy
 1.44 18-Mar-2009  cegger bcmp -> memcmp
 1.43 17-Dec-2008  cegger branches: 1.43.2;
kill MALLOC and FREE macros.
 1.42 15-Jun-2008  christos branches: 1.42.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.41 20-Feb-2008  matt branches: 1.41.6; 1.41.8; 1.41.10; 1.41.12; 1.41.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.40 19-Oct-2007  ad machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.39 07-Mar-2007  liamjfoy branches: 1.39.2; 1.39.14; 1.39.16; 1.39.20;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.38 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.37 11-Dec-2005  thorpej branches: 1.37.26;
ANSI function decls and application of static.
 1.36 11-Dec-2005  christos merge ktrace-lwp.
 1.35 31-Mar-2005  christos branches: 1.35.2;
factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that have one queue, and one used by
the point-to-point interfaces that have two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.34 26-Feb-2005  perry nuke trailing whitespace
 1.33 21-Apr-2004  itojun branches: 1.33.4; 1.33.6;
kill sprintf, use snprintf
 1.32 19-Jan-2003  simonb branches: 1.32.2;
Remove variable that is only assigned to but not referenced.
 1.31 12-Nov-2001  lukem add RCSIDs
 1.30 18-Jul-2001  thorpej bzero -> memset
 1.29 14-Jun-2001  itojun branches: 1.29.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.28 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.27 17-Jan-2001  thorpej branches: 1.27.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.26 18-Dec-2000  thorpej Fill in if_dlt.
 1.25 13-Dec-2000  thorpej Add ALTQ glue.
 1.24 12-Dec-2000  thorpej Include BPF headers as necessary (feh, too many changes to try and
merge...)
 1.23 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.22 30-Mar-2000  augustss Kill some more register declarations.
 1.21 28-Jan-2000  enami Set the right ethertype in LLC header for PVC interface.
Pointed by onoe@sm.sony.co.jp
 1.20 01-Jul-1999  itojun branches: 1.20.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.19 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.18 05-Jul-1998  jonathan branches: 1.18.6; 1.18.10; 1.18.12;
defopt NATM.
 1.17 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.16 06-May-1998  drochner This comment is not true.
 1.15 01-May-1998  thorpej Glue in IP flow fast forwarding.
 1.14 15-Apr-1998  bouyer Fix my previous commit: the ATM_LLC_* macros do the ntoh/hton conversion,
so the bug was not a missing ntohs in atm_input(), it was an extraneous
htons in atm_output().
 1.13 24-Mar-1998  bouyer Add a missing ntohs. With this change I got ip over atm (vpi/vci) working
between 2 PCs.
 1.12 15-Mar-1997  cgd branches: 1.12.8;
s/if_ethertypes.h/ethertypes.h/ because if_ethertypes.h doesn't exist
 1.11 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.10 11-Mar-1997  chuck minor fix to freebsd section of code from Kenjiro Cho <kjc@csl.sony.co.jp>
 1.9 09-Nov-1996  chuck branches: 1.9.4;
fix previous byte-order fix the correct way
(from Zdenek Salvet <salvet@horn.ics.muni.cz>)
 1.8 18-Oct-1996  chuck fix: add missing ntohs() for llc mode, as noted by several people including
Dong Lin, Zdenek Salvet, and Matthias Drochner(i think).
 1.7 13-Oct-1996  christos backout previous kprintf change
 1.6 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.5 03-Jul-1996  chuck ported ATM to FreeBSD 2.2-960612-SNAP
 1.4 29-Jun-1996  chuck change:
- change asock to rxhand and adjust all for this [esp atm_input]
 1.3 27-Jun-1996  chuck fix/improvement:
- add proto if atm_input
- add native mode atm hooks to if_atmsubr.c (atm_input)
 1.2 26-Jun-1996  chuck [1] add new rxso passing structure to if_atm.h
[2] modify atm_output to handle native mode atm output mbufs
 1.1 22-Jun-1996  chuck network support for ATM networks (ATM == Async Transfer Mode, not
Automatic Teller Machine).

Currently supports PVCs only (no ATM ARP either).
 1.9.4.1 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.12.8.1 05-May-1998  mycroft Pull up 1.13-1.14, per request of bouyer.
 1.18.12.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.18.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.18.10.2 01-Jul-1999  thorpej Sync w/ -current.
 1.18.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.18.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.20.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.20.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.20.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.20.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.20.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.27.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.27.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.29.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.29.2.1 03-Aug-2001  lukem update to -current
 1.32.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.32.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.2.1 03-Aug-2004  skrll Sync with HEAD
 1.33.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.33.4.1 29-Apr-2005  kent sync with -current
 1.35.2.5 27-Feb-2008  yamt sync with head.
 1.35.2.4 27-Oct-2007  yamt sync with head.
 1.35.2.3 03-Sep-2007  yamt sync with head.
 1.35.2.2 26-Feb-2007  yamt sync with head.
 1.35.2.1 21-Jun-2006  yamt sync with head.
 1.37.26.2 12-Mar-2007  rmind Sync with HEAD.
 1.37.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.39.20.1 25-Oct-2007  bouyer Sync with HEAD.
 1.39.16.2 23-Mar-2008  matt sync with HEAD
 1.39.16.1 06-Nov-2007  matt sync with HEAD
 1.39.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.39.2.1 23-Oct-2007  ad Sync with head.
 1.41.14.1 18-Jun-2008  simonb Sync with head.
 1.41.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.41.10.3 11-Aug-2010  yamt sync with head.
 1.41.10.2 11-Mar-2010  yamt sync with head
 1.41.10.1 04-May-2009  yamt sync with head.
 1.41.8.1 17-Jun-2008  yamt sync with head.
 1.41.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.41.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.42.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.42.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.43.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.47.4.2 05-Mar-2011  rmind sync with head
 1.47.4.1 30-May-2010  rmind sync with head
 1.47.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.48.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.48.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.49.14.3 03-Dec-2017  jdolecek update from HEAD
 1.49.14.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.49.14.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.49.10.1 31-Oct-2012  riz Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.49.4.1 30-Oct-2012  yamt sync with head
 1.50.10.1 10-Aug-2014  tls Rebase.
 1.50.2.1 18-May-2014  rmind sync with head
 1.52.4.8 05-Feb-2017  skrll Sync with HEAD
 1.52.4.7 05-Oct-2016  skrll Sync with HEAD
 1.52.4.6 09-Jul-2016  skrll Sync with HEAD
 1.52.4.5 29-May-2016  skrll Sync with HEAD
 1.52.4.4 22-Apr-2016  skrll Sync with HEAD
 1.52.4.3 19-Mar-2016  skrll Sync with HEAD
 1.52.4.2 22-Sep-2015  skrll Sync with HEAD
 1.52.4.1 06-Jun-2015  skrll Sync with HEAD
 1.59.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.59.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.61.16.1 10-Jun-2019  christos Sync with HEAD
 1.61.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.199 22-Apr-2025  ozaki-r bridge: resolve a race condition in bridge_stop()

Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.198 22-Apr-2025  ozaki-r Revert "bridge: avoid a race condition on stopping callout" (r1.197)

There is a better fix.
 1.197 16-Apr-2025  ozaki-r bridge: avoid a race condition on stopping callout

Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.
 1.196 16-Dec-2024  ozaki-r bridge: unify frame discarding paths (NFC)
 1.195 16-Dec-2024  ozaki-r bridge: remove redundant IFF_RUNNING check

It has been done in bridge_input, so doing in bridge_forward is
redundant and yet racy.

Also it fixes a possible mbuf leak.
 1.194 03-Sep-2024  ozaki-r bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.
 1.193 16-Jul-2024  ozaki-r bridge: get rid of unnecessary macros for pserialize
 1.192 16-Jul-2024  ozaki-r bridge: add missing curlwp_bind() for pppoe

From knakahara@
 1.191 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.190 29-Jun-2024  riastradh branches: 1.190.2;
if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.189 29-Jul-2022  skrll branches: 1.189.4; 1.189.6;
Sprinkle const
 1.188 29-Jul-2022  skrll Trailing whitespace
 1.187 20-Jun-2022  yamaguchi bridge(4): support VLAN frames stripped by hardware tagging
 1.186 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.185 31-Dec-2021  riastradh sys: Use if_ioctl wrapper function.
 1.184 31-Dec-2021  riastradh sys: Use if_stop wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.183 30-Sep-2021  yamaguchi bridge: Register bridge_ifdetach to ether_ifdetach hook
 1.182 30-Sep-2021  yamaguchi bridge: Register bridge_calc_link_state to link-state change hook
 1.181 02-Jul-2021  yamaguchi Use if_ioctl() for changing MTU, not ether_ioctl to prevent panic

Fix PR kern/56292
 1.180 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.179 19-Feb-2021  christos branches: 1.179.4;
- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
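The difference between testing alignment against sizeof(t) and against the type's ABI alignment can be sketched as follows. This is an illustration only, not the kernel's definition: the macro name IS_ALIGNED_FOR, the struct hdr type, and the helper hdr_is_aligned are hypothetical, and C11 _Alignof stands in for __alignof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: test a pointer against the ABI alignment of a
 * type rather than its size. Not the kernel's ALIGNED_POINTER.
 */
#define IS_ALIGNED_FOR(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/*
 * A non-primitive type: its size is 4 bytes, but its required
 * alignment is usually only that of uint16_t, so a sizeof-based
 * test would be stricter than the ABI demands.
 */
struct hdr {
	uint16_t a;
	uint16_t b;
};

int
hdr_is_aligned(const void *p)
{
	assert(p != NULL);
	return IS_ALIGNED_FOR(p, struct hdr);
}
```

With a sizeof-based test, a struct hdr placed at an address that is a multiple of its alignment but not of its size would be reported as misaligned even though the compiler can access it directly.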
 1.178 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.177 02-Nov-2020  roy bridge: revert prior

It's of little use.
If we need to do this in the future, consider a sysctl to do it for all
interfaces in the bridge and not just the one being added.
 1.176 27-Sep-2020  roy branches: 1.176.2;
bridge: When an interface joins then mark addresses on it as tentative

The exact flow is detach addresses, join bridge and then mark detached
addresses as tentative.
This ensures that Duplicate Address Detection for the joining interface
is performed across all members of the bridge.
 1.175 27-Sep-2020  roy bridge: Calculate link state as the best link state of any member

If any member is LINK_STATE_UP then it's LINK_STATE_UP.
Otherwise if any member is LINK_STATE_UNKNOWN then it's LINK_STATE_UNKNOWN.
Otherwise it's LINK_STATE_DOWN.
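The rule above can be sketched as a small function. This is a minimal illustration, not the code from this commit: the enum values LS_UNKNOWN, LS_DOWN and LS_UP are hypothetical stand-ins for the kernel's LINK_STATE_* constants, and reporting DOWN for a bridge with no members is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's LINK_STATE_* constants. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };

/*
 * Return the best state among n members: UP beats UNKNOWN, and
 * UNKNOWN beats DOWN. An empty member list yields DOWN here, which
 * is an assumption of this sketch.
 */
enum link_state
best_link_state(const enum link_state *members, size_t n)
{
	enum link_state best = LS_DOWN;

	assert(members != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (members[i] == LS_UP)
			return LS_UP;		/* nothing can beat UP */
		if (members[i] == LS_UNKNOWN)
			best = LS_UNKNOWN;	/* beats DOWN; keep scanning */
	}
	return best;
}
```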
 1.174 01-Aug-2020  maxv Remove #ifdef BRIDGE_IPF, compile in the code by default. Sent to
tech-net@.
 1.173 01-May-2020  jdolecek report no enabled capabilities when no interface is part of bridge
 1.172 30-Apr-2020  jdolecek for bridge(4), report the common enabled capabilities of the members
via SIOCGIFCAP for visibility
 1.171 27-Apr-2020  jdolecek if MTU of the added interface doesn't match the bridge, modify the MTU
of the interface to that of the bridge instead of just refusing the
addition with EINVAL

this is a convenience feature to simplify bridge setup with non-standard
MTU, the useful behaviour observed with Linux xenbr
 1.170 27-Mar-2020  jdolecek replace the conditional m_pullup() on start of bridge_output() with
a KASSERT(), to make it clear no mbuf manipulation is ever done here

the condition should never trigger, this always runs after ether_output()
M_PREPEND()s ether_header
 1.169 24-Mar-2020  jdolecek reset the csum_flags in bridge_broadcast() also for bmcast path

for destination interfaces with real hardware offloading this fixes
multicast packet corruption; for xvif(4) this fix stops treating them
as having no csum

may fix PR kern/42386
 1.168 24-Feb-2020  rin Remove debug printf I put into bridge_calc_csum_flags().
Sorry for noise.
 1.167 23-Feb-2020  jdolecek disable the DEBUG bridge_calc_csum_flags() printf
 1.166 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.165 05-Aug-2019  msaitoh branches: 1.165.2;
Cast uint32_t to avoid undefined behavior in bridge_rthash(). Found by kUBSan.
 1.164 22-Dec-2018  rin branches: 1.164.4;
Take the interface out of promiscuous mode in bridge_delete_member()
instead of bridge_ioctl_del(). Otherwise, the member interfaces are
left in promiscuous mode when the bridge is destroyed.
 1.163 15-Dec-2018  rin Improve wording in comments: replace "chain" with "queue" for
sequence of mbuf's connected by m_nextpkt, in order to avoid
confusion with those connected by m_next.

No binary changes.
 1.162 14-Dec-2018  martin Need <netinet6/ip6_var.h> for ip6_statinc() prototype.
 1.161 12-Dec-2018  rin PR kern/53562

Handle TX offload in software when a packet is sent via
bridge_output(). We can send it as is in the following
exceptional cases:

For unicast:

(1) When the destination interface is the same as source.

(2) When the destination supports all TX offload options
specified in a packet.

For multicast/broadcast:

(3) When all the members of the bridge support the specified
TX offload options.

For (3), add sc_csum_flags_tx flag to bridge softc, which is
logical AND b/w capabilities of TX offload options in member
interface (ifp->if_csum_flags_tx). The flag is updated when a
member is (i) added to or (ii) removed from a bridge, or (iii)
if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).

Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
TSO[46] is enabled for that interface.

OK msaitoh thorpej
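The logical AND described for case (3) can be sketched as below. This is an illustration under stated assumptions, not the committed code: the CSUM_* bit values and the function names common_csum_flags_tx and can_send_as_is are hypothetical, and returning 0 for a bridge with no members is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel's M_CSUM_* values differ. */
#define CSUM_IPV4	0x01u
#define CSUM_TCPV4	0x02u
#define CSUM_TSOV4	0x04u

/*
 * AND together the TX offload capabilities of all members, so the
 * result holds only the options every member supports. With no
 * members the result is 0 (an assumption of this sketch).
 */
uint32_t
common_csum_flags_tx(const uint32_t *member_flags, size_t n)
{
	uint32_t flags;

	assert(member_flags != NULL || n == 0);
	if (n == 0)
		return 0;
	flags = member_flags[0];
	for (size_t i = 1; i < n; i++)
		flags &= member_flags[i];
	return flags;
}

/*
 * A packet can be sent as is only when every offload option it
 * requests is present in the supported set.
 */
int
can_send_as_is(uint32_t pkt_flags, uint32_t supported)
{
	return (pkt_flags & ~supported) == 0;
}
```

Recomputing the AND whenever a member is added, removed, or reconfigured keeps the per-packet check down to a single mask test.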
 1.160 09-Nov-2018  ozaki-r Fix that brconfig <bridge> (addr) can't show a large number of MAC addresses

The command shows only 256 addresses at maximum even if a bridge caches more
addresses. It occurs because the kernel doesn't return an error if the command
passes a short buffer that can't store all cached addresses; the kernel fills
cached addresses as much as possible and returns it without telling that the
result is truncated.

Fix the issue by reporting the required buffer size if the buffer passed from the
command is too small, which lets the command retry with a large enough buffer.

Reported by k-goda@IIJ
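The report-size-then-retry pattern described above can be sketched in userland C. Everything here is hypothetical: query_entries stands in for the kernel side and fetch_all for the command side, entries are plain ints rather than MAC address records, and the real ioctl interface is not reproduced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical "kernel side": copy the cached entries only when the
 * caller's buffer can hold all of them, and always report the
 * required length through *lenp. Returns 1 if data was copied.
 */
int
query_entries(const int *cache, size_t ncache, int *buf, size_t *lenp)
{
	size_t need = ncache * sizeof(*cache);

	assert(lenp != NULL);
	if (*lenp < need) {
		*lenp = need;	/* tell the caller how much space is needed */
		return 0;	/* nothing copied; caller should retry */
	}
	if (need != 0)
		memcpy(buf, cache, need);
	*lenp = need;
	return 1;
}

/*
 * Hypothetical "command side": start with a small buffer and retry
 * with the size the query reported. The caller frees the result.
 */
int *
fetch_all(const int *cache, size_t ncache, size_t *countp)
{
	size_t len = 4 * sizeof(int);	/* deliberately small first guess */
	int *buf = NULL;

	for (;;) {
		int *nbuf = realloc(buf, len != 0 ? len : 1);

		if (nbuf == NULL) {
			free(buf);
			return NULL;
		}
		buf = nbuf;
		if (query_entries(cache, ncache, buf, &len))
			break;
	}
	*countp = len / sizeof(int);
	return buf;
}
```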
 1.159 19-Sep-2018  msaitoh Micro optimization. m_copym(M_COPYALL) -> m_copypacket().
 1.158 18-Sep-2018  msaitoh - Fix bridge_enqueue() which was broken by last commit. Use correct mbuf
pointer.
- Modify comment.
 1.157 14-Sep-2018  msaitoh Fix a bug that bridge_enqueue() incorrectly cleared outgoing packet's offload
flags. bridge_enqueue() is called from bridge_output() when a packet is
spontaneous. Clear csum_flags before calling bridge_enqueue() in
bridge_forward() or bridge_broadcast() instead of in the beginning of
bridge_enqueue().

Note that this change doesn't fix a problem on the following configuration:

A bridge has two or more interfaces.

An address is assigned to a bridge member interface and
some offload flags are set.

Another interface has no address and no offload flags.

XXX pullup-[78]
 1.156 25-May-2018  ozaki-r branches: 1.156.2;
Ensure to call if_register after interface initializations finish
 1.155 14-May-2018  ozaki-r Protect packet input routines with KERNEL_LOCK and splsoftnet

if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.

if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect non-MP-safe
codes (e.g., carp_input and agr_input).

Pointed out by mlelstv@
 1.154 18-Apr-2018  ozaki-r Add missing PSLIST_ENTRY_INIT and PSLIST_ENTRY_DESTROY
 1.153 18-Apr-2018  ozaki-r Get rid of an unnecessary semicolon

Pointed out by kamil@
 1.152 18-Apr-2018  ozaki-r bridge: use pslist(9) for rtlist and rthash

The change fixes race conditions on list operations. One example is that a
reader may see invalid pointers in an item it is looking up in a list due to
a missing membar_producer.
 1.151 18-Apr-2018  ozaki-r Simplify bridge_rtnode_insert (NFC)
 1.150 18-Apr-2018  ozaki-r Remove obsolete NULL checks
 1.149 10-Apr-2018  ozaki-r Fix bridge_rtdelete

It removes a rtable entry that belongs to a specified interface, however, its
original behavior was to delete all belonging entries. Restore the original
behavior.
 1.148 15-Jan-2018  maxv branches: 1.148.2;
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:

ifconfig bridge0 create
ifconfig bridge0 destroy
 1.147 28-Dec-2017  ozaki-r Ensure the timer isn't running by using workqueue_wait
 1.146 19-Dec-2017  ozaki-r Don't set IFEF_MPSAFE unless NET_MPSAFE at this point

Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.

Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.145 11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.144 08-Dec-2017  ozaki-r Fix build of kernels without ether

By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.

PR kern/52790
 1.143 06-Dec-2017  ozaki-r Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
 1.142 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock when calling if_flags_set
 1.141 17-Nov-2017  ozaki-r Add missing IFEF_NO_LINK_STATE_CHANGE to bridge
 1.140 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let the callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
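
The convention in the last note can be sketched in user space. This is only an illustration under stated assumptions: `toy_mp_ifnet`, `toy_if_call`, `toy_legacy_op`, `TOY_IFEF_MPSAFE` and the counter-based "big lock" are hypothetical names, not kernel API. The caller takes the lock only for interfaces without the flag, and an operation that is not MP-safe takes the lock itself.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IFEF_MPSAFE 0x1u	/* hypothetical stand-in for IFEF_MPSAFE */

/* Counter-based stand-in for the big kernel lock (illustration only). */
static int toy_biglock_depth;
static int toy_biglock_acquisitions;

static void toy_biglock_enter(void)
{
	toy_biglock_depth++;
	toy_biglock_acquisitions++;
}

static void toy_biglock_exit(void)
{
	assert(toy_biglock_depth > 0);
	toy_biglock_depth--;
}

struct toy_mp_ifnet {
	unsigned int if_extflags;
	int (*if_op)(struct toy_mp_ifnet *);
};

/* Caller side: take the big lock only when the interface is not MP-safe. */
static int toy_if_call(struct toy_mp_ifnet *ifp)
{
	bool need_lock = (ifp->if_extflags & TOY_IFEF_MPSAFE) == 0;
	int rv;

	if (need_lock)
		toy_biglock_enter();
	rv = ifp->if_op(ifp);
	if (need_lock)
		toy_biglock_exit();
	return rv;
}

/*
 * Callee side: an operation that is not MP-safe on an interface that
 * sets the flag takes the lock itself. Returns the lock depth seen
 * while the operation body runs.
 */
static int toy_legacy_op(struct toy_mp_ifnet *ifp)
{
	int held;

	(void)ifp;
	toy_biglock_enter();
	held = toy_biglock_depth;
	toy_biglock_exit();
	return held;
}
```

With the flag set, the lock is taken exactly once, by the callee, so a single flag is enough even when only some operations are MP-safe.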
 1.139 15-Nov-2017  ozaki-r Mark callouts of bridge CALLOUT_MPSAFE
 1.138 25-Oct-2017  ozaki-r Remove unnecessary splsoftnet
 1.137 25-Oct-2017  ozaki-r Don't free sc_rthash twice
 1.136 23-Oct-2017  msaitoh - If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
 1.135 02-Oct-2017  ozaki-r Add curlwp_bind to bridge_input for psref

It can be called in a thread context via tap (tap_dev_write).

Fix PR kern/52587
 1.134 07-Mar-2017  ozaki-r branches: 1.134.6;
Remove unnecessary splnet for bridge_enqueue

bridge_enqueue now uses if_transmit_lock that does splnet for device
drivers, so splnet for bridge_enqueue isn't needed anymore.
 1.133 16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.132 23-Jan-2017  ozaki-r Replace some splnet with splsoftnet
 1.131 15-Sep-2016  christos branches: 1.131.2;
Always do the mbuf checks. The packet filters (npf) expect the mbuf to be
pulled-up. (Krists Krilovs)
 1.130 29-Aug-2016  ozaki-r KNF; replace white spaces with hard tabs

No functional change.
 1.129 22-Jun-2016  knakahara branches: 1.129.2;
fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is a nop. Currently, that means
  IFQ_ENQUEUE() on some paths such as bridge_enqueue() is wrongly called
  in parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
  + ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
  and ALTQ_DEQUEUE().
  Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.128 20-Jun-2016  knakahara fix: should not assert IFEF_OUTPUT_MPSAFE in bridge_output()
 1.127 20-Jun-2016  knakahara tentative fix for ATF(net/if_bridge/t_bridge)
 1.126 20-Jun-2016  knakahara make bridge_output MP-safe, so that bridge(4) can enable IFEF_OUTPUT_MPSAFE.

making MP-scalable is future work.
 1.125 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
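
The index-based lookup described above can be illustrated with a small user-space sketch. All names here (`toy_ifnet`, `toy_mbuf`, `toy_iftable`, `toy_get_rcvif`) are hypothetical and are not the kernel API. The sketch also omits the psref(9)/pserialize(9) protection that the real m_get_rcvif_psref and m_get_rcvif provide. It only shows why an index survives interface destruction where a raw pointer would dangle.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF 8	/* illustrative table size */

struct toy_ifnet {
	unsigned int if_index;
	const char *if_xname;
};

/* Stand-in for the interface collection; index 0 means "no interface". */
static struct toy_ifnet *toy_iftable[TOY_MAXIF];

/* The packet records only the index of its receiving interface. */
struct toy_mbuf {
	unsigned int rcvif_index;
};

static void toy_if_attach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_iftable[ifp->if_index] = ifp;
}

static void toy_if_detach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_iftable[ifp->if_index] = NULL;
}

/* Returns NULL if the interface has gone away since the packet arrived. */
static struct toy_ifnet *toy_get_rcvif(const struct toy_mbuf *m)
{
	if (m->rcvif_index == 0 || m->rcvif_index >= TOY_MAXIF)
		return NULL;
	return toy_iftable[m->rcvif_index];
}
```

A caller must handle the NULL result, which is the case a stored pointer could not detect.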
 1.124 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.123 16-May-2016  ozaki-r Apply if_get and if_put to bridge(4)
 1.122 04-May-2016  roy Allow multicast/broadcast packets from a bridge member to other members.
Note this should just call bridge_broadcast when more locking issues are
resolved.
 1.121 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.120 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.119 24-Apr-2016  christos CID 1358673: dead code
 1.118 22-Apr-2016  roy Change used from int to bool.
If used, abort the loop because we think we're already at the end.
 1.117 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.116 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation
 1.115 19-Apr-2016  ozaki-r Apply psref(9) to bridge(4)

Note that there is an issue that ioctls for an interface and a destruction
of the interface can run in parallel and it causes race conditions on
bridge as well (it rarely happens). The issue will be addressed in the
interface common code (if.c).
 1.114 19-Apr-2016  ozaki-r Remove BRIDGE_MPSAFE switch and enable MP-safe code by default

We need to enable it by default because bridge_input now runs
in softint, but bridge_input w/o BRIDGE_MPSAFE was designed to
run in hardware interrupt context.

Note that some racy code remains in bridge_output; it will be
solved in the upcoming change (applying psref(9)).
 1.113 11-Apr-2016  ozaki-r Fix usage of pslist(9)

Pointed out by riastradh@.
 1.112 11-Apr-2016  ozaki-r Use pslist(9) in bridge(4)

This adds missing memory barriers to list operations for pserialize.
 1.111 28-Mar-2016  ozaki-r Remove unused global bridge list

Pointed out by riastradh@
 1.110 23-Mar-2016  ozaki-r Fix LIST_FOREACH argument
 1.109 23-Mar-2016  ozaki-r Use LIST_FOREACH instead of LIST_FOREACH_SAFE

No need to use *_SAFE because we don't remove any items in the loop.
 1.108 15-Feb-2016  ozaki-r Simplify bridge(4)

Thanks to introducing softint-based if_input, the entire bridge code now
never runs in hardware interrupt context. So we can simplify the code.

- Remove spin mutexes
  - They were needed because some code of bridge could run in
    hardware interrupt context
  - We now need only an adaptive mutex for each shared object
    (a member list and a forwarding table)
- Remove pktqueue
  - bridge_input is already in softint, using another softint
    (for bridge_forward) is useless
  - Packet distribution should be done at device drivers
 1.107 10-Feb-2016  ozaki-r Don't share struct work, instead have one per softc

Pointed out by riastradh@
 1.106 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.105 19-Nov-2015  christos Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
 1.104 20-Oct-2015  maxv Harmless alloc inconsistency; make sure the exact same argument is given to
kmem_alloc/kmem_free. Found by Brainy.
 1.103 07-Oct-2015  ozaki-r Enqueue frames to a curcpu's pktqueue

Currently RX can run on a CPU other than CPU#0, so always enqueuing
to a pktqueue of CPU#0 makes no sense. Let's use a curcpu's pktqueue,
although bridge_forward softint doesn't run in parallel without
NET_MPSAFE.

This is a temporary solution. We need a fundamental solution.
 1.102 28-Aug-2015  rjs Don't set M_PROTO1 in mbuf flags.

This was left over from the old usage of gif(4) with bridges.
 1.101 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.100 23-Jul-2015  ozaki-r Fix PR 48104

So far a bridge cannot receive frames via a member interface when the frames
come from another member interface. So when we assign an IP address to
a member interface, hosts connected to another member interface cannot
ping the IP address. That behavior isn't expected. See PR 48104 for
more realistic examples of this issue.

The change does:
- drop M_PROMISC before ether_input, which allows a bridge member interface
to receive a frame coming from another bridge member interface
- receive broadcast/multicast frames via all bridge member interfaces,
which is required to receive IPv6 multicast packets destined to a
multicast group belonging to a bridge member interface that is different
from a packet arrival interface

roy@ helped testing of the fix, thanks!
 1.99 01-Jun-2015  matt Modify the BRDGGIFS and BRDGRTS cmds to be more COMPAT_NETBSD32 friendly.
(XXX whitespace)
 1.98 16-Apr-2015  ozaki-r Fix racy bridge_delete_member

It can be called from bridge_ioctl_del and bridge_clone_destroy with
the same bridge member (bif) at the same time. We have to prevent
that from happening.

Pointed out by riastradh@
 1.97 08-Jan-2015  ozaki-r Use pserialize for rtlist in bridge

This change enables lockless accesses to bridge rtable lists.
See locking notes in a comment to know how pserialize and
mutexes are used. Some functions are rearranged to use
pserialize. A workqueue is introduced to use pserialize in
bridge_rtage via bridge_timer callout.

As usual, pserialize and mutexes are used only when NET_MPSAFE
is on. On the other hand, the newly added workqueue is used
regardless of whether NET_MPSAFE is on or off.
 1.96 01-Jan-2015  ozaki-r Reset the expire time of a cache on receiving a frame for the cache

The expire time of a cache in a bridge MAC address table was never reset
once it was initialized, regardless of traffic for the cache. That behavior
isn't intended, and active caches are unnecessarily expired and removed.

PR kern/49507
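
The fix amounts to refreshing the expiry deadline whenever a frame matches an existing entry. A minimal user-space sketch, assuming hypothetical names (`toy_rtnode`, `toy_rt_update`, `toy_rt_expired`) and an illustrative timeout value, none of which are taken from the bridge code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RT_TIMEOUT 240	/* seconds; illustrative value only */

struct toy_rtnode {
	uint8_t  addr[6];	/* learned MAC address */
	uint64_t expire;	/* absolute expiry time, in seconds */
};

/*
 * Called when the entry is first learned and again on every later
 * frame that matches it, so an active entry keeps moving its deadline.
 */
static void toy_rt_update(struct toy_rtnode *rt, uint64_t now)
{
	assert(rt != NULL);
	rt->expire = now + TOY_RT_TIMEOUT;
}

/* The aging pass removes entries for which this returns true. */
static bool toy_rt_expired(const struct toy_rtnode *rt, uint64_t now)
{
	assert(rt != NULL);
	return now >= rt->expire;
}
```

Without the second call to the update function, an entry learned at time 1000 would be aged out at 1240 even if traffic for it were still arriving.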
 1.95 31-Dec-2014  ozaki-r Use pserialize in bridge

This change enables lockless accesses to bridge member lists.
See locking notes in a comment to know how pserialize and
mutexes are used.

This change also provides support for softint-based interrupt
handling; pserialize readers can run in both HW interrupt and
softint contexts.

As usual, pserialize is used only when NET_MPSAFE is on.
 1.94 25-Dec-2014  ozaki-r Use LIST_FOREACH_SAFE in bridge_rt* functions
 1.93 24-Dec-2014  ozaki-r Replace malloc/free with kmem_* in if_bridge

Additionally M_NOWAIT is replaced with KM_SLEEP.
 1.92 22-Dec-2014  ozaki-r Call ether_input/m_freem without holding a lock or referencing unnecessary objects

When NET_MPSAFE is on, a bridge tries to pass a packet up to Layer 3
(or call m_freem) while holding a lock or referencing unnecessary
objects. That causes random lockups. The change fixes the issue.
 1.91 15-Aug-2014  ozaki-r branches: 1.91.2;
bridge: reject non-IFF_SIMPLEX interfaces

bridge does not work with !IFF_SIMPLEX interfaces (PR/18035);
the bug is not yet fixed. Until it gets fixed, we should
reject non-IFF_SIMPLEX interfaces.

Discussed with pooka@
 1.90 23-Jul-2014  ozaki-r branches: 1.90.2;
Avoid calling copyout while holding mutex(IPL_NET)

Because copyout may lead to a page fault that may sleep, we have to pull it
out of the critical section of mutex(IPL_NET) in bridge_ioctl_gifs.
 1.89 23-Jul-2014  ozaki-r Add missing unlock
 1.88 20-Jul-2014  ozaki-r Don't return ENETRESET when ioctl SIOCSIFMTU

Otherwise, just changing the MTU with ifconfig shows
a confusing error message.

PR kern/48996
 1.87 14-Jul-2014  ozaki-r Make bridge MPSAFE

- Introduce BRIDGE_MPSAFE
  - It's enabled only when NET_MPSAFE is defined
    in if.h or the kernel config
- Add iflist and rtlist mutex locks
  - Locking iflist is performance sensitive,
    so it's not used when !BRIDGE_MPSAFE
- Add bif object reference counting
  - It enables fine-grain locking for bridge member lists
    by allowing to not hold a lock during touching a bif
  - bridge_release_member is added to decrement the
    reference count
  - A condition variable is added to do bridge_delete_member
    gracefully
- Add if_bridgeif to ifnet
  - It's a shortcut to a bif object of a bridge member
  - It reduces a bif lookup cost and so lock contention on iflist
- Make bridgestp MPSAFE too
 1.86 02-Jul-2014  ozaki-r Protect bridge_list with a mutex
 1.85 02-Jul-2014  ozaki-r Remove obsolete codes for if_snd
 1.84 23-Jun-2014  ozaki-r Get rid of unnecessary xc_broadcast after pktq_barrier

Pointed out by rmind@
 1.83 18-Jun-2014  ozaki-r Restructure bridge_input and bridge_broadcast

There are two changes:
- Assemble the places calling pktq_enqueue (bridge_forward)
for unicast and {b,m}cast frames into one
- Receive {b,m}cast frames in bridge_broadcast, not in
bridge_input

The changes make the code clear and readable. bridge_input
now doesn't need to take care of {b,m}cast frames;
bridge_forward and bridge_broadcast have the responsibility.

The changes are based on a patch of Lloyd Parkes submitted
in PR 48104, but don't fix its issue yet.
 1.82 18-Jun-2014  ozaki-r Tidy up bridge_input

No functional change.
 1.81 17-Jun-2014  ozaki-r Restructure ether_input and bridge_input

The network stack of NetBSD is well organized and
layered. A packet reception is processed from a
lower layer to an upper layer one by one. However,
ether_input and bridge_input are not structured so.
bridge_input is called inside ether_input.

The new structure replaces ifnet#if_input of a bridge
member with bridge_input when the member is attached.
So a packet goes straight on a packet reception via
a bridge, bridge_input => ether_input => ip_input.

The change is part of a patch of Lloyd Parkes submitted
in PR 48104. Unlike the patch, the change doesn't
intend to change the behavior of the packet processing.
Another patch will fix PR 48104.
 1.80 16-Jun-2014  ozaki-r Add net.interfaces.bridgeN.fwdq.{maxlen,len,drops} sysctl
 1.79 16-Jun-2014  ozaki-r Use pktqueue for bridge forwarding queue and softint
 1.78 15-Jun-2014  ozaki-r Get rid of unnecessary splnet for pool_{get,put}

A mutex prevents interrupts in the functions now.
 1.77 29-Jun-2013  rmind branches: 1.77.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.76 22-Mar-2012  wiz branches: 1.76.2; 1.76.4;
Fix typo in kauth name. From PR 46234 by Matthew Mondor.
Tested by Geoff Adams and Ryo ONODERA.
 1.75 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.74 19-Nov-2011  tls branches: 1.74.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.73 23-May-2011  joerg branches: 1.73.4;
simplify
 1.72 07-Dec-2010  pooka branches: 1.72.2;
_KERNEL_TOP
 1.71 19-Jan-2010  pooka branches: 1.71.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.70 17-May-2009  cegger fix crash in bridge_ioctl():


BRDGGFILT and BRDGSFILT bridge controls are only available with BRIDGE_IPF and PFIL_HOOKS defined.
In amd64 GENERIC and XEN kernel configs PFIL_HOOKS is defined but BRIDGE_IPF is not.

When a BRDGGFILT or BRDGSFILT command comes in, then ifd->ifd_cmd is not in range
of bridge_control_table_size. Then bc is not set and is dereferenced
later => BOOM.
 1.69 12-May-2009  elad Move kauth(9) call before going into splnet().

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/05/08/msg001286.html
 1.68 04-Apr-2009  bouyer Fix another typo
 1.67 04-Apr-2009  bouyer Fix a comment, and make it build.
 1.66 04-Apr-2009  bouyer Fixes from Masao Uebayashi
 1.65 04-Apr-2009  bouyer Fix for if_start() and pfil_hook() being called from hardware interrupt
context (reported on various mailing-lists, and part of PR kern/41114,
causing panic in pf(4) and possibly ipf(4) when BRIDGE_IPF is used).
Defer bridge_forward() to a software interrupt; bridge_input() enqueues
mbufs to ifp->if_snd which is handled in bridge_forward().
 1.64 18-Jan-2009  mrg branches: 1.64.2;
Fix multiple problems:

* A sign extension error creating the bridge ID corrupted the
priority (always making it the maximum).
* Do not catch STP packets on an interface for which STP is not
enabled -- it's a violation of the spec, and causes STP to fail on
neighboring bridges.
* An optimization to bstp_input() -- some information is already
known when we call it.

contributed anonymously.
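
The first item is a classic sign-extension hazard. The log does not show the original expression, so the following is a hypothetical reconstruction of that class of bug, not the actual bstp code. It assumes a bridge ID laid out as a 16-bit priority above a 48-bit MAC address, and the names `toy_bridge_id_fixed`, `toy_bridge_id_buggy` and `toy_bridge_id_prio` are invented for the illustration. In this sketch the corruption appears only when the relevant address byte has its top bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 16-bit priority in the top bits, 48-bit MAC below. */
static uint64_t toy_bridge_id_fixed(uint16_t prio, const uint8_t mac[6])
{
	uint64_t id = (uint64_t)prio << 48;
	int i;

	/* Widen each byte to 64 bits before shifting; nothing is signed. */
	for (i = 0; i < 6; i++)
		id |= (uint64_t)mac[i] << (8 * (5 - i));
	return id;
}

/*
 * Hypothetical buggy variant: the low four bytes pass through a signed
 * 32-bit value. When mac[2] >= 0x80 that value is negative on two's
 * complement machines, and widening it to 64 bits fills the upper
 * 32 bits with ones, which overwrites the priority with 0xFFFF.
 */
static uint64_t toy_bridge_id_buggy(uint16_t prio, const uint8_t mac[6])
{
	uint32_t low = ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
	    ((uint32_t)mac[4] << 8) | (uint32_t)mac[5];
	int32_t low_signed = (int32_t)low;

	return ((uint64_t)prio << 48) | ((uint64_t)mac[0] << 40) |
	    ((uint64_t)mac[1] << 32) | (uint64_t)low_signed;
}

static uint16_t toy_bridge_id_prio(uint64_t id)
{
	return (uint16_t)(id >> 48);
}
```

The fix in the sketch is simply to cast every operand to an unsigned 64-bit type before shifting or combining.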
 1.63 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
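
The check described above can be sketched as a standalone predicate. The function name `toy_ether_addr_settable` and the constant `TOY_ETHER_ADDR_LEN` are hypothetical and are not what ether_ioctl() uses internally. The sketch relies only on the Ethernet convention that the least significant bit of the first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDR_LEN 6

/*
 * Returns true if addr may be used as an interface's own link-layer
 * address: it must be a unicast address and must not be all zeros.
 */
static bool toy_ether_addr_settable(const uint8_t addr[TOY_ETHER_ADDR_LEN])
{
	static const uint8_t zero[TOY_ETHER_ADDR_LEN];

	assert(addr != NULL);
	/* Group bit set: multicast, which includes ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;
	return true;
}
```

The address from the summary's example, 02:de:ad:be:ef:02, passes because its first octet has the group bit clear.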

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.62 15-Jun-2008  christos branches: 1.62.2; 1.62.4; 1.62.6;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.61 15-Apr-2008  thorpej branches: 1.61.2; 1.61.4; 1.61.6; 1.61.8;
Make ip6 and icmp6 stats per-cpu.
 1.60 12-Apr-2008  cegger make this build with BRIDGE_IPF and PFIL_HOOKS options
 1.59 12-Apr-2008  thorpej Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.58 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.57 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.56 20-Feb-2008  matt branches: 1.56.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.55 19-Jan-2008  dyoung Use C99 array initializers for bridge_control_table[].
 1.54 27-Aug-2007  dyoung branches: 1.54.2; 1.54.8; 1.54.14;
LLADDR -> CLLADDR.
 1.53 26-Aug-2007  dyoung Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.52 09-Jul-2007  ad branches: 1.52.2; 1.52.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.51 12-Mar-2007  ad branches: 1.51.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.50 04-Mar-2007  christos branches: 1.50.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.49 21-Feb-2007  dyoung Use __arraycount().
 1.48 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.47 04-Jan-2007  elad branches: 1.47.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.46 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.45 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.44 17-Oct-2006  dogcow now that we have -Wno-unused-parameter, back out all the tremendously ugly
code to gratuitously access said parameters.
 1.43 13-Oct-2006  dogcow More -Wunused fallout. sprinkle __unused when possible; otherwise, use the
do { if (&x) {} } while (/* CONSTCOND */ 0);
construct as suggested by uwe in <20061012224845.GA9449@snark.ptc.spbu.ru>.
 1.42 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.41 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

	s=splfoo();
	/* frob data structure */
	splx(s);
	pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.40 23-Jul-2006  ad branches: 1.40.4; 1.40.6;
Use the LWP cached credentials where sane.
 1.39 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.38 18-May-2006  liamjfoy branches: 1.38.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.37 14-May-2006  elad integrate kauth.
 1.36 17-Jan-2006  christos branches: 1.36.2; 1.36.4; 1.36.6; 1.36.8; 1.36.10;
Make sure that breq is also cleared (from Xin LI)
 1.35 09-Jan-2006  christos Make sure we initialize all structs to 0; from Xin LI
 1.34 24-Dec-2005  perry branches: 1.34.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.33 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.32 11-Dec-2005  christos merge ktrace-lwp.
 1.31 01-Jun-2005  jdc branches: 1.31.2;
Fix this properly by renaming the conflicting variables.
 1.30 01-Jun-2005  jdc Remove extraneous definition of struct llc (found by shadow warning).
 1.29 26-Feb-2005  perry branches: 1.29.2; 1.29.4;
nuke trailing whitespace
 1.28 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.27 04-Dec-2004  peter branches: 1.27.4; 1.27.6;
Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.26 06-Oct-2004  bad Interfaces that do checksum offloading indicate the checksum status of
received packets in csum_flags in the packet header. Packets that are
forwarded over the bridge need to have csum_flags cleared before being
put on the output queue. Do so in bridge_enqueue().

Discussed with Jason Thorpe.

Fixes PR kern/27007 and the first part of PR kern/21831.
 1.25 05-Oct-2004  christos Only enable BRIDGE_IPF code if PFIL_HOOKS is enabled.
 1.24 21-Apr-2004  itojun kill a sprintf
 1.23 21-Apr-2004  itojun kill sprintf, use snprintf
 1.22 31-Jan-2004  jdc branches: 1.22.2;
Use m_copydata(), m_adj() and M_PREPEND() to manipulate mbuf's in
bridge_ipf(). Fixes kernel memory corruption that occurred when using
m_split() and m_cat().
Idea from OpenBSD.
 1.21 09-Dec-2003  augustss Fix spelling mistake in a comment.
 1.20 28-Oct-2003  mycroft Mark this initializer in the canonical way so it can be found later.
 1.19 25-Oct-2003  christos Fix uninitialized variable warnings
 1.18 16-Sep-2003  jdc Add a flag parameter to bridge_enqueue() to tell it whether to run the
filter or not. We only need to run the filter for bridge_forward() and
bridge_broadcast(). If we also run it for bridge_output(), we will run
the filter twice outbound per packet, so don't.

In bridge_ipf(), make sure we don't run m_cat() on a single mbuf chain
by checking to see (and remembering) if we need to m_split() the mbuf.
This fixes bridge + ipfilter on sparc.

Fixes PR kern/22063.
 1.17 11-Aug-2003  itojun rm extra blank line
 1.16 13-Jul-2003  jdc Include opt_inet.h to get INET6 definition.
Now, bridged ipv6 packets are passed through ipfilter.
However, some v6 packets still do not get transmitted when ipf is enabled.
Partial fix for PR kern/22063.
 1.15 23-Jun-2003  martin branches: 1.15.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.14 24-May-2003  kristerw Make sure splx() is called for all bridge_ioctl() error cases.
 1.13 16-May-2003  itojun use strlcpy
 1.12 14-May-2003  itojun use arc4random
 1.11 19-Mar-2003  bouyer Fix 2 bugs:
- initialise stp when the bridge is turned up, without this stp will keep
all interfaces disabled in a sequence like:
brconfig bridge0 add if0 add if1 stp if0 stp if1 up
- s/BRDGSPRI/BRDGSIFPRIO in brconfig.c:cmd_ifpriority()

Add a command (ifpathcost) to change the STP path cost of an interface.
Display the interface path cost with the other STP parameters.
 1.10 27-Feb-2003  perseant Make BRIDGE_IPF an option, and document it. Add it (commented) to GENERIC.
Let brconfig tell whether the bridge is using the ipfilter hook, or not.
 1.9 15-Feb-2003  perseant Add ipf packet-filtering option to if_bridge. The option is controlled at
compile-time by BRIDGE_IPF, and at runtime by brconfig with the {ipf,-ipf}
option on a per-bridge basis.

As a side-effect, add PFIL_HOOKS processing to if_bridge.
 1.8 24-Aug-2002  martin Add a function to lookup bridge members by struct ifnet * and use
it at all call sites that have such a pointer readily available.
This avoids unnecessary strcmp()s in critical paths, and removes
some XXX comments.
 1.7 08-Jun-2002  itojun reject "add" request if if_mtu is different.
 1.6 23-May-2002  itojun use IFT_BRIDGE
 1.5 24-Mar-2002  jdolecek branches: 1.5.2; 1.5.4;
Fix a memory leak in bridge_ioctl_add() when called for a non-Ethernet
interface.
Problem noted and fix provided in kern/16019 by Love.
 1.4 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.3 12-Nov-2001  lukem add RCSIDs
 1.2 17-Aug-2001  thorpej branches: 1.2.2; 1.2.4;
Only report expire time for DYNAMIC forwarding table entries.
 1.1 17-Aug-2001  thorpej Add support for building Ethernet bridges, based on Jason Wright's
bridge driver from OpenBSD, although the bridge code has been *heavily*
modified by me (the 802.1D code remains mostly unchanged from the
original).
 1.2.4.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.2.4.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.2.4.4 16-Mar-2002  jdolecek Catch up with -current.
 1.2.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.2.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.2.4.1 17-Aug-2001  thorpej file if_bridge.c was added on branch kqueue on 2001-08-25 06:16:56 +0000
 1.2.2.9 27-Aug-2002  nathanw Catch up to -current.
 1.2.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.2.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.2.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.2.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.2.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.2.2.3 25-Sep-2001  nathanw LWPify.
 1.2.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.2.2.1 17-Aug-2001  nathanw file if_bridge.c was added on branch nathanw_sa on 2001-08-24 00:12:08 +0000
 1.5.4.2 30-Jun-2003  grant Apply patch (requested by bouyer in ticket #1355):

Fix 2 bugs:
- initialise stp when the bridge is turned up, without this stp will
keep all interfaces disabled in a sequence like:
brconfig bridge0 add if0 add if1 stp if0 stp if1 up
- s/BRDGSPRI/BRDGSIFPRIO in brconfig.c:cmd_ifpriority()

Add a command (ifpathcost) to change the STP path cost of an interface.
Display the interface path cost with the other STP parameters.
 1.5.4.1 10-Jun-2002  tv Pull up revision 1.7 (requested by itojun in ticket #217):
reject "add" request if if_mtu is different.
 1.5.2.3 29-Aug-2002  gehenna catch up with -current.
 1.5.2.2 20-Jun-2002  gehenna catch up with -current.
 1.5.2.1 30-May-2002  gehenna Catch up with -current.
 1.15.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.15.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.15.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.15.2.4 19-Oct-2004  skrll Sync with HEAD
 1.15.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.15.2.1 03-Aug-2004  skrll Sync with HEAD
 1.22.2.3 12-Feb-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10306):
sys/net/if_bridge.c: revision 1.36
Make sure that breq is also cleared (from Xin LI)
 1.22.2.2 09-Jan-2006  tron Pull up following revision(s) (requested by christos in ticket #10219):
sys/net/if_bridge.c: revision 1.35
Make sure we initialize all structs to 0; from Xin LI
 1.22.2.1 08-Oct-2004  jmc branches: 1.22.2.1.2; 1.22.2.1.4;
Pullup rev 1.26 (requested by bad in ticket #900)

Interfaces that do checksum offloading indicate the checksum status of
received packets in csum_flags in the packet header. Packets that are
forwarded over the bridge need to have csum_flags cleared before being
put on the output queue. Do so in bridge_enqueue(). PR#27007 PR#21831
 1.22.2.1.4.2 12-Feb-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10306):
sys/net/if_bridge.c: revision 1.36
Make sure that breq is also cleared (from Xin LI)
 1.22.2.1.4.1 09-Jan-2006  tron Pull up following revision(s) (requested by christos in ticket #10219):
sys/net/if_bridge.c: revision 1.35
Make sure we initialize all structs to 0; from Xin LI
 1.22.2.1.2.2 12-Feb-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10306):
sys/net/if_bridge.c: revision 1.36
Make sure that breq is also cleared (from Xin LI)
 1.22.2.1.2.1 09-Jan-2006  tron Pull up following revision(s) (requested by christos in ticket #10219):
sys/net/if_bridge.c: revision 1.35
Make sure we initialize all structs to 0; from Xin LI
 1.27.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.27.6.1 12-Feb-2005  yamt sync with head.
 1.27.4.1 29-Apr-2005  kent sync with -current
 1.29.4.2 12-Feb-2006  tron Pull up following revision(s) (requested by adrianp in ticket #1163):
sys/net/if_bridge.c: revision 1.36
Make sure that breq is also cleared (from Xin LI)
 1.29.4.1 09-Jan-2006  tron Pull up following revision(s) (requested by christos in ticket #1111):
sys/net/if_bridge.c: revision 1.35
Make sure we initialize all structs to 0; from Xin LI
 1.29.2.2 12-Feb-2006  tron Pull up following revision(s) (requested by adrianp in ticket #1163):
sys/net/if_bridge.c: revision 1.36
Make sure that breq is also cleared (from Xin LI)
 1.29.2.1 09-Jan-2006  tron Pull up following revision(s) (requested by christos in ticket #1111):
sys/net/if_bridge.c: revision 1.35
Make sure we initialize all structs to 0; from Xin LI
 1.31.2.6 27-Feb-2008  yamt sync with head.
 1.31.2.5 21-Jan-2008  yamt sync with head
 1.31.2.4 03-Sep-2007  yamt sync with head.
 1.31.2.3 26-Feb-2007  yamt sync with head.
 1.31.2.2 30-Dec-2006  yamt sync with head.
 1.31.2.1 21-Jun-2006  yamt sync with head.
 1.34.2.2 01-Feb-2006  yamt sync with head.
 1.34.2.1 15-Jan-2006  yamt sync with head.
 1.36.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.36.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.36.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.36.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.36.6.3 11-Aug-2006  yamt sync with head
 1.36.6.2 26-Jun-2006  yamt sync with head.
 1.36.6.1 24-May-2006  yamt sync with head.
 1.36.4.2 01-Jun-2006  kardel Sync with head.
 1.36.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.36.2.1 09-Sep-2006  rpaulo sync with head
 1.38.2.1 19-Jun-2006  chap Sync with head.
 1.40.6.2 10-Dec-2006  yamt sync with head.
 1.40.6.1 22-Oct-2006  yamt sync with head
 1.40.4.2 12-Jan-2007  ad Sync with head.
 1.40.4.1 18-Nov-2006  ad Sync with head.
 1.47.2.3 24-Mar-2007  yamt sync with head.
 1.47.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.47.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.50.2.3 09-Oct-2007  ad Sync with head.
 1.50.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.50.2.1 13-Mar-2007  ad Sync with head.
 1.51.2.1 11-Jul-2007  mjf Sync with head.
 1.52.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.52.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.54.14.1 20-Jan-2008  bouyer Sync with HEAD
 1.54.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.54.2.1 23-Mar-2008  matt sync with HEAD
 1.56.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.56.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.56.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.61.8.1 18-Jun-2008  simonb Sync with head.
 1.61.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.61.4.4 11-Mar-2010  yamt sync with head
 1.61.4.3 20-Jun-2009  yamt sync with head
 1.61.4.2 16-May-2009  yamt sync with head
 1.61.4.1 04-May-2009  yamt sync with head.
 1.61.2.1 17-Jun-2008  yamt sync with head.
 1.62.6.4 04-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #660):
sys/net/if_bridge.c: revision 1.68
Fix another typo
 1.62.6.3 04-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #660):
sys/net/if_bridge.c: revision 1.67
Fix a comment, and make it build.
 1.62.6.2 04-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #660):
sys/net/if_bridge.c: revision 1.66
Fixes from Masao Uebayashi
 1.62.6.1 04-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #660):
sys/net/if_bridge.c: revision 1.65
sys/net/if_bridgevar.h: revision 1.14
Fix for if_start() and pfil_hook() being called from hardware interrupt
context (reported on various mailing-lists, and part of PR kern/41114,
causing panic in pf(4) and possibly ipf(4) when BRIDGE_IPF is used).
Defer bridge_forward() to a software interrupt; bridge_input() enqueues
mbufs to ifp->if_snd which is handled in bridge_forward().
 1.62.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.62.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.62.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.64.2.2 23-Jul-2009  jym Sync with HEAD.
 1.64.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.71.4.2 31-May-2011  rmind sync with head
 1.71.4.1 05-Mar-2011  rmind sync with head
 1.72.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.73.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.73.4.1 17-Apr-2012  yamt sync with head
 1.74.2.1 05-Apr-2012  mrg sync to latest -current.
 1.76.4.1 28-Aug-2013  rmind sync with head
 1.76.2.2 03-Dec-2017  jdolecek update from HEAD
 1.76.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.77.4.1 10-Aug-2014  tls Rebase.
 1.90.2.1 24-Sep-2017  snj Pull up following revision(s) (requested by manu in ticket #1409):
sys/arch/xen/xen/if_xennet_xenbus.c: 1.65
sys/arch/xen/xen/xennetback_xenbus.c: 1.53, 1.56 via patch
sys/net/if_bridge.c: 1.105
sys/net/if_ether.h: 1.65
sys/net/if_ethersubr.c: 1.215, 1.235
sys/net/if_vlan.c: 1.76, 1.77, 1.83, 1.88, 1.94
Protect vlan_unconfig with a mutex
It is not thread-safe but is likely to be executed concurrently.
See PR 49264 for more detail.
--
Tweak vlan_unconfig
No functional change.
--
Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig at espci.fr). Factor out the vlan_mtu enabling and
disabling code.
--
Enable the VLAN mtu capability and check for the adjusted packet size
(Jean-Jacques.Puig at espci.fr).
Factor out the packet-size checking function for clarity.
--
Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
--
Account for the CRC len (Jean-Jacques.Puig)
--
Fix a bug where the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses VLAN_ATTACHED(), which
checks ec->ec_nvlans, so the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.91.2.11 28-Aug-2017  skrll Sync with HEAD
 1.91.2.10 05-Feb-2017  skrll Sync with HEAD
 1.91.2.9 05-Oct-2016  skrll Sync with HEAD
 1.91.2.8 09-Jul-2016  skrll Sync with HEAD
 1.91.2.7 29-May-2016  skrll Sync with HEAD
 1.91.2.6 22-Apr-2016  skrll Sync with HEAD
 1.91.2.5 19-Mar-2016  skrll Sync with HEAD
 1.91.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.91.2.3 22-Sep-2015  skrll Sync with HEAD
 1.91.2.2 06-Jun-2015  skrll Sync with HEAD
 1.91.2.1 06-Apr-2015  skrll Sync with HEAD
 1.129.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.129.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.131.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.134.6.12 03-Oct-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #1046):

sys/net/if_bridge.c: revision 1.157
sys/net/if_bridge.c: revision 1.158
sys/net/if_bridge.c: revision 1.159

Fix a bug where bridge_enqueue() incorrectly cleared an outgoing packet's offload
flags. bridge_enqueue() is called from bridge_output() when a packet is
spontaneous. Clear csum_flags before calling bridge_enqueue() in
bridge_forward() or bridge_broadcast() instead of at the beginning of
bridge_enqueue().

Note that this change doesn't fix a problem in the following configuration:

A bridge has two or more interfaces.
An address is assigned to a bridge member interface and
some offload flags are set.
Another interface has no address and has no offload flags.

XXX pullup-[78]

- Fix bridge_enqueue() which was broken by last commit. Use correct mbuf
pointer.
- Modify comment.

Micro optimization. m_copym(M_COPYALL) -> m_copypacket().
 1.134.6.11 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #843):

sys/dev/pci/ixgbe/ixv.c: revision 1.101
sys/net/if_bridge.c: revision 1.156
sys/net/if_pppoe.c: revision 1.138
sys/dev/pci/if_wm.c: revision 1.580
sys/dev/pci/ixgbe/ixgbe.c: revision 1.156
sys/net/if_gif.c: revision 1.142

Ensure to call if_register after interface initializations finish
 1.134.6.10 15-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #826):

sys/net/if_bridge.c: revision 1.155
sys/net/if.c: revision 1.421
sys/net/bpf.c: revision 1.224
sys/net/if.c: revision 1.422
sys/net/if.c: revision 1.423

Use if_is_mpsafe (NFC)

Protect packet input routines with KERNEL_LOCK and splsoftnet
if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.
if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect
non-MP-safe code (e.g., carp_input and agr_input).

Pointed out by mlelstv@

Protect if_deferred_start_softint with KERNEL_LOCK if the interface isn't
MP-safe
 1.134.6.9 18-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #777):

tests/net/if_bridge/t_rtable.sh: revision 1.3
sys/net/if_bridge.c: revision 1.150-1.154
sys/net/if_bridgevar.h: revision 1.32

Remove obsolete NULL checks

Simplify bridge_rtnode_insert (NFC)

bridge: use pslist(9) for rtlist and rthash

The change fixes race conditions on list operations. One example is that a
reader may see invalid pointers on a looking item in a list due to lack of
membar_producer.

Add a test that checks if brconfig flush surely removes all entries

Get rid of an unnecessary semicolon
Pointed out by kamil@

Add missing PSLIST_ENTRY_INIT and PSLIST_ENTRY_DESTROY
 1.134.6.8 10-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #727):

tests/net/if_bridge/t_rtable.sh: revision 1.2
sys/net/if_bridge.c: revision 1.149

Fix bridge_rtdelete

It removes a rtable entry that belongs to a specified interface, however,
its original behavior was to delete all belonging entries.
Restore the original behavior.

Add a test case for bridge_rtdelete
 1.134.6.7 26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #572):
sys/net/if_bridge.c: 1.138, 1.148
tests/net/if_bridge/t_bridge.sh: 1.18
tests/net/if_gif/t_gif.sh: 1.12
tests/net/if_ipsec/t_ipsec.sh: 1.3
tests/net/if_l2tp/t_l2tp.sh: 1.4
tests/net/if_loop/t_basic.sh: 1.2
tests/net/if_pppoe/t_pppoe.sh: 1.18
tests/net/if_tap/t_tap.sh: 1.7
tests/net/if_tun/Makefile: 1.2
tests/net/if_tun/t_tun.sh: 1.5
tests/net/if_vlan/t_vlan.sh: 1.8
tests/net/net_common.sh: 1.26
Remove unnecessary splsoftnet
--
If the bridge is not running, don't call bridge_stop. Otherwise the
following commands will crash the kernel:
ifconfig bridge0 create
ifconfig bridge0 destroy
--
Commonalize and add tests of creating/destroying interfaces
 1.134.6.6 16-Jan-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #497):
tests/rump/rumpkern/Makefile: revision 1.16
tests/rump/kernspace/Makefile: revision 1.6
tests/rump/kernspace/workqueue.c: revision 1.1
tests/rump/kernspace/workqueue.c: revision 1.2
tests/rump/kernspace/workqueue.c: revision 1.3
tests/rump/kernspace/workqueue.c: revision 1.4
tests/rump/kernspace/workqueue.c: revision 1.5
tests/rump/kernspace/workqueue.c: revision 1.6
tests/rump/rumpkern/t_workqueue.c: revision 1.1
sys/sys/workqueue.h: revision 1.10
tests/rump/rumpkern/t_workqueue.c: revision 1.2
tests/rump/kernspace/kernspace.h: revision 1.5
tests/rump/kernspace/kernspace.h: revision 1.6
sys/net/if_bridge.c: revision 1.147
distrib/sets/lists/debug/mi: revision 1.225
sys/kern/subr_workqueue.c: revision 1.34
share/man/man9/workqueue.9: revision 1.12
sys/net/if_spppsubr.c: revision 1.178
distrib/sets/lists/tests/mi: revision 1.763
Add simple test for workqueue(9)
Add declaration. build fix
sorry, I forgot to commit this file.
Tweak use of cv_timedwait
- Handle its return value
- Specify more appropriate time-out periods (2 ticks is too short)
Fix a race condition on taking the mutex
The workqueue worker can take the mutex before the tester tries to take it after
calling workqueue_enqueue. If it happens, the worker calls cv_broadcast before
the tester calls cv_timedwait and the tester will wait until the cv timed out
Take the mutex before calling workqueue_enqueue so that the tester surely calls
cv_timedwait before the worker calls cv_broadcast.
The fix stabilizes the test, t_workqueue/workqueue1.
Add workqueue_wait that waits for a specific work to finish
The caller must ensure that no new work is enqueued before calling
workqueue_wait. Note that if the workqueue is WQ_PERCPU, the caller
can enqueue a new work to another queue other than the waiting queue.
Discussed on tech-kern@
Ensure the timer isn't running by using workqueue_wait
Functionalize some routines to add new tests easily (NFC)
Add a test case for workqueue_wait
Fix build
 1.134.6.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.134.6.4 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is bad
because resource shortages really do occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return type of
if_initialize() and if_attach() to int. Caller functions can recover from the
error cleanly by checking the return value.
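The error-handling pattern described above can be sketched in user-space C. This is only an illustration under stated assumptions: fake_if_initialize(), fake_attach(), fake_detach() and both structs are hypothetical stand-ins, not the kernel's real types or signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct ifnet or a real softc. */
struct fake_ifnet { void *softint; };
struct fake_softc {
    struct fake_ifnet ifp;
    void *dma_rings;
    bool attached;
};

/* Reports an allocation failure to the caller instead of panicking. */
static int
fake_if_initialize(struct fake_ifnet *ifp, bool simulate_shortage)
{
    if (simulate_shortage)
        return ENOMEM;
    ifp->softint = malloc(1);
    return ifp->softint == NULL ? ENOMEM : 0;
}

/* Attach path: check the return value, free what was allocated, return. */
static int
fake_attach(struct fake_softc *sc, bool simulate_shortage)
{
    int error;

    assert(sc != NULL);
    sc->attached = false;
    sc->dma_rings = malloc(64);
    if (sc->dma_rings == NULL)
        return ENOMEM;

    error = fake_if_initialize(&sc->ifp, simulate_shortage);
    if (error != 0) {
        /* Undo the earlier allocation so nothing leaks. */
        free(sc->dma_rings);
        sc->dma_rings = NULL;
        return error;
    }
    sc->attached = true;
    return 0;
}

/* Releases everything a successful fake_attach() acquired. */
static void
fake_detach(struct fake_softc *sc)
{
    free(sc->ifp.softint);
    free(sc->dma_rings);
    sc->ifp.softint = NULL;
    sc->dma_rings = NULL;
    sc->attached = false;
}
```

A caller that gets a non-zero value back from fake_attach() can simply report the error and carry on, which is the behaviour the change aims for.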
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.134.6.3 23-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #386):
sys/net/if_bridge.c: revision 1.141
Add missing IFEF_NO_LINK_STATE_CHANGE to bridge
 1.134.6.2 23-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #382):
sys/net/if_bridge.c: revision 1.139
sys/net/if_loop.c: revision 1.97
Don't take KERNEL_LOCK in looutput if NET_MPSAFE
We can perhaps get rid of KERNEL_LOCK from looutput, but for now
keep it for safety.
--
Mark callouts of bridge CALLOUT_MPSAFE
 1.134.6.1 02-Oct-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #297):
sys/net/if_bridge.c: revision 1.135
Add curlwp_bind to bridge_input for psref
It can be called in a thread context via tap (tap_dev_write).
Fix PR kern/52587
 1.148.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.148.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.148.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.148.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.148.2.3 21-May-2018  pgoyette Sync with HEAD
 1.148.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.148.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.156.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.156.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.156.2.1 10-Jun-2019  christos Sync with HEAD
 1.164.4.3 15-May-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1953):

sys/net/if_bridge.c: revision 1.199
sys/net/if_bridgevar.h: revision 1.40

bridge: resolve a race condition in bridge_stop()
Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at

https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.164.4.2 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #1858):

sys/net/if_bridge.c: revision 1.192

bridge: add missing curlwp_bind() for pppoe
From knakahara@
 1.164.4.1 27-Feb-2020  martin Pull up following revision(s) (requested by rin in ticket #734):

sys/net/if_bridge.c: revision 1.167
sys/net/if_bridge.c: revision 1.168

disable the DEBUG bridge_calc_csum_flags() printf
-
Remove debug printf I put into bridge_calc_csum_flags().

Sorry for noise.
 1.165.2.1 29-Feb-2020  ad Sync with head.
 1.176.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.176.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.179.4.2 01-Aug-2021  thorpej Sync with HEAD.
 1.179.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.189.6.1 11-Nov-2023  thorpej branches: 1.189.6.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.189.6.1.2.2 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.189.6.1.2.1 16-Nov-2023  thorpej Clean up the locking protocol around altq_etherclassify(). It's no longer
required to acquire KERNEL_LOCK *just* because ALTQ is compiled into the
kernel; you only have to acquire it if ALTQ is enabled on the interface
in question.
 1.189.4.3 15-May-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1116):

sys/net/if_bridge.c: revision 1.199
sys/net/if_bridgevar.h: revision 1.40

bridge: resolve a race condition in bridge_stop()
Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at

https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.189.4.2 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.189.4.1 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #763):

sys/net/if_bridge.c: revision 1.192

bridge: add missing curlwp_bind() for pppoe
From knakahara@
 1.190.2.1 02-Aug-2025  perseant Sync with HEAD
 1.40 22-Apr-2025  ozaki-r bridge: resolve a race condition in bridge_stop()

Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.39 03-Sep-2024  ozaki-r bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.
 1.38 16-Jul-2024  ozaki-r bridge: get rid of unnecessary macros for pserialize
 1.37 30-Sep-2021  yamaguchi branches: 1.37.4; 1.37.10;
bridge: Register bridge_ifdetach to ether_ifdetach hook
 1.36 30-Sep-2021  yamaguchi bridge: Register bridge_calc_link_state to link-state change hook
 1.35 27-Sep-2020  roy bridge: Calculate link state as the best link state of any member

If any member is LINK_STATE_UP then it's LINK_STATE_UP.
Otherwise if any member is LINK_STATE_UNKNOWN then it's LINK_STATE_UNKNOWN.
Otherwise it's LINK_STATE_DOWN.
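The three rules above can be sketched as a single pass over the members. The enum values and the function name are illustrative assumptions; the kernel defines its own LINK_STATE_* constants. Returning "down" for a bridge with no members is also an assumption of this sketch; the log entry does not cover that case.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; not the kernel's LINK_STATE_* definitions. */
enum link_state { LS_UNKNOWN, LS_DOWN, LS_UP };

/* Returns the best link state found among the n members. */
static enum link_state
bridge_best_link_state(const enum link_state *members, size_t n)
{
    int seen_unknown = 0;

    assert(members != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (members[i] == LS_UP)
            return LS_UP;       /* any member up: bridge is up */
        if (members[i] == LS_UNKNOWN)
            seen_unknown = 1;
    }
    /* No member is up: unknown wins over down. */
    return seen_unknown ? LS_UNKNOWN : LS_DOWN;
}
```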
 1.34 30-Apr-2020  jdolecek add sc_capenable member, forgot to commit
 1.33 12-Dec-2018  rin branches: 1.33.4;
PR kern/53562

Handle TX offload in software when a packet is sent via
bridge_output(). We can send it as is in the following
exceptional cases:

For unicast:

(1) When the destination interface is the same as source.

(2) When the destination supports all TX offload options
specified in a packet.

For multicast/broadcast:

(3) When all the members of the bridge support the specified
TX offload options.

For (3), add sc_csum_flags_tx flag to bridge softc, which is
logical AND b/w capabilities of TX offload options in member
interface (ifp->if_csum_flags_tx). The flag is updated when a
member is (i) added to or (ii) removed from a bridge, or (iii)
if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).

Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
TSO[46] is enabled for that interface.

OK msaitoh thorpej
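The logical-AND bookkeeping described for case (3) can be sketched as follows. The flag bits and function names are hypothetical and do not match the kernel's M_CSUM_* values; treating an empty bridge as supporting no offload is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's M_CSUM_* values. */
#define CSUM_IP4   0x1u
#define CSUM_TCP4  0x2u
#define CSUM_UDP4  0x4u

/* Offload options supported by every member: AND of all members' flags. */
static uint32_t
bridge_common_tx_flags(const uint32_t *member_flags, size_t n)
{
    uint32_t common = (n == 0) ? 0 : ~(uint32_t)0;

    assert(member_flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        common &= member_flags[i];
    return common;
}

/* A packet can go out as is if every offload it requests is supported. */
static int
can_send_as_is(uint32_t pkt_flags, uint32_t supported)
{
    return (pkt_flags & ~supported) == 0;
}
```

The common value would need recomputing whenever a member is added, removed, or has its flags changed, which matches the three update points listed in the entry.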
 1.32 18-Apr-2018  ozaki-r branches: 1.32.2;
bridge: use pslist(9) for rtlist and rthash

The change fixes race conditions on list operations. One example is that a
reader may see invalid pointers in a list item it is looking at due to the
lack of membar_producer.
 1.31 28-Apr-2016  ozaki-r branches: 1.31.10; 1.31.16;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.30 19-Apr-2016  ozaki-r Apply psref(9) to bridge(4)

Note that there is an issue that ioctls for an interface and a destruction
of the interface can run in parallel and it causes race conditions on
bridge as well (it rarely happens). The issue will be addressed in the
interface common code (if.c).
 1.29 19-Apr-2016  ozaki-r Remove BRIDGE_MPSAFE switch and enable MP-safe code by default

We need to enable it by default because bridge_input now runs
in softint, but bridge_input w/o BRIDGE_MPSAFE was designed as
it runs in hardware interrupt.

Note that some racy code remains in bridge_output; it will be
solved in the upcoming change (applying psref(9)).
 1.28 11-Apr-2016  ozaki-r Move #include <sys/pslist.h> inside #ifdef _KERNEL for building brconfig
 1.27 11-Apr-2016  ozaki-r Use pslist(9) in bridge(4)

This adds missing memory barriers to list operations for pserialize.
 1.26 15-Feb-2016  ozaki-r Simplify bridge(4)

Thanks to introducing softint-based if_input, the entire bridge code now
never runs in hardware interrupt context. So we can simplify the code.

- Remove spin mutexes
- They were needed because some code of bridge could run in
hardware interrupt context
- We now need only an adaptive mutex for each shared object
(a member list and a forwarding table)
- Remove pktqueue
- bridge_input is already in softint, using another softint
(for bridge_forward) is useless
- Packet distribution should be done in device drivers
 1.25 10-Feb-2016  ozaki-r Don't share struct work, instead have one per softc

Pointed out by riastradh@
 1.24 01-Jun-2015  matt Modify the BRDGGIFS and BRDGRTS cmds to be more COMPAT_NETBSD32 friendly.
(XXX whitespace)
 1.23 16-Jan-2015  ozaki-r Introduce defflag for NET_MPSAFE
 1.22 08-Jan-2015  ozaki-r Use pserialize for rtlist in bridge

This change enables lockless accesses to bridge rtable lists.
See locking notes in a comment to know how pserialize and
mutexes are used. Some functions are rearranged to use
pserialize. A workqueue is introduced to use pserialize in
bridge_rtage via bridge_timer callout.

As usual, pserialize and mutexes are used only when NET_MPSAFE
on. On the other hand, the newly added workqueue is used
regardless of NET_MPSAFE on or off.
 1.21 31-Dec-2014  ozaki-r Use pserialize in bridge

This change enables lockless accesses to bridge member lists.
See locking notes in a comment to know how pserialize and
mutexes are used.

This change also provides support for softint-based interrupt
handling; pserialize readers can run in both HW interrupt and
softint contexts.

As usual, pserialize is used only when NET_MPSAFE on.
 1.20 14-Jul-2014  ozaki-r branches: 1.20.4;
Make bridge MPSAFE

- Introduce BRIDGE_MPSAFE
- It's enabled only when NET_MPSAFE is defined
in if.h or the kernel config
- Add iflist and rtlist mutex locks
- Locking iflist is performance sensitive,
so it's not used when !BRIDGE_MPSAFE
- Add bif object reference counting
- It enables fine-grained locking for bridge member lists
by allowing a bif to be touched without holding a lock
- bridge_release_member is added to decrement the
reference count
- A condition variable is added to do bridge_delete_member
gracefully
- Add if_bridgeif to ifnet
- It's a shortcut to a bif object of a bridge member
- It reduces a bif lookup cost and so lock contention on iflist
- Make bridgestp MPSAFE too
 1.19 20-Jun-2014  ozaki-r Remove unnecessary sc_softintr
 1.18 17-Jun-2014  ozaki-r Restructure ether_input and bridge_input

The network stack of NetBSD is well organized and
layered. A packet reception is processed from a
lower layer to an upper layer one by one. However,
ether_input and bridge_input are not structured so.
bridge_input is called inside ether_input.

The new structure replaces ifnet#if_input of a bridge
member with bridge_input when the member is attached.
So a packet goes straight on a packet reception via
a bridge, bridge_input => ether_input => ip_input.

The change is part of a patch of Lloyd Parkes submitted
in PR 48104. Unlike the patch, the change doesn't
intend to change the behavior of the packet processing.
Another patch will fix PR 48104.
 1.17 16-Jun-2014  ozaki-r Include pktqueue.h only if _KERNEL
 1.16 16-Jun-2014  ozaki-r Use pktqueue for bridge forwarding queue and softint
 1.15 23-Aug-2012  drochner branches: 1.15.2; 1.15.12;
the address expire counter is just a time difference; it can turn
negative after the timer expired until the entry is deleted.
make it signed, so that we don't get output like
"00:1b:78:12:50:46 wm0 18446744073709551349 flags=0<>"
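The wraparound behind that output can be reproduced in a few lines. This is a sketch assuming a 64-bit counter; the function names are made up for illustration. The value quoted above equals 2^64 - 267, which is what -267 looks like when stored in an unsigned 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned difference: wraps around once 'now' has passed 'expire'. */
static uint64_t
remaining_unsigned(uint64_t expire, uint64_t now)
{
    return expire - now;
}

/* Signed difference: simply goes negative after expiry. */
static int64_t
remaining_signed(int64_t expire, int64_t now)
{
    /* Timestamps are assumed non-negative, so the subtraction cannot overflow. */
    assert(expire >= 0 && now >= 0);
    return expire - now;
}
```

For an entry that expired 267 seconds ago, the signed version yields -267, which a consumer can display or clamp sensibly.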
 1.14 04-Apr-2009  bouyer branches: 1.14.12;
Fix for if_start() and pfil_hook() being called from hardware interrupt
context (reported on various mailing-lists, and part of PR kern/41114,
causing panic in pf(4) and possibly ipf(4) when BRIDGE_IPF is used).
Defer bridge_forward() to a software interrupt; bridge_input() enqueues
mbufs to ifp->if_snd which is handled in bridge_forward().
 1.13 18-Jan-2009  mrg branches: 1.13.2;
Fix multiple problems:

* A sign extension error creating the bridge ID corrupted the
priority (always making it the maximum).
* Do not catch STP packets on an interface for which STP is not
enabled -- it's a violation of the spec, and causes STP to fail on
neighboring bridges.
* An optimization to bstp_input() -- some information is already
known when we call it.

contributed anonymously.
 1.12 11-Jan-2009  christos merge christos-time_t
 1.11 09-Jul-2007  ad branches: 1.11.28; 1.11.30; 1.11.34; 1.11.44; 1.11.46;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.10 04-Mar-2007  christos branches: 1.10.2; 1.10.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.9 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.8 10-Dec-2005  elad branches: 1.8.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.7 28-Jun-2005  seanb branches: 1.7.2;
- Rearranged layout of struct bridge_iflist slightly to
make members naturally aligned.
- This saves 8 bytes worth of pad.
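The effect of member ordering on padding can be illustrated with two toy structs. The fields are hypothetical and are not those of struct bridge_iflist; exact sizes depend on the ABI, though on a typical LP64 platform the first struct is 24 bytes and the second 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members on either side of a large one force padding twice. */
struct padded {
    uint8_t  state;      /* followed by padding up to the next member */
    uint64_t path_cost;
    uint8_t  priority;   /* followed by tail padding */
};

/* Largest member first: the small members share one padded tail. */
struct reordered {
    uint64_t path_cost;
    uint8_t  state;
    uint8_t  priority;
};

/* Bytes saved by the reordering on the current ABI (never negative). */
static size_t
padding_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    return sizeof(struct padded) - sizeof(struct reordered);
}
```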
 1.6 26-Feb-2005  perry nuke trailing whitespace
 1.5 16-Sep-2003  jdc branches: 1.5.8; 1.5.10;
Add filter/no filter flag parameter to bridge_enqueue().
 1.4 08-Jul-2003  itojun prototype must not have variable name
 1.3 19-Mar-2003  bouyer branches: 1.3.2;
Fix 2 bugs:
- initialise stp when the bridge is turned up, without this stp will keep
all interfaces disabled in a sequence like:
brconfig bridge0 add if0 add if1 stp if0 stp if1 up
- s/BRDGSPRI/BRDGSIFPRIO in brconfig.c:cmd_ifpriority()

add a command (ifpathcost) to change the STP path cost of an interface.
Display the interface path cost with the other STP parameters.
 1.2 15-Feb-2003  perseant Add ipf packet-filtering option to if_bridge. The option is controlled at
compile-time by BRIDGE_IPF, and at runtime by brconfig with the {ipf,-ipf}
option on a per-bridge basis.

As a side-effect, add PFIL_HOOKS processing to if_bridge.
 1.1 17-Aug-2001  thorpej branches: 1.1.2; 1.1.4; 1.1.18;
Add support for building Ethernet bridges, based on Jason Wright's
bridge driver from OpenBSD, although the bridge code has been *heavily*
modified by me (the 802.1D code remains mostly unchanged from the
original).
 1.1.18.1 30-Jun-2003  grant Apply patch (requested by bouyer in ticket #1355):

Fix 2 bugs:
- initialise stp when the bridge is turned up, without this stp will
keep all interfaces disabled in a sequence like:
brconfig bridge0 add if0 add if1 stp if0 stp if1 up
- s/BRDGSPRI/BRDGSIFPRIO in brconfig.c:cmd_ifpriority()

add a command (ifpathcost) to change the STP path cost of an interface.
Display the interface path cost with the other STP parameters.
 1.1.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.1.4.1 17-Aug-2001  thorpej file if_bridgevar.h was added on branch kqueue on 2001-08-25 06:16:56 +0000
 1.1.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.1.2.1 17-Aug-2001  nathanw file if_bridgevar.h was added on branch nathanw_sa on 2001-08-24 00:12:08 +0000
 1.3.2.6 11-Dec-2005  christos Sync with head.
 1.3.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.3.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.5.8.1 29-Apr-2005  kent sync with -current
 1.7.2.3 03-Sep-2007  yamt sync with head.
 1.7.2.2 26-Feb-2007  yamt sync with head.
 1.7.2.1 21-Jun-2006  yamt sync with head.
 1.8.26.2 12-Mar-2007  rmind Sync with HEAD.
 1.8.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.10.4.1 11-Jul-2007  mjf Sync with head.
 1.10.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.11.46.1 04-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #660):
sys/net/if_bridge.c: revision 1.65
sys/net/if_bridgevar.h: revision 1.14
Fix for if_start() and pfil_hook() being called from hardware interrupt
context (reported on various mailing-lists, and part of PR kern/41114,
causing panic in pf(4) and possibly ipf(4) when BRIDGE_IPF is used).
Defer bridge_forward() to a software interrupt; bridge_input() enqueues
mbufs to ifp->if_snd which is handled in bridge_forward().
 1.11.44.2 28-Apr-2009  skrll Sync with HEAD.
 1.11.44.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.34.1 04-May-2009  yamt sync with head.
 1.11.30.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.11.28.1 17-Jan-2009  mjf Sync with HEAD.
 1.13.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.14.12.1 30-Oct-2012  yamt sync with head
 1.15.12.1 10-Aug-2014  tls Rebase.
 1.15.2.2 03-Dec-2017  jdolecek update from HEAD
 1.15.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.20.4.5 29-May-2016  skrll Sync with HEAD
 1.20.4.4 22-Apr-2016  skrll Sync with HEAD
 1.20.4.3 19-Mar-2016  skrll Sync with HEAD
 1.20.4.2 06-Jun-2015  skrll Sync with HEAD
 1.20.4.1 06-Apr-2015  skrll Sync with HEAD
 1.31.16.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.31.16.1 22-Apr-2018  pgoyette Sync with HEAD
 1.31.10.1 18-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #777):

tests/net/if_bridge/t_rtable.sh: revision 1.3
sys/net/if_bridge.c: revision 1.150-1.154
sys/net/if_bridgevar.h: revision 1.32

Remove obsolete NULL checks

Simplify bridge_rtnode_insert (NFC)

bridge: use pslist(9) for rtlist and rthash

The change fixes race conditions on list operations. One example is that a
reader may see invalid pointers in a list item it is looking at due to the
lack of membar_producer.

Add a test that checks if brconfig flush surely removes all entries

Get rid of an unnecessary semicolon
Pointed out by kamil@

Add missing PSLIST_ENTRY_INIT and PSLIST_ENTRY_DESTROY
 1.32.2.1 10-Jun-2019  christos Sync with HEAD
 1.33.4.1 15-May-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1953):

sys/net/if_bridge.c: revision 1.199
sys/net/if_bridgevar.h: revision 1.40

bridge: resolve a race condition in bridge_stop()
Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at

https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.37.10.1 02-Aug-2025  perseant Sync with HEAD
 1.37.4.2 15-May-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1116):

sys/net/if_bridge.c: revision 1.199
sys/net/if_bridgevar.h: revision 1.40

bridge: resolve a race condition in bridge_stop()
Without BRIDGE_LOCK, the callout can be scheduled after callout_halt.

Note that we should avoid depending on IFF_RUNNING which can be racy.
Suggested by riastradh at

https://mail-index.netbsd.org/source-changes-d/2025/04/16/msg014470.html

PR kern/59340
 1.37.4.1 05-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #818):

sys/net/if_bridgevar.h: revision 1.39
sbin/brconfig/brconfig.c: revision 1.18
tests/net/if_bridge/unicast.pcap.uue: revision 1.1
tests/net/if_bridge/t_bridge.sh: revision 1.20
sbin/brconfig/brconfig.8: revision 1.21
tests/net/if_bridge/t_bridge.sh: revision 1.21
sys/net/if_bridge.c: revision 1.194
tests/net/if_bridge/Makefile: revision 1.4
distrib/sets/lists/tests/mi: revision 1.1336
tests/net/if_bridge/broadcast.pcap.uue: revision 1.1

bridge: implement interface protection

It enables a feature similar to "protected-port" or "isolation" in some
router products by marking member interfaces protected; when a frame
arrives on a protected interface and is being forwarded to another
protected interface, the frame will be discarded.

The code is developed by the SEIL team at IIJ.

tests: dedup test scripts like others

brconfig: add protect/-protect commands

It marks/clears a specified interface "protected".
tests, bridge: add tests for interface protection

The original author of the test is k-goda@IIJ. ozaki-r improved
the test slightly.

distrib: install uuencoded pcap files for testing
 1.31 07-Nov-2022  msaitoh Increase sdl_data so that more than IFNAMSIZ bytes are available (again).

COMPAT_9 is not required.

- The getifaddrs(3) function has no problem. The routing message has no
problem because the message header has rtm_msglen and we can get the next
message using it. There is no kernel data structure which has a
struct sockaddr_dl foobadr[xxx] array.

- A data passed from userland and a kernel data are compared with
sockaddr_cmp(). The return value is used to check if the size is
inadequate or not.

- In the kernel, sdl_len is not directly used for the length of memcpy()
but the sockaddr_dl_measure() is used for it.
 1.30 27-Oct-2022  msaitoh Revert if_dl.h change. It'll be committed with the COMPAT_9 code in future.
 1.29 24-Oct-2022  msaitoh Increase sdl_data so that more than IFNAMSIZ bytes are available.

- Increase the size of dl_data[] from 12 to 24.
- Same as OpenBSD.
 1.28 30-Apr-2019  kre Whitespace consistency. NFC.
 1.27 29-Apr-2019  roy Move lla_snprintf from if_arp.c to dl_print.c
 1.26 03-Dec-2014  christos branches: 1.26.18;
add DL_PRINT macro
 1.25 02-Dec-2014  christos missed _
 1.24 02-Dec-2014  christos - split struct dladdr out of struct sockaddr_dl
- add routines to print struct sockaddr_dl and struct dladdr
- make if_dl.h idempotent
 1.23 20-Feb-2008  matt branches: 1.23.54; 1.23.74;
Revert change of char to int8_t.
 1.22 20-Feb-2008  matt s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.21 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.20 07-Aug-2007  dyoung branches: 1.20.2; 1.20.4;
As a stopgap measure to avoid dependency on net/if.h, don't use
IFNAMSIZ.
 1.19 07-Aug-2007  dyoung Lengthen sockaddr_dl so that a 16-byte FireWire address will fit
into sdl_data[].

Move the macro satocsdl() to net/if_dl.h, and introduce satosdl().

Add some helpers for initializing sockaddr_dl (sockaddr_dl_init),
for finding out the length to put in a sockaddr_dl's sdl_len member
(sockaddr_dl_measure), and for setting the link-layer address in
a sockaddr_dl to a new value (sockaddr_dl_setaddr).

Make sockaddr_copy() panic if the caller tries to copy a sockaddr
to a destination where it will not fit.
 1.18 11-Dec-2005  thorpej branches: 1.18.30; 1.18.40; 1.18.44;
ANSI function decls and application of static.
 1.17 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.16 31-May-2005  christos branches: 1.16.2;
change casts back to char * and explain why.
 1.15 30-May-2005  christos Add a CLLADDR macro which is the same as LLADDR only const. Make both macros
return void pointers.
 1.14 26-Feb-2005  perry nuke trailing whitespace
 1.13 20-Nov-2004  wiz branches: 1.13.4; 1.13.6;
Apply patch from PR 23990 by Greg Troxel (s/AF_DLI/AF_LINK/ in a comment):
<net/if_dl.h> defines struct sockaddr_dl. On the line defining member
"sdl_family" (which overlaps "sa_family" in struct sockaddr), the
comment says AF_DLI.

But,
1) AF_DLI is said to be a DEC Direct data link interface
(sys/socket.h)
2) The kernel actually sends sockaddr_dl structs with AF_LINK.
 1.12 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 28-Jul-2000  kleink branches: 1.11.24;
Avoid recursion with traditional cpp.
 1.10 26-Jun-2000  kleink Define sa_family_t and use it for sdl_family.
 1.9 09-Feb-1998  perry branches: 1.9.14;
add multiple inclusion protection (and cleanup).
 1.8 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 12-Mar-1995  cgd punt on using int8 types for chars, at least for now. char is 8 bits anyway.
 1.6 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.24.7 11-Dec-2005  christos Sync with head.
 1.11.24.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.24.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.11.24.4 29-Nov-2004  skrll Sync with HEAD.
 1.11.24.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.24.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.24.1 03-Aug-2004  skrll Sync with HEAD
 1.13.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.13.4.1 29-Apr-2005  kent sync with -current
 1.16.2.3 27-Feb-2008  yamt sync with head.
 1.16.2.2 03-Sep-2007  yamt sync with head.
 1.16.2.1 21-Jun-2006  yamt sync with head.
 1.18.44.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.18.44.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.18.40.2 03-Sep-2007  skrll Sync with HEAD.
 1.18.40.1 15-Aug-2007  skrll Sync with HEAD.
 1.18.30.2 09-Oct-2007  ad Sync with head.
 1.18.30.1 20-Aug-2007  ad Sync with HEAD.
 1.20.4.2 07-Aug-2007  dyoung As a stopgap measure to avoid dependency on net/if.h, don't use
IFNAMSIZ.
 1.20.4.1 07-Aug-2007  dyoung file if_dl.h was added on branch matt-mips64 on 2007-08-07 04:59:47 +0000
 1.20.2.2 23-Mar-2008  matt sync with HEAD
 1.20.2.1 06-Nov-2007  matt sync with HEAD
 1.23.74.1 06-Apr-2015  skrll Sync with HEAD
 1.23.54.1 03-Dec-2017  jdolecek update from HEAD
 1.26.18.1 10-Jun-2019  christos Sync with HEAD
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file if_dummy.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.9 27-Feb-2018  maxv Remove the Econet code. It was part of acorn26, which was removed a
month ago.
 1.8 20-Feb-2008  matt s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.7 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.6 10-Dec-2005  elad branches: 1.6.46; 1.6.52; 1.6.56; 1.6.60;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.5 26-Feb-2005  perry branches: 1.5.4;
nuke trailing whitespace
 1.4 17-Sep-2001  bjh21 branches: 1.4.2; 1.4.4; 1.4.22; 1.4.30; 1.4.32;
Add retry mechanisms for Econet, so that if a four-way handshake doesn't
complete for some reason, we defer it for a bit and then try again. This
gets ping down to 0% packet loss.

Of course, ping _should_ have been at 0% packet loss anyway, and that's the
next thing to deal with.
 1.3 16-Sep-2001  bjh21 Add support for incoming IP broadcast packets. The protocol for this is
worked out by observing RISC iX's behaviour, so it may be technically
wrong. The only implementations of IP-over-Econet for which I've got
sources don't support broadcasts.

Tested using broadcast ping from RISC iX to NetBSD, and using rwhod.
 1.2 15-Sep-2001  bjh21 Add minimal IP-over-Econet support and a load of bug-fixes. I can ping,
unreliably, between my RISC iX and NetBSD boxes with this. There's a lot
of work to go before it's solid, though.
 1.1 10-Sep-2001  bjh21 branches: 1.1.2;
Add MI Econet support. This is lacking any interfaces to higher-layer
protocols, and lacking any timeouts, but it basically works, doing four-way
handshakes in both directions and incoming Machine Peek operations.

Oh, and Econet is Acorn's ancient, proprietary 500kbit/s networking
technology.
 1.1.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.1.2.1 10-Sep-2001  thorpej file if_eco.h was added on branch kqueue on 2001-09-13 01:16:21 +0000
 1.4.32.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.4.30.1 29-Apr-2005  kent sync with -current
 1.4.22.2 11-Dec-2005  christos Sync with head.
 1.4.22.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.4.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.4.4.1 17-Sep-2001  fvdl file if_eco.h was added on branch thorpej-devvp on 2001-10-01 12:47:35 +0000
 1.4.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.4.2.1 17-Sep-2001  nathanw file if_eco.h was added on branch nathanw_sa on 2001-09-21 22:36:45 +0000
 1.5.4.3 27-Feb-2008  yamt sync with head.
 1.5.4.2 21-Jan-2008  yamt sync with head
 1.5.4.1 21-Jun-2006  yamt sync with head.
 1.6.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.6.56.1 26-Dec-2007  ad Sync with head.
 1.6.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.6.46.2 23-Mar-2008  matt sync with HEAD
 1.6.46.1 09-Jan-2008  matt sync with HEAD
 1.52 27-Feb-2018  maxv Remove the Econet code. It was part of acorn26, which was removed a
month ago.
 1.51 31-Jan-2017  maxv Correctly handle the return value of arpresolve, otherwise we either leak
memory or use some we already freed.

Sent on tech-net, ok christos
 1.50 24-Jan-2017  maxv Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.49 03-Oct-2016  ozaki-r branches: 1.49.2;
Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.48 20-Jun-2016  knakahara branches: 1.48.2;
apply if_start_lock() to L2 callers which call ifp->if_start() of device drivers
 1.47 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.46 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.45 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.44 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.43 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.42 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.41 16-Nov-2014  ozaki-r branches: 1.41.2;
Replace callout_stop with callout_halt

In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.

Discussed with martin@ and riastradh@.
 1.40 05-Jun-2014  rmind branches: 1.40.2;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.39 15-May-2014  msaitoh Put schednetisr() into splnet()/splx() pair.
This might avoid delays in processing a packet.
 1.38 04-Aug-2013  kiyohara branches: 1.38.2;
Fix build failure when INET is undefined.
 1.37 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.36 20-Nov-2011  kiyohara branches: 1.36.4; 1.36.8; 1.36.10; 1.36.12; 1.36.18;
Fix build failure. Include if_inarp.h.
 1.35 05-Apr-2010  joerg branches: 1.35.8;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.34 19-Jan-2010  pooka branches: 1.34.2; 1.34.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.33 20-Nov-2009  christos ar_tha() can return NULL; treat this as an error.
 1.32 18-Mar-2009  cegger bzero -> memset
 1.31 07-Jan-2009  bjh21 branches: 1.31.2;
Make Econet code compile again.
 1.30 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.29 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}
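
As a minimal standalone sketch of the switch pattern above (not code from
this commit): the flag values mirror the BSD <net/if.h> definitions and are
repeated only so the fragment builds outside the kernel, and the action
names and flags_to_action() are hypothetical, chosen for illustration.

```c
/* Values as in BSD <net/if.h>; repeated here so the sketch builds standalone. */
#define IFF_UP		0x0001
#define IFF_RUNNING	0x0040

/* Hypothetical action names; a real driver would call its own routines. */
enum if_action { IF_NOTHING, IF_STOP, IF_INIT, IF_UPDATE };

/*
 * Map the four IFF_UP/IFF_RUNNING permutations to an action with one
 * switch statement instead of an if-else chain.  Other flag bits are
 * masked off before the comparison.
 */
static enum if_action
flags_to_action(unsigned int flags)
{
	switch (flags & (IFF_UP | IFF_RUNNING)) {
	case 0:
		return IF_NOTHING;	/* marked down, not running */
	case IFF_RUNNING:
		return IF_STOP;		/* marked down, still running */
	case IFF_UP:
		return IF_INIT;		/* marked up, not yet running */
	case IFF_UP | IFF_RUNNING:
		return IF_UPDATE;	/* up and running: reprogram h/w state */
	}
	return IF_NOTHING;		/* not reached: all four cases are covered */
}
```

Each of the four permutations gets exactly one case label, which makes a
missing combination easier to spot than in nested if-else clauses.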

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
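
The address check described above can be sketched as follows. This is an
illustration under stated assumptions, not the ether_ioctl() code: the
helper name lladdr_is_acceptable() is hypothetical, and the test relies
only on the standard Ethernet convention that the low-order bit of the
first octet is set for multicast addresses (broadcast, ff:ff:ff:ff:ff:ff,
included).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/*
 * Hypothetical helper: return true if addr may be used as an
 * interface's own (unicast) link-layer address.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[ETHER_ADDR_LEN])
{
	static const uint8_t zero[ETHER_ADDR_LEN] = { 0 };

	/* Refuse 00:00:00:00:00:00. */
	if (memcmp(addr, zero, ETHER_ADDR_LEN) == 0)
		return false;

	/* Group bit set: multicast or broadcast, so refuse. */
	if ((addr[0] & 0x01) != 0)
		return false;

	return true;
}
```

With this check, 02:de:ad:be:ef:02 from the summary is accepted, while
01:00:5e:00:00:01 and ff:ff:ff:ff:ff:ff are refused.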

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.28 12-Mar-2008  dyoung branches: 1.28.4; 1.28.10; 1.28.12;
Make some cosmetic changes:

Use fewer 'error = ...; break;' statements and more 'return
...;'

Make the SIOCSIFFLAGS case more clear by using a switch
statement instead of an if-else if-else chain.

Shorten a staircase, and remove two unnecessary curly
braces.
 1.27 20-Feb-2008  matt branches: 1.27.2; 1.27.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.26 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.25 30-Aug-2007  dyoung branches: 1.25.6;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.24 26-Aug-2007  dyoung branches: 1.24.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.23 09-Jul-2007  ad branches: 1.23.2; 1.23.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.22 04-Mar-2007  christos branches: 1.22.2; 1.22.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.21 07-Jun-2006  kardel branches: 1.21.12;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.20 22-Apr-2006  simonb branches: 1.20.2;
One __KERNEL_RCSID() should be enough for this file.
 1.19 12-Feb-2006  bjh21 branches: 1.19.2; 1.19.4; 1.19.6;
Make Econet code compile again.
 1.18 11-Dec-2005  christos branches: 1.18.2; 1.18.4; 1.18.6;
merge ktrace-lwp.
 1.17 18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.16 31-Mar-2005  christos branches: 1.16.2;
factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that have one queue, and one used by
the point to point interfaces that have two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.15 26-Feb-2005  perry nuke trailing whitespace
 1.14 21-Apr-2004  itojun branches: 1.14.4; 1.14.6;
kill sprintf, use snprintf
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.12 23-Jun-2003  martin branches: 1.12.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.11 17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.10 12-Nov-2001  lukem add RCSIDs
 1.9 12-Nov-2001  bjh21 Use the (not so-)newly-allocated IFT_ECONET rather than IFT_OTHER. This means
that programs start printing Econet link-layer addresses correctly.
 1.8 17-Sep-2001  bjh21 branches: 1.8.2; 1.8.4;
Add retry mechanisms for Econet, so that if a four-way handshake doesn't
complete for some reason, we defer it for a bit and then try again. This
gets ping down to 0% packet loss.

Of course, ping _should_ have been at 0% packet loss anyway, and that's the
next thing to deal with.
 1.7 16-Sep-2001  bjh21 Add support for incoming IP broadcast packets. The protocol for this is
worked out by observing RISC iX's behaviour, so it may be technically
wrong. The only implementations of IP-over-Econet for which I've got
sources don't support broadcasts.

Tested using broadcast ping from RISC iX to NetBSD, and using rwhod.
 1.6 16-Sep-2001  bjh21 eco_input():
Use m_copydata() to preserve the Econet header, so we don't depend on
notionally-unused areas of an mbuf remaining untouched.
Check that ARP-over-Econet requests are exactly eight bytes long.
Use m_pullup() before trusting mtod().

Between them, these make reception of unicast ARP responses work properly.
 1.5 15-Sep-2001  bjh21 econet_inputframe: Check the header of each frame, and its length, to ensure
it looks like what we expect. This should help detect frames garbled by the
interface driver.
 1.4 15-Sep-2001  bjh21 Add minimal IP-over-Econet support and a load of bug-fixes. I can ping,
unreliably, between my RISC iX and NetBSD boxes with this. There's a lot
of work to go before it's solid, though.
 1.3 13-Sep-2001  bjh21 Remember to call eco_input() for incoming broadcasts.
 1.2 13-Sep-2001  bjh21 Add routing boilerplate to eco_output, verbatim from ether_output.
Update copyright notice to include UCB in consequence.
 1.1 10-Sep-2001  bjh21 branches: 1.1.2;
Add MI Econet support. This is lacking any interfaces to higher-layer
protocols, and lacking any timeouts, but it basically works, doing four-way
handshakes in both directions and incoming Machine Peek operations.

Oh, and Econet is Acorn's ancient, proprietary 500kbit/s networking
technology.
 1.1.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.1.2.1 10-Sep-2001  thorpej file if_ecosubr.c was added on branch kqueue on 2001-09-13 01:16:21 +0000
 1.8.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.8.4.1 17-Sep-2001  fvdl file if_ecosubr.c was added on branch thorpej-devvp on 2001-10-01 12:47:35 +0000
 1.8.2.4 17-Jan-2003  thorpej Sync with HEAD.
 1.8.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.8.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.8.2.1 17-Sep-2001  nathanw file if_ecosubr.c was added on branch nathanw_sa on 2001-09-21 22:36:45 +0000
 1.12.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.12.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.1 03-Aug-2004  skrll Sync with HEAD
 1.14.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.14.4.1 29-Apr-2005  kent sync with -current
 1.16.2.5 17-Mar-2008  yamt sync with head.
 1.16.2.4 27-Feb-2008  yamt sync with head.
 1.16.2.3 11-Feb-2008  yamt sync with head.
 1.16.2.2 03-Sep-2007  yamt sync with head.
 1.16.2.1 21-Jun-2006  yamt sync with head.
 1.18.6.3 01-Jun-2006  kardel Sync with head.
 1.18.6.2 22-Apr-2006  simonb Sync with head.
 1.18.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.18.4.1 09-Sep-2006  rpaulo sync with head
 1.18.2.1 18-Feb-2006  yamt sync with head.
 1.19.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.19.4.1 11-May-2006  elad sync with head
 1.19.2.2 26-Jun-2006  yamt sync with head.
 1.19.2.1 24-May-2006  yamt sync with head.
 1.20.2.1 19-Jun-2006  chap Sync with head.
 1.21.12.1 12-Mar-2007  rmind Sync with HEAD.
 1.22.4.1 11-Jul-2007  mjf Sync with head.
 1.22.2.2 09-Oct-2007  ad Sync with head.
 1.22.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.23.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.23.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.24.2.2 23-Mar-2008  matt sync with HEAD
 1.24.2.1 06-Nov-2007  matt sync with HEAD
 1.25.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.27.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.27.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.27.2.1 24-Mar-2008  keiichi sync with head.
 1.28.12.2 28-Apr-2009  skrll Sync with HEAD.
 1.28.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.28.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.28.4.3 11-Aug-2010  yamt sync with head.
 1.28.4.2 11-Mar-2010  yamt sync with head
 1.28.4.1 04-May-2009  yamt sync with head.
 1.31.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.34.4.1 30-May-2010  rmind sync with head
 1.34.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.35.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.8.1 17-Apr-2012  yamt sync with head
 1.36.18.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.36.12.2 18-May-2014  rmind sync with head
 1.36.12.1 28-Aug-2013  rmind sync with head
 1.36.10.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.36.8.2 03-Dec-2017  jdolecek update from HEAD
 1.36.8.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.36.4.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.36.4.1 07-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #1201):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111-1.112 (patch)
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23 (patch)
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28 (patch)

Replace callout_stop with callout_halt and ensure the callout
is not running before destroying it.
 1.38.2.1 10-Aug-2014  tls Rebase.
 1.40.2.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.40.2.1 01-Dec-2014  martin branches: 1.40.2.1.2; 1.40.2.1.4;
Pull up following revision(s) (requested by ozaki-r in ticket #279):
sys/kern/kern_ktrace.c: revision 1.166
sys/dev/isa/aps.c: revision 1.16
sys/dev/sysmon/sysmonvar.h: revision 1.45
sys/dev/ir/irframe_tty.c: revision 1.60
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.111
sys/dev/sysmon/sysmon_envsys_events.c: revision 1.112
sys/dev/pci/pccbb.c: revision 1.207
sys/dev/wscons/wskbd.c: revision 1.135
sys/dev/usb/ohci.c: revision 1.254
sys/net/if_ecosubr.c: revision 1.41
sys/dev/pcmcia/btbc.c: revision 1.17
sys/arch/x86/x86/via_padlock.c: revision 1.23
sys/dev/sdmmc/sdmmc.c: revision 1.23
sys/dev/bluetooth/btkbd.c: revision 1.17
sys/dev/bluetooth/bcsp.c: revision 1.25
sys/arch/x86/pci/fwhrng.c: revision 1.8
sys/dev/ic/nslm7x.c: revision 1.61
share/man/man9/callout.9: revision 1.28
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
Discussed with martin@ and riastradh@.
Make it clear that we should use not callout_stop but callout_halt
before callout_destroy
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout to complete.
Reviewed by riastradh@.
Kill sme_callout_mtx and use sme_mtx instead
We can use sme_mtx for the callout as well. Actually we should do so
because sme_events_list and some other data that are touched in the
callout should be protected by sme_mtx, not sme_callout_mtx.
Discussed with riastradh@ in
http://mail-index.netbsd.org/tech-kern/2014/11/11/msg017956.html
Replace callout_stop with callout_halt
In order to call callout_destroy for a callout safely, we have to ensure
the function of the callout is not running and pending. To do so, we should
use callout_halt, not callout_stop.
In this case, we need to pass an interlock to callout_halt to wait for
the callout to complete. We also make sure that SME_CALLOUT_INITIALIZED
is unset before calling callout_halt to prevent the callout from calling
callout_schedule. This is the same as what we did in sys/netinet6/mld6.c@1.61.
Reviewed by riastradh@.
 1.40.2.1.4.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.40.2.1.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.41.2.6 05-Feb-2017  skrll Sync with HEAD
 1.41.2.5 05-Oct-2016  skrll Sync with HEAD
 1.41.2.4 09-Jul-2016  skrll Sync with HEAD
 1.41.2.3 22-Apr-2016  skrll Sync with HEAD
 1.41.2.2 22-Sep-2015  skrll Sync with HEAD
 1.41.2.1 06-Jun-2015  skrll Sync with HEAD
 1.48.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.48.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.49.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.92 04-Oct-2025  thorpej Add a shared function to query the common properties used for configuring
an Ethernet address.
 1.91 05-Feb-2024  andvar fix various typos in comments.
 1.90 01-Aug-2023  mrg fix simple mis-matched function prototype and definitions.

most of these are like, eg

void foo(int[2]);

with either of these

void foo(int*) { ... }
void foo(int[]) { ... }

in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.

found by GCC 12.
 1.89 20-Jun-2022  yamaguchi bridge(4): support VLAN frames stripped by hardware tagging
 1.88 15-Nov-2021  yamaguchi introduced APIs to configure VLAN TAG to ethernet devices
 1.87 30-Sep-2021  yamaguchi Provide a hook point called when ether_ifdetach is called
 1.86 14-Feb-2021  roy if_ether: revert prior alignment checks

Apparently not needed as our drivers ensure this.
 1.85 13-Feb-2021  roy if_ether: Ensure that ether_header is aligned
 1.84 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.83 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.82 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.81 17-Jul-2019  msaitoh branches: 1.81.10;
Implement VLAN hardware filter function(ETHERCAP_VLAN_HWFILTER).
First proposed by jmcneill in 2017 and modified by me.

How to use:

- Set callback function:

ether_set_vlan_cb(struct ethercom *, ether_vlancb_t)

- Callback. This function is called when a vlan is attached/detached to the
parent interface:

int (*ether_vlancb_t)(struct ethercom *ec, uint16_t vlanid, bool set);

- ifconfig(8)

ifconfig ixg0 [-]vlan-hwfilter

Note that ETHERCAP_VLAN_HWFILTER is set by default on ixg(4) because
the PF driver usually enables the "all block" filter by default.
 1.80 17-Jul-2019  msaitoh KNF. No functional change.
 1.79 29-May-2019  msaitoh Even if we don't use MII(4), use the common path of SIOC[GS]IFMEDIA in
sys/net/if_ethersubr.c if we can.
- Add ec_ifmedia into struct ethercom.
- ec_mii in struct ethercom is kept and used as it is. It might be used in
future. Note that some Ethernet drivers which _DON'T_ use mii(4) use
ec_mii for keeping the if_media. Those should be changed in future.
 1.78 15-May-2019  ozaki-r Store IFF_ALLMULTI in ec_flags instead of if_flags to avoid data races

IFF_ALLMULTI is set/unset to if_flags via if_mcast_op. To avoid data races on
if_flags, IFNET_LOCK was added for if_mcast_op. Unfortunately it produces
a deadlock so we want to remove added IFNET_LOCK by avoiding the data races by
another approach.

This fix introduces ec_flags to struct ethercom and stores IFF_ALLMULTI to it.
ec_flags is protected by ETHER_LOCK and thus IFNET_LOCK is no longer necessary
for if_mcast_op. Note that the fix is applied only to MP-safe drivers, for
which the data races matter.

In the kernel, IFF_ALLMULTI is set by a driver and used by the driver itself.
So changing the storing place doesn't break anything. One exception is
ioctl(SIOCGIFFLAGS); we have to include IFF_ALLMULTI in a result if needed to
export the flag as well as before.

An upcoming commit will remove IFNET_LOCK.

PR kern/54189
 1.77 05-Mar-2019  msaitoh Centralize ETHER_ALIGN into net/if_ether.h. Note that this commit also changes
if_upgt.c's ETHER_ALIGN from 0 to 2.
 1.76 21-Dec-2018  msaitoh Add ETHERCAP_VLAN_HWFILTER and ETHERCAP_EEE.
 1.75 14-Jun-2018  yamaguchi branches: 1.75.2;
Remove ETHER_LOOKUP_MULTI()

The macro has been replaced with a function.
ok ozaki-r@
 1.74 14-Jun-2018  yamaguchi Replace macros related to multicast address with inline functions

ok ozaki-r@
 1.73 14-Jun-2018  yamaguchi Move macros related to multicast address into #ifdef _KERNEL

Those macros and structure are only used in the kernel.
reviewed by ozaki-r@n.o, thanks.
 1.72 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.71 15-Jan-2018  maxv branches: 1.71.2;
Mostly style, and add a bunch of KASSERTs.
 1.70 22-Nov-2017  msaitoh No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
 1.69 22-Nov-2017  msaitoh Fix a bug that a vlan packet which has priority or CFI bit in the tag causes
panic.
 1.68 28-Sep-2017  christos - add a constant for the vlan mask.
- enforce that we have a tag before we get it.
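
For context on revisions 1.68 and 1.69: an 802.1Q tag carries a 3-bit priority, a 1-bit CFI and a 12-bit VLAN ID, so code that treats the whole 16-bit tag as the VLAN ID misbehaves as soon as the priority or CFI bits are set. A minimal sketch of the masking, with hypothetical sketch_ names that are not the identifiers used in if_ether.h:

```c
#include <stdint.h>

/* 802.1Q tag layout: priority (3 bits) | CFI (1 bit) | VLAN ID (12 bits). */
#define SKETCH_VLAN_VID_MASK 0x0fffu	/* hypothetical name for the VLAN mask */

/* Extract the 12-bit VLAN ID, ignoring the priority and CFI bits. */
static inline uint16_t
sketch_vlan_id(uint16_t tag)
{
	return (uint16_t)(tag & SKETCH_VLAN_VID_MASK);
}

/* Extract the 3-bit priority field. */
static inline uint8_t
sketch_vlan_prio(uint16_t tag)
{
	return (uint8_t)((tag >> 13) & 0x7u);
}

/* Extract the CFI bit. */
static inline int
sketch_vlan_cfi(uint16_t tag)
{
	return (tag >> 12) & 0x1;
}
```

With this split, a tag of 0xE00A yields VLAN ID 10 and priority 7 rather than an out-of-range ID.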
 1.67 26-Sep-2017  knakahara VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.

I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html

XXX need pullup to -8 branch
 1.66 28-Dec-2016  ozaki-r branches: 1.66.8;
Protect ec_multi* with mutex

The data can be accessed from sysctl, ioctl, interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from them.

Currently the mutex is applied to some drivers, we need to apply it to all
drivers in the future.

Note that the mutex is adaptive one for ease of implementation but some
drivers access the data in interrupt context so we cannot apply the mutex
to every driver as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional work (see
ether_multicast_sysctl), and the other is to modify the drivers to access
the data not in interrupt context somehow.
 1.65 19-Nov-2015  christos branches: 1.65.2;

Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
 1.64 28-Jul-2014  ozaki-r branches: 1.64.2; 1.64.4;
Add a mutex for global variables of if_ethersubr.c

To initialize the mutex, we introduce etherinit that is called from ifinit1.
 1.63 10-Jun-2014  joerg Introduce new sysctls for obtaining interface-specific addresses:
- net.sdl for the active link-layer address (the MAC)
- net.ether.multicast for the Ethernet multicast addresses
- net.inet6.multicast for the IPv6 multicast groups
- net.inet6.multicast_kludge for temporarily removed multicast groups

Use these sysctls to replace the kmem grovelling in ifmcstat(8).
 1.62 23-Apr-2014  pooka add a mask for currently valid ETHERCAP flags
 1.61 31-Oct-2012  msaitoh branches: 1.61.2; 1.61.10;
Add SIOCGETHERCAP ioctl.
There was no way to know the setting of ec_capabilities and ec_capenable
other than grepping the source.

See http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html
 1.60 25-Oct-2012  msaitoh Move the prototype definition of ether_input() from if.h to if_ether.h.
 1.59 30-Sep-2012  dholland Requires <net/if.h> to be compilable by itself.
 1.58 19-May-2010  christos branches: 1.58.8; 1.58.18;
Replace ether_nonstatic_aton with a
- better named one
- not suffering from buffer overflow
- simpler
- handling different separators
- returning error codes for errors

Some ideas from one posted on tech-net by Jonathan A. Kollasch
 1.57 19-May-2010  jakllsch Changes to ether_nonstatic_aton():

Be more lenient on input string format. Each nibble pair may optionally be
followed by any of ':', '-', '.' or ' '.

Make source string const and work on a temporary copy. The caller may not
expect their string to be destroyed.
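
The lenient format described for 1.57, together with the error returns mentioned in 1.58, can be sketched in userland C as follows. The function name sketch_ether_parse() and its return convention (0 on success, -1 on malformed input) are assumptions made for illustration, not the kernel's actual interface:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int
sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse len octets from str into dst.  Each nibble pair may be followed
 * by one of ':', '-', '.' or ' '.  The source string is const and is
 * never modified.  Returns 0 on success, -1 on malformed input.
 */
static int
sketch_ether_parse(uint8_t *dst, size_t len, const char *str)
{
	for (size_t i = 0; i < len; i++) {
		int hi = sketch_hexval(str[0]);
		if (hi < 0)
			return -1;	/* also catches a premature NUL */
		int lo = sketch_hexval(str[1]);
		if (lo < 0)
			return -1;
		dst[i] = (uint8_t)((hi << 4) | lo);
		str += 2;
		if (*str == ':' || *str == '-' || *str == '.' || *str == ' ')
			str++;		/* optional separator */
	}
	return *str == '\0' ? 0 : -1;	/* reject trailing garbage */
}
```

Because the destination length is passed explicitly and every read is checked before the pointer advances, the sketch cannot write past dst or read past the terminating NUL.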
 1.56 18-Mar-2009  cegger branches: 1.56.2; 1.56.4;
bcmp -> memcmp
 1.55 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.54 07-Nov-2008  dyoung branches: 1.54.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
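
The rule above can be sketched as a small predicate. This is a userland illustration under stated assumptions: the name sketch_lladdr_acceptable() is hypothetical, and the check relies only on the standard convention that the low-order bit of the first octet marks a group (multicast or broadcast) address:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHER_ADDR_LEN 6	/* length of an Ethernet address */

/*
 * Return true if addr may be set as an interface's link-layer address:
 * it must not be a group address and must not be all zeroes.
 */
static bool
sketch_lladdr_acceptable(const uint8_t addr[SKETCH_ETHER_ADDR_LEN])
{
	static const uint8_t zero[SKETCH_ETHER_ADDR_LEN];

	if (addr[0] & 0x01)
		return false;	/* group bit set: multicast or broadcast */
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;	/* 00:00:00:00:00:00 */
	return true;
}
```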

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.53 25-Jul-2008  dsl branches: 1.53.2;
Move the body of VLAN_INPUT_TAG() into a static inline function.
(Maybe it shouldn't even be inline - but I'd have to work out where to put it).
VLAN_INPUT_TAG() now calls vlan_input_tag() and does '_errcase' when it fails.
In reality the callers should all be changed, _errcase is ALWAYS continue,
which used to 'continue' (ie break) the do .. while (0) loop - not the
intended action!
Found by ramming all the kernel sources through a modified lint and grepping
for a specific error.
While here enclose the body of VLAN_OUTPUT_TAG() in ().
 1.52 25-Jul-2008  christos PR/39203: Paul Ripke: PPPoE issues with broken MTU/MRU implementations
Allow larger frames for systems that don't negotiate MTU/MRU properly.
 1.51 22-May-2008  dyoung branches: 1.51.2; 1.51.4;
Add ETHER_IS_LOCAL(). Tests for "local" ethernet addresses.
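
As a rough illustration of what a "local" test looks like: by IEEE convention, bit 0x02 of the first octet marks a locally administered address. The sketch below uses a hypothetical function name and may differ in form from the macro added in this revision:

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if the locally administered bit is set in the first octet. */
static bool
sketch_ether_is_local(const uint8_t *addr)
{
	return (addr[0] & 0x02) != 0;
}
```

An address such as 02:de:ad:be:ef:02 tests as local, while a vendor-assigned address with an OUI starting 00: does not.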
 1.50 15-Mar-2008  rtr branches: 1.50.2; 1.50.4; 1.50.6;
whitespace '\t' -> ' '
 1.49 20-Feb-2008  matt branches: 1.49.2; 1.49.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.48 31-Dec-2007  dyoung Add media-handling code for several ethernet drivers with MII buses
to share.
 1.47 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.46 19-Sep-2007  dyoung branches: 1.46.6; 1.46.8; 1.46.12;
Constify sockaddr argument to ether_multiaddr(). Change struct
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
 1.45 04-Mar-2007  christos branches: 1.45.2; 1.45.14; 1.45.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.44 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.43 24-Nov-2006  rpaulo branches: 1.43.2; 1.43.4;
The change I committed to etherip was wrong. ether_snprintf doesn't make
sense when changing the MAC address of the virtual interface as pointed
out by Hans himself.
So, introduce ether_nonstatic_aton() and make etherip(4) and tap(4) use it.
 1.42 16-Mar-2006  christos branches: 1.42.10; 1.42.12;
Add a new function called ether_snprintf() which takes an external buffer
and a length. The buffer should be 3 * addrlen.
Remove local tap_ether_sprintf(), and use ether_snprintf() instead.
 1.41 29-Jan-2006  jdolecek branches: 1.41.2; 1.41.4; 1.41.6; 1.41.8;
fix VLAN_ATTACHED() macro, it was always true due to condition bug

Fixes PR kern/32645 by Pavel Cahyna
 1.40 10-Dec-2005  elad branches: 1.40.2;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.39 18-Mar-2005  yamt branches: 1.39.2;
add agr(4), a pseudo network device driver for link aggregation.
 1.38 20-Feb-2005  jdolecek expose the new VLAN macros only within kernel
 1.37 20-Feb-2005  jdolecek add several VLAN tagging related macros, to reduce code duplication
in various ethernet adapter drivers and improve code consistency; mostly
FreeBSD-compatible, with exception of VLAN_OUTPUT_TAG(), which takes
(struct ethercom *) rather than (struct ifnet *) as first parameter
since the information cannot be extracted via (struct ifnet)

also add VLAN_ATTACHED(ec), which tests if any VLAN is attached to the
ethernet device
 1.36 08-Jan-2005  yamt branches: 1.36.2; 1.36.4;
constify broadcastaddr.
 1.35 08-Jan-2005  yamt remove an unused member, enm_ec from ether_multi.
 1.34 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 26-Jun-2003  tron branches: 1.33.2;
Test for symbol "_KERNEL_OPT" instead of "_LKM" as suggested by
Matthew Green.
 1.32 26-Jun-2003  tron Don't include "opt_mbuftrace.h" if "_LKM" is defined. This fixes a build
problem in the "vmware-module3" package.
 1.31 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.30 25-Mar-2003  bouyer Make promiscuous mode work on vlans: introduce a new link-layer m_flag
M_PROMISC. In ether_input(), flag packets coming from an interface in
promiscuous mode which are not for us M_PROMISC instead of dropping them.
Drop M_PROMISC packets which are not passed to vlan_input(). M_PROMISC
packets passed to vlan_input() will be looped back to ether_input(), where
the M_PROMISC flag will be handled appropriately.
Clear M_PROMISC before giving the packet to bridge, as bridge has its own
checks for local MAC addresses.
This also makes bridges on vlans work.
 1.29 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.28 16-Sep-2002  tron Use "const" for all input parameters of ethers(3) functions.
 1.27 05-Mar-2002  itojun branches: 1.27.8;
bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.26 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.25 03-Jun-2001  thorpej branches: 1.25.2; 1.25.6;
Consider the configured MTU of the interface when determining
if a packet is too large.
 1.24 03-Jun-2001  thorpej Add a capability bit that indicates support for Gigabit Ethernet
jumbo frames, and use it in SIOCSIFMTU.
 1.23 07-Apr-2001  thorpej Add altq_etherclassify(), a slight hack modified from the kame/freebsd4
tree, which allows a packet with Ethernet headers already present to
run through the ALTQ packet classifier. This is needed in order to
support ALTQ on VLAN and bridge devices.
 1.22 07-Apr-2001  thorpej ether_*() functions belong in if_ether.h, not if.h.
 1.21 17-Nov-2000  bouyer branches: 1.21.2;
Supports hardware 802.1q VLAN tagging, per discussion on tech-net. The tag is
stored in a m_aux mbuf defined by AF_LINK, ETHERTYPE_VLAN.
Thanks to Jason & Itojun for the feedback.
 1.20 11-Oct-2000  thorpej Implement ether_ioctl(), ioctl operations common to all Ethernet
interfaces.
 1.19 03-Oct-2000  thorpej Improve the VLAN support, in particular, handling of MTU:
- Add a macro to compute the max frame length based on Ethertype
and presence of FCS, and use it to validate the packet size
in ether_input().
- Add capabilities to struct ethercom, and allow hardware drivers
to specify that they can handle the larger hardware MTU that
VLANs require in order to strictly conform to 802.1Q.
- Make ether_ifdetach() clear out the link address and free all of
the Ethernet multicast structures.

Also, rearrange the VLAN driver itself in preparation to supporting
other hardware types, including FDDI (which has 802.1Q VLAN capability).
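
The frame-length computation described in the first item above can be sketched as follows. This is an illustration under assumptions: the function name and its parameters are hypothetical (the macro added in this revision may take different arguments), and the constants are the standard sizes of the Ethernet header (14), the FCS (4) and the 802.1Q encapsulation (4):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN	14	/* dst + src + type */
#define SKETCH_ETHER_CRC_LEN	4	/* frame check sequence */
#define SKETCH_VLAN_ENCAP_LEN	4	/* 802.1Q tag */
#define SKETCH_ETHERTYPE_VLAN	0x8100

/*
 * Largest acceptable frame for a given payload MTU: a VLAN-tagged frame
 * is allowed 4 extra bytes, and the FCS is counted only when the driver
 * leaves it on the end of the frame.
 */
static size_t
sketch_ether_max_frame(size_t mtu, uint16_t etype, bool hasfcs)
{
	size_t len = mtu + SKETCH_ETHER_HDR_LEN;

	if (hasfcs)
		len += SKETCH_ETHER_CRC_LEN;
	if (etype == SKETCH_ETHERTYPE_VLAN)
		len += SKETCH_VLAN_ENCAP_LEN;
	return len;
}
```

For a 1500-byte MTU this gives 1518 bytes for an untagged frame with FCS and 1522 bytes for a tagged one, which is the larger hardware MTU that the capability flag advertises.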
 1.18 28-Sep-2000  enami Factor out and give a name to the common functionality to translate
sockaddr which represents a multicast address into an Ethernet address
or range of Ethernet addresses.
 1.17 17-Jun-2000  matt branches: 1.17.2;
Ansify before committing my next change.
 1.16 29-Mar-2000  enami branches: 1.16.2;
Fix typo in comment.
 1.15 29-Mar-2000  simonb Extern etherbroadcastaddr, ether_ipmulticast_min and ether_ipmulticast_max.
 1.14 06-Mar-2000  thorpej - Initialize ifp->if_baudrate to a sensible value when the interface is
attached.
- Add ether_crc32_be() and ether_crc_le(), common functions for computing
the Ethernet CRC on arbitrary length buffers. Nothing uses them yet,
and these should be double-checked and probably re-implemented as
table-driven functions.
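
For reference, a bitwise (not table-driven) little-endian CRC-32 over an arbitrary-length buffer can be sketched as below. The function name is hypothetical, and the sketch uses the conventional CRC-32 form with a final complement; whether the kernel functions apply that final inversion is not stated in the log entry above:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-32 using the reflected Ethernet polynomial
 * 0xedb88320, initial value 0xffffffff and a final complement.
 */
static uint32_t
sketch_crc32_le(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1u)
				crc = (crc >> 1) ^ 0xedb88320u;
			else
				crc >>= 1;
		}
	}
	return ~crc;
}
```

A table-driven version would replace the inner loop with a lookup indexed by the low byte, trading 1 KB of table for roughly eight times fewer iterations.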
 1.13 19-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.12 10-Aug-1999  thorpej branches: 1.12.2; 1.12.8;
u_char -> u_int8_t in the IPv6 goo.
 1.11 05-Aug-1999  thorpej M_HASCRC -> M_HASFCS, as suggested by Christoph Badura.
 1.10 04-Aug-1999  thorpej Define an Ethernet-specific flag which drivers can use to tell
the input routine that the CRC is included at the end of the frame.
 1.9 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.8 09-Apr-1999  drochner don't expose structures and prototypes to _STANDALONE programs
 1.7 25-Mar-1999  thorpej branches: 1.7.2;
Define several constants related to the Ethernet protocol:
- ETHER_ADDR_LEN: length of Ethernet address (actually, we already defined
this).
- ETHER_TYPE_LEN: length of the Ethernet header `type' field.
- ETHER_CRC_LEN: length of the Ethernet CRC (explorer got this already, mostly
because I forgot to commit these changes earlier).
- ETHER_HDR_LEN: total length of the Ethernet header
- ETHER_MAX_LEN: maximum length of an Ethernet frame, including header and CRC
- ETHER_MIN_LEN: minimum length of an Ethernet frame, including header and CRC

Define ETHERMTU and ETHERMIN (payload sizes) in terms of the above constants.
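
The relationship between these constants can be written out as below. The values are the standard Ethernet sizes; the SKETCH_ prefix is only there to mark this as an illustration rather than a copy of if_ether.h:

```c
#define SKETCH_ETHER_ADDR_LEN	6	/* length of an Ethernet address */
#define SKETCH_ETHER_TYPE_LEN	2	/* length of the type field */
#define SKETCH_ETHER_CRC_LEN	4	/* length of the CRC */
/* Header: destination address + source address + type. */
#define SKETCH_ETHER_HDR_LEN \
	(SKETCH_ETHER_ADDR_LEN * 2 + SKETCH_ETHER_TYPE_LEN)
#define SKETCH_ETHER_MIN_LEN	64	/* minimum frame, header and CRC included */
#define SKETCH_ETHER_MAX_LEN	1518	/* maximum frame, header and CRC included */

/* Payload sizes derived from the frame sizes. */
#define SKETCH_ETHERMTU \
	(SKETCH_ETHER_MAX_LEN - SKETCH_ETHER_HDR_LEN - SKETCH_ETHER_CRC_LEN)
#define SKETCH_ETHERMIN \
	(SKETCH_ETHER_MIN_LEN - SKETCH_ETHER_HDR_LEN - SKETCH_ETHER_CRC_LEN)
```

Deriving the payload sizes this way gives 1500 and 46 bytes respectively, and keeps them consistent if a frame-size constant is ever adjusted.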
 1.6 25-Mar-1999  explorer define ETHER_CRC_LEN, for if_vr.c
 1.5 28-Jul-1998  is branches: 1.5.6;
Remove obsolete comment.
 1.4 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.3 02-Nov-1997  lukem * modify ether_aton, ether_hostton, and ether_line to take 'const char *'
arguments as appropriate
 1.2 15-Mar-1997  is branches: 1.2.8;
New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.1 07-Feb-1997  is branches: 1.1.2;
file if_ether.h was initially added on branch is-newarp.
 1.1.2.3 06-Mar-1997  is Factor out the ETHERTYPE_XXX definitions. They are needed as
- Ethernet protocol type numbers
- ARP protocol type numbers, as per "Assigned Numbers".
This way we don't need to pull in all the Ethernet include file into the
ARP code.
 1.1.2.2 18-Feb-1997  is Having converted everything, remove the struct ether_arp definition completely.
Some small cleanup.
STILLTODO: some sanity checks of the (now) variable link level address length
in incoming packets..
 1.1.2.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.2.8.1 10-Nov-1997  thorpej Sync w/ trunk.
 1.5.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.7.2.1 09-Apr-1999  drochner branches: 1.7.2.1.2; 1.7.2.1.4;
pull up rev. 1.8 - namespace protection for _STANDALONE programs
 1.7.2.1.4.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.7.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. This may not compile
due to multiple reasons.
 1.7.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.7.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.12.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.12.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.12.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.17.2.2 07-Jun-2001  he Pull up revision 1.25 (requested by thorpej):
Consider the configured MTU of the interface when determining
if a packet is too large.
 1.17.2.1 31-Dec-2000  jhawk Pull up revisions 1.18-1.19, 1.21 (requested by bouyer):
Add support for 802.1Q virtual LANs.
 1.21.2.5 18-Oct-2002  nathanw Catch up to -current.
 1.21.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.21.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.21.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.21.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.25.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.25.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.25.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.25.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.8.1 30-Jun-2003  grant Pull up revision 1.30 (requested by bouyer in ticket #1356):

Make promiscuous mode work on vlans: introduce a new link-layer m_flag
M_PROMISC. In ether_input(), flag packets coming from an interface in
promiscuous mode which are not for us M_PROMISC instead of dropping them.
Drop M_PROMISC packets which are not passed to vlan_input(). M_PROMISC
packets passed to vlan_input() will be looped back to ether_input(), where
the M_PROMISC flag will be handled appropriately.
Clear M_PROMISC before giving the packet to bridge, as bridge has its
own checks for local MAC addresses.
This also makes bridges on vlans work.
 1.33.2.7 11-Dec-2005  christos Sync with head.
 1.33.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.33.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.33.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.33.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.36.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.36.2.1 29-Apr-2005  kent sync with -current
 1.39.2.8 17-Mar-2008  yamt sync with head.
 1.39.2.7 27-Feb-2008  yamt sync with head.
 1.39.2.6 21-Jan-2008  yamt sync with head
 1.39.2.5 27-Oct-2007  yamt sync with head.
 1.39.2.4 03-Sep-2007  yamt sync with head.
 1.39.2.3 26-Feb-2007  yamt sync with head.
 1.39.2.2 30-Dec-2006  yamt sync with head.
 1.39.2.1 21-Jun-2006  yamt sync with head.
 1.40.2.1 01-Feb-2006  yamt sync with head.
 1.41.8.1 19-Apr-2006  elad sync with head.
 1.41.6.1 01-Apr-2006  yamt sync with head.
 1.41.4.1 22-Apr-2006  simonb Sync with head.
 1.41.2.1 09-Sep-2006  rpaulo sync with head
 1.42.12.1 10-Dec-2006  yamt sync with head.
 1.42.10.1 12-Jan-2007  ad Sync with head.
 1.43.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.43.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.43.2.1 31-Mar-2009  bouyer Pull up following revision(s) (requested by dholland in ticket #1294):
sys/net/if_ether.h: revision 1.52
PR/39203: Paul Ripke: PPPoE issues with broken MTU/MRU implementations
Allow larger frames for systems that don't negotiate MTU/MRU properly.
 1.45.16.3 23-Mar-2008  matt sync with HEAD
 1.45.16.2 09-Jan-2008  matt sync with HEAD
 1.45.16.1 06-Nov-2007  matt sync with HEAD
 1.45.14.1 02-Oct-2007  joerg Sync with HEAD.
 1.45.2.1 09-Oct-2007  ad Sync with head.
 1.46.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.46.8.1 26-Dec-2007  ad Sync with head.
 1.46.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.49.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.49.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.49.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.49.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.49.2.1 24-Mar-2008  keiichi sync with head.
 1.50.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.50.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.50.4.2 11-Aug-2010  yamt sync with head.
 1.50.4.1 04-May-2009  yamt sync with head.
 1.50.2.1 04-Jun-2008  yamt sync with head
 1.51.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.51.4.1 19-Oct-2008  haad Sync with HEAD.
 1.51.2.1 28-Jul-2008  simonb Sync with head.
 1.53.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.53.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.54.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.56.4.1 30-May-2010  rmind sync with head
 1.56.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.58.18.3 03-Dec-2017  jdolecek update from HEAD
 1.58.18.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.58.18.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.58.8.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.58.8.2 16-Jan-2013  yamt sync with (a bit old) head
 1.58.8.1 30-Oct-2012  yamt sync with head
 1.61.10.1 10-Aug-2014  tls Rebase.
 1.61.2.1 18-May-2014  rmind sync with head
 1.64.4.2 05-Feb-2017  skrll Sync with HEAD
 1.64.4.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.64.2.1 24-Sep-2017  snj Pull up following revision(s) (requested by manu in ticket #1409):
sys/arch/xen/xen/if_xennet_xenbus.c: 1.65
sys/arch/xen/xen/xennetback_xenbus.c: 1.53, 1.56 via patch
sys/net/if_bridge.c: 1.105
sys/net/if_ether.h: 1.65
sys/net/if_ethersubr.c: 1.215, 1.235
sys/net/if_vlan.c: 1.76, 1.77, 1.83, 1.88, 1.94
Protect vlan_unconfig with a mutex
It is not thread-safe but is likely to be executed concurrently.
See PR 49264 for more detail.
--
Tweak vlan_unconfig
No functional change.
--
Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
--
Enable the VLAN mtu capability and check for the adjusted packet size
(Jean-Jacques.Puig at espci.fr).
Factor out the packet-size checking function for clarity.
--
Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
--
Account for the CRC len (Jean-Jacques.Puig)
--
Fix a bug that the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses VLAN_ATTACHED() function
which checks ec->ec_nvlans, the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.65.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.66.8.2 24-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #389):
sys/net/if_ether.h: revision 1.69
sys/net/if_vlan.c: revision 1.108
sys/dev/pci/if_bge.c: revision 1.313
sys/net/if_vlanvar.h: revision 1.11
sys/net/if_vlanvar.h: revision 1.12
sys/net/if_ether.h: revision 1.70
sys/net/if_vlan.c: revision 1.110
sys/dev/pci/if_wm.c: revision 1.544
sys/dev/pci/if_wmreg.h: revision 1.105
Fix a bug that a vlan packet which has priority or CFI bit in the tag causes
panic.
Revert part of if_bge.c 1.312. It's not required to mask other than VLAN ID
bits in VLAN tag.
Revert if_wmreg.h 1.104 and if_wm.c 1.542. It's not required to mask other
than VLAN ID bits in VLAN tag.
No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
 1.66.8.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #302):
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.30-1.31
sys/arch/x86/pci/if_vmx.c: 1.20
sys/dev/ic/i82557.c: 1.148
sys/dev/ic/rtl8169.c: 1.152
sys/dev/pci/cxgb/cxgb_sge.c: 1.5
sys/dev/pci/if_age.c: 1.51
sys/dev/pci/if_alc.c: 1.25
sys/dev/pci/if_ale.c: 1.23
sys/dev/pci/if_bge.c: 1.311
sys/dev/pci/if_bge.c: 1.312
sys/dev/pci/if_bnx.c: 1.62
sys/dev/pci/if_jme.c: 1.32
sys/dev/pci/if_nfe.c: 1.64
sys/dev/pci/if_sip.c: 1.167
sys/dev/pci/if_stge.c: 1.63-1.64
sys/dev/pci/if_ti.c: 1.102
sys/dev/pci/if_txp.c: 1.48
sys/dev/pci/if_vge.c: 1.61
sys/dev/pci/if_wm.c: 1.538
sys/dev/pci/ixgbe/ix_txrx.c: 1.29 via patch
sys/net/agr/if_agrether_hash.c: 1.4
sys/net/if_ether.h: 1.67-1.68
sys/net/if_ethersubr.c: 1.244
sys/net/if_vlan.c: 1.100
sys/net80211/ieee80211_input.c: 1.89
sys/net80211/ieee80211_output.c: 1.59
sys/sys/mbuf.h: 1.171
VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.
I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html
--
only get vtag when we have vtag like the other drivers.
--
- only get the vtag if we have it like the other drivers
- mask the hardware vlan tag
--
- add a constant for the vlan mask.
- enforce that we have a tag before we get it.
only get vtag when we have vtag like the other drivers.
like if_bge.c:1.312 and if_stge.c:1.64.
fixed by s-yamaguchi@IIJ, thanks.
 1.71.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.71.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.71.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.75.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.75.2.1 10-Jun-2019  christos Sync with HEAD
 1.81.10.1 03-Apr-2021  thorpej Sync with HEAD.
 1.44 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.43 26-Jun-2018  msaitoh branches: 1.43.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misdetected in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.42 26-Jan-2018  maxv branches: 1.42.2;
A few fixes:

* Style.

* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.

* Add a missing NULL check after m_pullup.
 1.41 26-Jan-2018  maxv Don't call if_attach, do if_initialize+if_register, otherwise when an
EtherIP packet is received the first KASSERT in if_input() fires.
 1.40 06-Dec-2017  ozaki-r Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
 1.39 23-Oct-2017  msaitoh If if_initialize() failed in the attach function, free resources and return.
 1.38 11-Jul-2016  msaitoh branches: 1.38.8; 1.38.10;
KNF. No functional change.
 1.37 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.36 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.35 25-Feb-2014  pooka branches: 1.35.6;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.34 25-Jan-2014  christos add __USE
 1.33 28-Jul-2012  matt branches: 1.33.2; 1.33.4;
Fix -fno-common found by building i386/conf/ALL
 1.32 02-Jun-2012  dsl Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.31 28-Oct-2011  dyoung branches: 1.31.2;
kauth isn't used in here, so don't #include <sys/kauth.h>.
 1.30 19-May-2010  christos Replace ether_nonstatic_aton with a
- better named one
- not suffering from buffer overflow
- simpler
- handling different separators
- returning error codes for errors

Some ideas from one posted on tech-net by Jonathan A. Kollasch
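
The properties listed above can be illustrated with a small userland sketch. It is not the function committed here: the name mac_aton(), the accepted separators (':', '-' and '.') and the use of EINVAL for every failure are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
sk_hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse exactly len bytes of a link-layer address from str into dst.
 * Never writes more than len bytes.  Returns 0 or an errno value.
 */
int
mac_aton(uint8_t *dst, size_t len, const char *str)
{
	size_t i;

	assert(dst != NULL && str != NULL);
	for (i = 0; i < len; i++) {
		int hi, lo;

		hi = sk_hexval((unsigned char)*str);
		if (hi < 0)
			return EINVAL;		/* missing or bad digit */
		str++;
		lo = sk_hexval((unsigned char)*str);
		if (lo >= 0) {
			str++;
			dst[i] = (uint8_t)(hi * 16 + lo);
		} else
			dst[i] = (uint8_t)hi;	/* single-digit byte */
		if (i + 1 < len) {
			if (*str != ':' && *str != '-' && *str != '.')
				return EINVAL;	/* too short or bad separator */
			str++;
		}
	}
	return *str == '\0' ? 0 : EINVAL;	/* reject trailing input */
}
```

Passing the destination length explicitly is what keeps the parser from overrunning the buffer, and the return value lets the caller tell a malformed string from a good one.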
 1.29 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.28 19-Jan-2010  pooka branches: 1.28.2; 1.28.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.27 18-Mar-2009  cegger bcmp -> memcmp
 1.26 17-Dec-2008  cegger branches: 1.26.2;
kill MALLOC and FREE macros.
 1.25 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
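
A compilable version of that pattern might look like the sketch below. The flag values are defined locally so that it stands alone, and the action chosen in each case (stop, init, reprogram) is an assumption about a typical driver, not something this commit prescribes.

```c
#include <assert.h>

/* Hypothetical flag values, defined locally so the sketch stands alone. */
#define SK_IFF_UP	0x01u
#define SK_IFF_RUNNING	0x40u

enum sk_action { SK_NOTHING, SK_STOP, SK_INIT, SK_REPROGRAM };

/* Decide what to do after a flags change; the chosen actions are assumed. */
enum sk_action
sk_flags_action(unsigned flags)
{
	switch (flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_NOTHING;	/* down and stopped */
	case SK_IFF_RUNNING:
		return SK_STOP;		/* marked down but still running */
	case SK_IFF_UP:
		return SK_INIT;		/* marked up but not yet running */
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_REPROGRAM;	/* up and running: update h/w state */
	}
	return SK_NOTHING;	/* not reached: all four cases are handled */
}
```

Masking first means unrelated flag bits cannot select the wrong case, and every permutation of the two flags is visible at a glance.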

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
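
The callback contract described above can be sketched in userland as follows. All of the sk_* names and the structure layout are hypothetical; only the return-value convention (0, ENETRESET, or another error) comes from the description, and treating a missing callback as success is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_ether;
typedef int (*sk_ifflags_cb_t)(struct sk_ether *);

struct sk_ether {
	sk_ifflags_cb_t	se_ifflags_cb;		/* registered by the driver */
	int		(*se_init)(struct sk_ether *);
	int		se_resets;		/* counts reinitializations */
};

/* Register the driver's callback for if_flags changes. */
static void
sk_set_ifflags_cb(struct sk_ether *se, sk_ifflags_cb_t cb)
{
	se->se_ifflags_cb = cb;
}

/* On a flags change: 0 = handled, ENETRESET = reinitialize, else error. */
static int
sk_flags_changed(struct sk_ether *se)
{
	int error;

	if (se->se_ifflags_cb == NULL)
		return 0;
	error = se->se_ifflags_cb(se);
	if (error == ENETRESET)
		return se->se_init(se);
	return error;
}

/* Example driver hooks used to exercise the dispatch above. */
static int
sk_demo_init(struct sk_ether *se)
{
	se->se_resets++;
	return 0;
}

static int
sk_demo_cb_reset(struct sk_ether *se)
{
	(void)se;
	return ENETRESET;
}

static int
sk_demo_cb_fail(struct sk_ether *se)
{
	(void)se;
	return EINVAL;
}
```

Keeping the reset decision in the common code is what lets several drivers share it while each supplies only its hardware-specific callback.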

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
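
A minimal sketch of such a check, assuming a 6-byte Ethernet address (the helper name sk_lladdr_settable() is hypothetical): the low-order bit of the first octet marks a group address, which covers both multicast and broadcast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN	6

/* Return true if addr may be assigned as a unicast link-layer address. */
static bool
sk_lladdr_settable(const uint8_t addr[SK_ETHER_ADDR_LEN])
{
	static const uint8_t zero[SK_ETHER_ADDR_LEN];

	if (addr[0] & 0x01)
		return false;	/* group bit set: multicast or broadcast */
	return memcmp(addr, zero, SK_ETHER_ADDR_LEN) != 0;
}
```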

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.24 03-Nov-2008  hans call pmf_device_deregister in detach functions. requested by jmcneill.
 1.23 02-Nov-2008  hans Add NULL pmf handlers. OK by cube.
 1.22 24-Oct-2008  dyoung branches: 1.22.2; 1.22.4;
Fix the device_t/softc split: introduce etherip_softc.sc_dev and
initialize it. Use sc_dev in etherip_clone_destroy() instead of
casting the softc to struct device *.

Remove gratuitous casts. Use device_t and cfdata_t throughout.
 1.21 10-Jul-2008  cegger make this compile again
 1.20 09-Jul-2008  joerg - device/softc split
- remove redundant ;
 1.19 24-Apr-2008  ad branches: 1.19.2; 1.19.4; 1.19.6; 1.19.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.18 05-Apr-2008  cegger branches: 1.18.2;
use aprint_*_dev and device_xname
 1.17 20-Feb-2008  matt branches: 1.17.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.16 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.15 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.14 08-Oct-2007  ad branches: 1.14.6; 1.14.8; 1.14.10;
Use the softint API.
 1.13 16-Sep-2007  dyoung branches: 1.13.2;
Use sockaddr_dup() and sockaddr_free().
 1.12 10-Sep-2007  cube Remove 3rd clause and my name from all the licences which were only in my
name.
 1.11 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
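
The idea of a length-checked setter can be sketched as below. The structure is a simplified, hypothetical stand-in for sockaddr_dl, and the function is not the kernel's sockaddr_dl_setaddr(); it only illustrates how passing the sockaddr length lets the setter refuse a write that would run past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a variable-length sockaddr_dl. */
struct sk_sockaddr_dl {
	uint8_t	sdl_len;	/* total length of this sockaddr in bytes */
	uint8_t	sdl_nlen;	/* interface name length */
	uint8_t	sdl_alen;	/* link-layer address length */
	char	sdl_data[12];	/* name, followed by the address */
};

/* Copy addr after the name, refusing to write past socklen bytes. */
struct sk_sockaddr_dl *
sk_sockaddr_dl_setaddr(struct sk_sockaddr_dl *sdl, size_t socklen,
    const void *addr, uint8_t addrlen)
{
	size_t need = offsetof(struct sk_sockaddr_dl, sdl_data) +
	    sdl->sdl_nlen + addrlen;

	if (need > socklen)
		return NULL;	/* would overrun the sockaddr */
	memcpy(&sdl->sdl_data[sdl->sdl_nlen], addr, addrlen);
	sdl->sdl_alen = addrlen;
	return sdl;
}
```

By contrast, a bare memcpy() to the address portion has no way to know where the sockaddr ends.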
 1.10 26-Aug-2007  dyoung branches: 1.10.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.9 14-Jul-2007  ad branches: 1.9.2; 1.9.6;
Generic soft interrupts are mandatory.
 1.8 30-May-2007  christos Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.7 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.6 23-Apr-2007  dyoung Free route cache after detaching an etherip(4) instance.
 1.5 04-Mar-2007  christos branches: 1.5.2; 1.5.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4 15-Dec-2006  joerg branches: 1.4.2; 1.4.4; 1.4.6;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.3 24-Nov-2006  rpaulo branches: 1.3.4;
The change I committed to etherip was wrong. ether_snprintf doesn't make
sense when changing the MAC address of the virtual interface as pointed
out by Hans himself.
So, introduce ether_nonstatic_aton() and make etherip(4) and tap(4) use it.
 1.2 23-Nov-2006  rpaulo Remove extra prototype.
 1.1 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.3.4.3 18-Dec-2006  yamt sync with head.
 1.3.4.2 10-Dec-2006  yamt sync with head.
 1.3.4.1 24-Nov-2006  yamt file if_etherip.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:00 +0000
 1.4.6.2 07-May-2007  yamt sync with head.
 1.4.6.1 12-Mar-2007  rmind Sync with HEAD.
 1.4.4.2 12-Jan-2007  ad Sync with head.
 1.4.4.1 15-Dec-2006  ad file if_etherip.c was added on branch newlock2 on 2007-01-12 01:04:11 +0000
 1.4.2.6 27-Feb-2008  yamt sync with head.
 1.4.2.5 21-Jan-2008  yamt sync with head
 1.4.2.4 27-Oct-2007  yamt sync with head.
 1.4.2.3 03-Sep-2007  yamt sync with head.
 1.4.2.2 30-Dec-2006  yamt sync with head.
 1.4.2.1 15-Dec-2006  yamt file if_etherip.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:20 +0000
 1.5.4.1 11-Jul-2007  mjf Sync with head.
 1.5.2.7 09-Oct-2007  ad Sync with head.
 1.5.2.6 09-Oct-2007  ad Sync with head.
 1.5.2.5 15-Jul-2007  ad Sync with head.
 1.5.2.4 15-Jul-2007  ad Sync with head.
 1.5.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.5.2.2 09-Jun-2007  ad Sync with head.
 1.5.2.1 08-Jun-2007  ad Sync with head.
 1.9.6.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.9.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.9.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.9.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.10.2.3 23-Mar-2008  matt sync with HEAD
 1.10.2.2 09-Jan-2008  matt sync with HEAD
 1.10.2.1 06-Nov-2007  matt sync with HEAD
 1.13.2.1 14-Oct-2007  yamt sync with head.
 1.14.10.2 02-Jan-2008  bouyer Sync with HEAD
 1.14.10.1 13-Dec-2007  bouyer Sync with HEAD
 1.14.8.1 11-Dec-2007  yamt sync with head.
 1.14.6.1 26-Dec-2007  ad Sync with head.
 1.17.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.17.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.17.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.18.2.1 18-May-2008  yamt sync with head.
 1.19.8.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.19.8.1 19-Oct-2008  haad Sync with HEAD.
 1.19.6.1 18-Jul-2008  simonb Sync with head.
 1.19.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.19.2.3 11-Aug-2010  yamt sync with head.
 1.19.2.2 11-Mar-2010  yamt sync with head
 1.19.2.1 04-May-2009  yamt sync with head.
 1.22.4.2 19-Nov-2008  snj Pull up following revision(s) (requested by hans in ticket #89):
sys/net/if_tap.c: revision 1.49
sys/net/if_etherip.c: revision 1.24
call pmf_device_deregister in detach functions. requested by jmcneill.
 1.22.4.1 19-Nov-2008  snj Pull up following revision(s) (requested by hans in ticket #89):
sys/net/if_tap.c: revision 1.48
sys/net/if_etherip.c: revision 1.23
Add NULL pmf handlers. OK by cube.
 1.22.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.22.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.26.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.28.4.1 30-May-2010  rmind sync with head
 1.28.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.28.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.31.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.31.2.1 30-Oct-2012  yamt sync with head
 1.33.4.1 18-May-2014  rmind sync with head
 1.33.2.2 03-Dec-2017  jdolecek update from HEAD
 1.33.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.35.6.2 05-Oct-2016  skrll Sync with HEAD
 1.35.6.1 22-Sep-2015  skrll Sync with HEAD
 1.38.10.3 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #694):

sys/netinet6/ip6_etherip.c: revision 1.22
sys/net/if_etherip.c: revision 1.41
sys/net/if_etherip.c: revision 1.42
sys/netinet/ip_etherip.c: revision 1.21

Don't call if_attach, do if_initialize+if_register, otherwise when an
EtherIP packet is received the first KASSERT in if_input() fires.

A few fixes:
* Style.
* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.
* Add a missing NULL check after m_pullup.
 1.38.10.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. Avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.38.10.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating a softint). Without this change, the kernel panics. That is
bad because resource shortages really occur when a lot of pseudo interfaces
are created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. The caller can then recover from the
error cleanly by checking the return value.
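The caller-side pattern this enables can be sketched in userland as follows. The sk_* names are hypothetical stand-ins, with sk_if_initialize() playing the role of an initialization routine that now reports failure through an int instead of panicking.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sk_softc {
	void	*sc_buf;	/* resource allocated during attach */
	int	sc_attached;	/* set only when attach fully succeeds */
};

/* Stand-in for an initialization routine that can fail with an errno. */
static int
sk_if_initialize(struct sk_softc *sc, int simulate_failure)
{
	(void)sc;
	return simulate_failure ? ENOMEM : 0;
}

/* Attach: on any failure, free what was allocated and return the error. */
int
sk_attach(struct sk_softc *sc, int simulate_failure)
{
	int error;

	sc->sc_buf = malloc(64);
	if (sc->sc_buf == NULL)
		return ENOMEM;

	error = sk_if_initialize(sc, simulate_failure);
	if (error != 0) {
		free(sc->sc_buf);	/* undo the earlier allocation */
		sc->sc_buf = NULL;
		return error;
	}

	sc->sc_attached = 1;
	return 0;
}
```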
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.38.8.1 17-May-2017  pgoyette At suggestion of chuq@, modify config_attach_pseudo() to return with a
reference held on the device.

Adapt callers to expect the reference to exist, and to ensure that the
reference is released.
 1.42.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.42.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.43.2.1 10-Jun-2019  christos Sync with HEAD
 1.13 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.12 14-Dec-2016  knakahara branches: 1.12.14; 1.12.16;
fix race of gif_softc->gif_ro when we send multiple flows over gif on NET_MPSAFE enabled kernel.

make gif_softc->gif_ro percpu as well as ipforward_rt to resolve this race.
and add future TODO comment for etherip(4).
 1.11 28-Jul-2012  matt branches: 1.11.2; 1.11.16; 1.11.20;
Fix -fno-common found by building i386/conf/ALL
 1.10 12-Nov-2008  ad branches: 1.10.16;
Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.9 24-Oct-2008  dyoung branches: 1.9.2;
Fix the device_t/softc split: introduce etherip_softc.sc_dev and
initialize it. Use sc_dev in etherip_clone_destroy() instead of
casting the softc to struct device *.

Remove gratuitous casts. Use device_t and cfdata_t throughout.
 1.8 09-Jul-2008  joerg - device/softc split
- remove redundant ;
 1.7 20-Feb-2008  matt branches: 1.7.6; 1.7.10; 1.7.12; 1.7.14; 1.7.16;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.6 16-Sep-2007  dyoung KNF: use tabs instead of spaces.
 1.5 16-Sep-2007  dyoung Move the LIST_HEAD() definition below etherip_softc's definition.
Somehow having it above interfered with ctags(1) producing a tag
for etherip_softc.

Remove the sole member of the union etherip_softc.sc_scr; call it
sc_ro. Delete the union. Delete the #define for sc_ro. The union
was a holdover from days before the route caches were unified.
 1.4 14-Jul-2007  ad branches: 1.4.6; 1.4.8;
Generic soft interrupts are mandatory.
 1.3 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
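
As an illustration only, the sockaddr routines and rtcache_setdst()
described above can be modelled in user space roughly as follows. This
is a sketch, not the kernel code: malloc(3) stands in for the per-family
pools, the 'flags' argument is omitted, and every model_* name, the
family numbers, and the lengths are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_AF_INET	2	/* hypothetical family numbers */
#define MODEL_AF_INET6	24

/* Simplified sockaddr: length and family, then family-specific bytes. */
struct model_sockaddr {
	uint8_t		sa_len;
	uint8_t		sa_family;
	unsigned char	sa_data[];
};

/* Assumed per-family sizes; 0 means "no pool for this family". */
static size_t
model_family_len(uint8_t af)
{
	switch (af) {
	case MODEL_AF_INET:	return 16;
	case MODEL_AF_INET6:	return 28;
	default:		return 0;
	}
}

/* Returns a sockaddr sized for 'af' with sa_len/sa_family set, or NULL. */
static struct model_sockaddr *
model_sockaddr_alloc(uint8_t af)
{
	size_t len = model_family_len(af);
	struct model_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): both sockaddrs must belong to the same family. */
static struct model_sockaddr *
model_sockaddr_copy(struct model_sockaddr *dst,
    const struct model_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	assert(dst->sa_len >= src->sa_len);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate from the source's family, then copy. */
static struct model_sockaddr *
model_sockaddr_dup(const struct model_sockaddr *src)
{
	struct model_sockaddr *sa = model_sockaddr_alloc(src->sa_family);

	return sa == NULL ? NULL : model_sockaddr_copy(sa, src);
}

static void
model_sockaddr_free(struct model_sockaddr *sa)
{
	free(sa);
}

/* The route cache points at sockaddr storage instead of embedding it. */
struct model_route {
	struct model_sockaddr *ro_sa;
};

/* Returns 0 on success, ENOMEM if no storage could be obtained. */
static int
model_rtcache_setdst(struct model_route *ro,
    const struct model_sockaddr *dst)
{
	struct model_sockaddr *sa = model_sockaddr_dup(dst);

	if (sa == NULL)
		return ENOMEM;
	model_sockaddr_free(ro->ro_sa);
	ro->ro_sa = sa;
	return 0;
}

/* May return NULL, e.g. if the destination was never set. */
static const struct model_sockaddr *
model_rtcache_getdst(const struct model_route *ro)
{
	return ro->ro_sa;
}
```

Callers of the getter in this model must handle NULL, which mirrors
the NULL checks mentioned in item 3.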
 1.2 15-Dec-2006  joerg branches: 1.2.2; 1.2.4; 1.2.6; 1.2.10; 1.2.12;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.1 23-Nov-2006  rpaulo branches: 1.1.4;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.1.4.3 18-Dec-2006  yamt sync with head.
 1.1.4.2 10-Dec-2006  yamt sync with head.
 1.1.4.1 23-Nov-2006  yamt file if_etherip.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:00 +0000
 1.2.12.1 11-Jul-2007  mjf Sync with head.
 1.2.10.3 09-Oct-2007  ad Sync with head.
 1.2.10.2 15-Jul-2007  ad Sync with head.
 1.2.10.1 08-Jun-2007  ad Sync with head.
 1.2.6.1 07-May-2007  yamt sync with head.
 1.2.4.2 12-Jan-2007  ad Sync with head.
 1.2.4.1 15-Dec-2006  ad file if_etherip.h was added on branch newlock2 on 2007-01-12 01:04:11 +0000
 1.2.2.5 27-Feb-2008  yamt sync with head.
 1.2.2.4 27-Oct-2007  yamt sync with head.
 1.2.2.3 03-Sep-2007  yamt sync with head.
 1.2.2.2 30-Dec-2006  yamt sync with head.
 1.2.2.1 15-Dec-2006  yamt file if_etherip.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:20 +0000
 1.4.8.2 23-Mar-2008  matt sync with HEAD
 1.4.8.1 06-Nov-2007  matt sync with HEAD
 1.4.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.7.16.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.7.16.1 19-Oct-2008  haad Sync with HEAD.
 1.7.14.1 18-Jul-2008  simonb Sync with head.
 1.7.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.7.10.1 04-May-2009  yamt sync with head.
 1.7.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.9.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.10.16.1 30-Oct-2012  yamt sync with head
 1.11.20.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.11.16.1 05-Feb-2017  skrll Sync with HEAD
 1.11.2.1 03-Dec-2017  jdolecek update from HEAD
 1.12.16.1 10-Jun-2019  christos Sync with HEAD
 1.12.14.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.333 12-Oct-2025  thorpej Some platforms have rules for retrieving the MAC address for an interface
beyond what properties exist. For example, a local address may be
present in a device tree property, but a system-wide property may indicate
that it should not be used (in favor of e.g. a singular system MAC address -
LOOKIN' AT YOU, SUNW!).

So, the ether-get-mac-address device call is introduced to handle this
situation. Consult it before the standard properties, and if it succeeds,
use its result.
 1.332 04-Oct-2025  thorpej Add a shared function to query the common properties used for configuring
an Ethernet address.
 1.331 21-Sep-2025  christos Centralize all the "can't handle af%d\n", messages in one place and provide
more context. Now I get ad-nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.330 23-Apr-2025  joe clear trailing whitespace
 1.329 28-Sep-2024  mlelstv count illegal slow protocol subtype as protocol error instead of generic
error.
 1.328 28-Sep-2024  mlelstv comment, whitespace.
 1.327 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.326 02-Nov-2023  yamaguchi branches: 1.326.2; 1.326.4;
Use ether_bpf_mtap only when the device supports vlan hardware tagging

The function is bpf_mtap() for ethernet devices and *currently*
it is just handling VLAN tag stripped by the hardware.
 1.325 02-Nov-2023  yamaguchi Added NULL check
 1.324 20-Oct-2023  msaitoh Print error message when the multicast bit is set in the MAC address.
 1.323 15-Nov-2022  roy branches: 1.323.2;
arp: Validate ARP source hardware address matches Ethernet source

RFC 5227 section 1.1 states that for a DaD ARP probe the sender hardware
address must match the hardware address of the interface sending the
packet.

We can now verify this by checking the mbuf tag PACKET_TAG_ETHERNET_SRC.

This fixes an obscure issue where an old router was sending out bogus
ARP probes.

Thanks to Ryo Shimizu <ryo@nerv.org> for the re-implementation.
 1.322 15-Nov-2022  roy Revert prior.
 1.321 14-Nov-2022  roy net: Store a pointer to the Layer 2 Sender Hardware address in mbuf

The BSD networking stack is designed around passing a mbuf down the chain
and each layer removes the part it's interested in before passing it to
the next. This makes it easy for each layer to do its work,
but non-trivial to work backwards.

As such we now store a pointer to the Sender's Hardware address in the
mbuf packet header so that protocols can perform any required validation.
 1.320 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.319 03-Sep-2022  thorpej Convert MPLS from a legacy netisr to pktqueue.
 1.318 03-Sep-2022  thorpej Convert NETATALK from a legacy netisr to pktqueue.
 1.317 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.316 03-Sep-2022  thorpej Only use configured RPS hash functions for IPv4 and IPv6 packets.

This is an NFC change for now because only IPv4 and IPv6 use pktqueue,
but that will change in future commits.
 1.315 20-Jun-2022  martin Avoid unused variable
 1.314 20-Jun-2022  yamaguchi bpf(4): added support for VLAN hardware offloading of ethernet devices
 1.313 20-Jun-2022  yamaguchi bridge(4): support VLAN frames stripped by hardware tagging
 1.312 20-Jun-2022  yamaguchi Handle frames whose VLAN ID is 0 as non-VLAN frames
even if the VLAN tag is stripped by hardware offloading
 1.311 04-Apr-2022  yamaguchi Move input processing of lagg(4) before ether_input
to get rid of dependence.

This implementation is similar to that of bridge(4).
 1.310 31-Dec-2021  riastradh ethersubr(9): Assert IFNET_LOCKED in ether_ioctl_reinit.

Changes to if_flags are nontrivial configuration changes that require
the long-term ioctl lock.
 1.309 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.308 31-Dec-2021  riastradh sys: Use if_stop wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.307 10-Dec-2021  msaitoh Add comment to clarify.
 1.306 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023ad_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the added file.

Note: currently, there are just LACP and Marker protocol,
however slowprotocols is independent of them.
 1.305 25-Nov-2021  msaitoh Better counting for ierrors, iqdrops and noproto in ether_input().

- Use if_noproto for unknown or unsupported protocols.
- Use if_ierrors for wrong mbuf or oversized frame.
 1.304 15-Nov-2021  yamaguchi introduced APIs to configure VLAN TAG to ethernet devices
 1.303 08-Nov-2021  christos Don't classify dropped packets that we don't understand as errors, for
example etype 0x88CA (TIPC, Transparent Inter Process Communication)
or 0x893A (IEEE 1905).
Classify them as dropped like Linux does (FreeBSD just ignores them). From RVP.
 1.302 25-Oct-2021  ryo frame's vlan tag must be ntohs()'ed.
VLAN 0 Priority tag was misrecognized on non vlan-hwtagging interfaces.
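
A minimal user-space sketch of why the byte-order conversion matters,
assuming the usual 802.1Q layout (3-bit priority, 1-bit DEI, 12-bit VLAN
ID, sent in network byte order). The model_* names are hypothetical and
are not the kernel's macros.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_VLAN_VID_MASK	0x0fff	/* low 12 bits: VLAN ID */
#define MODEL_VLAN_PRI_SHIFT	13	/* top 3 bits: priority */

/* Read the 16-bit tag as it appears on the wire and convert it to
 * host byte order before looking at any of its fields. */
static uint16_t
model_vlan_tci(const uint8_t *wire)
{
	uint16_t raw;

	assert(wire != NULL);
	memcpy(&raw, wire, sizeof(raw));
	return ntohs(raw);
}

static uint16_t
model_vlan_id(const uint8_t *wire)
{
	return model_vlan_tci(wire) & MODEL_VLAN_VID_MASK;
}

static unsigned
model_vlan_prio(const uint8_t *wire)
{
	return (unsigned)model_vlan_tci(wire) >> MODEL_VLAN_PRI_SHIFT;
}
```

With the wire bytes e0 00 (priority 7, VLAN 0), masking the unconverted
value on a little-endian host would yield a non-zero VLAN ID, so a
priority tag would not be recognized as VLAN 0.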
 1.301 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.300 30-Sep-2021  yamaguchi lagg: Register lagg_ifdetach to ether_ifdetach hook
 1.299 30-Sep-2021  yamaguchi vlan: Register vlan_ifdetach to ether_ifdetach hook
 1.298 30-Sep-2021  yamaguchi bridge: Register bridge_ifdetach to ether_ifdetach hook
 1.297 30-Sep-2021  yamaguchi Provide a hook point called when ether_ifdetach is called
 1.296 30-Sep-2021  yamaguchi net: obsolete ifnet::if_link_state_changed
that was used for updating link-state of vlan I/F

The obsoleted function is replaced with
ifnet::if_linkstate_hooks
 1.295 30-Sep-2021  yamaguchi vlan: Register the callback to update link-state of vlan I/F
to link-state change hook

The callback is registered in every vlan I/F even if the parent
interface is the same. Therefore it is not needed to search the
vlan I/F by the parent interface unlike the previous callback.
 1.294 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To remove this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Both agr(4) and lagg(4) are not running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
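
The three states above could be distinguished with a helper along these
lines. This is only a sketch: struct model_ifnet is a two-field stand-in
for struct ifnet, the function and enum names are made up, and the
numeric value given to MODEL_IFT_IEEE8023ADLAG is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_IFT_IEEE8023ADLAG	0xa1	/* placeholder value (assumed) */

/* Stand-in for the two struct ifnet members the states depend on. */
struct model_ifnet {
	void	*if_lagg;
	int	 if_type;
};

enum model_aggr {
	MODEL_AGGR_NONE,	/* state 1: neither agr(4) nor lagg(4) */
	MODEL_AGGR_AGR,		/* state 2: agr(4) is running */
	MODEL_AGGR_LAGG		/* state 3: lagg(4) is running */
};

static enum model_aggr
model_aggr_state(const struct model_ifnet *ifp)
{
	assert(ifp != NULL);
	if (ifp->if_lagg == NULL)
		return MODEL_AGGR_NONE;
	if (ifp->if_type != MODEL_IFT_IEEE8023ADLAG)
		return MODEL_AGGR_AGR;
	return MODEL_AGGR_LAGG;
}
```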
 1.293 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.292 14-Feb-2021  roy branches: 1.292.4; 1.292.6;
if_ether: revert prior alignment checks

Apparently not needed as our drivers ensure this.
 1.291 13-Feb-2021  roy Prior alignment fixes should not use an offset
 1.290 13-Feb-2021  roy if_ether: Ensure that ether_header is aligned
 1.289 26-Sep-2020  roy branches: 1.289.2;
vlan: match the interface link state with that of the parent

Now addresses on a vlan will detach and undergo duplicate address
detection on link state changes just as on a standard interface.
 1.288 28-Aug-2020  ozaki-r ether: count dropped packets on output
 1.287 28-Aug-2020  ozaki-r ether: count dropped packets on input
 1.286 28-Aug-2020  ozaki-r ether: separate handling of LLC frames as ether_input_llc (NFCI)
 1.285 28-Aug-2020  ozaki-r net: introduce IFQ_ENQUEUE_ISR to assemble packet queuing routines (NFCI)
 1.284 30-Apr-2020  riastradh Convert ether_input from rnd_initial_entropy to entropy_epoch().
 1.283 15-Mar-2020  thorpej Add and use a new function, mowner_init_owner(), that initializes an
MBUFTRACE mowner structure (so that providers of it don't have to
grovel the internals).
 1.282 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.281 16-Jan-2020  kardel use the CARP interface for arp/nd instead of the carp parent interface.
this provides the correct source mac address for the packets.

there are routers out there that cache the source mac during
nd and then subsequently bypass/miss packet filters on carp
interfaces as they send to the parent interface mac instead of the
correct carp interface mac.
 1.280 16-Oct-2019  christos branches: 1.280.2;
Add and use __FPTRCAST, requested by uwe@
 1.279 16-Oct-2019  christos Add void * function pointer casts. There are different ways to "fix" those
warnings:
1. this one: add a void * cast (which I think is the least intrusive)
2. add pragmas to elide the warning
3. add intermediate inline conversion functions
4. change the called function prototypes, adding unused arguments and
converting some of the pointer arguments to void *.
5. make the functions variadic (which defeats the purpose of checking)
6. pass command line flags to elide the warning
I did try 3 and 4 and I was not pleased with the result (sys_ptrace_common.c)
(3) added too much code and defines, and (4) made the regular use clumsy.
 1.278 02-Oct-2019  msaitoh Print oversized frame's message only when DIAGNOSTIC is set. The message
is not so important because we increment if_iqdrops now.
 1.277 01-Oct-2019  msaitoh Increment if_iqdrops when dropping an oversized frame.
 1.276 17-Jul-2019  msaitoh branches: 1.276.2;
Implement VLAN hardware filter function(ETHERCAP_VLAN_HWFILTER).
First proposed by jmcneill in 2017 and modified by me.

How to use:

- Set callback function:

ether_set_vlan_cb(struct ethercom *, ether_vlancb_t)

- Callback. This function is called when a vlan is attached/detached to the
parent interface:

int (*ether_vlancb_t)(struct ethercom *ec, uint16_t vlanid, bool set);

- ifconfig(8)

ifconfig ixg0 [-]vlan-hwfilter

Note that ETHERCAP_VLAN_HWFILTER is set by default on ixg(4) because
the PF driver usually enables "all block" filter by default.
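
A rough sketch of how a driver might use the callback, assuming only
the two signatures quoted above. struct ethercom here is a mock with
just the members the example needs; ec_vlan_cb, hw_filter, and the
mydrv_ prefix are hypothetical, and a real driver would program its
hardware filter table instead of a bitmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ethercom;
typedef int (*ether_vlancb_t)(struct ethercom *ec, uint16_t vlanid, bool set);

/* Mock: the callback slot plus a fake filter, one bit per VLAN ID. */
struct ethercom {
	ether_vlancb_t	ec_vlan_cb;
	uint8_t		hw_filter[4096 / 8];
};

/* Mock of the registration function: just remember the callback. */
static void
ether_set_vlan_cb(struct ethercom *ec, ether_vlancb_t cb)
{
	assert(ec != NULL);
	ec->ec_vlan_cb = cb;
}

/* Hypothetical driver callback: called when a vlan is attached to
 * (set == true) or detached from (set == false) the parent. */
static int
mydrv_vlan_cb(struct ethercom *ec, uint16_t vlanid, bool set)
{
	uint8_t bit;

	if (vlanid >= 4096)
		return EINVAL;
	bit = (uint8_t)(1u << (vlanid % 8));
	if (set)
		ec->hw_filter[vlanid / 8] |= bit;
	else
		ec->hw_filter[vlanid / 8] &= (uint8_t)~bit;
	return 0;
}
```

In this model the attach path would call ether_set_vlan_cb(ec,
mydrv_vlan_cb) once, and the vlan code would then invoke ec->ec_vlan_cb
on each attach or detach.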
 1.275 29-May-2019  msaitoh Even if we don't use MII(4), use the common path of SIOC[GS]IFMEDIA in
sys/net/if_ethersubr.c if we can.
- Add ec_ifmedia into struct ethercom.
- ec_mii in struct ethercom is kept and used as it is. It might be used in
future. Note that some Ethernet drivers which _DON'T_ use mii(4) use
ec_mii for keeping the if_media. Those should be changed in future.
 1.274 15-May-2019  ozaki-r Store IFF_ALLMULTI in ec_flags instead of if_flags to avoid data races

IFF_ALLMULTI is set/unset to if_flags via if_mcast_op. To avoid data races on
if_flags, IFNET_LOCK was added for if_mcast_op. Unfortunately it produces
a deadlock so we want to remove added IFNET_LOCK by avoiding the data races by
another approach.

This fix introduces ec_flags to struct ethercom and stores IFF_ALLMULTI to it.
ec_flags is protected by ETHER_LOCK and thus IFNET_LOCK is no longer necessary
for if_mcast_op. Note that the fix is applied only to MP-safe drivers that
the data races matter.

In the kernel, IFF_ALLMULTI is set by a driver and used by the driver itself.
So changing the storing place doesn't break anything. One exception is
ioctl(SIOCGIFFLAGS); we have to include IFF_ALLMULTI in a result if needed to
export the flag as well as before.

An upcoming commit will remove IFNET_LOCK.

PR kern/54189
 1.273 04-Feb-2019  mrg add or adjust fallthru comments.
 1.272 21-Dec-2018  msaitoh Add SIOCSETHERCAP. It's used to change ec_capenable.
 1.271 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.270 14-Jun-2018  yamaguchi branches: 1.270.2;
Use ether_lookup_multi() instead of the macro

ok ozaki-r@
 1.269 12-Jun-2018  ozaki-r Check if ether_ifdetach is called without INET_LOCK
 1.268 29-May-2018  maxv Remove an XXX of mine, actually it's fine. While here also remove a
misleading printf.
 1.267 29-May-2018  maxv Replace KASSERT by m_pullup. While the ethernet header is always there
when the packet was received on a physical interface, it may not be if
the packet was received over L2TP/EtherIP.

In particular, if the inner ethernet header ends up on two separate IP
fragments. Here the KASSERT is triggered, and on !DIAGNOSTIC we corrupt
memory.

Note that this is a widespread problem: a lot of L2 code was written with
the assumption that "most" headers are present in the first mbuf.
Obviously, that's not true if L2 encapsulation is being used.
 1.266 09-May-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.265 29-Apr-2018  maxv Remove references to m_copy in comments.
 1.264 26-Apr-2018  maxv m_copy -> m_copym
 1.263 09-Apr-2018  maxv Replace KASSERTMSG by a real check. L2 encapsulation protocols (at least
L2TP) don't ensure the LLC is there, and in !DIAGNOSTIC configurations
m_copydata will crash. Tested with L2TP.
 1.262 09-Apr-2018  maxv Add KASSERT. The input point expects struct ether_header to be there.

Now, I'm wondering whether it can be triggered by L2 encapsulation
protocols - they may not provide a contiguous area.
 1.261 09-Apr-2018  maxv Minor stylistic changes, add XXX and fix typo. No functional change.
 1.260 13-Feb-2018  maxv branches: 1.260.2;
Make the arpresolve branch more readable, fix typo, fix XXX (which I
added), add missing pserialize_read_exit (which I forgot).
 1.259 13-Feb-2018  maxv Remove KERNEL_LOCK around the MPLS code. It's not needed, since we're only
touching the tag of the mbuf - the tag belongs only to the mbuf, and the
mbuf is not shared.

ok knakahara@
 1.258 12-Feb-2018  maxv Fix typo, and add a comment about MPLS.
 1.257 19-Jan-2018  nakayama Fix inverted logic.
 1.256 15-Jan-2018  maxv Style, and fix a bug in the AppleTalk path: we're doing
M_PREPEND(M_DONTWAIT), but we forgot to NULL-check the mbuf afterwards.
 1.255 15-Jan-2018  maxv Fix two bugs in altq_etherclassify. When scanning the mbuf chain we need
to make sure that m_next is not NULL, otherwise NULL deref. After that,
we must not touch m->m_pkthdr, given that 'm' may not be the first mbuf
of the chain anymore.

Declare mtop, and add a KASSERT to make sure it has M_PKTHDR set.
 1.254 15-Jan-2018  maxv Fix a bug in the VLAN path: there's an inverted logic, the mbuf needs to
be bigger than struct ether_vlan_header, not smaller.

Meanwhile add a KASSERT in the LLC path.
 1.253 15-Jan-2018  maxv Style, make the code more readable, and add a KASSERT (we expect the mbuf
to have M_PKTHDR set).
 1.252 15-Jan-2018  maxv Several fixes:
- Style and typos
- Use kmem_zalloc, in case there is a padding between the fields of
the structures
- Use ETHER_ADDR_LEN instead of a hard-coded '6'
- kmem_alloc(KM_SLEEP) can't fail
- Simplify ether_aton_r
- Use mutex_obj_free, not to leak memory
 1.251 15-Jan-2018  maxv Fix the net.ether.multicast sysctl. If there is no multicast address
don't kmem_alloc(0) (which panics the kernel), and if the number of
multicast addresses has decreased don't copyout uninitialized kernel
data.
 1.250 09-Dec-2017  maxv style
 1.249 09-Dec-2017  maxv Make sure we have an llc structure in the packet, and don't read past the
end of the mbuf if we don't. I'm wondering whether we should not pull up
instead, but whatever.
 1.248 06-Dec-2017  ozaki-r Use kmem_alloc instead of kmem_intr_alloc in ether_addmulti

ether_addmulti is now not called in softint thanks to wqinput that
pulled input routines of ICMP out of softint.
 1.247 22-Nov-2017  msaitoh - Modify ether_ioctl() for readability. No functional change.
- KNF
 1.246 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.245 26-Oct-2017  msaitoh Use macro(ETHER_LOCK() and ETHER_UNLOCK()). No functional change.
 1.244 26-Sep-2017  knakahara VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.

I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html

XXX need pullup to -8 branch
 1.243 23-Jul-2017  para kmem_intr_free kmem_intr_[z]alloced memory

the underlying pools are the same but api-wise those should match
 1.242 06-Apr-2017  ozaki-r branches: 1.242.6;
Revert "Make sure to hold if_ioctl_lock when calling ifp->if_ioctl"

As per pgoyette@ and riastradh@ requests; we shouldn't decide to
hold a lock based on if the lock is held or not.
 1.241 05-Apr-2017  ozaki-r Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
 1.240 24-Mar-2017  ozaki-r Remove KERNEL_LOCK for arpresolve in ether_output

Because arpresolve should be already MP-safe.
 1.239 21-Feb-2017  ozaki-r Sweep unnecessary malloc.h inclusions
 1.238 14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.237 12-Feb-2017  skrll Remove redundant splnet/splx calls - ec_lock is IPL_NET.
 1.236 24-Jan-2017  maxv Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.235 13-Jan-2017  msaitoh branches: 1.235.2;
Fix a bug that the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses the VLAN_ATTACHED() function,
which checks ec->ec_nvlans, so the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.234 10-Jan-2017  ozaki-r Enable some sysctl knobs on rump kernels for ifmcstat
 1.233 10-Jan-2017  ozaki-r Replace adaptive mutex for ethercom with spin one

Unfortunately even wm(4) doesn't allow adaptive mutex because wm(4)
tries to hold it with holding its own spin mutex.
 1.232 31-Dec-2016  ozaki-r Use kmem_intr_alloc instead of kmem_alloc

ether_addmulti still can be called in softint.

Fix PR kern/51755
 1.231 28-Dec-2016  ozaki-r Protect ec_multi* with mutex

The data can be accessed from sysctl, ioctl, interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from them.

Currently the mutex is applied to some drivers, we need to apply it to all
drivers in the future.

Note that the mutex is an adaptive one for ease of implementation but some
drivers access the data in interrupt context so we cannot apply the mutex
to every driver as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional work (see
ether_multicast_sysctl), and the other is to modify the drivers to access
the data not in interrupt context somehow.
 1.230 28-Dec-2016  ozaki-r Use ether_ifattach in carp_clone_create instead of C&P code

carp_clone_destroy calls ether_ifdetach so not calling ether_ifattach is
inconsistent. If we add some pair of initialization and destruction
to ether_ifattach and ether_ifdetach (e.g., mutex_init/mutex_destroy),
ether_ifdetach of carp_clone_destroy won't work. So use ether_ifattach.

In order to do so, make ether_ifattach accept the 2nd argument (lla) as
NULL to allow carp to initialize its link level address by itself.
 1.229 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.228 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.227 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.226 25-Jul-2016  rjs Restore correct test for return value from aarpresolve().
 1.225 21-Jun-2016  knakahara branches: 1.225.2;
fix ATF net/carp failure
 1.224 20-Jun-2016  knakahara make ether_output() MP-safe, so that if_ether can enable IFEF_OUTPUT_MPSAFE.

making MP-scalable is future work.
 1.223 16-Jun-2016  ozaki-r Use if_get_byindex instead of if_byindex for MP-safe
 1.222 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.221 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.220 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation
 1.219 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (1/3) : add altq_pktattr fields to m_pkthdr

Reviewed by joerg@n.o and tls@n.o, thanks.
 1.218 15-Apr-2016  ozaki-r Hide PPPoE variables from if_ethersubr.c

This improves modularity of if_pppoe.

From s-yamaguchi@IIJ
 1.217 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.216 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.215 19-Nov-2015  christos Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
 1.214 13-Oct-2015  roy arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
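
A minimal sketch of the error-masking convention described in 1.214, assuming a caller that already holds the result of a resolve step. The helper name resolve_error_for_caller() is hypothetical and is not taken from the tree; it only illustrates "pass the error back, masking out EWOULDBLOCK".

```c
#include <errno.h>

/*
 * Hypothetical helper, not a NetBSD function.
 * EWOULDBLOCK means the packet was held while resolution is pending,
 * so the caller reports success; any other error is passed up unchanged.
 */
static int
resolve_error_for_caller(int error)
{
	return (error == EWOULDBLOCK) ? 0 : error;
}
```

With this convention a pending resolution is not reported as a failure, while an error such as EHOSTUNREACH still reaches the application.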
 1.213 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
- the global timer callout with the big locks can be
removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
- it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
- it was a parameter that prevents expiration of active caches
- Removed to simplify the timer logic, but we may be able to
restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.212 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.211 12-Aug-2015  ozaki-r Tidy up header inclusions
 1.210 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigation of the issue
there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.209 25-May-2015  ozaki-r Remove leftover IPX-related stuffs

No objection on tech-kern and tech-net.
 1.208 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.207 13-Apr-2015  riastradh Include <sys/rndsource.h> for rnd_add_data.
 1.206 03-Apr-2015  ozaki-r Don't grab KERNEL_LOCK during if_output when NET_MPSAFE

The change makes L3 MP-safe work easy. At this point
we deal with only IP forwarding.

No functional change when NET_MPSAFE isn't enabled.
 1.205 28-Nov-2014  ozaki-r branches: 1.205.2;
Remove dead codes and make if_free_sadl static

No functional change.
 1.204 10-Aug-2014  tls branches: 1.204.2; 1.204.4; 1.204.6;
Merge tls-earlyentropy branch into HEAD.
 1.203 28-Jul-2014  ozaki-r Add a mutex for global variables of if_ethersubr.c

To initialize the mutex, we introduce etherinit that is called from ifinit1.
 1.202 30-Jun-2014  ozaki-r Schedule pppoe_softintr only when a packet is enqueued
 1.201 17-Jun-2014  ozaki-r Restructure ether_input and bridge_input

The network stack of NetBSD is well organized and
layered. A packet reception is processed from a
lower layer to an upper layer one by one. However,
ether_input and bridge_input are not structured so.
bridge_input is called inside ether_input.

The new structure replaces ifnet#if_input of a bridge
member with bridge_input when the member is attached.
So a packet goes straight on a packet reception via
a bridge, bridge_input => ether_input => ip_input.

The change is part of a patch of Lloyd Parkes submitted
in PR 48104. Unlike the patch, the change doesn't
intend to change the behavior of the packet processing.
Another patch will fix PR 48104.
 1.200 10-Jun-2014  joerg Introduce new sysctls for obtaining interface-specific addresses:
- net.sdl for the active link-layer address (the MAC)
- net.ether.multicast for the Ethernet multicast addresses
- net.inet6.multicast for the IPv6 multicast groups
- net.inet6.multicast_kludge for temporarily removed multicast groups

Use these sysctls to replace the kmem grovelling in ifmcstat(8).
 1.199 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.198 15-May-2014  msaitoh Usually schednetisr() is called after enqueueing a packet with IF_ENQUEUE().
In some functions, it is done in the reverse order. It's not a bug because
the pair is protected with splnet()/splx(s). It's not good for readability
and someone might make a mistake when modifying the code. Yes, I'm one of
those people :-(

Save a NETISR_* value in a variable and call schednetisr() after enqueueing
a packet for readability and future modification.
 1.197 13-May-2014  bouyer Make sure *(if_output)() is called with KERNEL_LOCK held.
Add some KASSERT for this.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details.
 1.196 25-Feb-2014  pooka branches: 1.196.2;
If the in6 domain was not attached, do not attempt to process IPv6 packets.
 1.195 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.194 01-Mar-2013  joerg branches: 1.194.6;
Retire OSI network stack. OK core@
 1.193 31-Oct-2012  msaitoh Add SIOCGETHERCAP ioctl.
There was no way to know the setting of ec_capabilities and ec_capenable
other than grepping the source.

See http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html
 1.192 11-Oct-2012  christos PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.191 05-Oct-2012  matt When setting a link address, don't bring up the interface automatically.
 1.190 17-Jul-2012  christos branches: 1.190.2;
PR/46587: Roger Pau Monne: Prevent panic on shutdown on bridge teardown ->
ifpromisc-> if_ioctl -> if_init. Idea from dyoung.
XXX: Pullup to 6.
 1.189 11-May-2012  chs in ether_ifdetach(), clear if_mowner before releasing what it points to.
fixes PR 42982.
 1.188 16-Jun-2011  kefren branches: 1.188.2; 1.188.6; 1.188.8;
use ETHERTYPE_MPLS only for unicast packets (RFC3032)
 1.187 24-May-2011  matt branches: 1.187.2;
Add code to auto-deencapsulate 0 tagged VLANs.
 1.186 25-Apr-2011  yamt use ETHER_IS_MULTICAST macro. no functional changes.
 1.185 12-Jan-2011  tsutsui branches: 1.185.2;
Fix off by one in ether_aton_r(). Noticed by "arp info overwritten" warning.
(how could it be missed for months?)
 1.184 17-Nov-2010  dyoung Cosmetic: fix indentation.
 1.183 27-Jun-2010  kefren Don't assume that rt_tag family is AF_MPLS but verify it.
This way rt_tag can be used for other future work also, not only MPLS
 1.182 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.181 19-May-2010  christos delint previous
 1.180 19-May-2010  christos Replace ether_nonstatic_aton with a
- better named one
- not suffering from buffer overflow
- simpler
- handling different separators
- returning error codes for errors

Some ideas from one posted on tech-net by Jonathan A. Kollasch
 1.179 19-May-2010  jakllsch Changes to ether_nonstatic_aton():

Be more lenient on input string format. Each nibble pair may optionally be
followed by any of ':', '-', '.' or ' '.

Make source string const and work on a temporary copy. The caller may not
expect their string to be destroyed.
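
A rough sketch of the lenient parsing rule described in 1.179, assuming a fixed 6-byte Ethernet address. The names hex_value() and mac_parse_lenient() are hypothetical; this is not the code of ether_nonstatic_aton() or its successor. The source string is const and is only read, never modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int
hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: parse exactly MAC_LEN nibble pairs from src.
 * Each pair may optionally be followed by one of ':', '-', '.' or ' '.
 * Returns 0 on success, -1 on malformed input.
 */
static int
mac_parse_lenient(uint8_t dst[MAC_LEN], const char *src)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < MAC_LEN; i++) {
		int hi, lo;

		hi = hex_value(src[0]);
		if (hi < 0)
			return -1;	/* also catches a premature end of string */
		lo = hex_value(src[1]);
		if (lo < 0)
			return -1;
		dst[i] = (uint8_t)((hi << 4) | lo);
		src += 2;
		/* Skip at most one optional separator after the pair. */
		if (*src != '\0' && strchr(":-. ", *src) != NULL)
			src++;
	}
	/* Anything left over after the last pair is an error. */
	return (*src == '\0') ? 0 : -1;
}
```

Returning an error code instead of silently accepting bad input matches the direction taken in 1.180, though the exact return values there may differ.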
 1.178 05-May-2010  dyoung Constify some ether_output() arguments so that it's clear that they
can never be re-assigned.
 1.177 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.176 19-Jan-2010  pooka branches: 1.176.2; 1.176.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.175 28-Nov-2009  mbalmer Fix function name that was changed by mistake in the previous whitespace
commit.
 1.174 28-Nov-2009  isaki white space -> tab.
 1.173 20-Nov-2009  christos ar_tha() can return NULL; treat this as an error.
 1.172 29-May-2009  darran Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.

Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).

Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).

In consultation with tls@.
 1.171 28-Apr-2009  dyoung Let this build with 'no options INET'.

(I don't know why I bothered, either.)
 1.170 07-Nov-2008  dyoung branches: 1.170.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
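
A small sketch of the address check that 1.170 describes for ether_ioctl(): refuse 00:00:00:00:00:00 and any multicast/broadcast address. The helper names are hypothetical and the real check in the tree may be structured differently. The only fact relied on is that the group (multicast) bit is the least-significant bit of the first octet, so the broadcast address is covered by the same test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* True if the group bit is set, i.e. multicast or broadcast. */
static bool
mac_is_group(const uint8_t addr[MAC_LEN])
{
	return (addr[0] & 0x01) != 0;
}

/* True if every octet is zero. */
static bool
mac_is_zero(const uint8_t addr[MAC_LEN])
{
	size_t i;

	for (i = 0; i < MAC_LEN; i++) {
		if (addr[i] != 0)
			return false;
	}
	return true;
}

/* Hypothetical policy check: may this address be set on an interface? */
static bool
mac_is_assignable(const uint8_t addr[MAC_LEN])
{
	assert(addr != NULL);
	return !mac_is_zero(addr) && !mac_is_group(addr);
}
```

The example address from the summary, 02:de:ad:be:ef:02, passes this check because its first octet has the group bit clear.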
 1.169 23-Jul-2008  dyoung branches: 1.169.2; 1.169.4; 1.169.6; 1.169.8;
Fix this another way: add the missing case statement.
 1.168 23-Jul-2008  gmcgarry Back out rev 1.163 which broke the logic for SIOCSIFFLAGS. PR#38976.
 1.167 13-May-2008  dyoung branches: 1.167.2; 1.167.4;
Delete unreachable SIOCSIFADDR/AF_LINK case.
 1.166 11-May-2008  dyoung Where applicable, s/0/NULL/, s/Bcmp/memcmp/. Remove a gratuitous
cast from a call to nd6_storelladdr().
 1.165 09-May-2008  rumble Nix a tautological return introduced in 1.129.
 1.164 15-Mar-2008  matt branches: 1.164.2; 1.164.4; 1.164.6;
Make sure M_PROMISC isn't already set before checking whether we need to
set M_PROMISC.
Assume the interface is not CARP'ed.
 1.163 12-Mar-2008  dyoung Make some cosmetic changes:

Use fewer 'error = ...; break;' statements and more 'return
...;'

Make the SIOCSIFFLAGS case more clear by using a switch
statement instead of an if-else if-else chain.

Shorten a staircase, and remove two unnecessary curly
braces.
 1.162 20-Feb-2008  matt branches: 1.162.2; 1.162.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.161 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.160 19-Jan-2008  dyoung Add default handling of SIOCSIFMEDIA/SIOCGIFMEDIA.
 1.159 02-Jan-2008  dyoung Fix XEN2_DOMU (and amd64?) builds: move ether_mediastatus(),
ether_mediachange() to their own module that we compile only if
the kernel configuration demands support for both MII buses and
ethernet. Thanks to Tom Spindler for suggesting that these routines
move to dev/mii/.
 1.158 31-Dec-2007  dyoung Add media-handling code for several ethernet drivers with MII buses
to share.
 1.157 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.156 08-Oct-2007  ad branches: 1.156.4; 1.156.6; 1.156.10;
Use the softint API.
 1.155 19-Sep-2007  dyoung branches: 1.155.2;
Constify sockaddr argument to ether_multiaddr(). Change struct
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
 1.154 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.153 26-Aug-2007  dyoung branches: 1.153.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.152 07-Aug-2007  dyoung branches: 1.152.2;
Constify.
 1.151 21-Jul-2007  dyoung branches: 1.151.4;
Use NULL instead of 0 for null pointers.
 1.150 14-Jul-2007  ad branches: 1.150.2;
Generic soft interrupts are mandatory.
 1.149 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.148 07-Mar-2007  liamjfoy branches: 1.148.2; 1.148.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.147 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.146 20-Feb-2007  dyoung Remove extraneous parentheses. bcopy -> memcpy.
 1.145 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.144 29-Jan-2007  bouyer branches: 1.144.2;
Drop M_PROMISC before passing the packet to a carp device, for the same
reason it's dropped before passing to bridge: when a vlan interface is
in promisc mode, it will loop the packet back to ether_input() with
M_PROMISC set, and when carp calls ether_input again the flag is still
there and the packet is dropped. If the carp interface doesn't take
the packet, M_PROMISC is set again just afterwards if needed anyway.
Tested on a box with multiple carp on vlans, no comments about this patch
on tech-net@
 1.143 27-Jan-2007  cbiere Use a plain memcpy() instead of alignment- and endian-specific hacks.
 1.142 06-Jan-2007  bouyer Don't define dropanyway: label unless ISO or NETATALK is defined. Fix
kern/35364 by Gene ENonymous
 1.141 10-Dec-2006  is explain XID constants, and fix a wrong one
 1.140 10-Dec-2006  is comment on llc class
 1.139 01-Dec-2006  is branches: 1.139.2;
Remove an overlapping struct copy from ether_input, which caused address
corruption for incoming netiso packets with recent (at least NetBSD-3 and
later) compilers. This is done in a way that the copy is avoided totally.
Code path tested with tcp+udp/ipv4+ipv6, arp and ISO cltp/clnp.
Visually ok'd by Christos@.
 1.138 24-Nov-2006  rpaulo The change I committed to etherip was wrong. ether_snprintf doesn't make
sense when changing the MAC address of the virtual interface as pointed
out by Hans himself.
So, introduce ether_nonstatic_aton() and make etherip(4) and tap(4) use it.
 1.137 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.136 07-Sep-2006  dogcow branches: 1.136.2; 1.136.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.135 05-Aug-2006  pavel defflag PPPOE_SERVER and PPPOE_TERM_UNKNOWN_SESSIONS.
 1.134 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.133 18-May-2006  liamjfoy branches: 1.133.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.132 14-May-2006  elad integrate kauth.
 1.131 12-May-2006  mrg since ar_tha() can return NULL, don't pass it directly to functions
that expect real addresses. explicitly KASSERT() that it is not
NULL in the kernel and just avoid using it in userland.

(the kernel could be more defensive about this, but, until now it
would have just crashed anyway.)
 1.130 15-Apr-2006  christos Coverity CID 1145: Protect against NULL deref.
 1.129 16-Mar-2006  christos branches: 1.129.2;
Add a new function called ether_snprintf() which takes an external buffer
and a length. The buffer should be 3 * addrlen.
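
A sketch of a formatter in the spirit of 1.129, assuming the caller supplies a buffer of 3 * addrlen bytes: two hex digits per octet, a ':' between octets, and a terminating NUL. The name mac_format() is hypothetical and its truncation behavior is an assumption, not a description of ether_snprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char hexdigits[] = "0123456789abcdef";

/*
 * Hypothetical formatter. Writes addr as "xx:xx:..." into buf.
 * Output is truncated if buf is too small and is always NUL-terminated
 * when buflen > 0. Returns buf.
 */
static char *
mac_format(char *buf, size_t buflen, const uint8_t *addr, size_t addrlen)
{
	char *p = buf;
	char *end;
	size_t i;

	assert(buf != NULL && addr != NULL);
	if (buflen == 0)
		return buf;
	end = buf + buflen - 1;		/* reserve one byte for the NUL */
	for (i = 0; i < addrlen && p + 2 <= end; i++) {
		*p++ = hexdigits[addr[i] >> 4];
		*p++ = hexdigits[addr[i] & 0x0f];
		if (i + 1 < addrlen && p < end)
			*p++ = ':';
	}
	*p = '\0';
	return buf;
}
```

For a 6-byte address, an 18-byte buffer holds the 17 visible characters plus the NUL, which is where the 3 * addrlen rule comes from.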
Remove local tap_ether_sprintf(), and use ether_snprintf() instead.
 1.128 11-Dec-2005  thorpej branches: 1.128.4; 1.128.6; 1.128.8; 1.128.10;
ANSI function decls and application of static.
 1.127 11-Dec-2005  christos merge ktrace-lwp.
 1.126 10-Jun-2005  bouyer branches: 1.126.2;
As ether_input() is always called at IPL_NET, there is no need to
protect the IF_* operations with splnet()/splx().
 1.125 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.124 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
 1.123 02-May-2005  matt rate limit the printfs for oversized ethernet packets.
 1.122 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.121 18-Mar-2005  yamt add agr(4), a pseudo network device driver for link aggregation.
 1.120 26-Feb-2005  perry branches: 1.120.2; 1.120.4; 1.120.6;
nuke trailing whitespace
 1.119 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.118 08-Jan-2005  yamt branches: 1.118.2; 1.118.4;
constify broadcastaddr.
 1.117 08-Jan-2005  yamt remove an unused member, enm_ec from ether_multi.
 1.116 24-Jun-2004  jonathan Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:

sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.115 06-Jun-2004  dyoung Resolve kern/25721 by detaching ethernet(-like) devices from a
bridge in ether_ifdetach.
 1.114 30-Oct-2003  simonb branches: 1.114.2;
Remove some assigned-to but otherwise unused variables.
 1.113 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.112 23-Jun-2003  martin branches: 1.112.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.111 11-Jun-2003  martin Add NETATALK to the list to check if we bail because no ARP is configured.
 1.110 11-Jun-2003  martin Fix typpo in #error message. Noted by Pawel Chwalowski in PR 21853.
 1.109 23-May-2003  itojun don't call if_free_sadl() until very end of if_detach() logic. many of
routing table manipulation code assumes the presence of AF_LINK sockaddr.
should fix PR 21581
 1.108 16-May-2003  itojun use strlcpy
 1.107 02-May-2003  itojun KNF
 1.106 25-Mar-2003  bouyer Make promiscuous mode work on vlans: introduce a new link-layer m_flag
M_PROMISC. In ether_input(), flag packets coming from an interface in
promiscuous mode which are not for us with M_PROMISC instead of dropping them.
Drop M_PROMISC packets which are not passed to vlan_input(). M_PROMISC
packets passed to vlan_input() will be looped back to ether_input(), where
the M_PROMISC flag will be handled appropriately.
Clear M_PROMISC before giving the packet to bridge, as bridge has its own
checks for local MAC addresses.
This also makes bridges on vlans work.
 1.105 02-Mar-2003  aymeric ignore multicast PPPoE packets ASAP.
This improves performance a lot on slow machines behind a cable modem.
Protect it with PPPOE_SERVER as a reminder that this will have to be changed
if we add PPPoE server code in the kernel one day.
 1.104 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.103 03-Feb-2003  thorpej Test callout_pending(), not callout_active(), and eliminate now-unnecessary
callout_deactivate() calls.
 1.102 22-Jan-2003  jmmv Fix typo: realy -> really. Okay'ed by wiz.
 1.101 17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.100 12-Jan-2003  jdolecek Ethernet multicast entries are malloc'd M_IFMADDR, and thus should
be freed as M_IFMADDR too.
Fix supplied in PR kern/19037 by Sean Boudreau
 1.99 11-Sep-2002  itojun KNF - return is not a function.
 1.98 26-Aug-2002  thorpej Fix some signed/unsigned comparison warnings from GCC 3.3.
 1.97 20-Aug-2002  kristerw Make it compile for the __NO_STRICT_ALIGNMENT case.
 1.96 19-Aug-2002  thorpej In ether_output(), don't bother calling memcpy() to plop the ethertype
into the packet: On system with no strict alignment constraints, just
assign the value, and on others, do an inline 2 byte copy.
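
A sketch of the byte-wise store idea behind 1.96, not the code from the commit. The helper put_ethertype() is hypothetical. It takes a host-order value and writes the two bytes in network (big-endian) order one at a time, which works regardless of the alignment of the destination and the endianness of the host. The ethertype field sits at offset 12 of the 14-byte Ethernet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_OFFSET 12	/* after two 6-byte addresses */

/*
 * Hypothetical helper: store a host-order ethertype as two bytes in
 * network byte order. No 16-bit access is made to dst, so dst may be
 * unaligned.
 */
static void
put_ethertype(uint8_t *dst, uint16_t etype_host)
{
	assert(dst != NULL);
	dst[0] = (uint8_t)(etype_host >> 8);
	dst[1] = (uint8_t)(etype_host & 0xff);
}
```

On machines without strict alignment constraints a direct 16-bit assignment of a network-order value achieves the same result, which is the fast path the commit message mentions.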
 1.95 18-May-2002  itojun branches: 1.95.2;
sync with KAME.
- make altq_etherclassify() able to handle packets whose ethernet header
is in a separate mbuf. closes netbsd PR 16559
 1.94 27-Apr-2002  enami branches: 1.94.2;
Use ETHER_HDR_LEN instead of 14.
 1.93 07-Apr-2002  martin XXX Explicitly make this fail to compile with a proper warning if we
do not have ARP configured.

This can be caused by configurations including bridge, pppoe or vlan but
no ethernet interfaces - which does not make sense. We should add a way
to config(8) to issue this kind of warnings.
 1.92 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.91 28-Feb-2002  thorpej Don't call m_aux_find() to find a VLAN tag unless VLANs are configured
on the interface.
 1.90 12-Nov-2001  lukem add RCSIDs
 1.89 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.88 25-Jul-2001  thorpej Duh, braino in last -- only kick the interface if we actually set
the MTU.
 1.87 25-Jul-2001  thorpej If we change the MTU, kick the interface; it may have to reprogram
registers for the new MTU.
 1.86 29-Jun-2001  thorpej branches: 1.86.2;
When setting an address on an interface, for address families which
do not require changing the link-level address, only (*if_init)()
if the interface is not already RUNNING.
 1.85 14-Jun-2001  itojun change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange get updated on every packet transmission/receipt.
now: if_lastchange get updated when IFF_UP is changed.
 1.84 12-Jun-2001  thorpej On a non-simplex interface, check incoming multicast (this catches
the broadcast case as well) to see if they came from us, and drop
them if they did.

This fixed IPv6 DAD on non-simplex interfaces, e.g. the Seeq 8003
found on my SGI Indy.
 1.83 03-Jun-2001  thorpej Consider the configured MTU of the interface when determining
if a packet is too large.
 1.82 03-Jun-2001  thorpej Add a capability bit that indicates support for Gigabit Ethernet
jumbo frames, and use it in SIOCSIFMTU.
 1.81 29-Apr-2001  martin Add an in-kernel PPPoE (ppp over ethernet, RFC 2516) implementation,
based on the existing net/if_spppsubr.c stuff.

While there are completely userland (bpf based) implementations available,
those have a vastly larger per packet overhead thus causing major CPU
overhead and higher latency. On an i386 based router, running a 486DX at 50MHz
my line (768kBit/s downstream) was limited to something (varying) between 10
and 20 kByte/s effective download rate. With this implementation I get full
bandwidth (~85kByte/s).

This is client side only. Arguably the right way to add full PPPoE support
(including server side) would be a variation of the ppp line discipline and
appropriate modifications to pppd. I promise every help I can give to anyone
doing that - but I needed this really fast. Besides, on low memory NAT boxes
with typically a single PPPoE connection, this implementation is more
lightweight than a pppd based one, which nicely fits my needs.
 1.80 27-Apr-2001  marcus STDC cleanup: label not allowed just before end of block.
 1.79 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.78 11-Apr-2001  thorpej Add hooks for bridging.
 1.77 10-Apr-2001  thorpej Process pfil hooks for Ethernet input and output.
 1.76 07-Apr-2001  thorpej Add altq_etherclassify(), a slight hack modified from the kame/freebsd4
tree, which allows a packet with Ethernet headers already present to
run through the ALTQ packet classifier. This is needed in order to
support ALTQ on VLAN and bridge devices.
 1.75 17-Jan-2001  thorpej branches: 1.75.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.74 26-Dec-2000  augustss Simplify case statement a tiny bit.
 1.73 18-Dec-2000  thorpej Fill in if_dlt.
 1.72 13-Dec-2000  thorpej Add ALTQ glue.
 1.71 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.70 17-Nov-2000  bouyer Supports hardware 802.1q VLAN tagging, per discussion on tech-net. The tag is
stored in a m_aux mbuf defined by AF_LINK, ETHERTYPE_VLAN.
Thanks to Jason & Itojun for the feedback.
 1.69 15-Nov-2000  thorpej Move bpfattach()/bpfdetach() calls into ether_ifattach()/ether_ifdetach().
 1.68 15-Oct-2000  matt When discarding oversized frame, say how long it was.
 1.67 15-Oct-2000  itojun suppress warning on nd6_storelladdr failure. the failure could happen
easily when we have routing table with too many entries. sync with kame.
 1.66 11-Oct-2000  thorpej Implement ether_ioctl(), ioctl operations common to all Ethernet
interfaces.
 1.65 04-Oct-2000  enami Free mbuf when dropping VLAN frame due to no configured vlan interface.
 1.64 03-Oct-2000  thorpej When an Ethernet interface detaches, unconfigure any VLANs associated
with it.
 1.63 03-Oct-2000  thorpej Improve the VLAN support, in particular, handling of MTU:
- Add a macro to compute the max frame length based on Ethertype
and presence of FCS, and use it to validate the packet size
in ether_input().
- Add capabilities to struct ethercom, and allow hardware drivers
to specify that they can handle the larger hardware MTU that
VLANs require in order to strictly conform to 802.1Q.
- Make ether_ifdetach() clear out the link address and free all of
the Ethernet multicast structures.

Also, rearrange the VLAN driver itself in preparation to supporting
other hardware types, including FDDI (which has 802.1Q VLAN capability).
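A minimal sketch of the frame-length computation described in 1.63, not the macro from the commit itself. The function name, the constant names, and the choice to pass the MTU as a parameter are all assumptions made for illustration; the sizes (14-byte header, 4-byte FCS, 4-byte 802.1Q encapsulation) are the standard Ethernet values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values are the standard Ethernet sizes. */
#define SKETCH_ETHER_HDR_LEN   14      /* dst + src + ethertype */
#define SKETCH_ETHER_CRC_LEN   4       /* trailing FCS */
#define SKETCH_VLAN_ENCAP_LEN  4       /* 802.1Q tag */
#define SKETCH_ETHERTYPE_VLAN  0x8100

/*
 * Largest acceptable frame for a given MTU: the header is always
 * counted, the FCS only if the driver left it on the packet, and the
 * 802.1Q encapsulation only if the frame carries a VLAN Ethertype.
 */
static size_t
sketch_ether_max_frame(size_t mtu, uint16_t etype, int hasfcs)
{
	size_t len = mtu + SKETCH_ETHER_HDR_LEN;

	if (hasfcs)
		len += SKETCH_ETHER_CRC_LEN;
	if (etype == SKETCH_ETHERTYPE_VLAN)
		len += SKETCH_VLAN_ENCAP_LEN;
	return len;
}
```

With an MTU of 1500 this gives 1514 for a plain frame without FCS, 1518 with FCS, and 1522 for a VLAN-tagged frame with FCS, which is why the hardware needs to accept slightly larger frames to conform to 802.1Q.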
 1.62 01-Oct-2000  thorpej Make the previous code path simpler (and possibly easier for the
compiler to optimize), based on fddi_input().
 1.61 01-Oct-2000  thorpej Move the check for "promisc + unicast + not for us" into ether_input(),
and change Ethernet drivers to always pass all received frames to
ether_input() (with a few exceptions, which are documented in the
code).
 1.60 28-Sep-2000  enami Factor out and give a name to the common functionality to translate
sockaddr which represents a multicast address into an Ethernet address
or range of Ethernet addresses.
 1.59 27-Sep-2000  thorpej Glue VLANs into ether_input().
 1.58 17-Jun-2000  matt branches: 1.58.2;
Ansify before committing my next change.
 1.57 14-Jun-2000  mycroft Check the multicast bit in the header mbuf while interrupts are still blocked.
Otherwise we can run off into space if the packet was sent immediately and the
mbuf freed.
Pointed out by Boris Popov (not on our lists).
 1.56 12-May-2000  thorpej branches: 1.56.2;
- Fix a bug in the double-loop version of ether_crc32_le() -- we're not
supposed to bubble carry through.
- Disable the double-loop version of ether_crc32_le() and add a
table-driven version of ether_crc32_le() -- the table-driven
version is faster.
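To illustrate the difference between the two approaches named in 1.56, here is a sketch of a bit-at-a-time ("double-loop") little-endian CRC-32 next to a table-driven one. This is not the code from the commit: all names are hypothetical, the 256-entry table built at first use is an assumption (the real table size and layout are not stated in the log), and returning the CRC without a final complement is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected (little-endian) form of the Ethernet CRC-32 polynomial. */
#define SKETCH_CRC32_POLY_LE 0xedb88320U

static uint32_t sketch_crc_table[256];
static int sketch_crc_table_ready;

/* Precompute the effect of the inner bit loop for every byte value. */
static void
sketch_crc_table_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;
		for (int bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ SKETCH_CRC32_POLY_LE : (c >> 1);
		sketch_crc_table[i] = c;
	}
	sketch_crc_table_ready = 1;
}

/* Double-loop version: one outer pass per byte, one inner pass per bit. */
static uint32_t
sketch_crc32_le_bitwise(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffU;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ SKETCH_CRC32_POLY_LE : (crc >> 1);
	}
	return crc;
}

/* Table-driven version: the inner loop becomes a single lookup. */
static uint32_t
sketch_crc32_le_table(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffU;

	assert(buf != NULL || len == 0);
	if (!sketch_crc_table_ready)
		sketch_crc_table_init();
	for (size_t i = 0; i < len; i++)
		crc = sketch_crc_table[(crc ^ buf[i]) & 0xff] ^ (crc >> 8);
	return crc;
}
```

Both functions return the same value for the same input; the table version trades a small amount of memory for eight fewer shift-and-test steps per byte.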
 1.55 12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.54 06-Mar-2000  thorpej On second thought, only set a default baudrate for "ethernet" if one
isn't set already.
 1.53 06-Mar-2000  thorpej - Initialize ifp->if_baudrate to a sensible value when the interface is
attached.
- Add ether_crc32_be() and ether_crc32_le(), common functions for computing
the Ethernet CRC on arbitrary length buffers. Nothing uses them yet,
and these should be double-checked and probably re-implemented as
table-driven functions.
 1.52 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.51 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.50 12-Oct-1999  matt branches: 1.50.2;
Fix appletalk over ethernet.
 1.49 21-Sep-1999  matt branches: 1.49.2;
Make NETATALK over FDDI.
 1.48 15-Sep-1999  is We only need the ether_ipmulticast_min and _max addresses if we have INET
compiled in.
 1.47 13-Sep-1999  itojun - Call in{,6}_pcbdetach if ipsec initialization is failed during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.46 05-Aug-1999  thorpej M_HASCRC -> M_HASFCS, as suggested by Christoph Badura.
 1.45 04-Aug-1999  thorpej In ether_input(), if M_HASCRC is set, trim the CRC off the packet.
 1.44 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.43 17-Jun-1999  bouyer mbuf should be allocated with M_DONTWAIT in ether_output(). This caused panics
when routing atalk.
 1.42 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.41 10-Mar-1999  thorpej branches: 1.41.2; 1.41.4; 1.41.6;
Const poison ether_ifattach().
 1.40 10-Mar-1999  thorpej Const poison ether_sprintf().
 1.39 10-Dec-1998  christos IPX fixes.
 1.38 13-Oct-1998  kim branches: 1.38.4;
Use ETHERTYPE_ATALK instead of ETHERTYPE_AT. The former seems more common.
Our other constants also use "ATALK".

Added many new ETHERTYPE constants to sys/net/ethertypes.h, including the
ones from libpcap and tcpdump "ethertype.h" files.
 1.37 05-Jul-1998  jonathan defopt NS, NSIP.
 1.36 05-Jul-1998  jonathan defopt ISO TPIP.
 1.35 05-Jul-1998  jonathan defopt LLC
 1.34 05-Jul-1998  jonathan defopt CCITT.
 1.33 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.32 04-May-1998  christos Add IPX bits.
 1.31 30-Apr-1998  thorpej In ether_output(), if the socket address family is pseudo_AF_HDRCMPLT,
use the Ethernet source address specified in the sockaddr rather than
the interface's Ethernet address, and then fall through to the AF_UNSPEC
case. From Greg Smith <greg@nas.nasa.gov>.
 1.30 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
 1.29 26-Apr-1998  mrg remove some register.
 1.28 24-Mar-1998  kleink register -> register int
 1.27 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.26 02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.25 03-Apr-1997  christos branches: 1.25.4;
Update for argument change in at_ifawithnet
 1.24 03-Apr-1997  christos PR/3444: Erik Bertelsen: Eliminate warnings when -UINET
 1.23 02-Apr-1997  christos Add netatalk stubs.
 1.22 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.21 13-Oct-1996  christos branches: 1.21.4;
backout previous kprintf change
 1.20 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.19 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.18 13-Feb-1996  christos Net prototypes
 1.17 24-Dec-1995  mycroft Avoid extra byte-swapping in average cases.
 1.16 24-Dec-1995  mycroft Remove old comment regarding trailers.
 1.15 29-Sep-1995  phil Move a #include to outside the #ifdef INET so it will compile without
INET defined.
 1.14 19-Aug-1995  mycroft Garbage collect useless `off' and `len' variables.
 1.13 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.12 07-Apr-1995  mycroft Set if_output in ether_ifattach().
 1.11 05-Apr-1995  mycroft Make OSI and X.25 work on little-endian machines.
 1.10 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 18-Apr-1994  glass revised nfs diskless support. uses bootp+rpc to gather parameters
 1.6 02-Feb-1994  hpeyerl Multicast is no longer optional
 1.5 23-Jan-1994  deraadt ether_output() & ether_input() take ether_type as a net-short.
AF_UNSPEC does not swap byte order of ether_type.
NOTE: this requires driver changes
 1.4 17-Dec-1993  mycroft From magnum branch:
Remove Jolitz's netisr kluge. Make sure cpl == 0 really means base priority.
Other minor cleanup.
 1.3 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.2 20-May-1993  cgd branches: 1.2.4;
add rcs ids to everything, and clean up headers
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.2.4.2 29-Oct-1993  mycroft #include machine/cpu.h
 1.2.4.1 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.21.4.5 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.21.4.4 18-Feb-1997  is Having converted everything, remove the struct ether_arp definition completely.
Some small cleanup.
STILLTODO: some sanity checks of the (now) variable link level address length
in incoming packets..
 1.21.4.3 12-Feb-1997  is Changed arprequest() to use AF_ARP sockaddr and NOT build its own Ethernet
header. Added some missing pieces in ether_output() to support this.
 1.21.4.2 11-Feb-1997  is - Add macros, to if_arp.h:struct arphdr, to access an ARP message's variable
fields based on the ar_hln and ar_pln fields.
- Add AF_ARP case to ether_output, using the ar_tha() macro defined above.
 1.21.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.25.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.38.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.41.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.41.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.41.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.41.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.41.2.1 24-Jun-1999  perry pullup 1.42->1.43 (bouyer): allocate mbuf with M_DONTWAIT in ether_output()
 1.49.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.50.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.50.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.50.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.50.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.50.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.50.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.56.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.58.2.2 07-Jun-2001  he Pull up revision 1.83 (via patch, requested by thorpej):
Consider the configured MTU of the interface when determining
if a packet is too large.
 1.58.2.1 31-Dec-2000  jhawk Pull up revisions 1.59-1.60, 1.62-1.65, 1.70 via patch (requested by bouyer):
Add support for 802.1Q virtual LANs.
 1.75.2.13 17-Jan-2003  thorpej Sync with HEAD.
 1.75.2.12 15-Jan-2003  thorpej Sync with HEAD.
 1.75.2.11 17-Sep-2002  nathanw Catch up to -current.
 1.75.2.10 26-Aug-2002  nathanw Sync with rev 1.98 of -current.
 1.75.2.9 19-Aug-2002  thorpej Sync with trunk.
 1.75.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.75.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.75.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.75.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.75.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.75.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.75.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.75.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.86.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.86.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.86.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.86.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.86.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.86.2.1 03-Aug-2001  lukem update to -current
 1.94.2.2 29-Aug-2002  gehenna catch up with -current.
 1.94.2.1 30-May-2002  gehenna Catch up with -current.
 1.95.2.5 30-Jun-2003  grant Pull up revision 1.106 (requested by bouyer in ticket #1356):

Make promiscuous mode work on vlans: introduce a new link-layer m_flag
M_PROMISC. In ether_input(), flag packets coming from an interface in
promiscuous mode which are not for us M_PROMISC instead of dropping them.
Drop M_PROMISC packets which are not passed to vlan_input(). M_PROMISC
packets passed to vlan_input() will be looped back to ether_input(), where
the M_PROMISC flag will be handled appropriately.
Clear M_PROMISC before giving the packet to bridge, as bridge has its
own checks for local MAC addresses.
This also makes bridges on vlans work.
 1.95.2.4 24-Jun-2003  grant Apply patch (requested by itojun in ticket #1325):

don't call if_free_sadl() until the very end of the if_detach() logic. much of
the routing table manipulation code assumes the presence of an AF_LINK sockaddr.
should fix PR 21581
 1.95.2.3 02-Jun-2003  tron Pull up revision 1.105 (requested by aymeric in ticket #1188):
ignore multicast PPPoE packets ASAP.
This improves performance a lot on slow machines behind a cable modem.
Protect it with PPPOE_SERVER as a reminder that this will have to be changed
if we add PPPoE server code in the kernel one day.
 1.95.2.2 26-Jan-2003  jmc Pullup revisions 1.101-1.102 (requested by jmmv in ticket #1102)
Fix typo: realy -> really. Okay'ed by wiz.
 1.95.2.1 19-Nov-2002  tron Pull up revision 1.96-1.97 (requested by thorpej in ticket #702):
In ether_output(), don't bother calling memcpy() to plop the ethertype
into the packet: On systems with no strict alignment constraints, just
assign the value, and on others, do an inline 2 byte copy.
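The two store strategies mentioned in this pull-up can be sketched as follows. This is an illustration only, not the code from revisions 1.96-1.97: the function name and the SKETCH_NO_STRICT_ALIGNMENT macro are hypothetical, and the direct-assignment branch assumes a platform and compiler configuration where an unaligned 16-bit store through a cast pointer is acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a 16-bit Ethertype that is already in network byte order at
 * dst, which may not be 2-byte aligned inside the packet buffer.
 */
static inline void
sketch_store_ethertype(uint8_t *dst, uint16_t etype_net)
{
	assert(dst != NULL);
#ifdef SKETCH_NO_STRICT_ALIGNMENT
	/* Unaligned stores are tolerated: a plain assignment is enough. */
	*(uint16_t *)(void *)dst = etype_net;
#else
	/* Strict-alignment machines: copy the two bytes one at a time. */
	const uint8_t *src = (const uint8_t *)&etype_net;

	dst[0] = src[0];
	dst[1] = src[1];
#endif
}
```

Either branch leaves the same two bytes in the buffer; the point of the change is only to avoid a function call for a fixed 2-byte copy.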
 1.112.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.112.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.112.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.112.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.112.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.112.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.112.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.112.2.1 03-Aug-2004  skrll Sync with HEAD
 1.114.2.2 14-Jul-2004  tron Pull up revision 1.116 (requested by jonathan in ticket #648):
Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:
sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.114.2.1 07-Jun-2004  jdc Pull up revision 1.115 (requested by dyoung in ticket #448).

Resolve kern/25721 by detaching ethernet(-like) devices from a
bridge in ether_ifdetach.
 1.118.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.118.4.1 12-Feb-2005  yamt sync with head.
 1.118.2.1 29-Apr-2005  kent sync with -current
 1.120.6.1 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1623):
sys/net/if_ethersubr.c: revision 1.142 via patch
Don't define dropanyway: label unless ISO or NETATALK is defined. Fix
kern/35364 by Gene ENonymous
 1.120.4.1 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1623):
sys/net/if_ethersubr.c: revision 1.142 via patch
Don't define dropanyway: label unless ISO or NETATALK is defined. Fix
kern/35364 by Gene ENonymous
 1.120.2.2 08-Jan-2007  ghen Pull up following revision(s) (requested by bouyer in ticket #1623):
sys/net/if_ethersubr.c: revision 1.142 via patch
Don't define dropanyway: label unless ISO or NETATALK is defined. Fix
kern/35364 by Gene ENonymous
 1.120.2.1 02-Dec-2006  bouyer Pull up following revision(s) (requested by is in ticket #1597):
sys/net/if_ethersubr.c: revision 1.139 via patch
sys/netiso/clnp_input.c: revision 1.31
Remove an overlapping struct copy from ether_input, which caused address
corruption for incoming netiso packets with recent (at least NetBSD-3 and
later) compilers. This is done in a way that the copy is avoided totally.
Code path tested with tcp+udp/ipv4+ipv6, arp and ISO cltp/clnp.
Visually ok'd by Christos@.
 1.126.2.9 17-Mar-2008  yamt sync with head.
 1.126.2.8 27-Feb-2008  yamt sync with head.
 1.126.2.7 11-Feb-2008  yamt sync with head.
 1.126.2.6 21-Jan-2008  yamt sync with head
 1.126.2.5 27-Oct-2007  yamt sync with head.
 1.126.2.4 03-Sep-2007  yamt sync with head.
 1.126.2.3 26-Feb-2007  yamt sync with head.
 1.126.2.2 30-Dec-2006  yamt sync with head.
 1.126.2.1 21-Jun-2006  yamt sync with head.
 1.128.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.128.10.1 19-Apr-2006  elad sync with head.
 1.128.8.5 14-Sep-2006  yamt sync with head.
 1.128.8.4 11-Aug-2006  yamt sync with head
 1.128.8.3 26-Jun-2006  yamt sync with head.
 1.128.8.2 24-May-2006  yamt sync with head.
 1.128.8.1 01-Apr-2006  yamt sync with head.
 1.128.6.3 01-Jun-2006  kardel Sync with head.
 1.128.6.2 22-Apr-2006  simonb Sync with head.
 1.128.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.128.4.1 09-Sep-2006  rpaulo sync with head
 1.129.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.133.2.1 19-Jun-2006  chap Sync with head.
 1.136.4.2 18-Dec-2006  yamt sync with head.
 1.136.4.1 10-Dec-2006  yamt sync with head.
 1.136.2.2 01-Feb-2007  ad Sync with head.
 1.136.2.1 12-Jan-2007  ad Sync with head.
 1.139.2.2 27-Feb-2007  riz Pull up following revision(s) (requested by bouyer in ticket #465):
sys/net/if_ethersubr.c: revision 1.144
Drop M_PROMISC before passing the packet to a carp device, for the same
reason it's dropped before passing to bridge: when a vlan interface is
in promisc mode, it will loop the packet back to ether_input() with
M_PROMISC set, and when carp calls ether_input again the flag is still
there and the packet is dropped. If the carp interface doesn't take
the packet M_PROMISC is set just after if needed anyway.
Tested on a box with multiple carp on vlans, no comments about this patch
on tech-net@
 1.139.2.1 08-Jan-2007  tron Pull up following revision(s) (requested by bouyer in ticket #337):
sys/net/if_ethersubr.c: revision 1.142
Don't define dropanyway: label unless ISO or NETATALK is defined. Fix
kern/35364 by Gene ENonymous
 1.144.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.144.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.148.4.1 11-Jul-2007  mjf Sync with head.
 1.148.2.5 09-Oct-2007  ad Sync with head.
 1.148.2.4 20-Aug-2007  ad Sync with HEAD.
 1.148.2.3 15-Jul-2007  ad Sync with head.
 1.148.2.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.148.2.1 09-Jun-2007  ad Sync with head.
 1.150.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.150.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.151.4.4 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.151.4.3 02-Oct-2007  joerg Sync with HEAD.
 1.151.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.151.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.152.2.2 07-Aug-2007  dyoung Constify.
 1.152.2.1 07-Aug-2007  dyoung file if_ethersubr.c was added on branch matt-mips64 on 2007-08-07 04:37:45 +0000
 1.153.2.3 23-Mar-2008  matt sync with HEAD
 1.153.2.2 09-Jan-2008  matt sync with HEAD
 1.153.2.1 06-Nov-2007  matt sync with HEAD
 1.155.2.1 14-Oct-2007  yamt sync with head.
 1.156.10.2 20-Jan-2008  bouyer Sync with HEAD
 1.156.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.156.6.1 26-Dec-2007  ad Sync with head.
 1.156.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.162.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.162.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.162.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.162.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.162.2.1 24-Mar-2008  keiichi sync with head.
 1.164.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.164.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.164.4.5 11-Aug-2010  yamt sync with head.
 1.164.4.4 11-Mar-2010  yamt sync with head
 1.164.4.3 20-Jun-2009  yamt sync with head
 1.164.4.2 04-May-2009  yamt sync with head.
 1.164.4.1 16-May-2008  yamt sync with head.
 1.164.2.1 18-May-2008  yamt sync with head.
 1.167.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.167.4.1 19-Oct-2008  haad Sync with HEAD.
 1.167.2.1 28-Jul-2008  simonb Sync with head.
 1.169.8.1 21-Apr-2010  matt sync to netbsd-5
 1.169.6.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.169.4.2 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.169.4.1 05-Jun-2009  snj Pull up following revision(s) (requested by 792):
sys/dev/pci/if_wm.c: revision 1.175 via patch
sys/net/if_ethersubr.c: revision 1.172 via patch
sys/net/agr/ieee8023ad_lacp.c: revision 1.9 via patch
sys/net/agr/if_agr.c: revision 1.23 via patch
sys/net/agr/if_agrether.c: revision 1.7 via patch
sys/net/agr/if_agrvar_impl.h: revision 1.8 via patch
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCSIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
 1.169.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.170.4.2 23-Jul-2009  jym Sync with HEAD.
 1.170.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.176.4.4 31-May-2011  rmind sync with head
 1.176.4.3 05-Mar-2011  rmind sync with head
 1.176.4.2 03-Jul-2010  rmind sync with head
 1.176.4.1 30-May-2010  rmind sync with head
 1.176.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.176.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.185.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.187.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.188.8.5 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.188.8.4 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.188.8.3 31-Oct-2012  riz branches: 1.188.8.3.2;
Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.188.8.2 20-Aug-2012  riz branches: 1.188.8.2.4;
Pull up following revision(s) (requested by christos in ticket #517):
sys/net/if_ethersubr.c: revision 1.190
PR/46587: Roger Pau Monne: Prevent panic on shutdown on bridge teardown ->
ifpromisc-> if_ioctl -> if_init. Idea from dyoung.
XXX: Pullup to 6.
 1.188.8.1 18-May-2012  riz Pull up following revision(s) (requested by chs in ticket #258):
sys/net/if_ethersubr.c: revision 1.189
in ether_ifdetach(), clear if_mowner before releasing what it points to.
fixes PR 42982.
 1.188.8.3.2.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.188.8.3.2.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.188.8.2.4.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.188.8.2.4.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.188.6.1 02-Jun-2012  mrg sync to latest -current.
 1.188.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.188.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.188.2.2 30-Oct-2012  yamt sync with head
 1.188.2.1 23-May-2012  yamt sync with head.
 1.190.2.4 03-Dec-2017  jdolecek update from HEAD
 1.190.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.190.2.2 23-Jun-2013  tls resync from head
 1.190.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.194.6.2 18-May-2014  rmind sync with head
 1.194.6.1 28-Aug-2013  rmind sync with head
 1.196.2.2 10-Aug-2014  tls Rebase.
 1.196.2.1 07-Apr-2014  tls Increase unpredictability of early output: mix in the headers of the
first 100 Ethernet packets received by the system (if we are really
short of entropy, keep mixing them though we don't count any entropy from
them; such systems are particularly likely to have guessable outputs).
 1.204.6.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.204.4.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.204.2.2 24-Sep-2017  snj Pull up following revision(s) (requested by manu in ticket #1409):
sys/arch/xen/xen/if_xennet_xenbus.c: 1.65
sys/arch/xen/xen/xennetback_xenbus.c: 1.53, 1.56 via patch
sys/net/if_bridge.c: 1.105
sys/net/if_ether.h: 1.65
sys/net/if_ethersubr.c: 1.215, 1.235
sys/net/if_vlan.c: 1.76, 1.77, 1.83, 1.88, 1.94
Protect vlan_unconfig with a mutex
It is not thread-safe but is likely to be executed concurrently.
See PR 49264 for more detail.
--
Tweak vlan_unconfig
No functional change.
--
Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig%espci.fr@localhost). Factor out the vlan_mtu enabling and
disabling code.
--
Enable the VLAN mtu capability and check for the adjusted packet size
(Jean-Jacques.Puig at espci.fr).
Factor out the packet-size checking function for clarity.
--
Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
--
Account for the CRC len (Jean-Jacques.Puig)
--
Fix a bug where the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses the VLAN_ATTACHED() function,
which checks ec->ec_nvlans; the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.204.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.205.2.12 28-Aug-2017  skrll Sync with HEAD
 1.205.2.11 05-Feb-2017  skrll Sync with HEAD
 1.205.2.10 05-Dec-2016  skrll Sync with HEAD
 1.205.2.9 05-Oct-2016  skrll Sync with HEAD
 1.205.2.8 09-Jul-2016  skrll Sync with HEAD
 1.205.2.7 29-May-2016  skrll Sync with HEAD
 1.205.2.6 22-Apr-2016  skrll Sync with HEAD
 1.205.2.5 19-Mar-2016  skrll Sync with HEAD
 1.205.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.205.2.3 22-Sep-2015  skrll Sync with HEAD
 1.205.2.2 06-Jun-2015  skrll Sync with HEAD
 1.205.2.1 06-Apr-2015  skrll Sync with HEAD
 1.225.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.225.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.225.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.225.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.225.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.225.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.235.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.242.6.10 10-Oct-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1770):

sys/net/if_ethersubr.c: revision 1.254

Fix a bug in the VLAN path: there's an inverted logic, the mbuf needs to
be bigger than struct ether_vlan_header, not smaller.

Meanwhile add a KASSERT in the LLC path.
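
The corrected direction of the length check can be sketched as below. This is
only an illustration, not the code from if_ethersubr.c: struct ex_vlan_header
and ex_vlan_frame_ok() are hypothetical names, and the buffer length is passed
as a plain size_t rather than read from an mbuf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Ethernet header carrying an 802.1Q tag. */
struct ex_vlan_header {
	uint8_t  dhost[6];    /* destination MAC address */
	uint8_t  shost[6];    /* source MAC address */
	uint16_t encap_type;  /* 0x8100 for 802.1Q */
	uint16_t tci;         /* priority, DEI and VLAN ID */
	uint16_t proto;       /* encapsulated EtherType */
};

/*
 * Accept the buffer only when it is long enough to hold the whole
 * header. The inverted test (len < sizeof(...)) would accept exactly
 * the frames that are too short to parse.
 */
static bool
ex_vlan_frame_ok(size_t len)
{
	return len >= sizeof(struct ex_vlan_header);
}
```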
 1.242.6.9 27-Oct-2021  martin Fix merge mishap from previous (ticket #1704)
 1.242.6.8 25-Oct-2021  martin Pull up following revision(s) (requested by ryo in ticket #1704):

sys/net/if_ethersubr.c: revision 1.302

frame's vlan tag must be ntohs()'ed.

VLAN 0 Priority tag was misrecognized on non vlan-hwtagging interfaces.
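
A minimal sketch of why the byte swap matters, assuming the tag control field
is read from the frame as a raw 16-bit value in network byte order. The names
ex_vlan_vid(), ex_vlan_pri() and the EX_* constants are hypothetical and are
not the kernel's own.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define EX_VLAN_VID_MASK  0x0fffu /* low 12 bits of the TCI: VLAN ID */
#define EX_VLAN_PRI_SHIFT 13      /* top 3 bits of the TCI: priority */

/* Extract the VLAN ID from a TCI field as read from the wire. */
static uint16_t
ex_vlan_vid(uint16_t tci_net)
{
	return ntohs(tci_net) & EX_VLAN_VID_MASK;
}

/* Extract the 802.1p priority from the same field. */
static uint8_t
ex_vlan_pri(uint16_t tci_net)
{
	return (uint8_t)(ntohs(tci_net) >> EX_VLAN_PRI_SHIFT);
}
```

A priority tag with VLAN ID 0 and priority 5 is 0xa000 in host order. Without
the ntohs() call, a little-endian host would read the wire bytes as 0x00a0 and
see a nonzero VLAN ID.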
 1.242.6.7 08-Oct-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1402):

sys/net/if_ethersubr.c: revision 1.277

Increment if_iqdrops when dropping an oversized frame.
 1.242.6.6 08-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1401):

sys/net/if_ethersubr.c: revision 1.255

Fix two bugs in altq_etherclassify. When scanning the mbuf chain we need
to make sure that m_next is not NULL, otherwise NULL deref. After that,
we must not touch m->m_pkthdr, given that 'm' may not be the first mbuf
of the chain anymore.

Declare mtop, and add a KASSERT to make sure it has M_PKTHDR set.
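
The two rules described above (stop when the chain runs out, and read packet
header fields only from the first buffer) can be illustrated with a simplified
chain. struct ex_buf and ex_buf_at() are hypothetical stand-ins, not the mbuf
API or the altq_etherclassify code.

```c
#include <stddef.h>

/* Hypothetical, much simplified stand-in for an mbuf. */
struct ex_buf {
	struct ex_buf *next; /* next buffer in the chain, or NULL */
	size_t len;          /* bytes of data held in this buffer */
	int has_pkthdr;      /* nonzero only on the first buffer */
	size_t pkt_len;      /* total length; valid only with has_pkthdr */
};

/*
 * Find the buffer that contains byte offset `off` of the packet and
 * store the offset within that buffer in *rel. Returns NULL, leaving
 * *rel untouched, when the chain is shorter than `off`. The caller
 * keeps `top` for any packet header access, because the returned
 * buffer may not be the first one in the chain.
 */
static struct ex_buf *
ex_buf_at(struct ex_buf *top, size_t off, size_t *rel)
{
	struct ex_buf *m = top;

	while (m != NULL && off >= m->len) {
		off -= m->len;
		m = m->next; /* may become NULL; checked by the loop */
	}
	if (m != NULL)
		*rel = off;
	return m;
}
```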
 1.242.6.5 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #628):
sys/net/if_ethersubr.c: revision 1.250
sys/net/if_ethersubr.c: revision 1.251
sys/net/if_ethersubr.c: revision 1.252
sys/net/if_ethersubr.c: revision 1.248
Use kmem_alloc instead of kmem_intr_alloc in ether_addmulti

ether_addmulti is now not called in softint thanks to wqinput that
pulled input routines of ICMP out of softint.

style

Fix the net.ether.multicast sysctl. If there is no multicast address
don't kmem_alloc(0) (which panics the kernel), and if the number of
multicast addresses has decreased don't copyout uninitialized kernel
data.

Several fixes:
- Style and typos
- Use kmem_zalloc, in case there is a padding between the fields of
the structures
- Use ETHER_ADDR_LEN instead of a hard-coded '6'
- kmem_alloc(KM_SLEEP) can't fail
- Simplify ether_aton_r
- Use mutex_obj_free, not to leak memory
 1.242.6.4 08-Mar-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #618):
sys/net/if_ethersubr.c: revision 1.245
sys/net/if_ethersubr.c: revision 1.247

Use macro(ETHER_LOCK() and ETHER_UNLOCK()). No functional change.

- Modify ether_ioctl() for readability. No functional change.

- KNF
 1.242.6.3 09-Jan-2018  snj Pull up following revision(s) (requested by maxv in ticket #480):
sys/net/if_ethersubr.c: revision 1.249
Make sure we have an llc structure in the packet, and don't read past the
end of the mbuf if we don't. I'm wondering whether we should not pull up
instead, but whatever.
 1.242.6.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract, so avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.242.6.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #302):
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.30-1.31
sys/arch/x86/pci/if_vmx.c: 1.20
sys/dev/ic/i82557.c: 1.148
sys/dev/ic/rtl8169.c: 1.152
sys/dev/pci/cxgb/cxgb_sge.c: 1.5
sys/dev/pci/if_age.c: 1.51
sys/dev/pci/if_alc.c: 1.25
sys/dev/pci/if_ale.c: 1.23
sys/dev/pci/if_bge.c: 1.311
sys/dev/pci/if_bge.c: 1.312
sys/dev/pci/if_bnx.c: 1.62
sys/dev/pci/if_jme.c: 1.32
sys/dev/pci/if_nfe.c: 1.64
sys/dev/pci/if_sip.c: 1.167
sys/dev/pci/if_stge.c: 1.63-1.64
sys/dev/pci/if_ti.c: 1.102
sys/dev/pci/if_txp.c: 1.48
sys/dev/pci/if_vge.c: 1.61
sys/dev/pci/if_wm.c: 1.538
sys/dev/pci/ixgbe/ix_txrx.c: 1.29 via patch
sys/net/agr/if_agrether_hash.c: 1.4
sys/net/if_ether.h: 1.67-1.68
sys/net/if_ethersubr.c: 1.244
sys/net/if_vlan.c: 1.100
sys/net80211/ieee80211_input.c: 1.89
sys/net80211/ieee80211_output.c: 1.59
sys/sys/mbuf.h: 1.171
VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.
I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html
--
only get vtag when we have vtag like the other drivers.
--
- only get the vtag if we have it like the other drivers
- mask the hardware vlan tag
--
- add a constant for the vlan mask.
- enforce that we have a tag before we get it.
only get vtag when we have vtag like the other drivers.
like if_bge.c:1.312 and if_stge.c:1.64.
fixed by s-yamaguchi@IIJ, thanks.
 1.260.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.260.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.260.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.260.2.3 21-May-2018  pgoyette Sync with HEAD
 1.260.2.2 02-May-2018  pgoyette Synch with HEAD
 1.260.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.270.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.270.2.1 10-Jun-2019  christos Sync with HEAD
 1.276.2.2 25-Oct-2021  martin Pull up following revision(s) (requested by ryo in ticket #1369):

sys/net/if_ethersubr.c: revision 1.302

frame's vlan tag must be ntohs()'ed.

VLAN 0 Priority tag was misrecognized on non vlan-hwtagging interfaces.
 1.276.2.1 08-Oct-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #292):

sys/net/if_ethersubr.c: revision 1.277
sys/net/if_ethersubr.c: revision 1.278

Increment if_iqdrops when dropping an oversized frame.

-

Print oversized frame's message only when DIAGNOSTIC is set. The message
is not so important because we increment if_iqdrops now.
 1.280.2.2 29-Feb-2020  ad Sync with head.
 1.280.2.1 17-Jan-2020  ad Sync with head.
 1.289.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.292.6.1 31-May-2021  cjep sync with head
 1.292.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.323.2.1 03-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #455):
sys/dev/pci/ixgbe/ixgbe.c: revision 1.347
sys/net/if_l2tp.c: revision 1.49
tests/net/if_vlan/t_vlan.sh: revision 1.25
sys/net/if_vlan.c: revision 1.171
sys/net/if_ethersubr.c: revision 1.326
sys/dev/pci/ixgbe/ixv.c: revision 1.194
Use ether_bpf_mtap only when the device supports vlan hardware tagging
The function is bpf_mtap() for ethernet devices and *currently*
it is just handling VLAN tag stripped by the hardware.
l2tp(4): use ether_ifattach() to initialize ethercom
Support vlan(4) over l2tp(4)
Added the test for vlan over l2tp
 1.326.4.1 02-Aug-2025  perseant Sync with HEAD
 1.326.2.1 11-Nov-2023  thorpej branches: 1.326.2.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.326.2.1.2.2 16-Nov-2023  thorpej Clean up the locking protocol around altq_etherclassify(). It's no longer
required to acquire KERNEL_LOCK *just* because ALTQ is compiled into the
kernel; you only have to acquire it if ALTQ is enabled on the interface
in question.
 1.326.2.1.2.1 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.64 07-Sep-2024  andvar spelling and grammar fixes, mainly in comments.
 1.63 03-Sep-2022  thorpej branches: 1.63.10;
Garbage-collect the remaining vestiges of netisr.
 1.62 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.61 29-Jan-2020  thorpej branches: 1.61.10;
Adopt <net/if_stats.h>.
 1.60 27-Apr-2019  pgoyette branches: 1.60.4;
A few more empty-string --> NULL in required-modules lists
 1.59 26-Jun-2018  msaitoh branches: 1.59.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.58 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.57 06-Dec-2017  ozaki-r branches: 1.57.2;
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
 1.56 23-Oct-2017  msaitoh If if_attach() failed in the attach function, free resources and return.
 1.55 12-Dec-2016  ozaki-r branches: 1.55.8;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.54 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.53 10-Jun-2016  ozaki-r branches: 1.53.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.52 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.51 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.50 29-Jul-2014  ozaki-r branches: 1.50.4;
Use if_free instead of free
 1.49 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.48 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.47 05-Apr-2010  joerg branches: 1.47.18; 1.47.32;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.46 19-Jan-2010  pooka branches: 1.46.2; 1.46.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.45 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}
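
As a concrete but hypothetical instance of that pattern, the sketch below
fills in the four cases. The EX_* flag values and the action chosen in each
case are assumptions made for illustration; they do not come from any
particular driver.

```c
#define EX_IFF_UP      0x01u /* assumed value: administratively up */
#define EX_IFF_RUNNING 0x40u /* assumed value: resources allocated */

enum ex_action { EX_NOTHING, EX_STOP, EX_INIT, EX_UPDATE };

/* Decide what to do after a flags change, one case per combination. */
static enum ex_action
ex_flags_action(unsigned int flags)
{
	switch (flags & (EX_IFF_UP | EX_IFF_RUNNING)) {
	case 0:
		return EX_NOTHING; /* down and stopped */
	case EX_IFF_RUNNING:
		return EX_STOP;    /* marked down while still running */
	case EX_IFF_UP:
		return EX_INIT;    /* marked up but not yet running */
	case EX_IFF_UP | EX_IFF_RUNNING:
		return EX_UPDATE;  /* already running: refresh h/w state */
	}
	return EX_NOTHING; /* not reached: all four combinations handled */
}
```

Compared with nested if-else clauses, the masked switch makes it visible at a
glance that every combination of the two flags has been considered.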

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
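
The callback contract described above can be sketched as follows. All names
here (struct ex_ether, ex_set_ifflags_cb(), ex_flags_changed() and the sample
callbacks) are hypothetical; the real structures and signatures differ.
Treating a missing callback like ENETRESET is also an assumption of this
sketch.

```c
#include <errno.h>
#include <stddef.h>

struct ex_ether;
typedef int (*ex_ifflags_cb)(struct ex_ether *);

struct ex_ether {
	ex_ifflags_cb ifflags_cb;       /* registered callback, or NULL */
	int (*init)(struct ex_ether *); /* full reinitialization routine */
	int init_calls;                 /* counts calls to init, for the demo */
};

/* Register the driver's callback for if_flags changes. */
static void
ex_set_ifflags_cb(struct ex_ether *ec, ex_ifflags_cb cb)
{
	ec->ifflags_cb = cb;
}

/* Shared code: run the callback and reset only if it asks for one. */
static int
ex_flags_changed(struct ex_ether *ec)
{
	int error = (ec->ifflags_cb != NULL) ? ec->ifflags_cb(ec) : ENETRESET;

	if (error == ENETRESET)
		return ec->init(ec);
	return error; /* 0 on success, nonzero on failure */
}

/* Sample driver pieces used to exercise the dispatcher. */
static int ex_init(struct ex_ether *ec) { ec->init_calls++; return 0; }
static int ex_cb_ok(struct ex_ether *ec) { (void)ec; return 0; }
static int ex_cb_reset(struct ex_ether *ec) { (void)ec; return ENETRESET; }
static int ex_cb_fail(struct ex_ether *ec) { (void)ec; return EIO; }
```

With this split, a driver supplies only the hardware-specific callback, and
the decision whether to reinitialize lives in one shared place.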

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
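
The address test can be sketched in a few lines. This is an illustration under
the assumption of a 6-byte Ethernet address; ex_lladdr_settable() and
EX_ETHER_ADDR_LEN are hypothetical names, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_ETHER_ADDR_LEN 6

/*
 * Return true if `addr` may be used as an interface's own link-layer
 * address: it must not have the group bit set (which covers both
 * multicast and broadcast) and must not be all zeros.
 */
static bool
ex_lladdr_settable(const uint8_t addr[EX_ETHER_ADDR_LEN])
{
	static const uint8_t zero[EX_ETHER_ADDR_LEN] = { 0 };

	if (addr[0] & 0x01) /* group bit: multicast or broadcast */
		return false;
	if (memcmp(addr, zero, EX_ETHER_ADDR_LEN) == 0)
		return false;
	return true;
}
```

The broadcast address ff:ff:ff:ff:ff:ff needs no separate comparison because
its first octet has the group bit set.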

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.44 24-Oct-2008  dyoung branches: 1.44.2;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.43 15-Jun-2008  christos branches: 1.43.2;
that should read if_alloc.
 1.42 15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.41 07-Feb-2008  dyoung branches: 1.41.6; 1.41.8; 1.41.10; 1.41.12; 1.41.14;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.40 19-Oct-2007  ad branches: 1.40.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.39 04-Mar-2007  christos branches: 1.39.2; 1.39.14; 1.39.16; 1.39.20;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.38 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
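
The intended split between the two routines can be pictured with a toy
model. This is only a sketch of the behaviour the message describes, under
the assumption that the destination is a separately allocated object; struct
sketch_route and the sketch_* functions are hypothetical and do not mirror
the kernel's struct route or its reference counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a cached route plus an externally allocated dst. */
struct sketch_route {
	int	*ro_rt;		/* borrowed pointer standing in for the cached rtentry */
	char	*ro_dst;	/* heap object standing in for the destination sockaddr */
};

/* "Forget" the cached route, but keep the destination. */
static void
sketch_rtcache_clear(struct sketch_route *ro)
{
	assert(ro != NULL);
	ro->ro_rt = NULL;
}

/* Forget the cached route and release the destination as well. */
static void
sketch_rtcache_free(struct sketch_route *ro)
{
	sketch_rtcache_clear(ro);
	free(ro->ro_dst);
	ro->ro_dst = NULL;
}
```

An update path that must look up a fresh route for the same destination
would call the clear variant, which is why the message uses it in
rtcache_update().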

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.37 16-Nov-2006  christos branches: 1.37.4;
__unused removal on arguments; approved by core.
 1.36 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.35 09-Oct-2006  peter Remove unneeded usage of LIST_*.

ok cube@
 1.34 31-Jan-2006  rpaulo branches: 1.34.18; 1.34.20;
Replace the comment that came from if_loop.c many years ago by
something that matches reality.
 1.33 31-Jan-2006  christos PR/32676: Yves-Emmanuel JUTARD: faithprefix should only be defined with INET6
 1.32 11-Dec-2005  thorpej branches: 1.32.2;
ANSI function decls and application of static.
 1.31 11-Dec-2005  christos merge ktrace-lwp.
 1.30 04-Dec-2004  peter branches: 1.30.12;
Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.29 20-Aug-2004  enami Fix compilation error introduced by previous commit.
 1.28 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.27 21-Apr-2004  itojun kill sprintf, use snprintf
 1.26 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.25 04-Jun-2002  itojun branches: 1.25.6;
no need to set rmx_send/recvpipe.
 1.24 15-Nov-2001  thorpej branches: 1.24.8;
Somehow <sys/param.h> was deleted from the includes list. Add it
back so that this file compiles again.
 1.23 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.22 12-Nov-2001  lukem add RCSIDs
 1.21 18-Jul-2001  thorpej bzero -> memset
 1.20 08-May-2001  itojun branches: 1.20.2;
remove #ifdef for freebsd
 1.19 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.18 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.17 20-Feb-2001  itojun branches: 1.17.2;
explicitly use u_int32_t for DLT_NULL encapsulation.

correct gif address family. from chopps, sync with kame.
 1.16 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.15 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.14 18-Dec-2000  thorpej Fill in if_dlt.
 1.13 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.12 04-Jul-2000  thorpej faith(4) is now a cloning pseudo-device.
 1.11 30-Mar-2000  augustss branches: 1.11.4;
Kill some more register declarations.
 1.10 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.9 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.8 22-Dec-1999  itojun fix compilation on sun3x.
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 02-Dec-1999  itojun rcsid police
 1.5 27-Oct-1999  itojun avoid unnecessary file inclusion.
 1.4 12-Jul-1999  bouyer branches: 1.4.2; 1.4.4; 1.4.6;
Needs cpu.h for netisr (compile breaks on sun3).
 1.3 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file if_faith.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file if_faith.c was added on branch chs-ubc2 on 1999-07-01 23:45:19 +0000
 1.4.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.4.1 15-Nov-1999  fvdl Sync with -current
 1.4.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.4.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.4.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.4.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.4.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.4.1 09-May-2001  he Pull up revision 1.19 (requested by itojun):
Correct faith prefix determination.
 1.17.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.4 15-Nov-2001  thorpej Merge from -trunk; compilation fix.
 1.17.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.17.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.20.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.20.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 03-Aug-2001  lukem update to -current
 1.24.8.1 20-Jun-2002  gehenna catch up with -current.
 1.25.6.5 18-Dec-2004  skrll Sync with HEAD.
 1.25.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.25.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.25.6.2 25-Aug-2004  skrll Sync with HEAD.
 1.25.6.1 03-Aug-2004  skrll Sync with HEAD
 1.30.12.6 11-Feb-2008  yamt sync with head.
 1.30.12.5 27-Oct-2007  yamt sync with head.
 1.30.12.4 03-Sep-2007  yamt sync with head.
 1.30.12.3 26-Feb-2007  yamt sync with head.
 1.30.12.2 30-Dec-2006  yamt sync with head.
 1.30.12.1 21-Jun-2006  yamt sync with head.
 1.32.2.1 01-Feb-2006  yamt sync with head.
 1.34.20.2 10-Dec-2006  yamt sync with head.
 1.34.20.1 22-Oct-2006  yamt sync with head
 1.34.18.1 18-Nov-2006  ad Sync with head.
 1.37.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.37.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.39.20.1 25-Oct-2007  bouyer Sync with HEAD.
 1.39.16.2 23-Mar-2008  matt sync with HEAD
 1.39.16.1 06-Nov-2007  matt sync with HEAD
 1.39.14.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.39.2.1 23-Oct-2007  ad Sync with head.
 1.40.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.41.14.1 18-Jun-2008  simonb Sync with head.
 1.41.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.41.10.3 11-Aug-2010  yamt sync with head.
 1.41.10.2 11-Mar-2010  yamt sync with head
 1.41.10.1 04-May-2009  yamt sync with head.
 1.41.8.1 17-Jun-2008  yamt sync with head.
 1.41.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.41.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.43.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.44.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.46.4.1 30-May-2010  rmind sync with head
 1.46.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.47.32.1 10-Aug-2014  tls Rebase.
 1.47.18.2 03-Dec-2017  jdolecek update from HEAD
 1.47.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.50.4.5 05-Feb-2017  skrll Sync with HEAD
 1.50.4.4 05-Oct-2016  skrll Sync with HEAD
 1.50.4.3 09-Jul-2016  skrll Sync with HEAD
 1.50.4.2 29-May-2016  skrll Sync with HEAD
 1.50.4.1 22-Sep-2015  skrll Sync with HEAD
 1.53.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.55.8.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions called from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.55.8.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is bad
because resource shortages really do occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. The calling function can then recover
from the error cleanly by checking the return value.
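
The attach-time pattern that the remaining entries in this pull-up apply to
individual drivers can be sketched as follows. This is a userland
approximation under stated assumptions, not code from any driver: struct
sketch_softc, sketch_if_initialize() and the sketch_fail_init knob are
hypothetical, and malloc() stands in for whatever resources a real attach
function allocates.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state. */
struct sketch_softc {
	void	*sc_ring;	/* resource allocated before initialization */
	int	sc_attached;	/* set only when attach fully succeeds */
};

static int sketch_fail_init;	/* demo knob: force the initialize step to fail */

/* Stand-in for an initialize routine that now returns int instead of panicking. */
static int
sketch_if_initialize(struct sketch_softc *sc)
{
	(void)sc;
	return sketch_fail_init ? ENOMEM : 0;
}

/* Attach: on failure, free what was allocated so far and return the error. */
static int
sketch_attach(struct sketch_softc *sc)
{
	int error;

	assert(sc != NULL);
	sc->sc_ring = malloc(64);
	if (sc->sc_ring == NULL)
		return ENOMEM;

	error = sketch_if_initialize(sc);
	if (error != 0) {
		free(sc->sc_ring);
		sc->sc_ring = NULL;
		return error;
	}
	sc->sc_attached = 1;
	return 0;
}
```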
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.57.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.57.2.1 02-May-2018  pgoyette Synch with HEAD
 1.59.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.59.2.1 10-Jun-2019  christos Sync with HEAD
 1.60.4.1 29-Feb-2020  ad Sync with head.
 1.61.10.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.63.10.1 02-Aug-2025  perseant Sync with HEAD
 1.3 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 08-May-2001  itojun branches: 1.1.2; 1.1.4; 1.1.26; 1.1.42;
correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.1.42.1 21-Jun-2006  yamt sync with head.
 1.1.26.1 11-Dec-2005  christos Sync with head.
 1.1.4.2 21-Jun-2001  nathanw Catch up to -current.
 1.1.4.1 08-May-2001  nathanw file if_faith.h was added on branch nathanw_sa on 2001-06-21 20:08:01 +0000
 1.1.2.2 09-May-2001  he Pull up revision 1.1 (new, requested by itojun):
Correct faith prefix determination.
 1.1.2.1 08-May-2001  he file if_faith.h was added on branch netbsd-1-5 on 2001-05-09 19:36:42 +0000
 1.15 20-Jan-2020  thorpej Remove FDDI support.
 1.14 25-Dec-2007  perry branches: 1.14.98; 1.14.104;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.13 01-Sep-2007  dyoung branches: 1.13.6; 1.13.8; 1.13.12;
fddi_addmulti and fddi_delmulti are never used in the kernel, so
delete them.
 1.12 04-Mar-2007  christos branches: 1.12.2; 1.12.10; 1.12.14; 1.12.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.11 11-Dec-2005  thorpej branches: 1.11.26;
ANSI function decls and application of static.
 1.10 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.9 13-Jun-2001  wiz branches: 1.9.22; 1.9.38;
withough -> without
 1.8 19-Nov-1999  thorpej branches: 1.8.6;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.7 18-May-1999  thorpej branches: 1.7.2; 1.7.8;
Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.6 20-Sep-1998  matt branches: 1.6.8;
Changes so that BPF readers will get the data in fddi packet aligned along
normal boundaries. This makes tcpdump much happier.
 1.5 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.4 24-Mar-1997  thorpej Resolve conflicts from merge.
 1.3 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.2 19-Aug-1995  cgd branches: 1.2.8;
local adaptations, and fix a couple of compilation errors
 1.1 19-Aug-1995  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 24-Mar-1997  thorpej Update from Matt Thomas <matt@3am-software.com>. Notable highlight:
the DEFTA (TurboChannel) card now works on the Alpha!
 1.1.1.1 19-Aug-1995  cgd Generic FDDI support by Matt Thomas. Support for DEC "PDQ" FDDI chipset
and for the PCI attachment of said chipset ("if_fpa"), also from Matt Thomas.
Arguably, pdq* doesn't belong in sys/dev/ic, but it's going to be shared by
various bus attachment devices at some point in the future, and there's no
other place that seems to fit as well.
 1.2.8.1 20-Feb-1997  is Give fddi_ifattach() a 2nd parameter, like ether_ifattach(): a pointer to the
link level address. xxx_ifattach() copy it to the sockaddr_dl structure
associated with the interface.

Change pdq to pass that parameter.
Change frontend to not copy the l.l.a. itself.
 1.6.8.1 21-Jun-1999  thorpej Sync w/ -current.
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.9.38.3 21-Jan-2008  yamt sync with head
 1.9.38.2 03-Sep-2007  yamt sync with head.
 1.9.38.1 21-Jun-2006  yamt sync with head.
 1.9.22.1 11-Dec-2005  christos Sync with head.
 1.11.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.12.16.2 09-Jan-2008  matt sync with HEAD
 1.12.16.1 06-Nov-2007  matt sync with HEAD
 1.12.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.12.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.12.2.1 09-Oct-2007  ad Sync with head.
 1.13.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.13.8.1 26-Dec-2007  ad Sync with head.
 1.13.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.14.104.1 25-Jan-2020  ad Sync with head.
 1.14.98.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.112 20-Jan-2020  thorpej Remove FDDI support.
 1.111 05-Feb-2019  msaitoh branches: 1.111.6;
Remove very old IFF_NOTRAILERS flag.
 1.110 03-Feb-2019  mrg - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.109 09-May-2018  maxv branches: 1.109.2;
Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.108 29-Apr-2018  maxv Add missing pserialize_read_exit in error branch, spotted during my
previous commit.
 1.107 29-Apr-2018  maxv Remove references to m_copy in comments.
 1.106 26-Apr-2018  maxv m_copy -> m_copym
 1.105 14-Feb-2017  ozaki-r branches: 1.105.12;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.104 24-Jan-2017  maxv Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.103 11-Jan-2017  ozaki-r branches: 1.103.2;
Get rid of unnecessary header inclusions
 1.102 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely the ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.
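
The idea of guarding both ends of the queue with a lock embedded in the
queue itself can be sketched in userland C. This is an illustration only: a
pthread mutex stands in for the kernel's ifq_lock, and struct sketch_pkt,
struct sketch_ifqueue and the sketch_* functions are hypothetical names, not
the kernel's IF_ENQUEUE/IF_DEQUEUE macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet and queue types. */
struct sketch_pkt {
	struct sketch_pkt *next;
	int id;
};

struct sketch_ifqueue {
	pthread_mutex_t lock;		/* stands in for ifq_lock */
	struct sketch_pkt *head;
	struct sketch_pkt *tail;
	int len;
};

/* Producer side (Layer 2 in the description above). */
static void
sketch_enqueue(struct sketch_ifqueue *q, struct sketch_pkt *p)
{
	assert(q != NULL && p != NULL);
	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side (Layer 3); returns NULL when the queue is empty. */
static struct sketch_pkt *
sketch_dequeue(struct sketch_ifqueue *q)
{
	struct sketch_pkt *p;

	assert(q != NULL);
	pthread_mutex_lock(&q->lock);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
		q->len--;
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}
```

Unlike raising the interrupt priority level on one CPU, a lock held around
every queue update also excludes code running on other CPUs.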

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.101 03-Oct-2016  ozaki-r Add missing return
 1.100 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.99 28-Apr-2016  ozaki-r branches: 1.99.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.98 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.97 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.96 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.
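
The split between the receive interrupt and the softint described above
can be pictured with a small single-threaded toy. This is only a sketch
under stated assumptions: struct toy_pktq, toy_rxintr() and toy_softint()
are hypothetical names, the "packets" are plain integers, and the per-CPU
queues and real concurrency of percpuq are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 8

/* Hypothetical stand-in for one receive queue. */
struct toy_pktq {
	int	pkts[TOY_QLEN];		/* stand-ins for mbuf pointers */
	size_t	head, count;
	bool	softint_pending;	/* "a softint has been scheduled" */
	int	processed;		/* packets handled by the "stack" */
	int	dropped;		/* packets lost to a full queue */
};

/* "Hardware interrupt" side: only queue the packet and schedule work. */
static void
toy_rxintr(struct toy_pktq *q, int pkt)
{
	assert(q != NULL);
	if (q->count == TOY_QLEN) {
		q->dropped++;
		return;
	}
	q->pkts[(q->head + q->count) % TOY_QLEN] = pkt;
	q->count++;
	q->softint_pending = true;
}

/* "Softint" side: drain the queue and do the real packet processing. */
static void
toy_softint(struct toy_pktq *q)
{
	assert(q != NULL);
	q->softint_pending = false;
	while (q->count > 0) {
		int pkt = q->pkts[q->head];

		q->head = (q->head + 1) % TOY_QLEN;
		q->count--;
		(void)pkt;	/* Layer 2 input would run here */
		q->processed++;
	}
}
```

In this toy, nothing is processed until toy_softint() runs, which mirrors
the point of the change: the interrupt path stays short and the rest of
the work happens later in a different context.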

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.95 13-Oct-2015  roy arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
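
The error-passing convention described above can be sketched roughly as
follows. This is an illustration only, not the code from this revision:
toy_resolve() and toy_l2_output() are hypothetical names, and the sketch
assumes that EWOULDBLOCK means the packet is being held while resolution
is still in progress, so the sender should see success.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for an address resolver: returns 0 when the
 * link-layer address is known, otherwise an errno value.
 */
static int
toy_resolve(int have_entry, int resolving)
{
	if (have_entry)
		return 0;
	return resolving ? EWOULDBLOCK : EHOSTUNREACH;
}

/*
 * Hypothetical caller: pass the resolver's error back up, but mask
 * out EWOULDBLOCK so a pending resolution is not reported as a failure.
 */
static int
toy_l2_output(int have_entry, int resolving)
{
	int error = toy_resolve(have_entry, resolving);

	if (error != 0)
		return error == EWOULDBLOCK ? 0 : error;
	/* ... the frame would be transmitted here ... */
	return 0;
}
```

With this shape, an unreachable host surfaces as an errno value that a
program such as ping(8) can report, while a resolution in progress stays
invisible to the sender.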
 1.94 30-Sep-2015  ozaki-r Remove extra opt_gateway.h
 1.93 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.92 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigations of the
issue there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.91 25-May-2015  ozaki-r Remove leftover DECNET-related stuff

No objection on tech-kern and tech-net.
 1.90 25-May-2015  ozaki-r Remove leftover IPX-related stuff

No objection on tech-kern and tech-net.
 1.89 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.88 07-Jun-2014  martin branches: 1.88.2; 1.88.4; 1.88.6; 1.88.8;
Try to untangle the ifdef mess a bit more
 1.87 06-Jun-2014  rmind Adjust previous change for the #ifdef mess and fix the build.
 1.86 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.85 15-May-2014  msaitoh Put schednetisr() into splnet()/splx() pair.
This might avoid a delay in processing a packet.
 1.84 01-Mar-2013  joerg branches: 1.84.6; 1.84.10;
Retire OSI network stack. OK core@
 1.83 05-Feb-2013  joerg Remove remnants of AF_IMPLINK.
 1.82 11-Oct-2012  christos PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.81 05-Apr-2010  joerg branches: 1.81.8; 1.81.14; 1.81.18; 1.81.20;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.80 04-Feb-2010  joerg branches: 1.80.2; 1.80.4;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.79 19-Jan-2010  pooka Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.78 20-Nov-2009  christos ar_tha() can return NULL; treat this as an error.
 1.77 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
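
The return-value contract of the if_flags callback described above might
look roughly like this. It is a sketch with hypothetical types and names
(struct toy_if, toy_apply_ifflags(), the toy_cb_*() callbacks); the real
ether_set_ifflags_cb() signature is not shown in this log and is not
reproduced here. Only the convention (0 on success, ENETRESET to request
a reinitialization, any other value on failure) is taken from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced interface structure. */
struct toy_if {
	int	flags;				/* current interface flags */
	int	(*ifflags_cb)(struct toy_if *);	/* registered by the driver */
	int	(*init)(struct toy_if *);	/* full reinitialization */
	int	init_calls;			/* bookkeeping for the example */
};

static int
toy_init(struct toy_if *ifp)
{
	ifp->init_calls++;
	return 0;
}

/* Example callbacks covering the three outcomes. */
static int toy_cb_ok(struct toy_if *ifp) { (void)ifp; return 0; }
static int toy_cb_reset(struct toy_if *ifp) { (void)ifp; return ENETRESET; }
static int toy_cb_fail(struct toy_if *ifp) { (void)ifp; return EIO; }

/* Common code: run the callback, reinitialize only when asked to. */
static int
toy_apply_ifflags(struct toy_if *ifp)
{
	int error;

	assert(ifp != NULL && ifp->ifflags_cb != NULL && ifp->init != NULL);
	error = ifp->ifflags_cb(ifp);
	if (error == ENETRESET)
		return ifp->init(ifp);
	return error;
}
```

Keeping the ENETRESET handling in one shared routine is what lets the
individual drivers drop their own copies of that logic.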

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
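
The address check described above can be illustrated with a short,
self-contained sketch. The function name toy_lladdr_acceptable() and the
TOY_ETHER_ADDR_LEN constant are hypothetical; the sketch relies only on
the standard Ethernet convention that the least significant bit of the
first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETHER_ADDR_LEN 6

/*
 * Return true if addr may be used as a station address: it must not be
 * a multicast/broadcast address and must not be 00:00:00:00:00:00.
 */
static bool
toy_lladdr_acceptable(const uint8_t addr[TOY_ETHER_ADDR_LEN])
{
	uint8_t any = 0;

	assert(addr != NULL);
	/* Group bit set: multicast, which includes broadcast. */
	if (addr[0] & 0x01)
		return false;
	/* OR all octets together; zero means the all-zero address. */
	for (size_t i = 0; i < TOY_ETHER_ADDR_LEN; i++)
		any |= addr[i];
	return any != 0;
}
```

The example address from the summary, 02:de:ad:be:ef:02, passes this
check because its first octet has the group bit clear.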

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.76 11-May-2008  dyoung branches: 1.76.4; 1.76.6; 1.76.8; 1.76.10; 1.76.12;
Where applicable, s/0/NULL/, s/Bcmp/memcmp/. Remove a gratuitous
cast from a call to nd6_storelladdr().
 1.75 20-Feb-2008  matt branches: 1.75.6; 1.75.8; 1.75.10; 1.75.12;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.74 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.73 19-Oct-2007  ad branches: 1.73.4; 1.73.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.72 30-Aug-2007  dyoung branches: 1.72.4;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.71 26-Aug-2007  dyoung branches: 1.71.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.70 07-Aug-2007  dyoung branches: 1.70.2;
Constify.
 1.69 21-Jul-2007  dyoung branches: 1.69.4;
Use NULL instead of 0 for null pointers.
 1.68 07-Mar-2007  liamjfoy branches: 1.68.2; 1.68.10;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case an
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.67 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.66 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.65 10-Dec-2006  is branches: 1.65.2;
Explain llc XID magic constants, correcting the XID header format tag.
 1.64 10-Dec-2006  is Avoid overlapping struct assignment for FDDI. Should fix netiso like in the
Ethernet case.
 1.63 07-Sep-2006  dogcow branches: 1.63.2; 1.63.4; 1.63.6;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.62 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.61 18-May-2006  liamjfoy branches: 1.61.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.60 12-May-2006  mrg since ar_tha() can return NULL, don't pass it directly to functions
that expect real addresses. explicitly KASSERT() that it is not
NULL in the kernel and just avoid using it in userland.

(the kernel could be more defensive about this, but, until now it
would have just crashed anyway.)
 1.59 15-Apr-2006  christos Coverity CID 1146: Protect against NULL deref.
 1.58 11-Dec-2005  thorpej branches: 1.58.4; 1.58.6; 1.58.8; 1.58.10; 1.58.12;
ANSI function decls and application of static.
 1.57 11-Dec-2005  christos merge ktrace-lwp.
 1.56 30-May-2005  christos branches: 1.56.2;
bcopy -> memcpy
bcmp -> memcmp
and remove casts.
 1.55 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.54 26-Feb-2005  perry nuke trailing whitespace
 1.53 06-Dec-2004  christos branches: 1.53.4; 1.53.6;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.52 22-Mar-2004  matt Update my copyright to not include advertising clause.
 1.51 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.50 23-Jun-2003  martin branches: 1.50.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.49 16-May-2003  itojun use strlcpy
 1.48 14-May-2003  itojun unifdef for readability/clarity. thorpej ok
 1.47 26-Feb-2003  matt Fix tpyo.
 1.46 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.45 11-Sep-2002  itojun KNF - return is not a function.
 1.44 04-Jun-2002  itojun add a blank line
 1.43 12-Nov-2001  lukem branches: 1.43.8;
add RCSIDs
 1.42 17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.41 24-Jul-2001  matt Update the max_linkhdr when we attach a fddi interface.
 1.40 14-Jun-2001  itojun branches: 1.40.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.39 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.38 17-Jan-2001  thorpej branches: 1.38.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.37 18-Dec-2000  thorpej Fill in if_dlt.
 1.36 13-Dec-2000  thorpej Add ALTQ glue.
 1.35 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.34 15-Oct-2000  itojun suppress warning on nd6_storelladdr failure. the failure could happen
easily when we have routing table with too many entries. sync with kame.
 1.33 14-Jun-2000  mycroft Check the multicast bit in the header mbuf while interrupts are still blocked.
Otherwise we can run off into space if the packet was sent immediately and the
mbuf freed.
Pointed out by Boris Popov (not on our lists).
 1.32 28-May-2000  matt Fix bpf output on fddi to actually work. Make it compatible with ULTRIX
and Tru64.
 1.31 30-Mar-2000  augustss branches: 1.31.2;
Kill some more register declarations.
 1.30 06-Mar-2000  thorpej Use the new macros in if.h for setting ifp->if_baudrate.
 1.29 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.28 21-Sep-1999  matt branches: 1.28.2; 1.28.8;
Make NETATALK over FDDI.
 1.27 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.26 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.25 10-Dec-1998  christos branches: 1.25.4; 1.25.6;
IPX fixes.
 1.24 13-Oct-1998  kim branches: 1.24.4;
Use ETHERTYPE_ATALK instead of ETHERTYPE_AT. The former seems more common.
Our other constants also use "ATALK".

Added many new ETHERTYPE constants to sys/net/ethertypes.h, including the
ones from libpcap and tcpdump "ethertype.h" files.
 1.23 05-Jul-1998  jonathan defopt NS, NSIP.
 1.22 05-Jul-1998  jonathan defopt ISO TPIP.
 1.21 05-Jul-1998  jonathan defopt LLC
 1.20 05-Jul-1998  jonathan defopt CCITT.
 1.19 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.18 01-May-1998  thorpej Squash a typo.
 1.17 01-May-1998  thorpej Add FDDI source address spoofing via pseudo_AF_HDRCMPLT.
 1.16 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
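
The incremental checksum adjustment mentioned above can be sketched as
follows. This is not the ipflow code itself; it is a minimal illustration
of the general technique from RFC 1624 (equation 3), applied to a raw
IPv4 header held in a byte array. The names toy_ip_cksum() and
toy_ttl_dec() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Full Internet checksum over a header of even length. The checksum
 * field (bytes 10 and 11) must be zero when this is called.
 */
static uint16_t
toy_ip_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	assert(hdr != NULL && len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Decrement the TTL (byte 8) and patch the stored checksum without
 * re-summing the header: HC' = ~(~HC + ~m + m'), where m and m' are the
 * old and new values of the 16-bit word holding TTL and protocol.
 */
static void
toy_ttl_dec(uint8_t *hdr)
{
	uint16_t old_word, new_word, old_sum, new_sum;
	uint32_t sum;

	assert(hdr != NULL && hdr[8] > 0);	/* caller checked the TTL */
	old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
	old_sum = (uint16_t)((hdr[10] << 8) | hdr[11]);
	hdr[8]--;
	new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);

	sum = (uint16_t)~old_sum;
	sum += (uint16_t)~old_word;
	sum += new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	new_sum = (uint16_t)~sum;
	hdr[10] = (uint8_t)(new_sum >> 8);
	hdr[11] = (uint8_t)(new_sum & 0xff);
}
```

The appeal for a fast path is that the update touches two 16-bit words
instead of re-reading the whole header for every forwarded packet.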
 1.15 02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.14 03-Apr-1997  christos branches: 1.14.4;
Fix compile problems (from Veego)
 1.13 02-Apr-1997  christos Add netatalk stubs.
 1.12 24-Mar-1997  thorpej Resolve conflicts from merge.
 1.11 19-Mar-1997  is Deal with AF_ARP on transmission --- without it, the new ARP code doesn't
work.
 1.10 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.9 21-Oct-1996  perry branches: 1.9.4;
Small fix to make this compile even if no BPFs are being compiled
in. Unfortunately, the BPF-only code called a label that wasn't also
being #if'ed, and this made the compiler bitch. Now that we compile
with -Werror, this prevented the thing from compiling at all! (sigh)
 1.8 13-Oct-1996  christos backout previous kprintf change
 1.7 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.6 10-Jul-1996  cgd #ifdef the declaration of 'ac' in fddi_input on ISO, since it's only
used if ISO is defined and -Wall complains.
 1.5 07-May-1996  christos Fix new warnings.
 1.4 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.3 24-Dec-1995  mycroft Remove old comment regarding trailers.
Fix a diagnostic message.
Make some variables use fixed-size types.
Initialize if_output in fddi_ifattach().
 1.2 19-Aug-1995  cgd local adaptations, and fix a couple of compilation errors
 1.1 19-Aug-1995  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 24-Mar-1997  thorpej Update from Matt Thomas <matt@3am-software.com>. Notable highlight:
the DEFTA (TurboChannel) card now works on the Alpha!
 1.1.1.1 19-Aug-1995  cgd Generic FDDI support by Matt Thomas. Support for DEC "PDQ" FDDI chipset
and for the PCI attachment of said chipset ("if_fpa"), also from Matt Thomas.
Arguably, pdq* doesn't belong in sys/dev/ic, but it's going to be shared by
various bus attachment devices at some point in the future, and there's no
other place that seems to fit as well.
 1.9.4.3 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.9.4.2 20-Feb-1997  is Give fddi_ifattach() a 2nd parameter, like ether_ifattach(): a pointer to the
link level address. xxx_ifattach() copy it to the sockaddr_dl structure
associated with the interface.

Change pdq to pass that parameter.
Change frontend to not copy the l.l.a. itself.
 1.9.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.14.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.24.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.25.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.25.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.25.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.25.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.28.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.28.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.28.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.28.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.38.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.38.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.38.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.38.2.3 22-Oct-2001  nathanw Catch up to -current.
 1.38.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.38.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.40.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.40.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.40.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.40.2.1 03-Aug-2001  lukem update to -current
 1.43.8.1 20-Jun-2002  gehenna catch up with -current.
 1.50.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.50.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.50.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.50.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.50.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.50.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.50.2.1 03-Aug-2004  skrll Sync with HEAD
 1.53.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.53.4.1 29-Apr-2005  kent sync with -current
 1.56.2.7 27-Feb-2008  yamt sync with head.
 1.56.2.6 21-Jan-2008  yamt sync with head
 1.56.2.5 27-Oct-2007  yamt sync with head.
 1.56.2.4 03-Sep-2007  yamt sync with head.
 1.56.2.3 26-Feb-2007  yamt sync with head.
 1.56.2.2 30-Dec-2006  yamt sync with head.
 1.56.2.1 21-Jun-2006  yamt sync with head.
 1.58.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.58.10.1 19-Apr-2006  elad sync with head.
 1.58.8.3 14-Sep-2006  yamt sync with head.
 1.58.8.2 26-Jun-2006  yamt sync with head.
 1.58.8.1 24-May-2006  yamt sync with head.
 1.58.6.3 01-Jun-2006  kardel Sync with head.
 1.58.6.2 22-Apr-2006  simonb Sync with head.
 1.58.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.58.4.1 09-Sep-2006  rpaulo sync with head
 1.61.2.1 19-Jun-2006  chap Sync with head.
 1.63.6.1 18-Dec-2006  tron Pull up following revision(s) (requested by is in ticket #280):
sys/net/if_fddisubr.c: revision 1.64
sys/netiso/clnp_input.c: revision 1.34
Avoid overlapping struct assignment for FDDI. Should fix netiso like in the
Ethernet case.
 1.63.4.1 18-Dec-2006  yamt sync with head.
 1.63.2.1 12-Jan-2007  ad Sync with head.
 1.65.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.65.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.68.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.68.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.68.2.3 23-Oct-2007  ad Sync with head.
 1.68.2.2 09-Oct-2007  ad Sync with head.
 1.68.2.1 20-Aug-2007  ad Sync with HEAD.
 1.69.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.69.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.69.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.70.2.2 07-Aug-2007  dyoung Constify.
 1.70.2.1 07-Aug-2007  dyoung file if_fddisubr.c was added on branch matt-mips64 on 2007-08-07 04:38:18 +0000
 1.71.2.3 23-Mar-2008  matt sync with HEAD
 1.71.2.2 09-Jan-2008  matt sync with HEAD
 1.71.2.1 06-Nov-2007  matt sync with HEAD
 1.72.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.73.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.73.4.1 26-Dec-2007  ad Sync with head.
 1.75.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.75.10.4 11-Aug-2010  yamt sync with head.
 1.75.10.3 11-Mar-2010  yamt sync with head
 1.75.10.2 04-May-2009  yamt sync with head.
 1.75.10.1 16-May-2008  yamt sync with head.
 1.75.8.1 18-May-2008  yamt sync with head.
 1.75.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.75.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.76.12.1 21-Apr-2010  matt sync to netbsd-5
 1.76.10.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.76.8.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.76.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.76.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.80.4.1 30-May-2010  rmind sync with head
 1.80.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.81.20.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.81.18.5 03-Dec-2017  jdolecek update from HEAD
 1.81.18.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.81.18.3 23-Jun-2013  tls resync from head
 1.81.18.2 25-Feb-2013  tls resync with head
 1.81.18.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.81.14.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.81.14.1 31-Oct-2012  riz branches: 1.81.14.1.2;
Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.81.14.1.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.81.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.81.8.1 30-Oct-2012  yamt sync with head
 1.84.10.1 10-Aug-2014  tls Rebase.
 1.84.6.1 18-May-2014  rmind sync with head
 1.88.8.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.88.6.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.88.4.9 28-Aug-2017  skrll Sync with HEAD
 1.88.4.8 05-Feb-2017  skrll Sync with HEAD
 1.88.4.7 05-Oct-2016  skrll Sync with HEAD
 1.88.4.6 29-May-2016  skrll Sync with HEAD
 1.88.4.5 22-Apr-2016  skrll Sync with HEAD
 1.88.4.4 19-Mar-2016  skrll Sync with HEAD
 1.88.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.88.4.2 22-Sep-2015  skrll Sync with HEAD
 1.88.4.1 06-Jun-2015  skrll Sync with HEAD
 1.88.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.99.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.99.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.99.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.103.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.105.12.2 21-May-2018  pgoyette Sync with HEAD
 1.105.12.1 02-May-2018  pgoyette Synch with HEAD
 1.109.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.109.2.1 10-Jun-2019  christos Sync with HEAD
 1.111.6.1 25-Jan-2020  ad Sync with head.
 1.159 15-Sep-2024  skrll Drop locks before freeing unreferenced memory in gif_set_tunnel
 1.158 10-Feb-2024  andvar branches: 1.158.2;
Fix various typos in comments, log messages and documentation.
 1.157 03-Sep-2022  thorpej branches: 1.157.4; 1.157.8;
Garbage-collect the remaining vestiges of netisr.
 1.156 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.155 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.154 14-Oct-2020  roy branches: 1.154.6;
gif: Set the link state UP if we have a tunnel, otherwise DOWN.
 1.153 30-Mar-2020  christos On detach, destroy the mutex attach created, otherwise we crash with LOCKDEBUG.
XXX: other interface drivers have this issue.
 1.152 01-Feb-2020  riastradh Switch if_gif to atomic_load/store_*.
 1.151 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.150 30-Oct-2019  knakahara branches: 1.150.2;
Add sysctl nodes to control fragmentation with IPv[46] over IPv6 gif(4).

New sysctl node "net.inet6.ip6.gifpmtu" means
- 0 (default)
Fragment by IPV6_MMTU. All packets are certain to reach the destination,
however long-packet performance is poor.
This is the same behavior as before.
- 1
Fragment by outer interface's MTU. The long packet performance would
be good, however the packets may be dropped in some network paths
whose path MTU is less than the interface's MTU.
- others
undefined yet

New sysctl node "net.interfaces.gif*.pmtu" means
- -1 (default)
Use system default value (net.inet6.ip6.gifpmtu).
- 0
Fragment by IPV6_MMTU for this gif(4) tunnel.
- 1
Fragment by outer interface's MTU for this gif(4) tunnel.
- others
undefined yet

See RFC4459 for more information and other solutions.
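
The interaction between the two sysctl nodes can be sketched as below. This is
an illustration only, not the code from this commit: the enum constants, the
IPV6_MMTU_BYTES macro and both function names are hypothetical, and the 1280
value is the IPv6 minimum link MTU assumed to correspond to IPV6_MMTU.

```c
#include <assert.h>

/* Hypothetical names; the values mirror the sysctl settings described above. */
enum {
	GIF_PMTU_SYSDEFAULT = -1,	/* per-interface only: defer to system value */
	GIF_PMTU_MINMTU = 0,		/* fragment by the IPv6 minimum MTU */
	GIF_PMTU_OUTERMTU = 1		/* fragment by the outer interface's MTU */
};

#define IPV6_MMTU_BYTES 1280	/* assumed value of the IPv6 minimum link MTU */

/* Resolve the per-interface setting against the system-wide default. */
static int
gif_effective_pmtu_mode(int if_pmtu, int sys_gifpmtu)
{
	return (if_pmtu == GIF_PMTU_SYSDEFAULT) ? sys_gifpmtu : if_pmtu;
}

/* Return the fragment size in bytes, or -1 for the "undefined yet" modes. */
static int
gif_fragment_size(int if_pmtu, int sys_gifpmtu, int outer_if_mtu)
{
	assert(outer_if_mtu > 0);

	switch (gif_effective_pmtu_mode(if_pmtu, sys_gifpmtu)) {
	case GIF_PMTU_MINMTU:
		return IPV6_MMTU_BYTES;
	case GIF_PMTU_OUTERMTU:
		return outer_if_mtu;
	default:
		return -1;
	}
}
```

With this sketch, an interface left at -1 follows the system-wide value, while
an interface set to 0 or 1 overrides it for that tunnel only.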
 1.149 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storage runs short, percpu(9) enlarges it by allocating new
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.148 25-Jun-2019  msaitoh branches: 1.148.2;
Simplify "LIST_HEAD();" to make the code more understandable.
No functional change.
 1.147 18-Jun-2019  msaitoh No functional change:
- Fix typo (s/configureation/configuration/)
- KNF
 1.146 22-Apr-2019  knakahara fix a potential bug of gif(4) check for tunnel duplicate.

This problem has not manifested thanks to the duplicate check
in encap_attach().
 1.145 12-Nov-2018  knakahara Fix ALTQ on gif(4). Reported and tested by Anthony Mallet, advised by Greg Troxel, thanks.

l2tp(4) and ipsecif(4) don't support ALTQ yet. So, they don't require this fix.

XXX pullup-8
 1.144 19-Oct-2018  knakahara Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.143 26-Jun-2018  msaitoh branches: 1.143.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug in which the direction is misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.142 25-May-2018  ozaki-r Ensure to call if_register after interface initializations finish
 1.141 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.140 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.139 12-Feb-2018  maxv branches: 1.139.2;
Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in
the chain.
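
The difference between the two calls can be illustrated with a toy model. This
is a sketch only: struct toy_mbuf, toy_get(), toy_free() and toy_freem() are
hypothetical stand-ins, not the kernel mbuf API. toy_free() releases a single
element and returns its successor, as m_free does; toy_freem() releases the
whole chain, as m_freem does.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an mbuf: only the chain link is modelled. */
struct toy_mbuf {
	struct toy_mbuf *next;
};

static int toy_live;	/* number of elements currently allocated */

/* Allocate one element and prepend it to the given chain. */
static struct toy_mbuf *
toy_get(struct toy_mbuf *next)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	assert(m != NULL);
	m->next = next;
	toy_live++;
	return m;
}

/* Free a single element and return the rest of the chain. */
static struct toy_mbuf *
toy_free(struct toy_mbuf *m)
{
	struct toy_mbuf *next = m->next;

	free(m);
	toy_live--;
	return next;
}

/* Free every element in the chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_free(m);
}
```

Calling toy_free() on the head of a three-element chain and discarding the
return value would leave two elements allocated, which is the kind of leak
this revision describes.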
 1.138 15-Jan-2018  maxv Fix spl leak.

ifconfig gif0 create
ifconfig gif0 destroy
WARNING: SPL NOT LOWERED ON ...
 1.137 21-Dec-2017  knakahara remove duplicated null check
 1.136 09-Dec-2017  pgoyette Split ip_ecn code into its own module, so it can be shared between
gif(4), stf(4), and ipsec(4). Without this, loading the if_gif
module can result in redefined global symbols if either ipsec(4) or
stf(4) but not gif(4) is built into the kernel.

Fixes PR kern/52795 (as reported by martin@ via irc).

XXX pullup to netbsd-8
 1.135 06-Dec-2017  knakahara unify processing to check nesting count for some tunnel protocols.
 1.134 27-Nov-2017  knakahara IFF_RUNNING checking in Rx and Tx processing is unnecessary now.

The configs of gif (members of gif_var) are now protected by psref(9).
 1.133 27-Nov-2017  knakahara preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

Now that the Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.132 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.131 23-Oct-2017  msaitoh If if_initialize() failed in the attach function, free resources and return.
 1.130 21-Sep-2017  knakahara add lock for sclist to exclude ifconfig gifX add/delete and ifconfig gifX tunnel
 1.129 21-Sep-2017  knakahara add lock for percpu route like l2tp(4).
 1.128 08-Aug-2017  knakahara fix leak when encap_attach() fails twice.

XXX need pullup to -8 branch
 1.127 22-Jun-2017  knakahara I have forgotten to commit this gif(4) MP-ify patch for a long time, sorry.
 1.126 01-Jun-2017  chs branches: 1.126.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.125 13-Feb-2017  ozaki-r Remove unnecessary splnet

ok @knakahara
 1.124 14-Dec-2016  knakahara branches: 1.124.2;
fix race of gif_softc->gif_ro when we send multiple flows over gif on NET_MPSAFE enabled kernel.

make gif_softc->gif_ro percpu as well as ipforward_rt to resolve this race.
and add future TODO comment for etherip(4).
 1.123 15-Sep-2016  knakahara kmem_alloc(size, KM_SLEEP) return value NULL check is not required any more.

kmem_alloc(size, KM_SLEEP) is already fixed, that is, it never returns NULL.
see: sys/kern/subr_kmem.c:r1.62
 1.122 01-Sep-2016  knakahara gif(4)'s if_output() is already MP-safe. It should enable IFEF_OUTPUT_MPSAFE.
 1.121 18-Aug-2016  knakahara fix: failed to create sysctl entries for module version gif(4).

The two affected sysctl entries are:
- net.inet.ip.gifttl
- net.inet6.ip6.gifhlim
 1.120 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.119 04-Jul-2016  knakahara branches: 1.119.2;
Don't use IFQ_ENQUEUE/IFQ_DEQUEUE in the MP-ified interface without whole lock.

That causes per-flow reordering, as the following situation can occur:
(1) CPU#A does IFQ_DEQUEUE
(2) CPU#A sleeps by some reason
(3) CPU#B does IFQ_DEQUEUE
 1.118 04-Jul-2016  knakahara make gif(4) and ip_encap MP-ify
 1.117 04-Jul-2016  knakahara make encap_lock_{enter,exit} interruptible.
 1.116 04-Jul-2016  knakahara fix: gif(4) receive side race

A panic occurred in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) does not suspend receive packet processing, even though it suspends
transmit packet processing, while anyone is doing a gif(4) ioctl.
 1.115 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit did not take care of encaptab. This commit fixes the encaptab
race; encaptab is used by more than just gif(4).
 1.114 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (1/2) : gif(4) side

To prevent calling softint_schedule() after called softint_disestablish(),
the following modifications are added
+ ioctl (writing configuration) side
- off IFF_RUNNING flag before changing configuration
- wait softint handler completion before changing configuration
+ packet processing (reading configuration) side
- if IFF_RUNNING flag is on, do nothing
+ in whole
- add gif_list_lock_{enter,exit} to prevent the same configuration from
being set on other gif(4) interfaces
 1.113 27-Jun-2016  knakahara gif(4) does not need link state changing interrupts
 1.112 24-Jun-2016  knakahara eliminate unused softint for gif(4) Rx
 1.111 24-Jun-2016  knakahara eliminate gif(4) Tx softint

- remove gif_si from struct gif_softc
- directly call gifintr() from gif_output()
- rename gifintr() to gif_start()
- remove Tx softint processing from gif_set_tunnel() and gif_delete_tunnel()
 1.110 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.109 31-May-2016  knakahara modify some functions static. no functional change.
 1.108 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.107 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.106 26-Feb-2016  knakahara To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput().
 1.105 18-Jan-2016  knakahara Refactor protosw codes in gif(4). No functional change.

- remove unnecessary include
- reduce scopes
 1.104 08-Jan-2016  knakahara eliminate ip_input.c and ip6_input.c dependency on gif(4)
 1.103 04-Jan-2016  knakahara Revert extra waiting code.

PR kern/50522 is actually fixed by sys/kern/kern_softint.c:r1.42, so the waiting
code in if_gif.c is not required.
 1.102 11-Dec-2015  knakahara PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.

It is necessary to wait for other CPUs' softint completion before disestablishing
the softint handler.
 1.101 11-Dec-2015  knakahara revert KASSERT. It should use 'if' instead of KASSERT.

See the updated (later than r1.18) kmem(9) man page.
 1.100 10-Dec-2015  knakahara kmem_zalloc(, KM_SLEEP) must not return NULL.
 1.99 10-Dec-2015  knakahara add NULL check
 1.98 09-Dec-2015  knakahara gif(4) uses kmem_alloc APIs instead of malloc.
 1.97 09-Dec-2015  knakahara Refactor gif_set_tunnel(). No functional change.
 1.96 09-Dec-2015  knakahara Improve gif_set_tunnel() rollback code.
 1.95 04-Dec-2015  knakahara gif(4): Infinite recursion calls prevention code works again now.

The prevention code hasn't worked since gif(4) was changed
to use softint(9). To make this prevention work, gif_output uses
m_tag(9) like FreeBSD and OpenBSD.

I tested with following code.
 1.94 03-Dec-2015  knakahara LIST_REMOVE should be done before clearing members of the list element.
 1.93 03-Dec-2015  knakahara remove extra encap_detach().

encap_detach() is already done in gif_delete_tunnel()->in{,6}_gif_detach().
 1.92 11-Nov-2015  knakahara fix CID 980463
 1.91 11-Nov-2015  knakahara fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.

e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1

# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3

# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
 1.90 10-Nov-2015  christos correct mistake in previous
 1.89 10-Nov-2015  christos CID 980463: Provide common error path for rollback. Remove extra check for
success.
 1.88 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.87 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.86 03-Jun-2015  martin Include <sys/socketvar.h> for softnet_lock.
 1.85 03-Jun-2015  hsuenaga Obtain softnet_lock before entering IP networking stack from gif software
interrupt.
 1.84 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.83 05-Jun-2014  rmind branches: 1.83.2; 1.83.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.82 01-Mar-2013  joerg branches: 1.82.10;
Retire OSI network stack. OK core@
 1.81 19-Jan-2013  degroote PR kern/47419: Antony Mallet: ifconfig doesn't display MTU on gif(4)

There is no special treatment for SIOCGIFMTU in gif(4), so just pass it to
ifioctl_common().
 1.80 28-Oct-2011  dyoung branches: 1.80.2; 1.80.8; 1.80.12; 1.80.14;
Don't kauth-orize SIOCDIFPHYADDR, SIOCSIFFLAGS, SIOCSIFMTU, or
SIOCSLIFPHYADDR, in gif_ioctl() or in gre_ioctl(), because those
operations are ordinarily kauth-orized already in ifioctl().

Kauth-orizing SIOCSIFFLAGS in gre_ioctl() caused a panic ("panic:
bpf_detachd: ifpromisc failed: 1") when tcpdump(8) was interrupted.
Somehow bpf(4) enables promiscuous mode using different credentials than
it uses to disable promiscuous mode, hence the ifpromisc failure. This
may have something to do with privilege-separation in tcpdump(8). I.e.,
an LWP with SIOCSIFFLAGS privilege opens /dev/bpf, but an LWP without
SIOCSIFFLAGS privilege closes it.
 1.79 27-Oct-2011  dyoung Fix gif(4)/gre(4) operation over interfaces such as wm(4) that do IPv4
checksum-offload. Note well: it really is necessary to clear the
csum_data.

While I'm here, remove the do-nothing case for SIOCSIFDSTADDR and let
ifioctl_common() or the protocol handle it.
 1.78 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.77 19-Jan-2010  pooka branches: 1.77.2; 1.77.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.76 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
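
The address check described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code in ether_ioctl():
the function name lladdr_is_acceptable() and the ETHER_ADDR_BYTES macro are
hypothetical. It relies on the fact that multicast and broadcast Ethernet
addresses have the least-significant bit of the first octet set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDR_BYTES 6	/* length of an Ethernet address (hypothetical name) */

/*
 * Return true if the address may be assigned as a link-layer address:
 * it must be neither multicast/broadcast nor 00:00:00:00:00:00.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[ETHER_ADDR_BYTES])
{
	size_t i;

	assert(addr != NULL);

	/* Group bit set: multicast, or broadcast (ff:ff:ff:ff:ff:ff). */
	if (addr[0] & 0x01)
		return false;

	/* Any non-zero octet means the address is not all zeros. */
	for (i = 0; i < ETHER_ADDR_BYTES; i++) {
		if (addr[i] != 0)
			return true;
	}
	return false;
}
```

Under this sketch the example address from the summary, 02:de:ad:be:ef:02,
is accepted because its first octet has the group bit clear.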

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.75 15-Jun-2008  christos branches: 1.75.2; 1.75.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.74 20-Feb-2008  matt branches: 1.74.6; 1.74.8; 1.74.10; 1.74.12; 1.74.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.73 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.72 08-Oct-2007  ad branches: 1.72.4;
Use the softint API.
 1.71 16-Sep-2007  dyoung branches: 1.71.2;
Save some lines of code, use sockaddr_dup(), sockaddr_free(),
sockaddr_cmp(). No functional change intended.

Bug fix: pass M_WAITOK, not M_WAIT, to malloc(9).
 1.70 14-Jul-2007  ad branches: 1.70.6; 1.70.8;
Generic soft interrupts are mandatory.
 1.69 06-May-2007  dyoung Free the route cache after detaching the interface w/ if_detach()
instead of before, because if_detach() may cause the cache to be
reloaded. (I already fixed this in both etherip(4) and gre(4).
Ewww, rampant code duplication.)
 1.68 17-Mar-2007  dyoung bcopy -> memcpy, bcmp -> memcmp.

Don't open-code LIST_FOREACH().
 1.67 04-Mar-2007  christos branches: 1.67.2; 1.67.4; 1.67.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.66 23-Feb-2007  dyoung In gif_clone_destroy(), free the cached route before freeing the
interface.
 1.65 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.64 23-Nov-2006  rpaulo branches: 1.64.4;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.63 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.62 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.61 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.60 23-Jul-2006  ad branches: 1.60.4; 1.60.6;
Use the LWP cached credentials where sane.
 1.59 14-May-2006  elad integrate kauth.
 1.58 08-Mar-2006  msaitoh branches: 1.58.2;
fix memory leak when resetting the source address and destination address.
 1.57 28-Dec-2005  christos branches: 1.57.4; 1.57.6; 1.57.8; 1.57.10;
make this compile with no INET option
 1.56 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.55 11-Dec-2005  christos merge ktrace-lwp.
 1.54 06-Jun-2005  martin branches: 1.54.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.53 02-Jun-2005  tron Remove type casts and lint directives which are no longer necessary
because the first argument of m_copydata() is "const struct mbuf *" now.
 1.52 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.51 20-May-2005  christos PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.50 26-Feb-2005  perry branches: 1.50.2;
nuke trailing whitespace
 1.49 01-Feb-2005  he Fix "unused local variable" warning/error if compiling without
bridge support by making variable declaration conditional. Found
while compiling for shark.
 1.48 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.47 04-Dec-2004  peter branches: 1.47.4; 1.47.6;
Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.46 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.45 21-Apr-2004  itojun kill sprintf, use snprintf
 1.44 28-Oct-2003  mycroft branches: 1.44.2; 1.44.4;
Return a sensible error code in the previous.
 1.43 25-Oct-2003  christos Fix uninitialized variable warnings
 1.42 11-Nov-2002  itojun branches: 1.42.6;
make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.41 13-Jun-2002  itojun drop too short IPv6 frame
 1.40 26-Mar-2002  christos branches: 1.40.2; 1.40.4;
We are not guaranteed that we have enough bytes to get a struct ip from our
mbuf. So if we receive a short packet that looks like gif, we would panic.
Reviewed by thorpej, tested by Kimmo Suominen and Andreas Wrede. Thanks for
the help in tracking this down.
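
The kind of guard described above (do not read a header until the buffer is known to hold one) can be sketched in userland C. The names sketch_hdr and sketch_get_hdr are hypothetical, and the 20-byte size is an assumption standing in for a minimal IPv4 header; this is not the code that was committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header; not the kernel's struct ip. */
struct sketch_hdr {
    uint8_t bytes[20];
};

/*
 * Copy a header out of a packet buffer only if the buffer is long
 * enough to contain it. Returns 0 on success, -1 for a short packet.
 */
static int
sketch_get_hdr(const uint8_t *buf, size_t len, struct sketch_hdr *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out))
        return -1;              /* too short: drop instead of reading */
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

A caller would drop the packet when the function returns -1, rather than dereferencing memory it does not own.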
 1.39 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.38 14-Jan-2002  kleink Include <machine/intr.h> unconditionally, instead of only doing so if
__HAVE_GENERIC_SOFT_INTERRUPTS and relying on <sys/param.h> to provide it
otherwise; pointed out by Aymeric Vincent.
 1.37 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.36 12-Nov-2001  lukem add RCSIDs
 1.35 26-Sep-2001  itojun don't softintr_disestablish twice.
previous code panic'ed with the following command sequence:
# ifconfig gif0 create tunnel A B
# ifconfig gif0 deletetunnel
# ifconfig gif0 destroy
 1.34 20-Aug-2001  itojun branches: 1.34.2;
fix ALTQ support. less diff with kame. kjc@csl.sony.co.jp.
 1.33 16-Aug-2001  itojun gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.32 30-Jul-2001  itojun raise IFF_UP on SIOCSIFADDR. commented by tv@netbsd, sync with kame
 1.31 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself, now the behavior is gone.
 1.30 18-Jul-2001  thorpej bzero -> memset
 1.29 14-Jun-2001  itojun branches: 1.29.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.28 04-Jun-2001  itojun if_up() requires splsoftnet. sync with kame
 1.27 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.26 21-Feb-2001  itojun branches: 1.26.2;
remove unnecessary global variable for eon processing. from chopps,
sync with kame
 1.25 20-Feb-2001  itojun add SIOC[SG]LIFPHYADDR ioctl. greatly simplify tunnel address settings.
sync with kame. old ioctls are supplied but not recommended for new code.
 1.24 20-Feb-2001  itojun comment on dispatches (clarify inner/outer)
 1.23 20-Feb-2001  itojun use u_int32_t, not u_int, for DLT_NULL encapsulation.
 1.22 20-Feb-2001  itojun explicitly use u_int32_t for DLT_NULL encapsulation.

correct gif address family. from chopps, sync with kame.
 1.21 20-Feb-2001  itojun cosmetic; do not use register variable declaration. sync with kame
 1.20 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
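
The ordering problem described above can be sketched in userland C: the link-level name is sized from if_addrlen, so it must be allocated only after the link-type attach code has set that field. All sketch_ names and the SKETCH_SADL_FIXED constant are hypothetical; the real if_alloc_sadl() and if_free_sadl() operate on struct ifnet and a sockaddr_dl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_SADL_FIXED 8     /* hypothetical fixed part of the name */

struct sketch_ifnet {
    size_t if_addrlen;          /* set by the link-type attach routine */
    unsigned char *if_sadl;     /* link-level name, NULL until allocated */
    size_t if_sadl_len;
};

/* Generic attach: no longer allocates the link-level name. */
static void sketch_if_attach(struct sketch_ifnet *ifp)
{
    assert(ifp != NULL);
    ifp->if_sadl = NULL;
    ifp->if_sadl_len = 0;
}

/* Called once if_addrlen is known, so the size is correct. */
static int sketch_if_alloc_sadl(struct sketch_ifnet *ifp)
{
    size_t len = SKETCH_SADL_FIXED + ifp->if_addrlen;

    ifp->if_sadl = calloc(1, len);
    if (ifp->if_sadl == NULL)
        return -1;
    ifp->if_sadl_len = len;
    return 0;
}

/* Safe to call more than once. */
static void sketch_if_free_sadl(struct sketch_ifnet *ifp)
{
    free(ifp->if_sadl);
    ifp->if_sadl = NULL;
    ifp->if_sadl_len = 0;
}

/* Detach frees the name if the driver has not already done so. */
static void sketch_if_detach(struct sketch_ifnet *ifp)
{
    sketch_if_free_sadl(ifp);
}
```

In this sketch an interface without a link-specific attach routine would set if_addrlen itself and then call sketch_if_alloc_sadl().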
 1.19 18-Dec-2000  thorpej Fill in if_dlt.
 1.18 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.17 19-Nov-2000  martin Allow changing of settings via ioctl only for the superuser.
Fixes PR security/11524.
 1.16 07-Oct-2000  itojun validate args to SIOC[SG]IFPHY* better.
 1.15 02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.14 06-Jul-2000  itojun remove #ifdef __FreeBSD__ or __bsdi__, as netbsd if_gif.c diverged a little
from kame tree
 1.13 05-Jul-2000  thorpej Fix a memory leak in the gif_clone_create() error path.
 1.12 02-Jul-2000  thorpej Convert `gif' to be a cloning interface.
 1.11 20-Jun-2000  itojun allow IPv[46]-over-IPv6 setting properly. sync with kame.
 1.10 17-May-2000  itojun branches: 1.10.4;
improve duplicated "gifconfig" check (fatal typo was there). sync with kame.
 1.9 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.8 30-Mar-2000  augustss Kill some more register declarations.
 1.7 17-Jan-2000  itojun we don't need IFF_RUNNING for gif.
 1.6 17-Jan-2000  itojun for gif interface, sync IFF_RUNNING with IFF_UP. it does not
make sense to leave IFF_RUNNING during !IFF_UP (it is pseudo interface
so we need to imitate - or is it okay if we don't raise IFF_RUNNING?)
 1.5 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.4 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.3 02-Dec-1999  itojun rcsid police
 1.2 01-Jul-1999  itojun branches: 1.2.2; 1.2.4; 1.2.10;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file if_gif.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.10.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.2.4.8 23-Apr-2001  bouyer Kill unwanted differences with HEAD
 1.2.4.7 21-Apr-2001  bouyer Sync with HEAD
 1.2.4.6 12-Mar-2001  bouyer Sync with HEAD.
 1.2.4.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.2.4.4 05-Jan-2001  bouyer Sync with HEAD
 1.2.4.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.2.4.2 22-Nov-2000  bouyer Sync with HEAD.
 1.2.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file if_gif.c was added on branch chs-ubc2 on 1999-07-01 23:45:19 +0000
 1.10.4.4 11-Apr-2002  he Pull up revision 1.22 (via patch, requested by jtk):
Use an explicitly sized type for DLT_NULL encapsulation.
Correct gif address family.
 1.10.4.3 19-Nov-2000  tv Pullup 1.17 [sommerfeld]:
Allow changing of settings via ioctl only for the superuser.
Fixes PR security/11524.
 1.10.4.2 17-Oct-2000  tv Pullup 1.16 [itojun]:
validate args to SIOC[SG]IFPHY* better.
 1.10.4.1 20-Jun-2000  itojun permit configuration ioctl for tunnel-over-IPv6.
approved by: releng-1-5
 1.26.2.11 11-Dec-2002  thorpej Sync with HEAD.
 1.26.2.10 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.26.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
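
A userland sketch of the NULL-safe macro pattern described above follows. The sketch_ types and the sketch_curlwp variable are hypothetical stand-ins for the kernel's struct lwp, struct proc and curlwp; only the shape of the conditional expression is taken from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct proc and struct lwp. */
struct sketch_proc {
    int p_pid;
};

struct sketch_lwp {
    struct sketch_proc *l_proc;
};

static struct sketch_lwp *sketch_curlwp;    /* NULL when nothing is running */

/* Always safe to reference: yields NULL when there is no current LWP. */
#define sketch_curproc \
    ((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)

/* In this sketch an LWP always belongs to a process. */
static void sketch_set_curlwp(struct sketch_lwp *l)
{
    assert(l == NULL || l->l_proc != NULL);
    sketch_curlwp = l;
}

/* Dereferencing still requires a check by the caller. */
static int sketch_curpid(void)
{
    struct sketch_proc *p = sketch_curproc;

    return (p != NULL) ? p->p_pid : -1;
}
```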
 1.26.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.26.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.26.2.6 28-Feb-2002  nathanw Catch up to -current.
 1.26.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.26.2.4 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.26.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.26.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.26.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.29.2.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.29.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.29.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.29.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.29.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.29.2.1 03-Aug-2001  lukem update to -current
 1.34.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.40.4.1 15-Jun-2002  lukem Pull up revision 1.41 (requested by itojun in ticket #262):
drop too short IPv6 frame
 1.40.2.1 20-Jun-2002  gehenna catch up with -current.
 1.42.6.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.6.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.6.6 04-Feb-2005  skrll Sync with HEAD.
 1.42.6.5 18-Dec-2004  skrll Sync with HEAD.
 1.42.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.42.6.2 25-Aug-2004  skrll Sync with HEAD.
 1.42.6.1 03-Aug-2004  skrll Sync with HEAD
 1.44.4.2 31-Mar-2007  bouyer Pull up following revision(s) (requested by msaitoh in ticket #11137):
sys/net/if_gif.c: revision 1.58
fix memory leak when resetting the source address and destination address.
 1.44.4.1 24-May-2005  riz Pull up revision 1.51 (requested by christos in ticket #1536):
PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.44.2.2 31-Mar-2007  bouyer Revert previous, this was for netbsd-2 only.
 1.44.2.1 31-Mar-2007  bouyer Pull up following revision(s) (requested by msaitoh in ticket #11137):
sys/net/if_gif.c: revision 1.58
fix memory leak when resetting the source address and destination address.
 1.47.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.47.6.1 12-Feb-2005  yamt sync with head.
 1.47.4.1 29-Apr-2005  kent sync with -current
 1.50.2.2 31-Mar-2007  bouyer Pull up following revision(s) (requested by msaitoh in ticket #1684):
sys/net/if_gif.c: revision 1.58
fix memory leak when resetting the source address and destination address.
 1.50.2.1 28-May-2005  tron Pull up revision 1.51 (requested by christos in ticket #330):
PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.54.2.7 27-Feb-2008  yamt sync with head.
 1.54.2.6 11-Feb-2008  yamt sync with head.
 1.54.2.5 27-Oct-2007  yamt sync with head.
 1.54.2.4 03-Sep-2007  yamt sync with head.
 1.54.2.3 26-Feb-2007  yamt sync with head.
 1.54.2.2 30-Dec-2006  yamt sync with head.
 1.54.2.1 21-Jun-2006  yamt sync with head.
 1.57.10.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.57.10.3 19-Apr-2006  elad sync with head.
 1.57.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.57.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.57.8.3 11-Aug-2006  yamt sync with head
 1.57.8.2 24-May-2006  yamt sync with head.
 1.57.8.1 13-Mar-2006  yamt sync with head.
 1.57.6.2 01-Jun-2006  kardel Sync with head.
 1.57.6.1 22-Apr-2006  simonb Sync with head.
 1.57.4.1 09-Sep-2006  rpaulo sync with head
 1.58.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.60.6.2 10-Dec-2006  yamt sync with head.
 1.60.6.1 22-Oct-2006  yamt sync with head
 1.60.4.2 12-Jan-2007  ad Sync with head.
 1.60.4.1 18-Nov-2006  ad Sync with head.
 1.64.4.4 07-May-2007  yamt sync with head.
 1.64.4.3 24-Mar-2007  yamt sync with head.
 1.64.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.64.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.67.6.1 18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.67.4.1 11-Jul-2007  mjf Sync with head.
 1.67.2.6 09-Oct-2007  ad Sync with head.
 1.67.2.5 15-Jul-2007  ad Sync with head.
 1.67.2.4 15-Jul-2007  ad Sync with head.
 1.67.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.67.2.2 08-Jun-2007  ad Sync with head.
 1.67.2.1 10-Apr-2007  ad Sync with head.
 1.70.8.2 23-Mar-2008  matt sync with HEAD
 1.70.8.1 06-Nov-2007  matt sync with HEAD
 1.70.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.70.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.71.2.1 14-Oct-2007  yamt sync with head.
 1.72.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.74.14.1 18-Jun-2008  simonb Sync with head.
 1.74.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.74.10.3 11-Aug-2010  yamt sync with head.
 1.74.10.2 11-Mar-2010  yamt sync with head
 1.74.10.1 04-May-2009  yamt sync with head.
 1.74.8.1 17-Jun-2008  yamt sync with head.
 1.74.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.74.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.75.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.75.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.77.4.1 30-May-2010  rmind sync with head
 1.77.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.80.14.2 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1345):
sys/net/if_gif.c: revision 1.91
sys/net/if_gif.c: revision 1.92
fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1
# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
fix CID 980463
 1.80.14.1 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1344):
sys/net/if_gif.c: revision 1.89
sys/net/if_gif.c: revision 1.90
CID 980463: Provide common error path for rollback. Remove extra check for
success.
 1.80.12.4 03-Dec-2017  jdolecek update from HEAD
 1.80.12.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.80.12.2 23-Jun-2013  tls resync from head
 1.80.12.1 25-Feb-2013  tls resync with head
 1.80.8.3 15-Nov-2015  bouyer Pull up following revision(s) (requested by knakahara in ticket #1345):
sys/net/if_gif.c: revision 1.91
sys/net/if_gif.c: revision 1.92
fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1
# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
fix CID 980463
 1.80.8.2 15-Nov-2015  bouyer Pull up following revision(s) (requested by knakahara in ticket #1344):
sys/net/if_gif.c: revision 1.89
sys/net/if_gif.c: revision 1.90
CID 980463: Provide common error path for rollback. Remove extra check for
success.
correct mistake in previous
 1.80.8.1 08-Feb-2013  riz branches: 1.80.8.1.2;
Pull up following revision(s) (requested by degroote in ticket #792):
sys/net/if_gif.c: revision 1.81
PR kern/47419: Anthony Mallet: ifconfig doesn't display MTU on gif(4)
There is no special treatment for SIOCGIFMTU in gif(4), so just pass it to
ifioctl_common().
 1.80.8.1.2.2 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1345):
sys/net/if_gif.c: revision 1.91
sys/net/if_gif.c: revision 1.92
fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1
# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
fix CID 980463
 1.80.8.1.2.1 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1344):
sys/net/if_gif.c: revision 1.89
sys/net/if_gif.c: revision 1.90
CID 980463: Provide common error path for rollback. Remove extra check for
success.
 1.80.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.80.2.1 23-Jan-2013  yamt sync with head
 1.82.10.1 10-Aug-2014  tls Rebase.
 1.83.4.10 28-Aug-2017  skrll Sync with HEAD
 1.83.4.9 05-Feb-2017  skrll Sync with HEAD
 1.83.4.8 05-Oct-2016  skrll Sync with HEAD
 1.83.4.7 09-Jul-2016  skrll Sync with HEAD
 1.83.4.6 29-May-2016  skrll Sync with HEAD
 1.83.4.5 22-Apr-2016  skrll Sync with HEAD
 1.83.4.4 19-Mar-2016  skrll Sync with HEAD
 1.83.4.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.83.4.2 22-Sep-2015  skrll Sync with HEAD
 1.83.4.1 06-Jun-2015  skrll Sync with HEAD
 1.83.2.3 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1034):
sys/net/if_gif.c: revision 1.91
sys/net/if_gif.c: revision 1.92
fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1
# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
fix CID 980463
 1.83.2.2 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1033):
sys/net/if_gif.c: revision 1.89
sys/net/if_gif.c: revision 1.90
CID 980463: Provide common error path for rollback. Remove extra check for
success.
 1.83.2.1 04-Jun-2015  msaitoh branches: 1.83.2.1.2;
Pull up following revision(s) (requested by hsuenaga in ticket #822):
sys/net/if_gif.c: revision 1.85
sys/net/if_gif.c: revision 1.86
Obtain softnet_lock before entering IP networking stack from gif software
interrupt.
Include <sys/socketvar.h> for softnet_lock.
 1.83.2.1.2.2 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1034):
sys/net/if_gif.c: revision 1.91
sys/net/if_gif.c: revision 1.92
fix panic after "ifconfig gifX tunnel src dst" failed for the reason of address pair duplication.
e.g.
====================
# ifconfig gif0 create
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.2
# ifconfig gif0 inet 172.16.0.1/24 172.16.0.2
# route add 10.1.0.0/24 172.16.0.1
# ifconfig gif1 create
# ifconfig gif1 tunnel 192.168.0.1 192.168.0.3
# ifconfig gif0 tunnel 192.168.0.1 192.168.0.3
ifconfig: SIOCSLIFPHYADDR: Can't assign requested address # expected
# ping 10.1.0.1
(panic)
====================
fix CID 980463
 1.83.2.1.2.1 18-Nov-2015  msaitoh Pull up following revision(s) (requested by knakahara in ticket #1033):
sys/net/if_gif.c: revision 1.89
sys/net/if_gif.c: revision 1.90
CID 980463: Provide common error path for rollback. Remove extra check for
success.
 1.119.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.119.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.119.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.124.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.126.2.15 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.126.2.14 22-Apr-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1241):

sys/net/if_gif.c: revision 1.146

fix a potential bug of gif(4) check for tunnel duplicate.

This problem has not materialized thanks to check for duplicate
in encap_attach().
 1.126.2.13 12-Nov-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1087):

sys/net/if_gif.c: revision 1.145

Fix ALTQ on gif(4). Reported and tested by Anthony Mallet, advised by Greg Troxel, thanks.

l2tp(4) and ipsecif(4) don't support ALTQ yet. So, they don't require this fix.

XXX pullup-8
 1.126.2.12 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.126.2.11 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #843):

sys/dev/pci/ixgbe/ixv.c: revision 1.101
sys/net/if_bridge.c: revision 1.156
sys/net/if_pppoe.c: revision 1.138
sys/dev/pci/if_wm.c: revision 1.580
sys/dev/pci/ixgbe/ixgbe.c: revision 1.156
sys/net/if_gif.c: revision 1.142

Ensure to call if_register after interface initializations finish
 1.126.2.10 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.126.2.9 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #613):
sys/net/if_pppoe.c: revision 1.130,1.134
sys/net/if_spppsubr.c: revision 1.172,1.175,1.179
sys/net/if_gif.c: revision 1.138,1.139

Mark callouts of pppoe(4) CALLOUT_MPSAFE. Suggested by ozaki-r@n.o.

fix non-diagnostic compilation

Fix spl leak.
ifconfig gif0 create
ifconfig gif0 destroy
WARNING: SPL NOT LOWERED ON ...

Fix breaking character limit. Pointed out by ozaki-r@n.o, thanks.

Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in
the chain.
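
The leak mentioned in the last item above can be sketched in userland C: freeing only the head of a chain leaves its successors allocated, while freeing the whole chain releases every buffer. The sketch_ names and the sketch_live counter are hypothetical; this models the difference between the two calls and is not the kernel's mbuf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chained buffer; not the kernel's struct mbuf. */
struct sketch_mbuf {
    struct sketch_mbuf *m_next;
};

static int sketch_live;         /* number of buffers currently allocated */

/* Allocate one buffer and link it in front of 'next'. */
static struct sketch_mbuf *sketch_m_get(struct sketch_mbuf *next)
{
    struct sketch_mbuf *m = malloc(sizeof(*m));

    if (m == NULL)
        return NULL;
    m->m_next = next;
    sketch_live++;
    return m;
}

/* Free a single buffer and return its successor. */
static struct sketch_mbuf *sketch_m_free(struct sketch_mbuf *m)
{
    struct sketch_mbuf *n;

    assert(m != NULL);
    n = m->m_next;
    free(m);
    sketch_live--;
    return n;
}

/* Free every buffer in the chain. */
static void sketch_m_freem(struct sketch_mbuf *m)
{
    while (m != NULL)
        m = sketch_m_free(m);
}
```

If the return value of sketch_m_free() is discarded, the rest of the chain stays allocated, which is the leak the fix avoids by freeing the whole chain.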
 1.126.2.8 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.126.2.7 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.126.2.6 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
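The idea of hiding the NET_MPSAFE switch behind macros can be sketched in userspace as follows. This is only an illustration: BIGLOCK_UNLESS_NET_MPSAFE, BIGUNLOCK_UNLESS_NET_MPSAFE, biglock_enter and biglock_exit are hypothetical names, and a counter stands in for the kernel lock.

```c
#include <assert.h>

/* Hypothetical stand-in for the big kernel lock: a recursion counter. */
static int biglock_depth;

static void biglock_enter(void) { biglock_depth++; }
static void biglock_exit(void)  { assert(biglock_depth > 0); biglock_depth--; }

/*
 * The build-time switch is tested once, here, rather than at every call
 * site. Defining NET_MPSAFE at compile time turns both macros into no-ops.
 */
#ifdef NET_MPSAFE
#define BIGLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define BIGUNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define BIGLOCK_UNLESS_NET_MPSAFE()	biglock_enter()
#define BIGUNLOCK_UNLESS_NET_MPSAFE()	biglock_exit()
#endif

/* A caller no longer needs its own "#ifndef NET_MPSAFE" block. */
static int locked_depth_during_call(void)
{
	int depth;

	BIGLOCK_UNLESS_NET_MPSAFE();
	depth = biglock_depth;	/* 1 without NET_MPSAFE, 0 with it */
	BIGUNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

With this shape, searching for the macro names finds every place that still takes the lock when NET_MPSAFE is defined.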
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires adding KERNEL_LOCK to functions called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. Avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.126.2.5 21-Dec-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #436):
distrib/sets/lists/modules/mi: revision 1.112
sys/modules/Makefile: revision 1.196
sys/modules/ip_ecn/Makefile: revision 1.1
sys/modules/if_gif/Makefile: revision 1.3
sys/net/if_gif.c: revision 1.136
sys/netinet/ip_ecn.c: revision 1.17
Split ip_ecn code into its own module, so it can be shared between
gif(4), stf(4), and ipsec(4). Without this, loading the if_gif
module can result in redefined global symbols if either ipsec(4) or
stf(4) but not gif(4) is built into the kernel.
Fixes PR kern/52795 (as reported by martin@ via irc).
 1.126.2.4 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating softint). Without this change, they panic. That is bad because
resource shortage really does occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. Caller functions can recover from the error cleanly by
checking the return value.
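A minimal sketch of the caller-side pattern, assuming a hypothetical driver: fake_softc, fake_if_initialize and fake_attach are illustrative names, and malloc stands in for whatever a real attach routine allocates first.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical softc; real drivers hold far more state. */
struct fake_softc {
	void *sc_ring;		/* resource allocated before if_initialize */
	int   sc_attached;
};

/* Stand-in for if_initialize(): returns 0 or an errno value, never panics. */
static int fake_if_initialize(int simulate_shortage)
{
	return simulate_shortage ? ENOMEM : 0;
}

/* Attach routine: on failure, release what was allocated and report it. */
static int fake_attach(struct fake_softc *sc, int simulate_shortage)
{
	int error;

	assert(sc != NULL);
	sc->sc_attached = 0;
	sc->sc_ring = malloc(64);
	if (sc->sc_ring == NULL)
		return ENOMEM;

	error = fake_if_initialize(simulate_shortage);
	if (error != 0) {
		free(sc->sc_ring);	/* undo the earlier allocation */
		sc->sc_ring = NULL;
		return error;
	}
	sc->sc_attached = 1;
	return 0;
}
```

The per-driver messages that follow all apply this same "check, free, return" shape to their own attach functions.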
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.126.2.3 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #303):
sys/net/if_gif.c: 1.129-1.130
sys/net/if_gif.h: 1.26-1.27
sys/netinet/in_gif.c: 1.88
sys/netinet6/in6_gif.c: 1.86
add lock for percpu route like l2tp(4).
--
add lock for sclist to exclude ifconfig gifX add/delete and ifconfig gifX tunnel
--
update locking notes.
 1.126.2.2 09-Aug-2017  snj Pull up following revision(s) (requested by knakahara in ticket #201):
sys/net/if_gif.c: revision 1.128
fix leak when encap_attach() fails twice.
 1.126.2.1 30-Jun-2017  snj Pull up following revision(s) (requested by knakahara in ticket #58):
sys/net/if_gif.c: revision 1.127
I have forgotten to commit this gif(4) MP-ify patch for a long time, sorry.
 1.139.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.139.2.4 20-Oct-2018  pgoyette Sync with head
 1.139.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.139.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.139.2.1 02-May-2018  pgoyette Synch with HEAD
 1.143.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.143.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.143.2.1 10-Jun-2019  christos Sync with HEAD
 1.148.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.150.2.1 29-Feb-2020  ad Sync with head.
 1.154.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.157.8.2 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.157.8.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.157.4.1 21-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #901):

sys/net/if_gif.c: revision 1.159

Drop locks before freeing unreferenced memory in gif_set_tunnel
 1.158.2.1 02-Aug-2025  perseant Sync with HEAD
 1.35 01-Feb-2020  riastradh Switch if_gif to atomic_load/store_*.
 1.34 30-Oct-2019  knakahara branches: 1.34.2;
Add sysctl nodes to control fragmentation with IPv[46] over IPv6 gif(4).

New sysctl node "net.inet6.ip6.gifpmtu" means
- 0 (default)
Fragment by IPV6_MMTU. All packets are certain to reach the destination,
but performance for long packets is poor.
This is the same behavior as before.
- 1
Fragment by the outer interface's MTU. Performance for long packets should
be good, but packets may be dropped on network paths
whose path MTU is less than the interface's MTU.
- others
undefined yet

New sysctl node "net.interfaces.gif*.pmtu" means
- -1 (default)
Use system default value (net.inet6.ip6.gifpmtu).
- 0
Fragment by IPV6_MMTU for this gif(4) tunnel.
- 1
Fragment by outer interface's MTU for this gif(4) tunnel.
- others
undefined yet

See RFC4459 for more information and other solutions.
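
How the two settings combine can be sketched as below. This is an illustration under stated assumptions, not the code in the tree: gif_frag_size and the PMTU_* names are hypothetical, and returning 0 for the values the message leaves undefined is a choice made here only to keep the sketch total.

```c
#include <assert.h>

#define IPV6_MMTU 1280	/* IPv6 minimum link MTU */

/* Policy values as described in the log message. */
enum { PMTU_SYSDEFAULT = -1, PMTU_MINMTU = 0, PMTU_OUTERMTU = 1 };

/*
 * Return the fragmentation size for one tunnel, or 0 when the selected
 * setting holds a value the message leaves undefined.
 *   if_pmtu   per-interface value (net.interfaces.gif*.pmtu)
 *   sys_pmtu  system-wide value (net.inet6.ip6.gifpmtu)
 */
static unsigned int
gif_frag_size(int if_pmtu, int sys_pmtu, unsigned int outer_mtu)
{
	int policy;

	/* An IPv6-capable outer interface cannot go below the minimum MTU. */
	assert(outer_mtu >= IPV6_MMTU);

	/* -1 on the interface defers to the system-wide setting. */
	policy = (if_pmtu == PMTU_SYSDEFAULT) ? sys_pmtu : if_pmtu;

	switch (policy) {
	case PMTU_MINMTU:
		return IPV6_MMTU;
	case PMTU_OUTERMTU:
		return outer_mtu;
	default:
		return 0;	/* undefined policy: caller decides */
	}
}
```

For example, gif_frag_size(-1, 0, 1500) selects IPV6_MMTU, while gif_frag_size(1, 0, 1500) overrides the system default and selects the outer MTU.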
 1.33 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
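
The reason a pointer in the slot is safe where an embedded object is not can be sketched in userspace. This is a simulation only: struct storage, storage_enlarge and fake_rtcache are hypothetical names, and calloc/free stand in for the replace-and-destroy enlargement described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fake_rtcache { int ro_gen; };	/* hypothetical stand-in for a route cache */

/* One slot of simulated per-CPU storage: it holds only a pointer. */
struct slot { struct fake_rtcache *ro; };

struct storage { struct slot *slots; size_t nslots; };

static int storage_init(struct storage *st, size_t n)
{
	st->slots = calloc(n, sizeof(*st->slots));
	st->nslots = (st->slots != NULL) ? n : 0;
	return (st->slots != NULL) ? 0 : -1;
}

/*
 * Mimic enlargement: allocate a larger area, copy the old contents,
 * destroy the old area. Any pointer *into* the old area is now stale,
 * but the pointers stored *in* it were copied and remain valid.
 */
static int storage_enlarge(struct storage *st, size_t n)
{
	struct slot *bigger;

	assert(n >= st->nslots);
	bigger = calloc(n, sizeof(*bigger));
	if (bigger == NULL)
		return -1;
	memcpy(bigger, st->slots, st->nslots * sizeof(*bigger));
	free(st->slots);
	st->slots = bigger;
	st->nslots = n;
	return 0;
}
```

A thread that loaded slots[0].ro before the enlargement still holds a valid fake_rtcache pointer afterwards; had the cache been embedded in the slot, its address would have been freed along with the old area.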
 1.32 19-Oct-2018  knakahara branches: 1.32.4;
Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.31 27-Apr-2018  knakahara branches: 1.31.2;
Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.30 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.29 27-Nov-2017  knakahara branches: 1.29.2;
update gif(4) locking notes.
 1.28 27-Nov-2017  knakahara preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.27 21-Sep-2017  knakahara update locking notes.
 1.26 21-Sep-2017  knakahara add lock for percpu route like l2tp(4).
 1.25 14-Dec-2016  knakahara branches: 1.25.8;
fix race of gif_softc->gif_ro when we send multiple flows over gif on NET_MPSAFE enabled kernel.

make gif_softc->gif_ro percpu as well as ipforward_rt to resolve this race.
and add future TODO comment for etherip(4).
 1.24 24-Jun-2016  knakahara branches: 1.24.2;
eliminate gif(4) Tx softint

- remove gif_si from struct gif_softc
- directly call gifintr() from gif_output()
- rename gifintr() to gif_start()
- remove Tx softint processing from gif_set_tunnel() and gif_delete_tunnel()
 1.23 31-May-2016  knakahara modify some functions static. no functional change.
 1.22 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.21 04-Jan-2016  knakahara Revert extra waiting code.

PR kern/50522 is actually fixed by sys/kern/kern_softint.c:r1.42, so the waiting
code in if_gif.c is not required.
 1.20 11-Dec-2015  knakahara PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.

It is required to wait for softint completion on other CPUs before disestablishing
the softint handler.
 1.19 12-Nov-2008  ad branches: 1.19.26; 1.19.44;
Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.18 14-Jul-2007  ad branches: 1.18.28; 1.18.32; 1.18.38; 1.18.42;
Generic soft interrupts are mandatory.
 1.17 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
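
The strdup()/strcpy() analogy from item 1 can be sketched in userspace as follows. This is not the kernel code: fake_sockaddr is a simplified hypothetical structure, malloc replaces the per-family pool, the 'flags' argument is omitted, and assert stands in for KASSERT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified sockaddr with a BSD-style length field. */
struct fake_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	unsigned char sa_data[14];
};

/* Like strdup(): allocate a new object and copy the source into it. */
static struct fake_sockaddr *
fake_sockaddr_dup(const struct fake_sockaddr *src)
{
	struct fake_sockaddr *dst = malloc(sizeof(*dst));

	if (dst != NULL)
		memcpy(dst, src, sizeof(*dst));
	return dst;		/* NULL when allocation fails */
}

/* Like strcpy(): copy into caller-provided storage; families must match. */
static struct fake_sockaddr *
fake_sockaddr_copy(struct fake_sockaddr *dst, const struct fake_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, sizeof(*dst));
	return dst;
}
```

As with strdup(), the caller of the dup routine owns the result and must release it; in the kernel that role belongs to sockaddr_free().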
 1.16 04-Mar-2007  christos branches: 1.16.2; 1.16.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.15 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.14 09-Dec-2006  dyoung branches: 1.14.2;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
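
The chain described above can be sketched with a singly linked list. The names fake_route, fake_rtcache, fake_rtflush and fake_rtflushall are hypothetical, a single global chain stands in for the per-domain one, and releasing the route reference is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct fake_rtentry { int rt_id; };

/* Hypothetical route cache with a link for the chain. */
struct fake_route {
	struct fake_rtentry *ro_rt;
	struct fake_route   *ro_next;
};

static struct fake_route *chain_head;	/* caches whose ro_rt != NULL */

/* Record a freshly cached route on the chain. */
static void fake_rtcache(struct fake_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = chain_head;
	chain_head = ro;
}

/* Invalidate one cache: forget the route and unlink it from the chain. */
static void fake_rtflush(struct fake_route *ro)
{
	struct fake_route **pp;

	ro->ro_rt = NULL;
	for (pp = &chain_head; *pp != NULL; pp = &(*pp)->ro_next) {
		if (*pp == ro) {
			*pp = ro->ro_next;
			break;
		}
	}
	ro->ro_next = NULL;
}

/* Invalidate every cache on the chain. */
static void fake_rtflushall(void)
{
	while (chain_head != NULL)
		fake_rtflush(chain_head);
}
```

After a flush, users of a cache see ro_rt == NULL and must look the route up again, which is the behavior the later paragraphs ask all route cache users to expect.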

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.13 11-Dec-2005  thorpej branches: 1.13.20; 1.13.22;
ANSI function decls and application of static.
 1.12 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.11 26-Jun-2005  mlelstv branches: 1.11.2;
expire cached route. Fixes PR 22792.
 1.10 06-Jun-2005  martin Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.9 11-Nov-2002  itojun branches: 1.9.6; 1.9.12; 1.9.20;
make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.8 16-Aug-2001  itojun gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.7 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself; that behavior is now gone.
 1.6 02-Jul-2000  thorpej branches: 1.6.2; 1.6.4;
Convert `gif' to be a cloning interface.
 1.5 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.4 02-Dec-1999  itojun rcsid police
 1.3 09-Jul-1999  thorpej branches: 1.3.2; 1.3.8;
defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file if_gif.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file if_gif.h was added on branch chs-ubc2 on 1999-07-01 23:45:19 +0000
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.6.4.1 03-Aug-2001  lukem update to -current
 1.6.2.2 11-Dec-2002  thorpej Sync with HEAD.
 1.6.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.9.20.1 08-Jan-2006  riz Pull up following revision(s) (requested by mlelstv in ticket #1092):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.9.12.1 09-Jan-2006  tron Pull up following revision(s) (requested by mlelstv in ticket #10214):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.9.6.2 11-Dec-2005  christos Sync with head.
 1.9.6.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.2.4 03-Sep-2007  yamt sync with head.
 1.11.2.3 26-Feb-2007  yamt sync with head.
 1.11.2.2 30-Dec-2006  yamt sync with head.
 1.11.2.1 21-Jun-2006  yamt sync with head.
 1.13.22.1 10-Dec-2006  yamt sync with head.
 1.13.20.1 12-Jan-2007  ad Sync with head.
 1.14.2.3 07-May-2007  yamt sync with head.
 1.14.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.14.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.16.4.1 11-Jul-2007  mjf Sync with head.
 1.16.2.2 15-Jul-2007  ad Sync with head.
 1.16.2.1 08-Jun-2007  ad Sync with head.
 1.18.42.1 19-Jan-2009  skrll Sync with HEAD.
 1.18.38.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.18.32.1 04-May-2009  yamt sync with head.
 1.18.28.1 17-Jan-2009  mjf Sync with HEAD.
 1.19.44.5 05-Feb-2017  skrll Sync with HEAD
 1.19.44.4 09-Jul-2016  skrll Sync with HEAD
 1.19.44.3 29-May-2016  skrll Sync with HEAD
 1.19.44.2 19-Mar-2016  skrll Sync with HEAD
 1.19.44.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.19.26.1 03-Dec-2017  jdolecek update from HEAD
 1.24.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.25.8.5 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.25.8.4 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.25.8.3 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.25.8.2 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.25.8.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #303):
sys/net/if_gif.c: 1.129-1.130
sys/net/if_gif.h: 1.26-1.27
sys/netinet/in_gif.c: 1.88
sys/netinet6/in6_gif.c: 1.86
add lock for percpu route like l2tp(4).
--
add lock for sclist to exclude ifconfig gifX add/delete and ifconfig gifX tunnel
--
update locking notes.
 1.29.2.3 20-Oct-2018  pgoyette Sync with head
 1.29.2.2 02-May-2018  pgoyette Synch with HEAD
 1.29.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.31.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.31.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.31.2.1 10-Jun-2019  christos Sync with HEAD
 1.32.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
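
The pointer indirection described above can be illustrated with a small
userland simulation. This is only a sketch under stated assumptions: it does
not use percpu(9) or rtcache at all, and every name in it (struct rtcache_sim,
struct pcpu_slot, slots_create(), slots_enlarge()) is hypothetical. It shows
that when the per-CPU area holds just a pointer, a reference to the object
taken before the area is replaced stays valid afterwards.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache; illustration only. */
struct rtcache_sim {
	int hits;
};

/* Each simulated per-CPU slot holds a pointer, not the object itself. */
struct pcpu_slot {
	struct rtcache_sim *rc;
};

/* Allocate ncpu slots, each pointing at its own heap-allocated cache. */
struct pcpu_slot *
slots_create(size_t ncpu)
{
	struct pcpu_slot *slots = calloc(ncpu, sizeof(*slots));
	size_t i;

	if (slots == NULL)
		return NULL;
	for (i = 0; i < ncpu; i++) {
		slots[i].rc = calloc(1, sizeof(*slots[i].rc));
		if (slots[i].rc == NULL) {
			/* Undo the allocations made so far. */
			while (i-- > 0)
				free(slots[i].rc);
			free(slots);
			return NULL;
		}
	}
	return slots;
}

/*
 * Mimic enlargement: copy the slots into a new area and destroy the old
 * one.  Only the pointers move; the caches they point at stay in place.
 */
struct pcpu_slot *
slots_enlarge(struct pcpu_slot *old, size_t ncpu)
{
	struct pcpu_slot *new_area = malloc(ncpu * sizeof(*new_area));

	if (new_area == NULL)
		return old;
	memcpy(new_area, old, ncpu * sizeof(*new_area));
	free(old);
	return new_area;
}
```

Had struct pcpu_slot embedded the struct rtcache_sim directly, a pointer
taken before slots_enlarge() would refer to freed memory afterwards.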

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems the l2tp(4) call path is too long for the instruction cache. So,
dividing the l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.34.2.1 29-Feb-2020  ad Sync with head.
 1.187 16-Jul-2025  kre Kernel part of O_CLOFORK implementation (plus kernel revbump)

This is Ricardo Branco's implementation of O_CLOFORK (and
associated fcntl, etc) for NetBSD (with a few minor changes
by me).

For now, the header file symbols that should be exposed to
userland are hidden inside temporary #ifdef _KERNEL blocks,
just to avoid random userland apps, or config scripts, from
seeing any of this before it is better tested.

Userland parts of this will follow soon.

This also bumps the kernel version to 10.99.15 (changes to
data structs, and the signature of fd_dup()).
 1.186 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.185 03-Feb-2024  jdolecek branches: 1.185.2;
fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete

use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()

this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
 1.184 03-Sep-2022  thorpej branches: 1.184.4;
Garbage-collect the remaining vestiges of netisr.
 1.183 03-Sep-2022  thorpej Convert MPLS from a legacy netisr to pktqueue.
 1.182 03-Sep-2022  thorpej Convert NETATALK from a legacy netisr to pktqueue.
 1.181 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.180 14-Feb-2021  roy if_gre: Remove alignment checks in favour of copying to the stack

Makes the code a lot simpler, idea from dyoung@
 1.179 13-Feb-2021  roy Prior alignment fixes should not use an offset
 1.178 12-Feb-2021  roy if_gre: Ensure that gre_h is aligned
 1.177 29-Jan-2020  thorpej branches: 1.177.6;
Adopt <net/if_stats.h>.
 1.176 16-Oct-2019  knakahara branches: 1.176.2;
Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.175 26-Apr-2019  pgoyette branches: 1.175.2;
Some more empty-string --> NULL conversions for module dependencies
 1.174 17-Apr-2019  msaitoh Remove unused inclusion.
 1.173 26-Jun-2018  msaitoh branches: 1.173.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug in which the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.172 16-Jan-2018  maxv branches: 1.172.2;
style
 1.171 02-Oct-2016  christos MFREE -> m_free
 1.170 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.169 10-Jun-2016  ozaki-r branches: 1.169.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are the counterpart of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff
of the upcoming change.

No functional change.
 1.168 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.167 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.166 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.165 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.164 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.163 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.162 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.161 05-Sep-2014  matt branches: 1.161.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.160 18-Aug-2014  riastradh Don't leak in gre_clone_create error branch.

Noted by maxv@, compile-tested for amd64.
 1.159 08-Aug-2014  rtr branches: 1.159.2;
split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.158 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.157 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.156 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.155 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.154 17-May-2014  rmind - fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle a few comments.
- Remove M_SOOPTS while here.
 1.153 07-Nov-2013  christos branches: 1.153.2;
eliminate unused variable
 1.152 13-Sep-2013  martin Remove unused variable
 1.151 29-Aug-2013  rmind Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.
 1.150 09-Nov-2011  christos branches: 1.150.6; 1.150.10; 1.150.12; 1.150.14; 1.150.20;
cosmetic, no functional change:
- sizeof(*var) instead of sizeof(type)
- sort the event counters in the discard the same as alloc for readability
 1.149 02-Nov-2011  dyoung branches: 1.149.2;
For simplicity's sake, use pcq(9) instead of my own circular-queue
implementation. Saves 45 lines of code.
 1.148 28-Oct-2011  dyoung Don't kauth-orize SIOCDIFPHYADDR, SIOCSIFFLAGS, SIOCSIFMTU, or
SIOCSLIFPHYADDR, in gif_ioctl() or in gre_ioctl(), because those
operations are ordinarily kauth-orized already in ifioctl().

Kauth-orizing SIOCSIFFLAGS in gre_ioctl() caused a panic ("panic:
bpf_detachd: ifpromisc failed: 1") when tcpdump(8) was interrupted.
Somehow bpf(4) enables promiscuous mode using different credentials than
it uses to disable promiscuous mode, hence the ifpromisc failure. This
may have something to do with privilege-separation in tcpdump(8). I.e.,
an LWP with SIOCSIFFLAGS privilege opens /dev/bpf, but an LWP without
SIOCSIFFLAGS privilege closes it.
 1.147 27-Oct-2011  dyoung Fix gif(4)/gre(4) operation over interfaces such as wm(4) that do IPv4
checksum-offload. Note well: it really is necessary to clear the
csum_data.

While I'm here, remove the do-nothing case for SIOCSIFDSTADDR and let
ifioctl_common() or the protocol handle it.
 1.146 19-Oct-2011  dyoung Get rid of gre's deadlock-prone, one-off ifioctl locking. The standard
ifioctl locking will do.
 1.145 24-May-2011  joerg Use proper format string
 1.144 26-Jun-2010  kefren branches: 1.144.2;
Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.143 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.142 19-Jan-2010  pooka branches: 1.142.2; 1.142.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.141 02-Sep-2009  tls Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write. From
Ritesh Agrawal at Coyote Point.
 1.140 28-Apr-2009  dyoung Let this build even if 'no options INET'.
 1.139 07-Nov-2008  dyoung branches: 1.139.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
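
As a concrete instance of the switch shape above, here is a minimal sketch.
The flag constants are local stand-ins assumed to match the BSD <net/if.h>
values of IFF_UP and IFF_RUNNING, and the action chosen in each case
(enum sk_action, sk_flags_action()) is a hypothetical driver policy, not
code taken from any driver touched by this commit.

```c
#include <stddef.h>

/* Local stand-ins so the sketch builds anywhere (assumed BSD values). */
#define SK_IFF_UP	0x0001
#define SK_IFF_RUNNING	0x0040

/* Hypothetical actions a driver might take on a flags change. */
enum sk_action { SK_NOTHING, SK_STOP, SK_INIT, SK_RELOAD };

enum sk_action
sk_flags_action(unsigned int if_flags)
{
	/* Mask off everything except the two bits being examined. */
	switch (if_flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_NOTHING;	/* down and stopped */
	case SK_IFF_RUNNING:
		return SK_STOP;		/* marked down while running */
	case SK_IFF_UP:
		return SK_INIT;		/* marked up but not running */
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_RELOAD;	/* up and running: refresh h/w state */
	}
	return SK_NOTHING;		/* not reached: all four cases handled */
}
```

Because the masked value can only take four values, every permutation is
visible as its own case, which is the readability gain over nested if-else.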

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
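
The address check described above might look roughly like the following
sketch. It is not the ether_ioctl() code; lladdr_is_assignable() and
SK_ETHER_ADDR_LEN are hypothetical names used only for illustration. It
relies on the standard Ethernet convention that the least-significant bit of
the first octet marks a group (multicast or broadcast) address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN 6	/* octets in an Ethernet address */

/*
 * Hypothetical helper: return true if lla is acceptable as a station
 * address, i.e. it is neither a group address nor all zeroes.
 */
bool
lladdr_is_assignable(const uint8_t lla[SK_ETHER_ADDR_LEN])
{
	static const uint8_t zero[SK_ETHER_ADDR_LEN] = { 0 };

	/* Group bit set: multicast, or broadcast (ff:ff:ff:ff:ff:ff). */
	if (lla[0] & 0x01)
		return false;
	/* Reject 00:00:00:00:00:00. */
	if (memcmp(lla, zero, SK_ETHER_ADDR_LEN) == 0)
		return false;
	return true;
}
```

With this check, the address 02:de:ad:be:ef:02 from the summary is accepted,
while ff:ff:ff:ff:ff:ff and 01:00:5e:00:00:01 are refused.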

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.138 06-Aug-2008  plunky branches: 1.138.2;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.137 24-Jun-2008  ad branches: 1.137.2;
Replace references to getsock/getvnode.
 1.136 15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.135 02-Jun-2008  dyoung branches: 1.135.2;
Destroy condition variable sc_fp_condvar.
 1.134 15-May-2008  dyoung Note both my contribution and NSF funding.
 1.133 15-May-2008  dyoung Get rid of gre_sosend()'s lwp argument.
 1.132 09-May-2008  dyoung Make gre(4) work in the New File Descriptor / Socket Locking Order.

Move the function+line printing into GRE_DPRINTF().

Retire gre_closef(). Retire gre_join(). Constify gre_reconf(),
and don't pass it an LWP any longer.

Make this work in the new file descriptor regime. Add a kernel
thread per gre(4) instance whose purpose is to install the socket
into proc0's file descriptor table. Add gre_fp_send() and
gre_fp_recv() for passing file_t pointers to proc0.

Fix locking: don't solock() in the socket upcall, where it is
already held. Do solock() before calling soconnect().

Simplify reconfiguration.

Update a comment that mentions finding a less specific route, since
we don't do that any more.
 1.131 28-Apr-2008  martin branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses
 1.130 24-Apr-2008  ad branches: 1.130.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.129 03-Apr-2008  dyoung branches: 1.129.2;
Improve error handling. gre(4) is still broken, but it does not
any longer cause a page fault trap.
 1.128 03-Apr-2008  dyoung Fix one of two bugs introduced by the descriptor handling changes
(rev 1.125): correct the check for fd_getsock() failure in
gre_socreate().

The second bug is more complicated to fix. Since rev 1.125,
gre_reconf() is using the file descriptor table of the current
process instead of the process 0's (the kernel's).
 1.127 03-Apr-2008  dyoung Cosmetic: use curlwp everywhere that it is appropriate, instead of
using a temporary variable. Remove superfluous curly braces. Move
an assignment that shuts up a "variable may be used uninitialized"
warning.
 1.126 27-Mar-2008  ad Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.125 21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.124 20-Feb-2008  matt branches: 1.124.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.123 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.122 20-Dec-2007  dyoung Move more code in gre_clone_destroy() under splnet() protection,
in order to protect against gre_input() on a destroyed gre.
 1.121 28-Nov-2007  dyoung branches: 1.121.2; 1.121.6;
Cosmetic: join two lines.
 1.120 24-Nov-2007  dyoung Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().
 1.119 24-Nov-2007  dyoung Fix a bunch of locking bugs ("Mutex error: lockdebug_barrier: spin
lock held"): only hold a mutex briefly at the top and bottom of
gre_ioctl(). Use splnet() to synchronize reconfiguration with
network interrupts.
 1.118 07-Nov-2007  ad Use the softint_* API.
 1.117 19-Oct-2007  ad branches: 1.117.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.116 08-Oct-2007  ad branches: 1.116.2;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.115 06-Oct-2007  dyoung Change some ints to bools.
 1.114 06-Oct-2007  dyoung Good-bye, kernel thread, we don't need you any longer.
 1.113 05-Oct-2007  dyoung Cosmetic: KNF. Litter the code with fewer #if NBPFILTER > 0.
 1.112 05-Oct-2007  dyoung Remove a lot of dead code. Move gre_do_send() code into greintr(),
and move gre_do_recv() code into gre_receive(). Get rid of some
unused event counters.
 1.111 05-Oct-2007  dyoung Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.110 08-Sep-2007  dyoung branches: 1.110.2;
Rename gre_socreate1() -> gre_socreate().
 1.109 02-Sep-2007  dyoung Delete unused variable.
 1.108 02-Sep-2007  dyoung Simplify code, add debug statements, and fix a bug that could
soclose() a UDP socket that a struct file still pointed at.
 1.107 02-Sep-2007  dyoung Get rid of struct oifreq/ifreq compat code, because ifioctl() has
taken care of this for us.
 1.106 02-Sep-2007  dyoung Be consistent: use the prefix sc_ for all members of the gre_softc.
 1.105 30-Aug-2007  dyoung Move sc_fp & sc_newfp from struct gre_softc to struct gre_soparm.
 1.104 30-Aug-2007  dyoung Remove out-of-date debug message and comment.
 1.103 30-Aug-2007  dyoung Do not hold the mutex as much in gre_thread1(). Move initial mutex
acquisition and final release out into gre_thread(). This will
fix a locking bug that LOCKDEBUG exposed: holding a spinlock over
an sosend() call is a no-no.

Cosmetic: join some lines, remove some unnecessary curly braces.
 1.102 24-Aug-2007  dyoung branches: 1.102.2;
Overhaul gre(4), especially the GRE in UDP bits:

* Create the kernel thread in gre_clone_create() instead of trying
to create it in gre_ioctl(). (Thanks ad@ for suggesting it, and
pointing out that I can't kthread_create while I hold a spin
lock.) Run the thread always, but put it to sleep while the
gre(4) is not in UDP mode.

* Use sockaddr_in_init().

* Move some thread state off of the stack and into the softc.

* Extract subroutines gre_do_recv(), gre_do_send(), and gre_reconf()
from gre_thread1(), making the code more readable.
 1.101 20-Aug-2007  skd Clean up net compat ioctls, and clean up handling of wireless ioctls.
 1.100 14-Aug-2007  joerg Explicitly assert that the protocol has pr_ctloutput before calling it.
 1.99 14-Aug-2007  seanb - Check IFF_RUNNING | IFF_UP in gre_output() correctly.
 1.98 09-Jul-2007  ad branches: 1.98.2; 1.98.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.97 30-May-2007  christos Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.96 29-May-2007  christos fix unused variable.
 1.95 29-May-2007  xtraeme Initialize oifr to fix build with COMPAT_40.
 1.94 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.93 06-May-2007  dyoung Switch from spl(9) to mutex(9) and condvar(9).

Fix a defect in the locking of file descriptors as we delegate a
UDP socket from userland to the kernel. Move sc_fp out of sc_soparm.
Synchronize access to sc_fp by gre_ioctl() and the kernel thread
using a condition variable. For simplicity's sake, make it the
kernel helper thread's responsibility to close its UDP socket.
 1.92 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
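
The allocate/copy/duplicate/free semantics described in item 1 of the
1.92 message can be modelled in userland roughly as follows. This is a
sketch only, not the kernel code: every model_* name is hypothetical,
calloc() stands in for the per-family pool, the 'flags' argument is
dropped, and the per-family lengths (16 and 28 bytes) are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userland model only; all model_* names are hypothetical. */
typedef unsigned char model_family_t;

enum { MODEL_AF_INET = 2, MODEL_AF_INET6 = 24 };

struct model_sockaddr {
	unsigned char	sa_len;		/* length recorded for the family */
	model_family_t	sa_family;	/* address family */
	char		sa_data[30];	/* family-specific payload */
};

/* Assumed per-family address lengths; 0 means "no pool for this family". */
static size_t
model_family_len(model_family_t af)
{
	switch (af) {
	case MODEL_AF_INET:	return 16;
	case MODEL_AF_INET6:	return 28;
	default:		return 0;
	}
}

/*
 * Return a zeroed address with sa_family and sa_len initialized, or NULL
 * if the family is unknown or memory is exhausted. For simplicity the
 * model always allocates the full struct instead of a right-sized object.
 */
struct model_sockaddr *
model_sockaddr_alloc(model_family_t af)
{
	size_t len = model_family_len(af);
	struct model_sockaddr *sa;

	if (len == 0)
		return NULL;
	sa = calloc(1, sizeof(*sa));
	if (sa == NULL)
		return NULL;
	sa->sa_len = (unsigned char)len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy src over dst; both must be of the same family. */
struct model_sockaddr *
model_sockaddr_copy(struct model_sockaddr *dst,
    const struct model_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a new address and copy src into it. */
struct model_sockaddr *
model_sockaddr_dup(const struct model_sockaddr *src)
{
	struct model_sockaddr *dst = model_sockaddr_alloc(src->sa_family);

	if (dst == NULL)
		return NULL;
	return model_sockaddr_copy(dst, src);
}

/* Give the address back (here: to the heap rather than a pool). */
void
model_sockaddr_free(struct model_sockaddr *sa)
{
	free(sa);
}
```

A caller would allocate with model_sockaddr_alloc(MODEL_AF_INET), check
for NULL, and release the address with model_sockaddr_free() when done.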
 1.91 14-Apr-2007  dyoung In gre_clone_destroy(), free the route cache after calling if_detach(),
because if_detach() may cause us to transmit a packet, which
ordinarily entails reloading the route cache. This fixes a bug
where the kernel would panic later in rtflush(). Thanks Michael
Earnhart for reporting the bug.

In gre_output(), do not leak mbufs.
 1.90 21-Mar-2007  dyoung Make all debug messages use GRE_DPRINTF(). Get rid of a redundant
if_ierrors++. Change (type *)0 to NULL. Get rid of unnecessary
casts to void *.
 1.89 21-Mar-2007  dyoung If we do not recognize the protocol of a received packet, then
increase ifi_noproto. If the GRE header contains routing options,
increase the input-error count, ifi_ierrors.

While I am here, make some cosmetic changes: remove unnecessary
'proto' argument from gre_input3(). Shorten some staircases.
 1.88 04-Mar-2007  christos branches: 1.88.2; 1.88.4; 1.88.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.87 23-Feb-2007  dyoung Cosmetic: remove gratuitous () from return statements.
 1.86 23-Feb-2007  dyoung If we enter gre_output() without a route in the cache, call
rtcache_init() to try to fill the cache. rtcache_check() was not
sufficient.
 1.85 23-Feb-2007  dyoung Destroy route cache before destroying the interface.
 1.84 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.83 26-Jan-2007  dyoung branches: 1.83.2;
Fix the check for a routing loop.
 1.82 26-Jan-2007  dyoung Mark some shared variables as volatile.
 1.81 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.80 10-Dec-2006  christos initialize error, cause gcc3 says so.
 1.79 09-Dec-2006  dyoung Straggler from last: convert to rtflush().
 1.78 04-Dec-2006  dyoung Per discussion on tech-net@, discard the address-munging hack that
let one create a tunnel with equal inner and outer destination IP
numbers. Update gre(4) documentation for this change.

Extract subroutine gre_update_route() from gre_compute_route(),
and always call it in gre_output() to freshen the route for
tunnel-encapsulated packets.
 1.77 04-Dec-2006  dyoung In gre_clone_destroy,
1 use splnet() to synchronize gre clone destruction with interrupts,
and
2 wait to call if_detach() until after joining the gre kernel
thread.
 1.76 16-Nov-2006  dyoung branches: 1.76.2;
Correct the length of the TTL argument to setsockopt(IPPROTO_IP,
IP_TTL).
 1.75 16-Nov-2006  dyoung Cosmetic: s/g_proto/sc_proto/. Remove superfluous parentheses and
curly braces.
 1.74 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.73 04-Nov-2006  dyoung Change lengthy ((struct sockaddr_in *)x) to satosin(x).
 1.72 04-Nov-2006  dyoung Remove unused variables.
 1.71 04-Nov-2006  dyoung Expand the comment concerning gre_kick().

Shorten the code in gre_compute_route() that flips the least
significant bit of the tunnel address. No functional change.
 1.70 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.69 15-Oct-2006  dyoung Two bug fixes:

If gre_socreate1() cannot find out the socket's address, exit with
an error. Before, it could exit *without* an error.

If gre_thread1() finds that it is without a valid socket (i.e., so
== NULL) but the configuration is "unchanged" (in initial state),
force reconfiguration. This prevents a crash when we try to bring
up a GRE over UDP interface whose UDP endpoints have never been
specified.
 1.68 15-Oct-2006  dyoung Cosmetic: join lines to conserve vertical space.
 1.67 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.66 09-Oct-2006  dyoung Bug fix: do not try to destroy a NULL socket. Stops the kernel
from crashing when a GRE over UDP instance of gre(4) is destroyed
before its socket is created/delegated.
 1.65 07-Sep-2006  dogcow branches: 1.65.2; 1.65.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.64 03-Sep-2006  dyoung Don't use IFQ_ macros on an ifqueue. Fixes a compilation error
reported by christos.
 1.63 01-Sep-2006  dyoung Rename gre_softc member sc_sp to sc_soparm to fix NetBSD/alpha
compiles, where some other system header #defines sc_sp.

In gre_ioctl, GREDSOCK case, do not try to delete sc_fp if it is
NULL.

Move GREDSOCK and GRESSOCK definitions to where the other GRE ioctls
are defined.

Remove #ifdef GRESSOCK, it's unnecessary now that the feature is
complete.
 1.62 31-Aug-2006  dyoung Add a mode to gre(4) that sends GRE tunnel packets in UDP datagrams.
Fix MOBILE encapsulation. Add many debugging printfs (mainly
concerning UDP mode). Clean up the gre(4) code a bit. Add the
capability to setup UDP tunnels to ifconfig. Update documentation.

In UDP mode, gre(4) puts a GRE header onto transmitted packets,
and hands them to a UDP socket for transmission. That is, the
encapsulation looks like this: IP+UDP+GRE+encapsulated packet.

There are two ways to set up a UDP tunnel. One way is to tell the
source and destination IP+port to gre(4), and let gre(4) create
the socket. The other way to create a UDP tunnel is for userland
to "delegate" a UDP socket to the kernel.
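
The "GRE+encapsulated packet" part of that layering, i.e. the payload
handed to the UDP socket, could be built along these lines. This is a
sketch under assumptions, not the if_gre.c code: the sketch_* names are
hypothetical, and it assumes the basic 4-byte GRE header (16 bits of
flags and version, all zero, followed by a 16-bit protocol type in
network byte order) with no optional fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GRE_HDRLEN	4	/* basic header, no optional fields */
#define SKETCH_ETHERTYPE_IP	0x0800	/* protocol type for an IPv4 payload */

/*
 * Write a basic GRE header followed by the inner packet into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 * The socket layer would add the outer IP and UDP headers on send.
 */
size_t
sketch_gre_encap(uint8_t *out, size_t outlen, uint16_t proto,
    const uint8_t *inner, size_t innerlen)
{
	if (outlen < SKETCH_GRE_HDRLEN ||
	    innerlen > outlen - SKETCH_GRE_HDRLEN)
		return 0;

	out[0] = 0;				/* flags: none set */
	out[1] = 0;				/* reserved bits and version 0 */
	out[2] = (uint8_t)(proto >> 8);		/* protocol type, high byte */
	out[3] = (uint8_t)(proto & 0xff);	/* protocol type, low byte */
	memcpy(out + SKETCH_GRE_HDRLEN, inner, innerlen);
	return SKETCH_GRE_HDRLEN + innerlen;
}
```

The returned length is what a caller would pass to the send routine of
the UDP socket, whether gre(4) created that socket itself or userland
delegated it.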
 1.61 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.60 14-May-2006  elad integrate kauth.
 1.59 11-Dec-2005  thorpej branches: 1.59.4; 1.59.6; 1.59.8; 1.59.10; 1.59.12;
ANSI function decls and application of static.
 1.58 11-Dec-2005  christos merge ktrace-lwp.
 1.57 20-May-2005  christos branches: 1.57.2;
PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.56 30-Mar-2005  is Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.55 26-Feb-2005  perry branches: 1.55.2;
nuke trailing whitespace
 1.54 06-Dec-2004  christos branches: 1.54.4; 1.54.6;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.53 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.52 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.51 13-May-2004  tron Mark gre(4) interfaces as IFT_TUNNEL (Encapsulation interface).
 1.50 21-Apr-2004  itojun kill sprintf, use snprintf
 1.49 11-Dec-2003  itojun branches: 1.49.2;
gi_len is ip_len, so it has to be network byteorder. markus friedl
 1.48 05-Sep-2003  itojun u_short -> u_int16_t
 1.47 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.46 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.45 06-May-2003  grant branches: 1.45.2;
fix grammatical error in a diagnostic message.
 1.44 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.43 04-Jan-2003  wiz Spell output with two ts.
 1.42 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.41 12-Aug-2002  itojun to be consistent with other sources, use "struct ip *ip", not inp.
(inp is usually used for pointing struct inpcb)
 1.40 10-Jun-2002  itojun return EPROTONOSUPPORT if unsupported protocol is specified
 1.39 10-Jun-2002  itojun don't abuse IFF_UP
 1.38 10-Jun-2002  itojun raise output errcnt
 1.37 10-Jun-2002  itojun ENETDOWN if outer ip address is not configured.
plug mbuf leak while here.
 1.36 10-Jun-2002  itojun don't use inner address configured by SIOCSIFADDR/DSTADDR
as outer addresses; now you need to configure outer address by
SIOCS*PHYADDR ("ifconfig tunnel"). as discussed on tech-net
 1.35 09-Jun-2002  itojun deprecate IFF_LINK2, !IFF_LINK0 is enough.
no need to manipulate IFF_LINK1 with IFF_LINK0.
remove reference to greconfig(8).
 1.34 09-Jun-2002  itojun no need for if_addrlen be 4. From: Martin Husemann <martin@duskware.de>
 1.33 09-Jun-2002  itojun make sure to bzero sockaddr_in
 1.32 09-Jun-2002  itojun style
 1.31 09-Jun-2002  itojun don't forget splx
 1.30 09-Jun-2002  itojun avoid code duplicate (route lookup)
 1.29 09-Jun-2002  itojun style
 1.28 09-Jun-2002  itojun support SIOCSLIFPHYADDR, SIOCDIFPHYADDR and SIOCGLIFPHYADDR, so that
we can manipulate tunnel endpoint by ifconfig(8).
 1.27 09-Jun-2002  martin Change default MTU to 1476 (same value that Cisco uses).
Do not limit the MTU when set by the admin with ifconfig, per discussion
on tech-net.

This fixes PR 16761 from Jasper Wallace.
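
The 1.27 message does not say where 1476 comes from. One common
derivation, given here as an assumption and not as the author's stated
reasoning, starts from a 1500-byte link MTU and subtracts a 20-byte
outer IPv4 header without options and a 4-byte basic GRE header. The
sketch_* names below are hypothetical.

```c
#include <assert.h>

enum {
	SKETCH_LINK_MTU	= 1500,	/* assumed MTU of the underlying link */
	SKETCH_IPV4_HDR	= 20,	/* outer IPv4 header, no options */
	SKETCH_GRE_HDR	= 4	/* basic GRE header, no optional fields */
};

/* Payload room left inside one outer packet after both headers. */
int
sketch_tunnel_mtu(int link_mtu, int outer_hdr, int gre_hdr)
{
	return link_mtu - outer_hdr - gre_hdr;
}
```

With the three assumed constants the function yields 1476. An
administrator can still set a different MTU with ifconfig, since that
revision stops limiting it.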
 1.26 24-Feb-2002  martin branches: 1.26.8; 1.26.10;
Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.
 1.25 24-Nov-2001  martin Sanity check the tunnel route after computing it and don't mark the
interface up if there is no route or the route loops back to ourself.
This helps to avoid pilot errors which would result in kernel stack
overflows.
 1.24 24-Nov-2001  martin Make this respect down interfaces.
 1.23 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.22 12-Nov-2001  lukem add RCSIDs
 1.21 10-May-2001  itojun branches: 1.21.2;
one more indentation fix
 1.20 10-May-2001  itojun whitespace/indentation cleanup
 1.19 10-May-2001  itojun no longer need to carry local version of inet_ntoa, we have it in libkern.
 1.18 12-Apr-2001  thorpej splimp -> splnet
 1.17 20-Feb-2001  itojun branches: 1.17.2;
explicitly use u_int32_t for DLT_NULL encapsulation.

correct gif address family. from chopps, sync with kame.
 1.16 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.15 18-Dec-2000  thorpej Fill in if_dlt.
 1.14 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.13 19-Nov-2000  martin Allow changing of settings via ioctl only for the superuser.
Fixes PR security/11524.
 1.12 25-Aug-2000  mjl Add bpf tap to gre interface.
 1.11 05-Jul-2000  thorpej Fix an omission in the gre cloning changes.
 1.10 05-Jul-2000  thorpej Make gre(4) a cloning network pseudo-device.
 1.9 25-Oct-1999  drochner branches: 1.9.6;
defopt the XNS protocol (options NS), clean up the use of related
option headers / defines
 1.8 28-Jun-1999  explorer branches: 1.8.2; 1.8.4; 1.8.6;
KNFify. Add LINK1 flag to turn off that address munging thing, for cases
where the tunnel endpoint is not the same as the remote GRE destination.
 1.7 12-Mar-1999  perry branches: 1.7.2; 1.7.4; 1.7.6;
exterminate ovbcopy. patches provided by Erik Bertelsen, pr-7145
 1.6 26-Jan-1999  hwr We no longer support IPIP (IP proto 4).
 1.5 11-Jan-1999  thorpej Pull the IP-in-IP tunneling support out of the GRE code. It's now handled
by a separate IP-IP input path.

XXX Should eventually do the same thing for IPPROTO_MOBILE.
 1.4 07-Oct-1998  thorpej Fix some typos in comments, and clean up some whitespace.
 1.3 30-Sep-1998  hwr Start supporting IPPROTO_MOBILE (55) encapsulation. This is yet
another tunneling protocol used by the Mobile-IP people. See RFC 2004
for this.
 1.2 13-Sep-1998  hwr The post 1.3.2 world is actually ready for this.
 1.1 13-Sep-1998  hwr Add a gre tunnel pseudo network device. GRE = Generic Routing Encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols, e.g. IP over IP.
 1.7.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.7.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.7.2.1 02-Jul-1999  perry pullup 1.7->1.8 (explorer)
 1.8.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.4.1 15-Nov-1999  fvdl Sync with -current
 1.8.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.8.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.8.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.8.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.8.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.6.4 26-Feb-2002  he Pull up revision 1.26 (via patch, requested by martin):
Clear M_BCAST and M_MCAST on encapsulated packets on outgoing
mbufs. Also do not copy TTL from the inner packet, and make the
outer TTL sysctl'able. Fixes PR#14269, and makes traceroute work
over GRE tunnels.
 1.9.6.3 09-Dec-2001  he Pull up revisions 1.24-1.25 (via patch, requested by martin):
Respect down interfaces, and sanity check the tunnel route after
computing it, marking the interface down if there is no route or
it loops back to ourselves. Helps avoid pilot errors which would
result in kernel stack overflows.
 1.9.6.2 19-Nov-2000  tv Pullup 1.13 [sommerfeld]:
Allow changing of settings via ioctl only for the superuser.
Fixes PR security/11524.
 1.9.6.1 25-Aug-2000  mjl Add bpf tap to gre interfaces. Approved by thorpej.
 1.17.2.11 07-Jan-2003  thorpej Sync with HEAD.
 1.17.2.10 27-Aug-2002  nathanw Catch up to -current.
 1.17.2.9 13-Aug-2002  nathanw Catch up to -current.
 1.17.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.17.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.17.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.17.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.17.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.21.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.21.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.21.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.21.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.26.10.1 06-Nov-2002  tron Pull up revision 1.27 (requested by martin in ticket #226):
Change default MTU to 1476 (same value that Cisco uses).
Do not limit the MTU when set by the admin with ifconfig, per discussion
on tech-net.
This fixes PR 16761 from Jasper Wallace.
 1.26.8.2 29-Aug-2002  gehenna catch up with -current.
 1.26.8.1 20-Jun-2002  gehenna catch up with -current.
 1.45.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.45.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.45.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.45.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.45.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.45.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.45.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.45.2.1 03-Aug-2004  skrll Sync with HEAD
 1.49.2.1 20-May-2004  grant branches: 1.49.2.1.2;
Pull up revision 1.51 (requested by tron in ticket #324):

Mark gre(4) interfaces as IFT_TUNNEL (Encapsulation interface).
 1.49.2.1.2.2 24-May-2005  riz Pull up revision 1.57 (requested by christos in ticket #1536):
PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.49.2.1.2.1 08-May-2005  snj Pull up revision 1.56 (requested by is in ticket #1382):
Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.54.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.54.4.1 29-Apr-2005  kent sync with -current
 1.55.2.3 26-Aug-2007  bouyer Pull up following revision(s) (requested by seanb in ticket #1830):
sys/net/if_gre.c: revision 1.99
- Check IFF_RUNNING | IFF_UP in gre_output() correctly.
 1.55.2.2 28-May-2005  tron Pull up revision 1.57 (requested by christos in ticket #330):
PR/30285: Miles Nordin: incorrect permission check joining/leaving multicast
groups.
 1.55.2.1 30-Mar-2005  tron Pull up revision 1.56 (requested by is in ticket #80):
Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.57.2.11 24-Mar-2008  yamt sync with head.
 1.57.2.10 27-Feb-2008  yamt sync with head.
 1.57.2.9 11-Feb-2008  yamt sync with head.
 1.57.2.8 21-Jan-2008  yamt sync with head
 1.57.2.7 07-Dec-2007  yamt sync with head
 1.57.2.6 15-Nov-2007  yamt sync with head.
 1.57.2.5 27-Oct-2007  yamt sync with head.
 1.57.2.4 03-Sep-2007  yamt sync with head.
 1.57.2.3 26-Feb-2007  yamt sync with head.
 1.57.2.2 30-Dec-2006  yamt sync with head.
 1.57.2.1 21-Jun-2006  yamt sync with head.
 1.59.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.59.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.59.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.59.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.59.8.4 14-Sep-2006  yamt sync with head.
 1.59.8.3 03-Sep-2006  yamt sync with head.
 1.59.8.2 11-Aug-2006  yamt sync with head
 1.59.8.1 24-May-2006  yamt sync with head.
 1.59.6.1 01-Jun-2006  kardel Sync with head.
 1.59.4.1 09-Sep-2006  rpaulo sync with head
 1.65.4.3 18-Dec-2006  yamt sync with head.
 1.65.4.2 10-Dec-2006  yamt sync with head.
 1.65.4.1 22-Oct-2006  yamt sync with head
 1.65.2.3 01-Feb-2007  ad Sync with head.
 1.65.2.2 12-Jan-2007  ad Sync with head.
 1.65.2.1 18-Nov-2006  ad Sync with head.
 1.76.2.3 31-Jan-2009  bouyer Pull up following revision(s) (requested by dholland in ticket #1266):
sys/net/if_gre.c: revision 1.80
initialize error, cause gcc3 says so.
 1.76.2.2 24-Aug-2007  liamjfoy Pull up following revision(s) (requested by seanb in ticket #829):
sys/net/if_gre.c: revision 1.99
- Check IFF_RUNNING | IFF_UP in gre_output() correctly.
 1.76.2.1 31-Mar-2007  bouyer branches: 1.76.2.1.2;
Pull up following revision(s) (requested by dyoung in ticket #530):
sys/net/if_gre.c: revision 1.77
In gre_clone_destroy,
1 use splnet() to synchronize gre clone destruction with interrupts,
and
2 wait to call if_detach() until after joining the gre kernel
thread.
 1.76.2.1.2.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.83.2.5 07-May-2007  yamt sync with head.
 1.83.2.4 15-Apr-2007  yamt sync with head.
 1.83.2.3 24-Mar-2007  yamt sync with head.
 1.83.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.83.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.88.6.1 29-Mar-2007  reinoud Pullup to -current
 1.88.4.1 11-Jul-2007  mjf Sync with head.
 1.88.2.10 23-Oct-2007  ad Sync with head.
 1.88.2.9 09-Oct-2007  ad Sync with head.
 1.88.2.8 09-Oct-2007  ad Sync with head.
 1.88.2.7 20-Aug-2007  ad Sync with HEAD.
 1.88.2.6 09-Jun-2007  ad Sync with head.
 1.88.2.5 08-Jun-2007  ad Sync with head.
 1.88.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.88.2.3 10-Apr-2007  ad Sync with head.
 1.88.2.2 10-Apr-2007  ad Nuke the deferred kthread creation stuff, as it's no longer needed.
Pointed out by thorpej@.
 1.88.2.1 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.98.6.8 03-Dec-2007  joerg Sync with HEAD.
 1.98.6.7 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.98.6.6 11-Nov-2007  joerg Sync with HEAD.
 1.98.6.5 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.98.6.4 07-Oct-2007  joerg Sync with HEAD.
 1.98.6.3 02-Oct-2007  joerg Sync with HEAD.
 1.98.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.98.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.98.2.3 10-Sep-2007  skrll Sync with HEAD.
 1.98.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.98.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.102.2.4 23-Mar-2008  matt sync with HEAD
 1.102.2.3 09-Jan-2008  matt sync with HEAD
 1.102.2.2 08-Nov-2007  matt sync with -HEAD
 1.102.2.1 06-Nov-2007  matt sync with HEAD
 1.110.2.2 14-Oct-2007  yamt sync with head.
 1.110.2.1 06-Oct-2007  yamt sync with head.
 1.116.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.116.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.117.2.4 18-Feb-2008  mjf Sync with HEAD.
 1.117.2.3 27-Dec-2007  mjf Sync with HEAD.
 1.117.2.2 08-Dec-2007  mjf Sync with HEAD.
 1.117.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.121.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.121.2.1 26-Dec-2007  ad Sync with head.
 1.124.6.6 17-Jan-2009  mjf Sync with HEAD.
 1.124.6.5 28-Sep-2008  mjf Sync with HEAD.
 1.124.6.4 29-Jun-2008  mjf Sync with HEAD.
 1.124.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.124.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.124.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.129.2.3 17-Jun-2008  yamt sync with head.
 1.129.2.2 04-Jun-2008  yamt sync with head
 1.129.2.1 18-May-2008  yamt sync with head.
 1.130.2.5 11-Aug-2010  yamt sync with head.
 1.130.2.4 11-Mar-2010  yamt sync with head
 1.130.2.3 16-Sep-2009  yamt sync with head
 1.130.2.2 04-May-2009  yamt sync with head.
 1.130.2.1 16-May-2008  yamt sync with head.
 1.131.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.131.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.135.2.2 27-Jun-2008  simonb Sync with head.
 1.135.2.1 18-Jun-2008  simonb Sync with head.
 1.137.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.137.2.1 19-Oct-2008  haad Sync with HEAD.
 1.138.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.139.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.142.4.3 31-May-2011  rmind sync with head
 1.142.4.2 03-Jul-2010  rmind sync with head
 1.142.4.1 30-May-2010  rmind sync with head
 1.142.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.142.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.144.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.149.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.149.2.1 10-Nov-2011  yamt sync with head
 1.150.20.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by riastradh in ticket #1117):
sys/dev/rasops/rasops.c: revision 1.72
sys/dev/vme/if_ie_vme.c: revision 1.31
sys/dev/qbus/if_qe.c: revision 1.73
sys/altq/altq_jobs.c: revision 1.7
sys/net/if_gre.c: revision 1.160
sys/dev/ic/oosiop.c: revision 1.14
- Fix error branches in altq_jobs.c to avoid leaks, noted by maxv@.
- Fix leaks in oosiop_alloc_cb error branches, noted by maxv@.
While here, avoid a sketchy pointer cast that probably falls afoul of
strict aliasing rules. Compile-tested only, with hppa.
- Don't leak f on failure in rasops.c. Noted by maxv@.
Compile-tested only, with zaurus.
- Avoid leak in error branch in if_qe.c, noted by maxv@, compile-tested for
vax.
- Sizeof struct ievme, not sizeof size_t in if_ie_vme.c.
Noted by maxv@, compile-tested for sparc.
- Don't leak in gre_clone_create error branch.
Noted by maxv@, compile-tested for amd64.
 1.150.14.2 18-May-2014  rmind sync with head
 1.150.14.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.150.12.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by riastradh in ticket #1117):
sys/dev/rasops/rasops.c: revision 1.72
sys/dev/vme/if_ie_vme.c: revision 1.31
sys/dev/qbus/if_qe.c: revision 1.73
sys/altq/altq_jobs.c: revision 1.7
sys/net/if_gre.c: revision 1.160
sys/dev/ic/oosiop.c: revision 1.14
- Fix error branches in altq_jobs.c to avoid leaks, noted by maxv@.
- Fix leaks in oosiop_alloc_cb error branches, noted by maxv@.
While here, avoid a sketchy pointer cast that probably falls afoul of
strict aliasing rules. Compile-tested only, with hppa.
- Don't leak f on failure in rasops.c. Noted by maxv@.
Compile-tested only, with zaurus.
- Avoid leak in error branch in if_qe.c, noted by maxv@, compile-tested for
vax.
- Sizeof struct ievme, not sizeof size_t in if_ie_vme.c.
Noted by maxv@, compile-tested for sparc.
- Don't leak in gre_clone_create error branch.
Noted by maxv@, compile-tested for amd64.
 1.150.10.2 03-Dec-2017  jdolecek update from HEAD
 1.150.10.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.150.6.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by riastradh in ticket #1117):
sys/dev/rasops/rasops.c: revision 1.72
sys/dev/vme/if_ie_vme.c: revision 1.31
sys/dev/qbus/if_qe.c: revision 1.73
sys/altq/altq_jobs.c: revision 1.7
sys/net/if_gre.c: revision 1.160
sys/dev/ic/oosiop.c: revision 1.14
- Fix error branches in altq_jobs.c to avoid leaks, noted by maxv@.
- Fix leaks in oosiop_alloc_cb error branches, noted by maxv@.
While here, avoid a sketchy pointer cast that probably falls afoul of
strict aliasing rules. Compile-tested only, with hppa.
- Don't leak f on failure in rasops.c. Noted by maxv@.
Compile-tested only, with zaurus.
- Avoid leak in error branch in if_qe.c, noted by maxv@, compile-tested for
vax.
- Sizeof struct ievme, not sizeof size_t in if_ie_vme.c.
Noted by maxv@, compile-tested for sparc.
- Don't leak in gre_clone_create error branch.
Noted by maxv@, compile-tested for amd64.
 1.153.2.1 10-Aug-2014  tls Rebase.
 1.159.2.1 22-Aug-2014  martin Pull up following revision(s) (requested by riastradh in ticket #44):
sys/altq/altq_jobs.c 1.7
Fix error branches to avoid leaks, noted by maxv@.
sys/dev/ic/oosiop.c 1.14
Fix leaks in oosiop_alloc_cb error branches, noted by maxv@.
While here, avoid a sketchy pointer cast that probably falls afoul
of strict aliasing rules.
sys/dev/qbus/if_qe.c 1.73
Avoid leak in error branch, noted by maxv@, compile-tested for vax.
sys/dev/rasops/rasops.c 1.72
Don't leak f on failure. Noted by maxv@.
sys/dev/vme/if_ie_vme.c 1.31
Sizeof struct ievme, not sizeof size_t.
Noted by maxv@, compile-tested for sparc.
sys/net/if_gre.c 1.160
Don't leak in gre_clone_create error branch.
Noted by maxv@, compile-tested for amd64.
 1.161.2.6 05-Oct-2016  skrll Sync with HEAD
 1.161.2.5 09-Jul-2016  skrll Sync with HEAD
 1.161.2.4 29-May-2016  skrll Sync with HEAD
 1.161.2.3 22-Sep-2015  skrll Sync with HEAD
 1.161.2.2 06-Jun-2015  skrll Sync with HEAD
 1.161.2.1 06-Apr-2015  skrll Sync with HEAD
 1.169.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.172.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.173.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.173.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.173.2.1 10-Jun-2019  christos Sync with HEAD
 1.175.2.1 01-Nov-2019  martin Pull up following revision(s) (requested by knakahara in ticket #387):

sys/net/if_gre.c: revision 1.176
sys/net/if_l2tp.c: revision 1.40
sys/dev/pci/ixgbe/ix_txrx.c: revision 1.56
sys/net/if_tap.c: revision 1.114

Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.176.2.1 29-Feb-2020  ad Sync with head.
 1.177.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.184.4.1 04-Feb-2024  martin Pull up following revision(s) (requested by jdolecek in ticket #583):

sys/kern/uipc_socket.c: revision 1.308
sys/kern/uipc_syscalls.c: revision 1.211
sys/sys/socketvar.h: revision 1.168
sys/net/if_gre.c: revision 1.185

fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete
use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()
this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
 1.185.2.1 02-Aug-2025  perseant Sync with HEAD
 1.50 03-Dec-2021  andvar fix various typos in comments, log messages and documentation.
 1.49 14-Feb-2021  roy if_gre: Remove alignment checks in favour of copying to the stack

Makes the code a lot simpler, idea from dyoung@
 1.48 12-Feb-2021  roy if_gre: Ensure that gre_h is aligned
 1.47 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.46 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
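As a rough illustration of the idea in this entry (not the committed code), a compile-time assertion can pin a structure's on-wire size and member offsets, so the layout is checked without `__packed`. The sketch uses standard C11 `_Static_assert` as a stand-in for the kernel's CTASSERT; `struct demo_gre_h` and `demo_gre_h_size()` are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte header: two naturally aligned 16-bit fields. */
struct demo_gre_h {
	uint16_t flags;	/* flags and version bits */
	uint16_t ptype;	/* protocol type of the payload */
};

/* Fail the build if the compiler ever inserts padding. */
_Static_assert(sizeof(struct demo_gre_h) == 4,
    "demo_gre_h must be exactly 4 bytes");
_Static_assert(offsetof(struct demo_gre_h, ptype) == 2,
    "ptype must directly follow flags");

/* Helper so the layout can also be inspected at run time. */
size_t
demo_gre_h_size(void)
{
	return sizeof(struct demo_gre_h);
}
```

Because every member sits on its natural alignment, the assertions hold without any packing attribute, and a later change that breaks the layout stops the build instead of corrupting packets.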
 1.45 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.44 26-Feb-2019  msaitoh branches: 1.44.12;
No functional change:
- Cosmetic change.
- Remove extra space between single quote and comma to make
"grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t" happy.
 1.43 06-Sep-2015  dholland branches: 1.43.18;
More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.42 29-Nov-2011  drochner branches: 1.42.8; 1.42.26;
sys/pcq.h isn't installed to userland, so only include it ifdef _KERNEL,
fixes glitch in kdump build
 1.41 02-Nov-2011  dyoung branches: 1.41.2;
For simplicity's sake, use pcq(9) instead of my own circular-queue
implementation. Saves 45 lines of code.
 1.40 01-Jun-2010  mjf Add __cacheline_aligned and __read_mostly annotations.

These annotations help to mitigate false sharing on multiprocessor
systems.

Variables annotated with __cacheline_aligned are placed into the
.data.cacheline_aligned section in the kernel. Each item in this
section is aligned on a cacheline boundary - this avoids false
sharing. Highly contended global locks are a good candidate for
__cacheline_aligned annotation.

Variables annotated with __read_mostly are packed together tightly
into a .data.read_mostly section in the kernel. The idea here is that
we can pack infrequently modified data items into a cacheline and
avoid having to purge the cache, which would happen if read mostly
data and write mostly data shared a cacheline. Initialisation variables
are a prime candidate for __read_mostly annotations.
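The alignment half of this entry can be sketched in portable C11. This is only an approximation of what `__cacheline_aligned` achieves, not the kernel macro itself: the 64-byte line size and all `demo_` names are assumptions, and the section placement used by `__read_mostly` is not shown because it depends on the linker.

```c
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size; a real kernel takes this from the platform. */
#define DEMO_CACHE_LINE 64

/* A hot counter that occupies a whole (assumed) cache line by itself. */
struct demo_counter {
	alignas(DEMO_CACHE_LINE) uint64_t value;
};

/* Two counters that different CPUs might update concurrently. */
struct demo_counter demo_counters[2];

/* Byte distance between the two counters. */
uintptr_t
demo_counter_gap(void)
{
	return (uintptr_t)&demo_counters[1] - (uintptr_t)&demo_counters[0];
}
```

Since the member alignment raises the structure's size to a full line, adjacent array elements never share a line, so a write to one counter does not invalidate the line holding the other.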
 1.39 08-Sep-2008  gmcgarry branches: 1.39.12; 1.39.14; 1.39.16;
Replace most gcc-specific __attribute__ uses with BSD-style sys/cdefs.h
preprocessor macros.
 1.38 15-May-2008  dyoung branches: 1.38.4;
Note both my contribution and NSF funding.
 1.37 09-May-2008  dyoung Make gre(4) work in the New File Descriptor / Socket Locking Order.

Move the function+line printing into GRE_DPRINTF().

Retire gre_closef(). Retire gre_join(). Constify gre_reconf(),
and don't pass it an LWP any longer.

Make this work in the new file descriptor regime. Add a kernel
thread per gre(4) instance whose purpose is to install the socket
into proc0's file descriptor table. Add gre_fp_send() and
gre_fp_recv() for passing file_t pointers to proc0.

Fix locking: don't solock() in the socket upcall, where it is
already held. Do solock() before calling soconnect().

Simplify reconfiguration.

Update a comment that mentions finding a less specific route, since
we don't do that any more.
 1.36 04-May-2008  martin branches: 1.36.2;
Move to standard TNF 2 clause license
 1.35 20-Feb-2008  matt branches: 1.35.6; 1.35.8; 1.35.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.34 12-Feb-2008  dyoung #include <sys/evcnt.h> for event counters.
 1.33 11-Feb-2008  dyoung Do not needlessly #include <sys/device.h>.
 1.32 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.31 23-Nov-2007  dyoung branches: 1.31.2; 1.31.6;
Delete state GRE_S_DOCONF, I no longer use it.
 1.30 06-Oct-2007  dyoung branches: 1.30.4;
Good-bye, kernel thread, we don't need you any longer.
 1.29 05-Oct-2007  martin Add missing include for definition of struct evcnt.
 1.28 05-Oct-2007  dyoung Remove a lot of dead code. Move gre_do_send() code into greintr(),
and move gre_do_recv() code into gre_receive(). Get rid of some
unused event counters.
 1.27 05-Oct-2007  dyoung Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.26 02-Sep-2007  dyoung branches: 1.26.2;
Be consistent: use the prefix sc_ for all members of the gre_softc.
 1.25 30-Aug-2007  dyoung Move sc_fp & sc_newfp from struct gre_softc to struct gre_soparm.
 1.24 30-Aug-2007  dyoung Cosmetic: remove an out-of-place comma in a comment.
 1.23 24-Aug-2007  dyoung branches: 1.23.2;
Overhaul gre(4), especially the GRE in UDP bits:

* Create the kernel thread in gre_clone_create() instead of trying
to create it in gre_ioctl(). (Thanks ad@ for suggesting it, and
pointing out that I can't kthread_create while I hold a spin
lock.) Run the thread always, but put it to sleep while the
gre(4) is not in UDP mode.

* Use sockaddr_in_init().

* Move some thread state off of the stack and into the softc.

* Extract subroutines gre_do_recv(), gre_do_send(), and gre_reconf()
from gre_thread1(), making the code more readable.
 1.22 06-May-2007  dyoung branches: 1.22.2; 1.22.6;
Switch from spl(9) to mutex(9) and condvar(9).

Fix a defect in the locking of file descriptors as we delegate a
UDP socket from userland to the kernel. Move sc_fp out of sc_soparm.
Synchronize access to sc_fp by gre_ioctl() and the kernel thread
using a condition variable. For simplicity's sake, make it the
kernel helper thread's responsibility to close its UDP socket.
 1.21 21-Mar-2007  dyoung If we do not recognize the protocol of a received packet, then
increase ifi_noproto. If the GRE header contains routing options,
increase the input-error count, ifi_ierrors.

While I am here, make some cosmetic changes: remove unnecessary
'proto' argument from gre_input3(). Shorten some staircases.
 1.20 26-Jan-2007  dyoung branches: 1.20.2; 1.20.6; 1.20.8; 1.20.10;
Mark some shared variables as volatile.
 1.19 16-Nov-2006  dyoung Cosmetic: s/g_proto/sc_proto/.

(Straggler from last commit affecting net/if_gre.c, netinet/ip_gre.c.)
 1.18 01-Sep-2006  dyoung branches: 1.18.2; 1.18.4;
Rename gre_softc member sc_sp to sc_soparm to fix NetBSD/alpha
compiles, where some other system header #defines sc_sp.

In gre_ioctl, GREDSOCK case, do not try to delete sc_fp if it is
NULL.

Move GREDSOCK and GRESSOCK definitions to where the other GRE ioctls
are defined.

Remove #ifdef GRESSOCK, it's unnecessary now that the feature is
complete.
 1.17 31-Aug-2006  dyoung Add a mode to gre(4) that sends GRE tunnel packets in UDP datagrams.
Fix MOBILE encapsulation. Add many debugging printfs (mainly
concerning UDP mode). Clean up the gre(4) code a bit. Add the
capability to setup UDP tunnels to ifconfig. Update documentation.

In UDP mode, gre(4) puts a GRE header onto transmitted packets,
and hands them to a UDP socket for transmission. That is, the
encapsulation looks like this: IP+UDP+GRE+encapsulated packet.

There are two ways to set up a UDP tunnel. One way is to tell the
source and destination IP+port to gre(4), and let gre(4) create
the socket. The other way to create a UDP tunnel is for userland
to "delegate" a UDP socket to the kernel.
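The IP+UDP+GRE+encapsulated packet layering described above can be sketched from the sender's side. This is a minimal userland illustration, not the gre(4) code: it writes a basic 4-byte GRE header (no optional fields, version 0) in front of the inner packet, on the assumption that the resulting buffer is then handed to a UDP socket, which supplies the outer IP and UDP headers. The function `demo_gre_encap()` and the `DEMO_` constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GRE_HLEN 4			/* basic GRE header, no options */
#define DEMO_ETHERTYPE_IP 0x0800	/* protocol type for an IPv4 payload */

/*
 * Write a basic GRE header followed by the inner packet into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
demo_gre_encap(uint8_t *buf, size_t buflen, uint16_t ptype,
    const uint8_t *inner, size_t innerlen)
{
	if (buflen < DEMO_GRE_HLEN || innerlen > buflen - DEMO_GRE_HLEN)
		return 0;
	buf[0] = 0;				/* no flags set */
	buf[1] = 0;				/* version 0 */
	buf[2] = (uint8_t)(ptype >> 8);		/* protocol type, high byte */
	buf[3] = (uint8_t)(ptype & 0xff);	/* protocol type, low byte */
	memcpy(buf + DEMO_GRE_HLEN, inner, innerlen);
	return DEMO_GRE_HLEN + innerlen;
}
```

The protocol type is stored in network byte order by writing the two bytes explicitly, which avoids any dependence on host endianness.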
 1.16 11-Dec-2005  thorpej branches: 1.16.4; 1.16.8;
ANSI function decls and application of static.
 1.15 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.14 26-Feb-2005  perry branches: 1.14.4;
nuke trailing whitespace
 1.13 10-Nov-2003  wiz branches: 1.13.8; 1.13.10;
Spell address with two d's. Inspired by similar changes in OpenBSD,
originating from Jonathon Gray and forwarded by jmc@openbsd.
 1.12 05-Sep-2003  itojun u_short -> u_int16_t
 1.11 08-Jul-2003  itojun prototype must not have variable name
 1.10 24-Feb-2002  martin branches: 1.10.16;
Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.
 1.9 10-May-2001  itojun branches: 1.9.2;
whitespace/indentation cleanup
 1.8 12-Dec-2000  thorpej branches: 1.8.2;
Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.7 05-Jul-2000  thorpej Fix an omission in the gre cloning changes.
 1.6 05-Jul-2000  thorpej Make gre(4) a cloning network pseudo-device.
 1.5 19-Nov-1999  thorpej branches: 1.5.4;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.4 22-Dec-1998  thorpej branches: 1.4.8; 1.4.14;
Add an extern declaration of gre_softc[] here. Wrap it and the prototypes
in #ifdef _KERNEL.
 1.3 07-Oct-1998  thorpej Fix some typos in comments, and clean up some whitespace.
 1.2 30-Sep-1998  hwr Start supporting IPPROTO_MOBILE (55) encapsulation. This is yet
another tunneling protocol used by the Mobile-IP people. See RFC 2004
for this.
 1.1 13-Sep-1998  hwr Add a gre tunnel pseudo network device. GRE = Generic Routing Encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols, e.g. IP over IP.
 1.4.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.8.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.4.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.4.1 26-Feb-2002  he Pull up revision 1.10 (requested by martin):
Clear M_BCAST and M_MCAST on encapsulated packets on outgoing
mbufs. Also do not copy TTL from the inner packet, and make the
outer TTL sysctl'able. Fixes PR#14269, and makes traceroute work
over GRE tunnels.
 1.8.2.2 28-Feb-2002  nathanw Catch up to -current.
 1.8.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.9.2.1 16-Mar-2002  jdolecek Catch up with -current.
 1.10.16.5 11-Dec-2005  christos Sync with head.
 1.10.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.10.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.16.1 03-Aug-2004  skrll Sync with HEAD
 1.13.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.13.8.1 29-Apr-2005  kent sync with -current
 1.14.4.8 27-Feb-2008  yamt sync with head.
 1.14.4.7 21-Jan-2008  yamt sync with head
 1.14.4.6 07-Dec-2007  yamt sync with head
 1.14.4.5 27-Oct-2007  yamt sync with head.
 1.14.4.4 03-Sep-2007  yamt sync with head.
 1.14.4.3 26-Feb-2007  yamt sync with head.
 1.14.4.2 30-Dec-2006  yamt sync with head.
 1.14.4.1 21-Jun-2006  yamt sync with head.
 1.16.8.1 03-Sep-2006  yamt sync with head.
 1.16.4.1 09-Sep-2006  rpaulo sync with head
 1.18.4.1 10-Dec-2006  yamt sync with head.
 1.18.2.2 01-Feb-2007  ad Sync with head.
 1.18.2.1 18-Nov-2006  ad Sync with head.
 1.20.10.1 29-Mar-2007  reinoud Pullup to -current
 1.20.8.1 11-Jul-2007  mjf Sync with head.
 1.20.6.3 09-Oct-2007  ad Sync with head.
 1.20.6.2 08-Jun-2007  ad Sync with head.
 1.20.6.1 10-Apr-2007  ad Sync with head.
 1.20.2.2 07-May-2007  yamt sync with head.
 1.20.2.1 24-Mar-2007  yamt sync with head.
 1.22.6.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.22.6.2 07-Oct-2007  joerg Sync with HEAD.
 1.22.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.22.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.23.2.3 23-Mar-2008  matt sync with HEAD
 1.23.2.2 09-Jan-2008  matt sync with HEAD
 1.23.2.1 06-Nov-2007  matt sync with HEAD
 1.26.2.2 14-Oct-2007  yamt sync with head.
 1.26.2.1 06-Oct-2007  yamt sync with head.
 1.30.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.30.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.31.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.31.2.1 26-Dec-2007  ad Sync with head.
 1.35.10.3 11-Aug-2010  yamt sync with head.
 1.35.10.2 04-May-2009  yamt sync with head.
 1.35.10.1 16-May-2008  yamt sync with head.
 1.35.8.1 18-May-2008  yamt sync with head.
 1.35.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.35.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.36.2.2 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.36.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.38.4.1 19-Oct-2008  haad Sync with HEAD.
 1.39.16.1 03-Jul-2010  rmind sync with head
 1.39.14.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.39.12.1 03-Dec-2011  matt Add __cacheline_aligned and __read_mostly from -HEAD.
 1.41.2.1 17-Apr-2012  yamt sync with head
 1.42.26.1 22-Sep-2015  skrll Sync with HEAD
 1.42.8.1 03-Dec-2017  jdolecek update from HEAD
 1.43.18.1 10-Jun-2019  christos Sync with HEAD
 1.44.12.1 03-Apr-2021  thorpej Sync with HEAD.
 1.14 19-Jan-2020  thorpej Remove HIPPI support and the esh(4) driver that uses it. There have not
been any users of HIPPI for some time, and it is unlikely to be resurrected.
 1.13 28-Apr-2008  martin branches: 1.13.88; 1.13.94;
Remove clause 3 and 4 from TNF licenses
 1.12 20-Feb-2008  matt branches: 1.12.6; 1.12.8; 1.12.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.11 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.10 04-Mar-2007  christos branches: 1.10.16; 1.10.22; 1.10.24; 1.10.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.9 11-Dec-2005  thorpej branches: 1.9.26;
ANSI function decls and application of static.
 1.8 11-Dec-2005  christos merge ktrace-lwp.
 1.7 26-Feb-2005  perry branches: 1.7.4;
nuke trailing whitespace
 1.6 19-Nov-1999  thorpej branches: 1.6.28; 1.6.36; 1.6.38;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.5 18-May-1999  thorpej branches: 1.5.2; 1.5.8;
Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.4 29-May-1998  kleink branches: 1.4.10;
Sync the symbol used for multiple inclusion protection with the canonical
location of this header.
 1.3 17-May-1998  kml Correct copyright date.
 1.2 16-May-1998  thorpej Add missing RCS ID.
 1.1 14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.4.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.38.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.6.36.1 29-Apr-2005  kent sync with -current
 1.6.28.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.7.4.4 27-Feb-2008  yamt sync with head.
 1.7.4.3 21-Jan-2008  yamt sync with head
 1.7.4.2 03-Sep-2007  yamt sync with head.
 1.7.4.1 21-Jun-2006  yamt sync with head.
 1.9.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.10.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.10.24.1 26-Dec-2007  ad Sync with head.
 1.10.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.16.2 23-Mar-2008  matt sync with HEAD
 1.10.16.1 09-Jan-2008  matt sync with HEAD
 1.12.10.1 16-May-2008  yamt sync with head.
 1.12.8.1 18-May-2008  yamt sync with head.
 1.12.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.13.94.1 25-Jan-2020  ad Sync with head.
 1.13.88.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.49 19-Jan-2020  thorpej Remove HIPPI support and the esh(4) driver that uses it. There have not
been any users of HIPPI for some time, and it is unlikely to be resurrected.
 1.48 11-Jan-2017  ozaki-r branches: 1.48.16; 1.48.22;
Get rid of unnecessary header inclusions
 1.47 28-Apr-2016  ozaki-r branches: 1.47.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref-ing) of rtentries.
 1.46 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.45 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.44 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.43 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
dyoung's detailed investigation of the issue can be found
there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.42 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.41 05-Jun-2014  rmind branches: 1.41.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.40 15-May-2014  msaitoh Put schednetisr(NETISR_IP) into splnet()/splx() pair.
 1.39 05-Apr-2010  joerg branches: 1.39.18; 1.39.22; 1.39.32;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.38 19-Jan-2010  pooka branches: 1.38.2; 1.38.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.37 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.36 18-Mar-2009  cegger bcopy -> memcpy
 1.35 07-Nov-2008  dyoung branches: 1.35.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
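
The address check described above can be sketched in user-space C. This is only an illustration, not the kernel's code: the helper name lladdr_is_acceptable and the constant ETHER_LEN are hypothetical. It relies on the Ethernet convention that the least significant bit of the first octet marks a group (multicast or broadcast) address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_LEN 6	/* hypothetical name; Ethernet addresses are 6 octets */

/*
 * Return true if addr may be used as a unicast station address:
 * it must not be a group (multicast/broadcast) address and must
 * not be all zeros.  Hypothetical helper, not the kernel's code.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[ETHER_LEN])
{
	uint8_t any = 0;

	/* Group bit set: multicast, which also covers ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* OR all octets together; the result is 0 only for an all-zero address. */
	for (size_t i = 0; i < ETHER_LEN; i++)
		any |= addr[i];

	return any != 0;	/* reject 00:00:00:00:00:00 */
}
```

With this sketch, 02:de:ad:be:ef:02 is accepted, while 00:00:00:00:00:00, ff:ff:ff:ff:ff:ff and 01:00:5e:00:00:01 are refused.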

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.34 20-Feb-2008  matt branches: 1.34.6; 1.34.10; 1.34.16; 1.34.18;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.33 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.32 19-Oct-2007  ad branches: 1.32.4; 1.32.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.31 30-Aug-2007  dyoung branches: 1.31.4;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
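
To illustrate why passing the sockaddr length makes an overrun avoidable, here is a minimal user-space sketch. The structure toy_sdl and the function toy_sdl_setaddr are hypothetical stand-ins; they do not reproduce the real sockaddr_dl layout or the real sockaddr_dl_setaddr() signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a variable-size link-layer sockaddr. */
struct toy_sdl {
	uint8_t	alen;		/* bytes of link-layer address in data[] */
	uint8_t	data[];		/* address bytes follow the header */
};

/*
 * Copy addrlen bytes into sdl->data only if they fit inside the
 * socklen bytes that the caller says were allocated.  Returns sdl on
 * success and NULL if the copy would run past the end.
 */
static struct toy_sdl *
toy_sdl_setaddr(struct toy_sdl *sdl, size_t socklen,
    const void *addr, size_t addrlen)
{
	const size_t hdrlen = offsetof(struct toy_sdl, data);

	if (socklen < hdrlen || addrlen > socklen - hdrlen ||
	    addrlen > UINT8_MAX)
		return NULL;

	memcpy(sdl->data, addr, addrlen);
	sdl->alen = (uint8_t)addrlen;
	return sdl;
}
```

The point of the sketch is that the bounds check needs the allocation size from the caller; a bare memcpy() into the address field has no way to know it.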
 1.30 26-Aug-2007  dyoung branches: 1.30.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.29 07-Aug-2007  dyoung branches: 1.29.2;
Use satocsdl() instead of SDL(). bcopy -> memcpy.
 1.28 04-Mar-2007  christos branches: 1.28.2; 1.28.10; 1.28.14;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.27 20-Feb-2007  dyoung Remove unused #define SIN.

Constify.
 1.26 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.25 16-Nov-2006  christos branches: 1.25.4;
__unused removal on arguments; approved by core.
 1.24 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.23 07-Jun-2006  kardel branches: 1.23.6; 1.23.8;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.22 11-Dec-2005  thorpej branches: 1.22.4; 1.22.6; 1.22.8; 1.22.14;
ANSI function decls and application of static.
 1.21 11-Dec-2005  christos merge ktrace-lwp.
 1.20 30-May-2005  christos branches: 1.20.2;
bcopy -> memcpy
bcmp -> memcmp
and remove casts.
 1.19 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.18 26-Feb-2005  perry nuke trailing whitespace
 1.17 07-Aug-2003  agc branches: 1.17.8; 1.17.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 19-Jan-2003  simonb branches: 1.16.2;
Remove variable that is only assigned to but not referenced.
 1.15 13-Jun-2002  itojun correct AF_INET6 handling
 1.14 12-Nov-2001  lukem branches: 1.14.8; 1.14.10;
add RCSIDs
 1.13 18-Jul-2001  thorpej bzero -> memset
 1.12 14-Jun-2001  itojun branches: 1.12.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.11 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.10 17-Jan-2001  thorpej branches: 1.10.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.9 18-Dec-2000  thorpej Fill in if_dlt.
 1.8 13-Dec-2000  thorpej Add ALTQ glue.
 1.7 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.6 02-Oct-2000  itojun fix IPv6 packet manipulation. (use ip6intrq)
 1.5 30-Mar-2000  augustss branches: 1.5.4;
Kill some more register declarations.
 1.4 06-Mar-2000  thorpej Initialize ifp->if_baudrate to a sensible value when the interface is
attached. XXX Need to double-check this one.
 1.3 18-May-1999  thorpej branches: 1.3.2;
Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.2 05-Jul-1998  jonathan branches: 1.2.10;
defopt INET, NETATALK.
 1.1 14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.2.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.3.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.3.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.3.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.3.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.4.1 17-Oct-2000  tv Pullup 1.6 [itojun]:
fix IPv6 packet manipulation. (use ip6intrq)
 1.10.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.10.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.10.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.12.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.12.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.12.2.1 03-Aug-2001  lukem update to -current
 1.14.10.1 15-Jun-2002  lukem Pull up revision 1.15 (requested by itojun in ticket #261):
correct AF_INET6 handling
 1.14.8.1 20-Jun-2002  gehenna catch up with -current.
 1.16.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.16.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.16.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.2.1 03-Aug-2004  skrll Sync with HEAD
 1.17.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.17.8.1 29-Apr-2005  kent sync with -current
 1.20.2.6 27-Feb-2008  yamt sync with head.
 1.20.2.5 21-Jan-2008  yamt sync with head
 1.20.2.4 27-Oct-2007  yamt sync with head.
 1.20.2.3 03-Sep-2007  yamt sync with head.
 1.20.2.2 26-Feb-2007  yamt sync with head.
 1.20.2.1 21-Jun-2006  yamt sync with head.
 1.22.14.1 19-Jun-2006  chap Sync with head.
 1.22.8.1 26-Jun-2006  yamt sync with head.
 1.22.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.23.8.2 10-Dec-2006  yamt sync with head.
 1.23.8.1 22-Oct-2006  yamt sync with head
 1.23.6.1 18-Nov-2006  ad Sync with head.
 1.25.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.25.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.28.14.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.28.14.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.28.14.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.28.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.28.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.28.2.3 23-Oct-2007  ad Sync with head.
 1.28.2.2 09-Oct-2007  ad Sync with head.
 1.28.2.1 20-Aug-2007  ad Sync with HEAD.
 1.29.2.2 07-Aug-2007  dyoung Use satocsdl() instead of SDL(). bcopy -> memcpy.
 1.29.2.1 07-Aug-2007  dyoung file if_hippisubr.c was added on branch matt-mips64 on 2007-08-07 04:39:35 +0000
 1.30.2.3 23-Mar-2008  matt sync with HEAD
 1.30.2.2 09-Jan-2008  matt sync with HEAD
 1.30.2.1 06-Nov-2007  matt sync with HEAD
 1.31.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.32.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.32.4.1 26-Dec-2007  ad Sync with head.
 1.34.18.2 28-Apr-2009  skrll Sync with HEAD.
 1.34.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.34.16.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.34.10.3 11-Aug-2010  yamt sync with head.
 1.34.10.2 11-Mar-2010  yamt sync with head
 1.34.10.1 04-May-2009  yamt sync with head.
 1.34.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.35.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.38.4.1 30-May-2010  rmind sync with head
 1.38.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.39.32.1 10-Aug-2014  tls Rebase.
 1.39.22.1 18-May-2014  rmind sync with head
 1.39.18.2 03-Dec-2017  jdolecek update from HEAD
 1.39.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.41.4.6 05-Feb-2017  skrll Sync with HEAD
 1.41.4.5 29-May-2016  skrll Sync with HEAD
 1.41.4.4 22-Apr-2016  skrll Sync with HEAD
 1.41.4.3 19-Mar-2016  skrll Sync with HEAD
 1.41.4.2 22-Sep-2015  skrll Sync with HEAD
 1.41.4.1 06-Jun-2015  skrll Sync with HEAD
 1.47.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.48.22.1 25-Jan-2020  ad Sync with head.
 1.48.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.8 20-Feb-2008  matt branches: 1.8.6; 1.8.8; 1.8.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.7 04-Mar-2007  christos branches: 1.7.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.6 10-Dec-2005  elad branches: 1.6.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.5 06-Aug-2005  kiyohara Using DLT_APPLE_IP_OVER_IEEE1394.
 1.4 11-Jul-2005  kiyohara ieee1394 import from FreeBSD.
 1.3 20-Nov-2000  onoe branches: 1.3.2; 1.3.26; 1.3.42;
Use DMA from mbuf instead of copy in transmit.
Still use memcpy in receiving because we must use buffer fill mode
and many packets may share single receive buffer.

XXX: Workaround(?) for CXD3222: it fails to DMA for selfid packet according
to code placement. I'm not sure about the reason (cache? timing? bug?).

Fixed a bug: the transmitter sometimes stopped and the OACTIVE bit of
if_fw was never cleared.
Fixed a bug: freeing an already-free buffer.

Enable ieee1394_drain and ieee1394_watchdog for loss of fragment.
 1.2 14-Nov-2000  onoe Add support for link fragmentation and reassembly for IEEE-1394.
XXX: drain is not yet implemented, so memory will leak
if any fragment is lost.
 1.1 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.3.42.3 27-Feb-2008  yamt sync with head.
 1.3.42.2 03-Sep-2007  yamt sync with head.
 1.3.42.1 21-Jun-2006  yamt sync with head.
 1.3.26.2 11-Dec-2005  christos Sync with head.
 1.3.26.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer file if_ieee1394.h was added on branch thorpej_scsipi on 2000-11-22 16:05:52 +0000
 1.6.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.7.16.1 23-Mar-2008  matt sync with HEAD
 1.8.10.1 16-May-2008  yamt sync with head.
 1.8.8.1 18-May-2008  yamt sync with head.
 1.8.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.70 21-Sep-2025  christos Centralize all the "can't handle af%d\n" messages in one place and provide
more context. Now I get ad nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.69 03-Sep-2022  thorpej branches: 1.69.8;
Garbage-collect the remaining vestiges of netisr.
 1.68 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.67 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.66 28-Aug-2020  ozaki-r net: introduce IFQ_ENQUEUE_ISR to assemble packet queuing routines (NFCI)
 1.65 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.64 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.63 26-Jun-2018  msaitoh branches: 1.63.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction was misdetected in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.62 09-May-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.61 07-May-2018  maxv Use m_remove_pkthdr.

ok knakahara@ (for L2TP)
 1.60 26-Apr-2018  maxv m_copy -> m_copym
 1.59 14-Feb-2017  ozaki-r branches: 1.59.12;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- Performance improves slightly because one cache lookup is eliminated
 1.58 03-Oct-2016  ozaki-r branches: 1.58.2;
Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

A Layer 2 routine and a Layer 3 routine share data, namely the ifqueue,
so IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet, which is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.57 15-Aug-2016  maxv Memory leak, found by brainy; not tested, but obvious enough
 1.56 22-Jun-2016  knakahara branches: 1.56.2;
fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is a no-op. Currently, that means
IFQ_ENQUEUE() on some paths, such as bridge_enqueue(), is wrongly called
in parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.55 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.54 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref-ing) of rtentries.
 1.53 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.52 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.51 13-Oct-2015  roy arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
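
One way to read "masking out EWOULDBLOCK" is sketched below, in user-space C. The helper name toy_mask_wouldblock is hypothetical, and the interpretation is an assumption: a resolution step that reports EWOULDBLOCK is treated as "not an error for the sender", while every other error code is passed back unchanged.

```c
#include <errno.h>

/*
 * Hypothetical helper: turn EWOULDBLOCK from an address-resolution step
 * into success, and pass every other value through unchanged.
 */
static int
toy_mask_wouldblock(int error)
{
	return error == EWOULDBLOCK ? 0 : error;
}
```

A caller would return toy_mask_wouldblock(error) to its own caller, so that a genuine failure such as EHOSTUNREACH can reach an application like ping(8).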
 1.50 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.49 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigation of the issue
there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needed to set the ethertype of a frame to MPLS,
based on a tag of the original route (rtentry), but now we don't
pass it to ether_output. So we have to convey that in another way.
We use an mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.48 28-Nov-2014  ozaki-r branches: 1.48.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.47 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.46 15-May-2014  msaitoh Put schednetisr() inside an splnet()/splx() pair.
This might avoid a delay in processing a packet.
 1.45 05-Apr-2010  joerg branches: 1.45.18; 1.45.22; 1.45.32;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions, as well as the fourth argument for
bpf_attach.
 1.44 31-Mar-2010  pgoyette Now that fw_port.h is gone, we need to directly include <sys/select.h>

Fixes build break reported by myself.
 1.43 29-Mar-2010  kiyohara Bye-bye fw_port.h.
 1.42 19-Jan-2010  pooka branches: 1.42.2; 1.42.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.41 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
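
A compilable version of the pattern above might look like the following sketch. The flag values, the enum, and in particular the action chosen for each case are assumptions made for illustration; the commit message elides the case bodies, and real drivers differ.

```c
/* Hypothetical flag values for this sketch; the kernel defines its own. */
#define TOY_IFF_UP	0x1
#define TOY_IFF_RUNNING	0x2

enum toy_action { TOY_NOTHING, TOY_STOP, TOY_START, TOY_REFRESH };

/*
 * Decide what a driver might do for each combination of the two flags.
 * The mapping of combinations to actions is an assumption.
 */
static enum toy_action
toy_flags_action(int flags)
{
	switch (flags & (TOY_IFF_UP|TOY_IFF_RUNNING)) {
	case 0:
		return TOY_NOTHING;	/* marked down and not running */
	case TOY_IFF_RUNNING:
		return TOY_STOP;	/* marked down but still running */
	case TOY_IFF_UP:
		return TOY_START;	/* marked up but not yet running */
	case TOY_IFF_UP|TOY_IFF_RUNNING:
		return TOY_REFRESH;	/* up and running */
	}
	return TOY_NOTHING;	/* not reached: all four cases are covered */
}
```

Compared with nested if-else clauses, the switch lists all four combinations explicitly, so a missing case is easy to spot.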

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
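
The callback contract described above (0 on success, ENETRESET to request a reset, any other value on failure) can be sketched in user-space C. Every name here (toy_ether, toy_set_ifflags_cb, toy_flags_changed, the sample callbacks) is hypothetical, and the behavior when no callback is registered is an assumption; this is not the signature of ether_set_ifflags_cb().

```c
#include <errno.h>
#include <stddef.h>

struct toy_ether;
typedef int (*toy_ifflags_cb_t)(struct toy_ether *);

struct toy_ether {
	toy_ifflags_cb_t ifflags_cb;	/* set by the driver, may be NULL */
	int		 resets;	/* counts full reinitializations */
};

/* Register the driver's callback with the shared code. */
static void
toy_set_ifflags_cb(struct toy_ether *ec, toy_ifflags_cb_t cb)
{
	ec->ifflags_cb = cb;
}

/*
 * Called by the shared code after the interface flags change.
 * 0: hardware state was updated in place; ENETRESET: reinitialize;
 * anything else: report the failure to the caller.
 */
static int
toy_flags_changed(struct toy_ether *ec)
{
	int error;

	if (ec->ifflags_cb == NULL)
		return 0;	/* assumption: nothing to do without a callback */
	error = ec->ifflags_cb(ec);
	if (error == ENETRESET) {
		ec->resets++;	/* stands in for calling if_init() */
		return 0;
	}
	return error;
}

/* Sample driver callbacks for the three outcomes. */
static int toy_cb_ok(struct toy_ether *ec) { (void)ec; return 0; }
static int toy_cb_needs_reset(struct toy_ether *ec) { (void)ec; return ENETRESET; }
static int toy_cb_fails(struct toy_ether *ec) { (void)ec; return EIO; }
```

The design choice this illustrates is that the shared ioctl code owns the reset decision, so each driver only reports whether it could apply the new flags in place.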

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.40 28-Apr-2008  martin branches: 1.40.6; 1.40.8;
Remove clause 3 and 4 from TNF licenses
 1.39 20-Feb-2008  matt branches: 1.39.6; 1.39.8; 1.39.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.38 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.37 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.36 30-Aug-2007  dyoung branches: 1.36.6; 1.36.8; 1.36.12;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.35 26-Aug-2007  dyoung branches: 1.35.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.34 04-Mar-2007  christos branches: 1.34.2; 1.34.10; 1.34.14;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.33 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.32 07-Jun-2006  kardel branches: 1.32.12;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.31 11-Dec-2005  christos branches: 1.31.4; 1.31.6; 1.31.8; 1.31.14;
merge ktrace-lwp.
 1.30 06-Aug-2005  kiyohara Using DLT_APPLE_IP_OVER_IEEE1394.
 1.29 11-Jul-2005  kiyohara ieee1394 import from FreeBSD.
 1.28 08-Jan-2005  yamt branches: 1.28.10;
constify broadcastaddr.
 1.27 20-Aug-2004  tron Pass correct "mbuf" pointer to BPF framework.
 1.26 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.25 26-Oct-2003  christos Fix uninitialized variable warnings.
 1.24 03-Oct-2003  itojun when dropping M_PKTHDR, need to free m_tag associated with it.
 1.23 23-May-2003  itojun branches: 1.23.2;
don't call if_free_sadl() until the very end of if_detach() logic. much of
the routing table manipulation code assumes the presence of an AF_LINK sockaddr.
should fix PR 21581
 1.22 06-May-2003  enami Initialize mb.m_data.
 1.21 01-May-2003  itojun bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.20 01-May-2003  itojun don't be too verbose on nd6_storelladdr failure
 1.19 26-Sep-2002  onoe initialize pkthdr for dummy mbuf before calling bpf_mtap().
 1.18 25-Jun-2002  onoe Fill ar_hrd for AF_ARP.
 1.17 24-Jun-2002  enami Actually inject the arp packet into softintr queue.
 1.16 24-Jun-2002  itojun integrate IEEE1394 ARP into generic ARP logic.
XXX there's no check at all in ar_hrd, and we don't set ar_hrd on outgoing.
it seems like a bad thing.
 1.15 16-May-2002  haya branches: 1.15.2;
Bugfix: s/__NetBSD_Version/__NetBSD_Version__/. IPv4 over IEEE 1394
will work with this change.
 1.14 05-Mar-2002  itojun branches: 1.14.6;
bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.13 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.12 12-Nov-2001  lukem add RCSIDs
 1.11 14-Jun-2001  itojun branches: 1.11.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.10 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.9 17-Jan-2001  thorpej branches: 1.9.2;
Correct last commit.
 1.8 17-Jan-2001  jdolecek move local variable sdl from ieee1394_ifdetach() to ieee1394_ifattach(), so that
this file is compilable after previous change
XXX not tested
 1.7 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.6 18-Dec-2000  thorpej Fill in if_dlt.
 1.5 13-Dec-2000  thorpej Add ALTQ glue.
 1.4 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.3 20-Nov-2000  onoe branches: 1.3.2;
Use DMA from mbuf instead of copy in transmit.
Still use memcpy in receiving because we must use buffer fill mode
and many packets may share single receive buffer.

XXX: Workaround(?) for CXD3222: it fails to DMA for selfid packet according
to code placement. I'm not sure about the reason (cache? timing? bug?).

Fixed the bug: transmitter sometimes stops and the OACTIVE bit of if_fw
is never cleared.
Fixed the bug: freeing an already-free buffer.

Enable ieee1394_drain and ieee1394_watchdog for loss of fragment.
 1.2 14-Nov-2000  onoe Add support for link fragmentation and reassembly for IEEE-1394.
XXX: drain is not yet implemented, thus a memory leak will occur
if any fragment is lost.
 1.1 05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.3.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.3.2.6 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.3.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.3.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.3.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer file if_ieee1394subr.c was added on branch thorpej_scsipi on 2000-11-22 16:05:53 +0000
 1.9.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.9.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.9.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.9.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.9.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.9.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.9.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.11.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.11.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.11.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.11.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.6.2 15-Jul-2002  gehenna catch up with -current.
 1.14.6.1 30-May-2002  gehenna Catch up with -current.
 1.15.2.1 24-Jun-2003  grant Pull up revision 1.23 (requested by itojun in ticket #1325):

don't call if_free_sadl() until the very end of if_detach() logic. much of
the routing table manipulation code assumes the presence of an AF_LINK sockaddr.
should fix PR 21581
 1.23.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.23.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.23.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.23.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.23.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.23.2.1 03-Aug-2004  skrll Sync with HEAD
 1.28.10.6 27-Feb-2008  yamt sync with head.
 1.28.10.5 11-Feb-2008  yamt sync with head.
 1.28.10.4 21-Jan-2008  yamt sync with head
 1.28.10.3 03-Sep-2007  yamt sync with head.
 1.28.10.2 26-Feb-2007  yamt sync with head.
 1.28.10.1 21-Jun-2006  yamt sync with head.
 1.31.14.1 19-Jun-2006  chap Sync with head.
 1.31.8.1 26-Jun-2006  yamt sync with head.
 1.31.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.31.4.1 09-Sep-2006  rpaulo sync with head
 1.32.12.2 12-Mar-2007  rmind Sync with HEAD.
 1.32.12.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.34.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.34.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.34.2.1 09-Oct-2007  ad Sync with head.
 1.35.2.3 23-Mar-2008  matt sync with HEAD
 1.35.2.2 09-Jan-2008  matt sync with HEAD
 1.35.2.1 06-Nov-2007  matt sync with HEAD
 1.36.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.36.8.1 26-Dec-2007  ad Sync with head.
 1.36.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.39.10.4 11-Aug-2010  yamt sync with head.
 1.39.10.3 11-Mar-2010  yamt sync with head
 1.39.10.2 04-May-2009  yamt sync with head.
 1.39.10.1 16-May-2008  yamt sync with head.
 1.39.8.1 18-May-2008  yamt sync with head.
 1.39.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.39.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.40.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.40.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.42.4.1 30-May-2010  rmind sync with head
 1.42.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.45.32.1 10-Aug-2014  tls Rebase.
 1.45.22.1 18-May-2014  rmind sync with head
 1.45.18.2 03-Dec-2017  jdolecek update from HEAD
 1.45.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.48.2.8 28-Aug-2017  skrll Sync with HEAD
 1.48.2.7 05-Oct-2016  skrll Sync with HEAD
 1.48.2.6 09-Jul-2016  skrll Sync with HEAD
 1.48.2.5 29-May-2016  skrll Sync with HEAD
 1.48.2.4 22-Apr-2016  skrll Sync with HEAD
 1.48.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.48.2.2 22-Sep-2015  skrll Sync with HEAD
 1.48.2.1 06-Jun-2015  skrll Sync with HEAD
 1.56.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.56.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.58.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.59.12.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.59.12.4 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.59.12.3 28-Jul-2018  pgoyette Sync with HEAD
 1.59.12.2 21-May-2018  pgoyette Sync with HEAD
 1.59.12.1 02-May-2018  pgoyette Synch with HEAD
 1.63.2.1 10-Jun-2019  christos Sync with HEAD
 1.69.8.2 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.69.8.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.37 13-Oct-2003  dyoung Complete replacement of the old 802.11 layer with the new.
 1.36 06-Jul-2003  dyoung From Sam Leffler/FreeBSD: AP scanning code, for forthcoming ADM8211
driver (and for eventual synchronization w/ Sam's enhancements to
FreeBSD).

From dyoung@netbsd.org, factor ieee80211_create_ibss and
ieee80211_match_bss out of ieee80211_end_scan for re-use in the
forthcoming ADM8211 driver.
 1.35 06-Jul-2003  dyoung Move the logic to find out what channel to transmit a packet on
into ieee80211_get_channel, rather than duplicate it in ieee80211_ioctl
and in the ADM8211 driver.
 1.34 06-Jul-2003  dyoung More 802.11 media-handling consolidation. ieee80211_media_status
and ieee80211_media_change are factorizations of the media
status/change functions for wi and awi. Inspired by Sam Leffler/FreeBSD.
 1.33 06-Jul-2003  dyoung Prepare to consolidate 802.11 media handling (which is handled in
code duplicated by each driver, now) into the 802.11 framework.
 1.32 16-May-2003  dyoung branches: 1.32.2;
IEEE80211_LOCK and WI_LOCK conceal enormous differences in locking
semantics on FreeBSD and NetBSD, so I am backing them out until
the macro set is enriched.
 1.31 13-May-2003  dyoung Sync with FreeBSD. Spelling fix. Make ieee80211_decap a little more
readable. Accommodate both FreeBSD arpcom and NetBSD ethercom with
conditional compilation.
 1.30 13-May-2003  dyoung Add utility routine ieee80211_get_rate().
 1.29 13-May-2003  dyoung Begin sync with 802.11 framework in FreeBSD: adopt macros for
locking.
 1.28 13-May-2003  dyoung Define an 802.11 PLCP header and constants.

Define two new status codes for management frames.

Define 802.11 durations of important frame sequences, as will be
needed for ADMtek ADM8211 driver and others.
 1.27 08-Apr-2003  kml Host AP power saving support. The Host AP notices that the power
saving bit is set in incoming frames from a station, and buffers the
outgoing frames for the station until they are polled for. This
requires support in the driver to set a bit in the TIM bitmap sent
during 802.11 beacons.

So far, support for power saving in Host AP mode is only available
for the PRISM2 chipset.
 1.26 25-Feb-2003  dyoung Add support for Prism monitor mode. From Kevin Lahey
<kml@patheticgeek.net>.

This patch does NOT add monitor mode support for the Lucent radios.

awi(4) was only modified for compatibility with the new mediaopt.
It does NOT support monitor mode.

Tested by Kevin, Daniel Carosone, and me.
 1.25 16-Nov-2002  dyoung Fix typo: IEEE80211_FC0_SUBTYPE_CF_ACK_CF_ACK becomes
IEEE80211_FC0_SUBTYPE_CF_ACK_CF_POLL. This is the name the IEEE
802.11 specification indicates.
 1.24 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.23 15-Oct-2002  onoe Clear wep key when 'ifconfig -nwkey'.
Change the name for the size of driver private structure:
ic_bss_privlen -> ic_node_privlen
Add a hook to free the node for the driver private structure, though there is
no consumer of the structure in the tree for now.
 1.22 30-Sep-2002  onoe old lucent adhoc-demo mode (adhoc,flag0 or wiconfig -p 3) wasn't handled
correctly. To avoid massive extra code in each driver, now if_ieee80211subr.c
also handles non-standard old lucent adhoc-demo mode.
This also fixes PR 14227.
 1.21 30-Sep-2002  onoe Obtain some functionality from wi_hostap;
use hash for device node list.
Avoid using weak IV values; increment the IV for each packet.
 1.20 27-Sep-2002  onoe Add support for SIOC[SG]80211BSSID, SIOC[SG]80211CHANNEL.
Change the name of structure ieee80211_bss to ieee80211_node, which is
used for management of stations in hostap mode, and peers in ibss mode.
Split off ic_opmode, ic_phytype from ic_flags.
Preparation to merge 'wi' driver into 80211subr.c.
 1.19 22-Sep-2002  thorpej Fix thinko in the SIOC{G,S}80211CHANNEL and SIOC{G,S}80211BSSID
ioctls.
 1.18 15-Sep-2002  thorpej Add new ioctls:
* SIOCS80211CHANNEL, SIOCG80211CHANNEL -- set/get the 802.11 channel.
* SIOCS80211BSSID, SIOCG80211BSSID -- set desired/get current BSSID.

From David Young <dyoung@ojctech.com>.
 1.17 03-Sep-2002  onoe Several hostap fixes for the awi driver:
- aging and clearing of inactive stations
- DTIM field in beacon/probe response.
- ignore IFF_PROMISC for hostap mode, since 802.11 has 3 address fields,
so that promisc mode is not required for AP function.
 1.16 02-Sep-2002  onoe Add experimental support of Host-AP mode for awi driver.
It works also with WEP enabled.
But aging of the associated clients is not implemented yet, so the number
of clients may grow without bound.
 1.15 28-Aug-2002  onoe Attach another DLTs for bpf: DLT_IEEE802_11 to capture raw 802.11 frame.
 1.14 11-Aug-2002  thorpej * Additional frame control types.
* BEACON and AUTH management packet info.
* Add ioctl for configuring 802.11 auth mode.

From OpenBSD.
 1.13 05-Aug-2002  onoe Fix IBSS for awi driver.
 1.12 19-Sep-2001  onoe branches: 1.12.10;
Fix for FH infrastructure mode.
XXX: FH chanset should be calculated from the FH hop pattern, but the BayStack 650 AP
always sets chanset to the fixed value 1. The previous code tried to put this
hack into the awi driver, but that is insufficient because the chanset value
in the awi driver may change during a scan and may differ from the
value in a received beacon/probe-response. So we save the encoded FH chanset
in the channel field of the 802.11 common bss information for now.
 1.11 18-Sep-2001  onoe Move IEEE 802.11 MAC management functions from awi driver to
if_ieee80211subr.c, which can be shared between any IEEE 802.11
drivers.
However, most of current working IEEE 802.11b wireless LAN cards
have rich firmware and we cannot have a control to management frames
for such cards.

IBSS creation is now supported for the awi driver.
 1.10 11-Sep-2001  onoe Add definition of mask/shift for sequence/fragment in sequence control field.
 1.9 25-Jun-2001  onoe branches: 1.9.2; 1.9.4;
add more capability information and status from IEEE802.11b
 1.8 21-Jun-2001  onoe Add definitions of the value for 'i_wepon' of ieee80211_nwkey to prepare
support for persistent keys.
 1.7 18-Dec-2000  thorpej branches: 1.7.2;
Add a version of the 802.11 frame header that includes the 4th address.
 1.6 12-Dec-2000  thorpej Add a way to manipulate the power management parameters specified in
802.11.
 1.5 21-Jul-2000  onoe branches: 1.5.2;
add following two ioctls to handle WEP key for IEEE 802.11 wireless
LAN drivers: SIOCS80211NWKEY and SIOCG80211NWKEY.
 1.4 05-Jul-2000  onoe change the argument of SIOCS80211NWID and SIOCG80211NWID ioctls from
u_int8_t array to struct ieee80211_nwid to prepend length field.
The length field is necessary because IEEE 802.11 spec doesn't prohibit
even '\0' for SSID.
Though the name and the value of SIOC... macro is unchanged, this change
breaks binary compatibility. The only affected userland program on the
tree is ifconfig(8).
As Jason suggested on tech-net, this is better than living with the problems,
since there has been no release with these ioctls yet.
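
Why the length field matters can be sketched as below. The struct is a
simplified stand-in modelled on the description above; the demo_ names, the
32-byte limit constant, and both helper functions are assumptions made for
illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NWID_LEN 32    /* assumed maximum SSID length */

/* Hypothetical stand-in for a length-prefixed network ID. */
struct demo_nwid {
    uint8_t i_len;                      /* number of valid bytes */
    uint8_t i_nwid[DEMO_NWID_LEN];      /* SSID bytes, may contain '\0' */
};

/* Store an SSID of len bytes; returns 0 on success, -1 if too long. */
static int
demo_nwid_set(struct demo_nwid *nw, const void *ssid, size_t len)
{
    if (len > DEMO_NWID_LEN)
        return -1;
    memset(nw, 0, sizeof(*nw));
    nw->i_len = (uint8_t)len;
    memcpy(nw->i_nwid, ssid, len);
    return 0;
}

/* Compare by explicit length, so embedded '\0' bytes are significant. */
static int
demo_nwid_equal(const struct demo_nwid *a, const struct demo_nwid *b)
{
    return a->i_len == b->i_len &&
        memcmp(a->i_nwid, b->i_nwid, a->i_len) == 0;
}
```

With a bare u_int8_t array, two SSIDs that differ only after an embedded
'\0' would look identical to string functions; the explicit length keeps
them distinct.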
 1.3 09-Jun-2000  onoe branches: 1.3.2;
cleanup headers.
add opt_awi.h to define AWI_DEBUG, AWI_WEP_ARC4.
show the firmware version at attach.
create a framework to support WEP (encryption code is not included for now).
a new wiconfig compatible ioctl interface replaced the awictl interface.
fix memory leak in selecting AP
fix bugs in ESSID selection
changes from FreeBSD-current by Warner Losh:
revision 1.2
date: 2000/04/17 22:58:15; author: imp; state: Exp; lines: +16 -1
Provide mem* for compat with NetBSD to fix LINT
fixes from FreeBSD-current by Guido van Rooij:
revision 1.4
date: 2000/05/29 19:58:10; author: guido; state: Exp; lines: +5 -2
Fix a panic resulting from an obvious null pointer deref.
Apparently some other panics still exist in this driver, but with
this fix, it was at least possible to run the Nokia card at SANE 2000.
 1.2 10-Mar-2000  onoe branches: 1.2.2; 1.2.4;
Rename the macro IEEE80211_FC1_RCVFROM_XXX to IEEE80211_FC1_DIR_XXX
and fix the value to be consistent with IEEE 802.11 spec.
The only customer of this macro is if_ray driver for now.
 1.1 23-Jan-2000  chopps Add beginnings of ieee 802.11 generic stuff
 1.2.4.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.2.2.2 13-May-2000  he Pull up revisions 1.1-1.2 (new, requested by he):
Add a driver for ``wi'', Lucent "Orinoco"/Wavelan.
 1.2.2.1 10-Mar-2000  he file if_ieee80211.h was added on branch netbsd-1-4 on 2000-05-13 15:33:33 +0000
 1.3.2.2 21-Jul-2000  onoe Pullup 802.11 stuff (approved by jhawk)
- add support for nwkey to ifconfig
basesrc/sbin/ifconfig/ifconfig.c 1.88
basesrc/sbin/ifconfig/ifconfig.8 1.39
syssrc/sys/dev/ic/awi.c 1.26
syssrc/sys/dev/ic/awi_wep.c 1.3
syssrc/sys/dev/ic/awivar.h 1.12
syssrc/sys/dev/pcmcia/if_wi.c 1.26
syssrc/sys/net/if.c 1.69
syssrc/sys/net/if_ieee80211.h 1.5
 1.3.2.1 21-Jul-2000  onoe Pullups 802.11 stuff (approved by jhawk)
- allow non-string nwid settings
basesrc/sbin/ifconfig/ifconfig.c 1.82-1.86
basesrc/sbin/ifconfig/ifconfig.8 1.37
syssrc/sys/dev/ic/awi.c 1.21
syssrc/sys/dev/pcmcia/if_ray.c 1.21
syssrc/sys/dev/pcmcia/if_wi.c 1.23
syssrc/sys/dev/pcmcia/if_wivar.h 1.10
syssrc/sys/net/if_ieee80211.h 1.4
 1.5.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.5.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.5.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.2.1 21-Jul-2000  bouyer file if_ieee80211.h was added on branch thorpej_scsipi on 2000-11-20 18:10:03 +0000
 1.7.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.7.2.7 11-Nov-2002  nathanw Catch up to -current
 1.7.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.7.2.5 22-Sep-2002  thorpej Sync with HEAD.
 1.7.2.4 17-Sep-2002  nathanw Catch up to -current.
 1.7.2.3 13-Aug-2002  nathanw Catch up to -current.
 1.7.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.7.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.9.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.9.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.9.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.9.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.12.10.1 29-Aug-2002  gehenna catch up with -current.
 1.32.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44 13-Oct-2003  dyoung Complete replacement of the old 802.11 layer with the new.
 1.43 06-Jul-2003  dyoung From Sam Leffler/FreeBSD: AP scanning code, for forthcoming ADM8211
driver (and for eventual synchronization w/ Sam's enhancements to
FreeBSD).

From dyoung@netbsd.org, factor ieee80211_create_ibss and
ieee80211_match_bss out of ieee80211_end_scan for re-use in the
forthcoming ADM8211 driver.
 1.42 06-Jul-2003  dyoung A straggler in the 802.11 media-handling consolidation.
 1.41 06-Jul-2003  dyoung Still more consolidation of 802.11 media-handling, moving
common code from awi and wi into the 802.11 framework. Inspired by
Sam Leffler's patches to FreeBSD.
 1.40 06-Jul-2003  dyoung Move the logic to find out what channel to transmit a packet on
into ieee80211_get_channel, rather than duplicate it in ieee80211_ioctl
and in the ADM8211 driver.
 1.39 06-Jul-2003  dyoung Bug fix: an ad-hoc node's SSID can change at any time, so record
a change of SSID (provided it changes to a non-empty SSID) regardless
of whether it comes in a probe response or a beacon.
 1.38 06-Jul-2003  dyoung With the IEEE80211_HEADER_LEN macro, accommodate hardware whose
"native" 802.11 header is a 4-address header, not the 3-address
header.
 1.37 06-Jul-2003  dyoung More 802.11 media-handling consolidation. ieee80211_media_status
and ieee80211_media_change are factorizations of the media
status/change functions for wi and awi. Inspired by Sam Leffler/FreeBSD.
 1.36 06-Jul-2003  dyoung In debug messages, tell on which channel we send a mgmt frame.
 1.35 06-Jul-2003  dyoung In debug messages, use the right MAC address to report which client
turned off power-saving mode.
 1.34 31-May-2003  dyoung branches: 1.34.2;
Don't call ieee80211_chan2ieee, we don't have it, yet.
 1.33 16-May-2003  itojun use strlcpy
 1.32 16-May-2003  dyoung IEEE80211_LOCK and WI_LOCK conceal enormous differences in locking
semantics on FreeBSD and NetBSD, so I am backing them out until
the macro set is enriched.
 1.31 13-May-2003  dyoung Sync with FreeBSD. Spelling fix. Make ieee80211_decap a little more
readable. Accommodate both FreeBSD arpcom and NetBSD ethercom with
conditional compilation.
 1.30 13-May-2003  dyoung Verify beacon/probe response information elements to prevent buffer
overflows.
 1.29 13-May-2003  dyoung Pack rates, SSID using helper routines ieee80211_add_{rates,ssid}
instead of reduplicating code. From Sam Leffler/FreeBSD.
 1.28 13-May-2003  dyoung Add utility routine ieee80211_get_rate().
 1.27 13-May-2003  dyoung Begin sync with 802.11 framework in FreeBSD: adopt macros for
locking.
 1.26 13-May-2003  dyoung Fix two bugs: supported rates elements were assembled incorrectly,
and the 'no recent beacons from %s' message told the wrong BSSID.
 1.25 08-Apr-2003  kml Host AP power saving support. The Host AP notices that the power
saving bit is set in incoming frames from a station, and buffers the
outgoing frames for the station until they are polled for. This
requires support in the driver to set a bit in the TIM bitmap sent
during 802.11 beacons.

So far, support for power saving in Host AP mode is only available
for the PRISM2 chipset.
 1.24 25-Feb-2003  dyoung Add support for Prism monitor mode. From Kevin Lahey
<kml@patheticgeek.net>.

This patch does NOT add monitor mode support for the Lucent radios.

awi(4) was only modified for compatibility with the new mediaopt.
It does NOT support monitor mode.

Tested by Kevin, Daniel Carosone, and me.
 1.23 19-Jan-2003  simonb Fix code in #ifdef WICACHE block to use renamed constants and
struct members.
Remove a break after a return.
 1.22 16-Oct-2002  onoe Do not start active scan for WI_RID_SCAN_APS request on HostAP, just return
the list of current association clients for WI_RID_READ_APS.
Reset active channel list after active scan.
 1.21 15-Oct-2002  onoe Clear wep key when 'ifconfig -nwkey'.
Change the name for the size of driver private structure:
ic_bss_privlen -> ic_node_privlen
Add a hook to free the node for the driver private structure, though there is
no consumer of the structure in the tree for now.
 1.20 11-Oct-2002  onoe Use ieee80211_free_node() instead of TAILQ_REMOVE() not to forget
LIST_REMOVE() for ni_hash. This fixes panic after deassociation due to
inactivity for HostAP.
 1.19 04-Oct-2002  onoe Perform as a bridge within the AP for HostAP mode, to allow communication
between a wireless station and another wireless station.
 1.18 01-Oct-2002  onoe In AP mode, transmit deauth to a (re)associating station that has not authenticated.
 1.17 01-Oct-2002  onoe Allow SIOCSIFADDR with AF_LINK and WI_RID_MAC_NODE (wiconfig -m)
to set MAC address.
 1.16 30-Sep-2002  onoe old lucent adhoc-demo mode (adhoc,flag0 or wiconfig -p 3) wasn't handled
correctly. To avoid massive extra code in each driver, now if_ieee80211subr.c
also handles non-standard old lucent adhoc-demo mode.
This also fixes PR 14227.
 1.15 30-Sep-2002  onoe Obtain some functionality from wi_hostap;
use hash for device node list.
Avoid using weak IV values; increment the IV for each packet.
 1.14 29-Sep-2002  onoe Fix SIOCG80211BSSID to return current BSSID if associated.
 1.13 27-Sep-2002  onoe Add support for SIOC[SG]80211BSSID, SIOC[SG]80211CHANNEL.
Change the name of structure ieee80211_bss to ieee80211_node, which is
used for management of stations in hostap mode, and peers in ibss mode.
Split off ic_opmode, ic_phytype from ic_flags.
Preparation to merge 'wi' driver into 80211subr.c.
 1.12 03-Sep-2002  onoe Several hostap fixes for the awi driver:
- aging and clearing of inactive stations
- DTIM field in beacon/probe response.
- ignore IFF_PROMISC for hostap mode, since 802.11 has 3 address fields,
so that promisc mode is not required for AP function.
 1.11 02-Sep-2002  onoe Add experimental support of Host-AP mode for awi driver.
It works also with WEP enabled.
But aging of the associated clients is not implemented yet, so the number
of clients may grow without bound.
 1.10 28-Aug-2002  onoe Attach another DLTs for bpf: DLT_IEEE802_11 to capture raw 802.11 frame.
 1.9 11-Aug-2002  drochner rename WI_ ioctl to make it compile again
 1.8 05-Aug-2002  onoe Fix IBSS for awi driver.
 1.7 12-Mar-2002  onoe branches: 1.7.4;
fix CRC (ICV) for WEP: ICV is 32bit not 16bit.
(change from htole16 to htole32, so no changes for little endian machine)
 1.6 12-Nov-2001  lukem branches: 1.6.2;
add RCSIDs
 1.5 25-Sep-2001  onoe branches: 1.5.2;
use ALIGNED_POINTER() instead of ALIGN().
The type of ALIGN() varies by architecture, and casting a pointer to u_int
is incorrect for MI code.
The code only needs to ensure aligned access to the IP header, and requires
a bcopy if the test fails, so the performance implication does not matter
and we can use ALIGNED_POINTER() here.
pointed out by nathanw.
 1.4 24-Sep-2001  reinoud Change the caddr_t to uintptr_t and remove the cast ... it gave problems
compiling on an LP64 ... discussed with Gimpy, atatat and bleeh
 1.3 20-Sep-2001  onoe branches: 1.3.2;
Move IBSS creation stuff from awi to ieee80211.
 1.2 19-Sep-2001  onoe Fix for FH infrastructure mode.
XXX: FH chanset should be calculated from the FH hop pattern, but the BayStack 650 AP
always sets chanset to the fixed value 1. The previous code tried to put this
hack into the awi driver, but that is insufficient because the chanset value
in the awi driver may change during a scan and may differ from the
value in a received beacon/probe-response. So we save the encoded FH chanset
in the channel field of the 802.11 common bss information for now.
 1.1 18-Sep-2001  onoe Move IEEE 802.11 MAC management functions from awi driver to
if_ieee80211subr.c, which can be shared between any IEEE 802.11
drivers.
However, most of current working IEEE 802.11b wireless LAN cards
have rich firmware and we cannot have a control to management frames
for such cards.

IBSS creation is now supported for the awi driver.
 1.3.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.3.2.10 17-Sep-2002  nathanw Catch up to -current.
 1.3.2.9 13-Aug-2002  nathanw Catch up to -current.
 1.3.2.8 15-Jul-2002  nathanw Whitespace.
 1.3.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.3.2.6 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.3.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.3.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.3.2.3 25-Sep-2001  nathanw Catch up to -current, LWPify.
 1.3.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.3.2.1 20-Sep-2001  nathanw file if_ieee80211subr.c was added on branch nathanw_sa on 2001-09-21 22:36:45 +0000
 1.5.2.2 01-Oct-2001  fvdl Catch up with -current.
 1.5.2.1 25-Sep-2001  fvdl file if_ieee80211subr.c was added on branch thorpej-devvp on 2001-10-01 12:47:37 +0000
 1.6.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.6.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.6.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.6.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.6.2.1 12-Nov-2001  thorpej file if_ieee80211subr.c was added on branch kqueue on 2002-01-10 20:02:07 +0000
 1.7.4.1 29-Aug-2002  gehenna catch up with -current.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.36 10-Feb-2024  andvar Fix various typos in comments, log messages and documentation.
 1.35 26-Sep-2023  knakahara branches: 1.35.4;
Use unit id instead of if_index to reduce fixed_reqid space.
 1.34 11-Oct-2022  knakahara branches: 1.34.2;
Add sadb_x_policy_flags to inform SP origination.

This extension(struct sadb_x_policy) is *not* defined by RFC2367.

OpenBSD does not have reserved fields in struct sadb_x_policy.
Linux does not use this field yet.
FreeBSD uses this field as "sadb_x_policy_scope"; the value range is
from 0x00 to 0x04.

We use from most significant bit to avoid the above usage.
 1.33 06-Oct-2022  knakahara Fix overflow case detected by clang. Pointed out by wsh@IIJ, thanks.
 1.32 30-Sep-2022  knakahara ipsecif(4) can use a fixed SP reqid based on ifindex, which can reduce the number of reqids.

If we want to use fixed SP reqid for ipsecif(4), set
net.ipsecif.use_fixed_reqid=1. The default (=0) is the same as before.
net.ipsecif.use_fixed_reqid can be changed only if there is no ipsecif(4) yet.

If we want to change the range of ipseif(4) SP reqid,
set net.ipsecif.reqid_base and net.ipsecif.reqid_last.
These can also be changed only if there is no ipsecif(4) yet.
 1.31 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.30 14-Oct-2020  roy ipsecif: Set the link state UP if we have a tunnel, otherwise DOWN.
 1.29 13-Mar-2020  knakahara reduce unnecessary reqid of NAT-T ipsecif(4), suggested by ohishi@IIJ.
 1.28 10-Mar-2020  knakahara Fix garbage in the ipsecif(4) SPDADD pfkey message. Pointed out by ohishi@IIJ.

"setkey -x" output is the following.
 1.27 01-Feb-2020  riastradh Fix order in rollback case; switch if_ipsec to atomic_load/store_*.
 1.26 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.25 01-Nov-2019  knakahara branches: 1.25.2;
Make global and per-interface ipsecif(4) pmtu tunable like gif(4).

And make hop limit tunable same as gif(4).

See http://mail-index.netbsd.org/source-changes/2019/10/30/msg110426.html
 1.24 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) keeps a memory area for each CPU and hands it out piecemeal
to users. If the areas run short, percpu(9) enlarges them by allocating new,
larger memory areas, replacing the old ones with them, and destroying the old ones.
A percpu storage referenced by a pointer obtained via percpu_getref can be
destroyed by this mechanism after the running thread sleeps, even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
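
The indirection described in the 1.24 message can be sketched in userland C. This is a simplified model under stated assumptions, not the percpu(9) API: struct example_percpu, struct example_rtcache and every example_* function are hypothetical, and a single resizable slot array stands in for the per-CPU storage. It shows that when the storage holds only a pointer, enlarging the storage moves the pointer but not the object it refers to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache object. */
struct example_rtcache {
	int generation;
};

/* Hypothetical stand-in for a storage area that can be enlarged. */
struct example_percpu {
	struct example_rtcache **slots;	/* each slot holds only a pointer */
	size_t nslots;
};

/* Zeroed allocation that aborts on failure, to keep the sketch short. */
static void *
example_zalloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p == NULL)
		abort();
	return p;
}

/* Allocate the slot array and one separately allocated object per slot. */
static void
example_percpu_init(struct example_percpu *pc, size_t nslots)
{
	pc->slots = example_zalloc(nslots, sizeof(*pc->slots));
	pc->nslots = nslots;
	for (size_t i = 0; i < nslots; i++)
		pc->slots[i] = example_zalloc(1, sizeof(*pc->slots[i]));
}

/*
 * Enlarge the storage as the log describes: allocate a larger area, copy
 * the old contents, destroy the old area. Only the slot array moves; the
 * objects the slots point to stay at their original addresses.
 */
static void
example_percpu_enlarge(struct example_percpu *pc, size_t new_nslots)
{
	struct example_rtcache **bigger;

	assert(new_nslots >= pc->nslots);
	bigger = example_zalloc(new_nslots, sizeof(*bigger));
	memcpy(bigger, pc->slots, pc->nslots * sizeof(*bigger));
	free(pc->slots);
	pc->slots = bigger;
	pc->nslots = new_nslots;
}

/* Free every object and the slot array (free(NULL) is a no-op). */
static void
example_percpu_fini(struct example_percpu *pc)
{
	for (size_t i = 0; i < pc->nslots; i++)
		free(pc->slots[i]);
	free(pc->slots);
	pc->slots = NULL;
	pc->nslots = 0;
}

/* Return 1 if an object pointer taken before an enlarge is still valid. */
static int
example_pointer_survives_enlarge(void)
{
	struct example_percpu pc;
	struct example_rtcache *rc;
	int ok;

	example_percpu_init(&pc, 2);
	rc = pc.slots[0];		/* pointer to the object, not to the slot */
	rc->generation = 7;
	example_percpu_enlarge(&pc, 8);	/* the old slot array is freed here */
	ok = (pc.slots[0] == rc && rc->generation == 7);
	example_percpu_fini(&pc);
	return ok;
}
```

Had the object been stored directly in the slot array, a pointer taken before the enlarge would refer to freed memory afterwards. That is the hazard the commit message describes for code that sleeps while holding such a pointer.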
 1.23 13-Sep-2019  msaitoh if_flags is neither int nor short. It's unsigned short.
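
Why the exact type matters can be illustrated with a small sketch. It is an illustration only and not taken from the commit: EXAMPLE_FLAG_HIGH and the example_widen_* functions are hypothetical, and a 16-bit short with a wider int is assumed. A flags word that uses bit 15 keeps its value when widened from unsigned short, but not when the same bits are held in a signed short.

```c
#include <assert.h>
#include <limits.h>

#define EXAMPLE_FLAG_HIGH	0x8000u	/* hypothetical flag occupying bit 15 */

/* Widen flags held in an unsigned short: the numeric value is preserved. */
static int
example_widen_unsigned(unsigned short flags)
{
	return flags;
}

/*
 * Widen the same bits held in a signed short. With a 16-bit short, bit 15
 * is the sign bit, so the widened value cannot equal 0x8000; on common
 * compilers it is sign-extended to a negative number.
 */
static int
example_widen_signed(short flags)
{
	assert(sizeof(short) * CHAR_BIT == 16);	/* assumption of this sketch */
	return flags;
}
```

Comparisons and copies into wider variables therefore behave differently for the two types as soon as the top bit is in use.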
 1.22 25-Jun-2019  msaitoh branches: 1.22.2;
Simplify "LIST_HEAD();" to make the code more understandable.
No functional change.
 1.21 14-Mar-2019  knakahara Fix ipsecif(4) memory leak in some ioctl cases.
 1.20 26-Dec-2018  knakahara Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.19 07-Dec-2018  knakahara ipsecif(4) support input drop packet counter.
 1.18 19-Oct-2018  knakahara Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.17 26-Jun-2018  msaitoh branches: 1.17.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.16 29-May-2018  knakahara Reviewed by ohishi@IIJ. Sorry, I jumped the gun and committed.

Fix the following two issues.
- remove extra padding of sizeof(xisr) when adding ipsec policy
- add padding for xpl when adding discard policy
 1.15 29-May-2018  knakahara Fix panic when ipsecif(4) adds discard policy. Pointed out by ohishi@IIJ, thanks.
 1.14 24-May-2018  knakahara ipsecif(4) must not set port number to spidx even if NAT-T. Pointed out by ohishi@IIJ, thanks.
 1.13 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.12 27-Apr-2018  knakahara Fix "how" argument of MGET(). Pointed out by maxv@n.o, thanks.

MGET() does not have M_ZERO flag, so add memset when it is required.
 1.11 06-Apr-2018  knakahara Fix unexpected failure when only the port number of ipsecif(4) over IPv6 is changed.

Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.10 06-Apr-2018  knakahara fix ipsecif(4) stack overflow.

XXX pullup-8
 1.9 06-Apr-2018  knakahara fix ipsecif(4) unmatched curlwp_bind.

XXX pullup-8
 1.8 06-Apr-2018  knakahara fix ipsec(4) encap_lock leak.

XXX pullup-8
 1.7 13-Mar-2018  knakahara Fix IPv6 ipsecif(4) ATF regression, sorry.

There must *not* be padding between the src sockaddr and the dst sockaddr
after struct sadb_x_policy.
 1.6 09-Mar-2018  knakahara Functionalize duplicated code. No functional changes.
 1.5 09-Mar-2018  knakahara Fix missing sadb_x_ipsecrequest information for PF_KEY message.
 1.4 09-Mar-2018  knakahara NAT-T src and dst port in ipsec_variant should be network byte order.
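
A minimal sketch of keeping a port pair in network byte order, as the 1.4 message requires. struct example_natt_ports and the example_* helpers are hypothetical and are not the ipsec_variant layout. htons() and ntohs() are the standard conversion routines from <arpa/inet.h>.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for a NAT-T port pair kept in network byte order. */
struct example_natt_ports {
	uint16_t sport;	/* network byte order */
	uint16_t dport;	/* network byte order */
};

/* Store host-order port numbers in network byte order. */
static void
example_set_ports(struct example_natt_ports *p, uint16_t sport_host,
    uint16_t dport_host)
{
	assert(p != NULL);
	p->sport = htons(sport_host);
	p->dport = htons(dport_host);
}

/* Copy out the two bytes of a stored port in memory order. */
static void
example_port_bytes(uint16_t port_net, unsigned char out[2])
{
	memcpy(out, &port_net, sizeof(port_net));
}
```

Port 4500 is 0x1194, so after example_set_ports() the stored bytes are 0x11 followed by 0x94 on any host, and ntohs() recovers the host-order value when it is needed for comparison or display.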
 1.3 31-Jan-2018  mrg branches: 1.3.2; 1.3.4;
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.2 15-Jan-2018  knakahara Fix PR kern/52920. Pointed out by David Binderman, thanks.
 1.1 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.3.4.8 18-Jan-2019  pgoyette Synch with HEAD
 1.3.4.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.3.4.6 20-Oct-2018  pgoyette Sync with head
 1.3.4.5 28-Jul-2018  pgoyette Sync with HEAD
 1.3.4.4 25-Jun-2018  pgoyette Sync with HEAD
 1.3.4.3 02-May-2018  pgoyette Synch with HEAD
 1.3.4.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.3.4.1 15-Mar-2018  pgoyette Synch with HEAD
 1.3.2.13 13-Mar-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1520):

sys/netipsec/key.c: revision 1.271
sys/net/if_ipsec.c: revision 1.28
sys/net/if_ipsec.c: revision 1.29

Fix garbage in the ipsecif(4) SPDADD pfkey message. Pointed out by ohishi@IIJ.

"setkey -x" output is the following.
 1.3.2.12 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.3.2.11 15-Mar-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1216):

sys/net/if_ipsec.c: revision 1.21

Fix ipsecif(4) memory leak in some ioctl cases.
 1.3.2.10 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.3.2.9 07-Jun-2018  martin Pull up following revision(s) (requested by knakahara in ticket #840):

sys/net/if_ipsec.c: revision 1.15,1.16

Fix panic when ipsecif(4) adds discard policy. Pointed out by ohishi@IIJ, thanks.
Reviewed by ohishi@IIJ. Sorry, I jumped the gun and committed.

Fix the following two issues.
- remove extra padding of sizeof(xisr) when adding ipsec policy
- add padding for xpl when adding discard policy
 1.3.2.8 07-Jun-2018  martin Pull up following revision(s) (requested by knakahara in ticket #839):

sys/net/if_ipsec.c: revision 1.14

ipsecif(4) must not set port number to spidx even if NAT-T. Pointed out by ohishi@IIJ, thanks.
 1.3.2.7 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.3.2.6 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #828):

sys/net/if_ipsec.c: revision 1.12

Fix "how" argument of MGET(). Pointed out by maxv@n.o, thanks.
MGET() does not have M_ZERO flag, so add memset when it is required.
 1.3.2.5 09-Apr-2018  martin Pull up following revision(s) (requested by knakahara in ticket #714):

sys/net/if_ipsec.c: revision 1.8 - 1.11
sys/netipsec/ipsecif.h: revision 1.2
sys/netipsec/ipsecif.c: revision 1.6,1.7

fix ipsec(4) encap_lock leak.

fix ipsecif(4) unmatched curlwp_bind.

fix ipsecif(4) stack overflow.

Add IPv4 ID when the ipsecif(4) packet can be fragmented. Implemented by hsuenaga@IIJ and ohishi@IIJ, thanks.
This modification reduces packet loss of fragmented packets on a
network where reordering occurs.

Although this modification has been applied, IPv4 ID is not set for
packets smaller than IP_MINFRAGSIZE. According to RFC 6864, that
must not cause problems.

Fix unexpected failure when only the port number of ipsecif(4) over IPv6 is changed.
Here is an example of the operation which causes this problem.
# ifconfig ipsec0 create link0
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4501
# ifconfig ipsec0 tunnel fc00:1001::2,4500 fc00:1001::1,4502
 1.3.2.4 13-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #627):
sys/netipsec/ipsecif.c: revision 1.5
tests/net/if_ipsec/t_ipsec.sh: revision 1.4
sys/net/if_ipsec.c: revision 1.7
Fix IPv6 ipsecif(4) ATF regression, sorry.
There must *not* be padding between the src sockaddr and the dst sockaddr
after struct sadb_x_policy.

Comment out confusing (and incorrect) code and add comment. Pointed out by maxv@n.o, thanks.

Enhance assertion ipsecif(4) ATF to avoid confusing setkey(8) error message.

When setkey(8) says "syntax error at [-E]", it must mean get_if_ipsec_unique()
failed.
 1.3.2.3 13-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #620):
sys/netipsec/ipsecif.c: revision 1.4
sys/net/if_ipsec.c: revision 1.4
sys/net/if_ipsec.c: revision 1.5
sys/net/if_ipsec.c: revision 1.6
NAT-T src and dst port in ipsec_variant should be network byte order.
Fix missing sadb_x_ipsecrequest information for PF_KEY message.
Functionalize duplicated code. No functional changes.
Fix ipsec(4) I/F esp_frag support.
 1.3.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.3.2.1 31-Jan-2018  snj file if_ipsec.c was added on branch netbsd-8 on 2018-02-11 21:17:34 +0000
 1.17.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.17.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.17.2.1 10-Jun-2019  christos Sync with HEAD
 1.22.2.2 13-Mar-2020  martin Pull up following revision(s) (requested by knakahara in ticket #780):

sys/netipsec/key.c: revision 1.271
sys/net/if_ipsec.c: revision 1.28
sys/net/if_ipsec.c: revision 1.29

Fix garbage in the ipsecif(4) SPDADD pfkey message. Pointed out by ohishi@IIJ.

"setkey -x" output is the following.
 1.22.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.25.2.1 29-Feb-2020  ad Sync with head.
 1.34.2.1 02-Oct-2023  martin Pull up following revision(s) (requested by knakahara in ticket #378):

tests/net/if_ipsec/t_ipsec_unnumbered.sh: revision 1.2
sys/net/if_ipsec.c: revision 1.35
sys/netipsec/key.c: revision 1.281

Use kmem_free instead of kmem_intr_free, as key_freesaval() is not called in softint after key.c:r1.223.
E.g. key_freesaval() was called the following call path before SAD MP-ify.
esp_input_cb()
KEY_FREESAV()
key_freesav()
key_delsav()
key_freesaval()
ok'ed by ozaki-r@n.o.

Use unit id instead of if_index to reduce fixed_reqid space.

Update for sys/net/if_ipsec.c:r1.35
 1.35.4.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.8 17-Jan-2022  andvar fix typos in comments, mainly s/foward/forward/.
 1.7 01-Feb-2020  riastradh Fix order in rollback case; switch if_ipsec to atomic_load/store_*.
 1.6 01-Nov-2019  knakahara branches: 1.6.2;
Make global and per-interface ipsecif(4) pmtu tunable like gif(4).

And make hop limit tunable same as gif(4).

See http://mail-index.netbsd.org/source-changes/2019/10/30/msg110426.html
 1.5 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) keeps a memory area for each CPU and hands it out piecemeal
to users. If the areas run short, percpu(9) enlarges them by allocating new,
larger memory areas, replacing the old ones with them, and destroying the old ones.
A percpu storage referenced by a pointer obtained via percpu_getref can be
destroyed by this mechanism after the running thread sleeps, even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.4 19-Oct-2018  knakahara branches: 1.4.4;
Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.3 27-Apr-2018  knakahara branches: 1.3.2;
Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.2 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.1 10-Jan-2018  knakahara branches: 1.1.2; 1.1.4;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.1.4.3 20-Oct-2018  pgoyette Sync with head
 1.1.4.2 02-May-2018  pgoyette Synch with HEAD
 1.1.4.1 22-Apr-2018  pgoyette Sync with HEAD
 1.1.2.5 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.1.2.4 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.1.2.3 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.1.2.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.1.2.1 10-Jan-2018  snj file if_ipsec.h was added on branch netbsd-8 on 2018-02-11 21:17:34 +0000
 1.3.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.2.1 10-Jun-2019  christos Sync with HEAD
 1.4.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.6.2.1 29-Feb-2020  ad Sync with head.
 1.49 02-Nov-2023  yamaguchi branches: 1.49.4;
l2tp(4): use ether_ifattach() to initialize ethercom
 1.48 03-Sep-2022  thorpej branches: 1.48.4;
Garbage-collect the remaining vestiges of netisr.
 1.47 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.46 25-Oct-2020  roy branches: 1.46.6;
l2tp: call if_link_state_change rather than directly setting it.

This allows protocols to do their thing.
 1.45 25-Oct-2020  roy l2tp: Revert prior

It proves I can't read comments and that if_initialize should not be used.
 1.44 15-Oct-2020  roy l2tp: Set the link state UP if we have a tunnel, otherwise DOWN.
 1.43 01-Feb-2020  riastradh Switch sys/net to percpu_create.
 1.42 01-Feb-2020  riastradh Switch if_l2tp to atomic_load/store_*.

Fix missing membar_datadep_consumer -- now atomic_load_consume -- in
l2tp_lookup_session_ref.
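
A minimal userland sketch of the publish/lookup ordering that this kind of
fix is about. It uses C11 atomics, not the kernel's atomic_load_consume(),
and an acquire load stands in for the consume load. The names toy_session,
toy_publish and toy_lookup are hypothetical and do not come from if_l2tp.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical session record; not the real structure from if_l2tp.c. */
struct toy_session {
	int id;
};

/* Shared slot that readers inspect without taking a lock. */
static _Atomic(struct toy_session *) toy_current = NULL;

/* Writer: initialize the record fully, then publish it with release order. */
void
toy_publish(struct toy_session *s, int id)
{
	s->id = id;
	atomic_store_explicit(&toy_current, s, memory_order_release);
}

/* Reader: acquire load, so the initialized fields are visible before use. */
int
toy_lookup(void)
{
	struct toy_session *s =
	    atomic_load_explicit(&toy_current, memory_order_acquire);

	return s != NULL ? s->id : -1;
}
```

Without the ordered load, a reader could in principle observe the pointer
before the fields it points to.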
 1.41 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.40 16-Oct-2019  knakahara branches: 1.40.2;
Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.39 19-Sep-2019  knakahara l2tp(4): avoid having struct ifqueue directly in a percpu storage.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
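
The pointer indirection described above can be sketched in userland C. This
is a toy model under stated assumptions, not percpu(9) itself: toy_percpu,
toy_ifqueue and the toy_percpu_* functions are hypothetical names, and
"enlargement" is mimicked by moving the slot array to a new allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue type standing in for struct ifqueue. */
struct toy_ifqueue {
	int len;
};

/* Toy per-CPU area: one pointer-sized slot per CPU; the area may move. */
struct toy_percpu {
	struct toy_ifqueue **slots;
	size_t ncpu;
};

/* Release every queue and the slot area itself. */
void
toy_percpu_fini(struct toy_percpu *pc)
{
	for (size_t i = 0; i < pc->ncpu; i++)
		free(pc->slots[i]);
	free(pc->slots);
	pc->slots = NULL;
	pc->ncpu = 0;
}

/* Allocate the area plus one separately allocated queue per CPU. */
int
toy_percpu_init(struct toy_percpu *pc, size_t ncpu)
{
	assert(ncpu > 0);
	pc->slots = calloc(ncpu, sizeof(*pc->slots));
	if (pc->slots == NULL)
		return -1;
	pc->ncpu = ncpu;
	for (size_t i = 0; i < ncpu; i++) {
		pc->slots[i] = calloc(1, sizeof(*pc->slots[i]));
		if (pc->slots[i] == NULL) {
			toy_percpu_fini(pc);
			return -1;
		}
	}
	return 0;
}

/* Mimic enlargement: copy the slots to a new area, destroy the old one. */
int
toy_percpu_enlarge(struct toy_percpu *pc)
{
	struct toy_ifqueue **n = malloc(pc->ncpu * sizeof(*n));

	if (n == NULL)
		return -1;
	memcpy(n, pc->slots, pc->ncpu * sizeof(*n));
	free(pc->slots);	/* pointers into the old area are now stale */
	pc->slots = n;
	return 0;
}

/* Copy the pointer out of the slot; the queue it names never moves. */
struct toy_ifqueue *
toy_percpu_queue(const struct toy_percpu *pc, size_t cpu)
{
	assert(cpu < pc->ncpu);
	return pc->slots[cpu];
}
```

A caller that keeps the value returned by toy_percpu_queue() can still use it
after toy_percpu_enlarge(), whereas a pointer into the slot array itself
would dangle.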
 1.38 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.37 19-Sep-2019  knakahara Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).
 1.36 19-Aug-2019  ozaki-r l2tp: initialize mowner variables for MBUFTRACE
 1.35 25-Jun-2019  msaitoh branches: 1.35.2;
Simplify "LIST_HEAD();" to make the code more understandable.
No functional change.
 1.34 26-Apr-2019  pgoyette Some more empty-string --> NULL conversions for module dependencies
 1.33 27-Dec-2018  knakahara l2tp(4): fix output bytes counter. Pointed by k-goda@IIJ, thanks.
 1.32 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.31 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.30 19-Oct-2018  knakahara Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.29 26-Jun-2018  msaitoh branches: 1.29.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction was misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.28 25-Jun-2018  msaitoh Remove duplicated inclusion of net/bpf.h.
 1.27 08-May-2018  maxv Simplify: use M_MOVE_PKTHDR directly.

ok knakahara@
 1.26 07-May-2018  maxv Use m_remove_pkthdr.

ok knakahara@ (for L2TP)
 1.25 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.24 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.23 10-Apr-2018  knakahara Fix my previous mistake and an odd unaligned case. Pointed out by maxv@n.o, thanks.

It must be a rare case that requires this copy routine...
 1.22 09-Apr-2018  knakahara Improve comment. Pointed out by maxv@n.o, thanks.
 1.21 09-Apr-2018  knakahara Fix l2tp(4) alignment check. Pointed out and reviewed by k-goda@IIJ.

The alignment check should be done for the address of m_data instead of
the value of m_data.

XXX pullup-8
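
A minimal sketch of an alignment test applied to a pointer value itself, on
the assumption that this is what the message above means by checking the
address. toy_is_aligned is a hypothetical helper, not the macro used in
if_l2tp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p is a multiple of align; align must be a power of two. */
bool
toy_is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Test the address held in p, never the bytes stored at p. */
	return ((uintptr_t)p & (align - 1)) == 0;
}
```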
 1.20 26-Jan-2018  maxv branches: 1.20.2;
Use MH_ALIGN instead, ok knakahara@.
 1.19 26-Jan-2018  maxv Several fixes in L2TP:

* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.

* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.

* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.

* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.

* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.

* in6_l2tp_input(): same changes as in_l2tp_input().

Ok knakahara@
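
The "check the space in the chain, then copy" pattern from the list above can
be sketched in userland C. toy_mbuf and toy_copy_checked are hypothetical and
only imitate the idea of m_copydata() over a buffer chain; they are not the
mbuf(9) API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer segment standing in for one mbuf of a chain. */
struct toy_mbuf {
	struct toy_mbuf *next;
	const unsigned char *data;
	size_t len;
};

/* Total number of bytes held by the chain. */
static size_t
toy_chain_len(const struct toy_mbuf *m)
{
	size_t total = 0;

	for (; m != NULL; m = m->next)
		total += m->len;
	return total;
}

/*
 * Copy len bytes starting at offset off into dst.
 * Returns -1, copying nothing, when the chain is too short.
 */
int
toy_copy_checked(const struct toy_mbuf *m, size_t off, size_t len, void *dst)
{
	size_t total = toy_chain_len(m);
	unsigned char *out = dst;

	if (off > total || len > total - off)
		return -1;		/* would read past the chain */

	/* Skip whole segments that lie before the offset. */
	while (m != NULL && off >= m->len) {
		off -= m->len;
		m = m->next;
	}
	/* Copy across segment boundaries; the check above bounds the walk. */
	while (len > 0) {
		size_t n = m->len - off;

		if (n > len)
			n = len;
		if (n > 0)
			memcpy(out, m->data + off, n);
		out += n;
		len -= n;
		off = 0;
		m = m->next;
	}
	return 0;
}
```

Copying into a local buffer this way avoids depending on the header being
contiguous in the first segment.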
 1.18 25-Jan-2018  maxv style
 1.17 19-Dec-2017  ozaki-r Don't set IFEF_MPSAFE unless NET_MPSAFE at this point

Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.

Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.16 06-Dec-2017  knakahara unify processing to check nesting count for some tunnel protocols.
 1.15 16-Nov-2017  ozaki-r branches: 1.15.2;
Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.14 30-Oct-2017  ozaki-r Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use if_link_state_change
 1.13 30-Oct-2017  knakahara If if_attach() failed in the attach function, return. Add comments about if_initialize().

suggested by ozaki-r@n.o.
 1.12 19-Oct-2017  knakahara fix l2tp panic when l2tp session id is changed (same as if_vlan.c:r1.104)

E.g. the following operation causes this panic.
====================
# ifconfig l2tp0 create
# ifconfig l2tp0 session 140 140
# ifconfig l2tp1 create
# ifconfig l2tp1 session 200 200
# ifconfig l2tp1 session 300 300
panic: kernel diagnostic assertion "new->ple_next == NULL" failed: file "/disk4/home/k-nakahara/repos/netbsd-src/sys/sys/pslist.h", line 118
====================

Pointed out by s-yamaguchi@IIJ, thanks.

XXX need pullup-8
 1.11 01-Jun-2017  chs branches: 1.11.2; 1.11.6;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.10 20-Apr-2017  knakahara branches: 1.10.2;
missing if_extflags of l2tp(4). l2tp(4) is already MP-safe.
 1.9 13-Apr-2017  knakahara l2tp(4) support when hashinit() cannot allocate required hash size.

pointed out by s-yamaguchi@IIJ
 1.8 04-Apr-2017  knakahara fix module build
 1.7 04-Apr-2017  sevan Revert change to allow builds to continue until the missing vlan.h file is committed.
https://mail-index.netbsd.org/source-changes/2017/04/04/msg083283.html
 1.6 04-Apr-2017  knakahara remove unnecessary if_vlanvar.h. add missing include "vlan.h".

pointed out by s-yamaguchi@IIJ, thanks.
 1.5 04-Apr-2017  knakahara fix atf failed.
 1.4 03-Apr-2017  knakahara fix missing mutex_destroy when modunload.

pointed out by s-yamaguchi@IIJ, thanks.
 1.3 03-Apr-2017  knakahara fix potential use after free between "ifconfig l2tpX destroy" and l2tp Tx.

It is protected by KERNEL_LOCK in soo_ioctl() between "ioctl destroy" and
other ioctls. And, it is protected by encap_lock() between "ioctl destroy"
and Rx. However, it was not protected between "ioctl destroy" and Tx.
That is,
+ CPU#A
- do "ifconfig l2tpX destroy"
- call l2tp_clone_destroy()
- done l2tp_delete_tunnel()
+ CPU#B
- begin l2tp output processing
- call l2tp_transmit()
- done l2tp_getref_variant()
+ CPU#A
- done kmem_free(sc->l2tp_var, )
+ CPU#B
- access to sc->l2tp_var after free

pointed out by s-yamaguchi@IIJ, thanks.
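
The race in the timeline above is a lifetime problem: CPU#A frees an object
that CPU#B still holds. A generic reference-counting analogy in userland C
follows. It is only a sketch, not the mechanism used in if_l2tp.c, and the
toy_variant names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object standing in for the per-interface variant data. */
struct toy_variant {
	atomic_int refs;	/* starts at 1: the owner's reference */
	bool *freed;		/* set on release, so a demo can observe it */
};

struct toy_variant *
toy_variant_create(bool *freed)
{
	struct toy_variant *v = malloc(sizeof(*v));

	if (v == NULL)
		return NULL;
	atomic_init(&v->refs, 1);
	v->freed = freed;
	*freed = false;
	return v;
}

/* Taken by a user (e.g. a transmit path) before touching the object. */
void
toy_variant_acquire(struct toy_variant *v)
{
	atomic_fetch_add(&v->refs, 1);
}

/* Dropped by users and by the destroy path; the last one out frees. */
void
toy_variant_release(struct toy_variant *v)
{
	if (atomic_fetch_sub(&v->refs, 1) == 1) {
		*v->freed = true;
		free(v);
	}
}
```

In real concurrent code the lookup that leads to toy_variant_acquire() must
itself be protected against a concurrent final release; the sketch leaves
that out.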
 1.2 30-Mar-2017  knakahara KNF. pointed out by s-yamaguchi@IIJ
 1.1 16-Feb-2017  knakahara branches: 1.1.2;
add missing files.
 1.1.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file if_l2tp.c was added on branch pgoyette-localcount on 2017-03-20 06:57:50 +0000
 1.10.2.2 21-Apr-2017  bouyer Sync with HEAD
 1.10.2.1 20-Apr-2017  bouyer file if_l2tp.c was added on branch bouyer-socketcan on 2017-04-21 16:54:05 +0000
 1.11.6.2 28-Aug-2017  skrll Sync with HEAD
 1.11.6.1 01-Jun-2017  skrll file if_l2tp.c was added on branch nick-nhusb on 2017-08-28 17:53:11 +0000
 1.11.2.11 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.11.2.10 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.11.2.9 26-Jul-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #938):
sys/arch/acorn32/podulebus/if_ie.c: revision 1.41
sys/arch/amiga/dev/if_es.c: revision 1.58
sys/arch/amiga/dev/if_qn.c: revision 1.45
sys/arch/arm/at91/at91emac.c: revision 1.20
sys/arch/arm/ep93xx/epe.c: revision 1.37
sys/arch/emips/ebus/if_le_ebus.c: revision 1.14
sys/arch/emips/ebus/if_le_ebus.c: revision 1.15
sys/arch/mac68k/dev/if_mc.c: revision 1.46
sys/arch/macppc/dev/am79c950.c: revision 1.39
sys/arch/newsmips/apbus/if_sn.c: revision 1.40
sys/arch/next68k/dev/mb8795.c: revision 1.59
sys/arch/playstation2/dev/if_smap.c: revision 1.25
sys/arch/playstation2/dev/if_smap.c: revision 1.26
sys/arch/sun2/dev/if_ec.c: revision 1.28
sys/arch/sun3/dev/if_ie.c: revision 1.63
sys/arch/x68k/dev/if_ne_intio.c: revision 1.19
sys/arch/xen/xen/if_xennet_xenbus.c: revision 1.75
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.63
sys/dev/bi/if_ni.c: revision 1.45
sys/dev/cadence/if_cemac.c: revision 1.12
sys/dev/ic/am7990.c: revision 1.78
sys/dev/ic/am79900.c: revision 1.27
sys/dev/ic/an.c: revision 1.67
sys/dev/ic/cs89x0.c: revision 1.40
sys/dev/ic/dm9000.c: revision 1.13
sys/dev/ic/dm9000.c: revision 1.14
sys/dev/ic/dp8390.c: revision 1.88
sys/dev/ic/elink3.c: revision 1.141
sys/dev/ic/elinkxl.c: revision 1.122
sys/dev/ic/hme.c: revision 1.98
sys/dev/ic/i82586.c: revision 1.77
sys/dev/ic/lance.c: revision 1.53
sys/dev/ic/mb86950.c: revision 1.27
sys/dev/ic/mb86960.c: revision 1.86
sys/dev/ic/mtd803.c: revision 1.34
sys/dev/ic/pdq_ifsubr.c: revision 1.59
sys/dev/ic/rrunner.c: revision 1.86
sys/dev/ic/seeq8005.c: revision 1.58
sys/dev/ic/sgec.c: revision 1.47
sys/dev/ic/smc90cx6.c: revision 1.72
sys/dev/ic/smc91cxx.c: revision 1.96
sys/dev/ic/tropic.c: revision 1.49
sys/dev/ic/wi.c: revision 1.245
sys/dev/isa/if_eg.c: revision 1.93
sys/dev/isa/if_el.c: revision 1.95
sys/dev/isa/if_iy.c: revision 1.101
sys/dev/ofw/ofnet.c: revision 1.58
sys/dev/pci/if_alc.c: revision 1.27
sys/dev/pci/if_de.c: revision 1.152
sys/dev/pci/if_fpa.c: revision 1.61
sys/dev/pci/if_jme.c: revision 1.34
sys/dev/pci/if_tl.c: revision 1.108
sys/dev/pci/if_vte.c: revision 1.19
sys/dev/pci/ixgbe/ixgbe.h: revision 1.50
sys/dev/pcmcia/if_cnw.c: revision 1.62
sys/dev/pcmcia/if_malo_pcmcia.c: revision 1.17
sys/dev/pcmcia/if_ray.c: revision 1.89
sys/dev/pcmcia/if_xi.c: revision 1.81
sys/dev/pcmcia/mhzc.c: revision 1.51
sys/dev/pcmcia/xirc.c: revision 1.34
sys/dev/qbus/if_de.c: revision 1.33
sys/dev/qbus/if_qe.c: revision 1.78
sys/dev/qbus/if_qt.c: revision 1.22
sys/dev/sbus/be.c: revision 1.87
sys/dev/sbus/qe.c: revision 1.68
sys/dev/scsipi/if_se.c: revision 1.96
sys/dev/usb/if_atu.c: revision 1.59
sys/net/if_l2tp.c: revision 1.28 via patch
sys/net/if_ppp.c: revision 1.160
It's not required to include net/bpfdesc.h. Remove it.
--
Simplify like other drivers. NULL check of ifp->if_bpf is done in
bpf_mtap(), so it's not required to do it here.
--
Remove duplicated inclusion of net/bpf.h.
--
Remove duplicated inclusion of net/bpf.h.
--
Simplify bpf_mtap() call. No functional change.
 1.11.2.8 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.11.2.7 11-Apr-2018  martin Pull up following revision(s) (requested by knakahara in ticket #730):

sys/net/if_l2tp.c: revision 1.22
sys/net/if_l2tp.c: revision 1.23

Improve comment. Pointed out by maxv@n.o, thanks.

Fix my previous mistake and an odd unaligned case. Pointed out by maxv@n.o, thanks.
It must be a rare case that requires this copy routine...
 1.11.2.6 09-Apr-2018  bouyer Pull up following revision(s) (requested by knakahara in ticket #725):
sys/net/if_l2tp.c: revision 1.21
Fix l2tp(4) alignment check. Pointed out and reviewed by k-goda@IIJ.
The alignment check should be done for the address of m_data instead of
the value of m_data.
XXX pullup-8
 1.11.2.5 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #614):
sys/net/if_l2tp.c: revision 1.20
sys/netinet6/in6_l2tp.c: revision 1.13
sys/netinet6/in6_l2tp.c: revision 1.14
sys/net/if_l2tp.h: revision 1.3
sys/net/if_l2tp.c: revision 1.13
sys/netinet/in_l2tp.c: revision 1.10
sys/net/if_l2tp.c: revision 1.18
sys/netinet/in_l2tp.c: revision 1.11
sys/net/if_l2tp.c: revision 1.19
sys/netinet/in_l2tp.c: revision 1.12

If if_attach() failed in the attach function, return. Add comments about if_initialize().
suggested by ozaki-r@n.o.

Fix null deref, m could be NULL if M_PREPEND fails.

style

Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.

Several fixes in L2TP:
* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.
* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.
* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.
* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.
* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.
* in6_l2tp_input(): same changes as in_l2tp_input().
Ok knakahara@

Use MH_ALIGN instead, ok knakahara@.
 1.11.2.4 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.11.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.11.2.2 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #349):
sys/net/if_l2tp.c: revision 1.14
sys/net/if_tap.c: revision 1.101
sys/net/if_tun.c: revision 1.141
sys/net/if_vlan.c: revision 1.106
Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use
if_link_state_change
 1.11.2.1 06-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #341):
sys/net/if_l2tp.c: revision 1.12
fix l2tp panic when l2tp session id is changed (same as if_vlan.c:r1.104)
E.g. the following operation causes this panic.
====================
# ifconfig l2tp0 create
# ifconfig l2tp0 session 140 140
# ifconfig l2tp1 create
# ifconfig l2tp1 session 200 200
# ifconfig l2tp1 session 300 300
panic: kernel diagnostic assertion "new->ple_next == NULL" failed: file "/disk4/home/k-nakahara/repos/netbsd-src/sys/sys/pslist.h", line 118
====================
Pointed out by s-yamaguchi@IIJ, thanks.
 1.15.2.2 03-Dec-2017  jdolecek update from HEAD
 1.15.2.1 16-Nov-2017  jdolecek file if_l2tp.c was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.20.2.8 18-Jan-2019  pgoyette Synch with HEAD
 1.20.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.20.2.6 20-Oct-2018  pgoyette Sync with head
 1.20.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.20.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.20.2.3 21-May-2018  pgoyette Sync with HEAD
 1.20.2.2 02-May-2018  pgoyette Synch with HEAD
 1.20.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.29.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.29.2.1 10-Jun-2019  christos Sync with HEAD
 1.35.2.2 01-Nov-2019  martin Pull up following revision(s) (requested by knakahara in ticket #387):

sys/net/if_gre.c: revision 1.176
sys/net/if_l2tp.c: revision 1.40
sys/dev/pci/ixgbe/ix_txrx.c: revision 1.56
sys/net/if_tap.c: revision 1.114

Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.35.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.40.2.1 29-Feb-2020  ad Sync with head.
 1.46.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.48.4.1 03-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #455):
sys/dev/pci/ixgbe/ixgbe.c: revision 1.347
sys/net/if_l2tp.c: revision 1.49
tests/net/if_vlan/t_vlan.sh: revision 1.25
sys/net/if_vlan.c: revision 1.171
sys/net/if_ethersubr.c: revision 1.326
sys/dev/pci/ixgbe/ixv.c: revision 1.194
Use ether_bpf_mtap only when the device supports vlan hardware tagging
The function is bpf_mtap() for ethernet devices and *currently*
it is just handling VLAN tag stripped by the hardware.
l2tp(4): use ether_ifattach() to initialize ethercom
Support vlan(4) over l2tp(4)
Added the test for vlan over l2tp
 1.49.4.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.10 16-Mar-2021  knakahara Fix l2tp(4) ioctl type. Pointed out by yamaguchi@n.o, thanks.

XXX pullup-[89]
 1.9 01-Feb-2020  riastradh branches: 1.9.6;
Switch if_l2tp to atomic_load/store_*.

Fix missing membar_datadep_consumer -- now atomic_load_consume -- in
l2tp_lookup_session_ref.
 1.8 19-Sep-2019  knakahara branches: 1.8.2;
Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
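
The indirection described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `percpu_area_model`, `rtcache_model`, `area_enlarge`, `slot_init` and `slot_get` are hypothetical names invented for illustration and are not the percpu(9) API or the kernel's rtcache. The model shows that when the per-CPU area is replaced by a larger one, an object reached through a stored pointer keeps its address, whereas anything stored directly in the area would have moved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached route; not the kernel's struct route. */
struct rtcache_model {
	int ro_valid;
};

/* Toy model of one CPU's percpu area: a block that may be reallocated. */
struct percpu_area_model {
	unsigned char *base;
	size_t size;
};

/*
 * Model of enlarging the area: allocate a larger block, copy the old
 * contents, and destroy the old block.  Any pointer into the old block
 * dangles afterwards.
 */
static int
area_enlarge(struct percpu_area_model *a, size_t newsize)
{
	unsigned char *n = malloc(newsize);

	if (n == NULL)
		return -1;
	memcpy(n, a->base, a->size);
	free(a->base);
	a->base = n;
	a->size = newsize;
	return 0;
}

/*
 * Store only a pointer in the area.  The rtcache itself lives in its own
 * allocation, so its address is unaffected by area_enlarge().
 */
static struct rtcache_model *
slot_init(struct percpu_area_model *a)
{
	struct rtcache_model *ro = calloc(1, sizeof(*ro));

	if (ro != NULL)
		memcpy(a->base, &ro, sizeof(ro));
	return ro;
}

/* Re-read the pointer from the (possibly relocated) area. */
static struct rtcache_model *
slot_get(const struct percpu_area_model *a)
{
	struct rtcache_model *ro;

	memcpy(&ro, a->base, sizeof(ro));
	return ro;
}
```

In this model a caller that obtained the rtcache pointer before the area was enlarged can keep using it, because only the slot holding the pointer was relocated.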
 1.7 19-Sep-2019  knakahara Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).
 1.6 19-Oct-2018  knakahara branches: 1.6.4;
Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.5 27-Apr-2018  knakahara branches: 1.5.2;
Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.4 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.3 30-Oct-2017  knakahara branches: 1.3.2; 1.3.4;
If if_attach() failed in the attach function, return. Add comments about if_initialize().

suggested by ozaki-r@n.o.
 1.2 31-May-2017  knakahara branches: 1.2.2; 1.2.6;
remove obsoleted comment. pointed out by s-yamaguchi@IIJ.
 1.1 16-Feb-2017  knakahara branches: 1.1.2; 1.1.6;
add missing files.
 1.1.6.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.6.1 16-Feb-2017  bouyer file if_l2tp.h was added on branch bouyer-socketcan on 2017-04-21 16:54:05 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file if_l2tp.h was added on branch pgoyette-localcount on 2017-03-20 06:57:50 +0000
 1.2.6.2 28-Aug-2017  skrll Sync with HEAD
 1.2.6.1 31-May-2017  skrll file if_l2tp.h was added on branch nick-nhusb on 2017-08-28 17:53:11 +0000
 1.2.2.5 22-Mar-2021  martin Pull up following revision(s) (requested by knakahara in ticket #1665):

sys/net/if_l2tp.h: revision 1.10

Fix l2tp(4) ioctl type. Pointed out by yamaguchi@n.o, thanks.
XXX pullup-[89]
 1.2.2.4 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.2.2.3 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.2.2.2 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.2.2.1 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #614):
sys/net/if_l2tp.c: revision 1.20
sys/netinet6/in6_l2tp.c: revision 1.13
sys/netinet6/in6_l2tp.c: revision 1.14
sys/net/if_l2tp.h: revision 1.3
sys/net/if_l2tp.c: revision 1.13
sys/netinet/in_l2tp.c: revision 1.10
sys/net/if_l2tp.c: revision 1.18
sys/netinet/in_l2tp.c: revision 1.11
sys/net/if_l2tp.c: revision 1.19
sys/netinet/in_l2tp.c: revision 1.12

If if_attach() failed in the attach function, return. Add comments about if_initialize().
suggested by ozaki-r@n.o.

Fix null deref, m could be NULL if M_PREPEND fails.

style

Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.

Several fixes in L2TP:
* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.
* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.
* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.
* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.
* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.
* in6_l2tp_input(): same changes as in_l2tp_input().
Ok knakahara@

Use MH_ALIGN instead, ok knakahara@.
 1.3.4.3 20-Oct-2018  pgoyette Sync with head
 1.3.4.2 02-May-2018  pgoyette Sync with HEAD
 1.3.4.1 22-Apr-2018  pgoyette Sync with HEAD
 1.3.2.2 03-Dec-2017  jdolecek update from HEAD
 1.3.2.1 30-Oct-2017  jdolecek file if_l2tp.h was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.5.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.5.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.5.2.1 10-Jun-2019  christos Sync with HEAD
 1.6.4.2 22-Mar-2021  martin Pull up following revision(s) (requested by knakahara in ticket #1233):

sys/net/if_l2tp.h: revision 1.10

Fix l2tp(4) ioctl type. Pointed out by yamaguchi@n.o, thanks.
XXX pullup-[89]
 1.6.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.8.2.1 29-Feb-2020  ad Sync with head.
 1.9.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.35 19-Nov-2022  yamt Make arp have its own mowner

This helped me to debug mbuf leaks in arp.
(if_arp.c rev. 1.298)
 1.34 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.33 11-Sep-2020  roy ARP: Use ND rather than our own.

This brings the benefit of Neighbour Unreachability Detection which is
something ARP sorely lacks.

The new timings mirror those of IPv6 and are adjustable via sysctl(8).
Unlike IPv6 ND, these are global and not per interface.
 1.32 11-Sep-2020  roy if_llatbl.c: adjust for nd changes
 1.31 25-Sep-2019  ozaki-r Make panic messages more informative
 1.30 10-Jul-2018  kre Update previous so that there is no unused (but assigned) variable
left when there is no ARP. Thanks gcc!
 1.29 10-Jul-2018  kre Avoid attempting to call arp related functions if there is no
arp in the kernel.
 1.28 10-Jul-2018  ozaki-r Don't overwrite an existing llentry on RTM_ADD to avoid race conditions

Reported and tested by christos@
 1.27 05-Jun-2018  nonaka branches: 1.27.2;
It is necessary to set wall time instead of monotonic time to rmx_expire.
 1.26 06-Mar-2018  ozaki-r Use pool(9) for llentry allocations

llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.25 06-Mar-2018  ozaki-r Fix memory leaks on arp -d and ndp -d for static entries

We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.
 1.24 06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.23 14-Feb-2018  maxv branches: 1.23.2;
Remove IFF_STATICARP, we don't support this, and the code is useless in its
current form.

ok ozaki-r@
 1.22 10-Nov-2017  ozaki-r branches: 1.22.2;
Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable code inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
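
The wait cycle described above can be modelled as a small "waits for" table. This is a minimal sketch, not kernel code: the party names, `NOBODY`, and `in_wait_cycle` are hypothetical and exist only for illustration. Each party waits for at most one other party, and a deadlock corresponds to following the chain from a party and arriving back at it.

```c
#include <assert.h>

/* Hypothetical labels for the three parties in the commit message. */
enum { ROUTE_UPDATE, SOFTINT, WLOCK_HOLDER, NPARTIES };
#define NOBODY (-1)

/*
 * waits_for[p] names the party that p is blocked on, or NOBODY.
 * Returns 1 if following the chain from start leads back to start.
 */
static int
in_wait_cycle(const int waits_for[NPARTIES], int start)
{
	int cur = waits_for[start];
	int steps;

	for (steps = 0; steps < NPARTIES && cur != NOBODY; steps++) {
		if (cur == start)
			return 1;
		cur = waits_for[cur];
	}
	return 0;
}
```

With the table { SOFTINT, WLOCK_HOLDER, ROUTE_UPDATE } (route update waits for the softint, the softint waits for the holder of IF_AFDATA_WLOCK, and the holder waits for the route update) the function reports a cycle. Once the lock holder no longer waits for the route update, which is how this sketch models moving rtalloc1 outside IF_AFDATA_WLOCK, the last entry becomes NOBODY and the cycle disappears.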
 1.21 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.20 23-Jun-2017  ozaki-r Tweak lltable_sysctl_dumparp

- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
 1.19 22-Jun-2017  ozaki-r Purge all related L2 caches on removing a route

The change addresses situations similar to PR 51179.
 1.18 03-Mar-2017  msaitoh branches: 1.18.6;
Add missing opt_net_mpsafe.h.
 1.17 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.16 21-Dec-2016  ozaki-r branches: 1.16.2;
Fix deadlock between llentry timers and destruction of llentry

llentry timer (of nd6) holds both llentry's lock and softnet_lock.
A caller also holds them and calls callout_halt to wait for the
timer to quit. However we can pass only one lock to callout_halt,
so passing either of them can cause a deadlock. Fix it by not holding
llentry's lock when calling callout_halt.

BTW in the first place we cannot pass llentry's lock to callout_halt
because it's a rwlock...
 1.15 11-Oct-2016  roy Mark arprequest static and introduce arpannounce so that gratuitous
ARP requests are only sent from valid addresses.
 1.14 16-Jun-2016  ozaki-r branches: 1.14.2;
Use if_get_byindex instead of if_byindex for MP-safe
 1.13 06-Apr-2016  ozaki-r Fill rtm_addrs properly

This fixes arp(8) unexpectedly showing "(weird)" for every entry on
some archs (only 32-bit?).

Confirmed on evbarm by ryo@ and i386 by me.
 1.12 06-Apr-2016  ozaki-r Fill sdl with sockaddr_dl_init

And add an assertion of if_addrlen and ll_addr.

From christos@
 1.11 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.10 16-Feb-2016  ozaki-r Remove workaround for GATEWAY

The workaround was introduced because lltable/llentry uses rwlock
but it may be executed in hardware interrupt due to fast forward.
Now we don't run fast forward in hardware interrupt anymore, so
we can remove the workaround.
 1.9 26-Nov-2015  ozaki-r Fix build dependency of if_llatbl.c

if_llatbl.c is required if inet or inet6 is enabled. Depending on ether
doesn't suit the NDP case.
 1.8 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is applied on a per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped excess packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). This may be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.7 20-Oct-2015  ozaki-r Stop using softnet_lock (fix possible deadlock)

Using softnet_lock for mutual exclusion between lltable_free and
arptimer was wrong and could cause a deadlock between them.
lltable_free waits for arptimer to complete by calling
callout_halt with softnet_lock, which is held in arptimer. However,
lltable_free also holds llentry's lock, which is also taken in
arptimer, so arptimer never obtains the lock and neither side makes
progress. We have to pass llentry's lock to callout_halt instead.
 1.6 30-Sep-2015  ozaki-r Make GATEWAY (fastforward) work again

With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.

This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.

I hope fastforward runs in softint some day in the future...
 1.5 28-Sep-2015  ozaki-r Tweak mutex_enter(softnet_lock) position

The previous code took locks in the following order:
- LLE_WLOCKs
- mutex_enter(softnet_lock)
- LLE_WUNLOCKs
- mutex_exit(softnet_lock)

This fix moves mutex_enter(softnet_lock) before LLE_WLOCKs.
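
The ordering rule can be sketched with a tiny lock-order checker. This is only an illustrative model under stated assumptions: the numeric ranks, `order_checker`, `lock_acquire`, `lock_release`, `old_order` and `new_order` are hypothetical names, not kernel interfaces, and the release order used in `new_order` is an assumption since the message only states where the acquisition moved.

```c
#include <assert.h>

/* Hypothetical ranks: a lock with a lower rank must be taken first. */
enum lock_rank { RANK_SOFTNET = 1, RANK_LLE = 2 };

struct order_checker {
	unsigned held;		/* bitmask of ranks currently held */
	int violations;		/* out-of-order acquisitions seen */
};

static void
lock_acquire(struct order_checker *c, enum lock_rank r)
{
	/* Any held lock with a higher rank means we are out of order. */
	if ((c->held >> (r + 1)) != 0)
		c->violations++;
	c->held |= 1u << r;
}

static void
lock_release(struct order_checker *c, enum lock_rank r)
{
	c->held &= ~(1u << r);
}

/* Sequence listed in the commit message for the previous code. */
static int
old_order(void)
{
	struct order_checker c = { 0, 0 };

	lock_acquire(&c, RANK_LLE);
	lock_acquire(&c, RANK_SOFTNET);	/* taken while LLE is held */
	lock_release(&c, RANK_LLE);
	lock_release(&c, RANK_SOFTNET);
	return c.violations;
}

/* softnet_lock first, then the llentry lock (release order assumed). */
static int
new_order(void)
{
	struct order_checker c = { 0, 0 };

	lock_acquire(&c, RANK_SOFTNET);
	lock_acquire(&c, RANK_LLE);
	lock_release(&c, RANK_LLE);
	lock_release(&c, RANK_SOFTNET);
	return c.violations;
}
```

In this model the old sequence records one ordering violation and the new sequence records none.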
 1.4 09-Sep-2015  ozaki-r branches: 1.4.2;
Fix race condition on la_rt between lltable_free and other places touching la_rt

We always have to hold softnet_lock when touching la_rt. And we have to
use callout_halt with softnet_lock instead of callout_stop for
la_timer (arptimer) because arptimer holds softnet_lock inside it.

This fix may solve a kernel panic christos@ encountered.
 1.3 31-Aug-2015  pooka #if __NetBSD__ -> #if defined(__NetBSD__)
 1.2 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.1 31-Aug-2015  ozaki-r Import lltable/llentry from FreeBSD

lltable/llentry are new L2 nexthop cache data structures that
store caches in each interface (struct ifnet). They are imported
to replace the current ARP cache implementation that uses the
global list with the big kernel lock, and to provide fine-grain
locking for cache operations. They are also planned to replace
NDP caches.

The code is based on FreeBSD's lltable/llentry as of r286629
and tweaked for NetBSD.
 1.4.2.9 28-Aug-2017  skrll Sync with HEAD
 1.4.2.8 05-Feb-2017  skrll Sync with HEAD
 1.4.2.7 05-Dec-2016  skrll Sync with HEAD
 1.4.2.6 09-Jul-2016  skrll Sync with HEAD
 1.4.2.5 22-Apr-2016  skrll Sync with HEAD
 1.4.2.4 19-Mar-2016  skrll Sync with HEAD
 1.4.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.4.2.2 22-Sep-2015  skrll Sync with HEAD
 1.4.2.1 09-Sep-2015  skrll file if_llatbl.c was added on branch nick-nhusb on 2015-09-22 12:06:10 +0000
 1.14.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.14.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.14.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.16.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.18.6.7 11-Jul-2018  martin Additionally pullup src/sys/net/if_llatbl.c r1.30 to fix build fallout
from previous, requested by both ozaki-r (ticket #918) and kre (ticket #920):

Update previous so that there is no unused (but assigned) variable
left when there is no ARP. Thanks gcc!
 1.18.6.6 10-Jul-2018  martin Additionally pull up the following, requested by ozaki-r in ticket #918:

src/sys/net/if_llatbl.c 1.29

Avoid attempting to call arp related functions if there is no
arp in the kernel.
 1.18.6.5 10-Jul-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #918):

sys/net/if_llatbl.c: revision 1.28

Don't overwrite an existing llentry on RTM_ADD to avoid race conditions
Reported and tested by christos@
 1.18.6.4 09-Jun-2018  martin Pull up following revision(s) (requested by nonaka in ticket #862):

sys/net/if_llatbl.c: revision 1.27

It is necessary to set wall time instead of monotonic time to rmx_expire.
 1.18.6.3 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.18.6.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable code inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.18.6.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Allow removing multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries that have an identical
destination on different interfaces. This is normal and can be
reproduced easily with ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove the matching entries, so we can remove the multiple entries
described above. This fetch-all-then-remove-selectively behavior is
the same as arp <ip> and ndp <ip>; they also fetch all entries
and show only the matching entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.22.2.2 03-Dec-2017  jdolecek update from HEAD
 1.22.2.1 10-Nov-2017  jdolecek file if_llatbl.c was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.23.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.23.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.23.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.27.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.27.2.1 10-Jun-2019  christos Sync with HEAD
 1.19 19-Nov-2022  yamt Make arp have its own mowner

This helped me to debug mbuf leaks in arp.
(if_arp.c rev. 1.298)
 1.18 14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.17 18-Jul-2019  ozaki-r Show pointers of llentries on trace logs of LLE_REF_TRACE
 1.16 12-Jul-2018  ozaki-r Don't use aprint_* functions for logging unrelated to autoconf(9)
 1.15 19-Apr-2018  christos branches: 1.15.2;
s/static inline/static __inline/g for consistency.
 1.14 06-Mar-2018  ozaki-r Use pool(9) for llentry allocations

llentry is easily leaked, and pool suits it because a pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.13 10-Nov-2017  ozaki-r branches: 1.13.2; 1.13.4;
Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.

If a route update is in progress, rtalloc1 waits for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy, which needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable code that runs inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.12 23-Jun-2017  ozaki-r Tweak lltable_sysctl_dumparp

- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
 1.11 22-Jun-2017  ozaki-r Purge all related L2 caches on removing a route

The change addresses situations similar to PR 51179.
 1.10 21-Dec-2016  ozaki-r branches: 1.10.8;
Fix deadlock between llentry timers and destruction of llentry

llentry timer (of nd6) holds both llentry's lock and softnet_lock.
A caller also holds them and calls callout_halt to wait for the
timer to quit. However we can pass only one lock to callout_halt,
so passing either of them can cause a deadlock. Fix it by avoiding
calling callout_halt without holding llentry's lock.

BTW in the first place we cannot pass llentry's lock to callout_halt
because it's a rwlock...
 1.9 04-Apr-2016  ozaki-r branches: 1.9.2;
Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.8 16-Feb-2016  ozaki-r Remove workaround for GATEWAY

The workaround was introduced because lltable/llentry uses rwlock
but it may be executed in hardware interrupt due to fast forward.
Now we don't run fast forward in hardware interrupt anymore, so
we can remove the workaround.
 1.7 17-Dec-2015  ozaki-r Fix memory leak of llentry#la_opaque

llentry#la_opaque which is for token ring is allocated in arp.c
and freed in arp.c when freeing llentry. However, llentry can be
freed from other places, e.g., lltable_free. In such cases,
la_opaque is never freed.

To fix that, add a new callback (lle_ll_free) to llentry and
register a destruction function of la_opaque to it. On freeing a
llentry, we can surely free la_opaque via the callback.
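
The callback approach described for rev 1.7 can be sketched in userland C. This is a minimal illustration only: struct entry, entry_alloc, entry_free and opaque_free are hypothetical stand-ins, not the kernel's llentry code, and the counter exists purely so the effect can be observed.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an llentry; not the kernel structure. */
struct entry {
	void *opaque;                     /* link-layer private data */
	void (*ll_free)(struct entry *);  /* optional destructor for opaque */
};

static int opaque_frees;              /* counts destructor calls, for illustration */

/* Destructor registered by the code that allocated opaque. */
static void
opaque_free(struct entry *e)
{
	free(e->opaque);
	e->opaque = NULL;
	opaque_frees++;
}

/* Allocate an entry with private data and register its destructor. */
static struct entry *
entry_alloc(size_t opaque_size)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return NULL;
	e->opaque = malloc(opaque_size);
	if (e->opaque == NULL) {
		free(e);
		return NULL;
	}
	e->ll_free = opaque_free;
	return e;
}

/* Single release path: whoever frees the entry also runs the destructor. */
static void
entry_free(struct entry *e)
{
	if (e->ll_free != NULL)
		e->ll_free(e);
	free(e);
}
```

Because the destructor travels with the entry, any code path that ends in entry_free releases the private data, regardless of which module allocated it.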
 1.6 25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.5 05-Nov-2015  ozaki-r Improve lock traces and add reference traces
 1.4 09-Oct-2015  ozaki-r Fix LLE_TRY_UPGRADE when GATEWAY

It's expected to return a value.
 1.3 30-Sep-2015  ozaki-r Make GATEWAY (fastforward) work again

With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.

This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.

I hope fastforward runs in softint some day in the future...
 1.2 31-Aug-2015  ozaki-r branches: 1.2.2;
Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.1 31-Aug-2015  ozaki-r Import lltable/llentry from FreeBSD

lltable/llentry is new L2 nexthop cache data structures that
store caches in each interface (struct ifnet). It is imported
to replace the current ARP cache implementation that uses the
global list with the big kernel lock, and provide fine-grain
locking for cache operations. It is also planned to replace
NDP caches.

The code is based on FreeBSD's lltable/llentry as of r286629
and tweaked for NetBSD.
 1.2.2.7 28-Aug-2017  skrll Sync with HEAD
 1.2.2.6 05-Feb-2017  skrll Sync with HEAD
 1.2.2.5 22-Apr-2016  skrll Sync with HEAD
 1.2.2.4 19-Mar-2016  skrll Sync with HEAD
 1.2.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.2.2 22-Sep-2015  skrll Sync with HEAD
 1.2.2.1 31-Aug-2015  skrll file if_llatbl.h was added on branch nick-nhusb on 2015-09-22 12:06:10 +0000
 1.9.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.8.3 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because a pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.10.8.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.
If a route update is in progress, rtalloc1 waits for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy, which needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable code that runs inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.10.8.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Allow removing multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries that have an identical
destination on different interfaces. This is normal and can be
reproduced easily with ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove the matching entries, so we can remove the multiple entries
described above. This fetch-all-then-remove-selectively behavior is
the same as arp <ip> and ndp <ip>; they also fetch all entries
and show only the matching entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.13.4.3 28-Jul-2018  pgoyette Sync with HEAD
 1.13.4.2 22-Apr-2018  pgoyette Sync with HEAD
 1.13.4.1 15-Mar-2018  pgoyette Synch with HEAD
 1.13.2.2 03-Dec-2017  jdolecek update from HEAD
 1.13.2.1 10-Nov-2017  jdolecek file if_llatbl.h was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.15.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.15.2.1 10-Jun-2019  christos Sync with HEAD
 1.23 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.22 03-Feb-2021  roy if_llc.h: Replace __packed with CTASSERT
 1.21 05-Sep-2014  matt branches: 1.21.32;
Don't use class as a structure member.
 1.20 08-Sep-2008  gmcgarry branches: 1.20.38;
Replace most gcc-specific __attribute__ uses with BSD-style sys/cdefs.h
preprocessor macros.
 1.19 20-Feb-2008  matt branches: 1.19.6; 1.19.10; 1.19.12; 1.19.16;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.18 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.17 10-Dec-2006  is branches: 1.17.20; 1.17.26; 1.17.28; 1.17.32;
define some more magic LLC constants
 1.16 14-May-2006  christos branches: 1.16.8; 1.16.10; 1.16.12;
Comment out packed attributes that gcc 4 does not like.
 1.15 10-Dec-2005  elad branches: 1.15.4; 1.15.6; 1.15.8; 1.15.12;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.14 07-Aug-2003  agc branches: 1.14.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 10-Apr-2001  thorpej branches: 1.13.22;
Add 802.1D (Spanning Tree) LSAP code.
 1.12 19-Nov-1999  thorpej branches: 1.12.6;
Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.11 22-Mar-1999  bad branches: 1.11.8; 1.11.14;
Add LLC_SNAPFRAMELEN.
 1.10 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.9 02-May-1997  christos Rename the pdu fields and don't add the bogus cast on the frmrinfo define.
 1.8 01-May-1997  christos Bring back to life struct frmrinfo and llc_frmrinfo; these are used in netccitt
 1.7 01-May-1997  christos PR/3462: William Studenmund: sizeof(struct llc) returns 10 on m68k instead
of 8. Since structure padding on the m68k is 16 and on the arm is 32, we
rearrange the frmrinfo portion of the union not to contain a second structure.
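
A size mismatch of the kind reported in PR/3462 can be caught at build time. The following is a minimal sketch, assuming a C11 compiler and a hypothetical 8-byte header (struct snap_hdr and its field names are illustrative, not the real struct llc). It uses only byte-sized members so that typical ABIs have no reason to insert padding, and it asserts the expected layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire-format header; the field names are illustrative. */
struct snap_hdr {
	uint8_t dsap;
	uint8_t ssap;
	uint8_t control;
	uint8_t org_code[3];
	uint8_t ether_type[2];  /* kept as bytes to avoid alignment padding */
};

/* Fail the build if the layout ever diverges from the 8-byte wire size. */
_Static_assert(sizeof(struct snap_hdr) == 8, "snap_hdr must be 8 bytes");
_Static_assert(offsetof(struct snap_hdr, ether_type) == 6,
    "ether_type must sit at offset 6");

/* Read the big-endian type field without assuming host byte order. */
static uint16_t
snap_ether_type(const struct snap_hdr *h)
{
	return (uint16_t)((h->ether_type[0] << 8) | h->ether_type[1]);
}
```

On a platform whose ABI pads the structure differently, the build stops at the assertion instead of producing a header with the wrong length.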
 1.6 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.11.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.8.2 21-Apr-2001  bouyer Sync with HEAD
 1.11.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.13.22.4 11-Dec-2005  christos Sync with head.
 1.13.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.22.1 03-Aug-2004  skrll Sync with HEAD
 1.14.16.4 27-Feb-2008  yamt sync with head.
 1.14.16.3 21-Jan-2008  yamt sync with head
 1.14.16.2 30-Dec-2006  yamt sync with head.
 1.14.16.1 21-Jun-2006  yamt sync with head.
 1.15.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.15.8.1 24-May-2006  yamt sync with head.
 1.15.6.1 01-Jun-2006  kardel Sync with head.
 1.15.4.1 09-Sep-2006  rpaulo sync with head
 1.16.12.1 18-Dec-2006  tron Pull up following revision(s) (requested by is in ticket #280):
sys/net/if_llc.h: revision 1.17
define some more magic LLC constants
 1.16.10.1 18-Dec-2006  yamt sync with head.
 1.16.8.1 12-Jan-2007  ad Sync with head.
 1.17.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.17.28.1 26-Dec-2007  ad Sync with head.
 1.17.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.17.20.2 23-Mar-2008  matt sync with HEAD
 1.17.20.1 09-Jan-2008  matt sync with HEAD
 1.19.16.1 19-Oct-2008  haad Sync with HEAD.
 1.19.12.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.19.10.1 04-May-2009  yamt sync with head.
 1.19.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.20.38.1 03-Dec-2017  jdolecek update from HEAD
 1.21.32.1 03-Apr-2021  thorpej Sync with HEAD.
 1.119 21-Sep-2025  christos Centralize all the "can't handle af%d\n", messages in one place and provide
more context. Now I get ad-nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.118 04-Sep-2022  thorpej branches: 1.118.8;
Fix "MPLS handled this" detection logic in the rump environment.
 1.117 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.116 03-Sep-2022  thorpej Convert MPLS from a legacy netisr to pktqueue.
 1.115 03-Sep-2022  thorpej Convert NETATALK from a legacy netisr to pktqueue.
 1.114 31-Jul-2022  mlelstv Count dropped packets caused by ENOBUFS as interface error.
 1.113 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.112 14-Oct-2020  roy branches: 1.112.6;
loop: set LINK_STATE_UP a touch earlier
 1.111 14-Oct-2020  roy loop: this interface's link state cannot be anything other than UP

Let's not pretend it's UNKNOWN anymore.
 1.110 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.109 14-Nov-2019  msaitoh branches: 1.109.2;
Fix comment.
 1.108 11-Nov-2019  msaitoh Fix IP broadcast + checksum offload problem.

When a machine sends an IP broadcast packet to an Ethernet interface that has
checksum offload flags set, the packet goes through ether_output() ->
looutput() and the offload flags are cleared without calculating the checksum.
Then ip_input() calculates the packet's checksum because its csum_flags is
zero. The checksum is regarded as bad and the packet is dropped because the
packet's ifp is not lo0. Fix this bug by passing csum_flags as "calculated
and good" when IN_LOOPBACK_NEED_CHECKSUM() is false. Advised by ryo@.

This problem was seen when "routed -s" was used and the machine's interface's
offload flags were set. The bad checksum field of "netstat -s" increased every
30 minutes.
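
The decision described in rev 1.108 can be sketched as a small pure function. This is an illustration under stated assumptions: the flag names and values below are hypothetical and are not the kernel's M_CSUM_* definitions, and need_checksum only mirrors the role that IN_LOOPBACK_NEED_CHECKSUM() plays in the log message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's M_CSUM_* values. */
#define CSUM_OFFLOAD_PENDING 0x1u  /* sender left the checksum to hardware */
#define CSUM_VERIFIED_GOOD   0x2u  /* receiver may skip verification */

/*
 * Choose the flags a looped-back packet carries into the input path.
 * When need_checksum is true, the caller is assumed to have filled in the
 * checksum in software, so the input path can verify it normally.
 */
static uint32_t
loopback_csum_flags(uint32_t out_flags, bool need_checksum)
{
	(void)out_flags;  /* the pending-offload request is never carried over */
	if (need_checksum)
		return 0;                  /* input path verifies in software */
	return CSUM_VERIFIED_GOOD;         /* unfilled checksum is not re-verified */
}
```

The point of the sketch is the second branch: clearing the flags to zero there would make the input path verify a checksum that nobody computed.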
 1.107 26-Apr-2019  pgoyette branches: 1.107.2;
Set the "required modules" to NULL, not to an empty string.

It really doesn't make that much difference to the code, but the output
from modstat(8) is different! (With an empty string in the MODULE() macro
modstat reports an empty string, but with a NULL in the macro, modstat
prints a '-' just like it does for other "empty" fields.)
 1.106 15-Nov-2018  maxv Simplify the mtag API:

- Remove m_tag_init(), m_tag_first(), m_tag_next() and
m_tag_delete_nonpersistent().

- Remove the 't' argument from m_tag_delete_chain().
 1.105 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.104 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.103 28-Jun-2018  ozaki-r branches: 1.103.2;
loop: don't allocate an unnecessary link-state-change thread
 1.102 26-Jun-2018  msaitoh Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misidentified in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.101 19-Dec-2017  ozaki-r branches: 1.101.2;
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point

Recent investigations show that interfaces with IFEF_MPSAFE need to follow
additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.

Note that enabling IFEF_MPSAFE alone brings little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.100 06-Dec-2017  ozaki-r Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

Also ensure it is turned off before destruction, as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful, but I believe the
change is still proper.)
 1.99 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.98 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.97 15-Nov-2017  ozaki-r Don't take KERNEL_LOCK in looutput if NET_MPSAFE

We can perhaps get rid of KERNEL_LOCK from looutput, but for now
keep it for safety.
 1.96 23-Oct-2017  msaitoh If if_attach() failed in the attach function, free resources and return.
 1.95 21-Sep-2017  knakahara loop_clone_create() must be called after ncpu is counted up for all CPUs.

loop_clone_create() uses ncpu in the following call-path.
- loop_clone_create()
- if_attach()
- if_percpuq_create()
- softint_establish() // use ncpu
- percpu_foreach() // use ncpu

However, loopinit() of the built-in module is called from
module_init_class(MODULE_CLASS_DRIVER), which is called before ncpu is counted
up on some architectures. So that is too early.
On the other hand, it is too late for the rump netinet component to call
loop_clone_create() in config_finalize().

As a result, loop_clone_create() should be called in loopattach() for the built-in
module, and in loopinit() for the dynamic module.

XXX need pullup -8 branch
 1.94 28-Mar-2017  ozaki-r branches: 1.94.6;
Avoid touching a mbuf after enqueuing it
 1.93 22-Nov-2016  ozaki-r branches: 1.93.2;
Make lortrequest static and rename it to loop_rtrequest

No functional change.
 1.92 11-Aug-2016  kre Avoid init'ing lo0 twice ... which rump kernels do without this hack.
If rump gets fixed, this could be removed (though it is harmless in
any case.)

This should fix several more of the currently failing ATF tests.
 1.91 10-Aug-2016  kre On the first day (that being the eighth day of the eighth month,) the
building was completed only to discover that within there lay havoc.

On the second day all just groaned and moaned, and it must be someone
else's problem.

On the third day, St. Martin stepped in and traced the culprit, which
provided inspiration, and a correction was made.

Forevermore all were agog at just how such a trivial thing could do
so much damage...


OK... to be a little less vague. The loopback interface is a truly
"special" thing, and rump knew that - and treated it very specially.
Unfortunately, when the loopback interface is changed, and rump does
not keep up, bad things happen.

This (overall) might, or might not, be the correct fix - but for now
it appears to work. If someone, sometime, finds a better way to
deal with the issues of the loopback interface's true majesty, feel
free to revert this and do it another way.
 1.90 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.89 22-Jun-2016  knakahara branches: 1.89.2;
fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is a nop. Currently, that means
IFQ_ENQUEUE() in some paths such as bridge_enqueue() is wrongly called in
parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.88 20-Jun-2016  knakahara make looutput() MP-safe, so that lo(4) can enable IFEF_OUTPUT_MPSAFE.

making MP-scalable is future work.
 1.87 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receive interface of an mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.86 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.85 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.84 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.83 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.82 25-May-2015  ozaki-r Remove leftover IPX-related stuff

No objection on tech-kern and tech-net.
 1.81 03-Apr-2015  ozaki-r Don't grab KERNEL_LOCK during if_output when NET_MPSAFE

The change makes L3 MP-safe work easy. At this point
we deal with only IP forwarding.

No functional change when NET_MPSAFE isn't enabled.
 1.80 07-Jun-2014  rmind branches: 1.80.4;
lostart: silence gcc warning (XXX: gcc is not right though).
 1.79 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.78 20-May-2014  pooka if_free() instead of direct call to free() to avoid diagnostic panic

Bug exposed by justin's Lua ljsyscall tests:
http://build.myriabit.eu:8012/waterfall
 1.77 13-May-2014  bouyer Make sure *(if_output)() is called with KERNEL_LOCK held.
Add some KASSERT for this.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details.
 1.76 01-Mar-2013  joerg branches: 1.76.6; 1.76.10;
Retire OSI network stack. OK core@
 1.75 20-Jun-2011  kefren branches: 1.75.2; 1.75.8; 1.75.12; 1.75.14; 1.75.18;
Avoid computing INET[6] cksums for MPLS packets
 1.74 17-Jun-2011  kefren teach loopback about MPLS. Prerequisite for MPLS tunnels
 1.73 25-Apr-2011  yamt branches: 1.73.2;
undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
 1.72 05-Apr-2010  joerg branches: 1.72.2;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well as the fourth argument for
bpf_attach.
 1.71 19-Jan-2010  pooka branches: 1.71.2; 1.71.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.70 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}
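
As a rough illustration of the four-way switch above, here is a minimal
standalone sketch. The action names and the mapping from flag state to action
are hypothetical and only show the shape of the pattern; a real driver would
call its own stop/init routines. The flag values are assumed to mirror the
usual <net/if.h> definitions.

```c
#include <assert.h>

/* Assumed to mirror the usual BSD <net/if.h> values; illustrative only. */
#define IFF_UP		0x0001
#define IFF_RUNNING	0x0040

/* Hypothetical actions, not names used by any driver. */
enum if_action { ACT_NONE, ACT_STOP, ACT_INIT, ACT_RECONFIG };

static enum if_action
flags_to_action(unsigned int flags)
{
	enum if_action act = ACT_NONE;

	switch (flags & (IFF_UP|IFF_RUNNING)) {
	case 0:				/* down and stopped: nothing to do */
		act = ACT_NONE;
		break;
	case IFF_RUNNING:		/* marked down but still running */
		act = ACT_STOP;
		break;
	case IFF_UP:			/* marked up but not yet running */
		act = ACT_INIT;
		break;
	case IFF_UP|IFF_RUNNING:	/* up and running: refresh state */
		act = ACT_RECONFIG;
		break;
	}
	/* The mask admits only the four combinations handled above. */
	assert(act >= ACT_NONE && act <= ACT_RECONFIG);
	return act;
}
```

Because the mask leaves only two bits, the four cases are exhaustive and no
default label is needed.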

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
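
The callback convention described above (ENETRESET means "reset by calling
the init routine", 0 means success, anything else is a failure) might be
sketched as follows. This is a standalone mock, not the kernel code: struct
mock_if, mock_flags_changed() and the sample callbacks are hypothetical names
invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface with a registered flags callback. */
struct mock_if {
	int (*flags_cb)(struct mock_if *);	/* reacts to an if_flags change */
	int (*init)(struct mock_if *);		/* full reinitialization */
	int init_calls;				/* counts calls to init */
};

static int
mock_init(struct mock_if *ifp)
{
	ifp->init_calls++;
	return 0;
}

/* Sample callbacks covering the three outcomes named in the text. */
static int cb_needs_reset(struct mock_if *ifp) { (void)ifp; return ENETRESET; }
static int cb_ok(struct mock_if *ifp) { (void)ifp; return 0; }
static int cb_fail(struct mock_if *ifp) { (void)ifp; return EIO; }

/* Run the callback; reinitialize only when it asks for a reset. */
static int
mock_flags_changed(struct mock_if *ifp)
{
	int error;

	assert(ifp != NULL && ifp->flags_cb != NULL && ifp->init != NULL);
	error = (*ifp->flags_cb)(ifp);
	if (error == ENETRESET)
		return (*ifp->init)(ifp);
	return error;
}
```

The point of the convention is that a driver which can apply a flag change
cheaply returns 0 and avoids a full reset, while one that cannot simply
returns ENETRESET and lets the common code reinitialize it.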

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
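
A check of this kind could look roughly like the sketch below. It is not the
code in ether_ioctl(); lladdr_is_acceptable() is a hypothetical helper, and it
relies only on the standard Ethernet convention that the least significant bit
of the first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/* Hypothetical helper: true if addr may be set as a link-layer address. */
static bool
lladdr_is_acceptable(const uint8_t *addr)
{
	static const uint8_t zero[ETHER_ADDR_LEN] = { 0 };

	assert(addr != NULL);

	/* Group bit set: multicast, which includes ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;
	/* Reject the all-zero address 00:00:00:00:00:00. */
	if (memcmp(addr, zero, ETHER_ADDR_LEN) == 0)
		return false;
	return true;
}
```

With this helper, the address 02:de:ad:be:ef:02 from the summary is accepted,
while 00:00:00:00:00:00, ff:ff:ff:ff:ff:ff and any address with an odd first
octet are refused.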

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.69 24-Oct-2008  dyoung branches: 1.69.2;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.68 15-Jun-2008  christos branches: 1.68.2;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.67 07-Feb-2008  dyoung branches: 1.67.6; 1.67.8; 1.67.10; 1.67.12; 1.67.14;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.66 19-Oct-2007  ad branches: 1.66.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.65 01-Sep-2007  dyoung branches: 1.65.4;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.64 04-Mar-2007  christos branches: 1.64.2; 1.64.10; 1.64.14; 1.64.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.63 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.62 16-Nov-2006  christos branches: 1.62.4;
__unused removal on arguments; approved by core.
 1.61 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.60 09-Oct-2006  peter Remove unneeded usage of LIST_*.

ok cube@
 1.59 08-Oct-2006  martin Make lo* always have the IFF_RUNNING flag set, to appease some
SNMP apps. Fixes PR kern/11830.
 1.58 07-Sep-2006  dogcow branches: 1.58.2; 1.58.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.57 11-Dec-2005  thorpej branches: 1.57.4; 1.57.8;
ANSI function decls and application of static.
 1.56 11-Dec-2005  christos merge ktrace-lwp.
 1.55 26-Feb-2005  perry branches: 1.55.4;
nuke trailing whitespace
 1.54 05-Dec-2004  peter branches: 1.54.4; 1.54.6;
Use ANSI function decls, change a few 0 to NULL.
 1.53 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.52 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.51 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.50 21-Apr-2004  itojun kill sprintf, use snprintf
 1.49 13-Nov-2003  jonathan Add m_tag_delete_nonpersistent(), for deleting all packet tags on
mbuf chains which are recycled (e.g., ICMP reflection, loopback
interface). A consensus was reached that such recycled packets should
behave (more-or-less) the same way if a new chain had been allocated
and the contents copied to that chain.

Some packet tags may in future be marked as "persistent" (e.g., for
mandatory access controls) and should persist across such deletion.
NetBSD as yet has no persistent tags, so m_tag_delete_nonpersistent()
just deletes all tags. This should not be relied upon.
 1.48 15-Aug-2003  jonathan Make if_loop MTU settable via SIOCSIFMTU/ifconfig. Useful for testing,
and for regression-testing performance at various MTUs.

NB: route MTU may not track MTU changes, which may cause problems for
AF_ISO if loopback MTU is decreased. I've never seen problems with IP,
in various tests going back to around NetBSD 1.3.
 1.47 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.46 23-Jun-2003  martin branches: 1.46.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.45 16-May-2003  itojun use strlcpy
 1.44 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.43 01-May-2003  itojun bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.42 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.41 26-Sep-2002  darrenr Fix a case where M_COPY_PKTHDR() wasn't being used prior to calling bpf_mtap
 1.40 12-Nov-2001  lukem add RCSIDs
 1.39 14-Jun-2001  itojun branches: 1.39.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange got updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.38 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.37 20-Feb-2001  itojun branches: 1.37.2;
explicitly use u_int32_t for DLT_NULL encapsulation.

correct gif address family. from chopps, sync with kame.
 1.36 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding the 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.35 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need to call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.34 18-Dec-2000  thorpej Oops, make this build without ALTQ again.
 1.33 18-Dec-2000  thorpej Fill in if_dlt.
 1.32 18-Dec-2000  thorpej Add ALTQ support. This is used for testing/debugging ALTQ only. It
is triggered only on loopback interfaces, and not simplex interfaces
(which also use looutput()).
 1.31 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.30 30-Mar-2000  augustss Kill some more register declarations.
 1.29 06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.28 15-Dec-1999  itojun change mbuf trimming.
confirmed that ping -s 1480 (my ip addr) panicked before, and works fine now.

possibly fixes PR: 8990
 1.27 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.26 01-Jul-1999  itojun branches: 1.26.2; 1.26.8;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.25 05-Jul-1998  jonathan branches: 1.25.6; 1.25.10; 1.25.12;
defopt NS, NSIP.
 1.24 05-Jul-1998  jonathan defopt ISO TPIP.
 1.23 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.22 04-May-1998  christos Add IPX bits.
 1.21 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.20 14-Aug-1997  jonathan Add MHLEN + MLEN extra space to LOMTU for IP and transport headers.
 1.19 02-Apr-1997  christos branches: 1.19.4;
Add netatalk stubs.
 1.18 21-Mar-1997  mycroft Don't feed packets to BPF that were not `sent' from the loopback device.
From PR 1693, by Jean-Luc Richier.
 1.17 13-Oct-1996  christos backout previous kprintf change
 1.16 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.15 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.14 23-Jul-1995  mycroft Make panic message consistent.
 1.13 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.12 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.11 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.10 02-Feb-1994  hpeyerl Multicast is no longer optional
 1.9 17-Dec-1993  mycroft From magnum branch:
Remove Jolitz's netisr kluge. Make sure cpl == 0 really means base priority.
Other minor cleanup.
 1.8 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.7 23-Nov-1993  deraadt rename loattach() to loopattach() so that the pdevinit[] stuff can find it.
 1.6 27-Jun-1993  andrew branches: 1.6.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.5 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.4 07-May-1993  cgd patch for multiple loopback interfaces (via "pseudo-device loop 2", etc.)
from David Burren <davidb@otto.bf.rmit.oz.au>
 1.3 10-Apr-1993  glass fixed missing include to avoid warning
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.4.4 14-Nov-1993  mycroft Canonicalize all #includes.
 1.6.4.3 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.6.4.2 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.6.4.1 14-Sep-1993  mycroft loattach() --> loopattach()
 1.19.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.25.12.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.25.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.25.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.25.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.26.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.26.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.26.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.26.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.26.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.26.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.26.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.37.2.3 18-Oct-2002  nathanw Catch up to -current.
 1.37.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.37.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.39.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.39.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.46.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.46.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.46.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.46.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.46.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.46.2.1 03-Aug-2004  skrll Sync with HEAD
 1.54.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.54.4.1 29-Apr-2005  kent sync with -current
 1.55.4.6 11-Feb-2008  yamt sync with head.
 1.55.4.5 27-Oct-2007  yamt sync with head.
 1.55.4.4 03-Sep-2007  yamt sync with head.
 1.55.4.3 26-Feb-2007  yamt sync with head.
 1.55.4.2 30-Dec-2006  yamt sync with head.
 1.55.4.1 21-Jun-2006  yamt sync with head.
 1.57.8.1 14-Sep-2006  yamt sync with head.
 1.57.4.1 09-Sep-2006  rpaulo sync with head
 1.58.4.2 10-Dec-2006  yamt sync with head.
 1.58.4.1 22-Oct-2006  yamt sync with head
 1.58.2.1 18-Nov-2006  ad Sync with head.
 1.62.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.62.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.64.16.2 23-Mar-2008  matt sync with HEAD
 1.64.16.1 06-Nov-2007  matt sync with HEAD
 1.64.14.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.64.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.64.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.64.2.2 23-Oct-2007  ad Sync with head.
 1.64.2.1 09-Oct-2007  ad Sync with head.
 1.65.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.66.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.67.14.1 18-Jun-2008  simonb Sync with head.
 1.67.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.67.10.3 11-Aug-2010  yamt sync with head.
 1.67.10.2 11-Mar-2010  yamt sync with head
 1.67.10.1 04-May-2009  yamt sync with head.
 1.67.8.1 17-Jun-2008  yamt sync with head.
 1.67.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.67.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.68.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.69.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.71.4.2 31-May-2011  rmind sync with head
 1.71.4.1 30-May-2010  rmind sync with head
 1.71.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.72.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.73.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.75.18.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.75.14.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.75.12.3 03-Dec-2017  jdolecek update from HEAD
 1.75.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.75.12.1 23-Jun-2013  tls resync from head
 1.75.8.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.75.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.76.10.1 10-Aug-2014  tls Rebase.
 1.76.6.1 18-May-2014  rmind sync with head
 1.80.4.9 28-Aug-2017  skrll Sync with HEAD
 1.80.4.8 05-Dec-2016  skrll Sync with HEAD
 1.80.4.7 05-Oct-2016  skrll Sync with HEAD
 1.80.4.6 09-Jul-2016  skrll Sync with HEAD
 1.80.4.5 29-May-2016  skrll Sync with HEAD
 1.80.4.4 22-Apr-2016  skrll Sync with HEAD
 1.80.4.3 22-Sep-2015  skrll Sync with HEAD
 1.80.4.2 06-Jun-2015  skrll Sync with HEAD
 1.80.4.1 06-Apr-2015  skrll Sync with HEAD
 1.89.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.89.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.93.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.94.6.5 14-Nov-2019  martin Pull up the following revisions, requested by msaitoh in ticket #1438:

sys/net/if_loop.c 1.108-1.109 via patch

Fix a bug where an IP broadcast packet looped back to the sender
is dropped with a bad checksum when the interface's checksum
offload is enabled.
 1.94.6.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires adding KERNEL_LOCK to functions subsequently called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure not to turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful, though I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no one else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE by itself yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
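
The macros mentioned in this pull-up, which hide the NET_MPSAFE switch, can be sketched roughly as below. This is an illustrative userland sketch only: every SK_/sk_ name is hypothetical, the lock is modelled as a plain counter, and the real kernel macros may differ in name and detail.

```c
#include <stddef.h>

/* Hypothetical stand-in for the big kernel lock: a simple depth counter. */
static int sk_klock_depth;

static void sk_kernel_lock(int n, void *l)  { (void)l; sk_klock_depth += n; }
static void sk_kernel_unlock_one(void *l)   { (void)l; sk_klock_depth -= 1; }

/*
 * One pair of macros replaces "#ifndef NET_MPSAFE ... #endif" blocks at
 * every call site.  SK_NET_MPSAFE plays the role of the build-time switch.
 */
#ifndef SK_NET_MPSAFE
#define SK_KERNEL_LOCK_UNLESS_NET_MPSAFE()   sk_kernel_lock(1, NULL)
#define SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE() sk_kernel_unlock_one(NULL)
#else
#define SK_KERNEL_LOCK_UNLESS_NET_MPSAFE()   do { } while (0)
#define SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE() do { } while (0)
#endif

/* A call site: returns the lock depth observed inside the critical section. */
static int sk_output(void)
{
	int held;

	SK_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	held = sk_klock_depth;
	SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return held;
}
```

With the switch undefined, sk_output() runs with the lock held once and releases it on exit; with the switch defined, the same source compiles to a lock-free path, and any remaining explicit lock calls stand out when grepping.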
 1.94.6.3 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating a softint). Without this change, the kernel panics. That is bad because
resource shortage really does occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. The calling function can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
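
The pattern this pull-up applies across drivers (initialization returns an int, and the attach function frees what it allocated and returns instead of panicking) can be sketched as follows. This is a minimal userland illustration: sk_softc, sk_if_initialize() and sk_attach() are hypothetical stand-ins and do not reproduce the kernel's if_initialize()/if_attach() signatures.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state. */
struct sk_softc {
	void *rx_ring;   /* resource allocated before interface initialization */
	int   attached;  /* nonzero once attach has fully succeeded */
};

/* Hypothetical stand-in: returns 0 on success or an errno value. */
static int sk_if_initialize(int simulate_failure)
{
	return simulate_failure ? ENOMEM : 0;
}

/* Attach: on any failure, undo earlier allocations and report the error. */
static int sk_attach(struct sk_softc *sc, int simulate_failure)
{
	int error;

	sc->attached = 0;
	sc->rx_ring = malloc(64);
	if (sc->rx_ring == NULL)
		return ENOMEM;

	error = sk_if_initialize(simulate_failure);
	if (error != 0) {
		/* Free resources and return rather than panic. */
		free(sc->rx_ring);
		sc->rx_ring = NULL;
		return error;
	}

	sc->attached = 1;
	return 0;
}
```

The caller only has to check the return value; after a failed attach no resources remain allocated.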
 1.94.6.2 23-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #382):
sys/net/if_bridge.c: revision 1.139
sys/net/if_loop.c: revision 1.97
Don't take KERNEL_LOCK in looutput if NET_MPSAFE
We can perhaps get rid of KERNEL_LOCK from looutput, but for now
keep it for safety.
--
Mark callouts of bridge CALLOUT_MPSAFE
 1.94.6.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #304):
sys/net/if_loop.c: revision 1.95
loop_clone_create() must be called after ncpu is counted up for all CPUs.
loop_clone_create() uses ncpu in the following call-path.
- loop_clone_create()
- if_attach()
- if_percpuq_create()
- softint_establish() // use ncpu
- percpu_foreach() // use ncpu
However, loopinit() of the built-in module is called from
module_init_class(MODULE_CLASS_DRIVER), which is called before ncpu is counted
up on some architectures. So, it is too early.
On the other hand, it is too late for the rump netinet component to call
loop_clone_create() in config_finalize().
As a result, loop_clone_create() should be called in loopattach() for the built-in
module, and in loopinit() for the dynamic module.
 1.101.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.101.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.101.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.103.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.103.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.103.2.1 10-Jun-2019  christos Sync with HEAD
 1.107.2.1 14-Nov-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #424):

sys/net/if_loop.c: revision 1.108
sys/net/if_loop.c: revision 1.109

Fix IP broadcast + checksum offload problem.

When a machine sends an IP broadcast packet to an Ethernet interface whose
checksum offload flags are set, the packet goes through ether_output() ->
looutput() and the offload flags are cleared without calculating the checksum.

Then ip_input() calculates the packet's checksum because its csum_flags is
zero. The checksum is regarded as bad and the packet is dropped because the packet's ifp
is not lo0's. Fix this bug by passing csum_flags as "calculated and good"
when IN_LOOPBACK_NEED_CHECKSUM() is false. Advised by ryo@.

This problem was seen when "routed -s" was used and the machine's interface's
offload flags were set. The bad checksum field of "netstat -s" increased every
30 seconds.

Fix comment.
 1.109.2.1 29-Feb-2020  ad Sync with head.
 1.112.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.118.8.2 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.118.8.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.54 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.53 06-Oct-2021  andvar s/acccess/access/
 1.52 15-Mar-2020  thorpej Define and implement a locking protocol for the ifmedia / mii layers:
- MP-safe drivers provide a mutex to ifmedia that is used to serialize
access to media-related structures / hardware registers. Converted
drivers use the new ifmedia_init_with_lock() function for this. The
new name is provided to ease the transition.
- Un-converted drivers continue to call ifmedia_init(), which will supply
a compatibility lock to be used instead. Several media-related entry
points must be aware of this compatibility lock, and are able to acquire
it recursively a limited number of times, if needed. This is a SPIN
mutex with priority IPL_NET.
- This same lock is used to serialize access to PHY registers and other
MII-related data structures.

The PHY drivers are modified to acquire and release the lock, as needed,
and assert the lock is held as a diagnostic aid.

The "usbnet" framework has had an overhaul of its internal locking
protocols to fit in with the media / mii changes, and the drivers adapted.

USB wifi drivers have been changed to provide their own adaptive mutex
to the ifmedia layer via a new ieee80211_media_init_with_lock() function.
This is required because the USB drivers need an adaptive mutex.

Besides "usbnet", a few other drivers are converted: vmx, wm, ixgbe / ixv.

mcx also now calls ifmedia_init_with_lock() because it needs to also use
an adaptive mutex. The mcx driver still needs to be fully converted to
NET_MPSAFE.
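
The two initialization paths described in this entry can be sketched as below. This is a simplified userland model under stated assumptions: all sk_ names are hypothetical, the mutex is a plain counter rather than a kernel spin or adaptive mutex, and the limited recursive acquisition of the compatibility lock is not modelled.

```c
#include <stddef.h>

/* Hypothetical toy mutex: "held" counts acquisitions. */
struct sk_mutex {
	int held;
};

static void sk_mutex_enter(struct sk_mutex *m) { m->held++; }
static void sk_mutex_exit(struct sk_mutex *m)  { m->held--; }

/* Hypothetical media state guarded by whichever lock was supplied. */
struct sk_ifmedia {
	struct sk_mutex *lock;
	int              cur;
};

/* Shared lock handed to drivers that have not been converted. */
static struct sk_mutex sk_compat_lock;

/* Converted drivers supply their own lock. */
static void sk_ifmedia_init_with_lock(struct sk_ifmedia *ifm,
    struct sk_mutex *lock)
{
	ifm->lock = lock;
	ifm->cur = 0;
}

/* Un-converted drivers keep the old entry point and get the shared lock. */
static void sk_ifmedia_init(struct sk_ifmedia *ifm)
{
	sk_ifmedia_init_with_lock(ifm, &sk_compat_lock);
}

/* Every media-related entry point serializes on ifm->lock. */
static void sk_ifmedia_set(struct sk_ifmedia *ifm, int media)
{
	sk_mutex_enter(ifm->lock);
	ifm->cur = media;
	sk_mutex_exit(ifm->lock);
}
```

Keeping the old entry point as a thin wrapper is what lets drivers be converted one at a time without changing the un-converted ones.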
 1.51 01-Feb-2020  thorpej - Add an ifmedia_fini() routine, to free resources associated with
an ifmedia. Currently calls ifmedia_removeall(). All drivers
that call ifmedia_init() and support detach should call this
routine.
- In ifmedia_delete_instance(), set ifm->ifm_cur to NULL and
ifm->ifm_media to IFM_NONE when removing / freeing that entry,
not simply when we've been asked to delete every media instance.
 1.50 31-Jan-2020  thorpej - Use kmem(9) instead of malloc(9).
- When handling SIOCGIFMEDIA, don't traverse the media list directly;
refactor that out into an ifmedia_getwords() function.
 1.49 20-Jan-2020  thorpej In ifmedia_ioctl(), go to splnet() before acquiring the KERNEL_LOCK.
For non-NET_MPSAFE, this is benign, because we can nest raising to
splnet(). For NET_MPSAFE, it means that drivers don't need to
raise to splnet() just in order to call ifmedia_ioctl().
 1.48 01-Oct-2019  chs branches: 1.48.2;
in ifmedia_add(), use a wait-style memory allocation rather than
not waiting and panic'ing if the allocation fails.

Reported-by: syzbot+249ca42197f0b066e154@syzkaller.appspotmail.com
 1.47 10-Aug-2019  mrg rename _ifmedia_ioctl() to ifmedia_ioctl_locked().
 1.46 21-May-2019  msaitoh KNF. No functional change.
 1.45 17-May-2019  msaitoh The max subtype of the ifmedia word is 31. It's too small for Ethernet now.
We currently use it up to 30. We should extend the limit to be able to use
speeds above 10Gbps. Our ifmedia(4) is inconvenient and has some problems,
so we should redesign the interface, but it's too late for netbsd-9 to do it.
So, we keep the data structure size and modify the structure a bit. The
strategy is almost the same as FreeBSD's. Many bits of IFM_OMASK for Ethernet
are unused, so use some of them for Ethernet's subtype.

The differences against FreeBSD are:
- We use NetBSD style compat code (i.e. no SIOCGIFXMEDIA).
- FreeBSD's IFM_ETH_XTYPE's bit location is from 11 to "14" even though
IFM_OMASK is from 8 to "15". We use _IFM_ETH_XTMASK from bit 13 to "15".
- FreeBSD changed the meaning of IFM_TYPE_MATCH(). I think we should
not do it. We keep it not changing and added new IFM_TYPE_SUBTYPE_MATCH()
macro for matching both TYPE and SUBTYPE.
- Added up to 400GBASE-SR16.

New layout of the media word is as follows (from if_media.h):

* if_media Options word:
*   Bits    Use
*   ----    -------
*   0-4     Media subtype   MAX SUBTYPE == 255 for ETH and 31 for others
*   5-7     Media type
*   8-15    Type specific options
*   16-18   Mode (for multi-mode devices)
*   19      (Reserved for Future Use)
*   20-27   Shared (global) options
*   28-31   Instance
*
*  3   2                   1
*  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
* +-------+---------------+-+-----+---------------+-----+---------+
* |       |               |R|     |               |     |         |
* | IMASK |     GMASK     |F|MMASK+-----+ OMASK   |NMASK|  TMASK  |
* |       |               |U|     |XTMSK|         |     |         |
* +-------+---------------+-+-----+-----+---------+-----+---------+
*  <----->                   <--->                 <--->
* IFM_INST()               IFM_MODE()            IFM_TYPE()
*
*                             IFM_SUBTYPE(other than ETH)<------->
*
*                                  <---> IFM_SUBTYPE(ETH)<------->
*
*
*          <------------->         <------------->
*                       IFM_OPTIONS()
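
As a rough illustration of the layout above, the following sketch extracts fields from a 32-bit media word. The mask values follow the bit ranges listed in the table; the SK_/sk_ names are hypothetical, the Ethernet type code 0x20 is an assumption, and placing the extra bits 13-15 directly above the five low subtype bits is one plausible reading of "MAX SUBTYPE == 255 for ETH", not a statement of how the header implements it.

```c
#include <stdint.h>

/* Masks derived from the bit ranges in the table (names are hypothetical). */
#define SK_TMASK   0x0000001fu  /* bits 0-4:   media subtype (low bits) */
#define SK_NMASK   0x000000e0u  /* bits 5-7:   media type */
#define SK_OMASK   0x0000ff00u  /* bits 8-15:  type specific options */
#define SK_XTMASK  0x0000e000u  /* bits 13-15: extra subtype bits, ETH only */
#define SK_MMASK   0x00070000u  /* bits 16-18: mode */
#define SK_GMASK   0x0ff00000u  /* bits 20-27: shared (global) options */
#define SK_IMASK   0xf0000000u  /* bits 28-31: instance */

#define SK_TYPE_ETHER 0x00000020u  /* assumed type code for Ethernet */

static inline uint32_t sk_type(uint32_t w) { return w & SK_NMASK; }
static inline uint32_t sk_mode(uint32_t w) { return w & SK_MMASK; }
static inline uint32_t sk_inst(uint32_t w) { return (w & SK_IMASK) >> 28; }

/*
 * Subtype: 5 bits for most media types; for Ethernet, bits 13-15 are
 * shifted down to sit above them, giving 8 bits (0..255).
 */
static inline uint32_t sk_subtype(uint32_t w)
{
	uint32_t low = w & SK_TMASK;

	if (sk_type(w) == SK_TYPE_ETHER)
		return low | ((w & SK_XTMASK) >> 8);
	return low;
}
```

Because the extra bits are carved out of the type-specific option field, the word stays 32 bits wide and non-Ethernet media types decode exactly as before.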
 1.44 10-May-2019  msaitoh Use %08x to print ifmedia word (IFMEDIA_DEBUG).
 1.43 23-Apr-2019  msaitoh KNF. No functional change.
 1.42 22-Apr-2019  msaitoh Add missing error check.
 1.41 16-Apr-2019  msaitoh It's not necessary (or possible) to convert OSIOCSIFMEDIA in ifmedia_ioctl()
because the conversion is done in doifioctl().
 1.40 10-Apr-2019  msaitoh KNF. No functional change.
 1.39 10-Apr-2019  msaitoh Fix a bug where OSIOCSIFMEDIA was not handled. Add missing inclusion of
compat/sys/sockio.h.
 1.38 28-Feb-2019  msaitoh - Remove extra cast.
- Cosmetic change.
 1.37 28-Feb-2019  msaitoh No functional change:
- Use TAILQ_FOREACH{,_SAFE}() macro.
- KNF.
 1.36 30-Mar-2018  mlelstv branches: 1.36.2;
add prototypes, validate ifm_change and ifm_status vectors. NFC.
 1.35 22-Nov-2017  ozaki-r branches: 1.35.2;
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE

If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.

This change requires adding KERNEL_LOCK to functions subsequently called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.

Proposed on tech-kern@ and tech-net@
 1.34 23-Oct-2017  msaitoh Clear ifm_cur and ifm_media after removing all ifmedia entries(IFM_INST_ANY)
in ifmedia_delete_instance() like if_media.c rev. 1.32.
Now ifmedia_delete_instance(IFM_INST_ANY) is the same as ifmedia_removeall().
 1.33 20-Oct-2017  msaitoh No functional change:
- Simplify ifmedia_removeall by using ifmedia_delete_instance(IFM_INST_ANY).
- KNF.
 1.32 25-Jan-2017  msaitoh branches: 1.32.6;
ifmedia_removeall(): Clear ifm_cur and ifm_media after removing all ifmedia
entries.
 1.31 25-Jan-2017  msaitoh ifmedia_init(): Clear ifm_media with IFM_NONE instead of 0.
 1.30 05-Oct-2009  dyoung branches: 1.30.22; 1.30.40; 1.30.44; 1.30.48;
Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
 1.29 15-Jun-2008  christos - Add more definitions from FreeBSD
- Add ifmedia_removeall from FreeBSD
 1.28 28-Apr-2008  martin branches: 1.28.2; 1.28.4;
Remove clause 3 and 4 from TNF licenses
 1.27 10-Jan-2008  dyoung branches: 1.27.6; 1.27.8; 1.27.10;
Add a helper subroutine for ethernet drivers, ifmedia_change().
 1.26 29-May-2007  christos branches: 1.26.8; 1.26.14; 1.26.20;
Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.25 11-Dec-2005  christos branches: 1.25.30; 1.25.32;
merge ktrace-lwp.
 1.24 26-Feb-2005  perry branches: 1.24.4;
nuke trailing whitespace
 1.23 08-Dec-2004  dyoung branches: 1.23.2; 1.23.4;
As pointed out by Greg Troxel, ifmedia_entrys were allocated with
malloc_type M_IFADDR and freed with malloc_type M_DEVBUF. This
causes a panic(9) in DIAGNOSTIC kernels. Add malloc_type M_IFMEDIA
and use it for both malloc'ing and free'ing ifmedia_entrys.
 1.22 09-Apr-2004  thorpej De-__P'ify.
 1.21 19-Feb-2004  ragge branches: 1.21.2; 1.21.4;
Add media type 10GbaseLR. Change ifmb_baudrate and ifmedia_baudrate()
to u_quad_t instead of int (common speed today exceeds 2Gbit).
 1.20 03-Nov-2003  briggs ifmedia_set() should not panic, nor can it really fail. So if there is
some problem setting the media to the requested value (usually IFM_AUTO),
we now force the media selection to IFM_NONE.
This addresses PR/14029 ``panic("ifmedia_set") a little too brutal''
and may address to some degree PR/19504 and PR/23341.
 1.19 25-Jul-2003  christos Avoid DOS attack by setting ifm->ifm_media to a high number and running the
kernel out of memory. Thanks to Andreas Oman.
 1.18 12-Nov-2002  chs branches: 1.18.6;
when there are multiple matches for the requested media, select the first
matching instance rather than the last one. this restores the behaviour
in the multiple-match case to what it was when all the drivers only allowed
instance 0 (and in particular, makes autonegotiation of the on-board fxp
on my DK440LX board work again by default, which has two PHYs that both
advertise "auto"). as discussed on tech-net.
 1.17 07-Nov-2002  thorpej Fix more signed/unsigned comparison warnings.
 1.16 11-Sep-2002  itojun KNF - return is not a function.
 1.15 12-Nov-2001  lukem add RCSIDs
 1.14 18-May-2001  drochner branches: 1.14.2;
fix typo in comment
 1.13 26-Feb-2001  joda branches: 1.13.2;
when changing to an unsupported media type, return EINVAL instead of
ENXIO
 1.12 17-Jan-2001  jdolecek make local const stuff as static const, so that it's pushed to text segment
 1.11 30-Mar-2000  augustss Kill some more register declarations.
 1.10 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.9 06-Mar-2000  thorpej Add ifmedia_baudrate(), which returns a value suitable for ifi_baudrate
given a media word, or 0 for unknown.
 1.8 26-Jan-2000  thorpej Add a way to delete all media for a specified instance.
 1.7 03-Nov-1999  thorpej Make the ifmedia_entry list a TAILQ. This is pretty much for cosmetics
(media added to tail, so that when e.g. the list is run to print out
what media exist, they appear in-order).
 1.6 27-Oct-1999  thorpej Expose the ifmedia_match() function.
 1.5 30-Apr-1999  thorpej branches: 1.5.2; 1.5.4; 1.5.6;
Back out previous. It was just ... braindamaged.
 1.4 30-Apr-1999  abs If the driver only supports one media type, and ifmedia_ioctl() is called to
select the current medium, (and it is not autoselect), assume no change and
do not try to select the medium. Fixes 'ifconfig le0 medium 10base5' on sparc2
without requiring a 'do nothing' mediachange callback.
 1.3 30-Aug-1998  enami branches: 1.3.6; 1.3.8;
Make this compile with -DIFMEDIA_DEBUG.
 1.2 06-Aug-1998  thorpej Completely rewrite the way media descriptions are represented. The same
data structure is used, but a much saner matching mechanism is used, one
which allows greater ease in adding new types.
 1.1 17-Mar-1997  thorpej BSD/OS-style network interface media selection, implemented by
Jonathan Stone and myself. Many thanks to Matt Thomas for providing
the information necessary to implement this interface, and for helping
to shake out the bugs.
 1.3.8.1 21-Jun-1999  thorpej Sync w/ -current.
 1.3.6.1 11-May-2000  he Pull up revision 1.9 (requested by jhawk):
Add a driver for ``wi'', Lucent "Orinoco"/Wavelan.
 1.5.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.4.1 15-Nov-1999  fvdl Sync with -current
 1.5.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.5.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.13.2.4 11-Nov-2002  nathanw Catch up to -current
 1.13.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.13.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.14.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.14.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.6.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.18.6.4 18-Dec-2004  skrll Sync with HEAD.
 1.18.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.6.1 03-Aug-2004  skrll Sync with HEAD
 1.21.4.1 07-Jan-2005  jdc Pull up revision 1.23 (requested by dyoung in ticket #1030).

As pointed out by Greg Troxel, ifmedia_entrys were allocated with
malloc_type M_IFADDR and freed with malloc_type M_DEVBUF. This
causes a panic(9) in DIAGNOSTIC kernels. Add malloc_type M_IFMEDIA
and use it for both malloc'ing and free'ing ifmedia_entrys.
 1.21.2.1 07-Jan-2005  jdc Pull up revision 1.23 (requested by dyoung in ticket #1030).

As pointed out by Greg Troxel, ifmedia_entrys were allocated with
malloc_type M_IFADDR and freed with malloc_type M_DEVBUF. This
causes a panic(9) in DIAGNOSTIC kernels. Add malloc_type M_IFMEDIA
and use it for both malloc'ing and free'ing ifmedia_entrys.
 1.23.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.23.2.1 29-Apr-2005  kent sync with -current
 1.24.4.2 21-Jan-2008  yamt sync with head
 1.24.4.1 03-Sep-2007  yamt sync with head.
 1.25.32.1 11-Jul-2007  mjf Sync with head.
 1.25.30.1 09-Jun-2007  ad Sync with head.
 1.26.20.1 10-Jan-2008  bouyer Sync with HEAD
 1.26.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.26.8.1 23-Mar-2008  matt sync with HEAD
 1.27.10.3 11-Mar-2010  yamt sync with head
 1.27.10.2 04-May-2009  yamt sync with head.
 1.27.10.1 16-May-2008  yamt sync with head.
 1.27.8.2 17-Jun-2008  yamt sync with head.
 1.27.8.1 18-May-2008  yamt sync with head.
 1.27.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.27.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.28.4.1 18-Jun-2008  simonb Sync with head.
 1.28.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.30.48.1 21-Apr-2017  bouyer Sync with HEAD
 1.30.44.1 20-Mar-2017  pgoyette Sync with HEAD
 1.30.40.1 05-Feb-2017  skrll Sync with HEAD
 1.30.22.1 03-Dec-2017  jdolecek update from HEAD
 1.32.6.3 14-May-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1266):

sys/net/if_media.c: revision 1.42 (via patch)

Add missing error check.
 1.32.6.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
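A minimal userland sketch of the macro idea described above, assuming hypothetical SK_ names and a plain counter standing in for the kernel lock. The actual macro names and definitions are not given in this log entry, so nothing here should be read as the kernel's own code.

```c
#include <assert.h>

/* Stand-in for the big kernel lock: a simple depth counter (hypothetical). */
static int sk_kernel_lock_depth;

static void sk_kernel_lock(void)
{
	sk_kernel_lock_depth++;
}

static void sk_kernel_unlock(void)
{
	assert(sk_kernel_lock_depth > 0);
	sk_kernel_lock_depth--;
}

/*
 * The build-time switch is hidden inside the macros, so callers do not
 * need their own "#ifndef ... lock ... #endif" blocks.
 * Building with -DSK_NET_MPSAFE compiles the lock calls away.
 */
#ifdef SK_NET_MPSAFE
#define SK_KERNEL_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define SK_KERNEL_LOCK_UNLESS_NET_MPSAFE()	sk_kernel_lock()
#define SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	sk_kernel_unlock()
#endif

/* Example caller: returns the lock depth observed inside the section. */
static int sk_do_non_mpsafe_work(void)
{
	int depth;

	SK_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = sk_kernel_lock_depth;
	SK_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Because every remaining lock site goes through one macro pair, searching for the macro name finds all places that still take the lock.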
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure not to turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. Avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.32.6.1 22-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #370):
sys/net/if_media.c: revision 1.33
sys/net/if_media.c: revision 1.34
No functional change:
- Simplify ifmedia_removeall using ifmedia_delete_instance(IFM_INST_ANY).
- KNF.
Clear ifm_cur and ifm_media after removing all ifmedia entries(IFM_INST_ANY)
in ifmedia_delete_instance() like if_media.c rev. 1.32.
Now ifmedia_delete_instance(IFM_INST_ANY) is the same as ifmedia_removeall().
 1.35.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.36.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.36.2.1 10-Jun-2019  christos Sync with HEAD
 1.48.2.2 29-Feb-2020  ad Sync with head.
 1.48.2.1 25-Jan-2020  ad Sync with head.
 1.72 18-Apr-2024  andvar s/resoure/resource/ in comments.
 1.71 15-Mar-2020  thorpej Define and implement a locking protocol for the ifmedia / mii layers:
- MP-safe drivers provide a mutex to ifmedia that is used to serialize
access to media-related structures / hardware registers. Converted
drivers use the new ifmedia_init_with_lock() function for this. The
new name is provided to ease the transition.
- Un-converted drivers continue to call ifmedia_init(), which will supply
a compatibility lock to be used instead. Several media-related entry
points must be aware of this compatibility lock, and are able to acquire
it recursively a limited number of times, if needed. This is a SPIN
mutex with priority IPL_NET.
- This same lock is used to serialize access to PHY registers and other
MII-related data structures.

The PHY drivers are modified to acquire and release the lock, as needed,
and assert the lock is held as a diagnostic aid.

The "usbnet" framework has had an overhaul of its internal locking
protocols to fit in with the media / mii changes, and the drivers adapted.

USB wifi drivers have been changed to provide their own adaptive mutex
to the ifmedia layer via a new ieee80211_media_init_with_lock() function.
This is required because the USB drivers need an adaptive mutex.

Besides "usbnet", a few other drivers are converted: vmx, wm, ixgbe / ixv.

mcx also now calls ifmedia_init_with_lock() because it needs to also use
an adaptive mutex. The mcx driver still needs to be fully converted to
NET_MPSAFE.
 1.70 17-Feb-2020  msaitoh - Remove 50GBASE-LR10.
- Add the following medias:
- 25GBASE-ACC
- 100GBASE-ACC
- 100GBASE-AOC
- 100GBASE-FR
- 100GBASE-LR
- 200GBASE-ER4
- 400GBASE-ER8
- 400GBASE-FR4
- 400GBASE-LR4
- 400GBASE-SR4.2
- 400GBASE-SR8
 1.69 01-Feb-2020  thorpej - Add an ifmedia_fini() routine, to free resources associated with
an ifmedia. Currently calls ifmedia_removeall(). All drivers
that call ifmedia_init() and support detach should call this
routine.
- In ifmedia_delete_instance(), set ifm->ifm_cur to NULL and
ifm->ifm_media to IFM_NONE when removing / freeing that entry,
not simply when we've been asked to delete every media instance.
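The second point can be pictured with a small userland sketch. It assumes simplified sk_ structures, a hypothetical wildcard SK_INST_ANY and a hypothetical "no media" word SK_MEDIA_NONE, and it uses a plain singly linked list rather than the kernel's queue macros; it is not the kernel routine itself.

```c
#include <assert.h>
#include <stddef.h>

#define SK_INST_ANY	(-1)	/* hypothetical wildcard: match every instance */
#define SK_MEDIA_NONE	0u	/* hypothetical "no media" word */

struct sk_entry {
	struct sk_entry *next;
	unsigned int media;	/* media word of this entry */
	int inst;		/* instance this entry belongs to */
};

struct sk_ifmedia {
	struct sk_entry *head;	/* list of supported media */
	struct sk_entry *cur;	/* currently selected entry, or NULL */
	unsigned int media;	/* current media word */
};

/*
 * Unlink every entry that belongs to "inst". Whenever the entry being
 * removed is the selected one, reset the selection, whether or not the
 * caller asked for SK_INST_ANY.
 */
static void sk_delete_instance(struct sk_ifmedia *ifm, int inst)
{
	struct sk_entry **pp;

	assert(ifm != NULL);
	pp = &ifm->head;
	while (*pp != NULL) {
		struct sk_entry *e = *pp;

		if (inst == SK_INST_ANY || e->inst == inst) {
			*pp = e->next;		/* unlink the entry */
			if (ifm->cur == e) {
				ifm->cur = NULL;
				ifm->media = SK_MEDIA_NONE;
			}
		} else {
			pp = &e->next;
		}
	}
}
```

Entries are owned by the caller in this sketch, so nothing is freed; a real implementation would also release the memory of each removed entry.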
 1.68 05-Dec-2019  msaitoh branches: 1.68.2;
Fix previous comment change for ifm_media. It was correct.

The real problem is that some drivers misuse ifm_media as the current active
media. struct mii_data has the current active media (mii_media_active). If a
driver uses mii(4), it can use mii->mii_media_active for this purpose.
struct ifmedia has no entry for this purpose. Some drivers have an entry
in their own softc to keep the value, but others don't have one and
mistakenly use ifm_media.

We might add a new entry to struct ifmedia in the future to avoid this confusion
and to simplify things.
 1.67 28-Nov-2019  msaitoh Fix comment. The ifm_media member of struct ifmedia is NOT user-set media
word but the current "active" media.

The user-set media word is one of the ifmedia_entry's ifm_media(A) that
ifm_cur points to (e.g. IFM_AUTO). It can be taken as ifmediareq's ifm_current
entry. The current active media word is the ifm_media(B) entry of struct
ifmedia (e.g 1000baseTX-FDX as the result of auto negotiation). It can be
taken as ifmediareq's ifm_active entry.

struct ifmedia_entry {
	TAILQ_ENTRY(ifmedia_entry) ifm_list;
	u_int	ifm_media;	/* IFMWD: description of this media */ /* A */
	u_int	ifm_data;	/* for driver-specific use */
	void	*ifm_aux;	/* for driver-specific use */
};

struct ifmedia {
	u_int	ifm_mask;	/* IFMWD: mask of changes we don't care */
	u_int	ifm_media;	/* IFMWD: current active media word */ /* B */
	struct ifmedia_entry *ifm_cur;	/* current user-selected media */
	TAILQ_HEAD(, ifmedia_entry) ifm_list; /* list of all supported media */
	ifm_change_cb_t	ifm_change;	/* media change driver callback */
	ifm_stat_cb_t	ifm_status;	/* media status driver callback */
};

So:

                  in kernel                  SIOCGIFMEDIA(ifmediareq)
-----------------------------------------------------------------
user-setting:     ifm->ifm_cur->ifm_media    ifm_current
current active:   ifm->ifm_media             ifm_active

It would be good to rename some members to make their meaning clear.
 1.66 03-Oct-2019  jmcneill Add IFM_IEEE80211_VHT subtype, IFM_IEEE80211_11AC operating mode, and missing descriptions
 1.65 17-May-2019  msaitoh branches: 1.65.2;
The max subtype of the ifmedia word is 31. It's too small for Ethernet now.
We currently use it up to 30. We should extend the limit to be able to use
more than 10Gbps speeds. Our ifmedia(4) is inconvenient and has some problems,
so we should redesign the interface, but it's too late for netbsd-9 to do it.
So, we keep the data structure size and modify the structure a bit. The
strategy is almost the same as FreeBSD's. Many bits of IFM_OMASK for Ethernet
are unused, so use some of them for Ethernet's subtype.

The differences against FreeBSD are:
- We use NetBSD style compat code (i.e. no SIOCGIFXMEDIA).
- FreeBSD's IFM_ETH_XTYPE's bit location is from 11 to "14" even though
IFM_OMASK is from 8 to "15". We use _IFM_ETH_XTMASK from bit 13 to "15".
- FreeBSD changed the meaning of IFM_TYPE_MATCH(). I think we should
not do it. We keep it not changing and added new IFM_TYPE_SUBTYPE_MATCH()
macro for matching both TYPE and SUBTYPE.
- Added up to 400GBASE-SR16.

New layout of the media word is as follows (from ifmedia_h):

* if_media Options word:
*   Bits    Use
*   ----    -------
*   0-4     Media subtype    MAX SUBTYPE == 255 for ETH and 31 for others
*   5-7     Media type
*   8-15    Type specific options
*   16-18   Mode (for multi-mode devices)
*   19      (Reserved for Future Use)
*   20-27   Shared (global) options
*   28-31   Instance
*
*    3                   2                   1
*  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
* +-------+---------------+-+-----+---------------+-----+---------+
* |       |               |R|     |               |     |         |
* | IMASK |     GMASK     |F|MMASK+-----+  OMASK  |NMASK|  TMASK  |
* |       |               |U|     |XTMSK|         |     |         |
* +-------+---------------+-+-----+-----+---------+-----+---------+
*  <----->                   <--->                 <--->
* IFM_INST()              IFM_MODE()            IFM_TYPE()
*
*                             IFM_SUBTYPE(other than ETH)<------->
*
*                                  <---> IFM_SUBTYPE(ETH)<------->
*
*
*          <------------->         <------------->
*                       IFM_OPTIONS()
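The layout above can be exercised with a short sketch. The mask values are derived only from the bit table in this entry; the SK_ names and helper functions are hypothetical and are not the macros in the real header. The point shown is how an 8-bit Ethernet subtype is split between bits 0-4 and the extended field in bits 13-15.

```c
#include <assert.h>
#include <stdint.h>

/* Masks derived from the bit table above; all SK_ names are hypothetical. */
#define SK_TMASK	0x0000001fu	/* bits 0-4: media subtype (low part) */
#define SK_NMASK	0x000000e0u	/* bits 5-7: media type */
#define SK_XTMASK	0x0000e000u	/* bits 13-15: extended Ethernet subtype */
#define SK_MMASK	0x00070000u	/* bits 16-18: mode */
#define SK_IMASK	0xf0000000u	/* bits 28-31: instance */
#define SK_XTSHIFT	(13 - 5)	/* moves bits 13-15 down to bits 5-7 */

/* Ethernet subtype: low 5 bits plus 3 extended bits, giving 0..255. */
static uint32_t sk_eth_subtype(uint32_t word)
{
	return (word & SK_TMASK) | ((word & SK_XTMASK) >> SK_XTSHIFT);
}

/* Store an 8-bit Ethernet subtype, leaving all other fields untouched. */
static uint32_t sk_eth_set_subtype(uint32_t word, uint32_t subtype)
{
	assert(subtype <= 255);
	word &= ~(SK_TMASK | SK_XTMASK);
	word |= subtype & SK_TMASK;
	word |= (subtype << SK_XTSHIFT) & SK_XTMASK;
	return word;
}

static uint32_t sk_type(uint32_t word)
{
	return word & SK_NMASK;
}

static uint32_t sk_mode(uint32_t word)
{
	return (word & SK_MMASK) >> 16;
}

static uint32_t sk_instance(uint32_t word)
{
	return (word & SK_IMASK) >> 28;
}
```

Subtypes of 31 or less land entirely in bits 0-4, so media words that predate the extension decode the same way as before.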
 1.64 10-May-2019  msaitoh Remove extra parenthesis.
 1.63 24-Apr-2019  msaitoh No functional change:
- IFM_AVALID and IFM_ACTIVE are NOT for the media word. Fix comment.
- RFU stands for Reserved for Future Use.
 1.62 17-Apr-2019  msaitoh Tabify. No functional change.
 1.61 04-Oct-2017  msaitoh branches: 1.61.4;
Add 2.5GBASE-T and 5GBASE-T.
 1.60 04-Oct-2017  msaitoh No Ethernet media faster than 1000Mbps supports half duplex.
For convenience, ifconfig without "mediaopt fullduplex" sets IFM_FDX
automatically for those media. Without this change, "ifconfig xxN mediaopt
10Gbase-T" (without "mediaopt fullduplex") returns EINVAL if a
driver calls ifmedia_add() only with IFM_FDX, because ifmedia_match()
returns NULL.
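A rough sketch of the matching problem this entry describes, under stated assumptions: SK_FDX is a hypothetical full-duplex option bit, the media list is a plain array of words, and the baud rate is passed in directly. Where the real change applies the flag is not shown in this log, so the helper below only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_FDX	0x00100000u	/* hypothetical full-duplex option bit */

/*
 * Media faster than 1000 Mbps have no half-duplex variant, so treat a
 * request without the full-duplex option as if it had been given.
 */
static uint32_t sk_normalize(uint32_t requested, uint64_t baudrate)
{
	if (baudrate > 1000000000ULL)
		requested |= SK_FDX;
	return requested;
}

/* Exact-match lookup in the list of media words a driver registered. */
static const uint32_t *sk_match(const uint32_t *list, size_t n, uint32_t target)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i] == target)
			return &list[i];
	}
	return NULL;
}
```

If a driver registered only the full-duplex word, the raw request finds no match, while the normalized request does.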
 1.59 08-Jun-2017  msaitoh - Add some missing baudrate entries
- Add 1000BASE-KX and 2500BASE-KX
 1.58 05-Jun-2017  msaitoh No functional change:
- Relocate definitions in the following order to be easy to understand.
0) IFM_*MASK
1) macros to extract various bits of information from the media word.
2) Media type.
3) Shared media sub-type.
4) Status bits.
5) Shared (global) options
6) Media dependent definitions.
7) kernel function declarations.
8) userland function declarations.
- Add comments.

This change makes me realize that:
0) The RFU bit has never been used.
1) Bits 1..0 are shared between Shared media sub-type and Status bits.
It's a little dangerous.
2) No. 5 of Media type is not used (hole).
3) Only IEEE80211 uses IFM_MMASK(IFM_MODE()) bits.
4) IFM_TOKEN's OMASK bits don't start from 0x00000100 but start from
0x00000200. Is this for BSD/OS compatibility?
 1.57 14-Sep-2016  roy branches: 1.57.8;
Introduce IFM_GENERIC.
This allows use of the media interface, but without media as such.
Its sole purpose is to facilitate the reporting of the link status.
 1.56 25-Oct-2012  msaitoh branches: 1.56.14; 1.56.18;
Add 1000baseT-FDX.
 1.55 20-Feb-2011  cegger branches: 1.55.4; 1.55.14;
add MBSS. From FreeBSD.
 1.54 26-Jan-2011  dyoung Add some 10-gigabit media words used by Intel 82599.
 1.53 05-Oct-2009  dyoung branches: 1.53.4; 1.53.6; 1.53.8;
Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
 1.52 12-Aug-2009  msaitoh Add 1000BASE-BX10.
 1.51 09-Sep-2008  mhitch Add support for SerDes controllers; from the OpenBSD driver. Tested on a
Dell Blade server by me, and an HP Blade server by Havard.
 1.50 15-Jun-2008  christos branches: 1.50.2;
- Add more definitions from FreeBSD
- Add ifmedia_removeall from FreeBSD
 1.49 28-Apr-2008  martin branches: 1.49.2; 1.49.4;
Remove clause 3 and 4 from TNF licenses
 1.48 13-Feb-2008  skrll branches: 1.48.4; 1.48.6; 1.48.8; 1.48.10;
CARP is Common *Address* Redundancy Protocol
 1.47 10-Jan-2008  dyoung Add a helper subroutine for ethernet drivers, ifmedia_change().
 1.46 03-Jun-2006  ragge branches: 1.46.32; 1.46.38; 1.46.46;
Add IFM_10G_SR and IFM_10G_CX4, to keep in sync with FreeBSD.
Kindly requested by Gleb Smirnoff at FreeBSD.
 1.45 18-May-2006  liamjfoy branches: 1.45.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.44 08-Mar-2006  lukem branches: 1.44.2;
Use the SI capitalization for "Hz", "kHz", and "MHz" in comments and strings.
Add a space between numbers and Hz unit.
 1.43 10-Dec-2005  elad branches: 1.43.4; 1.43.6; 1.43.8; 1.43.10;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.42 11-Nov-2004  dsl branches: 1.42.12;
Add prototypes for functions that convert media to/from strings.
In libutil/if_media.c (moved from ifconfig.c so that they can be shared
in crunched binaries)
 1.41 16-Oct-2004  dsl Put spaces either side of '|' for clarity ('|' looks too much like 'I' or 'l')
 1.40 09-Apr-2004  thorpej De-__P'ify.
 1.39 09-Apr-2004  thorpej Add flow control-related media bits / descriptions.

From HITOSHI Osada.
 1.38 10-Mar-2004  keihan branches: 1.38.4;
Add 10GBASE-LR to IFM_10G_LR.
 1.37 19-Feb-2004  ragge Add media type 10GbaseLR. Change ifmb_baudrate and ifmedia_baudrate()
to u_quad_t instead of int (common speed today exceeds 2Gbit).
 1.36 13-Oct-2003  dyoung Add constants and strings for 802.11 radios with OFDM PHY (802.11a,
802.11g).

Add constants and strings for multi-mode devices (a/b/g).

From FreeBSD/Sam Leffler.
 1.35 08-Jul-2003  itojun prototype must not have variable name
 1.34 23-Apr-2003  bjh21 branches: 1.34.2;
Accept standard IEEE 802.3 names for Ethernet medium types.
Suggested by Christos, IIRC.
 1.33 25-Feb-2003  dyoung Add support for Prism monitor mode. From Kevin Lahey
<kml@patheticgeek.net>.

This patch does NOT add monitor mode support for the Lucent radios.

awi(4) was only modified for compatibility with the new mediaopt.
It does NOT support monitor mode.

Tested by Kevin, Daniel Carosone, and I.
 1.32 07-Nov-2002  thorpej Fix more signed/unsigned comparison warnings.
 1.31 07-Nov-2002  thorpej Make ifm_data unsigned.
 1.30 07-Nov-2002  thorpej Make media and mask unsigned.
 1.29 27-Sep-2002  onoe Change ifmb_baudrate for IFM_IEEE80211_DS5: 5Mbps -> 5.5Mbps
 1.28 21-Aug-2002  onoe Delete IFM_IEEE80211_IBSS [ibss] and IFM_IEEE80211_IBSSMASTER [ibss-master]
from media options, since IEEE80211_ADHOC [adhoc] is already defined for
IBSS. Instead, [ibss] is assigned as an alias for IEEE80211_ADHOC.
 1.27 10-Aug-2002  thorpej Add "hostap", "ibss", and "ibss-master" 802.11 media options.

From OpenBSD.
 1.26 30-Jun-2001  kleink branches: 1.26.2; 1.26.14;
Rename an IFM_1000_TX occurrence missed in previous.
 1.25 30-Jun-2001  bjh21 IFM_1000_TX -> IFM_1000_T, as (briefly) discussed on tech-net.
 1.24 31-May-2001  thorpej Add an Ethernet option bit for master mode (for 1000baseTX, the link
master provides the clock -- this is normally the switch, but if you
are doing back-to-back NICs, you need to tell one side to be the master).
 1.23 06-Mar-2000  thorpej branches: 1.23.6;
Add ifmedia_baudrate(), which returns a value suitable for ifi_baudrate
given a media word, or 0 for unknown.
 1.22 17-Feb-2000  sommerfeld More 802.11 subtypes: there's also 1MB/s DS
(the BayStack 660 firmware claims to support it).
 1.21 16-Feb-2000  thorpej Fix TMASK to use all 5 lower bits of the media word, and add HomePNA 1.0.
 1.20 26-Jan-2000  thorpej Add a way to delete all media for a specified instance.
 1.19 25-Jan-2000  thorpej IFM_1000_FX -> IFM_1000_SX, like it's supposed to be, and add a few
more gigabit Ethernet types from FreeBSD.
 1.18 25-Jan-2000  thorpej Define some convenience information and tables related to ifmedia status
bits for ifconfig(8).
 1.17 24-Jan-2000  augustss Fix a typo.
 1.16 23-Jan-2000  chopps add 802.11 media types
 1.15 03-Nov-1999  thorpej Make the ifmedia_entry list a TAILQ. This is pretty much for cosmetics
(media added to tail, so that when e.g. the list is run to print out
what media exist, they appear in-order).
 1.14 27-Oct-1999  thorpej Expose the ifmedia_match() function.
 1.13 23-Mar-1999  thorpej branches: 1.13.2; 1.13.8; 1.13.10; 1.13.12;
Add a new shared media option, IFM_FLOW, used to enable link-level
flow control. IEEE 802.3x is in mind, but this could be generally
useful for different types of media.
 1.12 02-Nov-1998  thorpej Add "10baseT-FDX" and "100baseTX-FDX" aliases to the end of the subtype
table. These are actually subtype+option combos, but these are the
strings displayed by the MII code to indicate 10Mbps full-duplex and
100Mbps full-duplex respectively, and it's Nice that ifconfig(8) can
grok them.
 1.11 12-Aug-1998  thorpej Oops, I forgot aliases for some old names (10baseT/UTP, 10base2/BNC,
and 10base5/AUI).
 1.10 08-Aug-1998  thorpej Define IFM_INST_MAX, the largest possible "instance" value.
 1.9 06-Aug-1998  thorpej Define the minimum and maximum "network type" values. These values are
incremented by the minimum to iterate through them.
 1.8 06-Aug-1998  thorpej Add a macro to create a media word from type, subtype, options, and instance.
 1.7 06-Aug-1998  thorpej Completely rewrite the way media descriptions are represented. The same
data structure is used, but a much saner matching mechanism is used, one
which allows greater ease in adding new types.
 1.6 03-Aug-1998  thorpej Add IFM_10_FL - 10baseFL (fiber)
 1.5 30-Jan-1998  jtc Fix tipo
 1.4 30-Jan-1998  thorpej Add 1000baseFX and 10baseT/STP Ethernet media types.
 1.3 26-Mar-1997  thorpej Back out the previous change (add IFM_10_EXT) after some discussion
w/ BSDI and Matt Thomas.
 1.2 24-Mar-1997  thorpej Add the IFM_10_EXT ("external") ethernet subtype, to accommodate cards
that have the notion of an "external media port". Suggested by
Matt Thomas <matt@3am-software.com>.
 1.1 17-Mar-1997  thorpej BSD/OS-style network interface media selection, implemented by
Jonathan Stone and myself. Many thanks to Matt Thomas for providing
the information necessary to implement this interface, and for helping
to shake out the bugs.
 1.13.12.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.10.1 15-Nov-1999  fvdl Sync with -current
 1.13.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.2.1 11-May-2000  he Pull up revisions 1.16-1.17,1.19,1.21-1.23 (requested by jhawk):
Add a driver for ``wi'', Lucent "Orinoco"/Wavelan.
 1.23.6.6 11-Nov-2002  nathanw Catch up to -current
 1.23.6.5 18-Oct-2002  nathanw Catch up to -current.
 1.23.6.4 27-Aug-2002  nathanw Catch up to -current.
 1.23.6.3 13-Aug-2002  nathanw Catch up to -current.
 1.23.6.2 24-Aug-2001  nathanw Catch up with -current.
 1.23.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.26.14.1 29-Aug-2002  gehenna catch up with -current.
 1.26.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.26.2.1 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.34.2.6 11-Dec-2005  christos Sync with head.
 1.34.2.5 14-Nov-2004  skrll Sync with HEAD.
 1.34.2.4 19-Oct-2004  skrll Sync with HEAD
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.38.4.1 24-Jul-2005  snj Pull up revision 1.42 (requested by riz in ticket #5518):
Add prototypes for functions that convert media to/from strings.
In libutil/if_media.c (moved from ifconfig.c so that they can be shared
in crunched binaries)
 1.42.12.3 27-Feb-2008  yamt sync with head.
 1.42.12.2 21-Jan-2008  yamt sync with head
 1.42.12.1 21-Jun-2006  yamt sync with head.
 1.43.10.1 19-Apr-2006  elad sync with head.
 1.43.8.3 26-Jun-2006  yamt sync with head.
 1.43.8.2 24-May-2006  yamt sync with head.
 1.43.8.1 13-Mar-2006  yamt sync with head.
 1.43.6.3 03-Jun-2006  kardel Sync with head.
 1.43.6.2 01-Jun-2006  kardel Sync with head.
 1.43.6.1 22-Apr-2006  simonb Sync with head.
 1.43.4.1 09-Sep-2006  rpaulo sync with head
 1.44.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.45.2.1 19-Jun-2006  chap Sync with head.
 1.46.46.1 10-Jan-2008  bouyer Sync with HEAD
 1.46.38.1 18-Feb-2008  mjf Sync with HEAD.
 1.46.32.1 23-Mar-2008  matt sync with HEAD
 1.48.10.4 11-Mar-2010  yamt sync with head
 1.48.10.3 19-Aug-2009  yamt sync with head.
 1.48.10.2 04-May-2009  yamt sync with head.
 1.48.10.1 16-May-2008  yamt sync with head.
 1.48.8.2 17-Jun-2008  yamt sync with head.
 1.48.8.1 18-May-2008  yamt sync with head.
 1.48.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.48.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.48.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.48.4.1 23-Feb-2008  skrll Merge from FreeBSD.
 1.49.4.1 18-Jun-2008  simonb Sync with head.
 1.49.2.2 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.49.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.50.2.1 19-Oct-2008  haad Sync with HEAD.
 1.53.8.2 05-Mar-2011  bouyer Sync with HEAD
 1.53.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.53.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.53.4.1 05-Mar-2011  rmind sync with head
 1.55.14.2 03-Dec-2017  jdolecek update from HEAD
 1.55.14.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.55.4.1 30-Oct-2012  yamt sync with head
 1.56.18.1 04-Nov-2016  pgoyette Sync with HEAD
 1.56.14.2 28-Aug-2017  skrll Sync with HEAD
 1.56.14.1 05-Oct-2016  skrll Sync with HEAD
 1.57.8.2 21-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #366):
sys/net/if_media.h: revision 1.60
sys/net/if_media.h: revision 1.61
No Ethernet media faster than 1000Mbps supports half duplex.
For convenience, ifconfig without "mediaopt fullduplex" sets IFM_FDX
automatically for those media. Without this change, "ifconfig xxN mediaopt
10Gbase-T" (without "mediaopt fullduplex") returns EINVAL if a
driver calls ifmedia_add() only with IFM_FDX, because ifmedia_match()
returns NULL.
Add 2.5GBASE-T and 5GBASE-T.
 1.57.8.1 04-Jul-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #82):
sys/net/if_media.h: revision 1.58
sys/net/if_media.h: revision 1.59
No functional change:
- Relocate definitions in the following order to be easy to understand.
0) IFM_*MASK
1) macros to extract various bits of information from the media word.
2) Media type.
3) Shared media sub-type.
4) Status bits.
5) Shared (global) options
6) Media dependent definitions.
7) kernel function declarations.
8) userland function declarations.
- Add comments.
This change makes me realize that:
0) The RFU bit has never been used.
1) Bits 1..0 are shared between Shared media sub-type and Status bits.
It's a little dangerous.
2) No. 5 of Media type is not used (hole).
3) Only IEEE80211 uses IFM_MMASK(IFM_MODE()) bits.
4) IFM_TOKEN's OMASK bits don't start from 0x00000100 but start from
0x00000200. Is this for BSD/OS compatibility?
- Add some missing baudrate entries
- Add 1000BASE-KX and 2500BASE-KX
 1.61.4.4 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.61.4.3 08-Apr-2020  martin Merge changes from current as of 20200406
 1.61.4.2 10-Jun-2019  christos Sync with HEAD
 1.61.4.1 12-Jul-2018  phil State save. New kernel config for this branch only. TESTWIFI does
produce a kernel. It is not working. athn files not compiling yet
and commented out of the TESTWIFI kernel, which only has urtwn 802.11
driver enabled. ieee80211_alq.c and ieee80211_ddb.c not compiling yet.
 1.65.2.2 19-Mar-2020  martin Pull up following revision(s) (requested by msaitoh in ticket #785):

sys/net/if_media.h: revision 1.70

- Remove 50GBASE-LR10.

- Add the following medias:
- 25GBASE-ACC
- 100GBASE-ACC
- 100GBASE-AOC
- 100GBASE-FR
- 100GBASE-LR
- 200GBASE-ER4
- 400GBASE-ER8
- 400GBASE-FR4
- 400GBASE-LR4
- 400GBASE-SR4.2
- 400GBASE-SR8
 1.65.2.1 25-Feb-2020  martin Pull up following revision(s) (requested by mrg in ticket #717):

sys/dev/fdt/dwcmmc_fdt.c 1.11
sys/dev/ic/bwfm.c 1.15-1.18
sys/dev/ic/bwfmreg.h 1.4-1.6
sys/dev/ic/bwfmvar.h 1.4,1.5
sys/dev/ic/dwc_mmc.c 1.21,1.22
sys/dev/ic/dwc_mmc_reg.h 1.8,1.9,1.12,1.13
sys/dev/pcmcia/pcmciareg.h 1.11
sys/dev/sdmmc/if_bwfm_sdio.c 1.4,1.6-1.12
sys/dev/sdmmc/if_bwfm_sdio.h 1.1,1.2
sys/dev/sdmmc/sdhc.c 1.105,1.106
sys/dev/sdmmc/sdmmc.c 1.37,1.39
sys/dev/sdmmc/sdmmc_cis.c 1.6,1.8
sys/dev/sdmmc/sdmmc_io.c 1.15-1.19
sys/dev/sdmmc/sdmmc_ioreg.h 1.4,1.5
sys/dev/sdmmc/sdmmc_mem.c 1.69-1.71
sys/dev/sdmmc/sdmmcdevs 1.5-1.8
sys/dev/sdmmc/sdmmcvar.h 1.31,1.33,1.34
sys/net/if_media.h 1.66

Add Broadcom devices
-
Fix typo
-
add PCMCIA_CISTPL_SDIO definition.
-
From OpenBSD:
- move event handling to workqueue
- check for save/restore capability
-
Tag work queue as MPsafe and increase length.
-
Just use bpf_mtap(), the 802.11 encapsulation is handled by firmware.
-
From OpenBSD:
- support block length per function
- add functions to read/write regions
-
Decode (but not use) SDIO tuple in CIS.
-
Fix locking.
-
Add more SDIO defines (partially from version 3.0).
-
From OpenBSD:
- All the missing pieces (firmware load, chip setup, protocol handling)
TX queue and interrupt handling via sdmmc_task.
-
Fix locking.
-
Fix packet parsing.
-
Add parser for original firmware config files.
-
tagging work queue as MPSAFE was premature. Revert.
-
SD_IO_RW_EXTENDED is a data transfer command, so set ADTC flag instead of AC
Use correct function to verify if a task has been queued. Avoids race
that can corrupt the task queue.
-
More register definitions.
-
Add IFM_IEEE80211_VHT subtype, IFM_IEEE80211_11AC operating mode, and missing descriptions
-
If firmware is connected in HT or VHT mode, report it to SIOCGIFMEDIA
-
white space police.

Skip setting power when the voltage doesn't change.
Also increase some timeouts.
-
Add and use sdmmc_pause to avoid long-term busy waits.
-
Add sdio abort function.
-
Additional error messages.
-
Print parameters for SDIO devices.
-
Minor cosmetics.
-
Simplify sdmmc_io_set_blocklen function signature by dropping the
extra softc pointer. Aligns with OpenBSD.
-
Missing commit for sdio abort function.
-
More code from OpenBSD
-
no need to splnet() when enqueuing packets
-
explicit structure padding
-
make internal functions static
-
also prepare for GPIO interrupts.
-
Avoid warnings for tautological shifts as sole conditional.
-
Follow the Linux driver and use the FDT "compatible" property to build a
filename for the nvram config file, fall back to the standard filename.
E.g.
[Caching 123 nodes and 1093 properties]
compatible 73696e6f 766f6970 2c627069 2d6d322d "sinovoip,bpi-m2-
0010: 7a65726f 00...... ........ ........ zero"
0015: 616c6c77 696e6e65 722c7375 6e38692d "allwinner,sun8i-
0025: 68322d70 6c757300 ........ ........ h2-plus"
interrupt-parent 00000001 ........ ........ ........ ....
model 42616e61 6e612050 69204250 492d4d32 "Banana Pi BPI-M2
0010: 2d5a6572 6f00.... ........ ........ -Zero"
name 00...... ........ ........ ........ ""
serial-number 30326330 30303432 65636431 36376566 02c00042ecd167ef
0010: 00...... ........ ........ ........ .
-rw-r--r-- 1 root wheel 875 Nov 2 12:06 brcmfmac43430-sdio.AP6212.txt
lrwxr-xr-x 1 root wheel 29 Dec 30 16:19 brcmfmac43430-sdio.sinovoip,bpi-m2-zero.txt -> brcmfmac43430-sdio.AP6212.txt
-rw-r--r-- 1 root wheel 874 Jun 30 2019 brcmfmac43430-sdio.raspberrypi,3-model-b.txt
-rw-r--r-- 1 root wheel 1864 Jun 30 2019 brcmfmac43455-sdio.raspberrypi,3-model-b-plus.txt
lrwxr-xr-x 1 root wheel 29 Dec 30 11:24 brcmfmac43455-sdio.raspberrypi,4-model-b-plus.txt -> brcmfmac43455-sdio.raspberrypi,3-model-b-plus.txt
-
Add product ID for Broadcom BCM43455
-
Use correct firmware for BCM43456
-
size check was backwards.
-
Be less noisy for some commands.
-
Fix DWC_MMC_INT_SDIO_INT bit
-
dwc_mmc fixes:
- Rockchip uses a different SDIO int bit, so take this into consideration
- Avoid unnecessary resets and always wait for resets to complete
- kpause instead of delay while holding spinlock
- Do not attempt autostop for SD_IO_RW_EXTENDED commands
- Allow for sub-blklen byte counts for single block transfers
-
More SDIO stability and performance fixes
 1.68.2.1 29-Feb-2020  ad Sync with head.
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file if_mip.c was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file if_mip.h was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.1 07-Aug-2016  christos branches: 1.1.2; 1.1.4; 1.1.18;
modularize some more drivers and merge the module glue
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 07-Aug-2016  jdolecek file if_module.h was added on branch tls-maxphys on 2017-12-03 11:39:02 +0000
 1.1.4.2 05-Oct-2016  skrll Sync with HEAD
 1.1.4.1 07-Aug-2016  skrll file if_module.h was added on branch nick-nhusb on 2016-10-05 20:56:08 +0000
 1.1.2.2 14-Sep-2016  pgoyette Sync with HEAD
 1.1.2.1 07-Aug-2016  pgoyette file if_module.h was added on branch pgoyette-localcount on 2016-09-14 03:04:19 +0000
 1.41 03-Sep-2022  thorpej Machete-waving to fix mpls rump build after pktqueue changes.
 1.40 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.39 03-Sep-2022  thorpej Convert MPLS from a legacy netisr to pktqueue.
 1.38 29-Jul-2022  skrll No need to wrap the call to if_detach with splnet / splx as if_detach
raises spl as required.
 1.37 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.36 29-Jan-2020  thorpej branches: 1.36.10;
Adopt <net/if_stats.h>.
 1.35 27-Apr-2019  pgoyette branches: 1.35.4;
A few more empty-string --> NULL in required-modules lists
 1.34 26-Jun-2018  msaitoh branches: 1.34.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.33 19-Jan-2018  maxv branches: 1.33.2;
Several changes:

* Declare TRIM_LABEL as a function.

* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.

* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.32 09-Dec-2017  maxv Kick MPLS packets earlier.
 1.31 08-Dec-2017  maxv Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
 1.30 23-Oct-2017  msaitoh If if_attach() failed in the attach function, free resources and return.
 1.29 12-Dec-2016  ozaki-r branches: 1.29.8;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.28 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
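
A minimal userland sketch of the locking pattern described above, assuming
POSIX threads in place of kernel primitives. The names pkt, pktq,
pktq_enqueue and pktq_dequeue are hypothetical stand-ins for mbuf, ifqueue,
IF_ENQUEUE and IF_DEQUEUE; this is not the kernel code, only an illustration
of taking a per-queue lock on both the producer and the consumer side.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mbuf and struct ifqueue. */
struct pkt {
	struct pkt *next;
	int id;
};

struct pktq {
	struct pkt *head;
	struct pkt *tail;
	int len;
	int maxlen;
	pthread_mutex_t lock;	/* plays the role of ifq_lock */
};

static void
pktq_init(struct pktq *q, int maxlen)
{
	q->head = q->tail = NULL;
	q->len = 0;
	q->maxlen = maxlen;
	pthread_mutex_init(&q->lock, NULL);
}

/* Producer side (the "L2" role): 0 on success, -1 if the queue is full. */
static int
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	int error = 0;

	pthread_mutex_lock(&q->lock);
	if (q->len >= q->maxlen) {
		error = -1;
	} else {
		p->next = NULL;
		if (q->tail == NULL)
			q->head = p;
		else
			q->tail->next = p;
		q->tail = p;
		q->len++;
	}
	pthread_mutex_unlock(&q->lock);
	return error;
}

/* Consumer side (the "L3" role): NULL when the queue is empty. */
static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p;

	pthread_mutex_lock(&q->lock);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
		q->len--;
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}
```

Because both sides take the same lock, it no longer matters which CPU runs
the producer or the consumer, which is the property splnet could not provide
once the two ran in parallel.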
 1.27 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.26 07-Jul-2016  msaitoh branches: 1.26.2;
KNF. Remove extra spaces. No functional change.
 1.25 20-Jun-2016  knakahara fix: kern/51259
 1.24 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.23 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
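
A minimal sketch of the index-instead-of-pointer idea, assuming a plain
array in place of the ifindex2ifnet collection. All names here (iface,
if_table, pkt_set_rcvif, pkt_get_rcvif) are hypothetical, and the sketch
deliberately omits the psref(9)/pserialize(9) protection that the real
m_{get,put}_rcvif* APIs provide; it only shows why a stale lookup yields
NULL rather than a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF 8	/* hypothetical table size */

struct iface {
	unsigned int index;
	const char *name;
};

/* Hypothetical stand-in for the interface collection; slot 0 is unused. */
static struct iface *if_table[MAXIF];

struct pkt {
	unsigned int rcvif_index;	/* stored instead of a struct iface pointer */
};

static void
iface_attach(struct iface *ifp)
{
	assert(ifp->index > 0 && ifp->index < MAXIF);
	if_table[ifp->index] = ifp;
}

static void
iface_detach(struct iface *ifp)
{
	if_table[ifp->index] = NULL;
}

static void
pkt_set_rcvif(struct pkt *p, const struct iface *ifp)
{
	p->rcvif_index = ifp->index;
}

/* Returns NULL if the receiving interface has gone away. */
static struct iface *
pkt_get_rcvif(const struct pkt *p)
{
	if (p->rcvif_index == 0 || p->rcvif_index >= MAXIF)
		return NULL;
	return if_table[p->rcvif_index];
}
```

A caller that gets NULL back simply drops the packet, whereas a stored
pointer would have been dereferenced after the interface was destroyed.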
 1.22 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.21 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue in doing this is keeping the RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we got wrong:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes, so it appeared
to work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to take care of is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.20 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.19 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.18 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.17 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find dyoung's detailed investigation of the
issue there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to convey that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.16 17-Jul-2014  bouyer branches: 1.16.2; 1.16.4; 1.16.6; 1.16.10;
Make sure to call ifp->if_output() with KERNEL_LOCK held.
Should fix mpls-related atf tests.
 1.15 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.14 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.13 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.12 15-May-2014  msaitoh Put schednetisr(NETISR_IP) into splnet()/splx() pair.
This avoids an extra ipintr() call with an empty queue.
 1.11 25-Oct-2013  kefren branches: 1.11.2;
RFC3032 conformance for Router Alert Label
 1.10 23-Jul-2013  kefren Implement RFC4182 changes - switchable via sysctl
 1.9 15-Jul-2013  kefren branches: 1.9.2;
stop abusing kmem during softint context
 1.8 03-Jul-2011  kefren branches: 1.8.2; 1.8.8; 1.8.12; 1.8.14; 1.8.16; 1.8.22;
Avoid putting implicit null labels on the wire
 1.7 22-Jun-2011  kefren make LSE prepend the rest of the shims if they exist
 1.6 21-Jun-2011  kefren teach mpls interface how to prepend multiple shims by using a vector of
smpls_addrs in sockaddr_mpls. The number of smpls_addrs is found from
smpls_len. First label encountered is BoS.
XXX: need to do the same for LSE and this feature needs to be documented.
 1.5 17-Jun-2011  kefren teach loopback about MPLS. Prerequisite for MPLS tunnels
 1.4 16-Jun-2011  kefren use ETHERTYPE_MPLS only for unicast packets (RFC3032)
 1.3 27-Jun-2010  kefren branches: 1.3.2; 1.3.4; 1.3.6; 1.3.12;
Don't assume that rt_tag family is AF_MPLS but verify it.
This way rt_tag can be used for other future work also, not only MPLS
 1.2 26-Jun-2010  kefren Fix build for MPLS import: add options MPLS, changed pseudo-device mpls
to pseudo-device ifmpls
 1.1 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.3.12.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.6.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.3.6.1 27-Jun-2010  uebayasi file if_mpls.c was added on branch uebayasi-xip on 2010-08-17 06:47:44 +0000
 1.3.4.2 11-Aug-2010  yamt sync with head.
 1.3.4.1 27-Jun-2010  yamt file if_mpls.c was added on branch yamt-nfs-mp on 2010-08-11 22:54:54 +0000
 1.3.2.2 03-Jul-2010  rmind sync with head
 1.3.2.1 27-Jun-2010  rmind file if_mpls.c was added on branch rmind-uvmplock on 2010-07-03 01:19:59 +0000
 1.8.22.2 13-Mar-2018  snj Pull up following revision(s) (requested by uwe in ticket #1534):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9 via patch
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.8.22.1 30-Jul-2013  msaitoh Pull up following revision(s) (requested by kefren in ticket #921):
sys/net/if_mpls.c: revision 1.9
stop abusing kmem during softint context to prevent panic
 1.8.16.2 18-May-2014  rmind sync with head
 1.8.16.1 28-Aug-2013  rmind sync with head
 1.8.14.2 13-Mar-2018  snj Pull up following revision(s) (requested by uwe in ticket #1534):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9 via patch
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.8.14.1 30-Jul-2013  msaitoh Pull up following revision(s) (requested by kefren in ticket #921):
sys/net/if_mpls.c: revision 1.9
stop abusing kmem during softint context to prevent panic
 1.8.12.2 03-Dec-2017  jdolecek update from HEAD
 1.8.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.8.8.2 13-Mar-2018  snj Pull up following revision(s) (requested by uwe in ticket #1534):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9 via patch
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.8.8.1 30-Jul-2013  msaitoh Pull up following revision(s) (requested by kefren in ticket #921):
sys/net/if_mpls.c: revision 1.9
stop abusing kmem during softint context to prevent panic
 1.8.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.2.1 23-Jul-2013  riastradh sync with HEAD
 1.11.2.1 10-Aug-2014  tls Rebase.
 1.16.10.1 24-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1571):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* Declare TRIM_LABEL as a function.
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.16.6.1 24-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1571):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* Declare TRIM_LABEL as a function.
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.16.4.7 05-Feb-2017  skrll Sync with HEAD
 1.16.4.6 05-Oct-2016  skrll Sync with HEAD
 1.16.4.5 09-Jul-2016  skrll Sync with HEAD
 1.16.4.4 29-May-2016  skrll Sync with HEAD
 1.16.4.3 19-Mar-2016  skrll Sync with HEAD
 1.16.4.2 22-Sep-2015  skrll Sync with HEAD
 1.16.4.1 06-Jun-2015  skrll Sync with HEAD
 1.16.2.1 24-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #1571):
sys/net/if_mpls.c: 1.31-1.33 via patch
sys/netmpls/mpls_ttl.c: 1.9
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* Declare TRIM_LABEL as a function.
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
 1.26.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.26.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.29.8.2 12-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #546):
sys/net/if_mpls.c: 1.31-1.33
sys/netmpls/mpls_ttl.c: 1.9-1.11
Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.
--
Kick MPLS packets earlier.
--
Several changes:
* Declare TRIM_LABEL as a function.
* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.
* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.
--
Style in MPLS.
--
Add XXX.
 1.29.8.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating softint). Without this change, it panics. It's bad because
resource shortage really occurred when a lot of pseudo interfaces were created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. The caller function can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If error occurred in bcmeth_ccb_attach(), free resources and return.
If error occurred in pq3etsec_attach(), free resources and return.
If error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.33.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.34.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.34.2.1 10-Jun-2019  christos Sync with HEAD
 1.35.4.1 29-Feb-2020  ad Sync with head.
 1.36.10.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.1 26-Jun-2010  kefren branches: 1.1.2; 1.1.4; 1.1.6;
Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.1.6.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.1.6.1 26-Jun-2010  uebayasi file if_mpls.h was added on branch uebayasi-xip on 2010-08-17 06:47:44 +0000
 1.1.4.2 11-Aug-2010  yamt sync with head.
 1.1.4.1 26-Jun-2010  yamt file if_mpls.h was added on branch yamt-nfs-mp on 2010-08-11 22:54:54 +0000
 1.1.2.2 03-Jul-2010  rmind sync with head
 1.1.2.1 26-Jun-2010  rmind file if_mpls.h was added on branch rmind-uvmplock on 2010-07-03 01:19:59 +0000
 1.173 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.172 03-Sep-2022  thorpej branches: 1.172.8; 1.172.10;
Garbage-collect the remaining vestiges of netisr.
 1.171 27-Aug-2022  thorpej Ensure that all queues passed to ifq_enqueue2() have a valid ifq_lock.
 1.170 27-Aug-2022  thorpej Consistently use IFQ_SET_MAXLEN(), rather than open-coding it. NFC.
 1.169 06-Jul-2022  riastradh net/if_ppp.c: Avoid user-controlled overrun in PPPIOCSCOMPRESS.

Reported-by: syzbot+2c7bda7dc2b6c0d4f279@syzkaller.appspotmail.com
 1.168 06-Jul-2022  riastradh net/if_ppp.c: Sprinkle KNF. No functional change intended.
 1.167 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.166 20-Sep-2019  maxv branches: 1.166.2;
dedup
 1.165 25-Jun-2019  msaitoh Simplify "LIST_HEAD();" to make the code more understandable.
No functional change.
 1.164 25-Jan-2019  knakahara Add __cacheline_aligned to ppp softc list and its mutex just in case.
 1.163 11-Jan-2019  knakahara Fix missing splx in ppp_inproc().
 1.162 11-Jan-2019  knakahara Fix missing mutex_exit in ppp_create().
 1.161 26-Jun-2018  msaitoh branches: 1.161.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.160 25-Jun-2018  msaitoh Remove duplicated inclusion of net/bpf.h.
 1.159 17-Sep-2017  christos branches: 1.159.2;
Add one more not supported error
 1.158 02-Oct-2016  christos branches: 1.158.8;
MFREE -> m_free
 1.157 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.156 06-Aug-2016  pgoyette Destroy the mutex when detaching ppp. Otherwise on a re-attach (ie,
module reload) we can end up with a panic "lock already initialized"
 1.155 06-Aug-2016  christos make strip and slip modular, and cosmetic for ppp.
 1.154 06-Aug-2016  pgoyette Change the internal name of the module to match its external (file
system) name. Otherwise "bad things" can happen, such as modload(8)
being able to load a second copy!
 1.153 06-Aug-2016  pgoyette Modularize the ppp driver, and adjust dependencies of the compressor
modules.

For now, this is still included as a built-in module in GENERIC kernels.
 1.152 10-Jun-2016  ozaki-r branches: 1.152.2;
Introduce m_set_rcvif and m_reset_rcvif

These APIs are used to set (or reset) the receiving interface of a mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.151 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.150 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.149 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.148 20-Aug-2015  uebayasi Honor pseudo attach decl generated by config(1).
 1.147 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.146 01-Jul-2014  msaitoh branches: 1.146.4;
KNF. No functional change.
 1.145 30-Jun-2014  ozaki-r Cleanup ppp_inproc

- Remove unnecessary variable isr
- Use pktq instead of rv to switch between inet/inet6 and other protocols

ok msaitoh@ and rmind@
 1.144 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.143 17-May-2014  rmind ppp_inproc: handle non-IP protocols correctly (hi msaitoh); PR/48813.
 1.142 15-May-2014  msaitoh Save a NETISR_* value in a variable and call schednetisr() after enqueueing
a packet for readability and future modification.
 1.141 18-Sep-2013  rmind branches: 1.141.2;
Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.140 30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.139 29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.138 25-Nov-2012  mbalmer branches: 1.138.2;
Don't check mp for NULL twice. From Michael W. Bomardieri <mb@il.net>
via tech-net@NetBSD.org. Thanks!
 1.137 11-Oct-2012  christos PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.136 28-Oct-2011  dyoung branches: 1.136.2; 1.136.8; 1.136.12;
For these interfaces, the implementation of SIOCSIFDSTADDR is identical
to SIOCINITIFADDR, and SIOCSIFDSTADDR callers always fall back to
SIOCINITIFADDR, so just get rid of the SIOCSIFDSTADDR case.
 1.135 28-Oct-2011  dyoung Don't kauth-orize SIOCSIFMTU in pppsioctl() and stf_ioctl(), ifioctl()
has already done that for us.
 1.134 07-Aug-2011  rmind Convert ppp_list_lock to mutex(9).
 1.133 02-Apr-2011  mbalmer Fix misplaced parenthesis. From henning.petersen@t-online.de, thanks.
 1.132 21-Aug-2010  pgoyette branches: 1.132.2;
Update the rest of the kernel to conform to the module subsystem's new
locking protocol.
 1.131 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.130 19-Jan-2010  pooka branches: 1.130.2; 1.130.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.129 15-Apr-2009  elad Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
 1.128 19-Jan-2009  yamt branches: 1.128.2;
ppp_get_compressor: take module_lock when trying to load a module. PR/40428
 1.127 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.126 29-Nov-2008  cube Fix handling of ppp compressor modules, from Andrew Doran's input.
- ref count each compressor
- allow {un,}registration of several modules at once
- use RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules init functions
are run before pseudo-devices attach (reported by Nick Hudson).
 1.125 25-Nov-2008  cube Rework the way PPP compressors are handled and allow them to be
automatically loaded when needed.
 1.124 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
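
As a rough, runnable illustration of the switch shape shown above (not code from any driver): the flag values are local stand-ins for those in <net/if.h>, and the action chosen for each case is an assumption made only for this sketch.

```c
#include <assert.h>

/* Stand-in flag values for illustration; real code gets them from <net/if.h>. */
#define IFF_UP		0x0001
#define IFF_RUNNING	0x0040

/* Hypothetical actions a driver might take; not taken from any real driver. */
enum toy_action { ACT_NONE, ACT_STOP, ACT_INIT, ACT_UPDATE };

/* Pick an action from the two state bits, ignoring every other flag. */
static enum toy_action
toy_flags_action(unsigned int flags)
{
	switch (flags & (IFF_UP | IFF_RUNNING)) {
	case 0:
		return ACT_NONE;	/* marked down and not running */
	case IFF_RUNNING:
		return ACT_STOP;	/* marked down but still running */
	case IFF_UP:
		return ACT_INIT;	/* marked up but not yet running */
	case IFF_UP | IFF_RUNNING:
		return ACT_UPDATE;	/* up and running: refresh h/w state */
	}
	assert(0);			/* the mask leaves only the four cases above */
	return ACT_NONE;
}
```

Masking first means the four cases are exhaustive, which is what makes the switch easier to audit than nested if-else clauses.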

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
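
A minimal sketch of the callback contract described above, using a toy structure rather than the kernel's: every toy_* name is hypothetical, and treating a missing callback as success is an assumption of this sketch, not something the commit states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model of an Ethernet interface; all names here are hypothetical. */
struct toy_ether {
	int (*flags_cb)(struct toy_ether *);	/* registered if_flags callback */
	int (*init)(struct toy_ether *);	/* full reinitialisation */
	int ninit;				/* how many times init has run */
};

/* Example callback: the flag change was applied without a reset. */
static int
toy_cb_ok(struct toy_ether *ec)
{
	(void)ec;
	return 0;
}

/* Example callback: the flag change needs a full reset. */
static int
toy_cb_reset(struct toy_ether *ec)
{
	(void)ec;
	return ENETRESET;
}

/* Example init routine: just count the calls. */
static int
toy_init(struct toy_ether *ec)
{
	ec->ninit++;
	return 0;
}

/* Called by the class ioctl code after the interface flags have changed. */
static int
toy_flags_changed(struct toy_ether *ec)
{
	int error;

	assert(ec != NULL);
	if (ec->flags_cb == NULL)
		return 0;		/* assumption: no callback, nothing to do */
	error = ec->flags_cb(ec);
	if (error == ENETRESET)		/* callback asks for a reset */
		return ec->init(ec);
	return error;			/* 0 on success, != 0 on failure */
}
```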

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
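
The address test can be sketched roughly as follows. This is an illustration, not the ether_ioctl() code; toy_lladdr_is_settable is a hypothetical name. It relies on the Ethernet convention that the low-order bit of the first octet marks group (multicast and broadcast) addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETHER_ADDR_LEN 6	/* octets in an Ethernet address */

/*
 * Hypothetical helper: return true if addr may be installed as an
 * interface's own link-layer address.
 */
static bool
toy_lladdr_is_settable(const uint8_t addr[TOY_ETHER_ADDR_LEN])
{
	uint8_t acc = 0;
	size_t i;

	assert(addr != NULL);

	/* Group bit set: multicast, or broadcast ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* OR all octets together; the result is 0 only for 00:00:00:00:00:00. */
	for (i = 0; i < TOY_ETHER_ADDR_LEN; i++)
		acc |= addr[i];
	return acc != 0;
}
```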

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.123 15-Jun-2008  christos branches: 1.123.2; 1.123.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.122 24-Apr-2008  ad branches: 1.122.2; 1.122.4; 1.122.6;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.121 07-Feb-2008  dyoung branches: 1.121.6; 1.121.8;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.120 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.119 19-Oct-2007  ad branches: 1.119.2; 1.119.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.118 08-Oct-2007  ad branches: 1.118.2;
Use the softint API.
 1.117 01-Sep-2007  dyoung branches: 1.117.2;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.116 14-Jul-2007  ad branches: 1.116.2; 1.116.6; 1.116.8;
Generic soft interrupts are mandatory.
 1.115 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.114 07-Mar-2007  liamjfoy branches: 1.114.2; 1.114.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case an
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.113 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.112 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.111 16-Nov-2006  christos branches: 1.111.4;
__unused removal on arguments; approved by core.
 1.110 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.109 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.108 23-Jul-2006  ad branches: 1.108.4; 1.108.6;
Use the LWP cached credentials where sane.
 1.107 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.106 14-May-2006  elad branches: 1.106.2;
integrate kauth.
 1.105 02-Jan-2006  yamt branches: 1.105.2; 1.105.4; 1.105.6; 1.105.8; 1.105.10;
ppp_dequeue: fix a mbuf leak/packet loss introduced by rev.1.104.
 1.104 28-Dec-2005  christos branches: 1.104.2;
PR/5901: Felix A. Croes: PPP fast queue blocks traffic at normal priority.
Applied fix, similar to the one suggested in the PR. We use a counter to
limit the number of consecutive packets accepted from the fast queue. This
number can be set via ioctl, but this has not been implemented. Since there
are only 2 queues other proposed solutions such as ALTQ are overkill and
they have not been implemented in the past 7 years. Now LCP echos can be
used to detect that the line is up.
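
A minimal sketch of the counter idea, assuming a toy fixed-size queue; the toy_* names, the field names and the exact policy are illustrative and are not the if_ppp.c code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QCAP 16

/* Toy FIFO of packet ids; no wraparound, enough for a demonstration. */
struct toy_queue {
	int pkt[TOY_QCAP];
	size_t head;		/* next slot to dequeue */
	size_t tail;		/* next free slot */
};

struct toy_ppp {
	struct toy_queue fastq;	/* interactive traffic */
	struct toy_queue sndq;	/* normal-priority traffic */
	int nfast;		/* consecutive packets taken from fastq */
	int maxfast;		/* cap on nfast while sndq has work */
};

static int
toy_q_empty(const struct toy_queue *q)
{
	return q->head == q->tail;
}

static void
toy_q_put(struct toy_queue *q, int pkt)
{
	assert(q->tail < TOY_QCAP);
	q->pkt[q->tail++] = pkt;
}

static int
toy_q_get(struct toy_queue *q)
{
	assert(!toy_q_empty(q));
	return q->pkt[q->head++];
}

/* Return the next packet to send, or -1 if both queues are empty. */
static int
toy_dequeue(struct toy_ppp *sc)
{
	/* Prefer fastq, but only until the cap is hit while sndq is waiting. */
	if (!toy_q_empty(&sc->fastq) &&
	    (sc->nfast < sc->maxfast || toy_q_empty(&sc->sndq))) {
		sc->nfast++;
		return toy_q_get(&sc->fastq);
	}
	sc->nfast = 0;		/* the normal queue gets its turn */
	if (!toy_q_empty(&sc->sndq))
		return toy_q_get(&sc->sndq);
	return -1;
}
```

With a cap of 2, a busy fast queue lets one normal-priority packet through after every two fast ones, so normal-priority traffic is no longer blocked indefinitely.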
 1.103 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.102 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
 1.101 29-May-2005  christos branches: 1.101.2; 1.101.8;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.100 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
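
For illustration, the shared-table idea looks roughly like this. The table below is a local copy, and toy_bytes_to_hex is a hypothetical helper, not a function from subr_prf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One shared table instead of a string literal at every call site. */
static const char toy_hexdigits[] = "0123456789abcdef";

/*
 * Hypothetical helper: write len bytes from src as 2*len lowercase hex
 * characters plus a terminating NUL into dst.
 */
static void
toy_bytes_to_hex(char *dst, size_t dstlen, const uint8_t *src, size_t len)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	assert(dstlen >= 2 * len + 1);
	for (i = 0; i < len; i++) {
		*dst++ = toy_hexdigits[src[i] >> 4];	/* high nibble */
		*dst++ = toy_hexdigits[src[i] & 0x0f];	/* low nibble */
	}
	*dst = '\0';
}
```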
 1.99 31-Mar-2005  christos no point in assigning to ifq twice.
 1.98 31-Mar-2005  explorer Fix error with ifq not being set before use. Explicitly set it to NULL just before it may be set to the fastq, since if this becomes a loop (and in one case already is) this will always work. ifq_enqueue2() is designed to handle this case.
 1.97 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.96 26-Feb-2005  perry nuke trailing whitespace
 1.95 05-Dec-2004  christos branches: 1.95.4; 1.95.6;
pasto: simple_lock -> simple_unlock.
 1.94 05-Dec-2004  peter Don't forget to call bpfdetach in the clone destroy function.
While here, add a missing static and change some spaces to tabs.
 1.93 05-Dec-2004  he Fix an obvious typo: scf -> sc. Discovered while compiling for x68k.
 1.92 05-Dec-2004  christos void in arg prototype.
 1.91 05-Dec-2004  christos Make ppp a cloning device. Based on the work of Quentin Garnier.
 1.90 03-Jul-2004  dyoung I changed pppoutput to use M_PREPEND. pppoutput was duplicating
the functionality of M_PREPEND, but with a bug: m_pkthdr.len was
not updated in pppoutput as it is in M_PREPEND.

Also, replace the loop that measures the length of the mbuf chain
with a call to m_length.
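
A toy model of the two ideas, chain length and keeping the packet-header length in step when prepending. struct toy_mbuf and both functions are simplified stand-ins invented for this sketch; the real mbuf, M_PREPEND and m_length are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mbuf; the real structure is far richer. */
struct toy_mbuf {
	struct toy_mbuf *next;		/* next buffer in this packet's chain */
	unsigned int len;		/* bytes of data held in this buffer */
	unsigned int pkthdr_len;	/* whole-packet length, first buffer only */
};

/* Walk the chain and add up the per-buffer lengths. */
static unsigned int
toy_length(const struct toy_mbuf *m)
{
	unsigned int total = 0;

	for (; m != NULL; m = m->next)
		total += m->len;
	return total;
}

/* Link hdr in front of m and update the whole-packet length as well. */
static struct toy_mbuf *
toy_prepend(struct toy_mbuf *hdr, struct toy_mbuf *m, unsigned int hdrlen)
{
	assert(hdr != NULL && m != NULL);
	hdr->next = m;
	hdr->len = hdrlen;
	hdr->pkthdr_len = m->pkthdr_len + hdrlen;	/* easy to forget by hand */
	return hdr;
}
```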

This fixes a PR from an anonymous bug reporter. Thank you, anonymous
bug reporter. Thanks, Itojun, for bringing the anonymous bug report
to my attention.
 1.89 21-Apr-2004  itojun kill sprintf, use snprintf
 1.88 28-Oct-2003  mycroft Also, if we're going to bail, we should free the memory we just allocated...
 1.87 28-Oct-2003  mycroft Previous patch created a dead break.
 1.86 25-Oct-2003  christos Fix uninitialized variable warnings
 1.85 01-Sep-2003  christos Add a new ioctl PPPIOCGRAWIN to get the last characters we got from the
remote site.
 1.84 02-May-2003  itojun branches: 1.84.2;
KNF
 1.83 27-Mar-2003  christos PR/20844: Iain Hibbert: PPP Compressors cannot be loaded as LKM
 1.82 19-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.81 02-Oct-2002  itojun backout previous two - if you use ppp* interface, kernel panics instantly.
it is apparent that the change was untested, and severity is high.
 1.80 25-Sep-2002  augustss Remove unused variable so the file compiles again.
 1.79 25-Sep-2002  darrenr Keep m_pkthdr.len updated correctly and use it rather than a loop to find
out the total length of the packet.
 1.78 01-Jul-2002  itojun new copyright boilerplate from CMU. from openbsd
 1.77 12-May-2002  matt branches: 1.77.2;
Make ppp_softc[] extern and declare in if_ppp.c
 1.76 17-Mar-2002  atatat Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
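
A sketch of the pass-through convention, assuming a two-layer dispatch. The numeric values are the ones given in this commit message; the TOY_* names, the command numbers, and the final mapping of an unhandled command to ENOTTY are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values as stated in the commit message, under local stand-in names. */
#define TOY_ERESTART		(-3)
#define TOY_EPASSTHROUGH	(-4)

/* Hypothetical command numbers. */
#define TOY_CMD_GET	1UL
#define TOY_CMD_SET	2UL
#define TOY_CMD_BOGUS	3UL

/* Lower layer: handles one command, passes everything else through. */
static int
toy_driver_ioctl(unsigned long cmd, int *val)
{
	switch (cmd) {
	case TOY_CMD_GET:
		*val = 42;
		return 0;
	default:
		return TOY_EPASSTHROUGH;	/* "not mine", not an error */
	}
}

/* Upper layer: handles what the driver passed through. */
static int
toy_generic_ioctl(unsigned long cmd, int *val)
{
	(void)val;
	switch (cmd) {
	case TOY_CMD_SET:
		return 0;
	default:
		return TOY_EPASSTHROUGH;
	}
}

/* Top level: try each layer in turn; nobody claimed it means ENOTTY. */
static int
toy_ioctl(unsigned long cmd, int *val)
{
	int error;

	assert(val != NULL);
	error = toy_driver_ioctl(cmd, val);
	if (error == TOY_EPASSTHROUGH)
		error = toy_generic_ioctl(cmd, val);
	if (error == TOY_EPASSTHROUGH)
		error = ENOTTY;		/* assumption of this sketch */
	return error;
}
```

Keeping "unhandled" distinct from both -1 and ENOTTY lets each layer tell "I do not know this command" apart from a real failure or a restart request.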
 1.75 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.74 14-Jan-2002  kleink Include <machine/intr.h> unconditionally, instead of only doing so if
__HAVE_GENERIC_SOFT_INTERRUPTS and relying on <sys/param.h> to provide it
otherwise; pointed out by Aymeric Vincent.
 1.73 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.72 12-Nov-2001  lukem add RCSIDs
 1.71 05-Aug-2001  jdolecek use unsigned variable types as appropriate
 1.70 18-Jul-2001  thorpej bzero -> memset
 1.69 14-Jun-2001  itojun branches: 1.69.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.68 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.67 17-Jan-2001  thorpej branches: 1.67.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.66 15-Jan-2001  thorpej For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.65 18-Dec-2000  thorpej Fill in if_dlt.
 1.64 18-Dec-2000  thorpej Add ALTQ support.
 1.63 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.62 26-Oct-2000  wiz Fix typo (thinko?), which didn't allow MRU to be set below the default
value (instead of the minimum value). Patch supplied by Stephen Degler
in PR #9945, and reviewed by Ignatios Souvatzis.
 1.61 06-Oct-2000  onoe unique #include opt_inet.h
 1.60 04-Oct-2000  itojun need opt_inet.h for #ifdef INET
 1.59 02-Oct-2000  itojun enable VJC only with INET
 1.58 30-Mar-2000  augustss branches: 1.58.4;
Kill some more register declarations.
 1.57 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.56 27-Nov-1999  hannken Fix typo introduced in rev. 1.55.
It caused IP6 packets to be sent as AF_UNSPEC instead of AF_INET6.
 1.55 30-Jul-1999  itojun branches: 1.55.2; 1.55.8;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.54 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.53 14-May-1999  tron Fix compilation problem caused by recent changes to filtering code.
 1.52 12-May-1999  thorpej Decouple inbound and outbound filters. Now instead of using "active-filter"
and "pass-filter" and "inbound" and "outbound" qualifiers in the filter
expression, use new "active-filter-in", "active-filter-out", "pass-filter-in",
and "pass-filter-out" without these qualifiers.

This is necessary due to the horrible, awful way "inbound" and "outbound"
were specified for the filter programs when a packet was passed through them.
Basically, the "address" byte in the serial PPP header was overwritten with
a value to indicate the direction. However, the "address" byte doesn't even
exist on PPP headers for all other PPP encaps! So, this old method worked
only for serial encaps, and corrupted packets for all others (PPPoE, ATM, etc.)
 1.51 11-May-1999  thorpej * Start out with a data link type of DLT_NULL. When we change an interface
to serial encap, change its data link type to DLT_PPP_SERIAL.
* Work around some serious bogosity in the filtering code which utterly
breaks proper functioning of BPF. The PPP code and pppd(8) WILL be changed
to fix this.
 1.50 09-Jan-1999  thorpej branches: 1.50.4; 1.50.6;
Use M_LINK{0,1} for our own mbuf flags, rather than arbitrarily picking
2 bits.
 1.49 10-Dec-1998  christos Revert IPX changes that I committed accidentally.
 1.48 10-Dec-1998  christos defopt
 1.47 03-Sep-1998  christos branches: 1.47.4;
PR/5414: Ronald Khoo: tcpdump ppp does not respect inbound/outbound qualifiers.
 1.46 02-Aug-1998  sommerfe Fix PR5898: ppp delays last packet.
 1.45 09-Jul-1998  thorpej Glue in fast forwarding.
 1.44 08-Jul-1998  sommerfe Only run pppasyncstart (sc->sc_start) from the netisr handler.
This allows pppoutput to be called from splimp (e.g., when ipflow is
in use.) without requiring pppasyncstart to run at splimp.
This is believed to fix PR5624.
 1.43 06-Jul-1998  jtk use #ifdef INET so this compiles again
 1.42 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.41 02-May-1998  christos Merge changes from pppd-2.3.4; adds ppp-deflate-draft stuff and updates
zlib. Maybe we can merge our other copy of zlib with this one now and
avoid having two copies?
 1.40 16-Jun-1997  christos From Paul Mackerras: use sl_compress_setup, not sl_compress_init
 1.39 17-May-1997  christos Update to ppp-2.3b5
 1.38 16-Apr-1997  is Made pppoutput() public again on behalf of Martin Husemann (PR 3455).
Apparently, the BISDN package uses this function.
 1.37 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.36 04-Mar-1997  mycroft Use splimp() to block interrupts, not splhigh().
 1.35 13-Oct-1996  christos branches: 1.35.4;
backout previous kprintf change
 1.34 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.33 13-Jun-1996  cgd no need for a local implementation of SIOCGIFMTU; delete it.
 1.32 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.31 07-May-1996  thorpej branches: 1.31.4;
Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.30 19-Mar-1996  paulus Make inclusion of the PPP BSD-Compress and Deflate compressors
dependent on the PPP_BSDCOMP and PPP_DEFLATE kernel configuration
options, respectively.
 1.29 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.28 13-Feb-1996  christos Net prototypes
 1.27 07-Feb-1996  pk wrt. previous change: can't compute `ilen' that early; just do computation
separately when logging.
Notes: consider using mbuf pkthdr length field in PPP code.
consider doing packet log after de-compression.
 1.26 07-Feb-1996  pk Init variable before use (PRs 1646 & 2042).
 1.25 27-Dec-1995  mycroft Remove old workaround for a bug.
 1.24 05-Oct-1995  mycroft Add some missing statistics. From Thorsten Lockert.
 1.23 12-Aug-1995  mycroft splnet --> splsoftnet
 1.22 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.21 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.20 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.19 27-Jul-1994  deraadt bug 367. paulus says the fix is right & critical.
 1.18 20-Jul-1994  paulus The ppp interface now retries the mbuf allocation when it receives
a character and it doesn't already have enough space allocated.
It still needs cluster mbufs to be able to decompress VJ-compressed
packets. It drops packets if it can't allocate mbufs rather than
taking the interface down.
 1.17 20-Jul-1994  paulus Due to popular revulsion, the ppp interface now drops packets if
 1.16 18-Jul-1994  paulus If we can't get mbuf clusters, take the ppp interface down instead
of writing beyond the end of ordinary mbufs.
 1.15 03-Jul-1994  deraadt branches: 1.15.2;
bug #319. fix from <alasdair@wildcat.demon.co.uk>
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 20-Jun-1994  paulus Some restructuring of the PPP packet input procedure to make it easier
to implement PPP over sync lines and PPP compression protocols.
 1.12 14-Jun-1994  paulus some minor splx-type bug fixes from christos@deshaw.com.
 1.11 29-May-1994  paulus check for escaped char before checking for escape char
so if peer escapes 0x5d we interpret it correctly
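
Why the order matters, as a minimal sketch: in async PPP framing (RFC 1662) an escaped octet is sent as 0x7d followed by the octet XOR 0x20, so an escaped 0x5d arrives as 0x7d 0x7d. toy_unescape is a hypothetical stand-alone decoder, not the ppp input routine, and it ignores flag octets and the FCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PPP_ESCAPE	0x7d	/* control escape octet (RFC 1662) */
#define TOY_PPP_TRANS	0x20	/* value XORed into an escaped octet */

/*
 * Undo octet stuffing. out must have room for inlen bytes.
 * Returns the number of bytes written.
 */
static size_t
toy_unescape(const uint8_t *in, size_t inlen, uint8_t *out)
{
	size_t i, n = 0;
	int escaped = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < inlen; i++) {
		uint8_t c = in[i];

		if (escaped) {
			/* Test the state first: this octet is data even if it is 0x7d. */
			out[n++] = c ^ TOY_PPP_TRANS;
			escaped = 0;
		} else if (c == TOY_PPP_ESCAPE) {
			escaped = 1;	/* the next octet is escaped */
		} else {
			out[n++] = c;
		}
	}
	return n;
}
```

If the two tests were swapped, the second 0x7d of the pair would be taken as a new escape and the 0x5d would be lost.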
 1.10 24-May-1994  cgd MIN -> min, MAX -> max
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 08-May-1994  paulus Version from ppp-2.1 release;
this version has been restructured to make more of the code usable
with sync serial drivers
 1.7 25-Jan-1994  deraadt PPP_HEADER_LEN -> PPP_HDRLEN
 1.6 23-Dec-1993  cgd include <machine/cpu.h> rather than <machine/mtpr.h> -- if the latter
exists at all, it's supposed to be included by <machine/cpu.h>
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 04-Nov-1993  paulus Removed test (CCOUNT(&sc->sc_ttyp->t_outq) == 0) for whether
to call pppstart or not: now we call pppstart for every packet,
which should aid recovery from lost transmitter interrupts.
Also a fix for 386BSD/FreeBSD which doesn't affect NetBSD.
 1.3 02-Sep-1993  paulus branches: 1.3.2;
Fixed bug in if_ppp.c so that received IP packets are passed correctly to BPF.
 1.2 31-Aug-1993  paulus Modified if_ppp.c and if_ppp.h to add priority queueing for "interactive"
traffic (done in a similar fashion to if_sl.c), and BPF support.
 1.1 14-Aug-1993  deraadt ppp from paul mackerras
 1.3.2.4 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.2.3 27-Oct-1993  mycroft Call pppstart() redundantly.
 1.3.2.2 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.3.2.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.15.2.2 28-Jul-1994  cgd from trunk
 1.15.2.1 20-Jul-1994  cgd update from trunk, to fix serious ppp lossage.
 1.31.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.35.4.1 12-Mar-1997  is Merge in changes from The Trunk
 1.47.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.50.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.50.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.50.4.3 02-Aug-1999  thorpej Update from trunk.
 1.50.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.50.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.55.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.55.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.55.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.55.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.55.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.55.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.55.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.58.4.1 16-Aug-2001  tv Pullup [jdolecek]:

sys/arch/amiga/dev/grf_cl.c 1.26
sys/arch/amiga/dev/grfioctl.h 1.14
sys/arch/hpcmips/dev/plumvideo.c 1.20
sys/arch/macppc/dev/ofb.c 1.25
sys/arch/sparc/dev/cgtwo.c 1.35
sys/arch/sparc/include/fbio.h patch
sys/arch/sparc64/include/fbio.h patch
sys/arch/sun3/dev/cg2.c 1.14
sys/arch/sun3/include/fbio.h patch
sys/dev/pci/tga.c 1.35
sys/dev/tc/cfb.c 1.28
sys/dev/tc/mfb.c 1.27
sys/dev/tc/sfb.c 1.46
sys/dev/tc/sfbplus.c 1.10 via patch
sys/dev/tc/tfb.c 1.30
sys/dev/tc/xcfb.c 1.23
sys/net/if_ppp.c 1.71

Use unsigned variable types to make bounds checking more correct.
 1.67.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.67.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.67.2.8 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.67.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.67.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.67.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.67.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.67.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.67.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.67.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.69.2.8 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.69.2.7 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.69.2.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.69.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.69.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.69.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.69.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.69.2.1 03-Aug-2001  lukem update to -current
 1.77.2.1 15-Jul-2002  gehenna catch up with -current.
 1.84.2.8 11-Dec-2005  christos Sync with head.
 1.84.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.84.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.84.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.84.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.84.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.84.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.84.2.1 03-Aug-2004  skrll Sync with HEAD
 1.95.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.95.4.1 29-Apr-2005  kent sync with -current
 1.101.8.1 29-Nov-2005  yamt sync with head.
 1.101.2.7 11-Feb-2008  yamt sync with head.
 1.101.2.6 21-Jan-2008  yamt sync with head
 1.101.2.5 27-Oct-2007  yamt sync with head.
 1.101.2.4 03-Sep-2007  yamt sync with head.
 1.101.2.3 26-Feb-2007  yamt sync with head.
 1.101.2.2 30-Dec-2006  yamt sync with head.
 1.101.2.1 21-Jun-2006  yamt sync with head.
 1.104.2.1 15-Jan-2006  yamt sync with head.
 1.105.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.105.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.105.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.105.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.105.6.3 11-Aug-2006  yamt sync with head
 1.105.6.2 26-Jun-2006  yamt sync with head.
 1.105.6.1 24-May-2006  yamt sync with head.
 1.105.4.2 01-Jun-2006  kardel Sync with head.
 1.105.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.105.2.1 09-Sep-2006  rpaulo sync with head
 1.106.2.1 19-Jun-2006  chap Sync with head.
 1.108.6.2 10-Dec-2006  yamt sync with head.
 1.108.6.1 22-Oct-2006  yamt sync with head
 1.108.4.1 18-Nov-2006  ad Sync with head.
 1.111.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.111.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.114.4.1 11-Jul-2007  mjf Sync with head.
 1.114.2.6 23-Oct-2007  ad Sync with head.
 1.114.2.5 09-Oct-2007  ad Sync with head.
 1.114.2.4 15-Jul-2007  ad Sync with head.
 1.114.2.3 15-Jul-2007  ad Sync with head.
 1.114.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.114.2.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.116.8.3 23-Mar-2008  matt sync with HEAD
 1.116.8.2 09-Jan-2008  matt sync with HEAD
 1.116.8.1 06-Nov-2007  matt sync with HEAD
 1.116.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.116.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.116.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.117.2.1 14-Oct-2007  yamt sync with head.
 1.118.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.119.8.1 08-Jan-2008  bouyer Sync with HEAD
 1.119.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.121.8.2 17-Jun-2008  yamt sync with head.
 1.121.8.1 18-May-2008  yamt sync with head.
 1.121.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.121.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.121.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.122.6.1 18-Jun-2008  simonb Sync with head.
 1.122.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.122.2.4 09-Oct-2010  yamt sync with head
 1.122.2.3 11-Aug-2010  yamt sync with head.
 1.122.2.2 11-Mar-2010  yamt sync with head
 1.122.2.1 04-May-2009  yamt sync with head.
 1.123.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.123.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.123.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.123.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.128.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.130.4.3 21-Apr-2011  rmind sync with head
 1.130.4.2 05-Mar-2011  rmind sync with head
 1.130.4.1 30-May-2010  rmind sync with head
 1.130.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.130.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.132.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.136.12.4 03-Dec-2017  jdolecek update from HEAD
 1.136.12.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.136.12.2 25-Feb-2013  tls resync with head
 1.136.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.136.8.1 31-Oct-2012  riz Pull up following revision(s) (requested by christos in ticket #638):
sys/net/if_ppp.c: revision 1.137
sys/netinet6/ip6_flow.c: revision 1.20
sys/net/if_fddisubr.c: revision 1.82
sys/net/if_ethersubr.c: revision 1.192
sys/netinet6/in6_var.h: revision 1.66
sys/net/if_atmsubr.c: revision 1.50
PR/47058: Antti Kantee: If the ipv6 flow code modifies the mbuf, pass the
change up to the caller.
 1.136.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.136.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.136.2.1 30-Oct-2012  yamt sync with head
 1.138.2.1 18-May-2014  rmind sync with head
 1.141.2.1 10-Aug-2014  tls Rebase.
 1.146.4.6 05-Oct-2016  skrll Sync with HEAD
 1.146.4.5 09-Jul-2016  skrll Sync with HEAD
 1.146.4.4 29-May-2016  skrll Sync with HEAD
 1.146.4.3 22-Apr-2016  skrll Sync with HEAD
 1.146.4.2 22-Sep-2015  skrll Sync with HEAD
 1.146.4.1 06-Jun-2015  skrll Sync with HEAD
 1.152.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.158.8.2 11-Jan-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1162):

sys/net/if_ppp.c: revision 1.162
sys/net/if_ppp.c: revision 1.163

Fix missing mutex_exit in ppp_create().

Fix missing splx in ppp_inproc().
 1.158.8.1 26-Jul-2018  snj Pull up following revision(s) (requested by msaitoh in ticket #938):
sys/arch/acorn32/podulebus/if_ie.c: revision 1.41
sys/arch/amiga/dev/if_es.c: revision 1.58
sys/arch/amiga/dev/if_qn.c: revision 1.45
sys/arch/arm/at91/at91emac.c: revision 1.20
sys/arch/arm/ep93xx/epe.c: revision 1.37
sys/arch/emips/ebus/if_le_ebus.c: revision 1.14
sys/arch/emips/ebus/if_le_ebus.c: revision 1.15
sys/arch/mac68k/dev/if_mc.c: revision 1.46
sys/arch/macppc/dev/am79c950.c: revision 1.39
sys/arch/newsmips/apbus/if_sn.c: revision 1.40
sys/arch/next68k/dev/mb8795.c: revision 1.59
sys/arch/playstation2/dev/if_smap.c: revision 1.25
sys/arch/playstation2/dev/if_smap.c: revision 1.26
sys/arch/sun2/dev/if_ec.c: revision 1.28
sys/arch/sun3/dev/if_ie.c: revision 1.63
sys/arch/x68k/dev/if_ne_intio.c: revision 1.19
sys/arch/xen/xen/if_xennet_xenbus.c: revision 1.75
sys/arch/xen/xen/xennetback_xenbus.c: revision 1.63
sys/dev/bi/if_ni.c: revision 1.45
sys/dev/cadence/if_cemac.c: revision 1.12
sys/dev/ic/am7990.c: revision 1.78
sys/dev/ic/am79900.c: revision 1.27
sys/dev/ic/an.c: revision 1.67
sys/dev/ic/cs89x0.c: revision 1.40
sys/dev/ic/dm9000.c: revision 1.13
sys/dev/ic/dm9000.c: revision 1.14
sys/dev/ic/dp8390.c: revision 1.88
sys/dev/ic/elink3.c: revision 1.141
sys/dev/ic/elinkxl.c: revision 1.122
sys/dev/ic/hme.c: revision 1.98
sys/dev/ic/i82586.c: revision 1.77
sys/dev/ic/lance.c: revision 1.53
sys/dev/ic/mb86950.c: revision 1.27
sys/dev/ic/mb86960.c: revision 1.86
sys/dev/ic/mtd803.c: revision 1.34
sys/dev/ic/pdq_ifsubr.c: revision 1.59
sys/dev/ic/rrunner.c: revision 1.86
sys/dev/ic/seeq8005.c: revision 1.58
sys/dev/ic/sgec.c: revision 1.47
sys/dev/ic/smc90cx6.c: revision 1.72
sys/dev/ic/smc91cxx.c: revision 1.96
sys/dev/ic/tropic.c: revision 1.49
sys/dev/ic/wi.c: revision 1.245
sys/dev/isa/if_eg.c: revision 1.93
sys/dev/isa/if_el.c: revision 1.95
sys/dev/isa/if_iy.c: revision 1.101
sys/dev/ofw/ofnet.c: revision 1.58
sys/dev/pci/if_alc.c: revision 1.27
sys/dev/pci/if_de.c: revision 1.152
sys/dev/pci/if_fpa.c: revision 1.61
sys/dev/pci/if_jme.c: revision 1.34
sys/dev/pci/if_tl.c: revision 1.108
sys/dev/pci/if_vte.c: revision 1.19
sys/dev/pci/ixgbe/ixgbe.h: revision 1.50
sys/dev/pcmcia/if_cnw.c: revision 1.62
sys/dev/pcmcia/if_malo_pcmcia.c: revision 1.17
sys/dev/pcmcia/if_ray.c: revision 1.89
sys/dev/pcmcia/if_xi.c: revision 1.81
sys/dev/pcmcia/mhzc.c: revision 1.51
sys/dev/pcmcia/xirc.c: revision 1.34
sys/dev/qbus/if_de.c: revision 1.33
sys/dev/qbus/if_qe.c: revision 1.78
sys/dev/qbus/if_qt.c: revision 1.22
sys/dev/sbus/be.c: revision 1.87
sys/dev/sbus/qe.c: revision 1.68
sys/dev/scsipi/if_se.c: revision 1.96
sys/dev/usb/if_atu.c: revision 1.59
sys/net/if_l2tp.c: revision 1.28 via patch
sys/net/if_ppp.c: revision 1.160
It's not required to include net/bpfdesc.h. Remove it.
--
Simplify like other drivers. NULL check of ifp->if_bpf is done in
bpf_mtap(), so it's not required to do it here.
--
Remove duplicated inclusion of net/bpf.h.
--
Remove duplicated inclusion of net/bpf.h.
--
Simplify bpf_mtap() call. No functional change.
 1.159.2.4 26-Jan-2019  pgoyette Sync with HEAD
 1.159.2.3 18-Jan-2019  pgoyette Synch with HEAD
 1.159.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.159.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.161.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.161.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.161.2.1 10-Jun-2019  christos Sync with HEAD
 1.166.2.1 29-Feb-2020  ad Sync with head.
 1.172.10.1 02-Aug-2025  perseant Sync with HEAD
 1.172.8.2 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.172.8.1 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.28 08-Jan-2025  christos Remove PPP_FILTER ifdef. it has been renamed in the new pppd and we should
not be hiding ioctls anyway.
 1.27 06-Sep-2015  dholland branches: 1.27.54;
More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.26 20-Aug-2015  uebayasi Honor pseudo attach decl generated by config(1).
 1.25 29-Nov-2008  cube branches: 1.25.26; 1.25.44;
Fix handling of ppp compressor modules, from Andrew Doran's input.
- ref count each compressor
- allow {un,}registration of several modules at once
- use RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules init functions
are run before pseudo-devices attach (reported by Nick Hudson).
 1.24 25-Nov-2008  cube Rework the way PPP compressors are handled and allow them to be
automatically loaded when needed.
 1.23 11-Dec-2005  thorpej branches: 1.23.70; 1.23.74; 1.23.80; 1.23.84;
ANSI function decls and application of static.
 1.22 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.21 04-Sep-2003  christos branches: 1.21.16;
bump the buffer size from 15 to 63 bytes.
 1.20 01-Sep-2003  christos Add a new ioctl PPPIOCGRAWIN to get the last characters we got from the
remote site.
 1.19 01-Jul-2002  itojun branches: 1.19.6;
new copyright boilerplate from CMU. from openbsd
 1.18 15-Jan-2001  thorpej branches: 1.18.2; 1.18.4; 1.18.16;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.17 25-Aug-1999  christos branches: 1.17.2;
changes from ppp-2.3.9 [synchronous]
 1.16 12-May-1999  thorpej Decouple inbound and outbound filters. Now instead of using "active-filter"
and "pass-filter" and "inbound" and "outbound" qualifiers in the filter
expression, use new "active-filter-in", "active-filter-out", "pass-filter-in",
and "pass-filter-out" without these qualifiers.

This is necessary due to the horrible, awful way "inbound" and "outbound"
were specified for the filter programs when a packet was passed through them.
Basically, the "address" byte in the serial PPP header was overwritten with
a value to indicate the direction. However, the "address" byte doesn't even
exist on PPP headers for all other PPP encaps! So, this old method worked
only for serial encaps, and corrupted packets for all others (PPPoE, ATM, etc.)
 1.15 09-Feb-1998  perry branches: 1.15.10;
add multiple inclusion protection (and cleanup).
 1.14 17-May-1997  christos Update to ppp-2.3b5
 1.13 16-Apr-1997  is Made pppoutput() public again on behalf of Martin Husemann (PR 3455).
Apparently, the BISDN package uses this function.
 1.12 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.11 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.10 13-Feb-1996  christos Net prototypes
 1.9 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.8 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 08-May-1994  paulus Version from ppp-2.1 release.
 1.5 25-Jan-1994  deraadt PPP_HEADER_LEN -> PPP_HDRLEN
 1.4 15-Jan-1994  deraadt multiple inclusion protection
 1.3 09-Nov-1993  glass T_LINEP member of struct tty becomes t_sc. This replaces the
#define t_sc T_LINEP
that appear in tty_tb.c, if_sl.c, and if_ppp.h
 1.2 31-Aug-1993  paulus branches: 1.2.2;
Modified if_ppp.c and if_ppp.h to add priority queueing for "interactive"
traffic (done in a similar fashion to if_sl.c), and BPF support.
 1.1 14-Aug-1993  deraadt ppp from paul mackerras
 1.2.2.1 14-Nov-1993  mycroft T_LINEP --> t_sc, from trunk.
 1.15.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.17.2.1 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.18.16.1 15-Jul-2002  gehenna catch up with -current.
 1.18.4.1 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.18.2.1 01-Aug-2002  nathanw Catch up to -current.
 1.19.6.4 11-Dec-2005  christos Sync with head.
 1.19.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.6.1 03-Aug-2004  skrll Sync with HEAD
 1.21.16.1 21-Jun-2006  yamt sync with head.
 1.23.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.23.80.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.23.74.1 04-May-2009  yamt sync with head.
 1.23.70.1 17-Jan-2009  mjf Sync with HEAD.
 1.25.44.1 22-Sep-2015  skrll Sync with HEAD
 1.25.26.1 03-Dec-2017  jdolecek update from HEAD
 1.27.54.1 02-Aug-2025  perseant Sync with HEAD
 1.184 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.183 15-Aug-2022  knakahara branches: 1.183.10;
Fix stall on PPPOE_STATE_PADR_SENT, suggested by martin@n.o, thanks.

Just drop such large PADO frames, ok'ed by yamaguchi@n.o.

I left if_pppoe.c:r1.182 code because that fixes other issues such as
pppoe(4) stall in PPPOE_STATE_PADR_SENT when the PADO sender never
replies.
 1.182 12-Aug-2022  knakahara Fix stall on PPPOE_STATE_PADR_SENT when received specific PADO, ok'ed by yamaguchi@n.o.

When pppoe receives a PADO frame larger than the mbuf cluster size,
pppoe_send_padr() fails forever. So, the pppoe interface stalls
in PPPOE_STATE_PADR_SENT until ifconfig down/up.
It should retry from PADI in such a case.
 1.181 23-May-2022  andvar s/controll/control/ in comments.
 1.180 10-May-2022  knakahara Zeroize the length explicitly when malloc failed. Pointed out by yamaguchi@n.o.
 1.179 04-May-2022  martin Do not allocate mbuf clusters when the caller (erroneously) asks
for more than MCLBYTES size, instead fail the allocation.

When we have received multiple PADO offer packets in the discovery
phase, do not combine tags from different packets. We are supposed
to pick one PADO packet and continue session establishment with that.

The second bug could cause code to trigger the first and create
invalid response packets and also overwrite data outside of
the allocated mbuf cluster.

Fixes CVE-2022-29867.
 1.178 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.177 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.176 19-May-2021  yamaguchi Added a kernel option to change the number of packets processed
in one pppoeintr() call
 1.175 19-May-2021  yamaguchi Added a limit on the number of packets processed,
because an enqueuing process cannot add packets beyond IFQ_MAXLEN

and removed the reschedule in pppoeintr()
because it is also scheduled by the enqueuing process.
 1.174 18-May-2021  yamaguchi Added missing PPPOE_UNLOCK() on dropping PADS and PADT
 1.173 13-May-2021  yamaguchi Drop PADS and PADT from unknown host for safety
 1.172 13-May-2021  yamaguchi Change reconnect delay after PADT received (15 sec -> 5 sec)

5 sec is the same as minimum PADI resending interval
 1.171 13-May-2021  yamaguchi Accept a frame like a PADT just containing PPPoE header
 1.170 22-Apr-2021  yamaguchi branches: 1.170.2; 1.170.4;
Added missing free of sc_hunique to prevent memory leak
when using PPPoE server
 1.169 16-Apr-2021  yamaguchi Stop and destroy timeout after sppp_detach and if_detach
for safety

The functions may use resources of pppoe(4) while detaching,
so the release should move after it.
 1.168 16-Apr-2021  yamaguchi Remove unnecessary lock holding to avoid deadlock

The locks were held across callout_halt() and workqueue_wait()
without reason.
The callout and workqueue handlers also take those locks,
so a handler that those functions wait for could not acquire
the lock.

The reasons why holding them is unnecessary are:
- Items of callout_t are protected by callout_lock
- Items of struct workqueue and struct work are protected
by q_mutex in struct workqueue
- Items of struct sppp_work are protected by atomic_cas(3)
- struct pppoe_softc is not freed before workqueue_wait() and
callout_halt() even if the locks are not held
 1.167 16-Apr-2021  yamaguchi Stop the ppp layer first when destroying a pppoe interface
 1.166 16-Apr-2021  yamaguchi Sort initialization sequence in pppoe_clone_create() out
for refactoring

It has no functionality impact
 1.165 16-Apr-2021  yamaguchi Use kmem_zalloc to allocate pppoe_softc
 1.164 16-Apr-2021  yamaguchi Move initialization of sc_lock in pppoe_softc to first

The lock may be held in callbacks for ppp layer or other
components so that it should be initialized early.
 1.163 16-Apr-2021  yamaguchi commonize error handling in pppoe_clone_create()
 1.162 13-Apr-2021  yamaguchi Reschedule softint to process packets enqueued to ppoediscinq
while doing pppoe_data_input

And added an empty check for ppoeinq, for safety
 1.161 13-Apr-2021  yamaguchi Added missing counter clear when a pppoe state changes to PADI_SENT
 1.160 13-Apr-2021  yamaguchi Added a NULL check for parent interface of pppoe
 1.159 13-Apr-2021  yamaguchi Hold the lock for pppoe while referencing sc_id
that is an item of struct pppoe_softc
 1.158 25-Nov-2020  yamaguchi branches: 1.158.2;
Fix to reconnect after PADT received
 1.157 25-Nov-2020  yamaguchi add a logging function used at debugging pppoe(4)
 1.156 25-Nov-2020  yamaguchi fix to remove trailing garbage
 1.155 25-Nov-2020  yamaguchi stop callout even when the state is in PPPOE_STATE_INITIAL
 1.154 25-Nov-2020  yamaguchi Close lcp when the lower layer goes down if the interface is passive or on-demand

reviewed by knakahara@n.o.
 1.153 25-Sep-2020  yamaguchi branches: 1.153.2;
Add a function to copy AC-Name and Service-Name
 1.152 25-Sep-2020  yamaguchi Clear AC-Name and Service-Name if params are not specified
 1.151 18-Sep-2020  yamaguchi Do pppoe_timeout() in thread context

OKed by knakahara@n.o
fix port-amd64/55661
 1.150 18-Sep-2020  yamaguchi Use callout_setfunc and callout_schedule
 1.149 10-Feb-2020  mlelstv safely extract character sequences from packet for printing.
 1.148 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.147 18-Mar-2019  msaitoh branches: 1.147.4; 1.147.6;
s/pakcet/packet/ in comment.
 1.146 27-Oct-2018  maxv Remove printfs that are too easily reachable, switch to M_REGION_GET,
and simplify the initialization. No real functional change.
 1.145 27-Oct-2018  maxv style
 1.144 30-Sep-2018  maxv remove hardcoded bullshit, probably fixes PR/53644
 1.143 24-Aug-2018  maxv Use a random hunique, instead of sending the pointer of the interface.
Tested via ATF.
 1.142 13-Aug-2018  maxv Clarify two functions.
 1.141 26-Jun-2018  msaitoh branches: 1.141.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.140 18-Jun-2018  yamaguchi Fix to acquire pppoe_softc_list_lock before reading and writing the list

ok by knakahara@n.o
 1.139 18-Jun-2018  yamaguchi Fix not to use PPPOE_UNLOCK before access to pppoe_softc
to avoid a race condition

According to the locking order of pppoe(4), access to
pppoe_softc has to follow the 5 steps below.

1. acquire pppoe_softc_list_lock
2. acquire pppoe_softc lock
3. release pppoe_softc_list_lock
4. access pppoe_softc
5. release pppoe_softc lock

However, pppoe_dispatch_disc_pkt() releases the lock of pppoe_softc
temporarily, and then re-acquires it before step 4 of the above. So,
it is possible for other contexts to destroy a pppoe_softc in the
interim.
To fix this condition, avoid the problematic PPPOE_UNLOCK.

ok by knakahara@n.o
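
The five-step order described in 1.139 can be sketched in userland C as
follows. This is only an illustration under stated assumptions: struct softc,
list_lock, softc_find_locked() and softc_set_session() are hypothetical names,
POSIX mutexes stand in for the kernel locks, and the code is not taken from
if_pppoe.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pppoe_softc; not the kernel definition. */
struct softc {
	pthread_mutex_t lock;    /* per-softc lock (steps 2 and 5) */
	int id;
	int session;
	struct softc *next;
};

/* Assumption: a plain mutex models pppoe_softc_list_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct softc *softc_list;

/* Add an entry to the list; the list lock protects the links. */
static void
softc_insert(struct softc *sc)
{
	assert(sc != NULL);
	pthread_mutex_init(&sc->lock, NULL);
	pthread_mutex_lock(&list_lock);
	sc->next = softc_list;
	softc_list = sc;
	pthread_mutex_unlock(&list_lock);
}

/* Steps 1-3: look the entry up and return it with its own lock held. */
static struct softc *
softc_find_locked(int id)
{
	struct softc *sc;

	pthread_mutex_lock(&list_lock);              /* 1. acquire list lock */
	for (sc = softc_list; sc != NULL; sc = sc->next) {
		if (sc->id == id) {
			pthread_mutex_lock(&sc->lock);   /* 2. acquire softc lock */
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);            /* 3. release list lock */
	return sc;
}

/* Steps 4-5: touch the entry only while its lock is held, then release. */
static int
softc_set_session(int id, int session)
{
	struct softc *sc = softc_find_locked(id);

	if (sc == NULL)
		return -1;
	sc->session = session;                       /* 4. access softc */
	pthread_mutex_unlock(&sc->lock);             /* 5. release softc lock */
	return 0;
}
```

The point of the ordering is that the per-entry lock is taken before the list
lock is dropped, so there is no window in which the entry is found but
unprotected. Dropping and re-taking the per-entry lock between steps 2 and 4
reopens that window, which is the race the commit describes.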
 1.138 25-May-2018  ozaki-r Ensure to call if_register after interface initializations finish
 1.137 03-May-2018  maxv Drop early if there's no PPPoE interface. Otherwise it is easy for someone
to flood dmesg over the local subnet.
 1.136 18-Apr-2018  knakahara Fix sending PADT to unexpected hosts when net.pppoe.term_unknown is enabled.
 1.135 18-Apr-2018  knakahara net.pppoe.term_unknown can be written safely now.
 1.134 12-Feb-2018  maxv branches: 1.134.2;
Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in
the chain.
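
The difference between freeing one buffer and freeing a whole chain can be
illustrated with a simplified userland model. The names struct buf, buf_free()
and buf_freem() are hypothetical and only modeled on m_free(9) and m_freem(9);
the live_bufs counter exists purely to make a leak visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified stand-in for struct mbuf. */
struct buf {
	struct buf *next;
};

static int live_bufs;    /* number of buffers currently allocated */

/* Allocate one buffer and link it in front of 'next'. */
static struct buf *
buf_get(struct buf *next)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->next = next;
	live_bufs++;
	return b;
}

/* Modeled on m_free(9): free ONE buffer and return its successor. */
static struct buf *
buf_free(struct buf *b)
{
	struct buf *n;

	if (b == NULL)
		return NULL;
	assert(live_bufs > 0);
	n = b->next;
	free(b);
	live_bufs--;
	return n;
}

/* Modeled on m_freem(9): free the whole chain; NULL is accepted. */
static void
buf_freem(struct buf *b)
{
	while (b != NULL)
		b = buf_free(b);
}

/* Build a chain of n buffers, or return NULL on allocation failure. */
static struct buf *
buf_chain(int n)
{
	struct buf *head = NULL;

	while (n-- > 0) {
		struct buf *b = buf_get(head);

		if (b == NULL) {
			buf_freem(head);
			return NULL;
		}
		head = b;
	}
	return head;
}
```

Calling buf_free() on the head of a three-buffer chain and discarding the
return value leaves two buffers allocated with no reference to them, which is
the kind of leak the commit fixes by switching to the chain-freeing call.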
 1.133 07-Dec-2017  ozaki-r Remove wrong assertions

rw_lock_held() returns true when any context holds the lock. However, in
if_pppoe.c, the function was wrongly used as if it returned true only when the lock is
held in the same context.

From s-yamaguchi@IIJ
 1.132 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.131 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the same
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.130 15-Nov-2017  knakahara Mark callouts of pppoe(4) CALLOUT_MPSAFE. Suggested by ozaki-r@n.o.
 1.129 23-Oct-2017  msaitoh - If if_initialize() failed in the attach function, free resources and return.
- KNF
 1.128 12-Oct-2017  knakahara sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.

Add locking notes later.
 1.127 12-Oct-2017  knakahara Integrate two locks used to protect PPPoE softc. Contributed by s-yamaguchi@IIJ.

PPPOE_SESSION_LOCK protects variables used in PPP packet
processing, on the other hand PPPOE_PARAM_LOCK protects
the other variables used to establish a PPPoE session id.

Those locks aren't acquired at the same time because the
PPP packet processing doesn't work without PPPoE session id.
For that reason, the locks can be integrated into PPPOE_LOCK.

Add locking notes later.
 1.126 20-Jul-2017  knakahara fix panic when PPPOE_DEBUG enabled. implemented by s-yamaguchi@IIJ, thanks.

XXX need pullup to -8 branch
 1.125 07-Feb-2017  ozaki-r branches: 1.125.6;
Use m_get_rcvif_psref instead of m_get_rcvif

Because the critical sections are now sleepable.

Reviewed by knakahara@
 1.124 01-Feb-2017  maxv Not sure what we are trying to achieve here, but there are two issues;
error can be printed while it is not initialized, and if m_pulldown fails
m is freed and reused.

Quickly reviewed by christos and martin
 1.123 27-Dec-2016  christos branches: 1.123.2;
fix merge conflict.
 1.122 26-Dec-2016  christos pfil(9) improvements to handle address changes:

Add:
PFIL_IFADDR call on interface reconfig (mbuf is ioctl #)
PFIL_IFNET call on interface attach/detach (mbuf is PFIL_IFNET_*)

from rmind@
 1.121 16-Dec-2016  knakahara fix unlock and splx inversion. Currently, this doesn't cause problem because either one is used.
 1.120 13-Dec-2016  knakahara MP-safe pppoe(4).

Nearly all parts are implemented by Shoichi YAMAGUCHI<s-yamaguchi@IIJ>, thanks.
 1.119 18-Nov-2016  knakahara if_register() must be called after ifp->if_dl initialized.

There may be similar problems. I will fix step by step...
 1.118 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.117 11-Aug-2016  christos kill unknown sessions ifdef, link set for sysctl.
 1.116 08-Aug-2016  roy Fix compile without modules.
 1.115 08-Aug-2016  pgoyette Don't try to set-up our sysctl sub-tree if we're built-in - this will
happen automatically (via "registration" of the setup function in a
link-set), and if we're not a module, the SYSCTL_SETUP_PROTO() will
not have declared a function prototype!
 1.114 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.113 07-Aug-2016  pgoyette For modular configurations, always build with PPPOE_TERM_UNKNOWN_SESSIONS
defined, and provide a sysctl variable for enabling/disabling the option.

Update man page accordingly.
 1.112 06-Aug-2016  pgoyette Modularize the pppoe driver
 1.111 07-Jul-2016  msaitoh branches: 1.111.2;
KNF. Remove extra spaces. No functional change.
 1.110 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.109 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.108 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for a transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
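
The idea in 1.108 of storing an interface index in the packet and resolving it
through a table can be sketched as below. This is a single-threaded
illustration only: struct ifnet_s, struct pkt, if_table and the helper
functions are hypothetical names, and the psref(9)/pserialize(9) protection
that the real API provides is deliberately left out, as is index reuse.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IF 8

/* Hypothetical, minimal interface object. */
struct ifnet_s {
	int index;           /* 0 means "not attached" */
	const char *name;
};

/* Plays the role of the ifindex2ifnet collection. */
static struct ifnet_s *if_table[MAX_IF];

/* Hypothetical packet header: stores an index, never a pointer. */
struct pkt {
	int rcvif_index;
};

/* Register an interface; returns its index, or 0 if the table is full. */
static int
if_attach_s(struct ifnet_s *ifp)
{
	for (int i = 1; i < MAX_IF; i++) {
		if (if_table[i] == NULL) {
			if_table[i] = ifp;
			ifp->index = i;
			return i;
		}
	}
	return 0;
}

/* Unregister an interface; later lookups of its index yield NULL. */
static void
if_detach_s(struct ifnet_s *ifp)
{
	assert(ifp->index > 0 && ifp->index < MAX_IF);
	if_table[ifp->index] = NULL;
	ifp->index = 0;
}

/* Record the receiving interface in the packet by index. */
static void
pkt_set_rcvif(struct pkt *p, const struct ifnet_s *ifp)
{
	p->rcvif_index = ifp->index;
}

/* Resolve the index; NULL means the interface is gone. */
static struct ifnet_s *
pkt_get_rcvif(const struct pkt *p)
{
	int i = p->rcvif_index;

	if (i <= 0 || i >= MAX_IF)
		return NULL;
	return if_table[i];
}
```

With a raw pointer, a packet that outlives its interface would dereference
freed memory. With an index, the lookup simply returns NULL and the caller has
to handle that case, which matches the NULL checks added for
m_get_rcvif_psref in 1.110.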
 1.107 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.106 24-Apr-2016  christos CID 980057, 980058, use strlcpy()
 1.105 15-Apr-2016  ozaki-r Hide PPPoE variables from if_ethersubr.c

This improves modularity of if_pppoe.

From s-yamaguchi@IIJ
 1.104 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.103 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.102 18-Oct-2014  snj branches: 1.102.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.101 13-Sep-2013  martin Remove unused variable
 1.100 17-Jul-2013  oki if received PADT, get correct sc related with session id.
RFC2516 5.5 says, no tags required in PADT packet.
 1.99 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.98 05-Sep-2011  rjs branches: 1.98.2; 1.98.12; 1.98.16;
Add support for RFC 4638 to pppoe(4).

The change to if_spppsubr.c moves the test for whether LCP should
request a mru change until after the pppoe device has picked up the
mtu of the underlying ethernet device.
 1.97 30-Aug-2011  rjs Typo in comment.
 1.96 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.95 19-Jan-2010  pooka branches: 1.95.2; 1.95.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.94 19-Feb-2009  christos PR/40690: Jordan Gordeev: pppoe(4) doesn't work when PPPoE relays are present
Add support for sending the session id tag back.
 1.93 15-Oct-2008  scw branches: 1.93.2; 1.93.4; 1.93.8;
Reduce the scope of PPPoE session IDs from globally unique to per-interface
unique. Some brands of ADSL modems pick a hard-coded session ID which
would otherwise make it impossible to use two of them in the same system
simultaneously.
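A minimal sketch of what per-interface session scoping means for lookup, assuming a simple linked list of instances. The names `sketch_softc` and `sketch_find_session` are hypothetical and are not taken from if_pppoe.c; the point is only that the match key becomes the pair (session ID, receiving interface) rather than the session ID alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-instance state (not the real pppoe_softc). */
struct sketch_softc {
	uint16_t session;		/* PPPoE session ID */
	int eth_ifindex;		/* index of the underlying ethernet interface */
	struct sketch_softc *next;
};

/*
 * Find the instance for a received packet. Two modems that both hand out
 * the same hard-coded session ID stay distinguishable because the
 * receiving interface is part of the key.
 */
static struct sketch_softc *
sketch_find_session(struct sketch_softc *head, uint16_t session,
    int rcv_ifindex)
{
	for (struct sketch_softc *sc = head; sc != NULL; sc = sc->next) {
		if (sc->session == session && sc->eth_ifindex == rcv_ifindex)
			return sc;
	}
	return NULL;
}
```

With a globally unique scope, the second instance using the same session ID could never be matched; with the pair as the key, both resolve independently.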
 1.92 19-Aug-2008  martin Simplify auth failure reconnect a bit and make it more similar to the
session establishment timeout handling.
 1.91 19-Aug-2008  simonb Fix a tyop in a comment and a few #define<tab> nits while here.
 1.90 18-Aug-2008  martin When upper layer asks us to re-establish a connection, don't do so
synchronously, but insert a (varying) delay. Previously we were only
decoupled from the peer via network latency - now we introduce some
explicit delay. This, at least, creates better serialized debug output.

However, if we have to reconnect because of an authentication failure,
the peer may have just been unable to access its RADIUS server. (I have
a setup where this seems to happen every now and then, depending on time
of day.) Back off reconnects in this case for considerably longer - this is better
than hitting the max-auth-failure limit within a few seconds.
 1.89 18-Aug-2008  martin Test and handle memory allocation failure for the access concentrator
cookie.
 1.88 08-Aug-2008  martin Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.87 15-Jun-2008  christos branches: 1.87.2;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.86 28-Apr-2008  martin branches: 1.86.2; 1.86.4;
Remove clause 3 and 4 from TNF licenses
 1.85 24-Apr-2008  ad branches: 1.85.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.84 20-Feb-2008  matt branches: 1.84.6; 1.84.8;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.83 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.82 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.81 08-Oct-2007  ad branches: 1.81.4; 1.81.6; 1.81.10;
Use the softint API.
 1.80 09-Sep-2007  martin branches: 1.80.2;
Print the access concentrator name when a session is established.
This seems to be useful to identify peers with known broken firmware
(e.g. that can only do IPv4 reliably).
 1.79 09-Jul-2007  ad branches: 1.79.2; 1.79.6; 1.79.8;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.78 31-Mar-2007  martin caddr_t fallout (only visible with options PPPOE_SERVER)
 1.77 04-Mar-2007  christos branches: 1.77.2; 1.77.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.76 16-Nov-2006  christos branches: 1.76.2; 1.76.4; 1.76.8;
__unused removal on arguments; approved by core.
 1.75 01-Nov-2006  martin Do not truncate the last char from a remote error message
 1.74 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.73 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.72 30-Aug-2006  christos branches: 1.72.2; 1.72.4;
Fix initializers.
 1.71 05-Aug-2006  pavel defflag PPPOE_SERVER and PPPOE_TERM_UNKNOWN_SESSIONS.
 1.70 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.69 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.68 14-May-2006  elad branches: 1.68.2;
integrate kauth.
 1.67 27-Apr-2006  tron Adapt maximum MTU permitted on pppoe(4) interfaces to the MTU of the
connected ethernet interface.
 1.66 27-Apr-2006  tron Don't allow connecting a non-ethernet interface to a PPPoE interface.
 1.65 15-Apr-2006  christos Don't try to free a NULL mbuf.
 1.64 31-Jan-2006  martin branches: 1.64.2; 1.64.4; 1.64.6; 1.64.8; 1.64.10;
Make sure error messages (received from the access concentrator) are
zero terminated.
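A minimal sketch of bounded, always-terminated copying of a peer-supplied error string. The helper name `sketch_copy_errmsg` and its signature are hypothetical and not taken from the driver; tag data in a discovery packet carries a length but no terminating NUL, so the terminator has to be added locally before printing.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dstlen - 1 bytes of an error tag into dst and always
 * NUL-terminate the result. Returns the number of bytes copied.
 */
static size_t
sketch_copy_errmsg(char *dst, size_t dstlen, const unsigned char *tag,
    size_t taglen)
{
	size_t n;

	if (dstlen == 0)
		return 0;		/* no room even for the terminator */
	n = taglen < dstlen - 1 ? taglen : dstlen - 1;
	memcpy(dst, tag, n);
	dst[n] = '\0';
	return n;
}
```

Reserving exactly one byte for the terminator keeps the full message when it fits and truncates only when the buffer is too small.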
 1.63 11-Dec-2005  thorpej branches: 1.63.2;
ANSI function decls and application of static.
 1.62 11-Dec-2005  christos merge ktrace-lwp.
 1.61 31-Aug-2005  martin Fix bogus uninitialized variable warning ifdef PPPOE_SERVER.
Noticed by Marcin Jessa on current-users.
 1.60 29-May-2005  christos branches: 1.60.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.59 26-Feb-2005  perry branches: 1.59.2; 1.59.4;
nuke trailing whitespace
 1.58 19-Jan-2005  martin branches: 1.58.2;
Remove deleted interfaces from the instance list (inspired by an OpenBSD
change). While there, fix a comment.
 1.57 08-Dec-2004  martin branches: 1.57.2;
Factor out softc cleanup after loss of session into pppoe_clear_softc.
Use this when losing the ethernet interface (when it detaches).
Fixes PR kern/28375.
 1.56 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.55 28-Nov-2004  skrll Re-order the inclusion of opt_pfil_hooks so PFIL_HOOKS gets set properly.
 1.54 28-Nov-2004  martin Add a pfil(9) hook to get notified when interfaces detach.
When the ethernet interface of a pppoe pseudo-interface detaches, remove
the association and mark the pppoe interface down.
This should fix PR kern/28375.
 1.53 21-Apr-2004  itojun kill sprintf, use snprintf
 1.52 30-Mar-2004  oki Fix mbuf leak when pppoe is up but not connected to an ether i/f.
 1.51 28-Nov-2003  keihan branches: 1.51.4;
s/netbsd.org/NetBSD.org/g
 1.50 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.49 25-Oct-2003  christos Fix uninitialized variable warnings
 1.48 26-Sep-2003  wiz Process has only one c. From miod@openbsd.
 1.47 16-Sep-2003  martin Tell copyinstr about the real buffer size (not one byte too few). Add more
error checking. Noticed by Quentin Garnier.
 1.46 03-Sep-2003  martin If the peer cares to send us error messages, actually display them.
 1.45 23-Aug-2003  martin When trying to (re-)establish a session cope with intermediate output
failures of the underlying ethernet interface - just keep trying.
 1.44 27-Jun-2003  oki branches: 1.44.2;
Put correct dest ether address on PPPoE server mode.
 1.43 18-Jun-2003  oki Add support for an in-kernel PPPoE server.
This may work with one PPPoE session.
If you want to use it, #define PPPOE_SERVER somewhere,
or add options PPPOE_SERVER to the kernel config file.

This is experimental code, and a good starting point for future development.
 1.42 01-Mar-2003  martin Backout previous, I'm on crack obviously.
 1.41 01-Mar-2003  martin Initialize sc to NULL, it could be used uninitialized otherwise when
searching for our sc by host unique tag.
 1.40 01-Mar-2003  aymeric when looking up a Host-Uniq tag, do not consider NULL as a valid
(struct pppoe_softc *). Although we do not generate such tags, other hosts
could and some actually do.
 1.39 04-Feb-2003  martin PADT is always sent with ethertype PPPOE_DISCOVERY, no matter if we
have reached session state or not.

Fixes PR kern/20203 by Shoichi Miyake.
 1.38 03-Feb-2003  thorpej Test callout_pending(), not callout_active(), and eliminate now-unnecessary
callout_deactivate() calls.
 1.37 07-Jan-2003  martin Fix broken error handling in case M_PREPEND fails.
Noticed by Matthias Scheeler.
 1.36 25-Dec-2002  martin Do not call pppoe_abort_connect if we fail to send the initial PADI
packet - there is nothing to abort.
In pppoe_abort_connect rearrange state handling slightly so that calls
to the PPP LCP state machine do not get into an infinite recursion.

This should fix the symptoms of PR kern/19500, but does not touch the
real cause for the lossage described there.
 1.35 01-Sep-2002  martin Apply patch from Todd Vierling in PR kern/17665:

remove a test that has been obsoleted by the authentication failure
counter - enter slow retry mode always, not only if we already transferred
data successfully (the test was designed to disable retries when the
initial authentication setting was not correct, the auth failure counter
does this job better).
 1.34 01-Sep-2002  martin Add an option PPPOE_TERM_UNKNOWN_SESSIONS to forcefully disconnect sessions
we don't know anything about with a PADT packet.

Use with care, this is pretty dangerous and will kill all userland PPPoE
implementations. Therefore it is not enabled by default nor added as
a commented out option to GENERIC kernels.

But it is highly useful if you have a fixed IP, an ISP that does not use
LCP echo requests for link monitoring and you want to recover quickly after
a crash or otherwise ungraceful disconnect.
 1.33 25-Aug-2002  tron Fix typo in a comment.
 1.32 22-Jun-2002  yamt - free buf when unneeded.
- pass a consistent type to free(9).
 1.31 22-Jun-2002  yamt fix loop condition.
(don't skip last tag)
 1.30 22-Jun-2002  itojun avoid unneeded call to m_pullup
 1.29 22-Jun-2002  itojun do not require PPPoE control packet to be put into a single mbuf.
reviewed/tested by ymmt
 1.28 22-Jun-2002  itojun more style
 1.27 22-Jun-2002  itojun style
 1.26 22-Jun-2002  itojun more KNF. warn about mbuf misuse (passing pointer outside of mbuf is dangerous)
 1.25 22-Jun-2002  itojun tabify. minor KNF
 1.24 14-Apr-2002  martin branches: 1.24.2; 1.24.4;
Fix copyright notice.
 1.23 04-Mar-2002  martin Avoid noise from the kernel if we have pseudo-device pppoe configured
but not used and a userland PPPoE pkg sends/receives PPPoE packets.
 1.22 24-Feb-2002  martin Clear M_BCAST and M_MCAST flags on mbufs before passing them down to the
ethernet driver - just in case it would look at them and do the wrong
thing.
 1.21 10-Feb-2002  martin Fix typo in comment.
 1.20 01-Feb-2002  martin Avoid any non-error output for normal operations, only print those
messages if the interface is set to debug.
 1.19 01-Feb-2002  martin Tweak the slow-but-persistent connection reestablishment timeout, retrying
is not really expensive - do it once every minute.

Prevent the MTU from being set bigger than what we can handle.
 1.18 14-Jan-2002  kleink As discussed with Aymeric, <machine/intr.h> is always required, so don't
make its inclusion conditional.
 1.17 14-Jan-2002  aymeric Don't include machine/types.h (my fault in previous commit)
Reported by Klaus Klein.
 1.16 13-Jan-2002  aymeric include machine/types.h
include machine/intr.h if defined(__HAVE_GENERIC_SOFT_INTERRUPTS)
It makes this file compile for the amiga.
 1.15 04-Jan-2002  martin Move net/if_sppp.h to net/if_spppvar.h, create a new net/if_sppp.h
containing the userland visible things (i.e. ioctl definitions).

Remove all (both) old ioctls, as they had a brain dead API and made keeping
binary compatibility more or less impossible.

Replace by several new ioctls. While there, remove any arbitrary limits
(resulting from the old, broken ioctls) and allow any length of names
and passwords.
 1.14 16-Dec-2001  martin Cleanup softc more completely on "ifconfig down", but only if we are
currently in a connection reestablishment state.

The previous (incomplete/unconditional) cleanup confused the state machine.
 1.13 16-Dec-2001  martin Fix packet accounting (now netstat -i and netstat -ib show reasonable
values).

Implement a secondary connection-reestablishment mode, which is only
entered after (1) we have successfully transferred payload data over this
connection and (2) if initial retries did not reestablish a session.
In this mode we retry (infrequently) forever, until the administrator stops
us (by "ifconfig pppoe0 down"). XXX - need to display this mode in
pppoectl.

It is now possible to pull the DSL modem's plug for say 15 minutes, plug
it back in again and just wait. The connection will be reestablished within
three minutes.
 1.12 15-Dec-2001  martin Enable additional error messages for the discovery phase, clarify some
others. Change one timeout slightly - we need to make all others user
settable.
 1.11 10-Dec-2001  martin Enable active LCP keepalive handling in the PPP layer, the PPPoE layer
itself has no means to detect broken connections.
 1.10 10-Dec-2001  martin Now that everything works without LINK1 set, do not set it by default.
While here, remove an unnecessary splnet()/splx() pair.
 1.9 01-Dec-2001  martin Fail early when trying to identify a pppoe interface softc (from a
HOST UNIQUE token) and our list of interfaces is empty. Without this
test an uninitialized pointer may be dereferenced.
 1.8 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.7 12-Nov-2001  lukem add RCSIDs
 1.6 28-Oct-2001  martin Don't call if_alloc_sadl when creating the pppoe interface, it's called
from sppp_attach.
When destroying the interface, call sppp_detach for proper cleanup.
This avoids a crash from the slow timeout handler for no longer existing
interfaces (spotted by Rémi Zara).
 1.5 04-Sep-2001  martin branches: 1.5.4;
Make this interface cloning.
 1.4 24-Jun-2001  martin branches: 1.4.2;
Take into account the two byte PPP protocol discriminator following the PPPoE
header when calculating the MTU. Ooops...

Thanks to Mario Kemper for noting this.
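A minimal sketch of the MTU arithmetic this fix refers to, assuming the header layout from RFC 2516 (a 6-byte PPPoE header followed by the 2-byte PPP protocol field). The macro and function names are hypothetical and not taken from the driver.

```c
/* Sizes per RFC 2516; the names are illustrative only. */
#define SKETCH_PPPOE_HDR_LEN	6	/* ver/type, code, session ID, length */
#define SKETCH_PPP_PROTO_LEN	2	/* PPP protocol field after the header */

/*
 * Largest PPP payload that fits into one ethernet frame of the given MTU.
 * Returns 0 if the frame cannot even hold the encapsulation overhead.
 */
static unsigned int
sketch_pppoe_max_mtu(unsigned int eth_mtu)
{
	const unsigned int overhead =
	    SKETCH_PPPOE_HDR_LEN + SKETCH_PPP_PROTO_LEN;

	return eth_mtu > overhead ? eth_mtu - overhead : 0;
}
```

For a standard 1500-byte ethernet MTU this gives 1492. Forgetting the two protocol bytes would yield 1494 and produce frames that are two bytes too long.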
 1.3 18-Jun-2001  martin branches: 1.3.2;
Protect interface queue manipulations by splnet(). Splsoftnet() is not
enough.
 1.2 14-Jun-2001  itojun change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.1 29-Apr-2001  martin Add an in-kernel PPPoE (ppp over ethernet, RFC 2516) implementation,
based on the existing net/if_spppsubr.c stuff.

While there are completely userland (bpf based) implementations available,
those have a vastly larger per packet overhead thus causing major CPU
overhead and higher latency. On an i386 base router, running a 486DX at 50MHz
my line (768kBit/s downstream) was limited to something (varying) between 10
and 20 kByte/s effective download rate. With this implementation I get full
bandwidth (~85kByte/s).

This is client side only. Arguably the right way to add full PPPoE support
(including server side) would be a variation of the ppp line discipline and
appropriate modifications to pppd. I promise every help I can give to anyone
doing that - but I needed this really fast. Besides, on low memory NAT boxes
with typically a single PPPoE connection, this implementation is more
lightweight than a pppd based one, which nicely fits my needs.
 1.3.2.18 15-Jan-2003  thorpej Sync with HEAD.
 1.3.2.17 29-Dec-2002  thorpej Sync with HEAD.
 1.3.2.16 17-Sep-2002  nathanw Catch up to -current.
 1.3.2.15 27-Aug-2002  nathanw Catch up to -current.
 1.3.2.14 01-Aug-2002  nathanw Catch up to -current.
 1.3.2.13 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.3.2.12 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.3.2.11 17-Apr-2002  nathanw Catch up to -current.
 1.3.2.10 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.3.2.9 28-Feb-2002  nathanw Catch up to -current.
 1.3.2.8 11-Jan-2002  nathanw More catchup.
 1.3.2.7 09-Jan-2002  nathanw curproc ==> curproc->l_proc
 1.3.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.3.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.3.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.3.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.3.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.3.2.1 18-Jun-2001  nathanw file if_pppoe.c was added on branch nathanw_sa on 2001-06-21 20:08:10 +0000
 1.4.2.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.4.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.4.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.4.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.4.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.5.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.24.4.16 26-Aug-2003  tron Pull up revision 1.45 (requested by martin in ticket #1438):
When trying to (re-)establish a session cope with intermediate output
failures of the underlying ethernet interface - just keep trying.
 1.24.4.15 04-Mar-2003  jmc Pullup rev 1.40 (requested by aymeric in ticket #1187)
When looking up a Host-Uniq tag, do not consider NULL as a valid
(struct pppoe_softc *). Although we do not generate such tags,
other hosts could and some actually do.
 1.24.4.14 07-Feb-2003  tron Pull up revision 1.39 (requested by martin in ticket #1152):
PADT is always sent with ethertype PPPOE_DISCOVERY, no matter if we
have reached session state or not.
Fixes PR kern/20203 by Shoichi Miyake.
 1.24.4.13 07-Feb-2003  tron Pull up revision 1.37 (requested by martin in ticket #1152):
Fix broken error handling in case M_PREPEND fails.
Noticed by Matthias Scheeler.
 1.24.4.12 07-Feb-2003  tron Pull up revision 1.36 (requested by martin in ticket #1152):
Do not call pppoe_abort_connect if we fail to send the initial PADI
packet - there is nothing to abort.
In pppoe_abort_connect rearrange state handling slightly so that calls
to the PPP LCP state machine do not get into an infinite recursion.
This should fix the symptoms of PR kern/19500, but does not touch the
real cause for the lossage described there.
 1.24.4.11 07-Feb-2003  tron Pull up revision 1.35 (requested by martin in ticket #1152):
Apply patch from Todd Vierling in PR kern/17665:
remove a test that has been obsoleted by the authentication failure
counter - enter slow retry mode always, not only if we already transferred
data successfully (the test was designed to disable retries when the
initial authentication setting was not correct, the auth failure counter
does this job better).
 1.24.4.10 07-Feb-2003  tron Pull up revision 1.34 (requested by martin in ticket #1152):
Add an option PPPOE_TERM_UNKNOWN_SESSIONS to forcefully disconnect sessions
we don't know anything about with a PADT packet.
Use with care, this is pretty dangerous and will kill all userland PPPoE
implementations. Therefore it is not enabled by default nor added as
a commented out option to GENERIC kernels.
But it is highly useful if you have a fixed IP, an ISP that does not use
LCP echo requests for link monitoring and you want to recover quickly after
a crash or otherwise ungraceful disconnect.
 1.24.4.9 07-Feb-2003  tron Pull up revision 1.33 (requested by martin in ticket #1152):
Fix typo in a comment.
 1.24.4.8 07-Feb-2003  tron Pull up revision 1.32 (requested by martin in ticket #1152):
- free buf when unneeded.
- pass a consistent type to free(9).
 1.24.4.7 07-Feb-2003  tron Pull up revision 1.31 (requested by martin in ticket #1152):
fix loop condition.
(don't skip last tag)
 1.24.4.6 07-Feb-2003  tron Pull up revision 1.30 (requested by martin in ticket #1152):
avoid unneeded call to m_pullup
 1.24.4.5 07-Feb-2003  tron Pull up revision 1.29 (requested by martin in ticket #1152):
do not require PPPoE control packet to be put into a single mbuf.
reviewed/tested by ymmt
 1.24.4.4 07-Feb-2003  tron Pull up revision 1.28 (requested by martin in ticket #1152):
more style
 1.24.4.3 07-Feb-2003  tron Pull up revision 1.27 (requested by martin in ticket #1152):
style
 1.24.4.2 07-Feb-2003  tron Pull up revision 1.26 (requested by martin in ticket #1152):
more KNF. warn about mbuf misuse (passing pointer outside of mbuf is dangerous)
 1.24.4.1 07-Feb-2003  tron Pull up revision 1.25 (requested by martin in ticket #1152):
tabify. minor KNF
 1.24.2.2 29-Aug-2002  gehenna catch up with -current.
 1.24.2.1 15-Jul-2002  gehenna catch up with -current.
 1.44.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.44.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.44.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.44.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.44.2.4 29-Nov-2004  skrll Sync with HEAD.
 1.44.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.44.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.44.2.1 03-Aug-2004  skrll Sync with HEAD
 1.51.4.1 01-Feb-2006  tron Pull up following revision(s) (requested by martin in ticket #10239):
sys/net/if_pppoe.c: revision 1.64
Make sure error messages (received from the access concentrator) are
zero terminated.
 1.57.2.1 29-Apr-2005  kent sync with -current
 1.58.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.59.4.2 08-Aug-2008  jdc Pull up revision 1.88 via patch (requested by martin in ticket #1953).

Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.59.4.1 19-Nov-2006  bouyer Pull up following revision(s) (requested by martin in ticket #1588):
sys/net/if_pppoe.c: revision 1.61
Fix bogus uninitialized variable warning ifdef PPPOE_SERVER.
Noticed by Marcin Jessa on current-users.
 1.59.2.4 08-Aug-2008  jdc Pull up revision 1.88 (requested by martin in ticket #1953).

Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.59.2.3 19-Nov-2006  bouyer Pull up following revision(s) (requested by martin in ticket #1588):
sys/net/if_pppoe.c: revision 1.61
Fix bogus uninitialized variable warning ifdef PPPOE_SERVER.
Noticed by Marcin Jessa on current-users.
 1.59.2.2 03-May-2006  ghen branches: 1.59.2.2.2;
Pull up following revision(s) (requested by 1297):
sys/net/if_pppoe.c: revision 1.66
Don't allow connecting a non-ethernet interface to a PPPoE interface.
 1.59.2.1 01-Feb-2006  tron Pull up following revision(s) (requested by martin in ticket #1152):
sys/net/if_pppoe.c: revision 1.64
Make sure error messages (received from the access concentrator) are
zero terminated.
 1.59.2.2.2.2 08-Aug-2008  jdc Pull up revision 1.88 (requested by martin in ticket #1953).

Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.59.2.2.2.1 19-Nov-2006  bouyer Pull up following revision(s) (requested by martin in ticket #1588):
sys/net/if_pppoe.c: revision 1.61
Fix bogus uninitialized variable warning ifdef PPPOE_SERVER.
Noticed by Marcin Jessa on current-users.
 1.60.2.7 27-Feb-2008  yamt sync with head.
 1.60.2.6 11-Feb-2008  yamt sync with head.
 1.60.2.5 21-Jan-2008  yamt sync with head
 1.60.2.4 27-Oct-2007  yamt sync with head.
 1.60.2.3 03-Sep-2007  yamt sync with head.
 1.60.2.2 30-Dec-2006  yamt sync with head.
 1.60.2.1 21-Jun-2006  yamt sync with head.
 1.63.2.1 01-Feb-2006  yamt sync with head.
 1.64.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.64.8.5 11-May-2006  elad sync with head
 1.64.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.64.8.3 19-Apr-2006  elad sync with head.
 1.64.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.64.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.64.6.4 03-Sep-2006  yamt sync with head.
 1.64.6.3 11-Aug-2006  yamt sync with head
 1.64.6.2 26-Jun-2006  yamt sync with head.
 1.64.6.1 24-May-2006  yamt sync with head.
 1.64.4.3 01-Jun-2006  kardel Sync with head.
 1.64.4.2 22-Apr-2006  simonb Sync with head.
 1.64.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.64.2.1 09-Sep-2006  rpaulo sync with head
 1.68.2.1 19-Jun-2006  chap Sync with head.
 1.72.4.2 10-Dec-2006  yamt sync with head.
 1.72.4.1 22-Oct-2006  yamt sync with head
 1.72.2.1 18-Nov-2006  ad Sync with head.
 1.76.8.2 04-Sep-2008  skrll Sync with netbsd-4.
 1.76.8.1 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.76.4.2 15-Apr-2007  yamt sync with head.
 1.76.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.76.2.4 16-Mar-2009  snj Pull up following revision(s) (requested by christos in ticket #1279):
sys/net/if_pppoe.c: revision 1.94 via patch
PR/40690: Jordan Gordeev: pppoe(4) doesn't work when PPPoE relays are present
Add support for sending the session id tag back.
 1.76.2.3 20-Aug-2008  bouyer Pull up following revision(s) (requested by martin in ticket #1186):
sys/net/if_pppoe.c: revision 1.89 - 1.92
Test and handle memory allocation failure for the access concentrator
cookie.
When upper layer asks us to re-establish a connection, don't do so
synchronously, but insert a (varying) delay. Previously we were only
decoupled from the peer via network latency - now we introduce some
explicit delay. This, at least, creates better serialized debug output.
However, if we have to reconnect because of an authentication failure,
the peer may have just been unable to access its RADIUS server. (I have
a setup where this seems to happen every now and then, depending on time
of day.) Back off reconnects in this case for considerably longer - this is better
than hitting the max-auth-failure limit within a few seconds.
Simplify auth failure reconnect a bit and make it more similar to the
session establishment timeout handling.
Fix a tyop in a comment and a few #define<tab> nits while here.
 1.76.2.2 08-Aug-2008  jdc ncvs ci src/sys/net/if_pppoe.c
Pull up revision 1.88 (requested by martin in ticket #1179).

Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.76.2.1 11-Sep-2007  xtraeme branches: 1.76.2.1.4;
Pull up following revision(s) (requested by martin in ticket #873):
sys/net/if_pppoe.c: revision 1.80 (via patch)

Print the access concentrator name when a session is established.
This seems to be useful to identify peers with known broken firmware
(e.g. that can only do IPv4 reliably).
 1.76.2.1.4.1 08-Aug-2008  jdc ncvs ci src/sys/net/if_pppoe.c
Pull up revision 1.88 (requested by martin in ticket #1179).

Apply patch from Yasuoka Masahiko in PR kern/39321: fix length check
when parsing pppoe discovery phase packets.
 1.77.4.1 11-Jul-2007  mjf Sync with head.
 1.77.2.5 09-Oct-2007  ad Sync with head.
 1.77.2.4 15-Jul-2007  ad Sync with head.
 1.77.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.77.2.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.77.2.1 10-Apr-2007  ad Sync with head.
 1.79.8.3 23-Mar-2008  matt sync with HEAD
 1.79.8.2 09-Jan-2008  matt sync with HEAD
 1.79.8.1 06-Nov-2007  matt sync with HEAD
 1.79.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.79.6.1 02-Oct-2007  joerg Sync with HEAD.
 1.79.2.1 10-Sep-2007  skrll Sync with HEAD.
 1.80.2.1 14-Oct-2007  yamt sync with head.
 1.81.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.81.6.1 26-Dec-2007  ad Sync with head.
 1.81.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.84.8.2 17-Jun-2008  yamt sync with head.
 1.84.8.1 18-May-2008  yamt sync with head.
 1.84.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.84.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.84.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.84.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.85.2.4 11-Aug-2010  yamt sync with head.
 1.85.2.3 11-Mar-2010  yamt sync with head
 1.85.2.2 04-May-2009  yamt sync with head.
 1.85.2.1 16-May-2008  yamt sync with head.
 1.86.4.1 18-Jun-2008  simonb Sync with head.
 1.86.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.86.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.87.2.1 19-Oct-2008  haad Sync with HEAD.
 1.93.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.93.4.1 25-Feb-2009  snj Pull up following revision(s) (requested by christos in ticket #478):
sys/net/if_pppoe.c: revision 1.94
PR/40690: Jordan Gordeev: pppoe(4) doesn't work when PPPoE relays are present
Add support for sending the session id tag back.
 1.93.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.95.4.1 30-May-2010  rmind sync with head
 1.95.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.98.16.2 18-May-2014  rmind sync with head
 1.98.16.1 28-Aug-2013  rmind sync with head
 1.98.12.2 03-Dec-2017  jdolecek update from HEAD
 1.98.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.98.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.102.2.8 28-Aug-2017  skrll Sync with HEAD
 1.102.2.7 05-Feb-2017  skrll Sync with HEAD
 1.102.2.6 05-Dec-2016  skrll Sync with HEAD
 1.102.2.5 05-Oct-2016  skrll Sync with HEAD
 1.102.2.4 09-Jul-2016  skrll Sync with HEAD
 1.102.2.3 29-May-2016  skrll Sync with HEAD
 1.102.2.2 22-Apr-2016  skrll Sync with HEAD
 1.102.2.1 22-Sep-2015  skrll Sync with HEAD
 1.111.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.111.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.111.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.123.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.125.6.11 04-May-2022  sborrill Pull up the following revision(s) (requested by martin in ticket #1740):
sys/net/if_pppoe.c: revision 1.179

pppoe(4): fix CVE-2022-29867 - discovery phase local network
mbuf corruption.
 1.125.6.10 13-Feb-2020  martin Pull up following revision(s) (requested by mlelstv in ticket #1505):

sys/net/if_pppoe.c: revision 1.149

safely extract character sequences from packet for printing.
 1.125.6.9 12-Jul-2018  martin Pull up following revision(s) (requested by yamaguchi in ticket #890):
sys/net/if_pppoe.c: revision 1.137
sys/net/if_pppoe.c: revision 1.139
sys/net/if_pppoe.c: revision 1.140
Drop early if there's no PPPoE interface. Otherwise it is easy for someone
to flood dmesg over the local subnet.
Do not use PPPOE_UNLOCK before access to pppoe_softc,
to avoid a race condition
According to the locking order of pppoe(4), the access to
pppoe_softc has to follow 5 steps as below.
1. acquire pppoe_softc_list_lock
2. acquire pppoe_softc lock
3. release pppoe_softc_list_lock
4. access to pppoe_softc
5. release pppoe_softc lock
However, pppoe_dispatch_disc_pkt() releases the lock of pppoe_softc
temporarily, and then re-acquires it before step 4 of the above. So,
it is possible for other contexts to destroy a pppoe_softc in the
interim.
To fix this condition, avoid the problematic PPPOE_UNLOCK.
ok by knakahara@n.o
Fix to acquire pppoe_softc_list_lock before reading and writing the list
ok by knakahara@n.o
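
The five-step locking order described in the entry above can be sketched in
user space roughly as follows. This is only an illustration under stated
assumptions: it uses POSIX mutexes rather than the kernel's locks, and
struct softc, list_lock, softc_insert(), softc_lookup_locked() and
softc_update() are hypothetical names, not the identifiers in if_pppoe.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a softc; not the kernel's struct pppoe_softc. */
struct softc {
	pthread_mutex_t lock;	/* per-softc lock */
	int id;			/* immutable once the softc is on the list */
	int state;		/* protected by lock */
	struct softc *next;	/* protected by list_lock */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct softc *softc_list;	/* protected by list_lock */

/* Put a caller-allocated softc on the list. */
static void
softc_insert(struct softc *sc, int id)
{
	pthread_mutex_init(&sc->lock, NULL);
	sc->id = id;
	sc->state = 0;
	pthread_mutex_lock(&list_lock);
	sc->next = softc_list;
	softc_list = sc;
	pthread_mutex_unlock(&list_lock);
}

/* Steps 1-3: return the matching softc with its lock held, or NULL. */
static struct softc *
softc_lookup_locked(int id)
{
	struct softc *sc;

	pthread_mutex_lock(&list_lock);			/* step 1 */
	for (sc = softc_list; sc != NULL; sc = sc->next) {
		if (sc->id == id) {
			pthread_mutex_lock(&sc->lock);	/* step 2 */
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);		/* step 3 */
	return sc;
}

/* Steps 4-5: update the softc without dropping its lock in between. */
static int
softc_update(int id, int state)
{
	struct softc *sc = softc_lookup_locked(id);

	if (sc == NULL)
		return -1;
	assert(sc->id == id);
	sc->state = state;				/* step 4 */
	pthread_mutex_unlock(&sc->lock);		/* step 5 */
	return 0;
}
```

In this sketch the per-softc lock is taken while the list lock is still
held and is not released until the access is finished. Assuming a destroyer
must take both locks, it cannot free the softc between lookup and access,
which is the window the entry describes.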
 1.125.6.8 07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #843):

sys/dev/pci/ixgbe/ixv.c: revision 1.101
sys/net/if_bridge.c: revision 1.156
sys/net/if_pppoe.c: revision 1.138
sys/dev/pci/if_wm.c: revision 1.580
sys/dev/pci/ixgbe/ixgbe.c: revision 1.156
sys/net/if_gif.c: revision 1.142

Ensure to call if_register after interface initializations finish
 1.125.6.7 18-Apr-2018  martin Pull up following revision(s) (requested by knakahara in ticket #779):

sys/net/if_pppoe.c: revision 1.135,1.136

net.pppoe.term_unknown can be written safely now.

Fix sending PADT to unexpected hosts when net.pppoe.term_unknown is enabled.
 1.125.6.6 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #613):
sys/net/if_pppoe.c: revision 1.130,1.134
sys/net/if_spppsubr.c: revision 1.172,1.175,1.179
sys/net/if_gif.c: revision 1.138,1.139

Mark callouts of pppoe(4) CALLOUT_MPSAFE. Suggested by ozaki-r@n.o.

fix non-diagnostic compilation

Fix spl leak.
ifconfig gif0 create
ifconfig gif0 destroy
WARNING: SPL NOT LOWERED ON ...

Fix breaking character limit. Pointed out by ozaki-r@n.o, thanks.

Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in
the chain.
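
The m_free/m_freem distinction in the entry above can be illustrated with
a minimal user-space sketch. struct buf, buf_get(), buf_free(), buf_freem()
and live_bufs are hypothetical names that model only the chain link of an
mbuf; they are not the kernel API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of an mbuf: only the chain link is modelled. */
struct buf {
	struct buf *next;
};

static int live_bufs;	/* buffers allocated and not yet freed */

/* Allocate one buffer and link it in front of 'next'. */
static struct buf *
buf_get(struct buf *next)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->next = next;
	live_bufs++;
	return b;
}

/* Like m_free: release a single buffer and return its successor. */
static struct buf *
buf_free(struct buf *b)
{
	struct buf *n;

	assert(b != NULL);
	n = b->next;
	free(b);
	live_bufs--;
	return n;
}

/* Like m_freem: release every buffer in the chain. */
static void
buf_freem(struct buf *b)
{
	while (b != NULL)
		b = buf_free(b);
}
```

Calling buf_free() on the head of a three-buffer chain and discarding the
return value would leave two buffers allocated, which is the kind of leak
the entry fixes by switching to the chain-freeing call.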
 1.125.6.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.125.6.4 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating softint). Without this change, it panics. It's bad because
resource shortage really occurred when a lot of pseudo interfaces were created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. The caller function can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.125.6.3 08-Dec-2017  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #431):
sys/net/if_pppoe.c: revision 1.133
Remove wrong assertions
rw_lock_held() returns true when any context holds the lock. However, in
if_pppoe.c, the function was wrongly used as if it returned true only when the
lock is held in the same context.
From s-yamaguchi@IIJ
 1.125.6.2 02-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #332):
sys/net/if_pppoe.c: 1.127-1.128
sys/net/if_pppoe.h: 1.15
sys/net/if_spppsubr.c: 1.170-1.171
sys/net/if_spppvar.h: 1.21-1.22
Integrate two locks used to protect PPPoE softc. Contributed by s-yamaguchi@IIJ.
PPPOE_SESSION_LOCK protects variables used in PPP packet
processing, on the other hand PPPOE_PARAM_LOCK protects
the other variables used to establish a PPPoE session id.
Those locks aren't acquired at the same time because the
PPP packet processing doesn't work without PPPoE session id.
For that reason, the locks can be integrated into PPPOE_LOCK.
Add locking notes later.
--
sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.
Add locking notes later.
--
Add locking notes for if_pppoe
--
Add locking notes for if_spppsubr
--
fix no INET6 build.
 1.125.6.1 25-Jul-2017  snj Pull up following revision(s) (requested by knakahara in ticket #149):
sys/net/if_pppoe.c: revision 1.126
fix panic when PPPOE_DEBUG enabled. implemented by s-yamaguchi@IIJ, thanks.
 1.134.2.7 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.134.2.6 20-Oct-2018  pgoyette Sync with head
 1.134.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.134.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.134.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.134.2.2 21-May-2018  pgoyette Sync with HEAD
 1.134.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.141.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.141.2.1 10-Jun-2019  christos Sync with HEAD
 1.147.6.1 29-Feb-2020  ad Sync with head.
 1.147.4.2 04-May-2022  sborrill Pull up the following revision(s) (requested by martin in ticket #1442):
sys/net/if_pppoe.c: revision 1.179

pppoe(4): fix CVE-2022-29867 - discovery phase local network
mbuf corruption.
 1.147.4.1 13-Feb-2020  martin Pull up following revision(s) (requested by mlelstv in ticket #708):

sys/net/if_pppoe.c: revision 1.149

safely extract character sequences from packet for printing.
 1.153.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.158.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.170.4.1 31-May-2021  cjep sync with head
 1.170.2.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.183.10.1 02-Aug-2025  perseant Sync with HEAD
 1.15 12-Oct-2017  knakahara Add locking notes for if_pppoe
 1.14 31-May-2017  knakahara branches: 1.14.2;
add todo comment. pointed out by s-yamaguchi@IIJ
 1.13 15-Apr-2016  ozaki-r Hide PPPoE variables from if_ethersubr.c

This improves modularity of if_pppoe.

From s-yamaguchi@IIJ
 1.12 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.11 28-Apr-2008  martin branches: 1.11.44; 1.11.64;
Remove clause 3 and 4 from TNF licenses
 1.10 14-Jul-2007  ad branches: 1.10.28; 1.10.30; 1.10.32;
Generic soft interrupts are mandatory.
 1.9 04-Mar-2007  christos branches: 1.9.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.8 10-Dec-2005  elad branches: 1.8.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.7 26-Jun-2005  christos branches: 1.7.2;
Names could be const.
 1.6 27-Apr-2005  martin As noted by Christophe Plasschaert on tech-kern, g/c never used
idletimeout configuration ioctls.
 1.5 28-Nov-2003  keihan branches: 1.5.8;
s/netbsd.org/NetBSD.org/g
 1.4 18-Jun-2003  oki branches: 1.4.2;
Add support for an in-kernel PPPoE server.
This may work with one PPPoE session.
If you want to use it, #define PPPOE_SERVER somewhere,
or add options PPPOE_SERVER in the kernel config file.

This is experimental code, and a good starting point for future development.
 1.3 14-Apr-2002  martin Fix copyright notice.
 1.2 04-Jan-2002  martin Move net/if_sppp.h to net/if_spppvar.h, create a new net/if_sppp.h
containing the userland visible things (i.e. ioctl definitions).

Remove all (both) old ioctls, as they had a brain dead API and made keeping
binary compatibility more or less impossible.

Replace by several new ioctls. While there, remove any arbitrary limits
(resulting from the old, broken ioctls) and allow any length of names
and passwords.
 1.1 29-Apr-2001  martin branches: 1.1.2; 1.1.4;
Add an in-kernel PPPoE (ppp over ethernet, RFC 2516) implementation,
based on the existing net/if_spppsubr.c stuff.

While there are completely userland (bpf based) implementations available,
those have a vastly larger per packet overhead thus causing major CPU
overhead and higher latency. On an i386 base router, running a 486DX at 50MHz
my line (768kBit/s downstream) was limited to something (varying) between 10
and 20 kByte/s effective download rate. With this implementation I get full
bandwidth (~85kByte/s).

This is client side only. Arguably the right way to add full PPPoE support
(including server side) would be a variation of the ppp line discipline and
appropriate modifications to pppd. I promise every help I can give to anyone
doing that - but I needed this really fast. Besides, on low memory NAT boxes
with typically a single PPPoE connection, this implementation is more
lightweight than a pppd based one, which nicely fits my needs.
 1.1.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.2.5 17-Apr-2002  nathanw Catch up to -current.
 1.1.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.1.2.3 11-Jan-2002  nathanw More catchup.
 1.1.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.1.2.1 29-Apr-2001  nathanw file if_pppoe.h was added on branch nathanw_sa on 2001-06-21 20:08:11 +0000
 1.4.2.5 11-Dec-2005  christos Sync with head.
 1.4.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.8.1 29-Apr-2005  kent sync with -current
 1.7.2.2 03-Sep-2007  yamt sync with head.
 1.7.2.1 21-Jun-2006  yamt sync with head.
 1.8.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.9.2.1 15-Jul-2007  ad Sync with head.
 1.10.32.1 16-May-2008  yamt sync with head.
 1.10.30.1 18-May-2008  yamt sync with head.
 1.10.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.11.64.3 28-Aug-2017  skrll Sync with HEAD
 1.11.64.2 22-Apr-2016  skrll Sync with HEAD
 1.11.64.1 22-Sep-2015  skrll Sync with HEAD
 1.11.44.1 03-Dec-2017  jdolecek update from HEAD
 1.14.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #332):
sys/net/if_pppoe.c: 1.127-1.128
sys/net/if_pppoe.h: 1.15
sys/net/if_spppsubr.c: 1.170-1.171
sys/net/if_spppvar.h: 1.21-1.22
Integrate two locks used to protect PPPoE softc. Contributed by s-yamaguchi@IIJ.
PPPOE_SESSION_LOCK protects variables used in PPP packet
processing, on the other hand PPPOE_PARAM_LOCK protects
the other variables used to establish a PPPoE session id.
Those locks aren't acquired at the same time because the
PPP packet processing doesn't work without PPPoE session id.
For that reason, the locks can be integrated into PPPOE_LOCK.
Add locking notes later.
--
sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.
Add locking notes later.
--
Add locking notes for if_pppoe
--
Add locking notes for if_spppsubr
--
fix no INET6 build.
 1.28 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.27 20-Feb-2008  matt branches: 1.27.54; 1.27.74;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.26 14-Jul-2007  ad branches: 1.26.8;
Generic soft interrupts are mandatory.
 1.25 04-Mar-2007  christos branches: 1.25.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.24 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.23 23-Jul-2006  ad branches: 1.23.10;
Use the LWP cached credentials where sane.
 1.22 28-Dec-2005  christos branches: 1.22.4; 1.22.8;
PR/5901: Felix A. Croes: PPP fast queue blocks traffic at normal priority.
Applied fix, similar to the one suggested in the PR. We use a counter to
limit the number of consecutive packets accepted from the fast queue. This
number can be set via ioctl, but this has not been implemented. Since there
are only 2 queues other proposed solutions such as ALTQ are overkill and
they have not been implemented in the past 7 years. Now LCP echos can be
used to detect that the line is up.
 1.21 11-Dec-2005  thorpej ANSI function decls and application of static.
 1.20 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.19 05-Dec-2004  christos branches: 1.19.12;
Make ppp a cloning device. Based on the work of Quentin Garnier.
 1.18 01-Sep-2003  christos Add a new ioctl PPPIOCGRAWIN to get the last characters we got from the
remote site.
 1.17 08-Jul-2003  itojun prototype must not have variable name
 1.16 13-Sep-2002  itojun branches: 1.16.6;
copyright clarification. from openbsd

1.
Paul Mackerras and the Australian National University have worked things
out, and as a result, Paul now owns copyright on all these files, with the
proper terms.

2.
and... we managed to contact "Eric Rosenquist" <eric@rosenquist.com> through
the help of people who found him: first one was nick.stott@cogeco.ca
This now has a better license. Two authors left to go.
 1.15 01-Jul-2002  itojun new copyright boilerplate from CMU. from openbsd
 1.14 12-May-2002  matt branches: 1.14.2;
Make ppp_softc[] extern and declare in if_ppp.c
 1.13 15-Jan-2001  thorpej branches: 1.13.2; 1.13.4; 1.13.6;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.12 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.11 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.10 30-Jul-1999  itojun branches: 1.10.2;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.9 12-May-1999  thorpej Decouple inbound and outbound filters. Now instead of using "active-filter"
and "pass-filter" and "inbound" and "outbound" qualifiers in the filter
expression, use new "active-filter-in", "active-filter-out", "pass-filter-in",
and "pass-filter-out" without these qualifiers.

This is necessary due to the horrible, awful way "inbound" and "outbound"
were specified for the filter programs when a packet was passed through them.
Basically, the "address" byte in the serial PPP header was overwritten with
a value to indicate the direction. However, the "address" byte doesn't even
exist on PPP headers for all other PPP encaps! So, this old method worked
only for serial encaps, and corrupted packets for all others (PPPoE, ATM, etc.)
 1.8 09-Feb-1998  perry branches: 1.8.6; 1.8.10;
add multiple inclusion protection (and cleanup).
 1.7 17-May-1997  christos Update to ppp-2.3b5
 1.6 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.5 03-Jan-1997  mikel hide softc array and kernel routine prototypes from userland; PR misc/3070
 1.4 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.3 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.2 04-Jul-1995  briggs Use the right prototype for pppioctl().
 1.1 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.8.10.2 02-Aug-1999  thorpej Update from trunk.
 1.8.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.8.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.10.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.10.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.6.1 13-Oct-2001  fvdl Revert the t_dev -> t_devvp change in struct tty. The way that tty
structs are currently used (especially by console ttys) aren't
ready for it, and this will require quite a few changes.
 1.13.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.4.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.13.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.13.2.2 01-Aug-2002  nathanw Catch up to -current.
 1.13.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.14.2.1 15-Jul-2002  gehenna catch up with -current.
 1.16.6.5 11-Dec-2005  christos Sync with head.
 1.16.6.4 18-Dec-2004  skrll Sync with HEAD.
 1.16.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.6.1 03-Aug-2004  skrll Sync with HEAD
 1.19.12.5 27-Feb-2008  yamt sync with head.
 1.19.12.4 03-Sep-2007  yamt sync with head.
 1.19.12.3 26-Feb-2007  yamt sync with head.
 1.19.12.2 30-Dec-2006  yamt sync with head.
 1.19.12.1 21-Jun-2006  yamt sync with head.
 1.22.8.1 11-Aug-2006  yamt sync with head
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.23.10.2 12-Mar-2007  rmind Sync with HEAD.
 1.23.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.25.2.1 15-Jul-2007  ad Sync with head.
 1.26.8.1 23-Mar-2008  matt sync with HEAD
 1.27.74.1 29-May-2016  skrll Sync with HEAD
 1.27.54.1 03-Dec-2017  jdolecek update from HEAD
 1.136 26-Oct-2022  riastradh branches: 1.136.6;
sl(4): Convert to ttylock/ttyunlock.
 1.135 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.134 27-Aug-2022  thorpej Ensure that all queues passed to ifq_enqueue2() have a valid ifq_lock.
 1.133 27-Aug-2022  thorpej Use IFQ_SET_MAXLEN() rather than open-coding it.
 1.132 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.131 24-Jan-2019  knakahara branches: 1.131.6;
Add comments about D_MPSAFE to functions called as struct linesw.l_ioctl.
 1.130 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.129 20-Apr-2018  knakahara branches: 1.129.2;
SIOCSIFDSTADDR uses struct ifreq instead of struct ifaddr or struct in_aliasreq.

SIOCSIFDSTADDR is not used by base package commands...

I checked sys/net*/* only.
 1.128 13-Apr-2017  maya branches: 1.128.10;
if MGETHDR fails, don't try to copy to single mbuf and deref null.

reduce ifdefs.
 1.127 02-Oct-2016  christos branches: 1.127.2;
MFREE -> m_free
 1.126 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.125 06-Aug-2016  christos make strip and slip modular, and cosmetic for ppp.
 1.124 10-Jun-2016  ozaki-r branches: 1.124.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.123 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.122 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.121 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.120 20-Aug-2015  uebayasi Honor pseudo attach decl generated by config(1).
 1.119 05-Jun-2014  rmind branches: 1.119.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.118 23-Sep-2011  christos branches: 1.118.12; 1.118.26;
Change obsolete CBSIZE constant (48), to a power of two constant (64) that
is close enough to match the original assumptions.
 1.117 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.116 19-Jan-2010  pooka branches: 1.116.2; 1.116.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.115 07-May-2009  elad Introduce actions/requests to handle authorization for ppp(4), sl(4),
strip(4), btuart(4) and bcsp(4) network interfaces and devices.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004955.html
 1.114 17-Dec-2008  cegger branches: 1.114.2;
kill MALLOC and FREE macros.
 1.113 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.112 15-Jun-2008  christos branches: 1.112.2; 1.112.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.111 24-Apr-2008  ad branches: 1.111.2; 1.111.4; 1.111.6;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.110 07-Feb-2008  dyoung branches: 1.110.6; 1.110.8;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.109 10-Nov-2007  ad Call ttyflush() with tty_lock held.
 1.108 08-Oct-2007  ad branches: 1.108.2; 1.108.4;
Use the softint API.
 1.107 01-Sep-2007  dyoung branches: 1.107.2;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.106 14-Jul-2007  ad branches: 1.106.2; 1.106.6; 1.106.8;
Generic soft interrupts are mandatory.
 1.105 04-Mar-2007  christos branches: 1.105.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.104 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.103 04-Jan-2007  elad branches: 1.103.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.102 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.101 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.100 23-Jul-2006  ad branches: 1.100.4; 1.100.6;
Use the LWP cached credentials where sane.
 1.99 08-Jul-2006  tsutsui KNF.
 1.98 07-Jun-2006  kardel branches: 1.98.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.97 14-May-2006  elad branches: 1.97.2;
integrate kauth.
 1.96 02-Mar-2006  christos branches: 1.96.2; 1.96.4; 1.96.6;
Provide ppp like statistics instead of grovelling through the kernel
symbols.
 1.95 11-Dec-2005  thorpej branches: 1.95.4; 1.95.6;
ANSI function decls and application of static.
 1.94 11-Dec-2005  christos merge ktrace-lwp.
 1.93 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
 1.92 18-Aug-2005  yamt branches: 1.92.6;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.91 31-Mar-2005  christos branches: 1.91.2;
factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that have one queue, and one used by
the point-to-point interfaces that have two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.90 06-Dec-2004  christos branches: 1.90.4;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.89 05-Dec-2004  peter Don't forget to call bpfdetach in the clone destroy function.
While here, add a missing static and change some spaces to tabs.
 1.88 05-Dec-2004  he Fix what must have been an omission: missing brace and a leftover
use of `i', which is no longer defined. Fixes build problem for ports
not defining __HAVE_GENERIC_SOFT_INTERRUPTS.
 1.87 05-Dec-2004  christos clonify strip and sl.
 1.86 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.85 21-Apr-2004  itojun kill sprintf, use snprintf
 1.84 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.83 01-May-2003  itojun branches: 1.83.2;
bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.82 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.81 11-Sep-2002  itojun KNF - return is not a function.
 1.80 17-Mar-2002  atatat Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
 1.79 14-Jan-2002  kleink Include <machine/intr.h> unconditionally, instead of only doing so if
__HAVE_GENERIC_SOFT_INTERRUPTS and relying on <sys/param.h> to provide it
otherwise; pointed out by Aymeric Vincent.
 1.78 12-Nov-2001  lukem add RCSIDs
 1.77 15-Jul-2001  martin branches: 1.77.2;
Fix slight glitch from rev. 1.70: bp is not adjusted for next loop after
outputting some data.
Fix provided by isaki@par.odn.ne.jp in PR kern/13472.
 1.76 14-Jun-2001  itojun branches: 1.76.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.75 30-May-2001  itojun fix mbuf leak due to meaningless MGETHDR. from niels provos
 1.74 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.73 31-Mar-2001  enami Remove unnecessary test of tp->t_linesw against NULL; they are results
of confusion while correcting compilation error after t_line is
replaced with t_linesw.
 1.72 17-Jan-2001  thorpej branches: 1.72.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.71 15-Jan-2001  thorpej For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.70 12-Jan-2001  thorpej After freeing the input buffer, set the pointer to it to NULL.
 1.69 12-Jan-2001  thorpej Don't use splimp() to block both net and tty interrupts. Instead,
block both interrupt levels as appropriate.
 1.68 11-Jan-2001  thorpej Plug a memory leak.
 1.67 11-Jan-2001  thorpej Defer output processing to the software interrupt.
 1.66 10-Jan-2001  thorpej Move the VJ uncompress code into the software interrupt.
 1.65 09-Jan-2001  thorpej Once we have a complete frame, schedule a SLIP software interrupt,
and manipulate ipintrq from there. This will allow us to clean up
the use of splimp() in this file later.
 1.64 09-Jan-2001  thorpej Make the buffer management in SLIP just a little less evil.
 1.63 18-Dec-2000  thorpej ALTQ'ify.
 1.62 18-Dec-2000  thorpej Fill in if_dlt.
 1.61 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.60 02-Nov-2000  itohy Set the default line discipline to t_linesw, rather than just NULL it.
 1.59 02-Nov-2000  itohy Adapt to the new line discipline scheme.
 1.58 12-Jul-2000  thorpej NetBSD -> __NetBSD__ in an #ifdef, and nuke sc_bpf; there's one in
the ifnet already.
 1.57 30-Mar-2000  augustss Kill some more register declarations.
 1.56 29-Mar-2000  simonb Don't need to include <sys/conf.h> here.
 1.55 27-Mar-1999  dbj branches: 1.55.8;
fixes to compile if NBPFILTER == 0
 1.54 25-Mar-1999  tron Make it possible to set MTU via "ifconfig" at run time. "SLMTU" is now
used to set the initial value.
 1.53 06-Oct-1998  kleink branches: 1.53.4;
Use #error instead of causing parse errors; noticed by Heiko.
 1.52 26-Aug-1998  mrg use __NetBSD__ not NetBSD
 1.51 06-Jul-1998  jtk use #ifdef INET so this compiles again
 1.50 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.49 23-Mar-1998  enami Add missing comma.
 1.48 23-Mar-1998  fair add the ability to run SLIP with CLOCAL set, per PR#3586
 1.47 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.46 17-May-1997  christos Update to ppp-2.3b5
 1.45 27-Mar-1997  thorpej Update for the new mbuf code, in a slighly kludgy way. Basically, these
drivers played a somewhat evil trick with clusters, which is now
replaced by a somewhat evil trick with regular malloc'd memory.
 1.44 13-Oct-1996  christos backout previous kprintf change
 1.43 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.42 09-Aug-1996  mrg use sc_unit instead of pointer arithmetic.
 1.41 10-Jul-1996  cgd print difference between pointers with %ld, so that -Wformat works
on the Alpha and for consistency. Also, other minor formatting cleanups.
 1.40 02-Jun-1996  thorpej Move a mis-placed line on slattach() so that SLIOCGUNIT works properly.
From Jonathan O'Brien <obrien@phoenix.sfsu.edu>.
 1.39 07-May-1996  thorpej branches: 1.39.4;
Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.38 13-Feb-1996  christos Net prototypes
 1.37 12-Aug-1995  mycroft splnet --> splsoftnet
 1.36 13-Jun-1995  mycroft There's no reason to set if_next here.
 1.35 21-Mar-1995  mycroft Update to use timer{add,sub}().
 1.34 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.33 11-Dec-1994  mycroft timevalsub --> __timersub
 1.32 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.31 16-Jul-1994  cgd use NetBSD (defined in param.h) not __NetBSD__ to allow x-compilation
with native compiler.
 1.30 15-Jul-1994  cgd kill bogus external declaration of time
 1.29 29-Jun-1994  cgd branches: 1.29.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.28 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.27 08-Mar-1994  cgd Some stability/safety/extensibility patches. Inspired by Christoph Badura.
Always make sure our buffer is large enough, and restart hung lines.
 1.26 10-Feb-1994  cgd mccanne convinced me that slip.h *should* exist. this is what
i "implemented" for 4.4, and the adjustments to the other files to
match.
 1.25 05-Feb-1994  mycroft Remove an #ifdef we no longer need.
 1.24 02-Feb-1994  hpeyerl Multicast is no longer optional
 1.23 08-Jan-1994  cgd quench the grammar flames!
 1.22 20-Dec-1993  cgd serious cleanup
 1.21 19-Dec-1993  cgd include machine/cpu.h, for machines which define soft interrupt stuff
there. marked XXX; they prolly shouldn't do that...
 1.20 18-Dec-1993  mycroft Canonicalize all #includes.
 1.19 10-Dec-1993  cgd move slip compression configuration into the interface flags,
and diddle a couple of related things.
 1.18 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.17 09-Nov-1993  glass T_LINEP member of struct tty becomes t_sc. This replaces the
#define t_sc T_LINEP
that appears in tty_tb.c, if_sl.c, and if_ppp.h
 1.16 31-Oct-1993  glass removed slip and ppp specific prototypes from tty.h where they didn't belong.
moved them to tty_conf.c within #if Nwhatever > 0 where they do belong.
made sure that if_sl.c, and if_ppp.c still compile quietly.
 1.15 02-Oct-1993  mycroft Call output routine redundantly to possibly avoid some hangs due to missed
interrupts.
 1.14 02-Oct-1993  mycroft Ignore carrier if not using hardware carrier detect (i.e. CLOCAL is set).
 1.13 23-Sep-1993  mycroft Ignore TS_CARR_ON when CLOCAL is set.
 1.12 09-Aug-1993  deraadt branches: 1.12.2;
add an additional suser() check.
regular users should not be able to change slip interface characteristics!
 1.11 01-Aug-1993  mycroft Add RCS identifiers (this time on the correct side of the branch), and
incorporate recent changes in netbsd-0-9 branch.
 1.10 12-Jul-1993  mycroft Change tty code to use clist interface, but with ring buffer implementation.
Also, fix a couple of bugs in tty.c and pccons.c, and some gross kluginess
in the hp300 stuff.
 1.9 27-Jun-1993  andrew ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.8 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.7 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.6 10-May-1993  deraadt ring buffer now uses rbchar's (shorts) instead of chars.
 1.5 09-Apr-1993  cgd bump slip MTU back down to 296...
 1.4 25-Mar-1993  cgd one line got botched during bpf patch installation
 1.3 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.2 21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.12.2.9 10-Dec-1993  cgd update from trunk
 1.12.2.8 14-Nov-1993  mycroft T_LINEP --> t_sc, from trunk.
 1.12.2.7 14-Nov-1993  mycroft Canonicalize all #includes.
 1.12.2.6 03-Nov-1993  mycroft Add prototypes for slioctl(), sloutput(), and slstart(), to eliminate compiler
warnings.
 1.12.2.5 18-Oct-1993  mycroft Remove bogus declaration of ttrstrt(), as it is now in tty.h.
 1.12.2.4 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.12.2.3 06-Oct-1993  mycroft Merge changes from trunk.
 1.12.2.2 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.12.2.1 23-Sep-1993  mycroft Ignore TS_CARR_ON when CLOCAL is set.
 1.29.2.2 16-Jul-1994  cgd update from trunk
 1.29.2.1 15-Jul-1994  cgd updates from trunk. basically, C language errors.
 1.39.4.2 26-Jan-1997  rat Pullup 1.41 -> 1.42. Use sc_unit instead of pointer arithmetic.
 1.39.4.1 02-Jun-1996  thorpej Pull up mis-placed line fix from trunk.
 1.53.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.55.8.6 21-Apr-2001  bouyer Sync with HEAD
 1.55.8.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.55.8.4 05-Jan-2001  bouyer Sync with HEAD
 1.55.8.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.55.8.2 22-Nov-2000  bouyer Sync with HEAD.
 1.55.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.72.2.10 17-Sep-2002  nathanw Catch up to -current.
 1.72.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.72.2.8 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.72.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.72.2.6 28-Feb-2002  nathanw Catch up to -current.
 1.72.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.72.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.72.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.72.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.72.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.76.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.76.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.76.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.76.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.76.2.1 03-Aug-2001  lukem update to -current
 1.77.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.83.2.8 11-Dec-2005  christos Sync with head.
 1.83.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.83.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.83.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.83.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.83.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.83.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.83.2.1 03-Aug-2004  skrll Sync with HEAD
 1.90.4.1 29-Apr-2005  kent sync with -current
 1.91.2.7 11-Feb-2008  yamt sync with head.
 1.91.2.6 15-Nov-2007  yamt sync with head.
 1.91.2.5 27-Oct-2007  yamt sync with head.
 1.91.2.4 03-Sep-2007  yamt sync with head.
 1.91.2.3 26-Feb-2007  yamt sync with head.
 1.91.2.2 30-Dec-2006  yamt sync with head.
 1.91.2.1 21-Jun-2006  yamt sync with head.
 1.92.6.1 29-Nov-2005  yamt sync with head.
 1.95.6.3 01-Jun-2006  kardel Sync with head.
 1.95.6.2 22-Apr-2006  simonb Sync with head.
 1.95.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.95.4.1 09-Sep-2006  rpaulo sync with head
 1.96.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.96.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.96.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.96.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.96.2.3 11-Aug-2006  yamt sync with head
 1.96.2.2 26-Jun-2006  yamt sync with head.
 1.96.2.1 24-May-2006  yamt sync with head.
 1.97.2.1 19-Jun-2006  chap Sync with head.
 1.98.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.100.6.2 10-Dec-2006  yamt sync with head.
 1.100.6.1 22-Oct-2006  yamt sync with head
 1.100.4.2 12-Jan-2007  ad Sync with head.
 1.100.4.1 18-Nov-2006  ad Sync with head.
 1.103.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.103.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.105.2.4 09-Oct-2007  ad Sync with head.
 1.105.2.3 15-Jul-2007  ad Sync with head.
 1.105.2.2 15-Jul-2007  ad Sync with head.
 1.105.2.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.106.8.3 23-Mar-2008  matt sync with HEAD
 1.106.8.2 09-Jan-2008  matt sync with HEAD
 1.106.8.1 06-Nov-2007  matt sync with HEAD
 1.106.6.3 11-Nov-2007  joerg Sync with HEAD.
 1.106.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.106.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.106.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.107.2.1 14-Oct-2007  yamt sync with head.
 1.108.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.108.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.108.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.110.8.2 17-Jun-2008  yamt sync with head.
 1.110.8.1 18-May-2008  yamt sync with head.
 1.110.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.110.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.110.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.111.6.1 18-Jun-2008  simonb Sync with head.
 1.111.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.111.2.4 11-Aug-2010  yamt sync with head.
 1.111.2.3 11-Mar-2010  yamt sync with head
 1.111.2.2 16-May-2009  yamt sync with head
 1.111.2.1 04-May-2009  yamt sync with head.
 1.112.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.112.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.114.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.116.4.1 30-May-2010  rmind sync with head
 1.116.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.118.26.1 10-Aug-2014  tls Rebase.
 1.118.12.2 03-Dec-2017  jdolecek update from HEAD
 1.118.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.119.4.6 28-Aug-2017  skrll Sync with HEAD
 1.119.4.5 05-Oct-2016  skrll Sync with HEAD
 1.119.4.4 09-Jul-2016  skrll Sync with HEAD
 1.119.4.3 29-May-2016  skrll Sync with HEAD
 1.119.4.2 22-Apr-2016  skrll Sync with HEAD
 1.119.4.1 22-Sep-2015  skrll Sync with HEAD
 1.124.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.124.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.127.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.128.10.3 26-Jan-2019  pgoyette Sync with HEAD
 1.128.10.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.128.10.1 22-Apr-2018  pgoyette Sync with HEAD
 1.129.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.129.2.1 10-Jun-2019  christos Sync with HEAD
 1.131.6.1 29-Feb-2020  ad Sync with head.
 1.136.6.2 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.136.6.1 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.35 12-Dec-2021  andvar fix various typos, mainly in comments.
 1.34 11-Jul-2019  msaitoh Fix typo (s/supress/suppress/).
 1.33 14-Jul-2007  ad branches: 1.33.122;
Generic soft interrupts are mandatory.
 1.32 07-Jun-2006  kardel branches: 1.32.16;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.31 11-Dec-2005  thorpej branches: 1.31.4; 1.31.6; 1.31.8; 1.31.14;
ANSI function decls and application of static.
 1.30 11-Dec-2005  christos merge ktrace-lwp.
 1.29 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.28 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
 1.27 26-Feb-2005  perry branches: 1.27.4; 1.27.10;
nuke trailing whitespace
 1.26 05-Dec-2004  christos branches: 1.26.4; 1.26.6;
clonify strip and sl.
 1.25 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.24 14-Jun-2001  itojun branches: 1.24.4; 1.24.22;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange got updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.23 15-Jan-2001  thorpej branches: 1.23.2;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.22 09-Jan-2001  thorpej Once we have a complete frame, schedule a SLIP software interrupt,
and manipulate ipintrq from there. This will allow us to clean up
the use of splimp() in this file later.
 1.21 09-Jan-2001  thorpej Make the buffer management in SLIP just a little less evil.
 1.20 12-Jul-2000  thorpej NetBSD -> __NetBSD__ in an #ifdef, and nuke sc_bpf; there's one in
the ifnet already.
 1.19 01-Mar-1998  fvdl branches: 1.19.6; 1.19.14;
Merge with Lite2 + local changes
 1.18 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.17 27-Mar-1997  thorpej Update for the new mbuf code, in a slightly kludgy way. Basically, these
drivers played a somewhat evil trick with clusters, which is now
replaced by a somewhat evil trick with regular malloc'd memory.
 1.16 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.15 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.14 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.13 16-Jul-1994  cgd use NetBSD (defined in param.h) not __NetBSD__ to allow x-compilation
with native compiler.
 1.12 29-Jun-1994  cgd branches: 1.12.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.11 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.10 08-Mar-1994  cgd Some stability/safety/extensibility patches. Inspired by Christoph Badura.
Always make sure our buffer is large enough, and restart hung lines.
 1.9 10-Feb-1994  cgd mccanne convinced me that slip.h *should* exist. this is what
i "implemented" for 4.4, and the adjustments to the other files to
match.
 1.8 08-Jan-1994  cgd quench the grammar flames!
 1.7 20-Dec-1993  cgd serious cleanup
 1.6 10-Dec-1993  cgd move slip compression configuration into the interface flags,
and diddle a couple of related things.
 1.5 20-May-1993  cgd branches: 1.5.4;
add rcs ids to everything, and clean up headers
 1.4 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.3 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.2 21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.5.4.1 10-Dec-1993  cgd update from trunk
 1.12.2.1 16-Jul-1994  cgd update from trunk
 1.19.14.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.19.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.23.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.24.22.6 11-Dec-2005  christos Sync with head.
 1.24.22.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.24.22.4 18-Dec-2004  skrll Sync with HEAD.
 1.24.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.24.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.24.22.1 03-Aug-2004  skrll Sync with HEAD
 1.24.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.26.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.26.4.1 29-Apr-2005  kent sync with -current
 1.27.10.1 29-Nov-2005  yamt sync with head.
 1.27.4.2 03-Sep-2007  yamt sync with head.
 1.27.4.1 21-Jun-2006  yamt sync with head.
 1.31.14.1 19-Jun-2006  chap Sync with head.
 1.31.8.1 26-Jun-2006  yamt sync with head.
 1.31.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.31.4.1 09-Sep-2006  rpaulo sync with head
 1.32.16.1 15-Jul-2007  ad Sync with head.
 1.33.122.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.36 14-May-2021  yamaguchi Add a parameter to change keepalive interval in each PPPoE I/F
 1.35 11-May-2021  yamaguchi clear authentication protocol when SPPP_AUTHPROTO_NONE is specified
 1.34 11-May-2021  yamaguchi Added ioctl commands for configuring NCP of pppoe(4)
 1.33 11-May-2021  yamaguchi Revert previous commit because of mistake of commit log

back to r1.230(if_spppsubr.c) and r1.31(if_sppp.h)
 1.32 11-May-2021  yamaguchi Added keywords that are ipcp, noipcp, ipv6cp, noipv6cp
for configuring NCP
 1.31 23-Apr-2021  yamaguchi branches: 1.31.2; 1.31.4;
Introduce a new flag to accept different authentication protocol
in myauthproto and hisauthproto

When the flag is enabled, an authentication protocol notified
at LCP negotiation is used as my authentication protocol.
When the flag is NOT enabled, my authentication protocol is
not changed at LCP negotiation.
 1.30 02-Dec-2020  wiz comparision -> comparison
 1.29 25-Nov-2020  yamaguchi Add commands to refer params of control protocols in if_spppsubr.c

reviewed by knakahara@n.o.
 1.28 06-Sep-2015  dholland branches: 1.28.30;
More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.27 20-Apr-2010  jmcneill branches: 1.27.18; 1.27.36;
COMPAT_50 support for SPPP[GS]ETIDLETO and SPPP[GS]ETKEEPALIVE, ok martin@
 1.26 28-Apr-2008  martin branches: 1.26.20; 1.26.22;
Remove clause 3 and 4 from TNF licenses
 1.25 20-Feb-2008  matt branches: 1.25.6; 1.25.8; 1.25.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.24 10-Dec-2005  elad branches: 1.24.46;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.23 26-Dec-2003  martin branches: 1.23.16;
Add a new ioctl SPPPGETSTATUSNCP to query the PPP phase and check whether
any NCP is UP.
 1.22 28-Nov-2003  keihan s/netbsd.org/NetBSD.org/g
 1.21 11-Sep-2003  martin Fix copy & pasto (luckily, for most archs the structs had the same size,
so this went unnoticed for quite some time now). Noticed by Thomas Bieg.
 1.20 05-Sep-2003  martin Fix copy & pasto.
 1.19 03-Sep-2003  martin Rearrange dead link detection slightly:
As long as we receive data from the peer, don't worry. When we have not
received anything within the "max_noreceive" period, we start sending LCP
echo requests and count them, until we receive an answer (or some data)
or the "maxalive" count of not answered echo requests is reached.
All this is checked at a global 10 seconds interval for all interfaces.
The "max_noreceive" period and the "maxalive" count are configurable per
interface.
 1.18 06-Jan-2003  wiz branches: 1.18.2;
successful with only one l.
 1.17 14-Apr-2002  martin Fix copyright notice.
 1.16 02-Mar-2002  martin Add support to query the peer for DNS addresses when negotiating IPCP.
Add ioctls to retrieve the results.

While here remove a malloc()/free() of an unused buffer.
 1.15 15-Jan-2002  martin Make fields in ioctl parameters that are not allowed to be negative u_ints.
Better range & sanity checking for ioctl arguments (thanks, Jaromir!)
 1.14 07-Jan-2002  martin Implement a retry counter for failed authorizations and limit it to
a configurable maximum (default: 5).

Some ISPs shut down accounts (at least temporarily) after too many bad
retries. This hit me recently due to a stupid pilot error and the fast
retry rate.
 1.13 06-Jan-2002  martin Implement an activity timestamp, recording the last time payload data
passed through.

Implement optional idle timeout.
 1.12 04-Jan-2002  martin Move net/if_sppp.h to net/if_spppvar.h, create a new net/if_sppp.h
containing the userland visible things (i.e. ioctl definitions).

Remove all (both) old ioctls, as they had a brain dead API and made keeping
binary compatibility more or less impossible.

Replace by several new ioctls. While there, remove any arbitrary limits
(resulting from the old, broken ioctls) and allow any length of names
and passwords.
 1.11 31-Dec-2001  thorpej Fix a "pointers are not permitted as case values" gcc 3.1 warning.
 1.10 08-Dec-2001  martin Change the way IPCP negotiation is handled.

Collect both local and remote address and set them to the interface in
one step (the peer address was not set at all before).

This causes the peer address now to show up on the interface and all
messages to the routing socket to be sent with correct data. The latter
has been the last missing piece to complete PPPoE support.
 1.9 09-Apr-2001  martin branches: 1.9.2;
Add another option for encapsulation: PP_NOFRAMING.
In this mode, the PPP packets start with the protocol identifier and don't
have any explicit framing (which may be added by the lower level driver).

Make input/output statistics a little bit more correct by adding a hardware
driver adjustable framing length for each packet (instead of the constant
value "3" used before).

While there, bump authentication name length from 32 to 48 (I have a
connection where I need more than 32). XXX - this should not be artificially
limited at all.
 1.8 25-Mar-2001  martin Make the 'cmd' argument to ioctl an unsigned long, as it is everywhere
else.
 1.7 10-Aug-2000  ad branches: 1.7.2;
Define SIOC[SG]IFGENERIC in <sys/sockio.h>, as FreeBSD and OpenBSD do.
 1.6 02-May-2000  itojun IPv6CP support. if IPv6 link-local address is configured to the interface,
the interface tries to negotiate ifid with the other end by using IPv6CP.

other changes:
- do not share ppp sequence number across protocols.
- if LCP proto-rej is received, drop the protocol mentioned by the message.
this is to be friendly with non-IPv6 peer (if the peer complains due to
lack of IPv6CP, drop IPv6CP). this basically implements "RXJ+" state
transition in the RFC.
- cleanup debugging message. always print blank just before message.

CAVEAT:
- if the peer uses the same MAC address as our side (pretty unlikely)
the code may go into req-rej loop.
- even though we negotiate ifid, we don't configure destination address
onto the interface. it is not really necessary to do so (IMHO).
- I've tested this code on a NetBSD 1.4.2 node, which was with fair amount
of modifications. not sure if the committed code does it right... (please
test and send reports)
 1.5 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.4 04-Apr-1999  explorer branches: 1.4.2;
Add NetBSD rcsid tags, and preserve old ones from i4b source
 1.3 04-Apr-1999  explorer switch to the i4b version of if_sppp*.[ch] (with mods)
 1.2 25-Mar-1999  explorer branches: 1.2.2;
put RCS ids in the right place. And yes, this is a SYNC ppp interface,
used for high-speed (T1, HSSI, DS3) interfaces.
 1.1 25-Mar-1999  explorer port FreeBSD's serial ppp layer to NetBSD. The PPP part seems broken still,
but the lmc driver uses the HDLC bits from here anyway.
 1.2.2.1 04-Apr-1999  explorer branches: 1.2.2.1.2;
Pull up recent changes to if_sppp*.[ch] (i4b code) with RCS id fixes
 1.2.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.4.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.4.2.2 27-Mar-2001  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.8 07-Jan-2003  thorpej Sync with HEAD.
 1.7.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.7.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.7.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.7.2.4 11-Jan-2002  nathanw More catchup.
 1.7.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.7.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.7.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.9.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.9.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.9.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.9.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.2.4 11-Dec-2005  christos Sync with head.
 1.18.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.2.1 03-Aug-2004  skrll Sync with HEAD
 1.23.16.2 27-Feb-2008  yamt sync with head.
 1.23.16.1 21-Jun-2006  yamt sync with head.
 1.24.46.1 23-Mar-2008  matt sync with HEAD
 1.25.10.2 11-Aug-2010  yamt sync with head.
 1.25.10.1 16-May-2008  yamt sync with head.
 1.25.8.1 18-May-2008  yamt sync with head.
 1.25.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.26.22.1 30-May-2010  rmind sync with head
 1.26.20.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.27.36.1 22-Sep-2015  skrll Sync with HEAD
 1.27.18.1 03-Dec-2017  jdolecek update from HEAD
 1.28.30.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.31.4.1 31-May-2021  cjep sync with head
 1.31.2.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.31.2.1 13-May-2021  thorpej Sync with HEAD.
 1.272 07-Oct-2025  andvar Fix few typos in comments.
 1.271 05-Jun-2025  ozaki-r Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.270 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.269 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.268 17-Feb-2024  martin branches: 1.268.2;
PR 57941: remove duplicate declaration (copy+pasto)
 1.267 25-Mar-2023  andvar branches: 1.267.4;
s/deteted/detected/ in log message.
 1.266 03-Sep-2022  thorpej branches: 1.266.4;
Garbage-collect the remaining vestiges of netisr.
 1.265 03-Sep-2022  thorpej Only use configured RPS hash functions for IPv4 and IPv6 packets.

This is NFC change now because only IPv4 and IPv6 use pktqueue,
but that will change in future commits.
 1.264 27-Aug-2022  thorpej Ensure that all queues passed to ifq_enqueue2() have a valid ifq_lock.
 1.263 27-Aug-2022  thorpej Use IFQ_SET_MAXLEN() rather than open-coding it.
 1.262 07-Mar-2022  knakahara Don't change ifp->if_link_state directly. Pointed out by yamaguchi@n.o.
 1.261 25-Oct-2021  knakahara kpreempt_disable() before sppp_get_{ip,ip6}_addrs() are unnecessary now.
 1.260 25-Oct-2021  knakahara Fix missing curlwp_bind() for ifa_release(), ok'ed by yamaguchi@n.o.

This causes the following KASSERT failure in pppoe server.
- sppp_rcr_event()
- sppp_ipcp_confreq()
- sppp_get_ip_addrs()
- psref_release()

After if_spppsubr.c:1.227, sppp_ipcp_confreq() is done in workqueue
instead of softint.
 1.259 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.258 02-Jun-2021  yamaguchi Added missing definition of sppp_ipv6cp_tld

Fixed build without INET6
 1.257 01-Jun-2021  yamaguchi Fix the wrong timeout event handler for PAP

sppp_auth_to_event() is an implementation of TO+/TO- event for
authentication protocol and it drops TO+ event in Ack-rcvd state.
 1.256 01-Jun-2021  yamaguchi Send Up event in tlu action of LCP

When LCP is stopping, the layer sends Down event and Close event
(Down -> Close). To align the sequence, Up event is moved
before Open event.
 1.255 01-Jun-2021  yamaguchi Added logs when IPCP and IPv6CP are up or down
 1.254 01-Jun-2021  yamaguchi Added SPPP_LOG() for refactoring around log
 1.253 01-Jun-2021  yamaguchi Send RTM_IFINFO when a network configuration protocol
is up or down
 1.252 01-Jun-2021  yamaguchi Drop the Open event of LCP to stop the interface
even if a reconnection is scheduled

The event queue in if_spppsubr.c cannot hold the same event
twice. So a close event raised while a close event and an open
event are already enqueued for reconnection cannot stop the interface.
To solve this issue, the open event after
"ifconfig pppoe? down" is dropped.
 1.251 01-Jun-2021  yamaguchi remove PP_CISCO that was used in obsoleted drivers e.g. lmc(4)
 1.250 01-Jun-2021  yamaguchi Remove open event on tlf of PAP/CHAP when their retries are exhausted
to prevent LCP from stopping at the Starting state.

Also remove the retry counter check on tls of LCP because it is
unnecessary.
 1.249 01-Jun-2021  yamaguchi Do not call if_down() when a down event of the lower layer of LCP happens,
since the layer tries to reconnect.
 1.248 01-Jun-2021  yamaguchi Fix not to do if_down() before reconnect

Most network interfaces do not call if_down() even when there is no
connectivity, so pppoe(4) no longer calls it either.
This behavior can be rolled back with the SPPP_IFDOWN_RECONNECT option.
 1.247 01-Jun-2021  yamaguchi restart LCP when loopback packets are detected

In if_spppsubr.c, down and up do not mean that LCP is stopping
or running, but that the lower layer of LCP is down or up.
Restarting LCP therefore has to use a close event and an open event.
 1.246 19-May-2021  yamaguchi Make functions used for logging MP-safe

There is no change in behavior.
 1.245 19-May-2021  yamaguchi Added clear of dns addresses when IPCP is closed
 1.244 19-May-2021  yamaguchi Added logs on dropping IPCP and IPv6CP packets
 1.243 19-May-2021  yamaguchi remove a wrong ntohs().

The variable is already host-byte-order.
 1.242 19-May-2021  yamaguchi Added a log about rejection of IPCP address option
 1.241 14-May-2021  yamaguchi Add a parameter to change keepalive interval in each PPPoE I/F
 1.240 14-May-2021  yamaguchi Added SPPP_NORECV_TIME option to change pp_max_noreceive
 1.239 14-May-2021  yamaguchi Send echo request even while user data is received
if pp_max_noreceive is 0
 1.238 14-May-2021  yamaguchi Introduce SPPP_KEEPALIVE_INTERVAL option
to change the interval between LCP echo requests
 1.237 11-May-2021  yamaguchi clear authentication protocol when SPPP_AUTHPROTO_NONE is specified
 1.236 11-May-2021  yamaguchi Drop packets that have no NCP not to start auto-dial
 1.235 11-May-2021  yamaguchi Added missing if_oerror incrementing
 1.234 11-May-2021  yamaguchi Move RCA event after RCR event

An authentication failed due to a TO+ event between RCA and RCR events
1. RCA event in REQ-SENT state
- REQ-SENT => ACK-RCVD
2. TO+ event
- ACK-RCVD => REQ-SENT
3. RCR+ event
- REQ-SENT => ACK-SENT

By moving RCA after RCR, the state transitions to OPENED
1. RCR event
- REQ-SENT => ACK-SENT
2. TO+ event
- state is not changed
3. RCA event
- ACK-SENT => OPENED
 1.233 11-May-2021  yamaguchi Added ioctl commands for configuring NCP of pppoe(4)
 1.232 11-May-2021  yamaguchi Revert previous commit because of mistake of commit log

back to r1.230(if_spppsubr.c) and r1.31(if_sppp.h)
 1.231 11-May-2021  yamaguchi Added keywords that are ipcp, noipcp, ipv6cp, noipv6cp
for configuring NCP
 1.230 06-May-2021  yamaguchi branches: 1.230.2;
do not clear destination address if there is no saved address
and add initialization of saved_hisaddr for safety

0.0.0.0 was sometimes configured as the destination address when
ipcp close occurred before ipcp tlu.
The following messages appear when the issue is encountered and
debugging for pppoe(4) is enabled.

tc-so:[ 1.890005] pppoe0: ipcp close(starting)
(snip)
tc-so:[ 1.890005] pppoe0: ipcp_open(): no IP interface
 1.229 06-May-2021  yamaguchi Added m_freem for safety

pointed out by knakahara@, thanks.
 1.228 28-Apr-2021  yamaguchi Introduce a pointer to refer sp->scp[cp->protoidx]

There is no functional difference.
 1.227 28-Apr-2021  yamaguchi Move parsing of conf-req, conf-nak and conf-rej into workqueue
from softint context

When the parsing was done in softint, the state machine
in if_spppsubr.c could be broken by simultaneous events
on rare occasions.

Example:
1. Do ifconfig pppoe* up
- lcp open event is enqueued to workqueue
2. Receive conf-ack, and parse the packet
- save mru to sp->lcp.their_mru
- lcp RCR+ event is enqueued to workqueue
3. Process lcp open event
- initialize data including sp->lcp.their_mru
4. Process lcp RCR+ event
- Use sp->lcp.their_mru
- but it was initialized
 1.226 26-Apr-2021  yamaguchi Fix the wrong CHAP option length in conf-nak

RFC 1994 defines that the CHAP option length in conf-nak is 5.
However, 4 was used when CHAP is configured and PPP is proposed
by a peer.
 1.225 26-Apr-2021  yamaguchi Avoid updating of the state if the state is not changed
not to reset the timer for state machine
 1.224 26-Apr-2021  yamaguchi Reset LCP fail counter when doing "ifconfig pppoe* up"
 1.223 26-Apr-2021  yamaguchi Added ipcp option name for logging
 1.222 26-Apr-2021  yamaguchi Ignore 0.0.0.0 offered from PPPoE server
 1.221 26-Apr-2021  yamaguchi Fix locking order since IFNET_LOCK must be held
before acquiring SPPP_LOCK
 1.220 23-Apr-2021  yamaguchi branches: 1.220.2;
Adjust mtu at LCP instead at IPCP

The adjustment must be done at LCP when a PPPoE connection
does not use IPCP.
 1.219 23-Apr-2021  yamaguchi Fix to set mtu even if it is bigger than mru notified at LCP
 1.218 23-Apr-2021  yamaguchi Introduce a new flag to accept different authentication protocol
in myauthproto and hisauthproto

When the flag is enabled, an authentication protocol notified
at LCP negotiation is used as my authentication protocol.
When the flag is NOT enabled, my authentication protocol is
not changed at LCP negotiation.
 1.217 16-Apr-2021  yamaguchi Remove unnecessary lock holding to avoid deadlock

The locks were held during callout_halt() and workqueue_wait()
without reason.
The locks were also taken in the callout and workqueue handlers,
so a handler kicked by those functions couldn't acquire
the lock.

The reasons why those are unnecessary are:
- Items of callout_t are protected by callout_lock
- Items of struct workqueue and struct work are protected
by q_mutex in struct workqueue
- Items of struct sppp_work are protected by atomic_cas(3)
- struct pppoe_softc is not freed before workqueue_wait() and
callout_halt() even if the locks are not held
 1.216 16-Apr-2021  yamaguchi Fix not to put the wrong error message
 1.215 27-Nov-2020  yamaguchi branches: 1.215.2;
Fix missing disable of kpreempt while getting interface address
 1.214 25-Nov-2020  yamaguchi add KASSERT(!cpu_softintr_p());

pointed out by knakahara@n.o., thanks.
 1.213 25-Nov-2020  yamaguchi Add commands to refer params of control protocols in if_spppsubr.c

reviewed by knakahara@n.o.
 1.212 25-Nov-2020  yamaguchi Reconnect when a down event caused by tlf caught
 1.211 25-Nov-2020  yamaguchi Move code related to module to bottom
 1.210 25-Nov-2020  yamaguchi Reconnect lcp after authentication or network phase finish
 1.209 25-Nov-2020  yamaguchi Close lcp when the lower layer down if the interface is passive or on-demand

reviewed by knakahara@n.o.
 1.208 25-Nov-2020  yamaguchi Update ip addresses in the workqueue for control protocols

reviewed by knakahara@n.o.
 1.207 25-Nov-2020  yamaguchi Add the id check for TERM_ACK
 1.206 25-Nov-2020  yamaguchi remove double newlines
 1.205 25-Nov-2020  yamaguchi change function name(RCR => parse_confreq)

reviewed by knakahara@n.o.
 1.204 25-Nov-2020  yamaguchi Add a function to initialize parameters
 1.203 25-Nov-2020  yamaguchi Remove unused and unimplemented code related to CP_QUAL
 1.202 25-Nov-2020  yamaguchi Simplify commonly used functions

reviewed by knakahara@n.o.
 1.201 25-Nov-2020  yamaguchi implement auth protocols on the state-machine of control protocols

reviewed by knakahara@n.o.
 1.200 25-Nov-2020  yamaguchi Insert an entry after initialization
 1.199 25-Nov-2020  yamaguchi call if_down() in workqueue instead of callout(9)
 1.198 25-Nov-2020  yamaguchi Change a state of control protocol in thread context

reviewed by knakahara@n.o.
 1.197 25-Nov-2020  yamaguchi Add a function for RXJ event
 1.196 25-Nov-2020  yamaguchi Add a function for RTR and RTA event
 1.195 25-Nov-2020  yamaguchi Add a function for RCA and RCN event
 1.194 25-Nov-2020  yamaguchi Add a function for RCR event
 1.193 25-Nov-2020  yamaguchi Refactoring functions for RCR and RCN
 1.192 25-Nov-2020  yamaguchi Add a structure for params related to control protocols
 1.191 25-Nov-2020  yamaguchi remove variable names in function declaration
 1.190 05-Oct-2020  roy branches: 1.190.2;
ppp: Remove media

There is none after all.
Applications should be using ifi_link_state and not checking media.
 1.189 04-Apr-2020  is Multilink PPP: sanity check of option values, storage of remote MRRU.
 1.188 01-Apr-2020  is Define a few more LCP options. Recognize, sanity-check and report (but
still reject for the moment) multilink PPP configuration options received.
 1.187 06-Mar-2020  knakahara branches: 1.187.2;
remove unnecessary lock in sppp_mediastatus() as it doesn't touch struct sppp.

ok'ed by yamaguchi@n.o.
 1.186 04-Feb-2020  thorpej Use ifmedia_fini().
 1.185 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.184 13-Sep-2019  msaitoh branches: 1.184.2;
if_flags is neither int nor short. It's unsigned short.
 1.183 11-Jul-2019  msaitoh Fix typo (s/supress/suppress/).
 1.182 01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.181 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.180 30-Mar-2018  mlelstv branches: 1.180.2;
Reset fail counter when link goes down so that next session starts
with the correct retry count.
 1.179 06-Feb-2018  knakahara branches: 1.179.2;
Fix breaking character limit. Pointed out by ozaki-r@n.o, thanks.
 1.178 28-Dec-2017  ozaki-r Ensure the timer isn't running by using workqueue_wait
 1.177 11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.176 07-Dec-2017  ozaki-r Ensure to call if_addr_init with holding if_ioctl_lock
 1.175 22-Nov-2017  christos fix non-diagnostic compilation
 1.174 22-Nov-2017  ozaki-r Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
 1.173 22-Nov-2017  ozaki-r Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
 1.172 15-Nov-2017  knakahara Mark callouts of pppoe(4) CALLOUT_MPSAFE. Suggested by ozaki-r@n.o.
 1.171 13-Oct-2017  knakahara fix no INET6 build.
 1.170 12-Oct-2017  knakahara sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.

Add locking notes later.
 1.169 28-Mar-2017  ozaki-r branches: 1.169.6;
Avoid touching a mbuf after enqueuing it
 1.168 28-Mar-2017  ozaki-r Use sp->pp_framebytes instead of the constant value "3"

It seems that it was forgotten to be converted in v1.22.
 1.167 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.166 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.165 27-Dec-2016  christos branches: 1.165.2;
Another missed patch
 1.164 26-Dec-2016  christos pfil(9) improvements to handle address changes:

Add:
PFIL_IFADDR call on interface reconfig (mbuf is ioctl #)
PFIL_IFNET call on interface attach/detach (mbuf is PFIL_IFNET_*)

from rmind@
 1.163 13-Dec-2016  knakahara MP-safe pppoe(4).

Nearly all parts were implemented by Shoichi YAMAGUCHI<s-yamaguchi@IIJ>, thanks.
 1.162 06-Dec-2016  knakahara add API to manipulate ifa->ia_hash and ia_hash_pslist_entry, and fix ia_hash_pslist_entry race by using them.

in_ifaddr_lock is required before writing ifa->ia_hash and
ia_hash_pslist_entry to serialize writer processings.

reviewed by ozaki-r@n.o.
 1.161 01-Dec-2016  knakahara fix two races between set_ip_addrs and clear_ip_addrs.

(1) if set_ip_addrs and clear_ip_addrs run in parallel, they can both call
IN_ADDRHASH_WRITER_REMOVE on the same ifa.
(2) if set_ip_addrs's workqueue is separate from clear_ip_addrs's one,
the workers can run in the reverse of the order in which they were enqueued.
 1.160 01-Dec-2016  knakahara fix CID 1396600: Null pointer dereferences
 1.159 25-Nov-2016  knakahara make workqueue sppp_{set,clear}_ip_addrs to be able to call pserialize_perform.
 1.158 25-Nov-2016  knakahara refactor sppp_{set,clear}_ip_addrs(). reduce iterating if_addr_pslist.
 1.157 18-Nov-2016  knakahara We must use PSLIST_ENTRY_DESTROY after PSLIST_WRITER_REMOVE and waiting for all readers to finish.

And then, if we want to re-insert the removed pslist element, we need to
call PSLIST_ENTRY_INIT again.

advised by riastradh@n.o and reviewed by ozaki-r@n.o, thanks.
 1.156 08-Oct-2016  joerg Use uint8_t for opt as some of the values don't fit into the (positive)
range of a signed char.
 1.155 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is data shared between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.154 29-Sep-2016  roy Set dstaddr in in_ifinit so that sppp consumers announce the correct
dstaddr in routing messages.
 1.153 29-Sep-2016  roy Ensure we only call pfil_run_hooks if if_init succeeded.
While here, improve improve some logging.
 1.152 16-Sep-2016  roy Drop hostIsNew from in_ifinit, let the function work out if the address
has changed.
Sync address flag setup with the IPv6 counterpart.
When scrubbing the address, or setting up the address fails, restore the
old address flags as well as the old address.
 1.151 14-Sep-2016  roy Call ifmedia_delete_instance() for safety.
 1.150 14-Sep-2016  roy Add interface media for sppp consumers.
While there is no actual media to select,
the ioctl is used to query link status from userland.
 1.149 13-Sep-2016  joerg Report link state changes for sppp consumers. The link is considered up,
if the current phase is SPPP_PHASE_NETWORK, otherwise it is down. Useful
when using dhcpcd for DHCPv6 PD.
 1.148 09-Sep-2016  christos PR/51464: Shoichi YAMAGUCHI: chap authenticator of pppoe does not work
 1.147 06-Aug-2016  pgoyette Modularize the sppp_subr stuff so it can be shared by pppoe and lmc
drivers as they get modularized.
 1.146 07-Jul-2016  ozaki-r branches: 1.146.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.145 06-Jul-2016  ozaki-r Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.
 1.144 30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
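
The "initialize first, publish last" rule described above can be sketched in portable C11 as follows. This is an illustration only, assuming a single writer; addr_node, addr_publish and addr_lookup are hypothetical names, and the kernel uses pslist(9) rather than this hand-rolled list.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical address record; not the real struct ifaddr. */
struct addr_node {
	int addr;
	int flags;
	struct addr_node *next;
};

static _Atomic(struct addr_node *) addr_list_head;

/*
 * Fill in every field first, then make the node reachable with a
 * release store, so a reader never sees a half-initialized node.
 * Assumes one writer at a time.
 */
static void
addr_publish(struct addr_node *n, int addr, int flags)
{
	n->addr = addr;
	n->flags = flags;
	n->next = atomic_load_explicit(&addr_list_head,
	    memory_order_relaxed);
	atomic_store_explicit(&addr_list_head, n, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
static const struct addr_node *
addr_lookup(int addr)
{
	const struct addr_node *n;

	for (n = atomic_load_explicit(&addr_list_head,
	    memory_order_acquire); n != NULL; n = n->next) {
		if (n->addr == addr)
			return n;
	}
	return NULL;
}
```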
 1.143 20-Jun-2016  knakahara apply if_start_lock() to L2 callers which call ifp->if_start() of device drivers
 1.142 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.141 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.140 24-Apr-2016  christos CID 1210544: Tainted scalar
 1.139 24-Apr-2016  christos CID 980345: missing breaks
 1.138 24-Apr-2016  christos CID 980057, 980058, use strlcpy()
 1.137 23-Apr-2016  martin Add missing breaks (cosmetic change only)
 1.136 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.135 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.134 25-May-2015  ozaki-r Remove leftover IPX-related stuffs

No objection on tech-kern and tech-net.
 1.133 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.132 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.131 28-Nov-2014  ozaki-r branches: 1.131.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.130 06-Jun-2014  rmind branches: 1.130.2; 1.130.6;
sppp_input: handle pktqueue case correctly (fix for the previous).
 1.129 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.128 15-May-2014  msaitoh Save a NETISR_* value in a variable and call schednetisr() after enqueueing
a packet for readability and future modification.
 1.127 29-Jun-2013  rmind branches: 1.127.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.126 01-Mar-2013  joerg branches: 1.126.6;
Retire OSI network stack. OK core@
 1.125 17-Dec-2011  tls branches: 1.125.6;

Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2GHz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
 1.124 19-Nov-2011  tls branches: 1.124.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.123 28-Oct-2011  dyoung branches: 1.123.2;
For these interfaces, the implementation of SIOCSIFDSTADDR is identical
to SIOCINITIFADDR, and SIOCSIFDSTADDR callers always fall back to
SIOCINITIFADDR, so just get rid of the SIOCSIFDSTADDR case.
 1.122 05-Sep-2011  rjs Add support for RFC 4638 to pppoe(4).

The change to if_spppsubr.c moves the test for whether LCP should
request a mru change until after the pppoe device has picked up the
mtu of the underlying ethernet device.
 1.121 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.120 20-Apr-2010  jmcneill COMPAT_50 support for SPPP[GS]ETIDLETO and SPPP[GS]ETKEEPALIVE, ok martin@
 1.119 28-Feb-2010  snj branches: 1.119.2;
Fight the ever-increasing size of src checkouts by spelling "useful"
without an extra l.
 1.118 18-Apr-2009  tsutsui branches: 1.118.2;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.117 18-Mar-2009  cegger bcopy -> memcpy
 1.116 13-Nov-2008  martin branches: 1.116.4;
Pass SIOCAIFADDR to ifioctl_common, fixes PR kern/39900.
 1.115 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.
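
The inheritance chain described here (driver ioctl, then class ioctl, then the generic handler) can be sketched as below. All names and command numbers are hypothetical; the real routines are the driver's if_ioctl, ether_ioctl() and ifioctl_common(), whose signatures differ from this toy.

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
enum {
	CMD_SET_MTU = 1,
	CMD_SET_LLADDR,
	CMD_DRIVER_PRIVATE,
	CMD_UNKNOWN
};

struct fake_if {
	int mtu;
	int lladdr_set;
	int private_calls;
};

/* Generic layer: handles what every interface shares. */
static int
common_ioctl(struct fake_if *ifp, int cmd, int arg)
{
	switch (cmd) {
	case CMD_SET_MTU:
		ifp->mtu = arg;
		return 0;
	default:
		return ENOTTY;	/* inappropriate ioctl */
	}
}

/* Class layer (think "ethernet"): adds its own cases, inherits the rest. */
static int
class_ioctl(struct fake_if *ifp, int cmd, int arg)
{
	switch (cmd) {
	case CMD_SET_LLADDR:
		ifp->lladdr_set = 1;
		return 0;
	default:
		return common_ioctl(ifp, cmd, arg);
	}
}

/* Driver layer: gets the first shot and may override or delegate. */
static int
driver_ioctl(struct fake_if *ifp, int cmd, int arg)
{
	switch (cmd) {
	case CMD_DRIVER_PRIVATE:
		ifp->private_calls++;
		return 0;
	case CMD_SET_MTU:
		if (arg > 1500)		/* override: this toy has no jumbo frames */
			return EINVAL;
		return class_ioctl(ifp, cmd, arg);
	default:
		return class_ioctl(ifp, cmd, arg);
	}
}
```

An unknown command falls through all three layers and comes back as ENOTTY from the generic handler.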

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
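
A sketch of that acceptance test, not the code in ether_ioctl() itself: multicast and broadcast Ethernet addresses have the group bit (least significant bit of the first octet) set, so one bit test covers both. The function name lladdr_is_acceptable is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHER_ADDR_LEN 6	/* length of an Ethernet address */

/*
 * Return true if addr may be configured as a unicast link-layer
 * address: not all-zero and not a group (multicast/broadcast) address.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[SKETCH_ETHER_ADDR_LEN])
{
	static const uint8_t zero[SKETCH_ETHER_ADDR_LEN];

	if (addr[0] & 0x01)	/* group bit set */
		return false;
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;
	return true;
}
```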

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.114 03-Oct-2008  pooka branches: 1.114.2;
Fix pointer size typo - affects only debug output.

Henning Petersen, PR lib/39689
 1.113 22-Aug-2008  martin Backout previous/restore initial fix for PR kern/39280.
The later changes were only cosmetic, caused problems in IPv6-only
connections (reported by Wolfgang Solfrank in private mail), and
reintroduced the original bug.
 1.112 04-Aug-2008  christos keep the loop, but arrange IDX_COUNT to be correct.
 1.111 04-Aug-2008  martin PR kern/39280: Uninitialized callout stopped in if_spppsubr layer
in kernels without options INET6.
 1.110 24-Jun-2008  gmcgarry branches: 1.110.2;
ioctl commands are unsigned long.
 1.109 20-Feb-2008  matt branches: 1.109.6; 1.109.10; 1.109.12; 1.109.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.108 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.107 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.106 09-Jul-2007  ad branches: 1.106.8; 1.106.14; 1.106.16; 1.106.20;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.105 23-Jun-2007  scw If the underlying link's MTU is less than PP_MTU (e.g. PPPoE), set our
MRU to the link's MTU and initiate an MRU negotiation with the peer.

This is useful when the PPP session is bridged from Ethernet to ATM
by an ADSL modem (such as the Linksys AM200). Unless we negotiate the
lower MRU, the peer is unaware that 1500-byte packets will not make
it unmolested across the link (the Linksys AM200 silently truncates them
to 1498 bytes, creating a nice PMTU blackhole).

Note that the PPP RFC says peers MUST accept 1500 byte packets,
regardless of the negotiated MRU, so most ISPs which use PPPoA will
probably still send 1500-byte packets. However, I persuaded my ISP
(Andrews and Arnold) to modify their software to generate an ICMP error
"fragment needed" for packets with IP.DF set which are larger than the
negotiated MRU. They will still forward non-IP.DF packets, with the
associated truncation, but at least my PMTU troubles have gone.
 1.104 04-Mar-2007  christos branches: 1.104.2; 1.104.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.103 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.102 24-Nov-2006  wiz branches: 1.102.2; 1.102.4; 1.102.6;
Correct spelling of "immediate(ly)". From Zafer.
 1.101 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.100 26-Oct-2006  elad Kill a couple of KAUTH_GENERIC_ISSUSER usages.

I had to refactor the code a bit, I hope it's okay.
 1.99 13-Oct-2006  dogcow More -Wunused fallout. sprinkle __unused when possible; otherwise, use the
do { if (&x) {} } while (/* CONSTCOND */ 0);
construct as suggested by uwe in <20061012224845.GA9449@snark.ptc.spbu.ru>.
 1.98 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.97 07-Sep-2006  dogcow branches: 1.97.2; 1.97.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.96 23-Aug-2006  adrianp A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.

Issue found by pavel@
Fix from martin@

This is SA2006-019 (CVE-2006-4304)
 1.95 23-Jul-2006  ad branches: 1.95.2;
Use the LWP cached credentials where sane.
 1.94 13-Jul-2006  martin Small simplification, pointed out by Christian Hattemer in private mail.
 1.93 13-Jul-2006  martin Do not automagically UP the interface when setting the address.
Together with previous ifconfig changes, this fixes PR 30694, at
least for pppoe (and other sppp based) interfaces.
 1.92 07-Jun-2006  kardel branches: 1.92.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.91 21-May-2006  christos Fixes from David Boggs; in his words:

/sys/net/if_spppvar.h says:

"Lower layer drivers that are always ready to communicate
(like hardware HDLC) can shortcut pp_up from pp_tls,
and pp_down from pp_tlf."

When I follow those instructions, I get a kernel stack
overflow as soon as I open the HDLC device.

Here is the loop:
sppp_ioctl calls sppp_lcp_open
sppp_lcp_open calls sppp_open_event
sppp_open_event calls sppp_lcp_tls
sppp_lcp_tls calls pp_tls
pp_tls is the SHORTCUT to sppp_lcp_up
sppp_lcp_up calls sppp_lcp_open
...and around we go until the stack overflows.

The fix is to reverse the order of the action (tls)
and the state change (from INITIAL to STARTING) in
sppp_open_event.

There is a similar loop during closing:
sppp_ioctl calls sppp_lcp_close
sppp_lcp_close calls sppp_close_event
sppp_close_event calls sppp_lcp_tlf
sppp_lcp_tlf calls pp_tlf
pp_tlf is the SHORTCUT to sppp_lcp_down
sppp_lcp_down calls sppp_lcp_close
...and around we go until the stack overflows.

The fix is to reverse the order of the action (tlf)
and the state change (from STARTING to INITIAL) in
sppp_close_event.

Separately, while I was discovering this, I noticed
that pp_tlf was being called unconditionally rather
than first checking to see if it is NULL. pp_tlf
is a callout from sppp to the hdlc device driver.
Elsewhere in sppp, this is always checked for NULL
before calling it, and the comments in if_spppvar.h
imply that filling it in is optional.

From spppvar.h:
"These functions need to be filled in by the lower layer
(hardware) drivers if they request notification from the
PPP layer whether the link is actually required."
This clearly says that pp_tlf and pp_tls are optional
and so sppp must check before calling them.
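
The open-side fix can be illustrated with a toy state machine: record the new state before running the action, so a lower layer that shortcuts straight back into the open path finds the state already changed and the recursion stops. The hook is also checked for NULL before it is called. All names below are hypothetical simplifications of the sppp routines quoted above.

```c
#include <stddef.h>

enum link_state { ST_INITIAL, ST_STARTING };

struct link {
	enum link_state state;
	void (*tls)(struct link *);	/* optional "this layer started" hook */
	int tls_calls;			/* how often the hook actually ran */
};

static void link_open_event(struct link *);

/* Lower layer reports "up"; as in the quoted chain, this reopens. */
static void
link_up(struct link *l)
{
	link_open_event(l);
}

/* An always-ready lower layer: its tls hook shortcuts to link_up(). */
static void
always_ready_tls(struct link *l)
{
	l->tls_calls++;
	link_up(l);
}

static void
link_open_event(struct link *l)
{
	switch (l->state) {
	case ST_INITIAL:
		/* State change first, action second: this breaks the loop. */
		l->state = ST_STARTING;
		if (l->tls != NULL)	/* the hook is optional */
			l->tls(l);
		break;
	case ST_STARTING:
		/* Re-entered through the shortcut: nothing left to do. */
		break;
	}
}
```

With the two statements under ST_INITIAL swapped, always_ready_tls() would re-enter link_open_event() while the state is still ST_INITIAL and recurse until the stack overflows, which matches the failure described in the message.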
 1.90 14-May-2006  elad branches: 1.90.2;
integrate kauth.
 1.89 14-May-2006  christos XXX: GCC uninitialized.
 1.88 20-Apr-2006  christos Add an empty attach function. Reported by David Boggs
 1.87 21-Jan-2006  rpaulo branches: 1.87.2; 1.87.4; 1.87.6; 1.87.8; 1.87.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.86 11-Dec-2005  christos branches: 1.86.2;
merge ktrace-lwp.
 1.85 29-May-2005  christos branches: 1.85.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.84 27-Apr-2005  martin Fix typo, from C. Plasschaert in PR kern/30069.
 1.83 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.82 26-Feb-2005  perry branches: 1.82.2; 1.82.4;
nuke trailing whitespace
 1.81 24-Jan-2005  matt branches: 1.81.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.80 22-Dec-2004  itojun branches: 1.80.2;
whitespace
 1.79 06-Dec-2004  christos Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.78 18-Sep-2004  yamt call PFIL_IFADDR hooks where appropriate.
 1.77 21-Apr-2004  itojun sprintf -> snprintf
 1.76 08-Apr-2004  martin Be more paranoid about data a non-root user may query.
Fixes PR kern/25099 by Christian Biere.
 1.75 26-Dec-2003  martin branches: 1.75.2;
Add a new ioctl SPPPGETSTATUSNCP to query the PPP phase and check whether
any NCP is UP.
 1.74 10-Nov-2003  wiz Spell address with two d's. Inspired by similar changes in OpenBSD,
originating from Jonathon Gray and forwarded by jmc@openbsd.
 1.73 28-Oct-2003  mycroft Fix previous differently.
 1.72 26-Oct-2003  christos Fix uninitialized variable warnings.
 1.71 03-Oct-2003  oki Correct debug message, mine is myauth, not hisauth.
 1.70 02-Oct-2003  itojun minor KNF
 1.69 05-Sep-2003  itojun u_short -> u_int16_t
 1.68 03-Sep-2003  martin Rearrange dead link detection slightly:
As long as we receive data from the peer, don't worry. When we have not
received anything within the "max_noreceive" period, we start sending LCP
echo requests and count them, until we receive an answer (or some data)
or the "maxalive" count of not answered echo requests is reached.
All this is checked at a global 10 seconds interval for all interfaces.
The "max_noreceive" period and the "maxalive" count are configurable per
interface.
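
The detection rule above can be sketched as a function called from the periodic check. This is an assumption-laden illustration, not the sppp code: ka_state, ka_tick and ka_received are hypothetical names, and times are plain seconds.

```c
enum ka_action { KA_NONE, KA_SEND_ECHO, KA_LINK_DEAD };

struct ka_state {
	long last_receive;	/* time anything last arrived from the peer */
	long max_noreceive;	/* quiet period before probing starts */
	int alive;		/* echo requests sent and not yet answered */
	int maxalive;		/* unanswered requests tolerated */
};

/* Call whenever data or an echo reply arrives from the peer. */
static void
ka_received(struct ka_state *k, long now)
{
	k->last_receive = now;
	k->alive = 0;
}

/* Call from the periodic check (every 10 seconds in the message). */
static enum ka_action
ka_tick(struct ka_state *k, long now)
{
	if (now - k->last_receive < k->max_noreceive) {
		k->alive = 0;		/* peer heard from recently */
		return KA_NONE;
	}
	if (k->alive >= k->maxalive)
		return KA_LINK_DEAD;	/* too many unanswered requests */
	k->alive++;
	return KA_SEND_ECHO;
}
```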
 1.67 09-Jul-2003  martin We should use IFQ_DEQUEUE to get packets from the send queue, not IF_DEQUEUE.
Hopefully this will fix ALTQ for ISDN and PPPoE interfaces.

While there remove an unused function which contained dubious code
(accessing interface queue internals w/o the proper macros).
 1.66 23-May-2003  itojun branches: 1.66.2;
don't call if_free_sadl() until very end of if_detach() logic. many of
routing table manipulation code assumes the presence of AF_LINK sockaddr.
should fix PR 21581
 1.65 14-May-2003  itojun use arc4random
 1.64 14-May-2003  itojun remove #ifdef __FreeBSD__ (code already diverged enough)
 1.63 28-Jan-2003  tron Use MRU negotiated with remote system as MTU. This fixes PR kern/18850
by Curt Sampson.
 1.62 19-Jan-2003  simonb Remove a break after a goto.
 1.61 28-Dec-2002  kristerw Restore the system priority level in case of errors.

OK:ed by martin.
 1.60 27-Sep-2002  itojun license clarification from the author, via openbsd
>BSD-style license from Serge Vakulenko <vak@cronyx.ru>
 1.59 25-Sep-2002  itojun KNF
 1.58 11-Sep-2002  itojun KNF - return is not a function.
 1.57 01-Sep-2002  martin If the peer did not answer LCP echo requests in-time, but we got user
data through within the last LCP keepalive interval, do not count this
as a keepalive failure.

Addresses parts of kern/17723.
 1.56 30-Jul-2002  christos Fix async map handling. Many thanks to Joerg Wunsch for the explanation.
 1.55 28-Jul-2002  christos Patches from Frank Kardel:
- length was one off in names and secrets.
- add win 98 kludge but we keep it disabled for now.
- setup the authorization bit early so that we don't end up doing ppp
negotiations without authorization.
 1.54 28-Jul-2002  christos Don't throw away the name and the secret lengths. This eliminates all the
strlen() calls, and there was a whole bunch of them.
 1.53 28-Jul-2002  christos avoid modifying the buffers, by checking for matching lengths first. From
Frank Kardel.
 1.52 27-Jul-2002  christos Use strcmp() instead of memcmp() because if we get passed a 0 length name
and secret, we'll authenticate successfully! While there, rename passwd to
secret so that code looks nicer.
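
The pitfall behind this change is that memcmp() with a length of 0 always returns 0, so a peer that sends an empty name or secret "matches" anything. The sketch below shows the failure and one way to avoid it. It uses a length check rather than the strcmp() named in the message, so that it also works on input that is not NUL-terminated; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Buggy: trusts the peer-supplied length. With got_len == 0 the
 * comparison succeeds no matter what the expected secret is.
 * (It may also read past the end of expected if got_len is larger.)
 */
static bool
secret_match_buggy(const char *expected, const char *got, size_t got_len)
{
	return memcmp(expected, got, got_len) == 0;
}

/* Checked: the lengths must agree before the bytes are compared. */
static bool
secret_match_checked(const char *expected, const char *got, size_t got_len)
{
	if (strlen(expected) != got_len)
		return false;
	return memcmp(expected, got, got_len) == 0;
}
```

Note that the checked version still accepts an empty secret when the configured secret is itself empty; whether that should be allowed is a policy decision outside this sketch.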
 1.51 13-Jul-2002  martin Use "mono_time" instead of "time" for timeout calculations.
 1.50 11-Jul-2002  yamt don't access freed memory.
 1.49 11-Jul-2002  yamt don't access freed memory.
 1.48 06-Jul-2002  itojun correct tcp header chasing in pp_fastq processing. should fix kern/17491.
 1.47 29-May-2002  itojun don't panic on invalid CONF_ACK from remote (in general, issuing panic
on remote input is bad practice)
 1.46 29-Apr-2002  martin branches: 1.46.2; 1.46.4;
"if (debug)" some log messages not signaling real errors but happening
in normal operation.
 1.45 02-Mar-2002  martin Add support to query the peer for DNS addresses when negotiating IPCP.
Add ioctls to retrieve the results.

While here remove a malloc()/free() of an unused buffer.
 1.44 10-Feb-2002  martin Use IF_IS_EMPTY and IFQ_IS_EMPTY instead of accessing queue members
directly. Noticed by Thomas Klausner.
 1.43 21-Jan-2002  martin Fix copy&pasto: truncate strings copied in at *their* right length, not
some other string's length.

Found by Arne Helme.
 1.42 18-Jan-2002  jdolecek couple cosmetic style fixes, and drop ^L's
 1.41 15-Jan-2002  martin Make fields in ioctl parameters that are not allowed to be negative u_ints.
Better range & sanity checking for ioctl arguments (thanks, Jaromir!)
 1.40 14-Jan-2002  martin Initialize the activity timestamp when opening a connection. Only
idle-timeout connections that have already made it to phase NETWORK. (For drivers using
the internal timeout mechanism; isdnd, that does the timeout handling for
ISDN drivers, still needs to be fixed.)

Thanks to Wolfgang Solfrank for finding this.
 1.39 07-Jan-2002  martin Implement a retry counter for failed authorizations and limit it to
a configurable maximum (default: 5).

Some ISPs shut down accounts (at least temporarily) after too many bad
retries. This hit me recently due to a stupid pilot error and the fast
retry rate.
 1.38 06-Jan-2002  martin Implement an activity timestamp, recording the last time payload data
passed through.

Implement optional idle timeout.
 1.37 05-Jan-2002  thorpej Fix LP64 printf format problem.
 1.36 04-Jan-2002  martin Move net/if_sppp.h to net/if_spppvar.h, create a new net/if_sppp.h
containing the userland visible things (i.e. ioctl definitions).

Remove all (both) old ioctls, as they had a brain dead API and made keeping
binary compatibility more or less impossible.

Replace by several new ioctls. While there, remove any arbitrary limits
(resulting from the old, broken ioctls) and allow any length of names
and passwords.
 1.35 16-Dec-2001  martin Remove yet another spurious (debug?) output.
 1.34 16-Dec-2001  martin Remove some spurious (debug?) output.
 1.33 15-Dec-2001  martin Make reconnects after LCP keepalive detected an error actually work.
 1.32 10-Dec-2001  martin We explicitly close LCP when going to state CLOSED, so we better open
it again when going from INITIAL to STARTING. This has been done for
passive or auto-connecting interfaces always, but not for permanent
ones.

This fixes session reestablishment for PPPoE interfaces without LINK1 set,
and probably also closes PR kern/11161.

Thanks to Jared D. McNeill and Ross Harvey for suggesting debug methodology.
 1.31 08-Dec-2001  martin Change the way IPCP negotiation is handled.

Collect both local and remote address and set them to the interface in
one step (the peer address was not set at all before).

This causes the peer address now to show up on the interface and all
messages to the routing socket to be sent with correct data. The latter
has been the last missing piece to complete PPPoE support.
 1.30 04-Dec-2001  ross code cleanup for portability
 1.29 12-Nov-2001  lukem add RCSIDs
 1.28 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.27 29-Oct-2001  martin In preparation for further changes: remove big parts of the ifdef mess
for OSes we no longer share this file with.
 1.26 23-Aug-2001  itojun branches: 1.26.4;
IFQ_PURGE cannot be used against ifqueue. use IF_PURGE.
 1.25 18-Jul-2001  thorpej bzero -> memset
 1.24 17-Jul-2001  martin Fix a slight bug introduced with revision 1.9 (IPv6 integration) where
the bit mask of open NCPs got out of sync.
Defer the (potential) closing of LCP after a NCP went down until after
the state machines got updated.

This fixes PR kern/11161.
 1.23 13-Apr-2001  thorpej branches: 1.23.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.22 09-Apr-2001  martin Add another option for encapsulation: PP_NOFRAMING.
In this mode, the PPP packets start with the protocol identifier and don't
have any explicit framing (which may be added by the lower level driver).

Make input/output statistics a little bit more correct by adding a hardware
driver adjustable framing length for each packet (instead of the constant
value "3" used before).

While there, bump authentication name length from 32 to 48 (I have a
connection where I need more than 32). XXX - this should not be artificially
limited at all.
 1.21 25-Mar-2001  martin Make the 'cmd' argument to ioctl an unsigned long, as it is everywhere
else.
 1.20 17-Jan-2001  thorpej branches: 1.20.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.19 16-Jan-2001  itojun run IPCP only if we have IPv4 in kernel
 1.18 15-Jan-2001  martin Don't peek at part of a structure via fuword. Does not work well on
64bit architectures. XXX - have to check other changes in the I4B
distribution, this had been fixed there a long time ago.
 1.17 07-Jan-2001  martin 64bit police.
Rumors say there are archs without ISA busses, so avoid including
(unnecessarily) isa bus headers in MI files.
XXX this is the minimal solution, layer interface calls will have
XXX to be revisited later
 1.16 18-Dec-2000  thorpej Use IFQ_PURGE().
 1.15 13-Dec-2000  thorpej Add ALTQ glue.
 1.14 10-Oct-2000  itojun fix comment (s/IPv6/IP/)
 1.13 08-Oct-2000  itojun fix operator precedence (& and &&). do not transmit too many messages
from LCP layer to NCP layer. PR 11161.
 1.12 02-Oct-2000  itojun fix compilation without INET
 1.11 02-Jul-2000  sommerfeld Merge if_spppsubr.c PPP protocol declarations list with the one found
in ppp_defs.h, and have if_spppsubr.c include ppp_defs.h rather than
duplicate its definitions.

[This is a stopgap measure to clean up build lossage.]
 1.10 16-May-2000  itojun branches: 1.10.4;
propose better IPv6 ifid alternative to the peer, when ifid collides
during IPv6CP negotiation. it is very rare to see collision.
 1.9 02-May-2000  itojun IPv6CP support. if IPv6 link-local address is configured to the interface,
the interface tries to negotiate ifid with the other end by using IPv6CP.

other changes:
- do not share ppp sequence number across protocols.
- if LCP proto-rej is received, drop the protocol mentioned by the message.
this is to be friendly with non-IPv6 peer (if the peer complains due to
lack of IPv6CP, drop IPv6CP). this basically implements "RXJ+" state
transition in the RFC.
- cleanup debugging message. always print blank just before message.

CAVEAT:
- if the peer uses the same MAC address as our side (pretty unlikely)
the code may go into req-rej loop.
- even though we negotiate ifid, we don't configure destination address
onto the interface. it is not really necessary to do so (IMHO).
- I've tested this code on a NetBSD 1.4.2 node, which was with fair amount
of modifications. not sure if the committed code does it right... (please
test and send reports)
 1.8 12-Apr-2000  itojun add more IPv6 cases. not tested.
TODO: IPv6CP support. currently IPv6 packet will be generated right
after link up (spec violation)
 1.7 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
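The two improvements listed for revision 1.7 can be illustrated with a small userland sketch. It is not the kernel's implementation: `struct sk_callout`, `struct sk_bucket` and the `sk_*` functions are hypothetical names, and a plain circular list stands in for whatever structure the kernel actually uses. The client supplies the handle storage, so arming a timer never allocates, and linking or unlinking a handle touches a fixed number of pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle. The client owns this storage, e.g. inside its softc. */
struct sk_callout {
	struct sk_callout *c_next;
	struct sk_callout *c_prev;
	void (*c_func)(void *);
	void *c_arg;
	int c_pending;
};

/* Circular list head standing in for one bucket of pending timers. */
struct sk_bucket {
	struct sk_callout b_head;
};

static void
sk_bucket_init(struct sk_bucket *b)
{
	assert(b != NULL);
	b->b_head.c_next = b->b_head.c_prev = &b->b_head;
	b->b_head.c_func = NULL;
	b->b_head.c_arg = NULL;
	b->b_head.c_pending = 0;
}

static void
sk_callout_init(struct sk_callout *c)
{
	assert(c != NULL);
	c->c_next = c->c_prev = NULL;
	c->c_func = NULL;
	c->c_arg = NULL;
	c->c_pending = 0;
}

/* Constant time: unlink using only the handle's own neighbours. */
static void
sk_callout_remove(struct sk_callout *c)
{
	if (!c->c_pending)
		return;
	c->c_prev->c_next = c->c_next;
	c->c_next->c_prev = c->c_prev;
	c->c_next = c->c_prev = NULL;
	c->c_pending = 0;
}

/* Constant time, and nothing is allocated, so arming cannot fail. */
static void
sk_callout_insert(struct sk_bucket *b, struct sk_callout *c,
    void (*func)(void *), void *arg)
{
	sk_callout_remove(c);	/* re-arming an armed handle is allowed */
	c->c_func = func;
	c->c_arg = arg;
	c->c_prev = b->b_head.c_prev;
	c->c_next = &b->b_head;
	b->b_head.c_prev->c_next = c;
	b->b_head.c_prev = c;
	c->c_pending = 1;
}

/* Linear walk, only here so the sketch can be checked. */
static int
sk_bucket_count(const struct sk_bucket *b)
{
	int n = 0;
	const struct sk_callout *c;

	for (c = b->b_head.c_next; c != &b->b_head; c = c->c_next)
		n++;
	return n;
}
```

A caller would initialise the handle once with sk_callout_init() and may then insert and remove it any number of times without a failure path.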
 1.6 19-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.5 30-Jul-1999  itojun branches: 1.5.2; 1.5.8;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 04-Apr-1999  explorer Add NetBSD rcsid tags, and preserve old ones from i4b source
 1.3 04-Apr-1999  explorer switch to the i4b version of if_sppp*.[ch] (with mods)
 1.2 25-Mar-1999  explorer branches: 1.2.2;
put RCS ids in the right place. And yes, this is a SYNC ppp interface,
used for high-speed (T1, HSSI, DS3) interfaces.
 1.1 25-Mar-1999  explorer port FreeBSD's serial ppp layer to NetBSD. The PPP part seems broken still,
but the lmc driver uses the HDLC bits from here anyway.
 1.2.2.1 04-Apr-1999  explorer branches: 1.2.2.1.2;
Pull up recent changes to if_sppp*.[ch] (i4b code) with RCS id fixes
 1.2.2.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.2.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.5.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.5.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.5.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 29-Jul-2001  he Pull up revision 1.24 (requested by martin):
Fix bug causing NCP bit mask to get out of sync. Fixes PR#11161.
 1.10.4.1 17-Oct-2000  tv Pullup 1.13 [itojun]:
fix operator precedence (& and &&). do not transmit too many messages
from LCP layer to NCP layer. PR 11161.
 1.20.2.17 29-Dec-2002  thorpej Sync with HEAD.
 1.20.2.16 18-Oct-2002  nathanw Catch up to -current.
 1.20.2.15 17-Sep-2002  nathanw Catch up to -current.
 1.20.2.14 01-Aug-2002  nathanw Catch up to -current.
 1.20.2.13 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.20.2.12 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
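The distinction between referencing and dereferencing curproc can be shown with a userland sketch. The struct definitions and the `curlwp` variable below are stand-ins, not the kernel's, and `sk_curpid()` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stub types standing in for the kernel's struct proc and struct lwp. */
struct proc {
	int p_pid;
};

struct lwp {
	struct proc *l_proc;
};

/* Stand-in for the current LWP; NULL when there is none. */
static struct lwp *curlwp;

/* Referencing curproc is always safe: it yields NULL when curlwp is NULL. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Dereferencing is the caller's problem, so check before following it. */
static int
sk_curpid(void)
{
	struct proc *p = curproc;

	assert(p != NULL);
	return p->p_pid;
}
```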
 1.20.2.11 20-Jun-2002  nathanw Catch up to -current.
 1.20.2.10 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.20.2.9 28-Feb-2002  nathanw Catch up to -current.
 1.20.2.8 01-Feb-2002  gmcgarry lwp'ify
 1.20.2.7 11-Jan-2002  nathanw More catchup.
 1.20.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.20.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.20.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.20.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.20.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.20.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.23.2.8 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.23.2.7 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.23.2.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.23.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.23.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.23.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.23.2.1 03-Aug-2001  lukem update to -current
 1.26.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.46.4.19 10-Jul-2003  tron Pull up revision 1.67 via patch (requested by martin in ticket #1374):
We should use IFQ_DEQUEUE to get packets from the send queue, not IF_DEQUEUE.
Hopefully this will fix ALTQ for ISDN and PPPoE interfaces.
While there remove an unused function which contained dubious code
(accessing interface queue internals w/o the proper macros).
 1.46.4.18 24-Jun-2003  grant Pull up revision 1.66 (requested by itojun in ticket #1325):

don't call if_free_sadl() until very end of if_detach() logic. many of
routing table manipulation code assumes the presence of AF_LINK sockaddr.
should fix PR 21581
 1.46.4.17 07-Feb-2003  tron Pull up revision 1.62 (requested by martin in ticket #1152):
Remove a break after a goto.
 1.46.4.16 07-Feb-2003  tron Pull up revision 1.61 (requested by martin in ticket #1152):
Restore the system priority level in case of errors.
OK:ed by martin.
 1.46.4.15 07-Feb-2003  tron Pull up revision 1.60 (requested by martin in ticket #1152):
license clarification from the author, via openbsd
BSD-style license from Serge Vakulenko <vak@cronyx.ru>
 1.46.4.14 07-Feb-2003  tron Pull up revision 1.59 (requested by martin in ticket #1152):
KNF
 1.46.4.13 07-Feb-2003  tron Pull up revision 1.58 (requested by martin in ticket #1152):
KNF - return is not a function.
 1.46.4.12 07-Feb-2003  tron Pull up revision 1.57 (requested by martin in ticket #1152):
If the peer did not answer LCP echo requests in-time, but we got user
data through within the last LCP keepalive interval, do not count this
as a keepalive failure.
Addresses parts of kern/17723.
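The rule from revision 1.57 amounts to a small decision function. The following is a sketch under assumptions, not the code from if_spppsubr.c: `sk_keepalive_failed` and its parameters are hypothetical, times are seconds on a monotonic clock, and treating "within the interval" as a strict less-than is a guess.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether an unanswered LCP echo request
 * should count as a keepalive failure.
 */
static bool
sk_keepalive_failed(bool echo_answered, long now, long last_payload,
    long interval)
{
	assert(interval > 0 && now >= last_payload);
	if (echo_answered)
		return false;
	if (now - last_payload < interval)
		return false;	/* user data got through recently */
	return true;
}
```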
 1.46.4.11 07-Feb-2003  tron Pull up revision 1.56 (requested by martin in ticket #1152):
Fix async map handling. Many thanks to Joerg Wunsch for the explanation.
 1.46.4.10 07-Feb-2003  tron Pull up revision 1.51 (requested by martin in ticket #1152):
Use "mono_time" instead of "time" for timeout calculations.
 1.46.4.9 07-Feb-2003  tron Pull up revision 1.50 (requested by martin in ticket #1152):
don't access freed memory.
 1.46.4.8 07-Feb-2003  tron Pull up revision 1.49 (requested by martin in ticket #1152):
don't access freed memory.
 1.46.4.7 07-Feb-2003  tron Pull up revision 1.47 (requested by martin in ticket #1152):
don't panic on invalid CONF_ACK from remote (in general, issuing panic
on remote input is bad practice)
 1.46.4.6 28-Jan-2003  jmc Pullup revisions 1.62-1.63 (requested by tron in ticket #1133)
Use MRU negotiated with remote system as MTU. Fixes PR#18850.
 1.46.4.5 10-Jan-2003  jmc Pull up revisions 1.47-1.48 (requested by tron in ticket #1061)
correct tcp header chasing in pp_fastq processing. should fix kern/17491.
 1.46.4.4 17-Aug-2002  lukem Pull up revision 1.55 (requested by groo in ticket #669):
Patches from Frank Kardel:
- length was one off in names and secrets.
- add win 98 kludge but we keep it disabled for now.
- setup the authorization bit early so that we don't end up doing ppp
negotiations without authorization.
 1.46.4.3 17-Aug-2002  lukem Pull up revision 1.54 (requested by groo in ticket #669):
Don't throw away the name and the secret lengths. This eliminates all the
strlen() calls, and there was a whole bunch of them.
 1.46.4.2 17-Aug-2002  lukem Pull up revision 1.53 (requested by groo in ticket #669):
avoid modifying the buffers, by checking for matching lengths first. From
Frank Kardel.
 1.46.4.1 17-Aug-2002  lukem Pull up revision 1.52 (requested by groo in ticket #669):
Use strcmp() instead of memcmp() because if we get passed a 0 length name
and secret, we'll authenticate successfully! While there, rename passwd to
secret so that code looks nicer.
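The pitfall behind revision 1.52 is easy to reproduce in userland: memcmp() with a length of zero compares nothing and returns 0, so a check whose length comes from the peer accepts an empty credential. The function names and the secret are made up for illustration; this is not the code from if_spppsubr.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unsafe: the length comes from the peer, and a length of 0 compares nothing. */
static int
sk_auth_memcmp(const char *secret, const char *peer, size_t peer_len)
{
	assert(secret != NULL && peer != NULL);
	return memcmp(peer, secret, peer_len) == 0;
}

/* Safer: compare whole NUL-terminated strings, so "" only matches "". */
static int
sk_auth_strcmp(const char *secret, const char *peer)
{
	assert(secret != NULL && peer != NULL);
	return strcmp(peer, secret) == 0;
}
```

With a secret of "hunter2" and an empty peer string of length 0, sk_auth_memcmp() reports a match while sk_auth_strcmp() does not.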
 1.46.2.3 29-Aug-2002  gehenna catch up with -current.
 1.46.2.2 15-Jul-2002  gehenna catch up with -current.
 1.46.2.1 30-May-2002  gehenna Catch up with -current.
 1.66.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.66.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.66.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.66.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.66.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.66.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.66.2.4 24-Sep-2004  skrll Sync with HEAD.
 1.66.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.66.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.66.2.1 03-Aug-2004  skrll Sync with HEAD
 1.75.2.2 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10677):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
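The class of bug described here, copying peer-supplied options without validating their lengths, can be sketched in userland. This is not the fix from revision 1.96: `sk_copy_options` is a hypothetical function, and it assumes the PPP option layout of one type byte, one length byte that counts both header bytes, then the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a sequence of type/length/data options from in[] to out[].
 * Returns the number of bytes copied, or -1 if an option is malformed
 * or would not fit in the destination buffer.
 */
static int
sk_copy_options(const unsigned char *in, size_t in_len,
    unsigned char *out, size_t out_len)
{
	size_t off = 0, used = 0;

	assert(in != NULL && out != NULL);
	while (off < in_len) {
		size_t optlen;

		if (in_len - off < 2)
			return -1;	/* truncated option header */
		optlen = in[off + 1];
		if (optlen < 2 || optlen > in_len - off)
			return -1;	/* length field lies about the data */
		if (optlen > out_len - used)
			return -1;	/* would write past the end of out[] */
		memcpy(out + used, in + off, optlen);
		used += optlen;
		off += optlen;
	}
	return (int)used;
}
```

Every length is checked against both the remaining input and the remaining space in the destination before anything is copied.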
 1.75.2.1 08-Apr-2004  jdc branches: 1.75.2.1.2; 1.75.2.1.4;
Pull up revision 1.76 (requested by martin in ticket #98)

Be more paranoid about data a non-root user may query.
Fixes PR kern/25099 by Christian Biere.
 1.75.2.1.4.1 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10677):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
 1.75.2.1.2.1 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10677):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
 1.80.2.1 29-Apr-2005  kent sync with -current
 1.81.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.82.4.1 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #1476):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
 1.82.2.2 15-Jun-2007  liamjfoy Pull up following revision(s) (requested by msaitoh in ticket #1802):
sys/net/if_spppsubr.c 1.93-1.94

Do not automagically UP the interface when setting the address.
This is the interface level (e.g. pppoe) fix for PR 30694.
 1.82.2.1 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #1476):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
 1.85.2.7 27-Feb-2008  yamt sync with head.
 1.85.2.6 11-Feb-2008  yamt sync with head.
 1.85.2.5 21-Jan-2008  yamt sync with head
 1.85.2.4 03-Sep-2007  yamt sync with head.
 1.85.2.3 26-Feb-2007  yamt sync with head.
 1.85.2.2 30-Dec-2006  yamt sync with head.
 1.85.2.1 21-Jun-2006  yamt sync with head.
 1.86.2.1 01-Feb-2006  yamt sync with head.
 1.87.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.87.8.4 11-May-2006  elad sync with head
 1.87.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.87.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.87.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.87.6.5 14-Sep-2006  yamt sync with head.
 1.87.6.4 03-Sep-2006  yamt sync with head.
 1.87.6.3 11-Aug-2006  yamt sync with head
 1.87.6.2 26-Jun-2006  yamt sync with head.
 1.87.6.1 24-May-2006  yamt sync with head.
 1.87.4.3 01-Jun-2006  kardel Sync with head.
 1.87.4.2 22-Apr-2006  simonb Sync with head.
 1.87.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.87.2.1 09-Sep-2006  rpaulo sync with head
 1.90.2.1 19-Jun-2006  chap Sync with head.
 1.92.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.95.2.1 23-Aug-2006  tron Pull up following revision(s) (requested by adrianp in ticket #55):
sys/net/if_spppsubr.c: revision 1.96
A problem has been identified in the in-kernel PPP code shared by ISDN PPP
interfaces ippp(4) and pppoe(4). Insufficient checking of options presented
by the peer may cause writing of copies of the malicious input beyond the
end of a buffer allocated for that purpose.
Issue found by pavel@
Fix from martin@
This is SA2006-019 (CVE-2006-4304)
 1.97.4.2 10-Dec-2006  yamt sync with head.
 1.97.4.1 22-Oct-2006  yamt sync with head
 1.97.2.2 12-Jan-2007  ad Sync with head.
 1.97.2.1 18-Nov-2006  ad Sync with head.
 1.102.6.1 04-Sep-2008  skrll Sync with netbsd-4.
 1.102.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.102.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.102.2.1 20-Aug-2008  bouyer Pull up following revision(s) (requested by martin in ticket #1185):
sys/net/if_spppsubr.c: revision 1.111
PR kern/39280: Uninitialized callout stopped in if_spppsubr layer
in kernels without options INET6.
 1.104.4.1 11-Jul-2007  mjf Sync with head.
 1.104.2.2 15-Jul-2007  ad Sync with head.
 1.104.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.106.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.106.16.1 26-Dec-2007  ad Sync with head.
 1.106.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.106.8.2 23-Mar-2008  matt sync with HEAD
 1.106.8.1 09-Jan-2008  matt sync with HEAD
 1.109.14.1 27-Jun-2008  simonb Sync with head.
 1.109.12.2 10-Oct-2008  skrll Sync with HEAD.
 1.109.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.109.10.3 11-Aug-2010  yamt sync with head.
 1.109.10.2 11-Mar-2010  yamt sync with head
 1.109.10.1 04-May-2009  yamt sync with head.
 1.109.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.109.6.3 05-Oct-2008  mjf Sync with HEAD.
 1.109.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.109.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.110.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.110.2.1 19-Oct-2008  haad Sync with HEAD.
 1.114.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.114.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.116.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.118.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.119.2.1 30-May-2010  rmind sync with head
 1.123.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.123.2.1 17-Apr-2012  yamt sync with head
 1.124.2.1 18-Feb-2012  mrg merge to -current.
 1.125.6.3 03-Dec-2017  jdolecek update from HEAD
 1.125.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.125.6.1 23-Jun-2013  tls resync from head
 1.126.6.2 18-May-2014  rmind sync with head
 1.126.6.1 28-Aug-2013  rmind sync with head
 1.127.4.1 10-Aug-2014  tls Rebase.
 1.130.6.1 18-Jan-2017  skrll Sync with netbsd-5
 1.130.2.1 25-Sep-2016  bouyer Pull up following revision(s) (requested by joerg in ticket #1254):
sys/net/if_spppsubr.c: revision 1.149
Report link state changes for sppp consumers. The link is considered up,
if the current phase is SPPP_PHASE_NETWORK, otherwise it is down. Useful
when using dhcpcd for DHCPv6 PD.
 1.131.2.9 28-Aug-2017  skrll Sync with HEAD
 1.131.2.8 05-Feb-2017  skrll Sync with HEAD
 1.131.2.7 05-Dec-2016  skrll Sync with HEAD
 1.131.2.6 05-Oct-2016  skrll Sync with HEAD
 1.131.2.5 09-Jul-2016  skrll Sync with HEAD
 1.131.2.4 29-May-2016  skrll Sync with HEAD
 1.131.2.3 22-Apr-2016  skrll Sync with HEAD
 1.131.2.2 22-Sep-2015  skrll Sync with HEAD
 1.131.2.1 06-Jun-2015  skrll Sync with HEAD
 1.146.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.146.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.146.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.146.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.165.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.169.6.5 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #613):
sys/net/if_pppoe.c: revision 1.130,1.134
sys/net/if_spppsubr.c: revision 1.172,1.175,1.179
sys/net/if_gif.c: revision 1.138,1.139

Mark callouts of pppoe(4) CALLOUT_MPSAFE. Suggested by ozaki-r@n.o.

fix non-diagnostic compilation

Fix spl leak.
ifconfig gif0 create
ifconfig gif0 destroy
WARNING: SPL NOT LOWERED ON ...

Fix breaking character limit. Pointed out by ozaki-r@n.o, thanks.

Use m_freem instead of m_free. Otherwise we're leaking the next mbufs in
the chain.
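The m_free versus m_freem difference mentioned above can be sketched with a toy buffer chain. The `sk_*` names and the live-buffer counter are hypothetical stand-ins for illustration, not the mbuf(9) implementation: freeing a single buffer leaves the rest of the chain allocated, while the chain variant walks and frees every buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an mbuf: only the chain pointer matters here. */
struct sk_mbuf {
	struct sk_mbuf *m_next;
};

static int sk_live;	/* number of buffers currently allocated */

/* Allocate one buffer and link it in front of next. */
static struct sk_mbuf *
sk_get(struct sk_mbuf *next)
{
	struct sk_mbuf *m = malloc(sizeof(*m));

	if (m == NULL)
		abort();
	m->m_next = next;
	sk_live++;
	return m;
}

/* Free one buffer only and hand back its successor. */
static struct sk_mbuf *
sk_free(struct sk_mbuf *m)
{
	struct sk_mbuf *n;

	assert(m != NULL);
	n = m->m_next;
	free(m);
	sk_live--;
	return n;
}

/* Free the whole chain starting at m. */
static void
sk_freem(struct sk_mbuf *m)
{
	while (m != NULL)
		m = sk_free(m);
}
```

Calling sk_free() on the head of a three-buffer chain and discarding the return value would leave two buffers allocated, which is the leak the commit describes.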
 1.169.6.4 16-Jan-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #497):
tests/rump/rumpkern/Makefile: revision 1.16
tests/rump/kernspace/Makefile: revision 1.6
tests/rump/kernspace/workqueue.c: revision 1.1
tests/rump/kernspace/workqueue.c: revision 1.2
tests/rump/kernspace/workqueue.c: revision 1.3
tests/rump/kernspace/workqueue.c: revision 1.4
tests/rump/kernspace/workqueue.c: revision 1.5
tests/rump/kernspace/workqueue.c: revision 1.6
tests/rump/rumpkern/t_workqueue.c: revision 1.1
sys/sys/workqueue.h: revision 1.10
tests/rump/rumpkern/t_workqueue.c: revision 1.2
tests/rump/kernspace/kernspace.h: revision 1.5
tests/rump/kernspace/kernspace.h: revision 1.6
sys/net/if_bridge.c: revision 1.147
distrib/sets/lists/debug/mi: revision 1.225
sys/kern/subr_workqueue.c: revision 1.34
share/man/man9/workqueue.9: revision 1.12
sys/net/if_spppsubr.c: revision 1.178
distrib/sets/lists/tests/mi: revision 1.763
Add simple test for workqueue(9)
Add declaration. build fix
sorry, I forgot to commit this file.
Tweak use of cv_timedwait
- Handle its return value
- Specify more appropriate time-out periods (2 ticks is too short)
Fix a race condition on taking the mutex
The workqueue worker can take the mutex before the tester tries to take it after
calling workqueue_enqueue. If it happens, the worker calls cv_broadcast before
the tester calls cv_timedwait and the tester will wait until the cv times out.
Take the mutex before calling workqueue_enqueue so that the tester surely calls
cv_timedwait before the worker calls cv_broadcast.
The fix stabilizes the test, t_workqueue/workqueue1.
Add workqueue_wait that waits for a specific work to finish
The caller must ensure that no new work is enqueued before calling
workqueue_wait. Note that if the workqueue is WQ_PERCPU, the caller
can enqueue a new work to another queue other than the waiting queue.
Discussed on tech-kern@
Ensure the timer isn't running by using workqueue_wait
Functionalize some routines to add new tests easily (NFC)
Add a test case for workqueue_wait
Fix build
 1.169.6.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.169.6.2 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #407):
sys/compat/linux32/common/linux32_socket.c: revision 1.28
sys/net/if.c: revision 1.400
sys/netipsec/key.c: revision 1.243
sys/compat/linux/common/linux_socket.c: revision 1.139
sys/netinet/ip_carp.c: revision 1.93
sys/netinet6/in6.c: revision 1.252
sys/netinet6/in6.c: revision 1.253
sys/netinet6/in6.c: revision 1.254
sys/net/if_spppsubr.c: revision 1.173
sys/net/if_spppsubr.c: revision 1.174
sys/compat/common/uipc_syscalls_40.c: revision 1.14
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Fix usage of FOREACH macro
key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
Fix and make consistent of usages of psz/psref in ifconf variants
Remove unnecessary goto because there is no cleanup code to share (NFC)
Tweak a condition; we don't need to care ifacount to be negative
Fix a race condition of in6_ifinit
in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.
Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
 1.169.6.1 02-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #332):
sys/net/if_pppoe.c: 1.127-1.128
sys/net/if_pppoe.h: 1.15
sys/net/if_spppsubr.c: 1.170-1.171
sys/net/if_spppvar.h: 1.21-1.22
Integrate two locks used to protect PPPoE softc. Contributed by s-yamaguchi@IIJ.
PPPOE_SESSION_LOCK protects variables used in PPP packet
processing, on the other hand PPPOE_PARAM_LOCK protects
the other variables used to establish a PPPoE session id.
Those locks aren't acquired at the same time because the
PPP packet processing doesn't work without a PPPoE session id.
For that reason, the locks can be integrated into PPPOE_LOCK.
Add locking notes later.
--
sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.
Add locking notes later.
--
Add locking notes for if_pppoe
--
Add locking notes for if_spppsubr
--
fix no INET6 build.
 1.179.2.11 22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
 1.179.2.10 18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.179.2.9 14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.179.2.8 13-Jan-2019  pgoyette Remove the HOOK2 versions of the MODULE_HOOK macros. There were
only a few uses, and using them led to some lack of clarity in the
code. Instead, we now use two separate hooks, with names that
make it clear(er) what we're doing.

This also positions us to start unraveling some of the rtsock_50
mess, which will need (at least) five hooks.
 1.179.2.7 29-Sep-2018  pgoyette In MODULE_HOOK_CALL_DECL we don't need to provide the actual argument
list for calling the hook function, nor do we need to provide the
default value (for when the hook has not been set).
 1.179.2.6 18-Sep-2018  pgoyette The COMPAT_HOOK macros were renamed to MODULE_HOOK, adjust all callers
 1.179.2.5 18-Sep-2018  pgoyette Split the COMPAT_CALL_HOOK to separate the declaration from the
implementation. Some hooks are called from multiple source files,
and the old method resulted in duplicate implementations.

Implement MP-safe hooks for the usb_subr_30 code. Pass the helper
functions as arguments to the compat code so it does not have to
determine if the kernel contains usb code.
 1.179.2.4 17-Sep-2018  pgoyette Adapt (most of) the indirect function pointers to the new MP-safe
mechanism. Still remaining are the compat_netbsd32 stuff, and
some usb subroutines.
 1.179.2.3 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.179.2.2 30-Mar-2018  pgoyette Import fixes from HEAD
 1.179.2.1 21-Mar-2018  pgoyette Move if_spppsubr compat code into the compat50 module.

More prep work for compat80 module (for raidframe)
 1.180.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.180.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.180.2.1 10-Jun-2019  christos Sync with HEAD
 1.184.2.1 29-Feb-2020  ad Sync with head.
 1.187.2.11 11-Apr-2020  is trying to find the right place for MLPPP renegotiation.
 1.187.2.10 11-Apr-2020  is typo
 1.187.2.9 11-Apr-2020  is MLPPP mrru negotiation.
 1.187.2.8 11-Apr-2020  is macro error
 1.187.2.7 11-Apr-2020  is whitespace error
 1.187.2.6 11-Apr-2020  is compat code to make transplanting this to netbsd-8 easier
 1.187.2.5 10-Apr-2020  is syntax fixes.
 1.187.2.4 10-Apr-2020  is first part of defragmentation code. No dropping/sequence error statistics
yet, and no MRRU negotiation so not active.
 1.187.2.3 07-Apr-2020  is The specification calls this a class, not type.
 1.187.2.2 07-Apr-2020  is Multilink PPP: sanity check of option values, storage of remote MRRU.
 1.187.2.1 07-Apr-2020  is Define a few more LCP options. Recognize, sanity-check and report (but
still reject for the moment) multilink PPP configuration options received.
 1.190.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.215.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.220.2.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.220.2.1 13-May-2021  thorpej Sync with HEAD.
 1.230.2.1 31-May-2021  cjep sync with head
 1.266.4.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded lookups and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
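
As a rough illustration of the kind of open-coded loop that a helper like if_first_addr() can replace, here is a minimal user-space sketch. The demo_ifaddr and demo_ifnet types and the demo_first_addr() name are hypothetical stand-ins, not the kernel's structures or API, and the locking and psref handling of the real helpers is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's ifaddr and ifnet. */
struct demo_ifaddr {
	int family;			/* address family of this entry */
	struct demo_ifaddr *next;	/* next address on the interface */
};

struct demo_ifnet {
	struct demo_ifaddr *addrs;	/* head of the address list */
};

/*
 * Return the first address of the given family on the interface,
 * or NULL if the interface has none.
 */
static struct demo_ifaddr *
demo_first_addr(const struct demo_ifnet *ifp, int family)
{
	struct demo_ifaddr *ifa;

	assert(ifp != NULL);
	for (ifa = ifp->addrs; ifa != NULL; ifa = ifa->next) {
		if (ifa->family == family)
			return ifa;
	}
	return NULL;
}
```

A caller that previously walked the list by hand would call demo_first_addr(ifp, family) and test the result against NULL.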
 1.267.4.3 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.267.4.2 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.267.4.1 15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.268.2.1 02-Aug-2025  perseant Sync with HEAD
 1.42 01-Jun-2021  yamaguchi Drop the Open event of LCP so that the interface stops
even if a reconnection is scheduled

The event queue in if_spppsubr.c cannot hold the same event
twice. So a close event raised while a close event and an
open event are already enqueued for reconnection cannot
stop the interface.
To solve this issue, the open event after
"ifconfig pppoe? down" is dropped.
 1.41 01-Jun-2021  yamaguchi remove PP_CISCO that was used in obsoleted drivers e.g. lmc(4)
 1.40 01-Jun-2021  yamaguchi Don't call if_down() before reconnecting

Most network interfaces do not call if_down() even when there is no
connectivity, so pppoe(4) no longer calls it either.
The old behavior can be restored with the SPPP_IFDOWN_RECONNECT option.
 1.39 01-Jun-2021  yamaguchi restart LCP when loopback packets are detected

In if_spppsubr.c, down and up do not mean that LCP is stopping
or running; they mean that the layer below LCP is down or up.
So restarting LCP has to use a close event followed by an open event.
 1.38 14-May-2021  yamaguchi Add a parameter to change keepalive interval in each PPPoE I/F
 1.37 11-May-2021  yamaguchi Added ioctl commands for configuring NCP of pppoe(4)
 1.36 11-May-2021  yamaguchi back to r1.34 because of mistake of commit log
 1.35 11-May-2021  yamaguchi Added keywords that are ipcp, noipcp, ipv6cp, noipv6cp
for configuring NCP
 1.34 28-Apr-2021  yamaguchi branches: 1.34.2;
Move parsing of conf-req, conf-nak and conf-rej into workqueue
from softint context

When the parsing was done in softint, the state machine
in if_spppsubr.c was on rare occasions broken by simultaneous
events.

Example:
1. Do ifconfig pppoe* up
- lcp open event is enqueued to workqueue
2. Receive conf-ack, and parse the packet
- save mru to sp->lcp.their_mru
- lcp RCR+ event is enqueued to workqueue
3. Process lcp open event
- initialize data including sp->lcp.their_mru
4. Process lcp RCR+ event
- Use sp->lcp.their_mru
- but it was already re-initialized in step 3
 1.33 16-Apr-2021  yamaguchi branches: 1.33.2;
Added missing locking order between sppp and IFNET_LOCK
 1.32 25-Nov-2020  yamaguchi branches: 1.32.2;
Add commands to refer params of control protocols in if_spppsubr.c

reviewed by knakahara@n.o.
 1.31 25-Nov-2020  yamaguchi Reconnect when a down event caused by tlf is caught
 1.30 25-Nov-2020  yamaguchi Update ip addresses in the workqueue for control protocols

reviewed by knakahara@n.o.
 1.29 25-Nov-2020  yamaguchi implement auth protocols on the state-machine of control protocols

reviewed by knakahara@n.o.
 1.28 25-Nov-2020  yamaguchi call if_down() in workqueue instead of callout(9)
 1.27 25-Nov-2020  yamaguchi Change a state of control protocol in thread context

reviewed by knakahara@n.o.
 1.26 25-Nov-2020  yamaguchi Add a function for RCR event
 1.25 25-Nov-2020  yamaguchi Add a structure for params related to control protocols
 1.24 05-Oct-2020  roy branches: 1.24.2;
ppp: Remove media

There is none after all.
Applications should be using ifi_link_state and not checking media.
 1.23 04-Apr-2020  is Multilink PPP: sanity check of option values, storage of remote MRRU.
 1.22 12-Oct-2017  knakahara branches: 1.22.4; 1.22.12;
Add locking notes for if_spppsubr
 1.21 12-Oct-2017  knakahara sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.

Add locking notes later.
 1.20 13-Dec-2016  knakahara branches: 1.20.8;
MP-safe pppoe(4).

Nearly all parts were implemented by Shoichi YAMAGUCHI <s-yamaguchi@IIJ>, thanks.
 1.19 01-Dec-2016  knakahara fix two races between set_ip_addrs and clear_ip_addrs.

(1) if set_ip_addrs and clear_ip_addrs run in parallel, they can concurrently
call IN_ADDRHASH_WRITER_REMOVE on the same ifa.
(2) if set_ip_addrs's workqueue is separate from clear_ip_addrs's,
the workers can run in the reverse of the order in which they were enqueued.
 1.18 25-Nov-2016  knakahara run sppp_{set,clear}_ip_addrs in a workqueue so that they are able to call pserialize_perform.
 1.17 14-Sep-2016  roy Add interface media for sppp consumers.
While there is no actual media to select,
the ioctl is used to query link status from userland.
 1.16 05-Oct-2009  dyoung branches: 1.16.22; 1.16.40; 1.16.44;
Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
 1.15 22-Aug-2008  martin Backout previous/restore initial fix for PR kern/39280.
The later changes were only cosmetic, cause problems in IPv6-only-
connections (reported by Wolfgang Solfrank in private mail), as well
as reintroducing the original bug again.
 1.14 05-Aug-2008  degroote We have a dummy entry for IPV6CP even in the non-INET6 case.
So always reference IDX_IPV6CP
Fix build of if_spppsubr.c if INET6 is not defined.
 1.13 04-Aug-2008  christos keep the loop, but arrange IDX_COUNT to be correct.
 1.12 20-Feb-2008  matt branches: 1.12.10; 1.12.16;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.11 10-Dec-2005  elad branches: 1.11.46;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.10 03-Sep-2003  martin branches: 1.10.16;
Rearrange dead link detection slightly:
As long as we receive data from the peer, don't worry. When we have not
received anything within the "max_noreceive" period, we start sending LCP
echo requests and count them, until we receive an answer (or some data)
or the "maxalive" count of not answered echo requests is reached.
All this is checked at a global 10 seconds interval for all interfaces.
The "max_noreceive" period and the "maxalive" count are configurable per
interface.
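
The detection scheme described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the demo_ka structure, the function names and the action values are hypothetical, and only the "max_noreceive" period, the "maxalive" count and the periodic check come from the log message.

```c
#include <assert.h>

/* Hypothetical per-interface keepalive state; all names are illustrative. */
struct demo_ka {
	long last_receive;	/* time (s) anything last arrived from the peer */
	long max_noreceive;	/* silence (s) tolerated before probing starts */
	int maxalive;		/* unanswered echo requests tolerated */
	int alive_count;	/* echo requests sent without an answer so far */
};

enum demo_ka_action { DEMO_KA_IDLE, DEMO_KA_SEND_ECHO, DEMO_KA_LINK_DEAD };

/* Called whenever data or an echo reply arrives from the peer. */
static void
demo_ka_received(struct demo_ka *ka, long now)
{
	ka->last_receive = now;
	ka->alive_count = 0;
}

/* Called from the periodic timer (every 10 seconds in the log message). */
static enum demo_ka_action
demo_ka_tick(struct demo_ka *ka, long now)
{
	assert(ka->maxalive > 0);
	if (now - ka->last_receive < ka->max_noreceive)
		return DEMO_KA_IDLE;		/* peer heard from recently */
	if (ka->alive_count >= ka->maxalive)
		return DEMO_KA_LINK_DEAD;	/* too many unanswered probes */
	ka->alive_count++;
	return DEMO_KA_SEND_ECHO;		/* probe and keep counting */
}
```

The caller would send an LCP echo request on DEMO_KA_SEND_ECHO and tear the link down on DEMO_KA_LINK_DEAD; any received traffic resets the counter through demo_ka_received().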
 1.9 09-Jul-2003  martin We should use IFQ_DEQUEUE to get packets from the send queue, not IF_DEQUEUE.
Hopefully this will fix ALTQ for ISDN and PPPoE interfaces.

While there, remove an unused function which contained dubious code
(accessing interface queue internals w/o the proper macros).
 1.8 08-Jul-2003  itojun prototype must not have variable name
 1.7 28-Jan-2003  tron branches: 1.7.2;
Use MRU negotiated with remote system as MTU. This fixes PR kern/18850
by Curt Sampson.
 1.6 22-Jan-2003  jmmv Fix typo: realy -> really. Okay'ed by wiz.
 1.5 28-Jul-2002  christos Don't throw away the name and the secret lengths. This eliminates all the
strlen() calls, and there was a whole bunch of them.
 1.4 02-Mar-2002  martin branches: 1.4.6; 1.4.8;
Add support to query the peer for DNS addresses when negotiating IPCP.
Add ioctls to retrieve the results.

While here remove a malloc()/free() of an unused buffer.
 1.3 07-Jan-2002  martin branches: 1.3.2; 1.3.4;
Implement a retry counter for failed authorizations and limit it to
a configurable maximum (default: 5).

Some ISPs shut down accounts (at least temporarily) after too many bad
retries. This hit me recently due to a stupid pilot error and the fast
retry rate.
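
A bounded retry counter of this kind might look roughly like the sketch below. The demo_auth names are hypothetical and the exact point at which the driver gives up is an assumption; only the configurable maximum and its default of 5 come from the log message.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_AUTH_MAX_FAILURES	5	/* default named in the log message */

/* Hypothetical authentication retry state; names are illustrative. */
struct demo_auth {
	int failures;		/* consecutive failed authentications */
	int max_failures;	/* configurable limit */
};

static void
demo_auth_init(struct demo_auth *a)
{
	a->failures = 0;
	a->max_failures = DEMO_AUTH_MAX_FAILURES;
}

/* Record one failure; return true if another attempt is still allowed. */
static bool
demo_auth_failed(struct demo_auth *a)
{
	assert(a->max_failures > 0);
	if (a->failures < a->max_failures)
		a->failures++;
	return a->failures < a->max_failures;
}

/* A successful authentication clears the counter. */
static void
demo_auth_succeeded(struct demo_auth *a)
{
	a->failures = 0;
}
```

With the default limit, the first four failures still permit a retry and the fifth does not, which keeps a misconfigured client from hammering the ISP indefinitely.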
 1.2 06-Jan-2002  martin Implement an activity timestamp, recording the last time payload data
passed through.

Implement optional idle timeout.
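
One plausible shape for an activity timestamp with an optional idle timeout is sketched here. The demo_idle names are hypothetical, and treating a timeout of 0 as "disabled" is an assumption made for the example, not something the log message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical idle-tracking state; names are illustrative. */
struct demo_idle {
	long last_activity;	/* time (s) payload data last passed through */
	long idle_timeout;	/* seconds of inactivity allowed; 0 = disabled (assumed) */
};

/* Record that payload data passed through the interface. */
static void
demo_idle_touch(struct demo_idle *s, long now)
{
	s->last_activity = now;
}

/* Return true if the link has been idle for at least the timeout. */
static bool
demo_idle_expired(const struct demo_idle *s, long now)
{
	assert(s->idle_timeout >= 0);
	if (s->idle_timeout == 0)
		return false;	/* idle timeout not configured */
	return now - s->last_activity >= s->idle_timeout;
}
```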
 1.1 05-Jan-2002  martin Ooops, forgot to commit this file when doing the great if_spppsubr.c
rotottil. Thanks to Launey Thomas for pointing this out.
 1.3.4.4 01-Aug-2002  nathanw Catch up to -current.
 1.3.4.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.3.4.2 11-Jan-2002  nathanw More catchup.
 1.3.4.1 07-Jan-2002  nathanw file if_spppvar.h was added on branch nathanw_sa on 2002-01-11 23:39:45 +0000
 1.3.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.3.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.3.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.2.1 07-Jan-2002  thorpej file if_spppvar.h was added on branch kqueue on 2002-01-10 20:02:13 +0000
 1.4.8.3 28-Jan-2003  jmc Pullup revisions 1.62-1.63 (requested by tron in ticket #1133)
Use MRU negotiated with remote system as MTU. Fixes PR#18850.
 1.4.8.2 26-Jan-2003  jmc Pullup revisions 1.5-1.6 (requested by jmmv in ticket #1102)
Fix typo: realy -> really. Okay'ed by wiz.
 1.4.8.1 17-Aug-2002  lukem Pull up revision 1.5 (requested by groo in ticket #669):
Don't throw away the name and the secret lengths. This eliminates all the
strlen() calls, and there was a whole bunch of them.
 1.4.6.1 29-Aug-2002  gehenna catch up with -current.
 1.7.2.4 11-Dec-2005  christos Sync with head.
 1.7.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.2.1 03-Aug-2004  skrll Sync with HEAD
 1.10.16.2 27-Feb-2008  yamt sync with head.
 1.10.16.1 21-Jun-2006  yamt sync with head.
 1.11.46.1 23-Mar-2008  matt sync with HEAD
 1.12.16.1 19-Oct-2008  haad Sync with HEAD.
 1.12.10.1 11-Mar-2010  yamt sync with head
 1.16.44.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.16.44.1 04-Nov-2016  pgoyette Sync with HEAD
 1.16.40.3 05-Feb-2017  skrll Sync with HEAD
 1.16.40.2 05-Dec-2016  skrll Sync with HEAD
 1.16.40.1 05-Oct-2016  skrll Sync with HEAD
 1.16.22.1 03-Dec-2017  jdolecek update from HEAD
 1.20.8.1 02-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #332):
sys/net/if_pppoe.c: 1.127-1.128
sys/net/if_pppoe.h: 1.15
sys/net/if_spppsubr.c: 1.170-1.171
sys/net/if_spppvar.h: 1.21-1.22
Integrate two locks used to protect PPPoE softc. Contributed by s-yamaguchi@IIJ.
PPPOE_SESSION_LOCK protects variables used in PPP packet
processing, on the other hand PPPOE_PARAM_LOCK protects
the other variables used to establish a PPPoE session id.
Those locks aren't acquired at the same time because the
PPP packet processing doesn't work without a PPPoE session id.
For that reason, the locks can be integrated into PPPOE_LOCK.
Add locking notes later.
--
sppp_lock is changed from mutex to rwlock now. Contributed by s-yamaguchi@IIJ.
Add locking notes later.
--
Add locking notes for if_pppoe
--
Add locking notes for if_spppsubr
--
fix no INET6 build.
 1.22.12.3 10-Apr-2020  is first part of defragmentation code. No dropping/sequence error statistics
yet, and no MRRU negotiation so not active.
 1.22.12.2 07-Apr-2020  is fix typo in comment.
 1.22.12.1 07-Apr-2020  is Multilink PPP: sanity check of option values, storage of remote MRRU.
 1.22.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.24.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.32.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.33.2.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.33.2.1 13-May-2021  thorpej Sync with HEAD.
 1.34.2.1 31-May-2021  cjep sync with head
 1.32 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.31 29-Jan-2020  thorpej branches: 1.31.10;
Adopt <net/if_stats.h>.
 1.30 27-Apr-2019  pgoyette branches: 1.30.4;
A few more empty-string --> NULL in required-modules lists
 1.29 26-Mar-2019  pgoyette Add cloned-interface-create code to srt open() routine so behavior
matches that which is documented in srtconfig(1) man page. Without
this, srt only works if you first create the srtN interface with
ifconfig(8).
 1.28 26-Mar-2019  pgoyette Add devsw_{attach,detach} stuff for _MODULE variant. (Not needed for
built-in variant since the devsw is also built-in.) This will allow
the modular srt devices to be accessed via open(2) and ioctl(2).

XXX Someone(tm) needs to update MAKEDEV to create the /dev/srtN device
nodes (with device-major 179)!
 1.27 23-Oct-2017  msaitoh branches: 1.27.4;
- If if_attach() failed in the attach function, free resources and return.
- KNF
 1.26 14-Feb-2017  ozaki-r branches: 1.26.4; 1.26.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.25 09-Feb-2017  kre PR kern/51280

This allows srt devices to work for IPv6. srt still needs work
(particularly #ifdef INET6 but also general efficiency and similar.)
 1.24 14-Jan-2017  maya branches: 1.24.2;
appease coverity by using strlcpy instead of strncpy

ok riastradh
 1.23 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.22 20-Jun-2016  knakahara branches: 1.22.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2 (or L3 tunneling).
 1.21 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.20 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.19 25-Jul-2014  dholland branches: 1.19.4;
Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.18 16-Mar-2014  dholland branches: 1.18.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.17 28-Oct-2011  dyoung branches: 1.17.2; 1.17.12; 1.17.16;
For these interfaces, the implementation of SIOCSIFDSTADDR is identical
to SIOCINITIFADDR, and SIOCSIFDSTADDR callers always fall back to
SIOCINITIFADDR, so just get rid of the SIOCSIFDSTADDR case.
 1.16 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.15 09-Sep-2010  tls From Coyote Point source tree: "fix" srt IPv4 lookup on little-endian
hosts. IPv6 is probably still broken, and, actually, the lookup table
for mask values should be kept in network byte order, not host byte order
and the corresponding change to the srtconfig ioctl interface made.

But at least this works.
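
To illustrate the byte-order point, here is a minimal sketch of prefix matching with every value kept in network byte order, so the comparison behaves the same on little- and big-endian hosts. The demo_ function names are hypothetical and this is not how srt's lookup table is actually laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Build an IPv4 netmask for a prefix length, in network byte order. */
static uint32_t
demo_mask_net(unsigned int len)
{
	assert(len <= 32);
	if (len == 0)
		return 0;	/* avoid an undefined 32-bit shift */
	return htonl(UINT32_MAX << (32 - len));
}

/*
 * Compare an address against a prefix. Both arguments and the mask are
 * in network byte order, so no per-host byte swapping is needed here.
 */
static bool
demo_prefix_match(uint32_t addr_net, uint32_t prefix_net, unsigned int len)
{
	return ((addr_net ^ prefix_net) & demo_mask_net(len)) == 0;
}
```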
 1.14 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.13 19-Jan-2010  pooka branches: 1.13.2; 1.13.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.12 09-Dec-2009  dyoung KNF.
 1.11 18-Mar-2009  cegger bzero -> memset
 1.10 18-Mar-2009  cegger bcmp -> memcmp
 1.9 07-Nov-2008  dyoung branches: 1.9.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
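
The callback contract described above (ENETRESET requests a reset through the init routine, 0 means success, anything else is a failure) can be sketched in user space as follows. All demo_ names are hypothetical and do not mirror the real ether_set_ifflags_cb() signature; falling back to a full reset when no callback is registered is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_eth;
typedef int (*demo_ifflags_cb)(struct demo_eth *);

/* Hypothetical stand-in for an ethernet interface. */
struct demo_eth {
	int flags;		/* current interface flags */
	int init_calls;		/* how often the interface was reset */
	demo_ifflags_cb cb;	/* driver callback for flag changes, may be NULL */
};

/* Stand-in for the driver's init routine: a full reset. */
static int
demo_init(struct demo_eth *ec)
{
	ec->init_calls++;
	return 0;
}

/* Apply new flags, letting the driver callback decide whether to reset. */
static int
demo_set_flags(struct demo_eth *ec, int flags)
{
	int error;

	assert(ec != NULL);
	ec->flags = flags;
	/* Assumption: without a callback, fall back to a full reset. */
	error = (ec->cb != NULL) ? ec->cb(ec) : ENETRESET;
	if (error == ENETRESET)
		error = demo_init(ec);
	return error;
}

/* Example callbacks: handled in place, needs reset, and failure. */
static int demo_cb_ok(struct demo_eth *ec) { (void)ec; return 0; }
static int demo_cb_reset(struct demo_eth *ec) { (void)ec; return ENETRESET; }
static int demo_cb_fail(struct demo_eth *ec) { (void)ec; return EINVAL; }
```

A driver that can update its promiscuous mode or multicast filter without a reset would register something like demo_cb_ok; one that cannot would return ENETRESET and let the common code reinitialize it.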

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
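
A check of this kind can be sketched as below. The function name demo_mac_is_assignable() is hypothetical and this is not the code in ether_ioctl(); it relies only on the fact that the least significant bit of the first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAC_LEN	6

/*
 * Return true if the address may be used as an interface's own
 * link-layer address: not all-zero and not multicast/broadcast.
 */
static bool
demo_mac_is_assignable(const uint8_t mac[DEMO_MAC_LEN])
{
	uint8_t any = 0;
	size_t i;

	assert(mac != NULL);
	if (mac[0] & 0x01)
		return false;	/* group bit set: multicast or broadcast */
	for (i = 0; i < DEMO_MAC_LEN; i++)
		any |= mac[i];
	return any != 0;	/* reject 00:00:00:00:00:00 */
}
```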

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.8 15-Jun-2008  christos branches: 1.8.2; 1.8.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.7 07-Feb-2008  dyoung branches: 1.7.6; 1.7.8; 1.7.10; 1.7.12; 1.7.14;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.6 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.5 04-Mar-2007  christos branches: 1.5.16; 1.5.22; 1.5.24; 1.5.26; 1.5.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.3 05-Jan-2007  mouse branches: 1.3.2; 1.3.4;
Add workarounds for include-file bugs exposed by this file. (Ideal, of
course, would be to fix the include-file bugs; that may follow later.)
 1.2 29-Dec-2006  wiz branches: 1.2.2;
Add RCS Id.
 1.1 29-Dec-2006  mouse Very first import of the source-address-based routing pseudo-device,
before any cleanup at all, per discussion with perry@.
 1.2.2.6 11-Feb-2008  yamt sync with head.
 1.2.2.5 21-Jan-2008  yamt sync with head
 1.2.2.4 03-Sep-2007  yamt sync with head.
 1.2.2.3 26-Feb-2007  yamt sync with head.
 1.2.2.2 30-Dec-2006  yamt sync with head.
 1.2.2.1 29-Dec-2006  yamt file if_srt.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:20 +0000
 1.3.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.3.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.2.2 12-Jan-2007  ad Sync with head.
 1.3.2.1 05-Jan-2007  ad file if_srt.c was added on branch newlock2 on 2007-01-12 01:04:12 +0000
 1.5.28.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.26.1 11-Dec-2007  yamt sync with head.
 1.5.24.1 26-Dec-2007  ad Sync with head.
 1.5.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.5.16.2 23-Mar-2008  matt sync with HEAD
 1.5.16.1 09-Jan-2008  matt sync with HEAD
 1.7.14.1 18-Jun-2008  simonb Sync with head.
 1.7.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.10.4 09-Oct-2010  yamt sync with head
 1.7.10.3 11-Aug-2010  yamt sync with head.
 1.7.10.2 11-Mar-2010  yamt sync with head
 1.7.10.1 04-May-2009  yamt sync with head.
 1.7.8.1 17-Jun-2008  yamt sync with head.
 1.7.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.8.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.8.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.8.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.13.4.2 05-Mar-2011  rmind sync with head
 1.13.4.1 30-May-2010  rmind sync with head
 1.13.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.13.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.17.16.1 18-May-2014  rmind sync with head
 1.17.12.2 03-Dec-2017  jdolecek update from HEAD
 1.17.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.18.2.1 10-Aug-2014  tls Rebase.
 1.19.4.6 28-Aug-2017  skrll Sync with HEAD
 1.19.4.5 05-Feb-2017  skrll Sync with HEAD
 1.19.4.4 05-Oct-2016  skrll Sync with HEAD
 1.19.4.3 09-Jul-2016  skrll Sync with HEAD
 1.19.4.2 29-May-2016  skrll Sync with HEAD
 1.19.4.1 22-Sep-2015  skrll Sync with HEAD
 1.22.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.24.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.6.1 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating softint). Without this change, the kernel panics. That is bad
because resource shortages really do occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. The caller function can then recover
from the error cleanly by checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.26.4.2 29-Apr-2017  pgoyette Revise previous. Rather than explicitly including <sys/localcount.h>
in all the places where {b,c}devsw is initialized, just include it
from <sys/conf.h>. This avoids an include-sequence dependency.
 1.26.4.1 29-Apr-2017  pgoyette Add DEVSW_MODULE_INIT to existing device-driver modules, so that they
will have a localcount defined and thus be permitted to load. Without
a localcount, loading the module will return EINVAL.

XXX the dtrace and drm stuff might need to be fed back upstream?
 1.27.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.27.4.1 10-Jun-2019  christos Sync with HEAD
 1.30.4.1 29-Feb-2020  ad Sync with head.
 1.31.10.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.4 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.3 09-Dec-2009  dyoung branches: 1.3.22; 1.3.40;
KNF.
 1.2 29-Dec-2006  wiz branches: 1.2.2; 1.2.4; 1.2.48;
Add RCS Id.
 1.1 29-Dec-2006  mouse Very first import of the source-address-based routing pseudo-device,
before any cleanup at all, per discussion with perry@.
 1.2.48.1 11-Mar-2010  yamt sync with head
 1.2.4.2 12-Jan-2007  ad Sync with head.
 1.2.4.1 29-Dec-2006  ad file if_srt.h was added on branch newlock2 on 2007-01-12 01:04:12 +0000
 1.2.2.2 30-Dec-2006  yamt sync with head.
 1.2.2.1 29-Dec-2006  yamt file if_srt.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:20 +0000
 1.3.40.1 22-Sep-2015  skrll Sync with HEAD
 1.3.22.1 03-Dec-2017  jdolecek update from HEAD
 1.5 29-Jun-2024  riastradh if_stats(9): New dtrace probes on if_statinc/dec/add/sub.

Note: This doesn't apply to if_statinc/dec/add/sub_ref, because we
don't have the ifp passed through. To be done in a separate commit
which also adjusts all drivers.

PR kern/58377
 1.4 29-Jun-2021  riastradh Make if_stats_init, if_attach, if_initialize return void.

percpu_alloc can't fail.


Author: Maya Rashish <maya@NetBSD.org>
Committer: Taylor R Campbell <riastradh@NetBSD.org>
 1.3 14-Feb-2020  thorpej branches: 1.3.2; 1.3.6; 1.3.14;
Remove the conditional __IF_STATS_PERCPU.
 1.2 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.1 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
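
The per-cpu scheme described above can be pictured with a small user-space
sketch. It is only an illustration under stated assumptions: the names
(ifstats_sketch, stat_inc, stat_collect), the fixed CPU count, and the three
counters are hypothetical, and the real kernel code additionally has to
serialize against interrupts, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH   4	/* hypothetical CPU count */
#define NSTATS_SKETCH 3	/* hypothetical number of counters */

enum { STAT_IPACKETS, STAT_OPACKETS, STAT_IERRORS };

/* One private row of counters per CPU, so updates need no shared lock. */
struct ifstats_sketch {
	uint64_t percpu[NCPU_SKETCH][NSTATS_SKETCH];
};

/* Fast path: a CPU increments only its own slot. */
static void
stat_inc(struct ifstats_sketch *s, unsigned cpu, unsigned stat)
{
	assert(cpu < NCPU_SKETCH && stat < NSTATS_SKETCH);
	s->percpu[cpu][stat]++;
}

/* Slow path: sum every CPU's slot into out[] when a reader asks. */
static void
stat_collect(const struct ifstats_sketch *s, uint64_t out[NSTATS_SKETCH])
{
	for (size_t i = 0; i < NSTATS_SKETCH; i++) {
		out[i] = 0;
		for (size_t c = 0; c < NCPU_SKETCH; c++)
			out[i] += s->percpu[c][i];
	}
}
```

The design trade-off is that updates become cheap and contention-free, while
reads pay the cost of walking every CPU's counters.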
 1.3.14.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3.6.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.6.1 14-Feb-2020  martin file if_stats.c was added on branch phil-wifi on 2020-04-08 14:08:57 +0000
 1.3.2.2 29-Feb-2020  ad Sync with head.
 1.3.2.1 14-Feb-2020  ad file if_stats.c was added on branch ad-namecache on 2020-02-29 20:21:06 +0000
 1.6 01-Jul-2024  riastradh if_stats(9): New dtrace probes on if_statinc/dec/add/sub_ref.

PR kern/58377
 1.5 29-Jun-2024  riastradh branches: 1.5.2;
if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.4 29-Jun-2024  riastradh if_stats(9): New dtrace probes on if_statinc/dec/add/sub.

Note: This doesn't apply to if_statinc/dec/add/sub_ref, because we
don't have the ifp passed through. To be done in a separate commit
which also adjusts all drivers.

PR kern/58377
 1.3 29-Jun-2021  riastradh Make if_stats_init, if_attach, if_initialize return void.

percpu_alloc can't fail.


Author: Maya Rashish <maya@NetBSD.org>
Committer: Taylor R Campbell <riastradh@NetBSD.org>
 1.2 14-Feb-2020  thorpej branches: 1.2.2; 1.2.6; 1.2.14;
Remove the conditional __IF_STATS_PERCPU.
 1.1 29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
 1.2.14.1 01-Aug-2021  thorpej Sync with HEAD.
 1.2.6.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2.6.1 14-Feb-2020  martin file if_stats.h was added on branch phil-wifi on 2020-04-08 14:08:57 +0000
 1.2.2.2 29-Feb-2020  ad Sync with head.
 1.2.2.1 14-Feb-2020  ad file if_stats.h was added on branch ad-namecache on 2020-02-29 20:21:06 +0000
 1.5.2.1 02-Aug-2025  perseant Sync with HEAD
 1.109 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.108 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.107 29-Jan-2020  thorpej branches: 1.107.10;
Adopt <net/if_stats.h>.
 1.106 26-Apr-2019  pgoyette branches: 1.106.4;
Some more empty-string --> NULL conversions for module dependencies
 1.105 26-Jun-2018  msaitoh branches: 1.105.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug in which the direction was misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.104 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.103 15-Nov-2017  knakahara branches: 1.103.2;
Add argument to encapsw->pr_input() instead of m_tag.
 1.102 23-Oct-2017  msaitoh - If if_attach() failed in the attach function, free resources and return.
- KNF
 1.101 12-Dec-2016  ozaki-r branches: 1.101.8;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.100 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.99 18-Aug-2016  knakahara eliminate stf(4)'s dependency on gif(4).

stf(4) depends not on gif(4) but on ip_encap.
 1.98 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.97 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.96 08-Jul-2016  ozaki-r branches: 1.96.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.95 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.94 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.93 04-Jul-2016  knakahara make encap_lock_{enter,exit} interruptible.
 1.92 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit did not take care of encaptab. This commit fixes the encaptab
race; encaptab is used by more than just gif(4).
 1.91 22-Jun-2016  ozaki-r Remove unnecessary NULL checks of ifa->ifa_addr

If it's NULL, it should be a bug. There are many IFADDR_FOREACH loops that
don't do a NULL check. If it could be NULL, they would have fired already.
 1.90 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and is meant for a
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
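
The idea of storing an interface index in the mbuf rather than a pointer can
be sketched in user space as follows. All names here (if_table,
sketch_set_rcvif, sketch_get_rcvif, the struct layouts) are hypothetical
stand-ins, and the sketch deliberately leaves out the psref(9)/pserialize(9)
protection that makes the real lookup safe against concurrent destruction.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF_SKETCH 8	/* hypothetical table size */

struct ifnet_sketch { const char *name; };
struct mbuf_sketch  { unsigned rcvif_index; };	/* an index, not a pointer */

/* Stand-in for the interface collection; index 0 means "no interface". */
static struct ifnet_sketch *if_table[MAXIF_SKETCH];

static void
sketch_set_rcvif(struct mbuf_sketch *m, unsigned idx)
{
	assert(idx < MAXIF_SKETCH);
	m->rcvif_index = idx;
}

/*
 * Resolve the index at the time of use. If the interface has been
 * removed from the table in the meantime, the caller sees NULL instead
 * of a dangling pointer.
 */
static struct ifnet_sketch *
sketch_get_rcvif(const struct mbuf_sketch *m)
{
	if (m->rcvif_index == 0 || m->rcvif_index >= MAXIF_SKETCH)
		return NULL;
	return if_table[m->rcvif_index];
}
```

A caller therefore has to be prepared for a NULL result on every lookup,
which is the price of not holding a long-lived pointer.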
 1.89 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.88 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.87 28-Jan-2016  knakahara fix my wrong modification
 1.86 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.85 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.84 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.83 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.82 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.81 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.80 12-Jun-2014  christos branches: 1.80.4;
PR/48901: Fail at compile time when trying to compile stf without inet6,
and print an explanatory message.
 1.79 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.78 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.77 28-Oct-2011  dyoung branches: 1.77.12; 1.77.16; 1.77.26;
Don't kauth-orize SIOCSIFMTU in pppsioctl() and stf_ioctl(), ifioctl()
has already done that for us.
 1.76 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.75 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.74 19-Jan-2010  pooka branches: 1.74.2; 1.74.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.73 08-Nov-2009  christos PR/42285: PR/41559: Daniel Hagerty: if_stf doesn't count output bytes
 1.72 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.71 15-Apr-2009  elad Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
 1.70 18-Mar-2009  cegger bcopy -> memcpy
 1.69 18-Mar-2009  cegger bcmp -> memcmp
 1.68 07-Nov-2008  dyoung branches: 1.68.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
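
The callback contract described above (0 on success, ENETRESET when the
caller should reinitialize, any other value on failure) might look roughly
like this in a user-space sketch. The struct, the flag value, and the
function names are all hypothetical; this is not the actual ether_ioctl()
code, only an illustration of the return-value convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FLAG_PROMISC_SKETCH 0x1	/* hypothetical flag bit */

struct eth_sketch {
	unsigned flags;
	int (*ifflags_cb)(struct eth_sketch *, unsigned changed);
	int resets;		/* counts full reinitializations */
};

/* Stand-in for the driver's init routine. */
static void
sketch_init(struct eth_sketch *e)
{
	e->resets++;
}

/* Generic layer: apply new flags, consult the driver's callback. */
static int
sketch_set_flags(struct eth_sketch *e, unsigned newflags)
{
	unsigned changed;
	int error = 0;

	assert(e != NULL);
	changed = e->flags ^ newflags;
	e->flags = newflags;
	if (e->ifflags_cb != NULL)
		error = e->ifflags_cb(e, changed);
	if (error == ENETRESET) {
		/* The driver asked for a full reset. */
		sketch_init(e);
		error = 0;
	}
	return error;
}

/*
 * Example driver callback: a promiscuous-mode change is handled in
 * place; any other flag change requests a reset.
 */
static int
sketch_cb(struct eth_sketch *e, unsigned changed)
{
	(void)e;
	return (changed & ~FLAG_PROMISC_SKETCH) ? ENETRESET : 0;
}
```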

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
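
A check of this kind can be sketched as below. The function name and the
length macro are hypothetical; the test itself relies only on the fact that
the least significant bit of the first octet marks an Ethernet group
(multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN_SKETCH 6

/* Return true if addr is acceptable as an interface's unicast address. */
static bool
sketch_lladdr_ok(const uint8_t addr[ETHER_ADDR_LEN_SKETCH])
{
	static const uint8_t zero[ETHER_ADDR_LEN_SKETCH];

	assert(addr != NULL);
	/* Group bit set: multicast, which includes ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;
	/* 00:00:00:00:00:00 is refused as well. */
	if (memcmp(addr, zero, ETHER_ADDR_LEN_SKETCH) == 0)
		return false;
	return true;
}
```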

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.67 24-Oct-2008  dyoung branches: 1.67.2;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.66 15-Jun-2008  christos branches: 1.66.2;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.65 20-Feb-2008  matt branches: 1.65.6; 1.65.8; 1.65.10; 1.65.12; 1.65.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.64 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.63 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.62 19-Oct-2007  ad branches: 1.62.2; 1.62.4; 1.62.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.61 01-Sep-2007  dyoung branches: 1.61.4;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.60 02-May-2007  dyoung branches: 1.60.2; 1.60.6; 1.60.8;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
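
The strdup()/strcpy() analogy can be made concrete with a user-space
sketch. Everything below is hypothetical: the sketch_ names, the fixed-size
struct, and the use of calloc()/free() in place of the per-family pools
(the 'flags' argument is omitted for the same reason). It only mirrors the
behavior the text describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define AF_SKETCH_INET 2	/* hypothetical family number */

/* Simplified, fixed-size stand-in for struct sockaddr. */
struct sa_sketch {
	uint8_t sa_len;
	uint8_t sa_family;
	char	sa_data[14];
};

/* Like sockaddr_alloc(): family and length come back initialized. */
static struct sa_sketch *
sketch_sockaddr_alloc(uint8_t af)
{
	struct sa_sketch *sa = calloc(1, sizeof(*sa));

	if (sa == NULL)
		return NULL;	/* "pool" exhausted */
	sa->sa_family = af;
	sa->sa_len = sizeof(*sa);
	return sa;
}

/* Like strcpy(): copy into existing storage of the same family. */
static struct sa_sketch *
sketch_sockaddr_copy(struct sa_sketch *dst, const struct sa_sketch *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate, then copy. */
static struct sa_sketch *
sketch_sockaddr_dup(const struct sa_sketch *src)
{
	struct sa_sketch *dst = sketch_sockaddr_alloc(src->sa_family);

	if (dst == NULL)
		return NULL;
	return sketch_sockaddr_copy(dst, src);
}

static void
sketch_sockaddr_free(struct sa_sketch *sa)
{
	free(sa);
}
```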

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
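
The per-domain invalidation walk in item 4 can be sketched as follows. The
struct layouts and the sketch_rtflushall name are hypothetical
simplifications; the real route caches and domain list carry far more state,
and "invalidate" is modeled here as simply forgetting the cached entry.

```c
#include <assert.h>
#include <stddef.h>

struct route_sketch {
	void *cached_rt;		/* stand-in for the cached rtentry */
	struct route_sketch *next;	/* link on the domain's cache list */
};

struct domain_sketch {
	int family;
	struct route_sketch *rtcache_list;	/* live route caches */
};

/*
 * Find the domain for 'af', walk its list of route caches, and
 * invalidate each one. Returns how many caches were invalidated.
 */
static int
sketch_rtflushall(struct domain_sketch *doms, size_t ndoms, int af)
{
	int n = 0;

	assert(doms != NULL || ndoms == 0);
	for (size_t i = 0; i < ndoms; i++) {
		if (doms[i].family != af)
			continue;
		for (struct route_sketch *ro = doms[i].rtcache_list;
		    ro != NULL; ro = ro->next) {
			if (ro->cached_rt != NULL) {
				ro->cached_rt = NULL;
				n++;
			}
		}
	}
	return n;
}
```

Because every cache is reachable from its domain, a routing change in one
family can invalidate all of that family's caches without touching others.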
 1.59 29-Mar-2007  ad lwp::l_acflag is no longer used.
 1.58 04-Mar-2007  christos branches: 1.58.2; 1.58.4; 1.58.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.57 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
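
A const-preserving conversion of the kind item 4 describes might look like the sketch below. The commit introduces macros; inline functions are used here only so the family check is visible, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Convert a generic const sockaddr to a const sockaddr_in. */
static inline const struct sockaddr_in *
sketch_satocsin(const struct sockaddr *sa)
{
	assert(sa->sa_family == AF_INET);
	return (const struct sockaddr_in *)(const void *)sa;
}

/* Convert a generic const sockaddr to a const sockaddr_in6. */
static inline const struct sockaddr_in6 *
sketch_satocsin6(const struct sockaddr *sa)
{
	assert(sa->sa_family == AF_INET6);
	return (const struct sockaddr_in6 *)(const void *)sa;
}
```

Because the result stays const, an output routine that receives a const 'dst' can inspect the address without casting the qualifier away.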
 1.56 15-Dec-2006  joerg branches: 1.56.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.
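
The caller pattern described above (check ro_dst, free or revalidate, then test ro_rt once) can be sketched as follows. This is a toy model under stated assumptions: an int stands in for the destination sockaddr, a two-entry table stands in for the routing table, reference counting is left out, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rtentry_sketch { bool rt_up; int rt_dst; };

struct route_sketch {
	int ro_dst;			/* stand-in for the cached sockaddr */
	struct rtentry_sketch *ro_rt;	/* cached route, or NULL */
};

/* Hypothetical routing table. */
static struct rtentry_sketch sketch_table[] = { { true, 10 }, { true, 20 } };

static struct rtentry_sketch *
sketch_lookup(int dst)
{
	for (size_t i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
		if (sketch_table[i].rt_up && sketch_table[i].rt_dst == dst)
			return &sketch_table[i];
	return NULL;
}

static void
sketch_rtcache_free(struct route_sketch *ro)
{
	ro->ro_rt = NULL;
}

/* If the cached route went down, try to look it up again. */
static void
sketch_rtcache_check(struct route_sketch *ro)
{
	if (ro->ro_rt != NULL && !ro->ro_rt->rt_up)
		ro->ro_rt = sketch_lookup(ro->ro_dst);
}

static struct rtentry_sketch *
sketch_route_for(struct route_sketch *ro, int dst)
{
	if (ro->ro_dst != dst)
		sketch_rtcache_free(ro);	/* ro_dst does not fit: drop the cache */
	else
		sketch_rtcache_check(ro);	/* ro_dst fits: revalidate */
	if (ro->ro_rt == NULL) {		/* the single check that remains */
		ro->ro_dst = dst;
		ro->ro_rt = sketch_lookup(dst);
	}
	return ro->ro_rt;
}
```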

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.55 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
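
A chain of this kind can be sketched as below. It is not the in_rtcache()/in_rtflush() code itself: the types are hypothetical stand-ins, a hand-rolled doubly linked list replaces the kernel's queue macros, and only caches with ro_rt != NULL are ever on the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct rtentry_sketch { int rt_refcnt; };

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* cached route, or NULL */
	struct route_sketch *ro_next;	/* next cache on the chain */
	struct route_sketch **ro_prevp;	/* pointer that points at this cache */
};

static struct route_sketch *rtcache_head;

/* Add a cache that holds a route to the chain. */
static void
sketch_rtcache(struct route_sketch *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = rtcache_head;
	ro->ro_prevp = &rtcache_head;
	if (rtcache_head != NULL)
		rtcache_head->ro_prevp = &ro->ro_next;
	rtcache_head = ro;
}

/* Invalidate one cache: drop the reference and unlink it. */
static void
sketch_rtflush(struct route_sketch *ro)
{
	if (ro->ro_rt == NULL)
		return;			/* not on the chain */
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Invalidate every cache on the chain. */
static void
sketch_rtflushall(void)
{
	while (rtcache_head != NULL)
		sketch_rtflush(rtcache_head);
}
```

After a flush, users of a cache see ro_rt == NULL and have to perform a fresh lookup.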

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.54 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.53 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.52 23-Jul-2006  ad branches: 1.52.4; 1.52.6;
Use the LWP cached credentials where sane.
 1.51 14-May-2006  elad integrate kauth.
 1.50 11-Dec-2005  thorpej branches: 1.50.4; 1.50.6; 1.50.8; 1.50.10; 1.50.12;
ANSI function decls and application of static.
 1.49 11-Dec-2005  christos merge ktrace-lwp.
 1.48 02-Jun-2005  tron branches: 1.48.2;
Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.47 02-Jun-2005  tron Remove type casts and lint directives which are no longer necessary
because the first argument of m_copydata() is "const struct mbuf *" now.
 1.46 11-Mar-2005  tron Add support for changing the MTU to stf(4).
 1.45 26-Feb-2005  perry nuke trailing whitespace
 1.44 25-Jan-2005  matt Switch to using ifa for ifaddr's instead of ia (which are traditionally
used for in_ifaddr's) which could lead to confusion.
 1.43 25-Jan-2005  tron branches: 1.43.2;
Fix cut and paste error in last commit.
 1.42 24-Jan-2005  matt Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.41 04-Dec-2004  peter branches: 1.41.4;
Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.40 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.39 26-Apr-2004  matt Remove #else of #if __STDC__
 1.38 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.37 21-Apr-2004  itojun kill sprintf, use snprintf
 1.36 12-Nov-2003  cl branches: 1.36.4;
catch up with in_ifaddr -> in_ifaddrhead rename
 1.35 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.34 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.33 01-May-2003  itojun branches: 1.33.2;
bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.32 17-Nov-2002  itojun pickier packet validation, based on
draft-savola-v6ops-6to4-security-00.txt. sync w/kame
 1.31 17-Sep-2002  itojun fix comment, sync with kame
 1.30 17-Sep-2002  itojun reject SIOCAIFADDR if embedded address is in private address range. sync w/kame
 1.29 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.28 06-Aug-2002  itojun backout previous. i was looking at the wrong RFC.
 1.27 05-Aug-2002  itojun based on RFC2529, stf(4) should have 1480 as MTU, not 1280.
tron found it, sync w/kame
 1.26 23-Jul-2002  tron Increase interface output error count in case of a failure.
 1.25 23-Jul-2002  tron Increase interface output counter for every encapsulated packet sent to IP.
 1.24 20-Jun-2002  itojun reject packets with IPv4 private address range. sync w/kame
 1.23 21-Dec-2001  itojun branches: 1.23.8; 1.23.10;
move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.22 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.21 12-Nov-2001  lukem add RCSIDs
 1.20 06-Nov-2001  itojun too many curly brace.
 1.19 06-Nov-2001  matt Fix pr#14481
 1.18 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.17 18-Jul-2001  thorpej branches: 1.17.4;
bzero -> memset
 1.16 08-Jun-2001  itojun branches: 1.16.2;
inject outgoing packet to bpf. KAME PR 358.
 1.15 10-May-2001  itojun correct ecn consideration on tunnel encap/decap. sync with kame.
 1.14 29-Apr-2001  itojun correct outbound outer IPv4 destination address selection.
IFF_LINK0 disables inbound path, removes security worries.
more examples in manpage.
 1.13 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.12 20-Feb-2001  itojun branches: 1.12.2;
explicitly use u_int32_t for DLT_NULL encapsulation.

correct gif address family. from chopps, sync with kame.
 1.11 17-Feb-2001  itojun update comment to meet 6to4 RFC. sync with kame
 1.10 22-Jan-2001  itojun make it possible to turn off ingress filter on gif/stf tunnel egress,
by using IFF_LINK2. (part of) PR 11163 from Ken Raeburn.
 1.9 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.8 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.
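
The ordering problem and its fix can be sketched as follows. This is a simplified model, not the kernel code: the header size, the `*_sketch` types, and the int return value of the allocation helper are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_SADL_HDR 8	/* hypothetical fixed part of the link-level name */

struct ifnet_sketch {
	size_t if_addrlen;		/* link-level address length */
	unsigned char *if_sadl;		/* link-level name, or NULL */
	size_t if_sadl_len;
};

/* Generic attach no longer allocates the link-level name. */
static void
sketch_if_attach(struct ifnet_sketch *ifp)
{
	ifp->if_sadl = NULL;
	ifp->if_sadl_len = 0;
}

/* Allocate the link-level name; if_addrlen must already be set. */
static int
sketch_if_alloc_sadl(struct ifnet_sketch *ifp)
{
	size_t len = SKETCH_SADL_HDR + ifp->if_addrlen;

	assert(ifp->if_sadl == NULL);
	ifp->if_sadl = calloc(1, len);
	if (ifp->if_sadl == NULL)
		return ENOMEM;
	ifp->if_sadl_len = len;
	return 0;
}

static void
sketch_if_free_sadl(struct ifnet_sketch *ifp)
{
	free(ifp->if_sadl);
	ifp->if_sadl = NULL;
	ifp->if_sadl_len = 0;
}

/* Link-specific attach: set the address length first, then allocate. */
static int
sketch_ether_ifattach(struct ifnet_sketch *ifp)
{
	ifp->if_addrlen = 6;
	return sketch_if_alloc_sadl(ifp);
}

/* Generic detach frees the name if nobody has done so already. */
static void
sketch_if_detach(struct ifnet_sketch *ifp)
{
	if (ifp->if_sadl != NULL)
		sketch_if_free_sadl(ifp);
}
```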

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.7 18-Dec-2000  thorpej Fill in if_dlt.
 1.6 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.5 05-Jul-2000  thorpej branches: 1.5.2;
stf(4) is now a cloning network interface (although, only one is allowed
to be created).
 1.4 10-Jun-2000  itojun branches: 1.4.2;
update i-d #. (sync with kame)
 1.3 14-May-2000  itojun branches: 1.3.2;
sync IPv4 rogue address filter with RFC1122. (sync with kame)
 1.2 21-Apr-2000  itojun update comment (analysis on 04 draft)
 1.1 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.3.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.4.2.3 10-Jun-2001  he Pull up revision 1.16 (requested by itojun):
Inject packets to bpf in outgoing path.
 1.4.2.2 01-May-2001  he Pull up revision 1.14 (partial, via patch, requested by itojun):
Fix outbound outer IPv4 destination address selection.
 1.4.2.1 01-May-2001  he Pull up revision 1.10 (via patch, requested by itojun):
Make it possible to turn off ingress filter on gif/stf tunnel
egress by using IFF_LINK2. Fixes (part of) PR#11163.
 1.5.2.9 23-Apr-2001  bouyer Kill unwanted differences with HEAD
 1.5.2.8 21-Apr-2001  bouyer Sync with HEAD
 1.5.2.7 12-Mar-2001  bouyer Sync with HEAD.
 1.5.2.6 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.5.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.5.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.5.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.2.1 05-Jul-2000  bouyer file if_stf.c was added on branch thorpej_scsipi on 2000-11-20 18:10:06 +0000
 1.12.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.12.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.12.2.6 27-Aug-2002  nathanw Catch up to -current.
 1.12.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.12.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.12.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.12.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.12.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.16.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.16.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.1 03-Aug-2001  lukem update to -current
 1.17.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.23.10.5 06-Aug-2002  lukem Pull up revision 1.28 (requested by itojun in ticket #630):
backout previous. i was looking at the wrong RFC.
 1.23.10.4 06-Aug-2002  lukem Pull up revision 1.27 (requested by itojun in ticket #628):
based on RFC2529, stf(4) should have 1480 as MTU, not 1280.
tron found it, sync w/kame
 1.23.10.3 05-Aug-2002  lukem Pull up revision 1.26 (requested by tron in ticket #623):
Increase interface output error count in case of a failure.
 1.23.10.2 05-Aug-2002  lukem Pull up revision 1.25 (requested by tron in ticket #623):
Increase interface output counter for every encapsulated packet sent to IP.
 1.23.10.1 21-Jun-2002  lukem Pull up revision 1.24 (requested by itojun in ticket #326):
reject packets with IPv4 private address range. sync w/kame
 1.23.8.2 29-Aug-2002  gehenna catch up with -current.
 1.23.8.1 15-Jul-2002  gehenna catch up with -current.
 1.33.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.33.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.33.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.33.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.33.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.36.4.1 08-May-2005  snj Pull up revision 1.46 (requested by tron in ticket #1312):
Add support for changing the MTU to stf(4).
 1.41.4.1 29-Apr-2005  kent sync with -current
 1.43.2.3 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.43.2.2 12-Feb-2005  yamt sync with head.
 1.43.2.1 25-Jan-2005  yamt file if_stf.c was added on branch yamt-km on 2005-02-12 18:17:53 +0000
 1.48.2.8 27-Feb-2008  yamt sync with head.
 1.48.2.7 11-Feb-2008  yamt sync with head.
 1.48.2.6 21-Jan-2008  yamt sync with head
 1.48.2.5 27-Oct-2007  yamt sync with head.
 1.48.2.4 03-Sep-2007  yamt sync with head.
 1.48.2.3 26-Feb-2007  yamt sync with head.
 1.48.2.2 30-Dec-2006  yamt sync with head.
 1.48.2.1 21-Jun-2006  yamt sync with head.
 1.50.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.50.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.50.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.50.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.50.8.2 11-Aug-2006  yamt sync with head
 1.50.8.1 24-May-2006  yamt sync with head.
 1.50.6.1 01-Jun-2006  kardel Sync with head.
 1.50.4.1 09-Sep-2006  rpaulo sync with head
 1.52.6.3 18-Dec-2006  yamt sync with head.
 1.52.6.2 10-Dec-2006  yamt sync with head.
 1.52.6.1 22-Oct-2006  yamt sync with head
 1.52.4.2 12-Jan-2007  ad Sync with head.
 1.52.4.1 18-Nov-2006  ad Sync with head.
 1.56.2.4 07-May-2007  yamt sync with head.
 1.56.2.3 15-Apr-2007  yamt sync with head.
 1.56.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.56.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.58.6.1 29-Mar-2007  reinoud Pullup to -current
 1.58.4.1 11-Jul-2007  mjf Sync with head.
 1.58.2.4 23-Oct-2007  ad Sync with head.
 1.58.2.3 09-Oct-2007  ad Sync with head.
 1.58.2.2 08-Jun-2007  ad Sync with head.
 1.58.2.1 10-Apr-2007  ad Sync with head.
 1.60.8.3 23-Mar-2008  matt sync with HEAD
 1.60.8.2 09-Jan-2008  matt sync with HEAD
 1.60.8.1 06-Nov-2007  matt sync with HEAD
 1.60.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.60.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.60.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.61.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.62.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.62.4.1 26-Dec-2007  ad Sync with head.
 1.62.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.65.14.1 18-Jun-2008  simonb Sync with head.
 1.65.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.65.10.3 11-Aug-2010  yamt sync with head.
 1.65.10.2 11-Mar-2010  yamt sync with head
 1.65.10.1 04-May-2009  yamt sync with head.
 1.65.8.1 17-Jun-2008  yamt sync with head.
 1.65.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.65.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.66.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.67.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.67.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.68.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.74.4.1 30-May-2010  rmind sync with head
 1.74.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.77.26.1 10-Aug-2014  tls Rebase.
 1.77.16.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.77.12.2 03-Dec-2017  jdolecek update from HEAD
 1.77.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.80.4.6 05-Feb-2017  skrll Sync with HEAD
 1.80.4.5 05-Oct-2016  skrll Sync with HEAD
 1.80.4.4 09-Jul-2016  skrll Sync with HEAD
 1.80.4.3 29-May-2016  skrll Sync with HEAD
 1.80.4.2 19-Mar-2016  skrll Sync with HEAD
 1.80.4.1 22-Sep-2015  skrll Sync with HEAD
 1.96.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.96.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.101.8.2 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating a softint). Without this change, the kernel panics. That is
bad because resource shortage really occurs when a lot of pseudo interfaces
are created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. The caller function can recover from
the error cleanly by checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
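
The attach-time error handling that this pull-up applies across drivers can be sketched as below. It is a minimal model: the softc layout, the failure switch, and every `sketch_` name are hypothetical, and only the "check the int return value, free resources and return" shape comes from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ifnet_sketch { int if_ready; };
struct softc_sketch { struct ifnet_sketch sc_if; void *sc_dma; };

/* Test hook: when non-zero, initialization reports a resource shortage. */
static int sketch_fail_initialize;

/* Returns an error instead of panicking when a resource is missing. */
static int
sketch_if_initialize(struct ifnet_sketch *ifp)
{
	if (sketch_fail_initialize)
		return ENOMEM;
	ifp->if_ready = 1;
	return 0;
}

/* Driver attach: undo earlier allocations if initialization fails. */
static int
sketch_drv_attach(struct softc_sketch *sc)
{
	int error;

	assert(sc->sc_dma == NULL);
	sc->sc_dma = malloc(64);
	if (sc->sc_dma == NULL)
		return ENOMEM;
	error = sketch_if_initialize(&sc->sc_if);
	if (error != 0) {
		free(sc->sc_dma);
		sc->sc_dma = NULL;
		return error;
	}
	return 0;
}
```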
 1.101.8.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.103.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.103.2.1 02-May-2018  pgoyette Synch with HEAD
 1.105.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.105.2.1 10-Jun-2019  christos Sync with HEAD
 1.106.4.1 29-Feb-2020  ad Sync with head.
 1.107.10.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.8 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.7 18-Aug-2016  knakahara branches: 1.7.8;
eliminate stf(4)'s dependency on gif(4).

stf(4) depends not on gif(4) but on ip_encap.
 1.6 28-Jan-2016  knakahara fix my wrong modification
 1.5 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.4 11-Dec-2005  thorpej branches: 1.4.120; 1.4.140;
ANSI function decls and application of static.
 1.3 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.2 11-Mar-2005  tron branches: 1.2.4;
Add support for changing the MTU to stf(4).
 1.1 19-Apr-2000  itojun branches: 1.1.6; 1.1.30; 1.1.36; 1.1.38; 1.1.40;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.1.40.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.38.1 29-Apr-2005  kent sync with -current
 1.1.36.1 08-May-2005  snj Pull up revision 1.2 (requested by tron in ticket #1312):
Add support for changing the MTU to stf(4).
 1.1.30.2 11-Dec-2005  christos Sync with head.
 1.1.30.1 01-Apr-2005  skrll Sync with HEAD.
 1.1.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.6.1 19-Apr-2000  bouyer file if_stf.h was added on branch thorpej_scsipi on 2000-11-20 18:10:06 +0000
 1.2.4.1 21-Jun-2006  yamt sync with head.
 1.4.140.2 05-Oct-2016  skrll Sync with HEAD
 1.4.140.1 19-Mar-2016  skrll Sync with HEAD
 1.4.120.1 03-Dec-2017  jdolecek update from HEAD
 1.7.8.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.114 19-Jan-2020  thorpej Remove the strip(4) - Starmode Radio IP - pseudo-device driver. It is
long since obsolete.
 1.113 03-Feb-2019  mrg branches: 1.113.6;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.112 24-Jan-2019  knakahara Add comments about D_MPSAFE to functions called as struct linesw.l_ioctl.
 1.111 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.110 06-Jun-2018  maya branches: 1.110.2;
Remove duplicate ;
 1.109 20-Apr-2018  knakahara SIOCSIFDSTADDR uses struct ifreq instead of struct ifaddr or struct in_aliasreq.

SIOCSIFDSTADDR is not used by base package commands...

I checked sys/net*/* only.
 1.108 13-Apr-2017  maya branches: 1.108.10;
if MGETHDR fails, don't try to copy to single mbuf and deref null.

reduce ifdefs.
 1.107 02-Oct-2016  christos branches: 1.107.2;
MFREE -> m_free
 1.106 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.105 06-Aug-2016  christos make strip and slip modular, and cosmetic for ppp.
 1.104 10-Jun-2016  ozaki-r branches: 1.104.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the
diff of the upcoming change.

No functional change.
 1.103 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref-ing) rtentries.
 1.102 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.101 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.100 14-Jul-2015  ozaki-r Move rt_gwroute operation out of stripoutput

We should do it in ip_hresolv_needed.
 1.99 14-Jul-2015  ozaki-r Remove unnecessary if_type setting

if_type is set as IFT_SLIP below.
 1.98 14-Jul-2015  ozaki-r KNF
 1.97 05-Jun-2014  rmind branches: 1.97.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.96 23-Sep-2011  christos branches: 1.96.12; 1.96.26;
Change obsolete CBSIZE constant (48), to a power of two constant (64) that
is close enough to match the original assumptions.
 1.95 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.94 19-Jan-2010  pooka branches: 1.94.2; 1.94.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.93 07-May-2009  elad Introduce actions/requests to handle authorization for ppp(4), sl(4),
strip(4), btuart(4) and bcsp(4) network interfaces and devices.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004955.html
 1.92 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.91 18-Mar-2009  cegger bcopy -> memcpy
 1.90 11-Jan-2009  christos branches: 1.90.2;
merge christos-time_t
 1.89 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.88 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}
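
A minimal, self-contained sketch of the switch pattern described above. The flag values mirror the usual BSD definitions; the action names and the mapping from flag state to action are illustrative assumptions, not what any particular driver in the tree does.

```c
#include <stdio.h>

/* Values as conventionally defined in BSD <net/if.h>. */
#define IFF_UP		0x0001
#define IFF_RUNNING	0x0040

/* Hypothetical actions, for illustration only. */
enum if_action { ACT_NONE, ACT_STOP, ACT_INIT, ACT_RECONFIG };

/* Map the (IFF_UP, IFF_RUNNING) pair to one action per permutation. */
static enum if_action
ifflags_action(unsigned flags)
{
	switch (flags & (IFF_UP|IFF_RUNNING)) {
	case 0:
		/* Marked down and not running: nothing to do. */
		return ACT_NONE;
	case IFF_RUNNING:
		/* Marked down but still running: stop the device. */
		return ACT_STOP;
	case IFF_UP:
		/* Marked up but not yet running: initialize it. */
		return ACT_INIT;
	case IFF_UP|IFF_RUNNING:
		/* Up and running: re-apply the configuration. */
		return ACT_RECONFIG;
	}
	return ACT_NONE;	/* not reached; all four cases are covered */
}
```

Masking first means unrelated flag bits cannot select the wrong case, and each of the four permutations gets exactly one arm.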

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
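
The address check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in ether_ioctl(): the helper name lladdr_is_acceptable() is hypothetical. It relies on the fact that the least-significant bit of the first octet marks a group (multicast) address, and the broadcast address has that bit set as well.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/*
 * Hypothetical helper: return true if addr may be installed as an
 * interface's link-layer address under the rules described above.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[ETHER_ADDR_LEN])
{
	static const uint8_t zero[ETHER_ADDR_LEN] = { 0 };

	/* Group bit set: multicast, which also covers broadcast. */
	if (addr[0] & 0x01)
		return false;
	/* 00:00:00:00:00:00 is refused as well. */
	if (memcmp(addr, zero, ETHER_ADDR_LEN) == 0)
		return false;
	return true;
}
```

With this sketch, 02:de:ad:be:ef:02 from the summary is accepted, while ff:ff:ff:ff:ff:ff, 01:00:5e:00:00:01, and the all-zero address are refused.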

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.87 15-Jun-2008  christos branches: 1.87.2; 1.87.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.86 24-Apr-2008  ad branches: 1.86.2; 1.86.4; 1.86.6;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.85 20-Feb-2008  matt branches: 1.85.6; 1.85.8; 1.85.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.84 10-Nov-2007  ad Call ttyflush() with tty_lock held.
 1.83 08-Oct-2007  ad branches: 1.83.2; 1.83.4;
Use the softint API.
 1.82 29-Sep-2007  scw s/NPBFILTER/NBPFILTER/
Compile-tested only.
 1.81 01-Sep-2007  dyoung branches: 1.81.2;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.80 27-Aug-2007  dyoung branches: 1.80.2;
Remove dead code.
 1.79 26-Aug-2007  dyoung Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.78 07-Aug-2007  dyoung branches: 1.78.2;
Use satocsdl() instead of SDL().
 1.77 19-Jul-2007  dyoung branches: 1.77.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.76 14-Jul-2007  ad branches: 1.76.2;
Generic soft interrupts are mandatory.
 1.75 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.74 04-Mar-2007  christos branches: 1.74.2; 1.74.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.73 18-Feb-2007  dyoung Use satocsin to cast to const struct sockaddr_in *.
 1.72 18-Feb-2007  dogcow constify struct sockaddr.
 1.71 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.70 04-Jan-2007  elad branches: 1.70.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.69 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.68 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.67 23-Jul-2006  ad branches: 1.67.4; 1.67.6;
Use the LWP cached credentials where sane.
 1.66 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.65 14-May-2006  elad branches: 1.65.2;
integrate kauth.
 1.64 11-Dec-2005  thorpej branches: 1.64.4; 1.64.6; 1.64.8; 1.64.10; 1.64.12;
ANSI function decls and application of static.
 1.63 11-Dec-2005  christos merge ktrace-lwp.
 1.62 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
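
The reference-counting rule above (a lookup holds a reference, release drops it, and detaching an in-use discipline fails with EBUSY) can be sketched like this. All x_-prefixed names are hypothetical stand-ins, not the ttyldisc_*() functions themselves, and the sketch is single-threaded and omits the locking a kernel implementation would need.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered line discipline. */
struct x_ldisc {
	const char	*name;
	unsigned	 refcnt;	/* references held by lookups */
	int		 attached;	/* nonzero while registered */
};

/* A successful lookup takes a reference on the discipline. */
static struct x_ldisc *
x_ldisc_lookup(struct x_ldisc *d)
{
	if (!d->attached)
		return NULL;
	d->refcnt++;
	return d;
}

/* Drop a reference previously taken by x_ldisc_lookup(). */
static void
x_ldisc_release(struct x_ldisc *d)
{
	if (d->refcnt > 0)
		d->refcnt--;
}

/* Detach succeeds only when nobody holds a reference. */
static int
x_ldisc_detach(struct x_ldisc *d)
{
	if (d->refcnt > 0)
		return EBUSY;
	d->attached = 0;
	return 0;
}
```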
 1.61 18-Aug-2005  yamt branches: 1.61.6;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.60 29-May-2005  christos branches: 1.60.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.59 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.58 26-Feb-2005  perry nuke trailing whitespace
 1.57 06-Dec-2004  christos branches: 1.57.4; 1.57.6;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.56 05-Dec-2004  peter Don't forget to call bpfdetach in the clone destroy function.
While here, add a missing static and change some spaces to tabs.
 1.55 05-Dec-2004  christos fix compilation issues. my kernel did not have strip...
 1.54 05-Dec-2004  christos clonify strip and sl.
 1.53 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.52 21-Apr-2004  itojun kill sprintf, use snprintf
 1.51 19-Jan-2004  atatat Remove redundant prototypes
 1.50 05-Sep-2003  itojun u_short -> u_int16_t
 1.49 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48 01-May-2003  itojun branches: 1.48.2;
bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.47 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.46 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.45 11-Sep-2002  itojun KNF - return is not a function.
 1.44 17-Mar-2002  atatat Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
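
A minimal sketch of the pass-through convention described above, assuming a two-layer dispatch. X_EPASSTHROUGH reuses the value -4 given in the message; layered_ioctl() and the two demo handlers are hypothetical, and the final translation of an unclaimed command to ENOTTY is an assumption about what the top-level caller does.

```c
#include <errno.h>
#include <stddef.h>

#define X_EPASSTHROUGH	(-4)	/* value stated in the message above */

typedef int (*ioctl_fn)(unsigned long cmd, void *data);

/* Demo handler: claims command 1 only. */
static int
demo_specific(unsigned long cmd, void *data)
{
	(void)data;
	return cmd == 1 ? 0 : X_EPASSTHROUGH;
}

/* Demo handler: claims command 2 only. */
static int
demo_generic(unsigned long cmd, void *data)
{
	(void)data;
	return cmd == 2 ? 0 : X_EPASSTHROUGH;
}

/*
 * Try the specific layer first; fall back to the generic layer only
 * when the command was explicitly passed through.
 */
static int
layered_ioctl(ioctl_fn specific, ioctl_fn generic, unsigned long cmd,
    void *data)
{
	int error = specific(cmd, data);

	if (error != X_EPASSTHROUGH)
		return error;
	error = generic(cmd, data);
	/* Nobody claimed the command: report ENOTTY (assumed). */
	return error == X_EPASSTHROUGH ? ENOTTY : error;
}
```

Using a dedicated value keeps "unhandled" distinct from both real errors and ERESTART, which is the confusion the change set out to remove.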
 1.43 26-Feb-2002  wiz Typo fix.
 1.42 14-Jan-2002  kleink Include <machine/intr.h> unconditionally, instead of only doing so if
__HAVE_GENERIC_SOFT_INTERRUPTS and relying on <sys/param.h> to provide it
otherwise; pointed out by Aymeric Vincent.
 1.41 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.40 12-Nov-2001  lukem add RCSIDs
 1.39 14-Jun-2001  itojun branches: 1.39.2; 1.39.4;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange was updated on every packet transmission/receipt.
now: if_lastchange is updated when IFF_UP is changed.
 1.38 07-May-2001  lukem delint to c89; use #define instead of static const int for an array size
 1.37 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.36 31-Mar-2001  enami Remove unnecessary test of tp->t_linesw against NULL; they are results
of confusion while correcting compilation error after t_line is
replaced with t_linesw.
 1.35 17-Jan-2001  thorpej branches: 1.35.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.34 15-Jan-2001  thorpej For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.33 12-Jan-2001  thorpej Fix a comment.
 1.32 12-Jan-2001  thorpej Don't use splimp() to block both net and tty interrupts. Instead,
block both interrupt levels as appropriate.
 1.31 12-Jan-2001  thorpej Sync with if_sl.c,v 1.67:
Defer output processing to the software interrupt.

Note, that in the process of doing this, I discovered several
very broken things about this driver, which are not fixed with
this commit. It should work as well as it previously did, but
this code could be seriously improved. As soon as I can find
my second Metricom radio, I'll give it a proper shakedown.
 1.30 11-Jan-2001  thorpej Sync with if_sl.c,v 1.66:

Move the VJ uncompress code into the software interrupt.
 1.29 11-Jan-2001  thorpej Sync with if_sl.c,v 1.65:

Once we have a complete frame, schedule a STRIP software interrupt,
and manipulate ipintrq from there. This will allow us to clean up
the use of splimp() in this file later.
 1.28 11-Jan-2001  thorpej Sync with if_sl.c,v 1.64:

Make the buffer management in STRIP just a little less evil.
 1.27 08-Jan-2001  thorpej Fix a typo in the ALTQ changes.
 1.26 18-Dec-2000  thorpej ALTQ'ify.
 1.25 18-Dec-2000  thorpej Fill in if_dlt.
 1.24 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.23 02-Nov-2000  eeh Fix bug w/previous.
 1.22 02-Nov-2000  itohy Set the default line discipline to t_linesw, rather than just NULL it.
 1.21 02-Nov-2000  itohy Adapt to the new line discipline scheme.
 1.20 02-Oct-2000  itojun cosmetic; repair indentation
 1.19 30-Mar-2000  augustss Kill some more register declarations.
 1.18 29-Mar-2000  simonb Don't need to include <sys/conf.h> here.
 1.17 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.16 07-Jul-1998  thorpej branches: 1.16.6; 1.16.14;
Make this compile again.
 1.15 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.14 20-Nov-1997  thorpej Start the watchdog timer in stripopen(), and make sure it's cancelled in
stripclose(). In strip_watchdog(), abort if the line has been closed.

This fixes kern/4470 (Wolfgang Rupprecht), which was a bad pointer passed
to b_to_q() from strip_proberadio() called via strip_watchdog(); the tty
hadn't yet been attached to the strip interface.
 1.13 17-Nov-1997  thorpej Change the interface name from "st" to "strip", so as to match the
pseudo-device option listed in the kernel config file, and to avoid
a name clash with the "SCSI tape" driver.
 1.12 17-Nov-1997  thorpej Remove a gratuitous debugging printf.
 1.11 24-May-1997  christos branches: 1.11.8;
PR/3665: Martin Husemann: if_strip calls sl_compress_init with extra arg.
 1.10 07-May-1997  mikel fix bogons; from Jonathan O'Brien in PR kern/3571.
 1.9 27-Mar-1997  thorpej Update for the new mbuf code, in a slightly kludgy way. Basically, these
drivers played a somewhat evil trick with clusters, which is now
replaced by a somewhat evil trick with regular malloc'd memory.
 1.8 25-Oct-1996  cgd -Wcast-qual cleanups. Don't discard 'const' when casting.
 1.7 13-Oct-1996  christos backout previous kprintf change
 1.6 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.5 02-Aug-1996  jonathan * Remove old, unused SLIP variables from softc.
* Fix radio error-message parsing bug: old comparison against error
prefix string off by the size of the Starmode link-level protocol field.
* Fix radio reset finite state machine, given that parsing works properly.
* Add debugging messages about reset FSM if IFF_DEBUG is set.
* Remove the #ifdef notdef around the check that discards newlines at
the beginning of a frame, re-enabling it. Every error message from the
radio has a newline, as the radio sends error messages terminated by
\r\n, not just \r like data frames. Not dropping the \n garbles the
following data frame.
* Remove unused low-level debugging routines.
* Reformat the low-level bytestuff/RLL code to match the canonical source.
* Reduce MTU to 1100 bytes; 1200 bytes can overflow the radio buffers if the
bytestuff/RLL does poorly.
* Fix radio-probe string to _not_ include a frame delimiter (\r): sending
a \r to the radio tickles a bug in the firmware, causing the radio to
smash the next frame sent after the "**\r" probe string.
* Add calls to the tty t_oproc routine to make sure the probe and reset
strings get sent to the modem promptly, rather than waiting for the next
packet.
* Add PPP-style calls to the tty start-output function; seems to
reduce latency marginally.

still to do:
* Flush output queue if resetting, since the radio is going to drop
frames on the floor anyway if it needs resetting.
* Reduce tty start-output calls.
 1.4 26-Jun-1996  jonathan * Put in a fix to the bytestuffing/RLL code from Stuart Cheshire, that
somehow got lost between NetBSD and Linux. Output side mbuf-walking
code now correctly bytestuffs mbuf chains, as well as single mbufs
and clusters.

* Update radio error-parsing code to Stuart's latest stable code.
We now parse error messages the older code didn't.

* Note where radio-crash watchdogs should be added (the linux code
is there, #ifdef'ed out). This still just doesn't work.
The radio reset doesn't always work even when slattach is first started,
but I have a radio with old firmware, which may be a contributing factor.

* Correct the checks for the tty output queue being overfull; estimate
the stuffed pkt size as (original * 65/64) + STRIP_HDRLEN + 2,
instead of SLIP's (2*SLMTU). Re-enable the disabled check now the
size estimate isn't excessively large.

* Fix BPF tapping of strip interfaces: STRIP packets are wrapped
in a SLIP bpf header. This implies no BPF support for arp or atalk,
even though Linux boxes are sending arp requests and gratuitous arps.
There may be no good fix short of adding explicit STRIP encapsulation
support to bpf/tcpdump.

* Still need a solid walkthrough, and rewrite to eliminate redundant
receive-side mbuf copying.
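
The queue-overfull estimate mentioned above, (original * 65/64) + STRIP_HDRLEN + 2, can be sketched as a small function. The function name is hypothetical, and the header length is passed as a parameter because this log does not state the value of STRIP_HDRLEN; integer arithmetic is assumed, evaluated left to right as in C.

```c
#include <stddef.h>

/*
 * Hypothetical helper: estimate the on-the-wire size of a packet of
 * len bytes after bytestuffing/RLL, plus the STRIP header and 2 extra
 * bytes, following the formula quoted in the message above.
 */
static size_t
strip_stuffed_estimate(size_t len, size_t hdrlen)
{
	/* len * 65 / 64 allows about 1.6% expansion from stuffing. */
	return len * 65 / 64 + hdrlen + 2;
}
```

For a 1100-byte packet and an assumed 15-byte header this gives 1117 + 15 + 2 = 1134 bytes, far below the 2200 bytes that a SLIP-style (2 * MTU) bound would reserve for the same packet.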
 1.3 05-Jun-1996  thorpej Initialize sc_unit in the right place, a'la if_sl.c. Thanks to
Jonathan Stone <jonathan@DSG.Stanford.EDU> for pointing this out.
 1.2 19-May-1996  jonathan branches: 1.2.4;
Catch up to removal of if_unit and addition of if_xname and sc_unit,
blindly following the changes to if_sl.c.
 1.1 19-May-1996  jonathan Packet-mode driver for Metricom Ricochet radios (Starmode Radio IP).
 1.2.4.3 03-Aug-1996  jtc Pulled up from rev 1.5 by request from Jonathan Stone
 1.2.4.2 26-Jun-1996  jtc Pulled up from rev 1.4 by request from Jonathan Stone
 1.2.4.1 05-Jun-1996  thorpej Update from trunk:

Initialize sc_unit in the right place, a'la if_sl.c. Thanks to
Jonathan Stone <jonathan@DSG.Stanford.EDU> for pointing this out.
 1.11.8.3 20-Nov-1997  thorpej Pull up from trunk: fix bad pointer deref in b_to_q().
 1.11.8.2 17-Nov-1997  thorpej Sync w/ trunk.
 1.11.8.1 17-Nov-1997  thorpej Sync w/ trunk.
 1.16.14.6 21-Apr-2001  bouyer Sync with HEAD
 1.16.14.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.16.14.4 05-Jan-2001  bouyer Sync with HEAD
 1.16.14.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.16.14.2 22-Nov-2000  bouyer Sync with HEAD.
 1.16.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.35.2.10 11-Nov-2002  nathanw Catch up to -current
 1.35.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.35.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.35.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.35.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.35.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.35.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.35.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.35.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.35.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.39.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.39.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.39.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.39.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.39.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.39.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.48.2.9 11-Dec-2005  christos Sync with head.
 1.48.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.48.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.48.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.48.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.48.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.48.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.48.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.48.2.1 03-Aug-2004  skrll Sync with HEAD
 1.57.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.57.4.1 29-Apr-2005  kent sync with -current
 1.60.2.7 27-Feb-2008  yamt sync with head.
 1.60.2.6 15-Nov-2007  yamt sync with head.
 1.60.2.5 27-Oct-2007  yamt sync with head.
 1.60.2.4 03-Sep-2007  yamt sync with head.
 1.60.2.3 26-Feb-2007  yamt sync with head.
 1.60.2.2 30-Dec-2006  yamt sync with head.
 1.60.2.1 21-Jun-2006  yamt sync with head.
 1.61.6.1 29-Nov-2005  yamt sync with head.
 1.64.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.64.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.64.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.64.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.64.8.3 11-Aug-2006  yamt sync with head
 1.64.8.2 26-Jun-2006  yamt sync with head.
 1.64.8.1 24-May-2006  yamt sync with head.
 1.64.6.2 01-Jun-2006  kardel Sync with head.
 1.64.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.64.4.1 09-Sep-2006  rpaulo sync with head
 1.65.2.1 19-Jun-2006  chap Sync with head.
 1.67.6.2 10-Dec-2006  yamt sync with head.
 1.67.6.1 22-Oct-2006  yamt sync with head
 1.67.4.2 12-Jan-2007  ad Sync with head.
 1.67.4.1 18-Nov-2006  ad Sync with head.
 1.70.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.70.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.74.4.1 11-Jul-2007  mjf Sync with head.
 1.74.2.6 09-Oct-2007  ad Sync with head.
 1.74.2.5 20-Aug-2007  ad Sync with HEAD.
 1.74.2.4 15-Jul-2007  ad Sync with head.
 1.74.2.3 15-Jul-2007  ad Sync with head.
 1.74.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.74.2.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.76.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.76.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.77.4.5 11-Nov-2007  joerg Sync with HEAD.
 1.77.4.4 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.77.4.3 02-Oct-2007  joerg Sync with HEAD.
 1.77.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.77.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.78.2.2 07-Aug-2007  dyoung Use satocsdl() instead of SDL().
 1.78.2.1 07-Aug-2007  dyoung file if_strip.c was added on branch matt-mips64 on 2007-08-07 04:41:16 +0000
 1.80.2.3 23-Mar-2008  matt sync with HEAD
 1.80.2.2 09-Jan-2008  matt sync with HEAD
 1.80.2.1 06-Nov-2007  matt sync with HEAD
 1.81.2.2 14-Oct-2007  yamt sync with head.
 1.81.2.1 06-Oct-2007  yamt sync with head.
 1.83.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.83.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.85.10.2 17-Jun-2008  yamt sync with head.
 1.85.10.1 18-May-2008  yamt sync with head.
 1.85.8.4 27-Dec-2008  christos merge with head.
 1.85.8.3 09-Nov-2008  christos merge with head.
 1.85.8.2 01-Nov-2008  christos Sync with head.
 1.85.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.85.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.85.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.85.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.86.6.1 18-Jun-2008  simonb Sync with head.
 1.86.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.86.2.4 11-Aug-2010  yamt sync with head.
 1.86.2.3 11-Mar-2010  yamt sync with head
 1.86.2.2 16-May-2009  yamt sync with head
 1.86.2.1 04-May-2009  yamt sync with head.
 1.87.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.87.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.87.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.90.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.94.4.1 30-May-2010  rmind sync with head
 1.94.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.96.26.1 10-Aug-2014  tls Rebase.
 1.96.12.2 03-Dec-2017  jdolecek update from HEAD
 1.96.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.97.4.6 28-Aug-2017  skrll Sync with HEAD
 1.97.4.5 05-Oct-2016  skrll Sync with HEAD
 1.97.4.4 09-Jul-2016  skrll Sync with HEAD
 1.97.4.3 29-May-2016  skrll Sync with HEAD
 1.97.4.2 22-Apr-2016  skrll Sync with HEAD
 1.97.4.1 22-Sep-2015  skrll Sync with HEAD
 1.104.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.104.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.107.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.108.10.4 26-Jan-2019  pgoyette Sync with HEAD
 1.108.10.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.108.10.2 25-Jun-2018  pgoyette Sync with HEAD
 1.108.10.1 22-Apr-2018  pgoyette Sync with HEAD
 1.110.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.110.2.1 10-Jun-2019  christos Sync with HEAD
 1.113.6.1 25-Jan-2020  ad Sync with head.
 1.21 19-Jan-2020  thorpej Remove the strip(4) - Starmode Radio IP - pseudo-device driver. It is
long since obsolete.
 1.20 11-Jul-2019  msaitoh branches: 1.20.4;
Fix typo (s/supress/suppress/).
 1.19 14-Jul-2007  ad branches: 1.19.122;
Generic soft interrupts are mandatory.
 1.18 07-Jun-2006  kardel branches: 1.18.16;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.17 11-Dec-2005  thorpej branches: 1.17.4; 1.17.6; 1.17.8; 1.17.14;
ANSI function decls and application of static.
 1.16 11-Dec-2005  christos merge ktrace-lwp.
 1.15 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.14 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
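
The list-plus-refcount scheme described in 1.14 can be sketched roughly as
follows. This is an illustration only, not the tty code itself: every name
ending in _sketch is hypothetical, and the locking a real kernel would need
around the list and the counter is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line-discipline record; the list node is embedded. */
struct ldisc_sketch {
	const char *name;
	unsigned int refcnt;		/* references held by lookups */
	struct ldisc_sketch *next;
};

static struct ldisc_sketch *ldisc_list;	/* replaces a fixed array */

/* Register a discipline; the provider supplies the storage. */
int
ldisc_attach_sketch(struct ldisc_sketch *ld)
{
	struct ldisc_sketch *p;

	for (p = ldisc_list; p != NULL; p = p->next)
		if (strcmp(p->name, ld->name) == 0)
			return EEXIST;
	ld->refcnt = 0;
	ld->next = ldisc_list;
	ldisc_list = ld;
	return 0;
}

/* A successful lookup holds a reference until it is released. */
struct ldisc_sketch *
ldisc_lookup_sketch(const char *name)
{
	struct ldisc_sketch *p;

	for (p = ldisc_list; p != NULL; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			p->refcnt++;
			return p;
		}
	}
	return NULL;
}

void
ldisc_release_sketch(struct ldisc_sketch *ld)
{
	if (ld->refcnt > 0)
		ld->refcnt--;
}

/* Detaching an in-use discipline fails with EBUSY. */
int
ldisc_detach_sketch(struct ldisc_sketch *ld)
{
	struct ldisc_sketch **pp;

	if (ld->refcnt != 0)
		return EBUSY;
	for (pp = &ldisc_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ld) {
			*pp = ld->next;
			ld->next = NULL;
			return 0;
		}
	}
	return ENOENT;
}
```

A caller would attach once, pair every successful lookup with a release, and
retry or give up when detach reports EBUSY.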
 1.13 05-Dec-2004  christos branches: 1.13.12; 1.13.18;
clonify strip and sl.
 1.12 14-Jun-2001  itojun branches: 1.12.4; 1.12.22;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.11 15-Jan-2001  thorpej branches: 1.11.2;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.10 11-Jan-2001  thorpej Sync with if_sl.c,v 1.65:

Once we have a complete frame, schedule a STRIP software interrupt,
and manipulate ipintrq from there. This will allow us to clean up
the use of splimp() in this file later.
 1.9 11-Jan-2001  thorpej Sync with if_sl.c,v 1.64:

Make the buffer management in STRIP just a little less evil.
 1.8 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.7 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.6 09-Feb-1998  perry branches: 1.6.6; 1.6.14;
add multiple inclusion protection (and cleanup).
 1.5 17-Nov-1997  thorpej Change the interface name from "st" to "strip", so as to match the
pseudo-device option listed in the kernel config file, and to avoid
a name clash with the "SCSI tape" driver.
 1.4 27-Mar-1997  thorpej branches: 1.4.8;
Update for the new mbuf code, in a slightly kludgy way. Basically, these
drivers played a somewhat evil trick with clusters, which is now
replaced by a somewhat evil trick with regular malloc'd memory.
 1.3 02-Aug-1996  jonathan * Remove old, unused SLIP variables from softc.
* Fix radio error-message parsing bug: old comparison against error
prefix string off by the size of the Starmode link-level protocol field.
* Fix radio reset finite state machine, given that parsing works properly.
* Add debugging messages about reset FSM if IFF_DEBUG is set.
* Remove the #ifdef notdef around the check that discards newlines at
the beginning of a frame. Every error message from the radio has a newline,
as the radio sends error messages terminated by \r\n, not just \r like data
frames. Not dropping the \n garbles the following data frame.
* Remove unused low-level debugging routines.
* Reformat the low-level bytestuff/RLL code to match the canonical source.
* Reduce MTU to 1100 bytes; 1200 bytes can overflow the radio buffers if the
bytestuff/RLL does poorly.
* Fix radio-probe string to _not_ include a frame delimiter (\r): sending
a \r to the radio tickles a bug in the firmware, causing the radio to
smash the next frame sent after the "**\r" probe string.
* Add calls to the tty t_oproc routine to make sure the probe and reset
strings get sent to the modem promptly, rather than waiting for the next
packet.
* Add PPP-style calls to the tty start-output function; seems to
reduce latency marginally.

still to do:
* Flush output queue if resetting, since the radio is going to drop
frames on the floor anyway if it needs resetting.
* Reduce tty start-output calls.
 1.2 19-May-1996  jonathan branches: 1.2.4;
Catch up to removal of if_unit and addition of if_xname and sc_unit,
blindly following the changes to if_sl.c.
 1.1 19-May-1996  jonathan Packet-mode driver for Metricom Ricochet radios (Starmode Radio IP).
 1.2.4.1 05-Aug-1996  jtc Pulled up from rev 1.3 by request from Jonathan Stone
 1.4.8.1 17-Nov-1997  thorpej Sync w/ trunk.
 1.6.14.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.6.14.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.6.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.11.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.12.22.2 11-Dec-2005  christos Sync with head.
 1.12.22.1 18-Dec-2004  skrll Sync with HEAD.
 1.12.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.13.18.1 29-Nov-2005  yamt sync with head.
 1.13.12.2 03-Sep-2007  yamt sync with head.
 1.13.12.1 21-Jun-2006  yamt sync with head.
 1.17.14.1 19-Jun-2006  chap Sync with head.
 1.17.8.1 26-Jun-2006  yamt sync with head.
 1.17.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.17.4.1 09-Sep-2006  rpaulo sync with head
 1.18.16.1 15-Jul-2007  ad Sync with head.
 1.19.122.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20.4.1 25-Jan-2020  ad Sync with head.
 1.136 10-Nov-2024  mlelstv Add MBUFTRACE
 1.135 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.134 18-Aug-2024  rin if_tap: Explicitly include "opt_net_mpsafe.h", NFC

because it was included via <net/if.h> anyway.
 1.133 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.132 17-Apr-2024  riastradh branches: 1.132.2;
tap(4): Just use mutex_enter.

PR kern/58167
 1.131 17-Apr-2024  riastradh tap(4): Use DETACH_FORCE with config_detach.

It doesn't make a difference here, because tap_detach never fails,
but let's make it more obvious at the call site that failure is
forbidden here.

No functional change intended.

PR kern/58166
 1.130 17-Apr-2024  riastradh tap(4): Prune dead branches around tap_dev_destroyer.

No functional change intended.

PR kern/58166
 1.129 17-Apr-2024  riastradh tap(4): Prune dead branches around tap_dev_close.

No functional change intended.

PR kern/58166
 1.128 06-Jan-2023  ozaki-r tap: link up an interface cloned from /dev/tap

Fix PR 57155 (partially)
 1.127 10-Apr-2022  andvar branches: 1.127.4;
fix various typos in comments and output/log messages.
 1.126 31-Mar-2022  pgoyette For device modules that provide both auto-config and /dev/xxx
interfaces, make sure that initialization and destruction
follow the proper sequence. This is triggered by the recent
changes to the devsw stuff; per riastradh@ the required call
sequence is:

devsw_attach()
config_init_component() or config_cf*_attach()
...
config_fini_component() or config_cf*_detach()
devsw_detach()

While here, add a few missing calls to some of the detach
routines.

Testing of these changes has been limited to:
1. compile without build break
2. no related test failures from atf
3. modload/modunload work as well as
before.

No functional device testing done, since I don't have any
of these devices. Let me know of any damage I might cause
here!

XXX Some of the modules affected by this commit are already
XXX broken; see kern/56772. This commit does not break
any additional modules (as far as I know).
 1.125 28-Mar-2022  riastradh driver(9): devsw_detach never fails. Make it return void.

Prune a whole lotta dead branches as a result of this. (Some logic
calling this is also wrong for other reasons; devsw_detach is final
-- you should never have any reason to decide to roll it back. To be
cleaned up in subsequent commits...)

XXX kernel ABI change to devsw_detach signature requires bump
 1.124 26-Sep-2021  thorpej Use seltrue_filtops rather than rolling our own with filt_seltrue.
 1.123 26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.122 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.121 18-Dec-2020  thorpej branches: 1.121.4;
Use sel{record,remove}_knote().
 1.120 30-Oct-2020  christos branches: 1.120.2;
use c99 initializers
 1.119 27-Sep-2020  roy tap: Report link state based on if the interface has been opened or not

While a nice addition, it does render tap(4) useless as a bridge(4)
endpoint. We now have vether(4) for use as bridge endpoint.
 1.118 26-Sep-2020  roy tap: Remove media from this virtual interface

It serves no purpose at all.
 1.117 04-Feb-2020  thorpej Use ifmedia_fini().
 1.116 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.115 06-Jan-2020  christos branches: 1.115.2;
Add missing packet filter hooks, byte accounting.
 1.114 16-Oct-2019  knakahara Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.113 29-May-2019  msaitoh branches: 1.113.2;
Even if we don't use MII(4), use the common path of SIOC[GS]IFMEDIA in
sys/net/if_ethersubr.c if we can.
- Add ec_ifmedia into struct ethercom.
- ec_mii in struct ethercom is kept and used as it is. It might be used in
future. Note that some Ethernet drivers which _DON'T_ use mii(4) use
ec_mii for keeping the if_media. Those should be changed in future.
 1.112 21-May-2019  msaitoh KNF. No functional change.
 1.111 26-Apr-2019  pgoyette Some more empty-string --> NULL conversions for module dependencies
 1.110 16-Apr-2019  msaitoh The path of SIOCSIFMEDIA or TAPGIFNAME calls is as follows:

doifioctl()

pre-convert (if_cvtcmd_43_hook & ifreqo2n)

(*ifp->if_ioctl)(ifp, cmd, data);

post-convert (ifreqn2o)

so it's not required to check OSIOCSIFMEDIA and OTAPGIFNAME in if_tap.c.
Those two commands are converted to the new commands in if_cvtcmd_43_hook, so
tap_ioctl() always sees the new commands.

OK'd by pgoyette.
 1.109 25-Mar-2019  pgoyette Put the #ifdef where it belongs (after defining the out2 label which is
referenced only inside #ifdef block)
 1.108 25-Mar-2019  pgoyette Resequence the activities in tapdetach() so that no new units can be
created, either by opening /dev/tap or ifconfig tapx create, before
checking to see if we have any active units.
 1.107 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
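
The truncation that 1.107 is concerned with can be reproduced in a few lines.
This is a minimal sketch, not the libkern code: uimin_sketch and MIN_SKETCH
are stand-ins for a function defined on unsigned int and for a type-generic
macro, and it assumes unsigned long long is wider than unsigned int.

```c
#include <limits.h>

/* Type-generic macro: may evaluate arguments twice, never truncates. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* Function defined on unsigned int, like the renamed uimin(). */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Passing wider values through the unsigned int function.  The casts
 * are written out here; in a real call the conversion is implicit and
 * silent, which is the hazard the rename is meant to expose.
 */
unsigned long long
min_through_uint(unsigned long long a, unsigned long long b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* The same comparison through the macro keeps the full width. */
unsigned long long
min_through_macro(unsigned long long a, unsigned long long b)
{
	return MIN_SKETCH(a, b);
}
```

With a = UINT_MAX + 2 and b = 5, the function path reduces a modulo
UINT_MAX + 1 and so compares 1 against 5, while the macro path compares the
full values.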
 1.106 26-Jun-2018  msaitoh branches: 1.106.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.105 19-Dec-2017  ozaki-r branches: 1.105.2;
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point

Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.

Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.104 30-Nov-2017  christos add fo_name so we can identify the fileops in a simple way.
 1.103 29-Nov-2017  jmcneill set IFEF_MPSAFE
 1.102 29-Nov-2017  jmcneill Make tap(4) MP-safe.
 1.101 30-Oct-2017  ozaki-r Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use if_link_state_change
 1.100 23-Oct-2017  msaitoh - If if_initialize() failed in the attach function, free resources and return.
- KNF
 1.99 12-Feb-2017  skrll branches: 1.99.4; 1.99.6;
Whitespace
 1.98 12-Feb-2017  skrll Convert to kmem(9)
 1.97 12-Feb-2017  skrll Typo in comment
 1.96 12-Feb-2017  skrll KNF (sort #include <sys/...>) and remove a duplicate
 1.95 07-Feb-2017  skrll KNF and trailing whitespace. No functional change.
 1.94 15-Dec-2016  ozaki-r branches: 1.94.2;
Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce code
- We can provide the same behavior between drivers
- Where/When if_ipackets is counted up
- Note that some drivers still update packet statistics in their own
way (periodical update)
- Moved bpf_mtap run in softint
- This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.93 02-Oct-2016  christos MFREE -> m_free
 1.92 15-Aug-2016  christos remove MODULAR/COMPAT_40 ifdef.
 1.91 14-Aug-2016  christos fix rump tests.
 1.90 08-Aug-2016  kre create++, destroy--
 1.89 08-Aug-2016  pgoyette Typo (missing ampersand)
 1.88 08-Aug-2016  pgoyette Final part of fixing if_tap. The module needs to attach its cdevsw (and
detach it later).
 1.87 08-Aug-2016  pgoyette Add the devsw_attach stuff, since the tap device can be accessed via
/dev/tap

This is a partial fix for the build. The rump tap component will be
fixed shortly.
 1.86 08-Aug-2016  pgoyette Partial fix - restore creation of our sysctl subtree for _MODULE
builds (it's already handled for built-in builds via registration
in a link-set).

XXX The build is still broken in rump...
 1.85 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.84 10-Jun-2016  ozaki-r branches: 1.84.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.83 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.82 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.81 17-Dec-2014  ozaki-r Replace if_attach of if_tap with if_initialize and if_register
 1.80 07-Nov-2014  ozaki-r branches: 1.80.2;
Complete the initialization of tap_softc before if_attach

Basically we should complete the initialization of softc before if_attach
because once if_attach is called if_detach can be called for the softc
before returning from if_attach. In case of tap, mutex_destroy can be
called before mutex_init that comes after if_attach.
 1.79 03-Oct-2014  skrll Remove unneeded #include
 1.78 05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.77 25-Jul-2014  dholland Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.76 07-May-2014  cube Prevent a race between tap_dev_poll() and tap_start() by making sure the
call to selrecord() happens at splnet(). Fixes kern/47506 and kern/46199.
 1.75 20-Apr-2014  aymeric Call mutex_destroy() on sc_kqlock in tap_detach(). Found by LOCKDEBUG.
 1.74 20-Mar-2014  skrll branches: 1.74.2;
Mechanically replace simplelock with kmutex_t.
 1.73 16-Mar-2014  dholland Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.72 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.71 20-Aug-2013  yamt - deal with softint_establish failure
- establish softint only when necessary
 1.70 28-Jan-2013  yamt branches: 1.70.2;
use cprng_fast instead of getmicrouptime to generate "random" mac address
because the latter often produces the same addresses for subsequent tap
instances.
 1.69 28-Jan-2013  yamt whitespace
 1.68 27-Oct-2012  chs split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.
 1.67 02-Jun-2012  dsl branches: 1.67.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.66 22-Nov-2010  christos branches: 1.66.8; 1.66.14; 1.66.18; 1.66.20;
PR/44131: Matthew Mondor: if_tap.c tap_dev_ioctl() not propagating error,
always returns 0.
 1.65 19-May-2010  christos Replace ether_nonstatic_aton with a
- better named one
- not suffering from buffer overflow
- simpler
- handling different separators
- returning error codes for errors

Some ideas from one posted on tech-net by Jonathan A. Kollasch
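
The properties listed in 1.65 (bounded output, several separators, error codes
instead of silent failure) can be illustrated with a small parser. This is a
sketch under assumptions, not the routine that was committed: the name
mac_parse_sketch, the accepted separators, and the choice of EINVAL/ENOSPC are
all made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse exactly len bytes written as one or two hex digits each,
 * separated by ':', '-' or '.'.  Never writes past dest[len - 1].
 * Returns 0 on success, ENOSPC if the string holds more than len
 * bytes, EINVAL for any other malformed input.
 */
int
mac_parse_sketch(uint8_t *dest, size_t len, const char *str)
{
	size_t n = 0;

	if (dest == NULL || str == NULL || len == 0)
		return EINVAL;

	for (;;) {
		int hi, lo;

		if (n == len)
			return ENOSPC;	/* would overflow dest */
		hi = hexval(*str);
		if (hi < 0)
			return EINVAL;
		str++;
		lo = hexval(*str);
		if (lo >= 0) {
			str++;
			dest[n++] = (uint8_t)((hi << 4) | lo);
		} else {
			dest[n++] = (uint8_t)hi;
		}

		if (*str == '\0')
			break;
		if (*str != ':' && *str != '-' && *str != '.')
			return EINVAL;
		str++;
	}
	return n == len ? 0 : EINVAL;
}
```

Passing the destination length explicitly is what lets the parser refuse
over-long input rather than write beyond the buffer.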
 1.64 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.63 19-Jan-2010  pooka branches: 1.63.2; 1.63.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.62 20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.61 09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.60 29-Nov-2009  plunky fix a potential leak on tap device close, purging the send queue
did not actually release the dequeued mbufs.

pointed out by Paul Forgey on tech-net
 1.59 15-Sep-2009  drochner fix undefined result of stat(), found by clang static analyzer
 1.58 23-Jul-2009  plunky Avoid a kernel assertion failure upstream by using FSTATE_NOTFOUND
rather than FSTATE_FOUND when setting the unit number directly.

config_attach_pseudo() will convert it to FSTATE_FOUND just after the
assertion.
 1.57 11-Apr-2009  christos Fix locking as Andy explained. Also fill in uid and gid like sys_pipe did.
 1.56 11-Apr-2009  christos Fix PR/37878 and PR/37550: Provide stat(2) for all devices and don't use
fbadop_stat.
 1.55 04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.54 13-Mar-2009  plunky Deprecate the SIOCSIFPHYADDR ioctl and the sysctl node in favour
of the generic SIOCALIFADDR.

As suggested by cube.
 1.53 10-Mar-2009  plunky repair the SIOCSIFPHYADDR ioctl handler to be compatible with previous
versions which used a "struct sockaddr"
 1.52 01-Feb-2009  pooka branches: 1.52.2;
Drop splnet() *after* tsleep instead of before. Fixes a race condition
between sleep and wakeup. (tested on NetBSD 4.0)
 1.51 12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.50 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
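
The address check described in the paragraph above can be sketched as follows.
It is an illustration only: lladdr_settable_sketch is a hypothetical helper,
not the code in ether_ioctl(). It relies on the fact that the low-order bit of
the first octet marks a group (multicast) Ethernet address, and that the
broadcast address has that bit set as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_ADDR_LEN_SKETCH	6

/* True if addr may be installed as an interface's link-layer address. */
bool
lladdr_settable_sketch(const uint8_t addr[ETHER_ADDR_LEN_SKETCH])
{
	uint8_t any = 0;
	size_t i;

	/* Group bit set: multicast, which also covers ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* Reject 00:00:00:00:00:00. */
	for (i = 0; i < ETHER_ADDR_LEN_SKETCH; i++)
		any |= addr[i];
	return any != 0;
}
```

An ioctl handler built on such a check would return an error to the caller
instead of installing the address when the helper reports false.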

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.49 03-Nov-2008  hans call pmf_device_deregister in detach functions. requested by jmcneill.
 1.48 02-Nov-2008  hans Add NULL pmf handlers. OK by cube.
 1.47 26-Aug-2008  rmind branches: 1.47.2; 1.47.4;
tap_attach/tap_detach: selinit/seldestroy the selinfo structure.
Should fix PR/39237.
 1.46 10-Jun-2008  cegger branches: 1.46.2;
device_private(device_lookup()) -> device_lookup_private()
ok cube@
 1.45 28-May-2008  dyoung branches: 1.45.2;
In tap_clone_destroy(), don't treat a pointer to the tap(4) softc
like it is a device_t.

In tap_clone_creator(), set cf_fstate to FSTATE_FOUND instead of
_NOTFOUND to avoid a panic in config_detach() on a DIAGNOSTIC
kernel. XXX I'm not sure that that is the right fix.

These changes should put a stop to the crash described in kern/38759.
 1.44 21-May-2008  ad Acquire kernel_lock in tap's fileops.
 1.43 29-Apr-2008  martin branches: 1.43.2;
Convert to new 2 clause license
 1.42 24-Apr-2008  ad branches: 1.42.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.41 21-Mar-2008  ad branches: 1.41.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.40 04-Mar-2008  cube Split device_t/softc, and other related cosmetic changes.
 1.39 01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc.) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.38 20-Feb-2008  matt branches: 1.38.2; 1.38.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.37 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.36 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.35 05-Dec-2007  pooka branches: 1.35.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
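
To illustrate why "return 1" surfaces as EPERM: errno values are small integers, and EPERM is 1. The sketch below uses a hypothetical filter enum and function name (ex_kqfilter); it is not the driver code, only the shape of the fix.

```c
#include <errno.h>

/* Hypothetical filter identifiers for this sketch. */
enum ex_filter { EX_FILT_READ, EX_FILT_WRITE, EX_FILT_OTHER };

/*
 * Return 0 on success or an errno value on failure.  The return value
 * reaches the caller as an error code, so a bare 1 would read as EPERM.
 */
static int
ex_kqfilter(enum ex_filter filter)
{
	switch (filter) {
	case EX_FILT_READ:
	case EX_FILT_WRITE:
		return 0;
	default:
		return EINVAL;	/* unsupported filter: say so explicitly */
	}
}
```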
 1.34 05-Dec-2007  ad lockmgr -> mutex
 1.33 10-Sep-2007  cube branches: 1.33.6; 1.33.8;
Remove 3rd clause and my name from all the licences which were only in my
name.
 1.32 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
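
A minimal sketch of the idea behind passing the sockaddr length to the setter, assuming a simplified structure: struct ex_dl and ex_dl_setaddr() are hypothetical stand-ins, not the real sockaddr_dl layout or the real sockaddr_dl_setaddr() signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a link-level sockaddr. */
struct ex_dl {
	uint8_t	len;		/* total length of the structure */
	uint8_t	nlen;		/* interface name length */
	uint8_t	alen;		/* link-layer address length */
	char	data[12];	/* name, followed by the address */
};

/*
 * Copy addr into dl after the name, but only if it fits within
 * socklen bytes.  Returns dl on success, NULL if it would overrun.
 */
static struct ex_dl *
ex_dl_setaddr(struct ex_dl *dl, size_t socklen, const void *addr,
    size_t addrlen)
{
	size_t need = offsetof(struct ex_dl, data) + dl->nlen + addrlen;

	if (addrlen > UINT8_MAX || need > socklen)
		return NULL;
	memcpy(&dl->data[dl->nlen], addr, addrlen);
	dl->alen = (uint8_t)addrlen;
	return dl;
}
```

Because the caller states how much storage it actually has, the bounds check happens in one place instead of at every memcpy() site.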
 1.31 26-Aug-2007  dyoung branches: 1.31.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.30 30-May-2007  christos branches: 1.30.2; 1.30.6;
Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.29 29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.28 17-May-2007  christos return POLLERR instead of ENXIO since we are expecting an revents mask not
an errno.
 1.27 09-Mar-2007  drochner branches: 1.27.2; 1.27.4;
don't use DVUNIT_ANY as unit number to attach pseudo devices,
use FSTATE_STAR and cf_unit=0 like normal devices.
Thanks to Arnaud Degroote for the bug report and testing.
 1.26 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.25 26-Feb-2007  cube Remove unnecessary output and reduce verbosity in dmesg(8) output. That
makes the output consistent with etherip(4).

Suggested by Nino Dehne on current-users@.
 1.24 24-Nov-2006  rpaulo branches: 1.24.2; 1.24.4;
The change I committed to etherip was wrong. ether_snprintf doesn't make
sense when changing the MAC address of the virtual interface, as pointed
out by Hans himself.
So, introduce ether_nonstatic_aton() and make etherip(4) and tap(4) use it.
 1.23 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.22 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.21 01-Sep-2006  cube branches: 1.21.2; 1.21.4;
Add a note about the use of CTL_CREATE in sysctl_createv, otherwise the
code can be confusing.
 1.20 30-Aug-2006  christos fix initializers.
 1.19 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.18 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.17 14-May-2006  elad branches: 1.17.2;
integrate kauth.
 1.16 29-Mar-2006  thorpej Use device_cfdata().
 1.15 28-Mar-2006  thorpej Use device_unit().
 1.14 16-Mar-2006  christos branches: 1.14.2;
Add a new function called ether_snprintf() which takes an external buffer
and a length. The buffer should be 3 * addrlen.
Remove local tap_ether_sprintf(), and use ether_snprintf() instead.
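
The 3 * addrlen rule follows from the output format: each octet takes two hex digits plus one separator, and the last separator slot holds the terminating NUL. The sketch below shows this with a hypothetical function, ex_fmt_lladdr(); its signature is an assumption for illustration and is not claimed to match ether_snprintf().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Format addrlen octets as "xx:xx:...:xx" into buf, never writing more
 * than len bytes.  Output is truncated to whole octets if buf is short.
 */
static char *
ex_fmt_lladdr(char *buf, size_t len, const uint8_t *addr, size_t addrlen)
{
	static const char hexdigits[] = "0123456789abcdef";
	char *p = buf;
	size_t i;

	if (len == 0)
		return buf;
	for (i = 0; i < addrlen && (size_t)(p - buf) + 3 <= len; i++) {
		*p++ = hexdigits[addr[i] >> 4];
		*p++ = hexdigits[addr[i] & 0x0f];
		*p++ = ':';
	}
	if (p > buf)
		p[-1] = '\0';	/* last separator becomes the terminator */
	else
		*p = '\0';
	return buf;
}
```

For a 6-octet Ethernet address, an 18-byte buffer is exactly enough: 17 characters plus the NUL.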
 1.13 24-Feb-2006  pooka branches: 1.13.2; 1.13.4;
comment police: p_dupfd is now known as l_dupfd and lives in struct lwp
 1.12 01-Feb-2006  cube branches: 1.12.2;
Properly dispose of cfdata memory when unloading the tap(4) LKM.
 1.11 11-Dec-2005  christos branches: 1.11.2; 1.11.4;
merge ktrace-lwp.
 1.10 20-Jun-2005  atatat branches: 1.10.2;
Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.9 10-Jun-2005  bouyer call (ifp->if_input) at splnet(). ifp->if_input points to ether_input()
which doesn't raise the IPL itself in all cases.
Should also fix PR 29546 (the pkgsrc kernel module needs to be updated).
 1.8 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
 1.7 24-Mar-2005  cube Set bit 0x2 of the first byte of the generated MAC address, to indicate it
is a locally administered address. Pointed out by Ignatios Souvatzis.
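
As a small illustration of the bit in question: in the first octet of an Ethernet address, 0x02 is the universal/local bit. The helper names below are hypothetical, and how tap(4) generates the remaining octets is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ETHER_LOCAL	0x02	/* U/L bit in the first octet */

/* Return the first octet with the locally-administered bit set. */
static uint8_t
ex_mark_local(uint8_t first_octet)
{
	return (uint8_t)(first_octet | EX_ETHER_LOCAL);
}

/* Test whether a first octet denotes a locally administered address. */
static bool
ex_is_local(uint8_t first_octet)
{
	return (first_octet & EX_ETHER_LOCAL) != 0;
}
```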
 1.6 26-Feb-2005  perry branches: 1.6.2;
nuke trailing whitespace
 1.5 12-Feb-2005  christos pass the flag to fdclone.
 1.4 25-Jan-2005  ragge branches: 1.4.2;
Do not cast simple_lock() to (void). It may be a do { } while() macro,
and then compilation fails. Found by Håvard Eidnes.
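
The failure mode can be reproduced with any statement-style macro. The sketch below uses a hypothetical EX_LOCK() macro rather than the real simple_lock() definition.

```c
struct ex_lock {
	int	held;
};

/* Statement-style macro, as a lock primitive might be defined. */
#define EX_LOCK(l)	do { (l)->held = 1; } while (/*CONSTCOND*/ 0)

static int
ex_take_lock(void)
{
	struct ex_lock lk = { 0 };

	/*
	 * (void)EX_LOCK(&lk); would not compile: a do/while loop is a
	 * statement, not an expression, so it cannot be cast.
	 */
	EX_LOCK(&lk);
	return lk.held;
}
```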
 1.3 22-Jan-2005  cube s/PF_LINK/AF_LINK/ because that way it makes sense.
 1.2 19-Jan-2005  cube Make this LKM-friendly by using _KERNEL_OPT and having a prototype for the
sysctl setup function.
 1.1 08-Jan-2005  cube branches: 1.1.2; 1.1.4;
Addition of tap(4).

NAME
tap - virtual Ethernet device

SYNOPSIS
pseudo-device tap

DESCRIPTION
The tap driver allows the creation and use of virtual Ethernet devices.
Those interfaces appear just as any real Ethernet NIC to the kernel, but
can also be accessed by userland through a character device node in order
to read frames being sent by the system or to inject frames.

In that respect it is very similar to what tun(4) provides, but the added
Ethernet layer allows easy integration with machine emulators or virtual
Ethernet networks through the use of bridge(4) with tunneling.

``Qui tacet consentire videtur.''
 1.1.4.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.7 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.4.5 15-Feb-2005  skrll Sync with HEAD.
 1.1.4.4 04-Feb-2005  skrll Sync with HEAD.
 1.1.4.3 24-Jan-2005  skrll Sync with HEAD.
 1.1.4.2 17-Jan-2005  skrll Sync with HEAD.
 1.1.4.1 08-Jan-2005  skrll file if_tap.c was added on branch ktrace-lwp on 2005-01-17 19:32:38 +0000
 1.1.2.1 29-Apr-2005  kent sync with -current
 1.4.2.3 26-Mar-2005  yamt sync with head.
 1.4.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.4.2.1 25-Jan-2005  yamt file if_tap.c was added on branch yamt-km on 2005-03-19 08:36:31 +0000
 1.6.2.2 21-Jan-2006  snj Pull up following revision(s) (requested by cube in ticket #1113):
sys/net/if_tap.c: revision 1.7
Set bit 0x2 of the first byte of the generated MAC address, to indicate it
is a locally administered address. Pointed out by Ignatios Souvatzis.
 1.6.2.1 10-Jun-2005  tron Pull up revision 1.9 (requested by bouyer in ticket #438):
call (ifp->if_input) at splnet(). ifp->if_input points to ether_input()
which doesn't raise the IPL itself in all cases.
Should also fix PR 29546 (the pkgsrc kernel module needs to be updated).
 1.10.2.9 24-Mar-2008  yamt sync with head.
 1.10.2.8 17-Mar-2008  yamt sync with head.
 1.10.2.7 27-Feb-2008  yamt sync with head.
 1.10.2.6 21-Jan-2008  yamt sync with head
 1.10.2.5 07-Dec-2007  yamt sync with head
 1.10.2.4 27-Oct-2007  yamt sync with head.
 1.10.2.3 03-Sep-2007  yamt sync with head.
 1.10.2.2 30-Dec-2006  yamt sync with head.
 1.10.2.1 21-Jun-2006  yamt sync with head.
 1.11.4.1 09-Sep-2006  rpaulo sync with head
 1.11.2.2 01-Mar-2006  yamt sync with head.
 1.11.2.1 01-Feb-2006  yamt sync with head.
 1.12.2.4 01-Jun-2006  kardel Sync with head.
 1.12.2.3 22-Apr-2006  simonb Fix sync-with-trunk botch.
 1.12.2.2 22-Apr-2006  simonb Sync with head.
 1.12.2.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.13.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.13.4.2 19-Apr-2006  elad sync with head.
 1.13.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.13.2.5 03-Sep-2006  yamt sync with head.
 1.13.2.4 11-Aug-2006  yamt sync with head
 1.13.2.3 26-Jun-2006  yamt sync with head.
 1.13.2.2 24-May-2006  yamt sync with head.
 1.13.2.1 01-Apr-2006  yamt sync with head.
 1.14.2.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.14.2.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.17.2.1 19-Jun-2006  chap Sync with head.
 1.21.4.2 10-Dec-2006  yamt sync with head.
 1.21.4.1 22-Oct-2006  yamt sync with head
 1.21.2.2 12-Jan-2007  ad Sync with head.
 1.21.2.1 18-Nov-2006  ad Sync with head.
 1.24.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.24.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.24.2.1 03-Dec-2009  sborrill Pull up the following revision(s) (requested by plunky in ticket #1368):
sys/net/if_tap.c: revision 1.60

Fix a potential leak on tap device close; purging the send queue did not
actually release the dequeued mbufs.
 1.27.4.1 11-Jul-2007  mjf Sync with head.
 1.27.2.3 09-Oct-2007  ad Sync with head.
 1.27.2.2 09-Jun-2007  ad Sync with head.
 1.27.2.1 08-Jun-2007  ad Sync with head.
 1.30.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.30.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.30.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.30.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.31.2.3 23-Mar-2008  matt sync with HEAD
 1.31.2.2 09-Jan-2008  matt sync with HEAD
 1.31.2.1 06-Nov-2007  matt sync with HEAD
 1.33.8.2 26-Dec-2007  ad Sync with head.
 1.33.8.1 08-Dec-2007  ad Sync with head.
 1.33.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.33.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.33.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.35.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.35.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.38.6.7 17-Jan-2009  mjf Sync with HEAD.
 1.38.6.6 28-Sep-2008  mjf Sync with HEAD.
 1.38.6.5 29-Jun-2008  mjf Sync with HEAD.
 1.38.6.4 02-Jun-2008  mjf Sync with HEAD.
 1.38.6.3 06-Apr-2008  mjf - after some discussion with agc@ i agreed it would be a good idea to move
device_unregister_* to device_deregister_* to be more like the pmf(9)
functions, especially since a lot of the time the function calls are next
to each other.

- add device_register_name() support for dk(4).
 1.38.6.2 05-Apr-2008  mjf - add "file-system DEVFS" and "pseudo-device devfsctl" to conf/std seeing
as these are always needed.

- convert many, many drivers over to the New Devfs World Order. For a
list of device drivers yet to be converted see,
http://www.netbsd.org/~mjf/devfs-todo.html.

- add a new device_unregister_all(device_t) function to remove all device
names associated with a device_t, which saves us having to construct
device names when the driver is detached.

- add a DEV_AUDIO type for devices.
 1.38.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.38.2.1 24-Mar-2008  keiichi sync with head.
 1.41.2.3 17-Jun-2008  yamt sync with head.
 1.41.2.2 04-Jun-2008  yamt sync with head
 1.41.2.1 18-May-2008  yamt sync with head.
 1.42.2.6 11-Aug-2010  yamt sync with head.
 1.42.2.5 11-Mar-2010  yamt sync with head
 1.42.2.4 16-Sep-2009  yamt sync with head
 1.42.2.3 19-Aug-2009  yamt sync with head.
 1.42.2.2 04-May-2009  yamt sync with head.
 1.42.2.1 16-May-2008  yamt sync with head.
 1.43.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.43.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.45.2.1 18-Jun-2008  simonb Sync with head.
 1.46.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.46.2.1 19-Oct-2008  haad Sync with HEAD.
 1.47.4.8 09-Dec-2010  riz Pull up following revision(s) (requested by christos in ticket #1492):
sys/net/if_tap.c: revision 1.66
PR/44131: Matthew Mondor: if_tap.c tap_dev_ioctl() not propagating error,
always returns 0.
 1.47.4.7 03-Dec-2009  sborrill Pull up the following revision(s) (requested by plunky in ticket #1173):
sys/net/if_tap.c: revision 1.60

Fix a potential leak on tap device close; purging the send queue
did not actually release the dequeued mbufs.
 1.47.4.6 04-Apr-2009  snj branches: 1.47.4.6.4;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.47.4.5 18-Mar-2009  snj Pull up following revision(s) (requested by plunky in ticket #575):
sys/net/if_tap.c: revision 1.54
Deprecate the SIOCSIFPHYADDR ioctl and the sysctl node in favour
of the generic SIOCALIFADDR.
As suggested by cube.
 1.47.4.4 18-Mar-2009  snj Pull up following revision(s) (requested by plunky in ticket #572):
sys/net/if_tap.c: revision 1.53
repair the SIOCSIFPHYADDR ioctl handler to be compatible with previous
versions which used a "struct sockaddr"
 1.47.4.3 06-Feb-2009  snj Pull up following revision(s) (requested by pooka in ticket #405):
sys/net/if_tap.c: revision 1.52
Drop splnet() *after* tsleep instead of before. Fixes a race condition
between sleep and wakeup. (tested on NetBSD 4.0)
 1.47.4.2 19-Nov-2008  snj Pull up following revision(s) (requested by hans in ticket #89):
sys/net/if_tap.c: revision 1.49
sys/net/if_etherip.c: revision 1.24
call pmf_device_deregister in detach functions. requested by jmcneill.
 1.47.4.1 19-Nov-2008  snj Pull up following revision(s) (requested by hans in ticket #89):
sys/net/if_tap.c: revision 1.48
sys/net/if_etherip.c: revision 1.23
Add NULL pmf handlers. OK by cube.
 1.47.4.6.4.1 21-Apr-2010  matt sync to netbsd-5
 1.47.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.47.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.47.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.52.2.2 23-Jul-2009  jym Sync with HEAD.
 1.52.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.63.4.2 05-Mar-2011  rmind sync with head
 1.63.4.1 30-May-2010  rmind sync with head
 1.63.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.63.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.66.20.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by cube in ticket #1071):
sys/net/if_tap.c 1.76

Prevent a race between tap_dev_poll() and tap_start() by making sure the
call to selrecord() happens at splnet(). Fixes kern/47506 and kern/46199.
 1.66.18.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by cube in ticket #1071):
sys/net/if_tap.c 1.76

Prevent a race between tap_dev_poll() and tap_start() by making sure the
call to selrecord() happens at splnet(). Fixes kern/47506 and kern/46199.
 1.66.14.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by cube in ticket #1071):
sys/net/if_tap.c 1.76

Prevent a race between tap_dev_poll() and tap_start() by making sure the
call to selrecord() happens at splnet(). Fixes kern/47506 and kern/46199.
 1.66.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.66.8.1 30-Oct-2012  yamt sync with head
 1.67.2.4 03-Dec-2017  jdolecek update from HEAD
 1.67.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.67.2.2 25-Feb-2013  tls resync with head
 1.67.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.70.2.2 18-May-2014  rmind sync with head
 1.70.2.1 28-Aug-2013  rmind sync with head
 1.74.2.1 10-Aug-2014  tls Rebase.
 1.80.2.7 28-Aug-2017  skrll Sync with HEAD
 1.80.2.6 05-Feb-2017  skrll Sync with HEAD
 1.80.2.5 05-Oct-2016  skrll Sync with HEAD
 1.80.2.4 09-Jul-2016  skrll Sync with HEAD
 1.80.2.3 19-Mar-2016  skrll Sync with HEAD
 1.80.2.2 22-Sep-2015  skrll Sync with HEAD
 1.80.2.1 06-Apr-2015  skrll Sync with HEAD
 1.84.2.6 20-Mar-2017  pgoyette Sync with HEAD
 1.84.2.5 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.84.2.4 04-Nov-2016  pgoyette Sync with HEAD
 1.84.2.3 26-Jul-2016  pgoyette Rename LOCALCOUNT_INITIALIZER to DEVSW_MODULE_INIT. This better describes
what we're doing, and why.
 1.84.2.2 19-Jul-2016  pgoyette Instead of repeatedly typing the conditional initialization of the
.d_localcount members in the various {b,c}devsw, define an initializer
macro and use it. This also removes the need for defining new symbols
for each 'struct localcount'.

As suggested by riastradh@
 1.84.2.1 18-Jul-2016  pgoyette Rump drivers are always installed via devsw_attach() so we need to
always allocate a 'struct localcount' for these drivers whenever they
are built as modules.
 1.94.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.99.6.2 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating softint). Without this change, it panics. That is bad because
a resource shortage really can occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. The caller function can recover from the error cleanly by
checking the return value.
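
The caller-side pattern described above might look roughly like the following. Everything here is a stand-in: ex_if_initialize() only simulates a failing initialization, and struct ex_softc is hypothetical; the real if_initialize() takes an ifnet and lives in the kernel.

```c
#include <errno.h>
#include <stdlib.h>

struct ex_softc {
	void	*sc_buf;	/* resource acquired before initialization */
};

/* Stand-in for if_initialize(): returns an errno instead of panicking. */
static int
ex_if_initialize(int simulate_failure)
{
	return simulate_failure ? ENOMEM : 0;
}

/* Attach: on initialization failure, free resources and return. */
static int
ex_attach(struct ex_softc *sc, int simulate_failure)
{
	int rv;

	sc->sc_buf = malloc(64);
	if (sc->sc_buf == NULL)
		return ENOMEM;
	rv = ex_if_initialize(simulate_failure);
	if (rv != 0) {
		free(sc->sc_buf);
		sc->sc_buf = NULL;
		return rv;
	}
	return 0;
}

/* Attach, then detach again on success; returns the attach result. */
static int
ex_attach_detach(int simulate_failure)
{
	struct ex_softc sc = { NULL };
	int rv = ex_attach(&sc, simulate_failure);

	if (rv == 0) {
		free(sc.sc_buf);
		sc.sc_buf = NULL;
	}
	return rv;
}
```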
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.99.6.1 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #349):
sys/net/if_l2tp.c: revision 1.14
sys/net/if_tap.c: revision 1.101
sys/net/if_tun.c: revision 1.141
sys/net/if_vlan.c: revision 1.106
Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use
if_link_state_change
 1.99.4.3 17-May-2017  pgoyette Actually return the return value that we computed.
 1.99.4.2 17-May-2017  pgoyette At suggestion of chuq@, modify config_attach_pseudo() to return with a
reference held on the device.

Adapt callers to expect the reference to exist, and to ensure that the
reference is released.
 1.99.4.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.105.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.105.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.106.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.106.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.106.2.1 10-Jun-2019  christos Sync with HEAD
 1.113.2.1 01-Nov-2019  martin Pull up following revision(s) (requested by knakahara in ticket #387):

sys/net/if_gre.c: revision 1.176
sys/net/if_l2tp.c: revision 1.40
sys/dev/pci/ixgbe/ix_txrx.c: revision 1.56
sys/net/if_tap.c: revision 1.114

Fix missing kpreempt_disable() before softint_schedule() like if_vmx.c:r1.51.
 1.115.2.1 29-Feb-2020  ad Sync with head.
 1.120.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.121.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.127.4.3 12-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #852):

sys/net/if_tap.c: revision 1.134

if_tap: Explicitly include "opt_net_mpsafe.h", NFC
because it was included via <net/if.h> anyway.
 1.127.4.2 11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #825):

sys/net/if_tap.c: revision 1.129
sys/net/if_tap.c: revision 1.130
sys/net/if_tap.c: revision 1.131
sys/net/if_tap.c: revision 1.132

tap(4): Prune dead branches around tap_dev_close.
No functional change intended.
PR kern/58166

tap(4): Prune dead branches around tap_dev_destroyer.
No functional change intended.
PR kern/58166

tap(4): Use DETACH_FORCE with config_detach.
It doesn't make a difference here, because tap_detach never fails,
but let's make it more obvious at the call site that failure is
forbidden here.

No functional change intended.
PR kern/58166

tap(4): Just use mutex_enter.
PR kern/58167
 1.127.4.1 06-Jan-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #38):

sys/net/if_tap.c: revision 1.128

tap: link up an interface cloned from /dev/tap

Fix PR 57155 (partially)
 1.132.2.1 02-Aug-2025  perseant Sync with HEAD
 1.6 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.5 29-Apr-2008  martin branches: 1.5.44; 1.5.64;
Convert to new 2 clause license
 1.4 10-Sep-2007  cube branches: 1.4.20; 1.4.22; 1.4.24;
Remove 3rd clause and my name from all the licences which were only in my
name.
 1.3 10-Dec-2005  elad branches: 1.3.30; 1.3.44; 1.3.46;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.2 26-Feb-2005  perry branches: 1.2.4;
nuke trailing whitespace
 1.1 08-Jan-2005  cube branches: 1.1.2; 1.1.4; 1.1.6;
Addition of tap(4).

NAME
tap - virtual Ethernet device

SYNOPSIS
pseudo-device tap

DESCRIPTION
The tap driver allows the creation and use of virtual Ethernet devices.
Those interfaces appear just as any real Ethernet NIC to the kernel, but
can also be accessed by userland through a character device node in order
to read frames being sent by the system or to inject frames.

In that respect it is very similar to what tun(4) provides, but the added
Ethernet layer allows easy integration with machine emulators or virtual
Ethernet networks through the use of bridge(4) with tunneling.

``Qui tacet consentire videtur.''
 1.1.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.4.4 11-Dec-2005  christos Sync with head.
 1.1.4.3 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.4.2 17-Jan-2005  skrll Sync with HEAD.
 1.1.4.1 08-Jan-2005  skrll file if_tap.h was added on branch ktrace-lwp on 2005-01-17 19:32:38 +0000
 1.1.2.1 29-Apr-2005  kent sync with -current
 1.2.4.2 27-Oct-2007  yamt sync with head.
 1.2.4.1 21-Jun-2006  yamt sync with head.
 1.3.46.1 06-Nov-2007  matt sync with HEAD
 1.3.44.1 02-Oct-2007  joerg Sync with HEAD.
 1.3.30.1 09-Oct-2007  ad Sync with head.
 1.4.24.1 16-May-2008  yamt sync with head.
 1.4.22.1 18-May-2008  yamt sync with head.
 1.4.20.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.64.1 22-Sep-2015  skrll Sync with HEAD
 1.5.44.1 03-Dec-2017  jdolecek update from HEAD
 1.18 19-Jan-2020  thorpej Remove Token Ring support.
 1.17 16-Dec-2015  ozaki-r branches: 1.17.18; 1.17.24;
Fix token_rif extractions from llentry
 1.16 20-Feb-2008  matt branches: 1.16.54; 1.16.74;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.15 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.14 01-Sep-2007  dyoung branches: 1.14.6; 1.14.8; 1.14.12;
token_addmulti and token_delmulti are never used in the kernel, so
delete them.
 1.13 04-Mar-2007  christos branches: 1.13.2; 1.13.10; 1.13.14; 1.13.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 11-Dec-2005  thorpej branches: 1.12.26;
ANSI function decls and application of static.
 1.11 11-Dec-2005  christos merge ktrace-lwp.
 1.10 26-Feb-2005  perry branches: 1.10.4;
nuke trailing whitespace
 1.9 10-Nov-2003  wiz branches: 1.9.8; 1.9.10;
Spell address with two d's. Inspired by similar changes in OpenBSD,
originating from Jonathon Gray and forwarded by jmc@openbsd.
 1.8 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.7 27-Feb-2000  soren branches: 1.7.28;
Add empty token_ifdetach().
 1.6 19-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol
data formats.
 1.5 30-May-1999  bad branches: 1.5.2; 1.5.8;
Fix thinko of mine in previous. The source route info is not at m->m_data
after various m_adj()s have been done. Kludge around this with a cheesy
macro that knows where the drivers put the mac header in the first mbuf.

XXX There should be a better way to do this.
 1.4 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.3 08-Apr-1999  bad Fix version id strings in comment.
 1.2 28-Mar-1999  kleink branches: 1.2.2;
ANSI C police.
 1.1 22-Mar-1999  bad Support routines for Token-Ring network drivers.

By Onno van der Linden.
 1.2.2.1 08-Apr-1999  bad branches: 1.2.2.1.2;
Pull up if_token.h:1.3 and if_tokensubr.c:1.4
Fix version id strings in comments which tell where the files are derived from.
 1.2.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.5.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.28.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.7.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.28.1 03-Aug-2004  skrll Sync with HEAD
 1.9.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.8.1 29-Apr-2005  kent sync with -current
 1.10.4.4 27-Feb-2008  yamt sync with head.
 1.10.4.3 21-Jan-2008  yamt sync with head
 1.10.4.2 03-Sep-2007  yamt sync with head.
 1.10.4.1 21-Jun-2006  yamt sync with head.
 1.12.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.13.16.3 23-Mar-2008  matt sync with HEAD
 1.13.16.2 09-Jan-2008  matt sync with HEAD
 1.13.16.1 06-Nov-2007  matt sync with HEAD
 1.13.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.13.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.13.2.1 09-Oct-2007  ad Sync with head.
 1.14.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.14.8.1 26-Dec-2007  ad Sync with head.
 1.14.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.16.74.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.16.54.1 03-Dec-2017  jdolecek update from HEAD
 1.17.24.1 25-Jan-2020  ad Sync with head.
 1.17.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.85 19-Jan-2020  thorpej Remove Token Ring support.
 1.84 05-Feb-2019  msaitoh branches: 1.84.6;
Remove very old IFF_NOTRAILERS flag.
 1.83 09-May-2018  maxv branches: 1.83.2;
Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.82 26-Apr-2018  maxv m_copy -> m_copym
 1.81 31-Jan-2017  maxv branches: 1.81.12;
Correctly handle the return value of arpresolve, otherwise we either leak
memory or use some we already freed.

Sent on tech-net, ok christos
 1.80 24-Jan-2017  maxv Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.79 11-Jan-2017  ozaki-r branches: 1.79.2;
Get rid of unnecessary header inclusions
 1.78 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

The ifqueue is shared between a Layer 2 routine and a Layer 3 routine,
so IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet, which is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.77 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.76 28-Apr-2016  ozaki-r branches: 1.76.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps the forthcoming locking (or psrefing) of rtentries.
 1.75 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.74 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.73 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.72 16-Dec-2015  ozaki-r Fix token_rif extractions from llentry
 1.71 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
  of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.70 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.69 01-Jul-2015  ozaki-r Use ip_hresolv_output for if_token as well

I thought we could not apply ip_hresolv_output to if_token because
rt0 looked to be needed by arpresolve in token_output. However,
rt0 is actually not used by arpresolve in NetBSD (see obsolete
ARPRESOLVE macro).
 1.68 25-May-2015  ozaki-r Remove leftover DECNET-related stuff

No objection on tech-kern and tech-net.
 1.67 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.66 28-Nov-2014  ozaki-r branches: 1.66.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.65 05-Jun-2014  rmind branches: 1.65.2; 1.65.4; 1.65.6;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.64 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.63 15-May-2014  msaitoh Save a NETISR_* value in a variable and call schednetisr() after enqueueing
a packet, for readability and future modification.
 1.62 01-Mar-2013  joerg branches: 1.62.6; 1.62.10;
Retire OSI network stack. OK core@
 1.61 19-Jul-2011  tron branches: 1.61.2; 1.61.8; 1.61.12; 1.61.14; 1.61.18;
Fix weird hardware address assignment that GCC 4.5 complains about.
 1.60 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.59 19-Jan-2010  pooka branches: 1.59.2; 1.59.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.58 20-Nov-2009  christos ar_tha() can return NULL; treat this as an error.
 1.57 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.56 18-Mar-2009  cegger bcopy -> memcpy
 1.55 07-Nov-2008  dyoung branches: 1.55.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
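
As a rough illustration of the check described in the paragraph above, the following C sketch rejects an all-zero address and any address with the group (multicast/broadcast) bit set. The function name lladdr_is_acceptable and the constant ETHER_ADDR_LEN_SKETCH are hypothetical and are not taken from ether_ioctl(); the real kernel code may well be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 6-octet Ethernet address length. */
#define ETHER_ADDR_LEN_SKETCH 6

/*
 * Return true if addr may be used as a unicast link-layer address:
 * it must not be all zeros, and the group bit (the least significant
 * bit of the first octet) must be clear, which excludes both multicast
 * and broadcast addresses.
 */
static bool
lladdr_is_acceptable(const uint8_t addr[ETHER_ADDR_LEN_SKETCH])
{
	bool all_zero = true;

	for (size_t i = 0; i < ETHER_ADDR_LEN_SKETCH; i++) {
		if (addr[i] != 0) {
			all_zero = false;
			break;
		}
	}
	if (all_zero)
		return false;
	if (addr[0] & 0x01)
		return false;
	return true;
}
```

Under this sketch, 02:de:ad:be:ef:02 (the address used in the ifconfig example in the summary) is accepted, while 00:00:00:00:00:00 and ff:ff:ff:ff:ff:ff are refused.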

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.54 28-Apr-2008  martin branches: 1.54.6; 1.54.8; 1.54.10; 1.54.12; 1.54.14;
Remove clause 3 and 4 from TNF licenses
 1.53 20-Feb-2008  matt branches: 1.53.6; 1.53.8; 1.53.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.52 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.51 19-Oct-2007  ad branches: 1.51.4; 1.51.8;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.50 30-Aug-2007  dyoung branches: 1.50.4;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
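
The idea of a setter that cannot overrun its buffer can be sketched in C as follows. This is not the NetBSD sockaddr_dl_setaddr() implementation: struct dl_sketch, dl_sketch_setaddr(), and the field names are hypothetical, chosen only to show a length-checked copy taking the place of a raw memcpy(LLADDR(), ...).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a variable-length link-layer sockaddr. */
struct dl_sketch {
	uint8_t len;      /* total size of this object in bytes */
	uint8_t alen;     /* length of the address stored in data[] */
	uint8_t data[12]; /* storage for the address */
};

/*
 * Copy addrlen bytes of addr into dl, but only if the caller-supplied
 * object size (socklen) has room for them. Returns dl on success and
 * NULL if the copy would run past the end of the object.
 */
static struct dl_sketch *
dl_sketch_setaddr(struct dl_sketch *dl, size_t socklen,
    const void *addr, size_t addrlen)
{
	if (addrlen > UINT8_MAX ||
	    socklen < offsetof(struct dl_sketch, data) + addrlen)
		return NULL;
	memcpy(dl->data, addr, addrlen);
	dl->alen = (uint8_t)addrlen;
	return dl;
}
```

The caller passes the real size of the object it allocated, so an oversized address is refused instead of being written past the end; the stored address is left untouched on failure.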
 1.49 26-Aug-2007  dyoung branches: 1.49.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.48 07-Aug-2007  dyoung branches: 1.48.2;
Constify. bcopy -> memcpy.
 1.47 21-Jul-2007  dyoung branches: 1.47.4;
Use NULL instead of 0 for null pointers.
 1.46 04-Mar-2007  christos branches: 1.46.2; 1.46.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.45 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.44 10-Dec-2006  is branches: 1.44.2;
Explain XID magic constants, correcting the format ID
 1.43 10-Dec-2006  is oops, forgot lan_hdr_len length offset
 1.42 10-Dec-2006  is oops, forgot the m_adj
 1.41 10-Dec-2006  is was wrong magic constant. no functional change.
 1.40 10-Dec-2006  is Avoid overlapping struct assignment, like in the Ethernet and FDDI cases.
 1.39 07-Sep-2006  dogcow branches: 1.39.2; 1.39.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.38 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
  {get,}{micro,nano,bin}time()
  get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.37 18-May-2006  liamjfoy branches: 1.37.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.36 12-May-2006  mrg since ar_tha() can return NULL, don't pass it directly to functions
that expect real addresses. explicitly KASSERT() that it is not
NULL in the kernel and just avoid using it userland.

(the kernel could be more defensive about this, but, until now it
would have just crashed anyway.)
 1.35 15-Apr-2006  christos Coverity CID 1147: Protect against NULL deref.
 1.34 11-Dec-2005  thorpej branches: 1.34.4; 1.34.6; 1.34.8; 1.34.10; 1.34.12;
ANSI function decls and application of static.
 1.33 11-Dec-2005  christos merge ktrace-lwp.
 1.32 30-May-2005  christos branches: 1.32.2;
bcopy -> memcpy
bcmp -> memcmp
and remove casts.
 1.31 31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that has one queue, and one used by
the point to point interfaces that has two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.30 26-Feb-2005  perry nuke trailing whitespace
 1.29 30-Dec-2004  reinoud branches: 1.29.2; 1.29.4;
Fixup of bad patch made by me; the #ifdef ought to be also including ISO
rather than being removed. Also fixed a small comment about the scope of
#if's.

This code is a bit ugly IMHO but as long as we don't have to change it ....
 1.28 29-Dec-2004  reinoud Remove conditional around label. It's always used in the code and thus not
explicitly only for the protocols indicated by the #if

Although it's unlikely a kernel will be built without NET_INET, it will
fail compilation here when NET_INET is not defined.
 1.27 06-Dec-2004  christos Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.26 18-Jun-2004  wiz Onno van der Linden assigned copyright for his work on this file
to TNF. Change license accordingly. Ok'd by christos for board.
 1.25 22-Mar-2004  matt Update my copyright to not include advertising clause.
 1.24 05-Sep-2003  itojun u_short -> u_int16_t
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 23-May-2003  itojun branches: 1.22.2;
don't call if_free_sadl() until very end of if_detach() logic. many of
routing table manipulation code assumes the presence of AF_LINK sockaddr.
should fix PR 21581
 1.21 02-May-2003  itojun KNF
 1.20 01-May-2003  itojun consistency; use tokenbroadcastaddr, not ether*.
 1.19 12-Nov-2001  lukem branches: 1.19.10;
add RCSIDs
 1.18 14-Jun-2001  itojun branches: 1.18.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange gets updated on every packet transmission/receipt.
now: if_lastchange gets updated when IFF_UP is changed.
 1.17 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.16 28-Feb-2001  wiz branches: 1.16.2;
Fix pasto reported in kern/12241 by Michael van Elst.
 1.15 17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.14 18-Dec-2000  thorpej Fill in if_dlt.
 1.13 13-Dec-2000  thorpej Add ALTQ glue.
 1.12 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.11 02-Oct-2000  itojun fix compilation without INET.
 1.10 14-Jun-2000  mycroft Check the multicast bit in the header mbuf while interrupts are still blocked.
Otherwise we can run off into space if the packet was sent immediately and the
mbuf freed.
Pointed out by Boris Popov (not on our lists).
 1.9 30-Mar-2000  augustss branches: 1.9.2;
Kill some more register declarations.
 1.8 27-Feb-2000  soren Add empty token_ifdetach().
 1.7 30-May-1999  bad branches: 1.7.2;
Fix thinko of mine in previous. The source route info is not at m->m_data
after various m_adj()s have been done. Kludge around this with a cheesy
macro that knows where the drivers put the mac header in the first mbuf.

XXX There should be a better way to do this.
 1.6 29-May-1999  bad Don't assume the Token-Ring source route is in the m_pktdat. Use
m_data instead. This isn't a problem with ARP packets but is the correct
way to do this.

Noticed by pmara@cactus.org (Shashi Mara).
 1.5 18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.4 08-Apr-1999  bad Fix version id strings in comment.
 1.3 22-Mar-1999  bad branches: 1.3.2;
Appease GCC.
#ifdef FreeBSD some debug code as is done in if_fddisubr.c.
 1.2 22-Mar-1999  bad Oops. RcsID police.
 1.1 22-Mar-1999  bad Support routines for Token-Ring network drivers.

By Onno van der Linden.
 1.3.2.1 08-Apr-1999  bad branches: 1.3.2.1.2;
Pull up if_token.h:1.3 and if_tokensubr.c:1.4
Fix version id strings in comments which tell where the files are derived from.
 1.3.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.7.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.7.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.7.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.7.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.7.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.16.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.18.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.10.1 24-Jun-2003  grant Pull up revision 1.22 (requested by itojun in ticket #1325):

don't call if_free_sadl() until very end of if_detach() logic. many of
routing table manipulation code assumes the presence of AF_LINK sockaddr.
should fix PR 21581
 1.22.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.22.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.22.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.29.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.29.2.1 29-Apr-2005  kent sync with -current
 1.32.2.7 27-Feb-2008  yamt sync with head.
 1.32.2.6 21-Jan-2008  yamt sync with head
 1.32.2.5 27-Oct-2007  yamt sync with head.
 1.32.2.4 03-Sep-2007  yamt sync with head.
 1.32.2.3 26-Feb-2007  yamt sync with head.
 1.32.2.2 30-Dec-2006  yamt sync with head.
 1.32.2.1 21-Jun-2006  yamt sync with head.
 1.34.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.34.10.1 19-Apr-2006  elad sync with head.
 1.34.8.3 14-Sep-2006  yamt sync with head.
 1.34.8.2 26-Jun-2006  yamt sync with head.
 1.34.8.1 24-May-2006  yamt sync with head.
 1.34.6.3 01-Jun-2006  kardel Sync with head.
 1.34.6.2 22-Apr-2006  simonb Sync with head.
 1.34.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.34.4.1 09-Sep-2006  rpaulo sync with head
 1.37.2.1 19-Jun-2006  chap Sync with head.
 1.39.4.1 18-Dec-2006  yamt sync with head.
 1.39.2.1 12-Jan-2007  ad Sync with head.
 1.44.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.44.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.46.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.46.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.46.2.3 23-Oct-2007  ad Sync with head.
 1.46.2.2 09-Oct-2007  ad Sync with head.
 1.46.2.1 20-Aug-2007  ad Sync with HEAD.
 1.47.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.47.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.47.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.48.2.2 07-Aug-2007  dyoung Constify. bcopy -> memcpy.
 1.48.2.1 07-Aug-2007  dyoung file if_tokensubr.c was added on branch matt-mips64 on 2007-08-07 04:41:47 +0000
 1.49.2.3 23-Mar-2008  matt sync with HEAD
 1.49.2.2 09-Jan-2008  matt sync with HEAD
 1.49.2.1 06-Nov-2007  matt sync with HEAD
 1.50.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.51.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.51.4.1 26-Dec-2007  ad Sync with head.
 1.53.10.4 11-Aug-2010  yamt sync with head.
 1.53.10.3 11-Mar-2010  yamt sync with head
 1.53.10.2 04-May-2009  yamt sync with head.
 1.53.10.1 16-May-2008  yamt sync with head.
 1.53.8.1 18-May-2008  yamt sync with head.
 1.53.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.53.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.54.14.1 21-Apr-2010  matt sync to netbsd-5
 1.54.12.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.54.10.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.54.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.54.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.54.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.55.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.59.4.1 30-May-2010  rmind sync with head
 1.59.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.61.18.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.61.14.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.61.12.3 03-Dec-2017  jdolecek update from HEAD
 1.61.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.61.12.1 23-Jun-2013  tls resync from head
 1.61.8.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1429):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.61.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.62.10.1 10-Aug-2014  tls Rebase.
 1.62.6.2 18-May-2014  rmind sync with head
 1.62.6.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.65.6.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.65.4.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.65.2.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1355):
sys/net/if_arcsubr.c: revision 1.76 via patch
sys/net/if_ecosubr.c: revision 1.50 via patch
sys/net/if_ethersubr.c: revision 1.236 via patch
sys/net/if_fddisubr.c: revision 1.104 via patch
sys/net/if_tokensubr.c: revision 1.80 via patch
Don't forget to free the mbuf when we decide not to reply to an ARP
request. This obviously is a terrible bug, since it allows a remote sender
to DoS the system with specially-crafted requests sent in a loop.
 1.66.2.8 05-Feb-2017  skrll Sync with HEAD
 1.66.2.7 05-Oct-2016  skrll Sync with HEAD
 1.66.2.6 29-May-2016  skrll Sync with HEAD
 1.66.2.5 22-Apr-2016  skrll Sync with HEAD
 1.66.2.4 19-Mar-2016  skrll Sync with HEAD
 1.66.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.66.2.2 22-Sep-2015  skrll Sync with HEAD
 1.66.2.1 06-Jun-2015  skrll Sync with HEAD
 1.76.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.76.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.76.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.79.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.81.12.2 21-May-2018  pgoyette Sync with HEAD
 1.81.12.1 02-May-2018  pgoyette Synch with HEAD
 1.83.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.83.2.1 10-Jun-2019  christos Sync with HEAD
 1.84.6.1 25-Jan-2020  ad Sync with head.
 1.177 18-Sep-2024  rin tun(4): Mark tunread_filtops `FILTEROP_MPSAFE`

Filter handlers have already been MP-safe since 2018:
https://mail-index.netbsd.org/source-changes/2018/08/06/msg097317.html

Note that we do not expect deadlocks similar to bpf(4) (PR kern/58531),
between KERNEL_LOCK and the spin mutex for the TX queue.

For tun(4), filt_tunread() acquires an adaptive mutex. This is forbidden
when a spin mutex is already held.

Such a path must have already been detected if present.

Thanks ozaki-r@ for discussion.
 1.176 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.175 09-Mar-2024  riastradh branches: 1.175.2;
tun(4): Allow IPv6 packets with TUNSLMODE configured.

PR kern/58013
 1.174 29-Dec-2023  chs tun: add missing kpreempt_enable() if pktq_enqueue() fails
 1.173 28-Mar-2022  riastradh branches: 1.173.4; 1.173.8;
driver(9): devsw_detach never fails. Make it return void.

Prune a whole lotta dead branches as a result of this. (Some logic
calling this is also wrong for other reasons; devsw_detach is final
-- you should never have any reason to decide to roll it back. To be
cleaned up in subsequent commits...)

XXX kernel ABI change to devsw_detach signature requires bump
 1.172 15-Mar-2022  riastradh tun(4): Fix bug introduced in previous locking change.

Now that tun_lock runs at IPL_NONE, taking it does not have the side
effect of disabling preemption, but pktq_enqueue assumes the caller
has disabled preemption so it can safely schedule a softint.

This isn't a problem in most physical network drivers because the
pktq_enqueue call happens from within the driver's softint context
anyway. But tun(4) is special -- here, the pktq_enqueue is triggered
by a userland write to the device, which is in thread context. So
let's just disable preemption in tunwrite.

Reported-by: syzbot+21c2cb300f1ec2162b35@syzkaller.appspotmail.com
 1.171 13-Mar-2022  riastradh tun(4): Fix some error branches in tunwrite.
 1.170 13-Mar-2022  riastradh tun(4): Omit TUN_RWAIT micro-optimization.

cv_broadcast already has a fast path for no-waiters.
 1.169 13-Mar-2022  riastradh tun(4): Deliver SIGIO for hangup under tun_lock.

Otherwise, tp->tun_pgid is not stable.
 1.168 13-Mar-2022  riastradh tun(4): Reduce lock from IPL_NET to IPL_SOFTNET.

This is never taken from hardware interrupt handlers any more, as far
as I can tell -- only SOFTINT_NET soft interrupt handlers.

This avoids trying to take an adaptive lock, proc_lock, in fownsignal
while holding a spin lock. Unfortunately, it doesn't entirely fix the
problem -- proc_lock is at IPL_NONE, and is held across some not
entirely trivial computations like allocating a new pid table. So it
would really be better if we had some way to deliver SIGIO without
taking proc_lock.

Reported-by: syzbot+3dd54993d3e92e697e72@syzkaller.appspotmail.com
Reported-by: syzbot+aca29415f2f0bf23f082@syzkaller.appspotmail.com
 1.167 13-Mar-2022  riastradh tun(4): Reduce tun_softc_lock from IPL_NET to IPL_NONE.

This is always taken in process/thread context, never in interrupt
context, hard or soft.
 1.166 13-Mar-2022  riastradh tun(4): Factor out setup/teardown into separate routines.

- Reduce duplication.
- Plug softint leak on recycling tun.

(This recycling business seems kinda sketchy...)
 1.165 13-Mar-2022  riastradh tun(4): Add missing cv_destroy in tunclose.
 1.164 26-Sep-2021  thorpej Use seltrue_filtops rather than rolling our own with filt_seltrue.
 1.163 26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.162 18-Dec-2020  thorpej Use sel{record,remove}_knote().
 1.161 27-Sep-2020  roy branches: 1.161.2;
tun: Report link state based on if the interface has been opened or not

This mirrors tap(4).
 1.160 29-Aug-2020  maxv Correct my rev1.159, it was incomplete, the check must be done later
because the value can change in the meantime (and get set to zero).
 1.159 23-Jun-2020  maxv Hum. Fix NULL deref triggerable with just write(0).

Reported-by: syzbot+45b31355bf880e175b73@syzkaller.appspotmail.com
 1.158 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.157 13-Dec-2019  maxv branches: 1.157.2;
Read the len before pushing the packet, otherwise possible use-after-free.
Found by a custom query on LGTM.
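
The ordering problem fixed in revision 1.157 can be illustrated with a small user-space sketch. This is not the tun(4) code: struct pkt, struct stats, enqueue_and_consume() and send_pkt() are hypothetical names, and the consumer is assumed to free the packet as soon as it is handed over, which is the worst case the fix guards against.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet and statistics types, for illustration only. */
struct pkt {
	size_t len;
};

struct stats {
	size_t bytes;
	size_t packets;
};

/*
 * Hypothetical consumer: takes ownership of the packet and is assumed
 * to free it immediately.
 */
static void
enqueue_and_consume(struct pkt *p)
{
	free(p);
}

/* Returns 0 on success, -1 if the packet could not be allocated. */
static int
send_pkt(struct stats *st, size_t len)
{
	struct pkt *p = malloc(sizeof(*p));
	size_t saved_len;

	if (p == NULL)
		return -1;
	p->len = len;

	saved_len = p->len;	/* read the length before the handoff */
	enqueue_and_consume(p);	/* p must not be dereferenced after this */

	st->bytes += saved_len;	/* uses the saved copy, not p->len */
	st->packets++;
	return 0;
}
```

Reading p->len after the call to enqueue_and_consume() would touch freed memory, so the length is copied into a local first.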
 1.156 26-Apr-2019  pgoyette branches: 1.156.2;
Set the "required modules" to NULL, not to an empty string.

It really doesn't make that much difference to the code, but the output
from modstat(8) is different! (With an empty string in the MODULE() macro
modstat reports an empty string, but with a NULL in the macro, modstat
prints a '-' just like it does for other "empty" fields.)
 1.155 25-Mar-2019  pgoyette in tundetach(), error is only used #ifdef _MODULE so wrap its declaration.
 1.154 25-Mar-2019  pgoyette Resequence the stuff in tundetach() to ensure that no new device units
can be created by either 'ifconfig create' or 'open("/dev/tun0")' paths.

Note: previous efforts at fixing 'modunload if_tun' are abandoned, since
there is no bug. Just need to ensure that the cloned interface is both
close(1)d _and_ 'ifconfig tunx destroy' before trying to unload.
 1.153 25-Mar-2019  msaitoh Revert rev. 1.151 and 1.152 to avoid compile error. Requested by pgoyette.
 1.152 25-Mar-2019  pgoyette Use correct list name
 1.151 25-Mar-2019  pgoyette This should do it!

Remove the zombie unit from the zombie list, not the regular list!
 1.150 25-Mar-2019  pgoyette And revert both of the previous. It seems that the structure has
already been removed from the list in the find_zunit() code.

So now, off to really find out why the module won't unload.
 1.149 25-Mar-2019  pgoyette Fix previous - remove it from the list before freeing the memory.
 1.148 25-Mar-2019  pgoyette If the unit being closed was a "zombie" (ie, the interface was destroyed
previously), remove it from the zombie list after freeing all of its
resources.

This should allow the module to be unloaded even if there was a zombie
at some point. Without this change, the zombie list never gets emptied.
 1.147 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
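
The truncation hazard described in revision 1.147 can be sketched in user space as follows. The names uimin_demo and MIN_DEMO are stand-ins invented for this example, not the libkern definitions, and the sketch assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in with the same shape as the renamed helper: unsigned int operands. */
static unsigned int
uimin_demo(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: compares the operands in their own (wider) type. */
#define MIN_DEMO(a, b) ((a) < (b) ? (a) : (b))

static uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	/* The casts spell out the narrowing an implicit call does silently. */
	return uimin_demo((unsigned int)len, (unsigned int)limit);
}

static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return MIN_DEMO(len, limit);
}
```

With a length just above 2^32, the function form compares only the low 32 bits, while the macro form compares the full 64-bit values (at the cost of evaluating its arguments more than once).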
 1.146 06-Aug-2018  ozaki-r Fix tun(4) kevent locking

filt_tunread gets called in two contexts:

- by calls to selnotify in if_tun.c (or knote, as the case may be,
but not here), in which case tp->tun_lock is held; and

- by internal logic in kevent, in which tp->tun_lock is not held.

The standard convention to discriminate between these two cases is by
setting the kernel-only NOTE_SUBMIT bit in the hint to selnotify or
knote; then in filt_*:

	if (hint & NOTE_SUBMIT)
		KASSERT(mutex_owned(&tp->tun_lock));
	else
		mutex_enter(&tp->tun_lock);
	...
	if (hint & NOTE_SUBMIT)
		KASSERT(mutex_owned(&tp->tun_lock));
	else
		mutex_exit(&tp->tun_lock);

Pointed out by and patch from riastradh@
Tested by ozaki-r@ (only the former path)
 1.145 03-Aug-2018  ozaki-r tun: fix locking against myself

filt_tunread is called with tun_lock held from tun_output (via tun_output =>
selnotify => knote), so we must not take tun_lock in filt_tunread. The bug
is triggered only if a tun is used through kqueue.

Found by k-goda@IIJ
 1.144 26-Jun-2018  msaitoh branches: 1.144.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misunderstood in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.143 16-Mar-2018  tih Add packet filtering to tun(4) interfaces.

Calls to pfil_run_hooks() were missing in if_tun.c. This meant that
filtering configuration could be added to e.g. /etc/npf.conf, but
would be ignored, because the filter never saw the packets. This
change adds the required calls.

While here, correct the return value from tun_output(): it's been
returning 0 regardless of any error condition present, but will now
correctly propagate such information upward.

Thanks to maxv for guidance!

OK: christos, martin
 1.142 06-Dec-2017  ozaki-r branches: 1.142.2;
Ensure IFF_RUNNING is not turned on for an interface until its initialization completes

Also ensure it is turned off before destruction, as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful, but I believe the
change is still proper.)
 1.141 30-Oct-2017  ozaki-r Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use if_link_state_change
 1.140 25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
 1.139 24-May-2017  pgoyette branches: 1.139.2;
Call cv_destroy() to deactivate the tun_cv before calling kmem_intr_free()
to deallocate the containing memory chunk (the tunnel's softc). Otherwise
a LOCKDEBUG kernel will panic in tun_clone_destroy().

Fixes PR kern/52255
 1.138 29-Jan-2017  maya branches: 1.138.4;
Most error paths that goto out; don't hold tun_lock.
so don't mutex_exit(tun_lock) in them, but only in
the one that needs it.

ok skrll
 1.137 26-Jan-2017  skrll Fix logic inversion spotted by paulg
 1.136 26-Jan-2017  skrll Make MP-safe and use kmem(9)

Mostly from rmind-smpnet
 1.135 23-Jan-2017  skrll KNF. Same code before and after.
 1.134 11-Jan-2017  ozaki-r branches: 1.134.2;
Get rid of unnecessary header inclusions
 1.133 02-Oct-2016  christos MFREE -> m_free
 1.132 07-Sep-2016  ozaki-r Fix tun_enable

Before the rearrangement of ifaddr initializations (in.c,v 1.169),
when we called tun_enable via ioctl(SIOCINITIFADDR), an ifaddr
in question was inserted in the interface address list. However,
after the change the ifaddr isn't in the list at that point. So
we shouldn't rely on being able to find the ifaddr by
IFADDR_READER_FOREACH. Instead simply use the ifaddr passed by
ioctl(SIOCINITIFADDR).
 1.131 07-Sep-2016  ozaki-r Rename tuncreate to tun_enable

It should be more proper.
 1.130 05-Sep-2016  ozaki-r Support tun devices on rump kernels
 1.129 05-Sep-2016  ozaki-r Fix typo in a comment
 1.128 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.127 07-Jul-2016  ozaki-r branches: 1.127.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.126 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receive interface of an mbuf.
They are the counterpart of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif handling and reduce the diff of
the upcoming change.

No functional change.
 1.125 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.124 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.123 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.122 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.121 20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.120 25-Jul-2014  dholland branches: 1.120.4;
Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.119 19-Jun-2014  ws Enqueue the mbuf with the start of the packet,
not some intermediate one (hi, rmind!).
 1.118 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.117 20-Mar-2014  skrll branches: 1.117.2;
Mechanically replace simplelock with kmutex_t.
 1.116 16-Mar-2014  dholland Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.115 28-Jan-2012  rmind branches: 1.115.6; 1.115.10;
Replace tun_lock with mutex(9). XXX: too far from being MP-safe yet.
 1.114 28-Oct-2011  dyoung branches: 1.114.2; 1.114.6;
For these interfaces, the implementation of SIOCSIFDSTADDR is identical
to SIOCINITIFADDR, and SIOCSIFDSTADDR callers always fall back to
SIOCINITIFADDR, so just get rid of the SIOCSIFDSTADDR case.
 1.113 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.112 19-Jan-2010  pooka branches: 1.112.2; 1.112.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.111 08-May-2009  elad Add and use a network scope action/request for tun(4), similar to ppp(4),
sl(4), and strip(4).
 1.110 20-Nov-2008  dyoung branches: 1.110.4;
Update comment for last.
 1.109 20-Nov-2008  dyoung In the new ifioctl order, tun_ioctl() can call itself through
ifioctl_common(). Since the first tun_ioctl() call already holds
the simplelock, the second tun_ioctl() call will wait forever to
acquire it: deadlock.

To fix this, wait to acquire the lock until tuninit().
 1.108 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
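
The link-layer address restriction described in revision 1.108 (no all-zero address, no multicast or broadcast address) can be sketched as a standalone check. This is only an illustration, not the ether_ioctl() code: lladdr_acceptable() and ETHER_ADDR_LEN_DEMO are hypothetical names, and the test relies on the standard Ethernet convention that the least significant bit of the first octet marks a group (multicast or broadcast) address.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETHER_ADDR_LEN_DEMO 6	/* hypothetical name; an Ethernet address is 6 octets */

/*
 * Return true if addr is usable as an interface's unicast link-layer
 * address: the group bit is clear and the address is not all zeros.
 */
static bool
lladdr_acceptable(const uint8_t addr[ETHER_ADDR_LEN_DEMO])
{
	uint8_t any = 0;
	int i;

	/* Group bit set: multicast, which includes ff:ff:ff:ff:ff:ff. */
	if (addr[0] & 0x01)
		return false;

	/* OR all octets together; zero means 00:00:00:00:00:00. */
	for (i = 0; i < ETHER_ADDR_LEN_DEMO; i++)
		any |= addr[i];

	return any != 0;
}
```

Under this check the address from the summary, 02:de:ad:be:ef:02, is accepted, while 00:00:00:00:00:00, ff:ff:ff:ff:ff:ff and 01:00:5e:00:00:01 are refused.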
 1.107 15-Jun-2008  christos branches: 1.107.2; 1.107.4;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.106 24-Apr-2008  ad branches: 1.106.2; 1.106.4; 1.106.6;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.105 21-Mar-2008  ad branches: 1.105.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.104 01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.103 20-Feb-2008  matt branches: 1.103.2; 1.103.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.102 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.101 04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.100 05-Dec-2007  pooka branches: 1.100.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.99 19-Oct-2007  ad branches: 1.99.2; 1.99.4;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.98 01-Sep-2007  dyoung branches: 1.98.4;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.97 04-Mar-2007  christos branches: 1.97.2; 1.97.10; 1.97.14; 1.97.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.96 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.95 04-Jan-2007  elad branches: 1.95.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.94 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.93 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.92 07-Sep-2006  dogcow branches: 1.92.2; 1.92.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.91 30-Aug-2006  christos fix initializer
 1.90 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.89 14-May-2006  elad integrate kauth.
 1.88 18-Apr-2006  rpaulo Fix another typo... I must be on drugs...
 1.87 08-Apr-2006  rpaulo IFHEAD and PREPADDR are mutually exclusive. From FreeBSD.
 1.86 04-Apr-2006  rpaulo Add another bit from FreeBSD that I forgot: in tun_output, don't try to send
an AF_INET packet if TUN_IFHEAD is not set.
From FreeBSD and spotted (again) by DEGROOTE Arnaud.
 1.85 04-Apr-2006  rpaulo Fix an if-clause botched in a previous revision now that we have TUN_IFHEAD.
Spotted by DEGROOTE Arnaud <degroote@enseirb.fr>.
 1.84 03-Apr-2006  rpaulo Implement TUN_IFHEAD, the missing piece that was breaking old applications.
 1.83 29-Mar-2006  rpaulo Add missing break in tunwrite() which was causing EAFNOSUPPORT to be
returned, thus making IPv6 support broken.
!@#$%^...
 1.82 03-Mar-2006  rpaulo branches: 1.82.2; 1.82.4; 1.82.6;
Some minor KNF.
 1.81 03-Mar-2006  rpaulo Fix typo in comment.
 1.80 28-Feb-2006  rpaulo Add full support for IPv6 tunnels. From DEGROOTE Arnaud in PR 32944.
The PR submitter and the PR handler were unable to test this code
using Teredo userland clients such as Miredo. However, the PR handler
dumped and analyzed some of the packets produced by Miredo and they
seemed fine.
(On a side note: I was unable to setup Teredo in Windows XP and the
problem seemed similar to what I currently see in NetBSD: lack of
replies from the Teredo relay).
 1.79 05-Feb-2006  rpaulo Add preliminary/not tested support for IPv6.
 1.78 11-Dec-2005  thorpej branches: 1.78.2; 1.78.4; 1.78.6;
ANSI function decls and application of static.
 1.77 11-Dec-2005  christos merge ktrace-lwp.
 1.76 24-Jan-2005  matt branches: 1.76.8;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.75 06-Dec-2004  christos branches: 1.75.4;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.74 04-Dec-2004  peter Remove redundant conditional; NTUN is always 1 when this file is compiled.
Also remove tun.h include, since it's no longer needed.
 1.73 04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.72 19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.71 06-Jun-2004  dyoung Use bpf_mtap2 in tun(4).
 1.70 14-May-2004  pk Fix locking issues noticed by Tom Ivar Helbekkmo on tech-net:
* always acquire the device instance lock at splnet()
* missing unlocks in various places

Also, since this driver allows its device instances to be manipulated by two
independent subsystems (character device & interface clone create/destroy),
be careful not to rip away instance data in a clone destroy request if the
instance is still opened as a character device.
 1.69 13-May-2004  tron Initialize interface type to IFT_TUNNEL as suggested by Erik Ånggård
in PR kern/25555.
 1.68 01-Mar-2004  tron branches: 1.68.2;
Don't leak memory if a copyin fails.
 1.67 22-Sep-2003  cl pass signo to fownsignal #ifdef ALTQ
 1.66 22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.65 22-Sep-2003  jdolecek kill unused variable in #ifdef ALTQ part, to make this compile
with ALTQ configured in
 1.64 21-Sep-2003  jdolecek cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.63 29-Jun-2003  fvdl branches: 1.63.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.62 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.61 02-May-2003  itojun KNF
 1.60 01-May-2003  itojun bpf_mtap() does not care about M_PKTHDR at the top. M_COPY_PKTHDR has some
consequences, so avoid it. if we need to attach dummy headers, we should
use M_PREPEND instead.
 1.59 13-Mar-2003  dsl Validate pgid arg to TIOCSPGRP
 1.58 25-Dec-2002  jdolecek count input/output bytes for tun device
Problem reported and patch provided in PR kern/19554 by Michael van Elst
 1.57 26-Nov-2002  christos si_ -> sel_
 1.56 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.55 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.54 23-Sep-2002  simonb uio_resid is a size_t (ie, unsigned), so don't check if it's less than 0.
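The point behind revision 1.54 can be sketched in a few lines of C. The helper names below are hypothetical and are not taken from if_tun.c; they only illustrate that an unsigned count wraps around instead of going negative, so the comparison has to happen before the subtraction.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not taken from
 * if_tun.c. They show why a "less than 0" test on a size_t is dead code.
 */

/* Subtracting past zero wraps around instead of going negative. */
static size_t
consume_unchecked(size_t resid, size_t n)
{
	return resid - n;
}

/* Compare before subtracting; returns 0 on success, -1 if n is too big. */
static int
consume_checked(size_t *resid, size_t n)
{
	if (n > *resid)
		return -1;
	*resid -= n;
	return 0;
}
```

With a residual count of 2, asking consume_unchecked() for 3 bytes yields SIZE_MAX, while consume_checked() refuses and leaves the count untouched.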
 1.53 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel,
and the LKM framework refers to it to assign device major numbers dynamically.
 1.52 29-Jul-2002  atatat Make tun interfaces perform auto-creation. This means that if a
program opens /dev/tun# and tun# has not been SIOCIFCREATE'd already,
it will be SIOCIFCREATE'd automatically. FreeBSD's tun interfaces
behave in a somewhat similar fashion.
 1.51 13-Mar-2002  itojun branches: 1.51.4; 1.51.6;
suppress -Wunused if !INET6
 1.50 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.49 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.48 12-Nov-2001  lukem add RCSIDs
 1.47 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.46 31-Oct-2001  atatat Turn the tun device/network interface into a cloning device.
 1.45 03-Aug-2001  itojun branches: 1.45.2; 1.45.4;
simplify previous fix (0-length mbuf in mbuf chain). from freebsd
 1.44 02-Aug-2001  itojun do not break from loop even if m_len == 0. it's valid to have
mbuf with m_len == 0 in mbuf chain.
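The rule stated in revision 1.44 can be sketched as follows. This is a minimal illustration using a hypothetical, simplified struct buf in place of the kernel's struct mbuf; it is not the code from if_tun.c. The loop ends only when the chain itself ends, never because one buffer happens to be empty.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf; not the kernel type. */
struct buf {
	struct buf *next;	/* next buffer in the chain, or NULL */
	size_t len;		/* bytes of data in this buffer; may be 0 */
};

/*
 * Sum the data length of a chain, skipping over empty buffers
 * rather than treating them as the end of the chain.
 */
static size_t
chain_length(const struct buf *b)
{
	size_t total = 0;

	for (; b != NULL; b = b->next) {
		if (b->len == 0)
			continue;	/* empty buffer: valid, keep walking */
		total += b->len;
	}
	return total;
}
```

A chain of three buffers with lengths 3, 0 and 5 therefore reports 8 bytes; a loop that broke out at the empty buffer would have reported only 3.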
 1.43 13-Apr-2001  thorpej branches: 1.43.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.42 17-Jan-2001  thorpej branches: 1.42.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.41 18-Dec-2000  thorpej Fill in if_dlt.
 1.40 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.39 30-Mar-2000  augustss Kill some more register declarations.
 1.38 01-Jul-1999  itojun branches: 1.38.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.37 04-Mar-1999  mjacob branches: 1.37.4; 1.37.6;
adjust format args for compiler changes
 1.36 30-Nov-1998  sommerfe branches: 1.36.2;
Fix PR6473: allow sends to tun* devices using bpf.
 1.35 20-Aug-1998  veego Add some braces to stop the new egcs warnings.
 1.34 05-Jul-1998  jonathan defopt NS, NSIP.
 1.33 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.32 25-Sep-1997  matt Add SIOC{ADD|DEL}MULTI ioctl to support (for IFF_MULTICAST).
 1.31 24-Sep-1997  matt Add support of SIOCIFMTU to vary mtu of interface. Also allow IFF_MULTICAST
on TUNSIFMODE (sometimes you'd like to do IP multicast on tunnel devices).
 1.30 15-Mar-1997  is branches: 1.30.4;
New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.29 13-Oct-1996  christos branches: 1.29.4;
backout previous kprintf change
 1.28 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.27 07-Sep-1996  mycroft Implement poll(2).
 1.26 25-Jun-1996  pk A couple of emulation enhancements from der mouse's PR#2411:
- ability to be either a BROADCAST or POINTTOPOINT interface.
- a humble beginning of link-layer addressing (differs from PR
by using a `struct sockaddr' instead of single byte).
 1.25 22-May-1996  mycroft Removing a completely unneeded reference to curproc.
 1.24 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.23 30-Mar-1996  christos Eliminate need for and remove net_conf.h
 1.22 13-Feb-1996  christos Net prototypes
 1.21 05-Feb-1996  scottr Grammar police; noted by Peter Seebach <seebs@solon.com>. Closes PR #1982.
 1.20 01-Feb-1996  mycroft Rename tunioctl() and tuncioctl() so that cdevsw points to the right one.
From der Mouse, PR 2005.
 1.19 13-Dec-1995  pk Return actual packet length in FIONREAD (noted by Bob Smart).
 1.18 13-Jun-1995  mycroft Update to match data structure changes.
 1.17 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.16 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.15 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 26-May-1994  deraadt MIN -> min
 1.12 15-May-1994  deraadt repair protos and functions
 1.11 03-May-1994  deraadt fixes from <brad@fcr.com> who claims it now works correctly
 1.10 28-Feb-1994  andrew Fixed a bug with TUN_OPEN flag handling during tunclose(), as noted by
Mark Delany <markd@bushwire.apana.org.au>.
 1.9 24-Dec-1993  deraadt must pull in machine-cpu.h
 1.8 13-Dec-1993  deraadt tunnel driver cleanup done by Brad Parker <brad@fcr.com> and myself
 1.7 14-Nov-1993  deraadt use one stop shopping selwakeup/selrecord
 1.6 14-Nov-1993  deraadt cleaned up version of the tunnel driver
 1.5 09-Aug-1993  deraadt branches: 1.5.2;
suser() was being called in the old 4.3 way
 1.4 07-Aug-1993  cgd merge in changes from netbsd-0-9-ALPHA2
 1.3 22-May-1993  cgd branches: 1.3.2;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.1 31-Jul-1993  cgd give names, err, wmesg's, to my "pain" -- i.e. convert sleep() to tsleep()
 1.5.2.1 03-Nov-1993  mycroft Delete useless assignments to if_init.
 1.29.4.2 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.29.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.30.4.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.36.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.37.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.37.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.37.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.38.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.38.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.38.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.38.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.38.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.42.2.10 29-Dec-2002  thorpej Sync with HEAD.
 1.42.2.9 11-Dec-2002  thorpej Sync with HEAD.
 1.42.2.8 11-Nov-2002  nathanw Catch up to -current
 1.42.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.42.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.42.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.42.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.42.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.42.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.42.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.43.2.10 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.43.2.9 02-Oct-2002  jdolecek do not need the (void *) cast for kn_hook anymore
 1.43.2.8 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.43.2.7 16-Mar-2002  jdolecek Catch up with -current.
 1.43.2.6 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.43.2.5 08-Sep-2001  thorpej Use the seltrue filter as appropriate (or, rather, as the "poll"
entry points of these drivers indicate).
 1.43.2.4 08-Sep-2001  thorpej Oops, selwakeup() -> selnotify() for last.
 1.43.2.3 08-Sep-2001  thorpej Add kqueue support.
 1.43.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.43.2.1 03-Aug-2001  lukem update to -current
 1.45.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.45.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.45.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.51.6.1 30-Jul-2002  lukem Pull up revision 1.52 (requested by atatat in ticket #572):
Make tun interfaces perform auto-creation. This means that if a
program opens /dev/tun# and tun# has not been SIOCIFCREATE'd already,
it will be SIOCIFCREATE'd automatically. FreeBSD's tun interfaces
behave in a somewhat similar fashion.
 1.51.4.2 29-Aug-2002  gehenna catch up with -current.
 1.51.4.1 16-May-2002  gehenna Add the character device switch.
 1.63.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.63.2.6 18-Dec-2004  skrll Sync with HEAD.
 1.63.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.63.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.63.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.63.2.2 03-Aug-2004  skrll Sync with HEAD
 1.63.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.68.2.2 20-May-2004  grant Pull up revision 1.69 (requested by tron in ticket #325):

Initialize interface type to IFT_TUNNEL as suggested by Erik Änggård
in PR kern/25555.
 1.68.2.1 15-May-2004  tron Pull up revision 1.70 (requested by pk in ticket #335):
Fix locking issues noticed by Tom Ivar Helbekkmo on tech-net:
* always acquire the device instance lock at splnet()
* missing unlocks in various places
Also, since this driver allows its device instances to be manipulated by two
independent subsystems (character device & interface clone create/destroy),
be careful not to rip away instance data in a clone destroy request if the
instance is still opened as a character device.
 1.75.4.1 29-Apr-2005  kent sync with -current
 1.76.8.11 24-Mar-2008  yamt sync with head.
 1.76.8.10 17-Mar-2008  yamt sync with head.
 1.76.8.9 27-Feb-2008  yamt sync with head.
 1.76.8.8 11-Feb-2008  yamt sync with head.
 1.76.8.7 21-Jan-2008  yamt sync with head
 1.76.8.6 07-Dec-2007  yamt sync with head
 1.76.8.5 27-Oct-2007  yamt sync with head.
 1.76.8.4 03-Sep-2007  yamt sync with head.
 1.76.8.3 26-Feb-2007  yamt sync with head.
 1.76.8.2 30-Dec-2006  yamt sync with head.
 1.76.8.1 21-Jun-2006  yamt sync with head.
 1.78.6.2 01-Jun-2006  kardel Sync with head.
 1.78.6.1 22-Apr-2006  simonb Sync with head.
 1.78.4.1 09-Sep-2006  rpaulo sync with head
 1.78.2.2 01-Mar-2006  yamt sync with head.
 1.78.2.1 18-Feb-2006  yamt sync with head.
 1.82.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.82.6.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.82.4.5 11-May-2006  elad sync with head
 1.82.4.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.82.4.3 19-Apr-2006  elad sync with head.
 1.82.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.82.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.82.2.6 14-Sep-2006  yamt sync with head.
 1.82.2.5 03-Sep-2006  yamt sync with head.
 1.82.2.4 11-Aug-2006  yamt sync with head
 1.82.2.3 24-May-2006  yamt sync with head.
 1.82.2.2 11-Apr-2006  yamt sync with head
 1.82.2.1 01-Apr-2006  yamt sync with head.
 1.92.4.2 10-Dec-2006  yamt sync with head.
 1.92.4.1 22-Oct-2006  yamt sync with head
 1.92.2.2 12-Jan-2007  ad Sync with head.
 1.92.2.1 18-Nov-2006  ad Sync with head.
 1.95.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.95.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.97.16.3 23-Mar-2008  matt sync with HEAD
 1.97.16.2 09-Jan-2008  matt sync with HEAD
 1.97.16.1 06-Nov-2007  matt sync with HEAD
 1.97.14.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.97.14.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.97.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.97.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.97.2.2 23-Oct-2007  ad Sync with head.
 1.97.2.1 09-Oct-2007  ad Sync with head.
 1.98.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.99.4.1 08-Dec-2007  ad Sync with head.
 1.99.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.99.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.100.4.1 08-Jan-2008  bouyer Sync with HEAD
 1.103.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.103.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.103.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.103.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.103.2.1 24-Mar-2008  keiichi sync with head.
 1.105.2.2 17-Jun-2008  yamt sync with head.
 1.105.2.1 18-May-2008  yamt sync with head.
 1.106.6.1 18-Jun-2008  simonb Sync with head.
 1.106.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.106.2.4 11-Aug-2010  yamt sync with head.
 1.106.2.3 11-Mar-2010  yamt sync with head
 1.106.2.2 16-May-2009  yamt sync with head
 1.106.2.1 04-May-2009  yamt sync with head.
 1.107.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.107.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.110.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.112.4.1 30-May-2010  rmind sync with head
 1.112.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.114.6.1 18-Feb-2012  mrg merge to -current.
 1.114.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.114.2.1 17-Apr-2012  yamt sync with head
 1.115.10.2 18-May-2014  rmind sync with head
 1.115.10.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.115.6.2 03-Dec-2017  jdolecek update from HEAD
 1.115.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.117.2.1 10-Aug-2014  tls Rebase.
 1.120.4.8 28-Aug-2017  skrll Sync with HEAD
 1.120.4.7 05-Feb-2017  skrll Sync with HEAD
 1.120.4.6 05-Oct-2016  skrll Sync with HEAD
 1.120.4.5 09-Jul-2016  skrll Sync with HEAD
 1.120.4.4 29-May-2016  skrll Sync with HEAD
 1.120.4.3 22-Apr-2016  skrll Sync with HEAD
 1.120.4.2 22-Sep-2015  skrll Sync with HEAD
 1.120.4.1 06-Jun-2015  skrll Sync with HEAD
 1.127.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.127.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.134.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.138.4.2 29-Apr-2017  pgoyette Remove explicit inclusion of <sys/localcount.h> since there is no
explicit usage of localcounts here. <sys/conf.h> will take care of
including as needed.
 1.138.4.1 28-Apr-2017  pgoyette Add a localcount to the devsw so it can be loaded as a rump module
 1.139.2.5 11-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1946):

sys/net/if_tun.c: revision 1.175

tun(4): Allow IPv6 packets with TUNSLMODE configured.
PR kern/58013
 1.139.2.4 15-Aug-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #974):

sys/net/if_tun.c: revision 1.145
sys/net/if_tun.c: revision 1.146

tun: fix locking against myself

filt_tunread is called with tun_lock held from tun_output (via tun_output =>
selnotify => knote), so we must not take tun_lock in filt_tunread. The bug
is triggered only if a tun is used through kqueue.

Found by k-goda@IIJ

Fix tun(4) kevent locking

filt_tunread gets called in two contexts:
- by calls to selnotify in if_tun.c (or knote, as the case may be,
but not here), in which case tp->tun_lock is held; and
- by internal logic in kevent, in which tp->tun_lock is not held.

The standard convention to discriminate between these two cases is by
setting the kernel-only NOTE_SUBMIT bit in the hint to selnotify or
knote; then in filt_*:

    if (hint & NOTE_SUBMIT)
        KASSERT(mutex_owned(&tp->tun_lock));
    else
        mutex_enter(&tp->tun_lock);
    ...
    if (hint & NOTE_SUBMIT)
        KASSERT(mutex_owned(&tp->tun_lock));
    else
        mutex_exit(&tp->tun_lock);

Pointed out by and patch from riastradh@
Tested by ozaki-r@ (only the former path)
 1.139.2.3 17-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #638):
sys/net/if_tun.c: revision 1.143

Add packet filtering to tun(4) interfaces.

Calls to pfil_run_hooks() were missing in if_tun.c. This meant that
filtering configuration could be added to e.g. /etc/npf.conf, but
would be ignored, because the filter never saw the packets. This
change adds the required calls.

While here, correct the return value from tun_output(): it's been
returning 0 regardless of any error condition present, but will now
correctly propagate such information upward.

Thanks to maxv for guidance!
OK: christos, martin
 1.139.2.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no one else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.139.2.1 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #349):
sys/net/if_l2tp.c: revision 1.14
sys/net/if_tap.c: revision 1.101
sys/net/if_tun.c: revision 1.141
sys/net/if_vlan.c: revision 1.106
Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use
if_link_state_change
 1.142.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.142.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.142.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.144.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.144.2.1 10-Jun-2019  christos Sync with HEAD
 1.156.2.1 11-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1815):

sys/net/if_tun.c: revision 1.175

tun(4): Allow IPv6 packets with TUNSLMODE configured.
PR kern/58013
 1.157.2.1 29-Feb-2020  ad Sync with head.
 1.161.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.173.8.1 16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.173.4.3 21-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #899):

sys/net/if_tun.c: revision 1.177

tun(4): Mark tunread_filtops `FILTEROP_MPSAFE`

Filter handlers have already been MP-safe since 2018:
https://mail-index.netbsd.org/source-changes/2018/08/06/msg097317.html

Note that we do not expect deadlocks similar to bpf(4) (PR kern/58531),
b/w KERNEL_LOCK and spin mutex for TX queue.

For tun(4), filt_tunread() acquires adaptive mutex. This is forbidden
when spin mutex is already held.

Such a path must have already been detected if present.

Thanks ozaki-r@ for discussion.
 1.173.4.2 11-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #627):

sys/net/if_tun.c: revision 1.175

tun(4): Allow IPv6 packets with TUNSLMODE configured.
PR kern/58013
 1.173.4.1 14-Jan-2024  martin Pull up following revision(s) (requested by chs in ticket #540):

sys/net/if_tun.c: revision 1.174

tun: add missing kpreempt_enable() if pktq_enqueue() fails
 1.175.2.1 02-Aug-2025  perseant Sync with HEAD
 1.22 13-Mar-2022  riastradh tun(4): Omit TUN_RWAIT micro-optimization.

cv_broadcast already has a fast path for no-waiters.
 1.21 13-Mar-2022  riastradh tun(4): Add missing includes in if_tun.h.
 1.20 26-Jan-2017  skrll Make MP-safe and use kmem(9)

Mostly from rmind-smpnet
 1.19 06-Sep-2015  dholland branches: 1.19.2; 1.19.4;
More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.18 18-Oct-2014  snj branches: 1.18.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.17 28-Jan-2012  rmind branches: 1.17.6; 1.17.10;
Replace tun_lock with mutex(9). XXX: too far from being MP-safe yet.
 1.16 24-Apr-2008  ad branches: 1.16.36; 1.16.40;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.15 04-Apr-2006  rpaulo branches: 1.15.58; 1.15.60;
Change the number of TUN[GS]IFHEAD to avoid collision with if_pp.
Noticed by Simon Burge.
 1.14 03-Apr-2006  rpaulo Implement TUN_IFHEAD, the missing piece that was breaking old applications.
 1.13 11-Dec-2005  christos branches: 1.13.4; 1.13.6; 1.13.8; 1.13.10; 1.13.12;
merge ktrace-lwp.
 1.12 26-Feb-2005  perry branches: 1.12.4;
nuke trailing whitespace
 1.11 21-Sep-2003  jdolecek branches: 1.11.8; 1.11.10;
cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.10 31-Oct-2001  atatat branches: 1.10.16;
Turn the tun device/network interface into a cloning device.
 1.9 12-Dec-2000  thorpej branches: 1.9.2; 1.9.4; 1.9.8;
Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.8 18-Mar-1998  tv branches: 1.8.6; 1.8.14;
PR #2736: wrap the softc in #ifdef _KERNEL so userland can include this
file to get at the ioctl values without barfing on the softc
 1.7 05-Jan-1998  perry Fix imported RCS keyword slightly
 1.6 25-Jun-1996  pk A couple of emulation enhancements from der mouse's PR#2411:
- ability to be either a BROADCAST or POINTTOPOINT interface.
- a humble beginning of link-layer addressing (differs from PR
by using a `struct sockaddr' instead of single byte).
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 03-May-1994  deraadt fixes from <brad@fcr.com> who claims it now works correctly
 1.3 13-Dec-1993  deraadt change MTU to 1500 bytes. Should be settable?
 1.2 13-Dec-1993  deraadt tunnel driver cleanup done by Brad Parker <brad@fcr.com> and myself
 1.1 14-Nov-1993  deraadt branches: 1.1.2;
cleaned up version of the tunnel driver
 1.1.2.2 14-Nov-1993  deraadt cleaned up version of the tunnel driver
 1.1.2.1 14-Nov-1993  deraadt file if_tun.h was added on branch magnum on 1993-11-14 20:07:24 +0000
 1.8.14.1 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.8.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.9.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.9.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.10.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.10.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.16.1 03-Aug-2004  skrll Sync with HEAD
 1.11.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.11.8.1 29-Apr-2005  kent sync with -current
 1.12.4.1 21-Jun-2006  yamt sync with head.
 1.13.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.13.10.1 19-Apr-2006  elad sync with head.
 1.13.8.1 11-Apr-2006  yamt sync with head
 1.13.6.1 22-Apr-2006  simonb Sync with head.
 1.13.4.1 09-Sep-2006  rpaulo sync with head
 1.15.60.1 18-May-2008  yamt sync with head.
 1.15.58.1 02-Jun-2008  mjf Sync with HEAD.
 1.16.40.1 18-Feb-2012  mrg merge to -current.
 1.16.36.1 17-Apr-2012  yamt sync with head
 1.17.10.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.17.6.1 03-Dec-2017  jdolecek update from HEAD
 1.18.2.2 05-Feb-2017  skrll Sync with HEAD
 1.18.2.1 22-Sep-2015  skrll Sync with HEAD
 1.19.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.34 20-Mar-2022  andvar s/Multilik/Multilink/
 1.33 20-Mar-2022  andvar s/circut/circuit/ and s/circiut/circuit/ in comments and acronyms file.
 1.32 09-Aug-2021  andvar fix typos in asymmetry, asymmetric(al), symmetrical.
 1.31 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.30 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.29 31-Jul-2018  khorben Add a port of the umb(4) driver from OpenBSD

The umb(4) driver provides support for USB MBIM (Mobile Broadband
Interface Model) devices.

MBIM devices establish connections via cellular networks such as GPRS,
UMTS, and LTE. They appear as a regular point-to-point network interface, transporting raw IP frames.

Required configuration parameters like PIN and APN have to be set with
umbctl(8), a new tool specific to this driver. The IP address is configured
automatically; the default route and DNS server information have to be set
separately.

The driver is not fully functional yet, it is therefore still marked as
experimental and disabled by default. Any help welcome to complete it!

Tested on NetBSD/amd64, with a Sierra Wireless EM7345 LTE modem on a Lenovo
ThinkPad T440s. No functional change expected otherwise.
 1.28 10-Jan-2018  knakahara branches: 1.28.2; 1.28.4;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.27 16-Feb-2017  knakahara branches: 1.27.6;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.26 05-Aug-2012  wiz branches: 1.26.2; 1.26.16; 1.26.20; 1.26.24;
Avoid ambiguity by having only one comment close mark.
PR 46771 by bsiegert.
 1.25 18-May-2006  liamjfoy branches: 1.25.54; 1.25.74; 1.25.98; 1.25.104;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.24 10-Dec-2005  elad branches: 1.24.4; 1.24.6; 1.24.8; 1.24.12;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.23 22-Jun-2004  itojun branches: 1.23.12;
prepare PF-related hooks. reviewed by matt, perry, christos
 1.22 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.21 10-Jul-2002  itojun branches: 1.21.6;
use iana-assigned value for IFT_STF. sync w/kame
 1.20 23-May-2002  itojun add recently-added IANA values from http://www.iana.org/assignments/smi-numbers
 1.19 07-Nov-2001  bjh21 branches: 1.19.8;
Sync with IANA. This finally gets us IFT_ECONET.
 1.18 23-Aug-2001  bjh21 branches: 1.18.4;
Update location of IANA smi-numbers file, since the old one doesn't work any
more.

While I'm here, add IANA assignments 0xbe--0xc5.
 1.17 26-Oct-2000  onoe branches: 1.17.2; 1.17.4;
Add new numbers from IANA: 0x83 - 0xbd
 1.16 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.15 22-Mar-2000  itojun beautify
 1.14 01-Jul-1999  itojun branches: 1.14.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.13 18-Jan-1999  msaitoh branches: 1.13.4; 1.13.6;
fix "CMSA CD" to "CSMA/CD"
 1.12 01-Mar-1998  ross Add new type number received from IANA. Also, note the new home of
the IANA master list, post RFC1573.
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.9 04-Feb-1998  ross And yet more numbers, e.g., CATV upstream and downstream types.
 1.8 03-Feb-1998  ross Add the last few years of IANA assignments, e.g., Gb ethernet.
 1.7 27-Feb-1995  glass fix some typos. from frank@fwi.uva.nl (Frank van der Linden)
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 14-Aug-1993  deraadt ppp from paul mackerras
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.13.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.13.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.13.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.13.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.14.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.14.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.4.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.17.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.17.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.18.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.8.2 15-Jul-2002  gehenna catch up with -current.
 1.19.8.1 30-May-2002  gehenna Catch up with -current.
 1.21.6.4 11-Dec-2005  christos Sync with head.
 1.21.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.21.6.1 03-Aug-2004  skrll Sync with HEAD
 1.23.12.1 21-Jun-2006  yamt sync with head.
 1.24.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.24.8.1 24-May-2006  yamt sync with head.
 1.24.6.1 01-Jun-2006  kardel Sync with head.
 1.24.4.1 09-Sep-2006  rpaulo sync with head
 1.25.104.1 08-Aug-2012  martin Pull up following revision(s) (requested by wiz in ticket #464):
sys/net/if_types.h: revision 1.26
Avoid ambiguity by having only one comment close mark.
PR 46771 by bsiegert.
 1.25.98.1 30-Oct-2012  yamt sync with head
 1.25.74.1 22-Aug-2012  bouyer Pull up following revision(s) (requested by wiz in ticket #1786):
sys/net/if_types.h: revision 1.26
Avoid ambiguity by having only one comment close mark.
PR 46771 by bsiegert.
 1.25.54.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.26.24.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.20.1 20-Mar-2017  pgoyette Sync with HEAD
 1.26.16.1 28-Aug-2017  skrll Sync with HEAD
 1.26.2.1 03-Dec-2017  jdolecek update from HEAD
 1.27.6.1 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.28.4.1 10-Jun-2019  christos Sync with HEAD
 1.28.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.4 26-Sep-2024  roy vether(4): control link state via media rather than flags

This mirrors shmif(4) from rump.
 1.3 24-Sep-2024  roy vether(4): link0 now means link state up

Which makes more sense than -link0 meaning link state up.
link0 is now a default flag when the interface is created.
 1.2 24-Sep-2024  roy vether(4): allow link state to be toggled by link0

Take link state down: ifconfig vether0 link0
Bring link state up: ifconfig vether0 -link0

Handy for testing how programs react to link state change on a VM.
 1.1 27-Sep-2020  roy branches: 1.1.26;
vether: Implement a virtual ethernet interface

The vether interface simulates a normal Ethernet interface by encapsulating
standard network frames with an Ethernet header, specifically for use as
a member in a bridge(4).

To use vether the administrator needs to configure an address onto the
interface so that packets can be routed to it. An Ethernet header will
be prepended and, if the vether interface is a member of a bridge(4),
the frame will show up there.

Taken from OpenBSD.
 1.1.26.1 02-Aug-2025  perseant Sync with HEAD
 1.172 29-Jun-2024  riastradh if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.171 02-Nov-2023  yamaguchi branches: 1.171.2;
Support vlan(4) over l2tp(4)
 1.170 20-Jun-2022  yamaguchi branches: 1.170.4;
bridge(4): support VLAN frames stripped by hardware tagging
 1.169 20-Jun-2022  yamaguchi Determine the length of VLAN encapsulation by an interface type,
and remove it from struct ifvlan_linkmib
 1.168 20-Jun-2022  yamaguchi Handle frames whose vlan id is 0 as non-VLAN frames
even if the vlan tag is stripped by hardware offloading
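
The VLAN ID 0 rule in revision 1.168 can be illustrated with a small user-space sketch. It assumes the standard 802.1Q tag control layout (3-bit priority, 1-bit DEI, 12-bit VLAN ID). The helper names tci_vid(), tci_pcp() and frame_is_vlan() are hypothetical and are not taken from if_vlan.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCI_VID_MASK	0x0fffu	/* low 12 bits of the 802.1Q TCI */
#define TCI_PCP_SHIFT	13	/* priority lives in the top 3 bits */

/* Extract the 12-bit VLAN ID from a host-order tag control word. */
static uint16_t
tci_vid(uint16_t tci)
{
	return (uint16_t)(tci & TCI_VID_MASK);
}

/* Extract the 3-bit priority code point. */
static uint8_t
tci_pcp(uint16_t tci)
{
	return (uint8_t)(tci >> TCI_PCP_SHIFT);
}

/*
 * Hypothetical classifier: a frame belongs to a VLAN only if it carried
 * a tag (in-band or stripped by hardware) and that tag's VLAN ID is
 * non-zero. A tag with VLAN ID 0 carries priority information only.
 */
static bool
frame_is_vlan(bool has_tag, uint16_t tci)
{
	return has_tag && tci_vid(tci) != 0;
}
```

With this sketch, a priority-only tag such as 0xE000 (priority 7, VLAN ID 0) is classified as a non-VLAN frame, while 0x0064 (VLAN ID 100) is classified as a VLAN frame.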
 1.167 24-Dec-2021  yamaguchi Fix missing curlwp_bind()

Fixes kern/56556
 1.166 06-Dec-2021  yamaguchi decrease the MTU of vlan(4)
only when ETHERCAP_VLAN_MTU of the parent interface is enabled

This fixed the bug that the MTU of a vlan interface is decreased
when the parent interface already has another vlan interface.
pointed out by tnn@n.o, thanks.
 1.165 15-Nov-2021  yamaguchi introduced APIs to configure VLAN TAG to ethernet devices
 1.164 05-Oct-2021  yamaguchi Replace the list for vlan interfaces with the counter

The list had been used in vlan_ifdetach(), but it is no longer in
use as a linked list by introducing ether_ifdetach hook.
 1.163 30-Sep-2021  yamaguchi vlan: Register vlan_ifdetach to ether_ifdetach hook
 1.162 30-Sep-2021  yamaguchi vlan: Register the callback to update link-state of vlan I/F
to link-state change hook

The callback is registered in every vlan I/F even if the parent
interface is the same. Therefore it is not needed to search the
vlan I/F by the parent interface unlike the previous callback.
 1.161 17-Jul-2021  hannken Mark vlan_safe_ifpromisc_locked() as "__unused" to appease LLVM.

Maybe completely remove this short helper?
 1.160 15-Jul-2021  yamaguchi vlan: drop tagged outgoing packets

vlan(4) doesn't support Q-in-Q yet.
 1.159 14-Jul-2021  yamaguchi unset IFF_PROMISC at bpf_detach()

Setting "d->bd_promisc = 0" meant that bpf_detach() did not call
ifpromisc(ifp, 0). Currently, there is no reason for
this behavior, so it is removed.
In addition to the change, the workaround for it in vlan(4)
is also removed.
 1.158 14-Jul-2021  yamaguchi Make an mbuf writable before un-tagging
 1.157 06-Jul-2021  yamaguchi Drop unicast packets that are not for us
when vlan(4) is not in promisc
 1.156 06-Jul-2021  yamaguchi vlan: added NULL check for the parent interface

The pointer may be set to NULL by vlan_unconfig
during packet processing
 1.155 06-Jul-2021  yamaguchi vlan: set the link state to DOWN when its parent detaches
 1.154 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.153 26-Sep-2020  roy branches: 1.153.6;
vlan: match the interface link state with that of the parent

Now addresses on a vlan will detach and undergo duplicate address
detection on link state changes just as on a standard interface.
 1.152 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.151 01-Feb-2020  riastradh Switch if_vlan to atomic_load/store_*.

Fix missing membar_datadep_consumer -- now atomic_load_consume -- in
vlan_lookup_tag_psref.
 1.150 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.149 12-Dec-2019  pgoyette branches: 1.149.2;
Rather than keeping a separate mutex, condvar, and pserialize for each
module hook, we can share a common set of synchronization structures.
This cuts the amount of cacheline_aligned data for these structures by
50%.

Note that we still have a per-hook localcount, since we need to count
individual references.

As discussed with riastradh@

Welcome to 9.99.22 !
 1.148 11-Nov-2019  yamaguchi Fix a bug that vlan(4) fragments IPv6 packets
even when the MTU > packet length.

The bug appeared when the mtu is increased on SIOCSETVLAN.

From t-kusaba@IIJ
 1.147 21-Oct-2019  ozaki-r vlan: get rid of unnecessary if_ipackets++ in vlan_input

It's done by if_input() below now.

Pointed out by msaitoh@
 1.146 23-Aug-2019  msaitoh - kmem_alloc(,KM_SLEEP) never returns NULL, so remove NULL check.
- VLAN ID is never duplicated, so break the loop when found. Also move
kmem_free() outside of ETHER_LOCK(ec)/ETHER_UNLOCK(ec) to reduce the hold
time. suggested by ozaki-r.
- Whitespace fix.
 1.145 21-Aug-2019  msaitoh Use ETHER_LOCK()/ETHER_UNLOCK() suggested by knakahara.
 1.144 20-Aug-2019  msaitoh Fix a bug that VLAN HW "tagging" enable/disable may not be reflected correctly.

- Always call ec_vlan_cb() if it exists.
- Some (or all?) ethernet drivers don't enable HW tagging if no vlan is
attached. ixgbe is one of them. Check the transition and update
VLAN HW tagging function.

XXX pullup-9
 1.143 20-Aug-2019  msaitoh Add missing IFNET_LOCK() and IFNET_UNLOCK() in vlan_config().

XXX pullup-9
 1.142 20-Aug-2019  msaitoh Check ec_capenable instead of ec_capabilities to control TX side of VLAN HW
tagging correctly.

XXX pullup-9
 1.141 17-Jul-2019  msaitoh branches: 1.141.2;
Implement VLAN hardware filter function(ETHERCAP_VLAN_HWFILTER).
First proposed by jmcneill in 2017 and modified by me.

How to use:

- Set callback function:

ether_set_vlan_cb(struct ethercom *, ether_vlancb_t)

- Callback. This function is called when a vlan is attached/detached to the
parent interface:

int (*ether_vlancb_t)(struct ethercom *ec, uint16_t vlanid, bool set);

- ifconfig(8)

ifconfig ixg0 [-]vlan-hwfilter

Note that ETHERCAP_VLAN_HWFILTER is set by default on ixg(4) because
the PF driver usually enables "all block" filter by default.
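
The callback flow described in revision 1.141 can be sketched in user space as follows. The callback shape (ethercom pointer, VLAN ID, set flag) follows the typedef quoted above, and the ec_vlan_cb member name appears in revision 1.144. Everything else here is an assumption made for illustration: struct mock_ethercom, mock_set_vlan_cb(), the hw_filter bitmap, and the drv_* and demo_* helpers are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VLAN_ID_COUNT	4096	/* a VLAN ID is 12 bits wide */

struct mock_ethercom;
typedef int (*mock_vlancb_t)(struct mock_ethercom *ec, uint16_t vlanid,
    bool set);

/* Hypothetical stand-in for struct ethercom plus a pretend HW table. */
struct mock_ethercom {
	mock_vlancb_t	ec_vlan_cb;
	uint32_t	hw_filter[VLAN_ID_COUNT / 32];
};

/* Mock registration helper, modelled on ether_set_vlan_cb(). */
static void
mock_set_vlan_cb(struct mock_ethercom *ec, mock_vlancb_t cb)
{
	ec->ec_vlan_cb = cb;
}

/* Driver-side callback: set or clear one bit in the filter table. */
static int
drv_vlan_cb(struct mock_ethercom *ec, uint16_t vlanid, bool set)
{
	uint32_t bit;

	if (vlanid >= VLAN_ID_COUNT)
		return EINVAL;
	bit = UINT32_C(1) << (vlanid % 32);
	if (set)
		ec->hw_filter[vlanid / 32] |= bit;
	else
		ec->hw_filter[vlanid / 32] &= ~bit;
	return 0;
}

/* Would the pretend hardware accept frames for this VLAN ID? */
static bool
drv_vlan_allowed(const struct mock_ethercom *ec, uint16_t vlanid)
{
	if (vlanid >= VLAN_ID_COUNT)
		return false;
	return ((ec->hw_filter[vlanid / 32] >> (vlanid % 32)) & 1u) != 0;
}

/*
 * Attach then detach one VLAN. Bit 0 of the result is the filter state
 * after attach, bit 1 the state after detach; -1 means the callback
 * reported an error.
 */
static int
demo_attach_detach(uint16_t vlanid)
{
	struct mock_ethercom ec;
	int state = 0;

	memset(&ec, 0, sizeof(ec));
	mock_set_vlan_cb(&ec, drv_vlan_cb);

	if (ec.ec_vlan_cb(&ec, vlanid, true) != 0)
		return -1;
	if (drv_vlan_allowed(&ec, vlanid))
		state |= 1;
	if (ec.ec_vlan_cb(&ec, vlanid, false) != 0)
		return -1;
	if (drv_vlan_allowed(&ec, vlanid))
		state |= 2;
	return state;
}
```

A bitmap is used here only because it is the simplest table to model; a real driver programs whatever filter structure its hardware provides.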
 1.140 17-Jul-2019  msaitoh KNF. No functional change.
 1.139 09-Jul-2019  msaitoh Don't automatically set ec_capenable's ETHERCAP_VLAN_HWTAGGING bit in
vlan_config() to make it user-controllable. Instead, set the bit in
xxx_attach().
 1.138 25-Jun-2019  msaitoh Simplify "LIST_HEAD();" to make the code more understandable.
No functional change.
 1.137 18-Jun-2019  msaitoh KNF. No functional change.
 1.136 15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunately it caused a deadlock instead. A known scenario causing a
deadlock is when the following two operations occur concurrently: (a) a removal of
an IP address assigned to an interface and (b) a manipulation of multicast
groups to the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.135 26-Apr-2019  pgoyette Some more empty-string --> NULL conversions for module dependencies
 1.134 23-Mar-2019  pgoyette Replace compile-time checking for vlan code with a module hook.

Should resolve the errors reported on irc when booting a kernel which
has agr without vlan:


[ 1.0000000] WARNING: module error: built-in module if_agr can't find builtin dependency `if_vlan'
[ 1.0000000] WARNING: module error: built-in module if_agr prerequisite if_vlan failed, error 2
 1.133 19-Oct-2018  knakahara Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.132 18-Oct-2018  knakahara fix panic when doing ifconfig -vlanif and ifconfig vlanif again. advised by ozaki-r@.

e.g. do the following commands.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlanif wm0
# ifconfig vlan0 -vlanif wm0
# ifconfig vlan0 vlan 100 vlanif wm0
====================

ATF net/if_vlan does this type of test; however, it cannot detect this bug
because shmif(4)'s ifp->if_hwdl is always NULL, as the U/L bit is set in
shmif(4)'s ethernet address.
See: https://nxr.netbsd.org/xref/src/sys/net/if_ethersubr.c#997
 1.131 03-Aug-2018  jmcneill Use a different psz for a different lock. Patch from riastradh, reviewed
by ozaki-r.
 1.130 26-Jun-2018  msaitoh branches: 1.130.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misidentified in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.129 14-Jun-2018  yamaguchi Fix to check whether the address has been added before delete

The list named ifv_mc_listhead saves multicast addresses that
are added through SIOCADDMULTI. Each nodes added to the list
are used for deleting the related address from a parent I/F
when remove the configuration of parent I/F.
In carp(4) and OpenBSD's vlan(4), the lists is used to check
a parameter of SIOCDELMULTI in addition to the use.
Based on them, the check is added to vlan(4)

ok ozaki-r@
 1.128 14-Jun-2018  yamaguchi Add the lock to refer the list included in ethercom for safety

The lock is already held while adding and deleting
ok ozaki-r@
 1.127 14-Jun-2018  yamaguchi Use ether_lookup_multi() instead of the macro

ok ozaki-r@
 1.126 12-Jun-2018  ozaki-r vlan: call ether_ifdetach without IFNET_LOCK

Fix PR kern/53357
 1.125 16-Mar-2018  tih Fix the handling of the state returned from pfil_run_hooks().

pfil_run_hooks() invokes any registered packet filters on the packet
being handled. It may return a (non-zero) errno, indicating that a
filter has decided that the packet should be discarded, and has freed
the mbuf. While a non-error (0) return usually means that the packet
should be processed normally, a filter may still free the mbuf if the
packet is a fragment, and the filter is holding it for reassembly and
future evaluation. Therefore, there must be separate tests for the
return value and for a possible discarded packet. (See pfil(9).)

OK: christos, martin
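
The two separate tests described above can be sketched roughly as follows.
This is a self-contained model, not the vlan(4) code: struct pkt, hook_fn,
run_hook() and the three toy hooks are hypothetical stand-ins for struct
mbuf and the pfil(9) hooks, chosen only to show why the return value and
the packet pointer have to be checked independently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only its identity matters here. */
struct pkt { int len; };

/*
 * Hypothetical hook type modelled on the contract described above:
 * returns 0 or an errno, and may set *pp to NULL if it took the packet.
 */
typedef int (*hook_fn)(struct pkt **pp);

enum verdict { PKT_CONTINUE, PKT_REJECTED, PKT_HELD };

/* Test the return value and the packet pointer separately. */
static enum verdict
run_hook(hook_fn hook, struct pkt **pp)
{
	assert(hook != NULL && pp != NULL && *pp != NULL);

	if (hook(pp) != 0)
		return PKT_REJECTED;	/* a filter discarded the packet */
	if (*pp == NULL)
		return PKT_HELD;	/* no error, but the packet is gone */
	return PKT_CONTINUE;		/* safe to keep processing *pp */
}

/* Three toy filters covering the three outcomes. */
static int hook_pass(struct pkt **pp)   { (void)pp; return 0; }
static int hook_reject(struct pkt **pp) { *pp = NULL; return EACCES; }
static int hook_hold(struct pkt **pp)   { *pp = NULL; return 0; }
```

A caller that only tested the return value would treat the hook_hold case
as a normal packet and dereference a NULL pointer.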
 1.124 15-Jan-2018  maxv branches: 1.124.2;
Mostly style, and add a bunch of KASSERTs.
 1.123 15-Jan-2018  maxv Style, improve comment, and add KASSERTs on the assumptions.
 1.122 14-Jan-2018  maxv If cnt == 0, don't kmem_alloc(0). Found by Mootja.

Looking at the code, I also find it suspicious that we read
ifv->ifv_mib->ifvm_p directly without making sure ifv_mib != NULL.
 1.121 19-Dec-2017  ozaki-r Don't set IFEF_MPSAFE unless NET_MPSAFE at this point

Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.

Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.120 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.119 11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.118 08-Dec-2017  ozaki-r Fix build of kernels without ether

By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.

PR kern/52790
 1.117 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock on if_up and if_down

One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
 1.116 06-Dec-2017  ozaki-r Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
 1.115 06-Dec-2017  ozaki-r Fix locking against myself on ifpromisc

vlan_unconfig_locked could be called with holding if_ioctl_lock.
 1.114 06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock when calling if_flags_set
 1.113 27-Nov-2017  jmcneill kern/52765: npf cannot do port forwarding on vlan interfaces

Add pfil hooks support to vlan(4), from christos@
 1.112 22-Nov-2017  msaitoh s/65535/4095/ in the comment. Pointed out by christos. Thanks.
 1.111 22-Nov-2017  msaitoh Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535. The spec states
0 and 65535 are reserved.
 1.110 22-Nov-2017  msaitoh No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
 1.109 22-Nov-2017  ozaki-r Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE

If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.

This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.

Proposed on tech-kern@ and tech-net@
 1.108 22-Nov-2017  msaitoh Fix a bug that a vlan packet which has priority or CFI bit in the tag causes
panic.
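
For context, an 802.1Q tag carries three fields in 16 bits: a 3-bit
priority, a 1-bit CFI and a 12-bit VLAN ID. The log does not spell out the
exact failure, but a plausible reading is that the raw tag was used where
only the VLAN ID was expected. A minimal sketch of separating the fields
follows; the macro and function names are hypothetical (the driver's own
macro, mentioned elsewhere in this log, is EVL_VLANOFTAG).

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q tag control information: priority(3) | CFI(1) | VLAN ID(12). */
#define TAG_VID(tag)	((uint16_t)((tag) & 0x0fff))
#define TAG_CFI(tag)	((uint16_t)(((tag) >> 12) & 0x1))
#define TAG_PRI(tag)	((uint16_t)(((tag) >> 13) & 0x7))

/*
 * Hypothetical lookup key for per-VLAN state: always derived from
 * TAG_VID(), never from the raw tag, so priority/CFI bits cannot leak in.
 */
static int
vlan_slot(uint16_t tag)
{
	uint16_t vid = TAG_VID(tag);

	assert(vid <= 0x0fff);
	return (int)vid;
}
```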
 1.107 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.106 30-Oct-2017  ozaki-r Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use if_link_state_change
 1.105 23-Oct-2017  msaitoh If if_initialize() failed in the attach function, free resources and return.
 1.104 19-Oct-2017  knakahara fix vlan panic when vlan is re-configured without destroy.

E.g. the following operation causes this panic.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 1 vlanif ixg3
# ifconfig vlan1 create
# ifconfig vlan1 vlan 1 vlanif ixg2
# ifconfig vlan1 -vlanif
# ifconfig vlan1 vlan 1 vlanif ixg2

panic: kernel diagnostic assertion "new->ple_next == NULL" failed: file "/git/netbsd-src/sys/sys/pslist.h", line 118
====================

Pointed out and tested by msaitoh@n.o, fixed by s-yamaguchi@IIJ, thanks.

XXX need pullup-8
 1.103 12-Oct-2017  ozaki-r Set IFEF_START_MPSAFE by default

Because vlan_start is already MP-safe, there is no reason to not do so.

Acked by s-yamaguchi@IIJ
 1.102 11-Oct-2017  msaitoh Remove accidentally added code (for VLAN hardware filter).
 1.101 11-Oct-2017  msaitoh Check if VLAN ID isn't duplicated on a same parent interface and return
EEXIST if it failed.
 1.100 26-Sep-2017  knakahara VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.

I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html

XXX need pullup to -8 branch
 1.99 09-Aug-2017  knakahara Fix vlan(4) obytes counter. Implemented by s-yamaguchi@IIJ, thanks.
 1.98 07-Jun-2017  knakahara vlan(4) MP-ify. contributed by s-yamaguchi@IIJ, thanks.

XXX Pull-ups needed for netbsd-8 branch
 1.97 29-May-2017  ozaki-r branches: 1.97.2;
Call in6_ifdetach only if in6_present (for rump)

Otherwise ifconfig -vlanif causes a panic on a rump_server without
the netinet6 library.

Reported by s-yamaguchi@IIJ
 1.96 15-Mar-2017  ozaki-r Fix memory leak in vlan_start
 1.95 23-Jan-2017  ozaki-r Fix typo in a comment
 1.94 13-Jan-2017  msaitoh branches: 1.94.2;
Fix a bug where the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses VLAN_ATTACHED(), which
checks ec->ec_nvlans, so the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.93 15-Dec-2016  ozaki-r Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce code
- We can provide the same behavior between drivers
  - Where/When if_ipackets is counted up
  - Note that some drivers still update packet statistics in their own
    way (periodical update)
- Moved bpf_mtap run in softint
  - This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.92 28-Nov-2016  joerg Don't check parent capabilities when a parent interface hasn't been
assigned.
 1.91 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.90 22-Jun-2016  knakahara branches: 1.90.2;
fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is nop. Currently, that means
IFQ_ENQUEUE() of some paths such as bridge_enqueue() is wrongly called in
parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.89 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.88 09-May-2016  christos Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
 1.87 28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.86 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.85 20-Apr-2016  knakahara IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation
 1.84 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.83 19-Nov-2015  christos Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig@espci.fr). Factor out the vlan_mtu enabling and
disabling code.
 1.82 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.81 18-Apr-2015  ozaki-r Count up parent's obytes and omcasts counters

PR kern/49837
 1.80 29-Mar-2015  ozaki-r Correct frame padding length

vlan pads a frame with zeros up to 68 bytes
(ETHER_MIN_LEN + ETHER_VLAN_ENCAP_LEN). It expects
that even if the frame is untagged, it keeps 64 bytes
at least. However, it lacks concern about CRC
(4 bytes). So a sending frame can be 72 (68 + 4) bytes.

PR 49788
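
The arithmetic above can be sketched as follows, assuming the correction is
to leave room for the 4-byte CRC that the hardware appends, so that software
pads to 64 bytes and the frame on the wire is 68. The constants match the
values quoted in the message; vlan_pad_len() is a hypothetical helper, not
a function in the driver.

```c
#include <assert.h>
#include <stddef.h>

#define ETHER_MIN_LEN		64	/* minimum frame on the wire, CRC included */
#define ETHER_CRC_LEN		4	/* appended after software padding */
#define ETHER_VLAN_ENCAP_LEN	4	/* 802.1Q tag */

/*
 * Bytes of zero padding to append to a tagged frame of frame_len bytes
 * (CRC not yet present) so that it is 68 bytes once the CRC is added.
 */
static size_t
vlan_pad_len(size_t frame_len)
{
	const size_t min_no_crc =
	    ETHER_MIN_LEN - ETHER_CRC_LEN + ETHER_VLAN_ENCAP_LEN;

	assert(min_no_crc == 64);
	return frame_len < min_no_crc ? min_no_crc - frame_len : 0;
}
```

With this target, a switch that strips the 4-byte tag is still left with a
64-byte frame, which is the property the padding exists to guarantee.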
 1.79 16-Jan-2015  ozaki-r Introduce defflag for NET_MPSAFE
 1.78 11-Oct-2014  ozaki-r branches: 1.78.2;
Execute if_detach within splnet where vlan_unconfig is

With the fix, the ifnet data of a vlan avoids a use-after-free
that results in a fatal page fault.

This problem was found when fixing PR 49264. See
http://mail-index.netbsd.org/netbsd-bugs/2014/10/10/msg038536.html
for more detail.
 1.77 11-Oct-2014  ozaki-r Tweak vlan_unconfig

No functional change.
 1.76 11-Oct-2014  ozaki-r Protect vlan_unconfig with a mutex

It is not thread-safe but is likely to be executed concurrently.
See PR 49264 for more detail.
 1.75 09-Oct-2014  ozaki-r Do KASSERT(KERNEL_LOCKED_P()) only when NET_MPSAFE off

When NET_MPSAFE, bridge_enqueue calls vlan_start w/o KERNEL_LOCK.
 1.74 15-Sep-2014  ozaki-r Delete link local addresses of a vlan interface when detaching its parent

This fixes PR 49197.
 1.73 15-Sep-2014  ozaki-r Leave promiscuous mode when detaching a parent (ifconfig -vlanif)

We have to call ifpromisc(ifp, 0) for both a VLAN interface
and its parent when they are in promiscuous mode.

PR 49196
 1.72 12-Sep-2014  ozaki-r Call if_input of vlan interface itself, not parent's one

And also we need to drop M_PROMISC before calling if_input;
it was originally at just before bridge_input in ether_input.

Then we can bridge vlan interfaces again.
 1.71 12-Sep-2014  ozaki-r Restore vlan_ioctl overwritten by ether_ifdetach in vlan_unconfig

This fixes PR 49112.
 1.70 13-May-2014  bouyer branches: 1.70.2;
Make sure *(if_output)() is called with KERNEL_LOCK held.
Add some KASSERT for this.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details.
 1.69 19-Oct-2011  dyoung branches: 1.69.8; 1.69.12; 1.69.14; 1.69.16; 1.69.22; 1.69.26;
Use if_flags_set() and if_mcast_op().
 1.68 19-Oct-2011  dyoung Use if_mcast_op() and if_flags_set() instead of calling ifp->if_ioctl().
 1.67 08-Apr-2011  sborrill PR kern/38871

Fix LAN on bge(4), alc(4). Flag VLAN capability in ec_capenable as used by network
card drivers.
 1.66 05-Apr-2010  joerg branches: 1.66.2;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.65 28-Feb-2010  darran branches: 1.65.2;
Propagate the IFCAP_TSOv6 property also.
 1.64 19-Jan-2010  pooka branches: 1.64.2;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.63 01-Apr-2009  darran Also inherit the parent's TCP segmentation offload capability.
Note the vlan interface does not see updates to the parent's capabilities,
so if, for example, TSO is on in both and then turned off in the parent, it
will remain on in the vlan interface.
 1.62 17-Dec-2008  cegger branches: 1.62.2;
kill MALLOC and FREE macros.
 1.61 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
	..._init();
	...
	break;
...
default:
	..._init();
	...
	break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
	...
	break;
case IFF_RUNNING:
	...
	break;
case IFF_UP:
	...
	break;
case IFF_UP|IFF_RUNNING:
	...
	break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
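
The address test described here can be sketched as below. This is an
illustration of the rule only, not the code in ether_ioctl();
lladdr_settable() is a hypothetical name. It relies on the fact that the
least significant bit of the first octet marks a group (multicast or
broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/* True if addr is acceptable as an interface's unicast link-layer address. */
static bool
lladdr_settable(const uint8_t addr[ETHER_ADDR_LEN])
{
	static const uint8_t zero[ETHER_ADDR_LEN];

	assert(addr != NULL);
	if (addr[0] & 0x01)		/* group bit: multicast or broadcast */
		return false;
	if (memcmp(addr, zero, ETHER_ADDR_LEN) == 0)
		return false;		/* 00:00:00:00:00:00 */
	return true;
}
```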

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.60 11-Oct-2008  bouyer branches: 1.60.2; 1.60.4; 1.60.8;
Make SIOCSIFCAP work again on vlan interfaces: first check that the
capability is enabled on parent, then call ifioctl_common().
 1.59 15-Jun-2008  christos branches: 1.59.2;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.58 28-Apr-2008  martin branches: 1.58.2; 1.58.4;
Remove clause 3 and 4 from TNF licenses
 1.57 20-Feb-2008  matt branches: 1.57.6; 1.57.8; 1.57.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.56 07-Feb-2008  dyoung Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.55 19-Sep-2007  dyoung branches: 1.55.6;
Constify sockaddr argument to ether_multiaddr(). Change struct
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
 1.54 26-Aug-2007  dyoung branches: 1.54.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.53 04-Mar-2007  christos branches: 1.53.2; 1.53.10; 1.53.14;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.52 16-Nov-2006  christos branches: 1.52.4;
__unused removal on arguments; approved by core.
 1.51 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.50 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.49 23-Jul-2006  ad branches: 1.49.4; 1.49.6;
Use the LWP cached credentials where sane.
 1.48 14-May-2006  elad integrate kauth.
 1.47 11-Dec-2005  christos branches: 1.47.4; 1.47.6; 1.47.8; 1.47.10; 1.47.12;
merge ktrace-lwp.
 1.46 02-May-2005  yamt branches: 1.46.2;
split IFCAP_CSUM_xxx to IFCAP_CSUM_xxx_Rx and IFCAP_CSUM_xxx_Tx.
 1.45 17-Mar-2005  yamt vlan_input: add a missing EVL_VLANOFTAG in the case of hw offloading.
 1.44 26-Feb-2005  perry branches: 1.44.2;
nuke trailing whitespace
 1.43 21-Feb-2005  christos Re-arrange code slightly to avoid code duplication and allow to bail
out faster without doing de-capsulation work. From FreeBSD.
 1.42 04-Dec-2004  peter branches: 1.42.4; 1.42.6;
Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.41 08-Jul-2004  mycroft If the parent interface is not IFF_RUNNING, do not call its start function.
This prevents a variety of fun panics, and therefore fixes PR 12932, PR 17561,
and PR 18376...

XXX
...however this is most definitely a hack. The real problem here is that there
is no callback to notify a "client" interface like vlan when a "parent"
interface's status changes, and therefore the vlan interface is always
IFF_RUNNING. This allows packets to be queued on vlan interface at any time.
We can't simply leave the packets on the vlan interface, either, because there
is no callback to dequeue them. And last, since it's always IFF_RUNNING, if
we just *toss* the packets, we lose gratuitous ARPs and DAD packets.

"This needs work," but at least it no longer bleeds.
 1.40 23-Apr-2004  simonb s/the the/the/ (only in sources that aren't regularly imported from
elsewhere).
 1.39 21-Apr-2004  itojun kill sprintf, use snprintf
 1.38 05-Dec-2003  scw branches: 1.38.2;
To cater for VLAN-aware layer 2 ethernet switches which may need to strip
the tag before forwarding the packet, make sure the packet+tag is at least
68 bytes long.

This is necessary because our parent will only pad to 64 bytes (ETHER_MIN_LEN)
and some switches will not pad by themselves after deleting a tag.
 1.37 02-Oct-2003  itojun need to use m_freem(), not m_free(). from iij seil team
 1.36 09-Sep-2003  drochner Fix vlan tag sending in the ETHERCAP_VLAN_HWTAGGING case.
Makes my "txp" work and fixes "bge" -- PR kern/20363 by Scott Ellis.
 1.35 17-Jan-2003  itojun branches: 1.35.2;
switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.34 11-Jun-2002  pooka fix a few typos in comments
 1.33 12-Nov-2001  lukem branches: 1.33.8; 1.33.10;
add RCSIDs
 1.32 12-Jun-2001  thorpej branches: 1.32.2;
If the parent interface can do hardware-assisted VLAN encapsulation,
then propagate its hardware-assisted checksumming flags.
 1.31 07-Apr-2001  thorpej Add ALTQ support (both for the VLAN interface itself, as well as for
being a VLAN on a ALTQ'ified interface).
 1.30 29-Jan-2001  thorpej branches: 1.30.2;
Start out with a link name that says "802.1Q VLAN", and inherit the
parent interface's as usual once we attach to the parent. When we
detach from the parent, reset our link name to the "802.1Q VLAN" name.
 1.29 28-Jan-2001  itojun call if_alloc_sadl(). without it the following operation causes kernel panic:
# ifconfig vlan0 create
# ifconfig vlan0
 1.28 17-Jan-2001  thorpej If no link level name is assigned, return EADDRNOTAVAIL on
SIOCGIFADDR.
 1.27 16-Jan-2001  thorpej No need to reference ifnet_addrs[].
 1.26 18-Dec-2000  thorpej branches: 1.26.2;
Small cosmetic change.
 1.25 18-Dec-2000  thorpej We now support hw vlan tag support in network interfaces, so remove it
from the TODO list.
 1.24 17-Nov-2000  bouyer branches: 1.24.2;
Supports hardware 802.1q VLAN tagging, per discussion on tech-net. The tag is
stored in a m_aux mbuf defined by AF_LINK, ETHERTYPE_VLAN.
Thanks to Jason & Itojun for the feedback.
 1.23 15-Nov-2000  bouyer Per discussion with Jason, change flags filter to
(IFF_UP | IFF_BROADCAST | IFF_RUNNING | IFF_ALLMULTI | IFF_SIMPLEX)
Also, put the ifp->if_opackets++ at the right place so that the counter is
incremented even when the parent is OACTIVE.
Fix a bug in vlan_input where the ethernet src and dst addrs would not be
correct because we memmove() only ifv->ifv_encaplen instead of
sizeof(struct ether_header).
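
To illustrate why the size passed to memmove() matters: removing a 4-byte
802.1Q tag in place means sliding the Ethernet addresses forward over the
tag, so the number of bytes moved must cover the header fields, not the tag
length. The sketch below is a simplified model on a raw byte buffer, not
the vlan_input() code; strip_vlan_tag() is a hypothetical helper that moves
only the two addresses because the inner Ethertype is already in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN		6
#define ETHER_VLAN_ENCAP_LEN	4	/* TPID (2) + TCI (2) */

/*
 * Strip the 802.1Q tag in place. Returns the offset at which the untagged
 * frame now starts, or 0 if the buffer is too short and nothing was done.
 */
static size_t
strip_vlan_tag(uint8_t *frame, size_t len)
{
	const size_t addrs = 2 * ETHER_ADDR_LEN;	/* dst + src */

	assert(frame != NULL);
	if (len < addrs + ETHER_VLAN_ENCAP_LEN + 2)
		return 0;
	/* The regions overlap, so memmove() rather than memcpy(). */
	memmove(frame + ETHER_VLAN_ENCAP_LEN, frame, addrs);
	return ETHER_VLAN_ENCAP_LEN;
}
```

Moving only ETHER_VLAN_ENCAP_LEN bytes here would copy the first four octets
of the destination address and leave the rest of both addresses overwritten
by stale data.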
 1.22 15-Nov-2000  thorpej Move bpfattach()/bpfdetach() calls into ether_ifattach()/ether_ifdetach().
 1.21 12-Nov-2000  bouyer In vlan_config(), filter flags inherited from parent interface to
(IFF_UP | IFF_BROADCAST | IFF_RUNNING | IFF_ALLMULTI | IFF_SIMPLEX)
Without this, if the parent is OACTIVE or PROMISC at config time, we
lose.
 1.20 10-Nov-2000  enami Don't return uninitialized value.
 1.19 10-Nov-2000  enami Don't unlink and deallocate ether_multi here. ether_ifdetach will do it.
 1.18 10-Nov-2000  enami Define struct member correctly. This fixes a panic due to overwrite of stack.
 1.17 09-Nov-2000  thorpej Implement promiscuous mode.
 1.16 15-Oct-2000  bouyer Don't try to handle SIOCSIFADDR/SIOCADDMULTI/SIOCDELMULTI if a vlan/vlanif
hasn't been configured (prevent a panic in arp_ifinit when setting an
IP addr with no vlan/vlanif).
 1.15 10-Oct-2000  ad Remove defunct bpfdetach()/ether_ifdetach() calls.
 1.14 04-Oct-2000  enami Cosmetic changes.
 1.13 04-Oct-2000  enami Remove redundant assignment.
 1.12 03-Oct-2000  thorpej Pop one off the TODO list.
 1.11 03-Oct-2000  thorpej When an Ethernet interface detaches, unconfigure any VLANs associated
with it.
 1.10 03-Oct-2000  thorpej Improve the VLAN support, in particular, handling of MTU:
- Add a macro to compute the max frame length based on Ethertype
and presence of FCS, and use it to validate the packet size
in ether_input().
- Add capabilities to struct ethercom, and allow hardware drivers
to specify that they can handle the larger hardware MTU that
VLANs require in order to strictly conform to 802.1Q.
- Make ether_ifdetach() clear out the link address and free all of
the Ethernet multicast structures.
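
As a rough illustration of the first item, a maximum-frame-length
computation that depends on the Ethertype and on whether the FCS is present
might look like the sketch below. max_frame_len() is a hypothetical helper
written for this note; the macro actually added by the commit may differ in
name, arguments and details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETHERTYPE_VLAN		0x8100
#define ETHER_HDR_LEN		14
#define ETHER_CRC_LEN		4
#define ETHER_VLAN_ENCAP_LEN	4

/* Largest acceptable frame for a given MTU, Ethertype and FCS presence. */
static unsigned
max_frame_len(unsigned mtu, uint16_t etype, bool has_fcs)
{
	unsigned len = mtu + ETHER_HDR_LEN;

	assert(mtu > 0);
	if (has_fcs)
		len += ETHER_CRC_LEN;		/* frame check sequence */
	if (etype == ETHERTYPE_VLAN)
		len += ETHER_VLAN_ENCAP_LEN;	/* room for the 802.1Q tag */
	return len;
}
```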

Also, rearrange the VLAN driver itself in preparation to supporting
other hardware types, including FDDI (which has 802.1Q VLAN capability).
 1.9 02-Oct-2000  ad htons -> ntohs. From Alan Barrett <apb@cequrux.com>.
 1.8 28-Sep-2000  ad Add an item to the TODO list.
 1.7 28-Sep-2000  enami Don't unconfigure if it is already unconfigured.
 1.6 28-Sep-2000  enami Fix think-o in previous; don't do the same test twice.
 1.5 28-Sep-2000  enami Port the multicast handling to NetBSD correctly.
 1.4 28-Sep-2000  enami s/6/ETHER_ADDR_LEN/
 1.3 28-Sep-2000  enami Remove unnecessary test.
 1.2 28-Sep-2000  enami Remove unnecessary function decl.
 1.1 27-Sep-2000  thorpej Support for 802.1Q Virtual LANs. Derived and cleaned up by
Andy Doran <ad@netbsd.org> from the FreeBSD/OpenBSD implementation.
A few minor changes to how it all hooks into the system by me.
 1.24.2.7 21-Apr-2001  bouyer Sync with HEAD
 1.24.2.6 11-Feb-2001  bouyer Sync with HEAD.
 1.24.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.24.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.24.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.24.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.24.2.1 17-Nov-2000  bouyer file if_vlan.c was added on branch thorpej_scsipi on 2000-11-20 18:10:08 +0000
 1.26.2.3 07-Apr-2004  jmc Pullup rev 1.37 (requested by itojun in ticket #93)

Need to use m_freem(), not m_free().
 1.26.2.2 31-Dec-2000  jhawk Pull up revisions 1.1-1.14, 1.16-1.21, 1.23-1.26 (new) (requested by bouyer):
Add support for 802.1Q virtual LANs.
 1.26.2.1 18-Dec-2000  jhawk file if_vlan.c was added on branch netbsd-1-5 on 2000-12-31 20:14:32 +0000
 1.30.2.8 17-Jan-2003  thorpej Sync with HEAD.
 1.30.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.30.2.6 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.30.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.30.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.30.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.30.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.30.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.32.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.32.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.33.10.1 02-Oct-2003  tron Pull up revision 1.37 (requested by itojun in ticket #1499):
need to use m_freem(), not m_free(). from iij seil team
 1.33.8.1 20-Jun-2002  gehenna catch up with -current.
 1.35.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.35.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.35.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.35.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.35.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.35.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.35.2.1 03-Aug-2004  skrll Sync with HEAD
 1.38.2.1 23-Jul-2004  he Pull up revision 1.41 (requested by mycroft in ticket #697):
If the parent interface is not IFF_RUNNING, do not call
its start function. This prevents a variety of panics,
and therefore fixes PR#12932, PR#17561, and PR#18376.
 1.42.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.42.4.1 29-Apr-2005  kent sync with -current
 1.44.2.1 30-Mar-2005  tron Pull up revision 1.45 (requested by yamt in ticket #70):
vlan_input: add a missing EVL_VLANOFTAG in the case of hw offloading.
 1.46.2.6 27-Feb-2008  yamt sync with head.
 1.46.2.5 11-Feb-2008  yamt sync with head.
 1.46.2.4 27-Oct-2007  yamt sync with head.
 1.46.2.3 03-Sep-2007  yamt sync with head.
 1.46.2.2 30-Dec-2006  yamt sync with head.
 1.46.2.1 21-Jun-2006  yamt sync with head.
 1.47.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.47.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.47.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.47.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.47.8.2 11-Aug-2006  yamt sync with head
 1.47.8.1 24-May-2006  yamt sync with head.
 1.47.6.1 01-Jun-2006  kardel Sync with head.
 1.47.4.1 09-Sep-2006  rpaulo sync with head
 1.49.6.2 10-Dec-2006  yamt sync with head.
 1.49.6.1 22-Oct-2006  yamt sync with head
 1.49.4.1 18-Nov-2006  ad Sync with head.
 1.52.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.53.14.2 02-Oct-2007  joerg Sync with HEAD.
 1.53.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.53.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.53.2.1 09-Oct-2007  ad Sync with head.
 1.54.2.2 23-Mar-2008  matt sync with HEAD
 1.54.2.1 06-Nov-2007  matt sync with HEAD
 1.55.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.57.10.4 11-Aug-2010  yamt sync with head.
 1.57.10.3 11-Mar-2010  yamt sync with head
 1.57.10.2 04-May-2009  yamt sync with head.
 1.57.10.1 16-May-2008  yamt sync with head.
 1.57.8.2 17-Jun-2008  yamt sync with head.
 1.57.8.1 18-May-2008  yamt sync with head.
 1.57.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.57.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.57.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.58.4.1 18-Jun-2008  simonb Sync with head.
 1.58.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.59.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.59.2.1 19-Oct-2008  haad Sync with HEAD.
 1.60.8.1 21-Apr-2010  matt sync to netbsd-5
 1.60.4.1 03-May-2009  snj Pull up following revision(s) (requested by darran in ticket #644):
sys/net/if_vlan.c: revision 1.63
Also inherit the parent's TCP segmentation offload capability.
Note the vlan interface does not see updates to the parents capabilities
so if, for example, TSO is on in both, then turned off in the parent it
will remain on in the vlan interface.
 1.60.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.60.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.62.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.64.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.65.2.2 21-Apr-2011  rmind sync with head
 1.65.2.1 30-May-2010  rmind sync with head
 1.66.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.69.26.1 10-Aug-2014  tls Rebase.
 1.69.22.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.69.16.1 18-May-2014  rmind sync with head
 1.69.14.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.69.12.2 03-Dec-2017  jdolecek update from HEAD
 1.69.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.69.8.4 24-Apr-2015  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1295):
sys/net/if_vlan.c: revision 1.81
Count up parent's obytes and omcasts counters
PR kern/49837
 1.69.8.3 16-Apr-2015  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1286):
sys/net/if_vlan.c: revision 1.80
Correct frame padding length
vlan pads a frame with zeros up to 68 bytes
(ETHER_MIN_LEN + ETHER_VLAN_ENCAP_LEN), so that the frame
keeps at least 64 bytes even once the tag is removed.
However, this does not account for the CRC (4 bytes),
so a transmitted frame can be 72 (68 + 4) bytes.
PR 49788
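The padding arithmetic in this entry can be sketched as follows. This is an illustration only, not the code from revision 1.80: the constant names and the helper vlan_pad_bytes() are hypothetical stand-ins, and the values assume the conventional Ethernet sizes (64-byte minimum frame including CRC, 4-byte CRC, 4-byte 802.1Q tag).

```c
#include <stddef.h>

/* Local stand-ins for the kernel constants (assumed conventional values). */
enum {
	PAD_MIN_LEN        = 64,	/* minimum frame length, CRC included */
	PAD_CRC_LEN        = 4,		/* CRC appended later by the hardware */
	PAD_VLAN_ENCAP_LEN = 4		/* 802.1Q tag */
};

/*
 * Hypothetical helper: number of zero bytes to append to a tagged frame
 * of len bytes (CRC not yet present) so that the frame on the wire is
 * 68 bytes with CRC, and still 64 bytes once the tag is stripped.
 */
static size_t
vlan_pad_bytes(size_t len)
{
	const size_t target = PAD_MIN_LEN - PAD_CRC_LEN + PAD_VLAN_ENCAP_LEN;

	return len < target ? target - len : 0;
}
```

Padding to 68 bytes before the CRC is added is what yields the 72-byte frames mentioned above; subtracting the CRC length from the target avoids that.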
 1.69.8.2 03-Nov-2014  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1156):
sbin/ifconfig/vlan.c: revision 1.14
sbin/ifconfig/ifconfig.8: revision 1.108
sys/net/if_vlan.c: revision 1.71
sys/net/if_vlan.c: revision 1.73
sys/net/if_vlan.c: revision 1.74
- PR#49114: Write about -vlanif in ifconfig.8.
Add -vlanif to the help message of ifconfig.
- PR#49196: Leave promiscuous mode when detaching a parent (ifconfig -vlanif)
We have to call ifpromisc(ifp, 0) for both a VLAN interface
and its parent when they are in promiscuous mode.
- PR#49197: Delete link local addresses of a vlan interface when detaching its
parent.
- PR#49112: Restore vlan_ioctl overwritten by ether_ifdetach in vlan_unconfig
 1.69.8.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.70.2.5 24-Sep-2017  snj Pull up following revision(s) (requested by manu in ticket #1409):
sys/arch/xen/xen/if_xennet_xenbus.c: 1.65
sys/arch/xen/xen/xennetback_xenbus.c: 1.53, 1.56 via patch
sys/net/if_bridge.c: 1.105
sys/net/if_ether.h: 1.65
sys/net/if_ethersubr.c: 1.215, 1.235
sys/net/if_vlan.c: 1.76, 1.77, 1.83, 1.88, 1.94
Protect vlan_unconfig with a mutex
It is not thread-safe but is likely to be executed concurrently.
See PR 49264 for more detail.
--
Tweak vlan_unconfig
No functional change.
--
Add handling of VLAN packets in if_bridge where the parent interface supports
them (Jean-Jacques.Puig%espci.fr@localhost). Factor out the vlan_mtu enabling and
disabling code.
--
Enable the VLAN mtu capability and check for the adjusted packet size
(Jean-Jacques.Puig at espci.fr).
Factor out the packet-size checking function for clarity.
--
Don't increment the reference count only when it was 0...
From Jean-Jacques.Puig
--
Account for the CRC len (Jean-Jacques.Puig)
--
Fix a bug where the parent interface's callback wasn't called when the vlan
interface is configured. A callback function uses the VLAN_ATTACHED() macro,
which checks ec->ec_nvlans, so the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
 1.70.2.4 03-Dec-2016  martin Pull up following revision(s) (requested by joerg in ticket #1279):
sys/net/if_vlan.c: revision 1.92
Don't check parent capabilities when a parent interface hasn't been
assigned.
 1.70.2.3 23-Apr-2015  snj branches: 1.70.2.3.2; 1.70.2.3.4;
Pull up following revision(s) (requested by ozaki-r in ticket #710):
sys/net/if_vlan.c: revision 1.81
Count up parent's obytes and omcasts counters
PR kern/49837
 1.70.2.2 04-Apr-2015  martin Pull up following revision(s) (requested by ozaki-r in ticket #653):
sys/net/if_vlan.c: revision 1.80
Correct frame padding length
vlan pads a frame with zeros up to 68 bytes
(ETHER_MIN_LEN + ETHER_VLAN_ENCAP_LEN), so that the frame
keeps at least 64 bytes even once the tag is removed.
However, this does not account for the CRC (4 bytes),
so a transmitted frame can be 72 (68 + 4) bytes.
PR 49788
 1.70.2.1 22-Sep-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #108):
sbin/ifconfig/vlan.c: revision 1.14
sbin/ifconfig/ifconfig.8: revision 1.108
sys/net/if_vlan.c: revision 1.71-1.74

Document -vlanif in ifconfig.8 and in usage message (PR 49114).
Leave promiscuous mode when detaching a parent (PR 49196) and
delete link local addresses (49197).
Restore vlan_ioctl overwritten by ether_ifdetach in vlan_unconfig
(PR 49112).
Call if_input of vlan interface itself, not parent one.
This allows bridging vlan interfaces again.
 1.70.2.3.4.1 18-Jan-2017  skrll Sync with netbsd-5
 1.70.2.3.2.1 03-Dec-2016  martin Pull up following revision(s) (requested by joerg in ticket #1279):
sys/net/if_vlan.c: revision 1.92
Don't check parent capabilities when a parent interface hasn't been
assigned.
 1.78.2.12 28-Aug-2017  skrll Sync with HEAD
 1.78.2.11 05-Feb-2017  skrll Sync with HEAD
 1.78.2.10 05-Dec-2016  skrll Sync with HEAD
 1.78.2.9 05-Oct-2016  skrll Sync with HEAD
 1.78.2.8 09-Jul-2016  skrll Sync with HEAD
 1.78.2.7 29-May-2016  skrll Sync with HEAD
 1.78.2.6 22-Apr-2016  skrll Sync with HEAD
 1.78.2.5 19-Mar-2016  skrll Sync with HEAD
 1.78.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.78.2.3 22-Sep-2015  skrll Sync with HEAD
 1.78.2.2 06-Jun-2015  skrll Sync with HEAD
 1.78.2.1 06-Apr-2015  skrll Sync with HEAD
 1.90.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.90.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.94.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.97.2.18 13-Nov-2019  martin Pull up following revision(s) (requested by yamaguchi in ticket #1434):

sys/net/if_vlan.c: revision 1.148

Fix a bug where vlan(4) fragments IPv6 packets
even when the MTU > packet length.

The bug appears when the MTU is increased on SIOCSETVLAN.
From t-kusaba@IIJ
 1.97.2.17 24-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1411):

sys/net/if_vlan.c: revision 1.147

vlan: get rid of unnecessary if_ipackets++ in vlan_input

It's done by if_input() below now.
Pointed out by msaitoh@
 1.97.2.16 22-Oct-2018  martin Additionally pull up r1.131 for ticket #1066 (requested by knakahara):

Use a different psz for a different lock. Patch from riastradh, reviewed
by ozaki-r.
 1.97.2.15 21-Oct-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1066):

sys/net/if_vlan.c: revision 1.133
sys/net/if_gif.h: revision 1.32
sys/net/if_ipsec.c: revision 1.18
sys/net/if_ipsec.h: revision 1.4
sys/net/if_gif.c: revision 1.144
sys/net/if_l2tp.h: revision 1.6
sys/net/if_l2tp.c: revision 1.30

Fix panic when doing ioctl to multiple pseudo interfaces. Pointed out by k-goda@IIJ.

XXX pullup-8
 1.97.2.14 12-Jun-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #876):
sys/net/if_vlan.c: 1.126
tests/net/if_vlan/t_vlan.sh: 1.9
vlan: call ether_ifdetach without IFNET_LOCK
Fix PR kern/53357
--
Add tests of vlan with bridge
The tests trigger a panic reported in PR kern/53357.
 1.97.2.13 06-May-2018  martin Pull up following revision(s) (requested by spz in ticket #813):

sys/net/if_vlan.c: revision 1.122

If cnt == 0, don't kmem_alloc(0). Found by Mootja.

Looking at the code, I also find it suspicious that we read
ifv->ifv_mib->ifvm_p directly without making sure ifv_mib != NULL.
 1.97.2.12 14-Apr-2018  martin Pull up following revision(s) (requested by ryo in ticket #752):

sys/net/if_vlan.c: revision 1.125

Fix the handling of the state returned from pfil_run_hooks().

pfil_run_hooks() invokes any registered packet filters on the packet
being handled. It may return a (non-zero) errno, indicating that a
filter has decided that the packet should be discarded, and has freed
the mbuf. While a non-error (0) return usually means that the packet
should be processed normally, a filter may still free the mbuf if the
packet is a fragment, and the filter is holding it for reassembly and
future evaluation. Therefore, there must be separate tests for the
return value and for a possible discarded packet. (See pfil(9).)

OK: christos, martin
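The two separate tests described in this entry can be sketched in userland C. Everything here is a hypothetical stand-in: struct pkt replaces the mbuf, hook_fn replaces the filter call, and the three sample hooks exist only to exercise the paths. It is not the code from revision 1.125.

```c
#include <errno.h>
#include <stdlib.h>

struct pkt { int len; };	/* stand-in for struct mbuf */

/*
 * Hypothetical hook contract, as described above:
 *   nonzero return: packet rejected and already freed by the filter;
 *   zero return:    accepted, but *pp may be NULL if the filter kept it.
 */
typedef int (*hook_fn)(struct pkt **pp);

static int hook_pass(struct pkt **pp) { (void)pp; return 0; }

static int
hook_reject(struct pkt **pp)
{
	free(*pp);		/* the filter frees what it rejects */
	*pp = NULL;
	return EPERM;
}

static int
hook_hold(struct pkt **pp)
{
	free(*pp);		/* a real filter would queue it for reassembly */
	*pp = NULL;
	return 0;		/* no error, yet the packet is gone */
}

/* Returns 1 if the caller may keep processing *pp, 0 if it must stop. */
static int
run_filter(hook_fn hook, struct pkt **pp)
{
	if (hook(pp) != 0) {
		free(*pp);	/* defensive; free(NULL) is a no-op */
		*pp = NULL;
		return 0;
	}
	if (*pp == NULL)
		return 0;	/* consumed without error */
	return 1;
}
```

Checking only the return value would let the hook_hold case fall through with a NULL packet, which is the situation the entry warns about.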
 1.97.2.11 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.97.2.10 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() panicked when resource allocation failed
(e.g. allocating a softint). That is bad because resource shortages
really do occur when a lot of pseudo interfaces are created.
To avoid this problem, don't panic and change the return value of if_initialize()
and if_attach() to int. Caller functions can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.97.2.9 27-Nov-2017  martin Pull up following revision(s) (requested by jmcneill in ticket #398):
sys/net/if_vlan.c: revision 1.113
kern/52765: npf cannot do port forwarding on vlan interfaces
Add pfil hooks support to vlan(4), from christos@
 1.97.2.8 24-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #389):
sys/net/if_ether.h: revision 1.69
sys/net/if_vlan.c: revision 1.108
sys/dev/pci/if_bge.c: revision 1.313
sys/net/if_vlanvar.h: revision 1.11
sys/net/if_vlanvar.h: revision 1.12
sys/net/if_ether.h: revision 1.70
sys/net/if_vlan.c: revision 1.110
sys/dev/pci/if_wm.c: revision 1.544
sys/dev/pci/if_wmreg.h: revision 1.105
Fix a bug where a vlan packet which has the priority or CFI bit set in the tag
causes a panic.
Revert part of if_bge.c 1.312. It's not required to mask other than VLAN ID
bits in VLAN tag.
Revert if_wmreg.h 1.104 and if_wm.c 1.542. It's not required to mask other
than VLAN ID bits in VLAN tag.
No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
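For context on the priority/CFI fix in this entry: an 802.1Q tag carries a 3-bit priority, a 1-bit CFI and a 12-bit VLAN ID in one 16-bit field, so any lookup keyed on the VLAN ID has to mask the other bits off (the role EVL_VLANOFTAG plays elsewhere in this log). The sketch below uses hypothetical helper names and is not taken from the revisions listed.

```c
#include <stdint.h>

/* 802.1Q tag control information layout: PCP(3) | CFI(1) | VID(12). */

/* Hypothetical helper: VLAN ID, the low 12 bits. */
static inline uint16_t
tci_vid(uint16_t tci)
{
	return tci & 0x0fff;
}

/* Hypothetical helper: CFI bit, bit 12. */
static inline uint16_t
tci_cfi(uint16_t tci)
{
	return (tci >> 12) & 0x1;
}

/* Hypothetical helper: priority, the top 3 bits. */
static inline uint16_t
tci_pcp(uint16_t tci)
{
	return (tci >> 13) & 0x7;
}
```

A tag of 0xB00A, for example, decodes to priority 5, CFI 1 and VLAN ID 10; using the raw value as a VLAN ID would miss the interface configured for VLAN 10.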
 1.97.2.7 22-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #368):
sys/net/if_vlan.c: revision 1.101
sys/net/if_vlan.c: revision 1.102
Check that the VLAN ID isn't duplicated on the same parent interface and return
EEXIST if it is.
Remove accidentally added code (for VLAN hardware filter).
 1.97.2.6 08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #349):
sys/net/if_l2tp.c: revision 1.14
sys/net/if_tap.c: revision 1.101
sys/net/if_tun.c: revision 1.141
sys/net/if_vlan.c: revision 1.106
Set IFEF_NO_LINK_STATE_CHANGE flag to pseudo devices that don't use
if_link_state_change
 1.97.2.5 06-Nov-2017  snj Pull up following revision(s) (requested by knakahara in ticket #340):
sys/net/if_vlan.c: revision 1.104
Fix vlan panic when a vlan is re-configured without being destroyed.
E.g. the following operation causes this panic.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 1 vlanif ixg3
# ifconfig vlan1 create
# ifconfig vlan1 vlan 1 vlanif ixg2
# ifconfig vlan1 -vlanif
# ifconfig vlan1 vlan 1 vlanif ixg2
panic: kernel diagnostic assertion "new->ple_next == NULL" failed: file "/git/netbsd-src/sys/sys/pslist.h", line 118
====================
Pointed out and tested by msaitoh@n.o, fixed by s-yamaguchi@IIJ, thanks.
 1.97.2.4 25-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #328):
sys/net/if_vlan.c: revision 1.103
Set IFEF_START_MPSAFE by default
Because vlan_start is already MP-safe, there is no reason to not do so.
Acked by s-yamaguchi@IIJ
 1.97.2.3 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #302):
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.30-1.31
sys/arch/x86/pci/if_vmx.c: 1.20
sys/dev/ic/i82557.c: 1.148
sys/dev/ic/rtl8169.c: 1.152
sys/dev/pci/cxgb/cxgb_sge.c: 1.5
sys/dev/pci/if_age.c: 1.51
sys/dev/pci/if_alc.c: 1.25
sys/dev/pci/if_ale.c: 1.23
sys/dev/pci/if_bge.c: 1.311
sys/dev/pci/if_bge.c: 1.312
sys/dev/pci/if_bnx.c: 1.62
sys/dev/pci/if_jme.c: 1.32
sys/dev/pci/if_nfe.c: 1.64
sys/dev/pci/if_sip.c: 1.167
sys/dev/pci/if_stge.c: 1.63-1.64
sys/dev/pci/if_ti.c: 1.102
sys/dev/pci/if_txp.c: 1.48
sys/dev/pci/if_vge.c: 1.61
sys/dev/pci/if_wm.c: 1.538
sys/dev/pci/ixgbe/ix_txrx.c: 1.29 via patch
sys/net/agr/if_agrether_hash.c: 1.4
sys/net/if_ether.h: 1.67-1.68
sys/net/if_ethersubr.c: 1.244
sys/net/if_vlan.c: 1.100
sys/net80211/ieee80211_input.c: 1.89
sys/net80211/ieee80211_output.c: 1.59
sys/sys/mbuf.h: 1.171
VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.
I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html
--
only get vtag when we have vtag like the other drivers.
--
- only get the vtag if we have it like the other drivers
- mask the hardware vlan tag
--
- add a constant for the vlan mask.
- enforce that we have a tag before we get it.
only get vtag when we have vtag like the other drivers.
like if_bge.c:1.312 and if_stge.c:1.64.
fixed by s-yamaguchi@IIJ, thanks.
 1.97.2.2 14-Aug-2017  snj Pull up following revision(s) (requested by knakahara in ticket #205):
sys/net/if_vlan.c: revision 1.99
Fix vlan(4) obytes counter. Implemented by s-yamaguchi@IIJ, thanks.
 1.97.2.1 21-Jun-2017  snj Pull up following revision(s) (requested by knakahara in ticket #41):
sys/net/if_vlan.c: revision 1.98
sys/net/if_vlanvar.h: revision 1.10
vlan(4) MP-ify. contributed by s-yamaguchi@IIJ, thanks.
 1.124.2.5 20-Oct-2018  pgoyette Sync with head
 1.124.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.124.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.124.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.124.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.130.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.130.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.130.2.1 10-Jun-2019  christos Sync with HEAD
 1.141.2.3 13-Nov-2019  martin Pull up following revision(s) (requested by yamaguchi in ticket #420):

sys/net/if_vlan.c: revision 1.148
tests/net/if_vlan/t_vlan.sh: revision 1.16

Fix a bug where vlan(4) fragments IPv6 packets
even when the MTU > packet length.

The bug appears when the MTU is increased on SIOCSETVLAN.
From t-kusaba@IIJ

atf: add test cases for MTU that is increased on SIOCSETVLAN
From t-kusaba@IIJ, thanks
 1.141.2.2 23-Oct-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #367):

sys/net/if_vlan.c: revision 1.147

vlan: get rid of unnecessary if_ipackets++ in vlan_input
It's done by if_input() below now.

Pointed out by msaitoh@
 1.141.2.1 01-Sep-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #133):

sys/dev/pci/ixgbe/ixgbe.c: revision 1.200
sys/dev/pci/ixgbe/ixgbe.c: revision 1.201
sys/dev/pci/ixgbe/ixv.c: revision 1.126
sys/dev/pci/ixgbe/ixv.c: revision 1.127
sys/net/if_vlan.c: revision 1.142
sys/net/if_vlan.c: revision 1.143
sys/net/if_vlan.c: revision 1.144
sys/net/if_vlan.c: revision 1.145
sys/net/if_vlan.c: revision 1.146

Check ec_capenable instead of ec_capabilities to control TX side of VLAN HW
tagging correctly.
XXX pullup-9

Add missing IFNET_LOCK() and IFNET_UNLOCK() in vlan_config().
XXX pullup-9

Fix a bug where VLAN HW "tagging" enable/disable may not be reflected correctly.
- Always call ec_vlan_cb() if it exists.
- Some (or all?) ethernet drivers don't enable HW tagging if no vlan is
attached. ixgbe is one of them. Check the transition and update
VLAN HW tagging function.
XXX pullup-9

Use ETHER_LOCK()/ETHER_UNLOCK() suggested by knakahara.
- kmem_alloc(,KM_SLEEP) never returns NULL, so remove NULL check.
- VLAN ID is never duplicated, so break the loop when found. Also move
kmem_free() outside of ETHER_LOCK(ec)/ETHER_UNLOCK(ec) to reduce the hold
time. suggested by ozaki-r.
- Whitespace fix.
 1.149.2.1 29-Feb-2020  ad Sync with head.
 1.153.6.2 01-Aug-2021  thorpej Sync with HEAD.
 1.153.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.170.4.1 03-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #455):
sys/dev/pci/ixgbe/ixgbe.c: revision 1.347
sys/net/if_l2tp.c: revision 1.49
tests/net/if_vlan/t_vlan.sh: revision 1.25
sys/net/if_vlan.c: revision 1.171
sys/net/if_ethersubr.c: revision 1.326
sys/dev/pci/ixgbe/ixv.c: revision 1.194
Use ether_bpf_mtap only when the device supports vlan hardware tagging
The function is bpf_mtap() for ethernet devices and *currently*
it is just handling VLAN tag stripped by the hardware.
l2tp(4): use ether_ifattach() to initialize ethercom
Support vlan(4) over l2tp(4)
Added the test for vlan over l2tp
 1.171.2.1 11-Nov-2023  thorpej branches: 1.171.2.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.171.2.1.2.2 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.171.2.1.2.1 16-Nov-2023  thorpej Clean up the locking protocol around altq_etherclassify(). It's no longer
required to acquire KERNEL_LOCK *just* because ALTQ is compiled into the
kernel; you only have to acquire it if ALTQ is enabled on the interface
in question.
 1.17 20-Jun-2022  yamaguchi Handle frames whose vlan id is 0 as non-VLAN frames
even if the vlan tag is stripped by hardware offloading
 1.16 30-Sep-2021  yamaguchi vlan: Register vlan_ifdetach to ether_ifdetach hook
 1.15 30-Sep-2021  yamaguchi vlan: Register the callback to update link-state of vlan I/F
to link-state change hook

The callback is registered in every vlan I/F even if the parent
interface is the same. Therefore it is not needed to search the
vlan I/F by the parent interface unlike the previous callback.
 1.14 26-Sep-2020  roy vlan: match the interface link state with that of the parent

Now addresses on a vlan will detach and undergo duplicate address
detection on link state changes just as on a standard interface.
 1.13 15-Jan-2018  maxv Mostly style, and add a bunch of KASSERTs.
 1.12 22-Nov-2017  msaitoh No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
 1.11 22-Nov-2017  msaitoh Fix a bug where a vlan packet which has the priority or CFI bit set in the tag
causes a panic.
 1.10 07-Jun-2017  knakahara vlan(4) MP-ify. contributed by s-yamaguchi@IIJ, thanks.

XXX Pull-ups needed for netbsd-8 branch
 1.9 28-Apr-2008  martin branches: 1.9.44; 1.9.64; 1.9.80;
Remove clause 3 and 4 from TNF licenses
 1.8 20-Feb-2008  matt branches: 1.8.6; 1.8.8; 1.8.10;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.7 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.6 11-Dec-2005  christos branches: 1.6.46; 1.6.52; 1.6.56; 1.6.60;
merge ktrace-lwp.
 1.5 26-Feb-2005  perry branches: 1.5.4;
nuke trailing whitespace
 1.4 03-Oct-2000  thorpej branches: 1.4.2; 1.4.4; 1.4.28; 1.4.36; 1.4.38;
When an Ethernet interface detaches, unconfigure any VLANs associated
with it.
 1.3 03-Oct-2000  thorpej Improve the VLAN support, in particular, handling of MTU:
- Add a macro to compute the max frame length based on Ethertype
and presence of FCS, and use it to validate the packet size
in ether_input().
- Add capabilities to struct ethercom, and allow hardware drivers
to specify that they can handle the larger hardware MTU that
VLANs require in order to strictly conform to 802.1Q.
- Make ether_ifdetach() clear out the link address and free all of
the Ethernet multicast structures.

Also, rearrange the VLAN driver itself in preparation to supporting
other hardware types, including FDDI (which has 802.1Q VLAN capability).
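The max-frame-length computation mentioned in the first item of this entry can be sketched as below. The function name, parameter names and constants are hypothetical, and the values assume the conventional sizes (14-byte Ethernet header, 4-byte FCS, 4-byte 802.1Q tag, Ethertype 0x8100 for VLAN); the actual macro added in revision 1.3 may differ in form.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the usual Ethernet constants (assumed values). */
enum {
	MF_HDR_LEN        = 14,		/* dst + src + ethertype */
	MF_FCS_LEN        = 4,		/* frame check sequence */
	MF_VLAN_ENCAP_LEN = 4,		/* 802.1Q tag */
	MF_ETYPE_VLAN     = 0x8100	/* 802.1Q Ethertype */
};

/*
 * Hypothetical helper: largest acceptable frame for a given MTU,
 * depending on the Ethertype and on whether the FCS is still attached.
 */
static size_t
max_frame_len(size_t mtu, uint16_t etype, int has_fcs)
{
	size_t len = mtu + MF_HDR_LEN;

	if (has_fcs)
		len += MF_FCS_LEN;
	if (etype == MF_ETYPE_VLAN)
		len += MF_VLAN_ENCAP_LEN;
	return len;
}
```

With a 1500-byte MTU this gives 1518 bytes for an untagged frame with FCS and 1522 bytes for a tagged one, which is the larger hardware MTU that the entry says VLANs require for strict 802.1Q conformance.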
 1.2 28-Sep-2000  enami Port the multicast handling to NetBSD correctly.
 1.1 27-Sep-2000  thorpej Support for 802.1Q Virtual LANs. Derived and cleaned up by
Andy Doran <ad@netbsd.org> from the FreeBSD/OpenBSD implementation.
A few minor changes to how it all hooks into the system by me.
 1.4.38.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.4.36.1 29-Apr-2005  kent sync with -current
 1.4.28.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.4.4.2 31-Dec-2000  jhawk Pull up revisions 1.1-1.4 (new) (requested by bouyer):
Add support for 802.1Q virtual LANs.
 1.4.4.1 03-Oct-2000  jhawk file if_vlanvar.h was added on branch netbsd-1-5 on 2000-12-31 20:14:35 +0000
 1.4.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.1 03-Oct-2000  bouyer file if_vlanvar.h was added on branch thorpej_scsipi on 2000-11-20 18:10:08 +0000
 1.5.4.2 27-Feb-2008  yamt sync with head.
 1.5.4.1 21-Jan-2008  yamt sync with head
 1.6.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.6.56.1 26-Dec-2007  ad Sync with head.
 1.6.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.6.46.2 23-Mar-2008  matt sync with HEAD
 1.6.46.1 09-Jan-2008  matt sync with HEAD
 1.8.10.1 16-May-2008  yamt sync with head.
 1.8.8.1 18-May-2008  yamt sync with head.
 1.8.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.80.2 24-Nov-2017  martin Pull up following revision(s) (requested by msaitoh in ticket #389):
sys/net/if_ether.h: revision 1.69
sys/net/if_vlan.c: revision 1.108
sys/dev/pci/if_bge.c: revision 1.313
sys/net/if_vlanvar.h: revision 1.11
sys/net/if_vlanvar.h: revision 1.12
sys/net/if_ether.h: revision 1.70
sys/net/if_vlan.c: revision 1.110
sys/dev/pci/if_wm.c: revision 1.544
sys/dev/pci/if_wmreg.h: revision 1.105
Fix a bug that a vlan packet which has priority or CFI bit in the tag causes
panic.
Revert part of if_bge.c 1.312. It's not required to mask other than VLAN ID
bits in VLAN tag.
Revert if_wmreg.h 1.104 and if_wm.c 1.542. It's not required to mask other
than VLAN ID bits in VLAN tag.
No functional change:
- u_int16_t -> uint16_t
- u_short -> uint16_t
- tag_hash_func -> vlan_tag_hash
- 0 -> NULL because vlr_parent is a pointer.
 1.9.80.1 21-Jun-2017  snj Pull up following revision(s) (requested by knakahara in ticket #41):
sys/net/if_vlan.c: revision 1.98
sys/net/if_vlanvar.h: revision 1.10
vlan(4) MP-ify. contributed by s-yamaguchi@IIJ, thanks.
 1.9.64.1 28-Aug-2017  skrll Sync with HEAD
 1.9.44.1 03-Dec-2017  jdolecek update from HEAD
 1.135 27-Dec-2024  riastradh wg(4): Fix thinko in previous. Should unbreak the rump build.

PR kern/58938: wg tunnel dies after a few days
 1.134 27-Dec-2024  riastradh wg(4): Add debug log for which address we send handshake msgs to.

Maybe this will help to diagnose:

PR kern/58938: wg tunnel dies after a few days
 1.133 28-Nov-2024  riastradh wg(4): Avoid spurious kassert for harmless race in session retry.

If we have already transitioned away from INIT_ACTIVE by the time the
retry timer has fired, the handshake start time may have been zeroed,
but that's harmless. So don't kassert about it until after we've
verified we're still in INIT_ACTIVE state.

PR kern/58859: KASSERT in wg_task_retry_handshake
 1.132 08-Oct-2024  riastradh wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.

PR kern/58688: userland panic of kernel via wg(4)
 1.131 31-Jul-2024  riastradh wg(4): Add Internet Archive links for the versions cited.

No functional change.
 1.130 31-Jul-2024  riastradh wg(4): Make a rule for who wins when both peers send INIT at once.

The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of

H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)

is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.

Another case of:

PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.

(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
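
The tie-break rule in 1.130 can be sketched in userland C as follows. This is only an illustration, not the code in if_wg.c: the function names, the 32-byte digest length, and the choice to compare digests (rather than raw public keys) in both branches are assumptions, and the caller is assumed to have already computed H() over each public key and over the POSIX minute encoded as a little-endian 64-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_LEN 32	/* assumed digest length; the log does not state it */

/* Compare two digests as little-endian integers: returns -1, 0, or 1. */
int
hash_cmp_le(const uint8_t a[HASH_LEN], const uint8_t b[HASH_LEN])
{
	/* The most significant byte of a little-endian integer is the last. */
	for (size_t i = HASH_LEN; i-- > 0;) {
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	}
	return 0;
}

/*
 * Return 1 if peer A takes priority, 0 otherwise.
 * ha, hb: digests of the two public keys (hypothetical inputs).
 * ht: digest of the current POSIX minute as a little-endian uint64.
 */
int
peer_a_has_priority(const uint8_t ha[HASH_LEN], const uint8_t hb[HASH_LEN],
    const uint8_t ht[HASH_LEN])
{
	/* Low-order bit of H(A) ^ H(B) ^ H(minutes), read as little-endian. */
	unsigned bit = (ha[0] ^ hb[0] ^ ht[0]) & 1;
	int cmp = hash_cmp_le(ha, hb);

	/* bit 0: the smaller digest wins; bit 1: the larger digest wins. */
	return bit == 0 ? cmp < 0 : cmp > 0;
}
```

Because both peers evaluate the same three digests, they reach the same verdict without exchanging any extra messages, and mixing in the minute keeps one peer from always losing.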
 1.129 29-Jul-2024  riastradh wg(4): Sprinkle volatile on variables requiring atomic access.

No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.128 29-Jul-2024  riastradh wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.127 29-Jul-2024  riastradh wg(4): Queue packet for post-handshake retransmit if limits are hit.

PR kern/58521: experimental wg(4) may drop packet after minutes of quiet
 1.126 29-Jul-2024  riastradh wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.125 29-Jul-2024  riastradh wg(4): Add missing barriers around wgp_pending access.

PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation
 1.124 29-Jul-2024  riastradh wg(4): Force rekey on tx if session is older than reject-after-time.

One more corner case for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.123 29-Jul-2024  riastradh wg(4): Read wgs_state atomically in wg_get_stable_session.

As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.122 29-Jul-2024  riastradh wg(4): Deduplicate session establishment actions.

The actions to

(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,

were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_session so there's only
one copy of the logic.

This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.

Post-fix tidying for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.121 29-Jul-2024  riastradh wg(4): Sprinkle comments on internal sliding window API.

Post-fix tidying for:

PR kern/58480: experimental wg(4) sliding window logic has oopsie
 1.120 29-Jul-2024  riastradh wg(4): Omit needless atomic_load.

wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.

(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.119 29-Jul-2024  riastradh wg(4): Fix typo in comment recently added.

Comment added in the service of:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.118 29-Jul-2024  riastradh wg(4): Fix memory ordering in detach.

PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload
 1.117 29-Jul-2024  riastradh wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.

(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)

Tidying up for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.116 29-Jul-2024  riastradh wg(4): Sprinkle comments into wg_swap_sessions.

No functional change intended.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.115 29-Jul-2024  riastradh wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a few seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped packets, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you. Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake
 1.114 29-Jul-2024  riastradh wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.

No functional change intended.
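
For readers unfamiliar with the C99 feature that 1.114 refers to, here is a minimal sketch. The function name key_copy and the KEY_LEN constant are hypothetical and do not come from if_wg.c.

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32	/* illustrative size, not taken from the driver */

/*
 * "static KEY_LEN" inside the brackets tells the compiler that the
 * caller must pass a pointer to at least KEY_LEN valid elements.
 * Without "static", the declared size is ignored entirely.
 */
void
key_copy(uint8_t dst[static KEY_LEN], const uint8_t src[static KEY_LEN])
{
	memcpy(dst, src, KEY_LEN);
}
```

Some compilers can then warn when a call site passes NULL or an array that is visibly too short, which is what makes the size declaration useful rather than decorative.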
 1.113 29-Jul-2024  riastradh wg(4): Put force_rekey state in the session, not the peer.

That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.

This will work more reliably than the atomic_swap_uint I used before.

Noted by kre@.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.112 28-Jul-2024  riastradh wg(4): Explain why gethexdump/puthexdump is there, and tidy.

This way I will not be tempted to replace it by in-line calls to
libkern hexdump.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.111 28-Jul-2024  riastradh wg(4): Delete temporary hacks to dump keys and packets.

No longer useful for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.110 28-Jul-2024  riastradh wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie
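
The class of bug fixed in 1.110 can be shown with a pair of made-up macros. These are not the sliding window macros from if_wg.c; they only demonstrate why both the arguments and the whole expansion need parentheses.

```c
/* Hypothetical macros for illustration only. */
#define DOUBLE_BAD(n)	n * 2		/* argument and body unparenthesized */
#define DOUBLE_GOOD(n)	((n) * 2)	/* argument and body parenthesized */

int
double_bad(void)
{
	/* Expands to 1 + 2 * 2, which is 5 rather than the intended 6. */
	return DOUBLE_BAD(1 + 2);
}

int
double_good(void)
{
	/* Expands to ((1 + 2) * 2), which is 6. */
	return DOUBLE_GOOD(1 + 2);
}
```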
 1.109 28-Jul-2024  riastradh wg(4): Be more consistent about #ifdef INET/INET6.

PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only
 1.108 28-Jul-2024  riastradh wg(4): Tidy up error branches.

No functional change intended, except to add some log messages in
failure cases.

Cleanup after:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.107 28-Jul-2024  riastradh wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy
 1.106 28-Jul-2024  riastradh wg(4): Fix quotation in comment.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.105 28-Jul-2024  riastradh wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.

Will simplify pullups to 10 for:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.104 28-Jul-2024  riastradh wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.
However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
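
The arithmetic behind the 32-bit choice in 1.104 can be checked with a small sketch. The function names are hypothetical; the point is that unsigned subtraction is performed modulo 2^32, so a short duration is still computed correctly even if the 32-bit uptime counter wraps between the two samples.

```c
#include <stdint.h>

/* Seconds elapsed between two 32-bit uptime samples, modulo 2^32. */
uint32_t
elapsed32(uint32_t now, uint32_t then)
{
	return now - then;
}

/* Whole Julian years (365.25 days) before a 32-bit seconds counter wraps. */
uint64_t
wrap_years32(void)
{
	const uint64_t secs_per_year = UINT64_C(31557600);	/* 365.25 * 86400 */

	return (UINT64_C(1) << 32) / secs_per_year;
}
```

Here wrap_years32() evaluates to 136, matching the figure quoted in the commit message, and elapsed32(5, UINT32_MAX - 4) yields 10 even though the counter wrapped in between.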
 1.103 28-Jul-2024  riastradh wg(4): Make sure to update endpoint on keepalive packets too.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.102 28-Jul-2024  riastradh wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.

If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.101 28-Jul-2024  riastradh wg(4): Reject rx on sessions older than reject-after-time sec.

Prompted by (but won't fix anything in):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.100 28-Jul-2024  riastradh wg(4): Fix session destruction.

Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.99 28-Jul-2024  riastradh wg(4): Mark wgp_pending volatile to reflect its usage.

Prompted by (but won't fix any part of):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.98 28-Jul-2024  riastradh wg(4): Expand cookie secret to 32 bytes.

This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret
 1.97 28-Jul-2024  riastradh wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.96 28-Jul-2024  riastradh wg(4): Use callout_halt, not callout_stop.

It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.

Prompted by (but probably won't fix anything in):

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.95 28-Jul-2024  riastradh wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:

(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or

(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:

(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is not set,

and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.94 28-Jul-2024  riastradh wg(4): Rework some details of internal session state machine.

This way:

- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.

(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)

- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.

- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.

- Add a log message at the time of every state transition.

Prompted by:

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
 1.93 27-Jul-2024  christos Limit the size of the packet, and print ... if it is bigger. (from kre@)
 1.92 26-Jul-2024  riastradh wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.

PR kern/58470
 1.91 25-Jul-2024  christos consistently use printf instead of aprint_debug and print the tkeys with
the packet.
 1.90 25-Jul-2024  christos Add more debugging from Taylor
 1.89 25-Jul-2024  kre Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.
 1.88 25-Jul-2024  kre There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs
WG_DEBUG defined as well, if set.
 1.87 25-Jul-2024  kre Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.
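
A minimal sketch of the portability point in 1.87, using a hypothetical helper rather than the driver's debug macros: size_t is not unsigned long on every platform, so %lu is only correct by accident where the two types happen to match, while %zu is the conversion defined for size_t.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t portably.  "%zu" matches size_t on both ILP32 and
 * LP64; "%lu" would need an explicit cast to unsigned long.
 * Returns the number of characters snprintf would have written.
 */
int
format_len(char *buf, size_t buflen, size_t n)
{
	return snprintf(buf, buflen, "%zu", n);
}
```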
 1.86 25-Jul-2024  christos use hexdump...
 1.85 25-Jul-2024  christos fix size limit calculation in dump and NULL checks
 1.84 24-Jul-2024  christos Add packet dump debugging
 1.83 24-Jul-2024  kre While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them - making them, as separate entities, pointless.

So, rearrange the way things are done a little to avoid doing that.
 1.82 24-Jul-2024  kre If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.
 1.81 24-Jul-2024  christos Add more debugging in packet validation
 1.80 24-Jul-2024  christos Add a wg_debug variable to split between debug/trace/dump messages
 1.79 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.78 10-Mar-2024  riastradh branches: 1.78.2;
wg(4): Bind to CPU in wg_handle_packet.

Required by use of psref there.

Assert we're bound up front so we catch mistakes early, rather than
later on if we get unlucky in preemption and scheduling.

PR bin/58021
 1.77 01-Aug-2023  mrg branches: 1.77.2;
fix simple mis-matched function prototype and definitions.

most of these are like, eg

void foo(int[2]);

with either of these

void foo(int*) { ... }
void foo(int[]) { ... }

in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.

found by GCC 12.
 1.76 11-Apr-2023  jakllsch Give scope and additional details to wg(4) diagnostic messages.
 1.75 05-Apr-2023  andvar s/termintaed/terminated/ in comment.
 1.74 05-Jan-2023  christos centralize the kauth ugliness.
 1.73 05-Jan-2023  jakllsch wg(4): Allow non-root to retrieve information other than the private
key and the peer preshared key.

Add kauth(9) enums for wg(4) and use them in the suser secmodel.

Refines fix for PR 57161.
 1.72 05-Jan-2023  jakllsch Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4).

Addresses PR 57161.
 1.71 04-Nov-2022  ozaki-r branches: 1.71.2;
inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.70 28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.69 25-Mar-2022  hannken Prevent memory corruption from wg_send_handshake_msg_init() on
LP64 machines with "MSIZE == 256", sparc64 for example.

wg_send_handshake_msg_init() tries to put 148 bytes into a buffer
of 144 bytes and overwrites 4 bytes following the mbuf. Check
for "sizeof() > MHLEN" and use a cluster in this case.

With help from Taylor R Campbell <riastradh@>
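
The size check described in 1.69 can be modeled outside the kernel as below. This is a userland stand-in, not mbuf code: needs_cluster and its parameter names are hypothetical, and the 148 and 144 byte figures in the note that follows are the ones quoted in the commit message.

```c
#include <stddef.h>

/*
 * Decide whether a message fits in the inline data area of a packet
 * header buffer or needs external (cluster) storage.  In the kernel
 * the capacity would be MHLEN; here it is passed in as a parameter.
 */
int
needs_cluster(size_t msg_len, size_t inline_capacity)
{
	return msg_len > inline_capacity;
}
```

With the figures from the log, needs_cluster(148, 144) is true, which is the case that used to overwrite the 4 bytes following the mbuf.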
 1.68 16-Jan-2022  riastradh wg(4): Limit the size of ifdrv requests.

Avoids potential integer overflow or kernel memory exhaustion.

Reported by Thomas Leroy a while back.
 1.67 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.66 31-Dec-2021  riastradh sys: Use if_stop wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.65 17-Aug-2021  christos Some signnes, casts, and constant sizes.
Add module dependencies.
 1.64 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.63 29-Apr-2021  riastradh Sprinkle __noinline to reduce gigantic stack frames in ALL kernels.

In principle this might just push a real problem around, but this is
unlikely to be a real problem because:

1. The large stack frames are really only in the setup state machine
message handlers, which run at the top loop of a thread with a
shallow stack anyway.

2. If these are inlined, gcc might create multiple nonoverlapping
stack buffers, whereas if not inlined, the stack frames from
consecutive or alternative procedure calls would overlap anyway.

(I haven't investigated exactly what's going on leading to ~5 KB
stack frames, but this shuts gcc up, at least, and the hypotheses
sound plausible to me!)
 1.62 11-Nov-2020  riastradh branches: 1.62.4;
wg: Sprinkle #ifdef INET6. Avoid unconditional use of ip6 structs.

Fixes no-INET6 build.

Based on patch from Brad Spencer:

https://mail-index.NetBSD.org/current-users/2020/11/11/msg039883.html
 1.61 15-Oct-2020  roy branches: 1.61.2;
wg: with no peers, the link status is DOWN, otherwise UP

This mirrors the recent changes to gif(4) where the link is UP when a
tunnel is set, otherwise DOWN.
 1.60 14-Sep-2020  riastradh wg: Add altq hooks.

While here, remove the IFQ_CLASSIFY bottleneck (takes the ifq lock,
so it would serialize all transmission to all peers on a single wg(4)
interface).

altq can be disabled at compile-time or at run-time; even if included
at compile-time the run-time impact should be negligible if disabled.
 1.59 13-Sep-2020  riastradh wg: Fix detach logic.

Not tested but this should be less of a rake to step on if anyone
made an unloadable wg module.
 1.58 13-Sep-2020  riastradh wg: Use RUN_ONCE to defer workqueue_create until after configure.

Should really fix workqueue(9) so workqueue_create can be done before
CPUs have been detected in configure, but this will serve as a stop-
gap measure.
 1.57 13-Sep-2020  riastradh wg: Add missing kpreempt_disable/enable around pktq_enqueue.
 1.56 08-Sep-2020  riastradh wg: Drop wgp_lock while waiting for endpoint psref to drain.

- This is safe because wgp_endpoint_changing locks out any attempts
to change the endpoint until the draining is complete.

- This is necessary to avoid a deadlock where the handshake thread
holds a psref and awaits mutex_enter(wgp->wgp_lock).

XXX The same deadlock may occur in wg_destroy_session. Not clear
that it's safe to just release wgp_lock there; may need to create a
new session state, say WGS_STATE_DRAINING, while we wait for
psref_target_destroy. But this needs a little more thought; a new
state may not be necessary, and would be nice to avoid if not
necessary.
 1.55 07-Sep-2020  riastradh wg: Use threadpool(9) and workqueue(9) for asynchronous tasks.

- Using threadpool(9) job per interface to receive incoming handshake
messages gives the same concurrency for active interfaces but
doesn't waste kthreads for inactive ones.

=> Can't really do this with a global workqueue(9) because there's
no bound on the amount of time wg_receive_packets() might run
for; we really need separate threads or threadpool jobs in order
to avoid having one interface starve all the others.

- Using a global workqueue(9) for asynchronous peer tasks avoids
creating unnecessary kthreads.

=> Each task does a more or less bounded amount of work, so it's OK
to share a global workqueue -- there's no advantage to adding
concurrency for what is almost certainly going to be CPU-bound
asymmetric crypto.

=> This way we don't need a thread per peer or iteration over a
list of all peers, so the task mechanism should no longer be a
bottleneck to scaling to thousands of peers.

XXX This doesn't distribute the load across CPUs -- it keeps it on
the same CPU where the packet came in. Should consider doing
something to balance the load -- maybe note if the current CPU is
loaded, and if so, sort CPUs by queue length or some other measure of
load and pick the least loaded one or something.
 1.54 07-Sep-2020  riastradh wg: Use a global pktqueue rather than a per-peer pcq.

- Improves scalability -- won't hit limit on softints no matter how
many peers there are.
- Improves parallelism -- softint was kernel-locked to serialize
access to the pcq.
- Requires per-peer queue on handshake init to avoid dropping first
packet.
. Per-peer queue is currently a single packet -- should serve well
enough for pings, dns queries, tcp connections, &c.
 1.53 07-Sep-2020  riastradh wg: Fix debug output now that the priority is mixed into it.
 1.52 07-Sep-2020  riastradh wg: Fix non-DIAGNOSTIC build.
 1.51 31-Aug-2020  riastradh wg: Avoid memory leak if socreate fails.
 1.50 31-Aug-2020  riastradh wg: Make it build with WG_DEBUG on 32-bit platforms.
 1.49 31-Aug-2020  riastradh wg: Simplify locking.

Summary: Access to a stable established session is still allowed via
psref; all other access to peer and session state is now serialized
by struct wg_peer::wgp_lock, with no dancing around a per-session
lock. This way, the handshake paths are locked, while the data
transmission paths are pserialized.

- Eliminate struct wg_session::wgs_lock.

- Eliminate wg_get_unstable_session -- access to the unstable session
is allowed only with struct wg_peer::wgp_lock held.

- Push INIT_PASSIVE->ESTABLISHED transition down into a thread task.

- Push rekey down into a thread task.

- Allocate session indices only on transition from UNKNOWN and free
them only on transition back to UNKNOWN.

- Be a little more explicit about allowed state transitions, and
reject some nonsensical ones.

- Sprinkle assertions and comments.

- Reduce atomic r/m/w swap operations that can just as well be
store-release.
 1.48 31-Aug-2020  riastradh wg: M_NOWAIT -> M_DONTWAIT

These happen to be aliases, but M_NOWAIT is part of the legacy malloc
API whereas M_DONTWAIT is part of the mbuf API.
 1.47 31-Aug-2020  riastradh wg: wg_sockaddr audit.

- Ensure all access to struct wg_peer::wgp_endpoint happens while
holding a psref.

- Simplify internalize/externalize logic and be more careful about
verifying it before printing anything.
 1.46 31-Aug-2020  riastradh wg: On INIT, do DH and decrypt timestamp before locking session.

This narrows the window when the session is unlocked. Really there
should be no such window, but we'll finish getting rid of it later.
 1.45 31-Aug-2020  riastradh wg: Verify or send cookie challenge before looking up session.

This step doesn't depend on the session, so let's avoid touching the
session state until we've passed it.
 1.44 31-Aug-2020  riastradh wg: Verify mac1 as the first step on INIT and RESP messages.

This avoids the expensive DH computation before the sender has proven
knowledge of our public key.
 1.43 31-Aug-2020  riastradh wg: Omit needless variable.
 1.42 31-Aug-2020  riastradh wg: Switch to callout_stop for session destructor timer.

Can't release the lock here, and can't sleep waiting for the callout
while we hold it without risking deadlock. But not waiting is fine;
after we transition out of WGS_STATE_UNKNOWN the timer has no effect.
 1.41 31-Aug-2020  riastradh wg: Fix indentation. No functional change.
 1.40 31-Aug-2020  riastradh wg: Just call callout_halt directly.

No functional change, just makes it easier to read where callout_halt
happens.
 1.39 31-Aug-2020  riastradh wg: Fix byte order on wire.

Give this a chance to work on big-endian systems.
 1.38 31-Aug-2020  riastradh wg: mbuf m_freem audit.

1. wg_handle_msg_data frees m but the other wg_handle_msg_* just take
a pointer to the mbuf content and not m itself, so free m in those
cases.

2. Can't trivially prove that the pcq is empty by the time
wg_destroy_peer runs pcq_destroy, so let's explicitly purge it
just in case.

3. If wg_send_udp isn't doing udp_send or udp6_output, it still has
to free m in the !INET6 error branch for IPv6 packets.

4. After rumpuser_wg_send_peer or rumpuser_wg_send_user, we still
need to free the mbuf.
 1.37 31-Aug-2020  riastradh wg: Use thmap(9) for peer and session lookup.

Make sure we also don't trip over our own shoelaces by choosing the
same session index twice.
 1.36 31-Aug-2020  riastradh wg: XAEAD doesn't use a counter, so don't pass one.
 1.35 31-Aug-2020  riastradh wg: Count down wg_npeers in wg_destroy_all_peers too.

Doesn't actually make a difference -- wg_destroy_all_peers is only
used when we're destroying the wg instance altogether -- but let's
not leave rakes to step on.
 1.34 31-Aug-2020  riastradh wg: Note lock order.
 1.33 31-Aug-2020  riastradh wg: Remove IFF_POINTOPOINT.

Unclear why this was set; setting it seems to have required a kludge
in netinet/in.c that broke ipsec tunnels. Clearing it makes wg work
again after that kludge was reverted.
 1.32 28-Aug-2020  riastradh wg: Sort includes.
 1.31 27-Aug-2020  tih Summary: let wg interfaces carry multicast traffic

Once a wg interface is up and running, it is useful to be able to run
a routing protocol over it. Marking the interface multicast capable
enables this. (One must also use the wgconfig --allowed-ips option to
explicitly permit the group one needs, e.g. 224.0.0.5/32 for OSPF.)
 1.30 27-Aug-2020  riastradh wg: Assert MCLBYTES is enough for requested length in wg_get_mbuf.
 1.29 27-Aug-2020  riastradh wg: Make sure all paths into wg_handle_msg_data guarantee enough m_len.

Earlier commit moved the m_pullup into wg_validate_msg_header, but
wg_overudp_cb doesn't go through that.
 1.28 27-Aug-2020  riastradh wg: Drop invalid message types on the floor faster.

Don't even let them reach the thread -- drop them in softint.
 1.27 27-Aug-2020  riastradh wg: KASSERT m_len before mtod.

XXX We should really make mtod do this automagically, and use
something else for mtod(m, void *).
 1.26 27-Aug-2020  riastradh wg: Use m_pullup to make message header contiguous before processing.
 1.25 27-Aug-2020  riastradh wg: Check mbuf chain length before m_copydata.
 1.24 26-Aug-2020  riastradh Clarify wg(4)'s relation to WireGuard, pending further discussion.

Still planning to replace wgconfig(8) and wg-keygen(8) by one wg(8)
tool compatible with wireguard-tools; update wg(4) for the minor
changes from the 2018-06-30 spec to the 2020-06-01 spec; &c. This just
clarifies the current state of affairs as it exists in the development
tree for now.

Mark the man page EXPERIMENTAL for extra clarity.
 1.23 23-Aug-2020  riastradh Initialize peers early on for error branch.
 1.22 21-Aug-2020  riastradh Use lock rather than 64-bit atomics for platforms without the latter.
 1.21 21-Aug-2020  riastradh Fix sysctl types.

- CTLTYPE_QUAD, not CTLTYPE_LONG, for uint64_t
- use unsigned rather than time_t -- these are all short durations
- clamp timeouts to be safe for conversion to int ticks in callout

Should fix 32-bit builds.
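As an illustration of the clamping item above, here is a minimal sketch in C. The helper name clamp_timeout_secs and the hz parameter are assumptions made for this example and are not taken from if_wg.c; the point is only that a timeout in seconds has to stay small enough that multiplying it by the tick rate still fits in an int.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: clamp a timeout in seconds so that secs * hz
 * cannot exceed INT_MAX.  hz is the tick rate and must be nonzero.
 */
static unsigned
clamp_timeout_secs(unsigned secs, unsigned hz)
{
	unsigned max_secs;

	assert(hz > 0);
	max_secs = (unsigned)INT_MAX / hz;
	return secs > max_secs ? max_secs : secs;
}
```

With hz = 100, a request for 10 seconds passes through unchanged, while an absurdly large request is reduced to INT_MAX / 100 seconds.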
 1.20 21-Aug-2020  riastradh Ifdef out fast path that relies on atomic 64-bit load/store.

(Really this sliding window business could probably be done with
32-bit sequence numbers and careful detection of wraparound, but
that's a little more effort to work out -- let's just unbreak the
builds for now.)
 1.19 20-Aug-2020  riastradh Mark KASSERT-only variable as __diagused.
 1.18 20-Aug-2020  riastradh Avoid callout_halt under lock.

- We could pass the lock in, except we hold another lock too.

- We could halt before taking the other lock, but it's not safe to
sleep after getting the session pointer before taking its lock.

- We could halt before getting the session pointer, but then there's
no point in doing it under the lock.

So just halt a little earlier instead.
 1.17 20-Aug-2020  riastradh Sprinkle const.
 1.16 20-Aug-2020  riastradh Use container_of rather than casts via void *.
 1.15 20-Aug-2020  riastradh Use be32enc, rather than possibly unaligned uint32_t cast and htonl.
 1.14 20-Aug-2020  riastradh KNF
 1.13 20-Aug-2020  riastradh Use consttime_memequal, not memcmp, to compare secrets for equality.
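For readers unfamiliar with the distinction: memcmp may return as soon as it finds a differing byte, so its running time can leak how long a matching prefix is. The sketch below shows the general constant-time technique in portable C. The function name ct_equal is hypothetical and this is not the consttime_memequal implementation; a compiler is also free to transform such code, which is one reason to prefer the library routine in practice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a constant-time comparison: always
 * examines all len bytes and returns 1 if the buffers are equal,
 * 0 otherwise.
 */
static int
ct_equal(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *p = a, *q = b;
	unsigned char diff = 0;
	size_t i;

	assert(len == 0 || (a != NULL && b != NULL));
	for (i = 0; i < len; i++)
		diff |= p[i] ^ q[i];	/* accumulate any difference */
	return diff == 0;
}
```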
 1.12 20-Aug-2020  riastradh Take advantage of prop_dictionary_util(3).
 1.11 20-Aug-2020  riastradh Split up wg_process_peer_tasks into bite-size functions.
 1.10 20-Aug-2020  riastradh Fix race in wg_worker kthread destruction.

Also allow the thread to migrate between CPUs -- just not while we're
in the middle of processing and holding onto things with psrefs.
 1.9 20-Aug-2020  riastradh Update for proplib API changes.
 1.8 20-Aug-2020  riastradh Use SYSCTL_SETUP for net.wireguard subtree.
 1.7 20-Aug-2020  riastradh Fix in-kernel debug build.
 1.6 20-Aug-2020  riastradh Implement sliding window for wireguard replay detection.
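A minimal sketch of the general sliding-window idea, for orientation only. This is a generic 64-entry bitmap window, not the code committed here; the struct and function names and the window size are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SW_BITS 64u		/* hypothetical window size */

struct sliding_window {
	uint64_t top;		/* highest sequence number accepted */
	uint64_t bitmap;	/* bit i set => (top - i) already seen */
	bool any;		/* false until the first packet */
};

/* Returns true and records seq if fresh; false if replayed or too old. */
static bool
sw_check_and_update(struct sliding_window *w, uint64_t seq)
{
	uint64_t age, bit;

	assert(w != NULL);
	if (!w->any) {
		w->any = true;
		w->top = seq;
		w->bitmap = 1;
		return true;
	}
	if (seq > w->top) {
		uint64_t shift = seq - w->top;

		/* Slide the window forward; old entries fall off. */
		w->bitmap = (shift >= SW_BITS) ? 0 : (w->bitmap << shift);
		w->bitmap |= 1;
		w->top = seq;
		return true;
	}
	age = w->top - seq;
	if (age >= SW_BITS)
		return false;	/* behind the window */
	bit = UINT64_C(1) << age;
	if (w->bitmap & bit)
		return false;	/* already seen: replay */
	w->bitmap |= bit;
	return true;
}
```

A packet is accepted once; a second copy with the same sequence number, or one that is more than SW_BITS behind the newest accepted packet, is rejected.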
 1.5 20-Aug-2020  riastradh Don't falsely assert cpu_softintr_p().

Will fail in the following stack trace:

wg_worker (kthread)
wg_receive_packets
wg_handle_packet
wg_handle_msg_data
KASSERT(cpu_softintr_p())

Instead, use kpreempt_disable/enable around softint_schedule.

XXX Not clear that softint is the right place to do this!
 1.4 20-Aug-2020  riastradh Convert wg(4) to if_stat.
 1.3 20-Aug-2020  riastradh Use cprng_strong, not cprng_fast, for ephemeral key.
 1.2 20-Aug-2020  riastradh [ozaki-r] Fix bugs found by maxv's audits
 1.1 20-Aug-2020  riastradh [ozaki-r] Add wg files
 1.61.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.62.4.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.62.4.1 13-May-2021  thorpej Sync with HEAD.
 1.71.2.5 15-Dec-2024  martin Pull up following revision(s) (requested by alnsn in ticket #1022):

sys/net/if_wg.c: revision 1.133

wg(4): Avoid spurious kassert for harmless race in session retry.

If we have already transitioned away from INIT_ACTIVE by the time the
retry timer has fired, the handshake start time may have been zeroed,
but that's harmless. So don't kassert about it until after we've
verified we're still in INIT_ACTIVE state.

PR kern/58859: KASSERT in wg_task_retry_handshake
 1.71.2.4 09-Oct-2024  martin Pull up following revision(s) (requested by riastradh in ticket #934):

sys/net/if_wg.c: revision 1.117
sys/net/if_wg.c: revision 1.118
sys/net/if_wg.c: revision 1.119
sys/net/if_wg.c: revision 1.80
sys/net/if_wg.c: revision 1.81
tests/net/if_wg/t_misc.sh: revision 1.13
sys/net/if_wg.c: revision 1.82
sys/net/if_wg.c: revision 1.130
tests/net/if_wg/t_misc.sh: revision 1.14
sys/net/if_wg.c: revision 1.83
sys/net/if_wg.c: revision 1.131
tests/net/if_wg/t_misc.sh: revision 1.15
sys/net/if_wg.c: revision 1.84
sys/net/if_wg.c: revision 1.132
tests/net/if_wg/t_misc.sh: revision 1.16
sys/net/if_wg.c: revision 1.85
sys/net/if_wg.c: revision 1.86
tests/net/if_wg/t_basic.sh: revision 1.5
sys/net/if_wg.c: revision 1.87
tests/net/if_wg/t_basic.sh: revision 1.6
sys/net/if_wg.c: revision 1.88
sys/net/if_wg.c: revision 1.89
sys/net/if_wg.c: revision 1.100
sys/net/if_wg.c: revision 1.101
sys/net/if_wg.c: revision 1.102
sys/net/if_wg.c: revision 1.103
sys/net/if_wg.c: revision 1.104
sys/net/if_wg.c: revision 1.105
sys/net/if_wg.c: revision 1.106
sys/net/if_wg.c: revision 1.107
sys/net/if_wg.c: revision 1.108
sys/net/if_wg.c: revision 1.109
sys/net/if_wg.c: revision 1.120
sys/net/if_wg.c: revision 1.121
sys/net/if_wg.c: revision 1.122
sys/net/if_wg.c: revision 1.123
sys/net/if_wg.c: revision 1.124
sys/net/if_wg.c: revision 1.75
sys/net/if_wg.c: revision 1.77
sys/net/if_wg.c: revision 1.125
sys/net/if_wg.c: revision 1.126
sys/net/if_wg.c: revision 1.79
sys/net/if_wg.c: revision 1.127
sys/net/if_wg.c: revision 1.128
sys/net/if_wg.c: revision 1.129
sys/net/if_wg.c: revision 1.90
sys/net/if_wg.c: revision 1.91
sys/net/if_wg.c: revision 1.92
sys/net/if_wg.c: revision 1.93
sys/net/if_wg.c: revision 1.94
sys/net/if_wg.c: revision 1.95
sys/net/if_wg.c: revision 1.96
sys/net/if_wg.c: revision 1.97
sys/net/if_wg.c: revision 1.98
sys/net/if_wg.c: revision 1.99
sys/net/if_wg.c: revision 1.110
sys/net/if_wg.c: revision 1.111
sys/net/if_wg.c: revision 1.112
sys/net/if_wg.c: revision 1.113
sys/net/if_wg.c: revision 1.114
sys/net/if_wg.c: revision 1.115
sys/net/if_wg.c: revision 1.116

fix simple mis-matched function prototype and definitions.
most of these are like, eg
void foo(int[2]);
with either of these
void foo(int*) { ... }
void foo(int[]) { ... }
in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.
found by GCC 12.

sys: Drop redundant NULL check before m_freem(9)
m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@

Add a wg_debug variable to split between debug/trace/dump messages

Add more debugging in packet validation

If any of the WG_DEBUG_XXX symbols happens to be defined (say, from a
stray rump Makefile...) then we now must have WG_DEBUG also defined, so
if it wasn't, make it so.

While the previous change fixed the broken build, it wasn't the best
way, as defining any of the WG_DEBUG_XXX symbols then effectively
defined all of them - making them, as separate entities, pointless.

So, rearrange the way things are done a little to avoid doing that.

Add packet dump debugging
fix size limit calculation in dump and NULL checks
use hexdump...

Fix 32 bit (32 bit size_t) WG_DEBUG builds - use %zu rather than %lu
to print size_t values.

There's a new WG_DEBUG_XXX ( XXX==PACKET ) to deal with now. That needs
WG_DEBUG defined as well, if set.

Make the debug (WG_DEBUG) func gethexdump() always return a valid
pointer, never NULL, so it doesn't need to be tested before being
printed, which was being done sometimes, but not always.

Add more debugging from Taylor

wg(4): Allow modunload before any interface creation.

The workqueue and pktq are both lazily created, for annoying module
initialization order reasons, so they may not have been created by
the time of modunload.
PR kern/58470

Limit the size of the packet, and print ... if it is bigger. (from kre@)

wg(4): Rework some details of internal session state machine.

This way:
- There is a clear transition between when a session is being set up,
and when it is exposed to the data rx path (wg_handle_msg_data):
atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
ESTABLISHED.
(The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
data rx path, so that's just atomic_store_relaxed. Similarly the
transition to DESTROYING.)
- There is a clear transition between when a session is being set up,
and when it is exposed to the data tx path (wg_output):
atomic_store_release to set wgp->wgp_session_stable to it.
- Every path that reinitializes a session must go through
wg_destroy_session via wg_put_index_session first. This avoids
races between session reuse and the data rx/tx paths.
- Add a log message at the time of every state transition.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
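The publication pattern described above (finish setting up the session, then expose it with a store-release on the state) can be sketched in userland C11 as follows. This is an illustration under stated assumptions, not the if_wg.c code: it uses <stdatomic.h> as a stand-in for the kernel's atomic_store_release/atomic_load_acquire, and the struct, field, and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum sess_state { S_UNKNOWN, S_INIT_PASSIVE, S_ESTABLISHED };

struct session {
	uint32_t key;		/* stands in for all session setup */
	_Atomic int state;	/* published last */
};

/* Writer: initialize everything, then publish with release ordering. */
static void
session_publish(struct session *s, uint32_t key)
{
	assert(s != NULL);
	s->key = key;
	atomic_store_explicit(&s->state, S_ESTABLISHED,
	    memory_order_release);
}

/* Reader: only touch the session after an acquire load sees it. */
static int
session_try_use(struct session *s, uint32_t *keyp)
{
	if (atomic_load_explicit(&s->state, memory_order_acquire)
	    != S_ESTABLISHED)
		return 0;
	*keyp = s->key;
	return 1;
}
```

A reader that observes S_ESTABLISHED is guaranteed to also observe the key written before the release store; a reader that does not observe it leaves the session alone.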

wg(4): Fix logic to ensure session initiation is underway.

Previously, wg_task_send_init_message would call
wg_send_handshake_msg_init if either:
(a) the stable session is UNKNOWN, meaning a session has not yet been
established, either by us or by the peer (but it could be in
progress); or
(b) the stable session is not UNKNOWN but the unstable session is
_not_ INIT_ACTIVE, meaning there is an established session and we
are not currently initiating a new session.

If wg_output (or wgintr) found no established session while there was
already a session being initiated, we may only enter
wg_task_send_init_message after the session is already established,
and trigger spurious reinitiation.

Instead, create a separate flag to indicate whether it is mandatory
to rekey because limits have passed. Then create a session only if:
(a) the stable session is not ESTABLISHED, or
(b) the mandatory rekey flag is set,
and clear the mandatory rekey flag.

While here, arrange to do rekey-after-time on tx, not on callout. If
there's no data to tx, we shouldn't reinitiate a session -- we should
stay quiet on the network.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

PR kern/56252: wg(4) state machine has race conditions

PR kern/58463: if_wg does not work when idle.

wg(4): Use callout_halt, not callout_stop.
It's possible that callout_stop might work here, but let's simplify
reasoning about it -- the timers in question only take the peer intr
lock, so it's safe to wait for them while holding the peer lock in
the handshake worker thread.

We may have to undo the task bit but that will take a bit more
analysis to determine.
Prompted by (but probably won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless pserialize_perform on transition to DESTROYING.

A session can still be used when it is in the DESTROYING state, so
there's no need to wait for users to drain here -- that's the whole
point of a separate DESTROYING state.

It is only the transition from DESTROYING back to UNKNOWN, after the
session has been unpublished so no new users can begin, that requires
waiting for all users to drain, and we already do that in
wg_destroy_session.

Prompted by (but won't fix anything in, because this is just a
performance optimization):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Expand cookie secret to 32 bytes.
This is only relevant for denial of service mitigation, so it's not
that big a deal, and the spec doesn't say anything about the size,
but let's make it the standard key size.

PR kern/58479: experimental wg(4) uses 32-bit cookie secret, not
32-byte cookie secret

wg(4): Mark wgp_pending volatile to reflect its usage.
Prompted by (but won't fix any part of):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix session destruction.
Schedule destruction as soon as the session is created, to ensure key
erasure within 2*reject-after-time seconds. Previously, we would
schedule destruction of the previous session 1 second after the next
one has been established. Combined with a failure to update the
state machine on keepalive packets, this led to temporary deadlock
scenarios.

To keep it simple, there's just one callout which runs every
reject-after-time seconds and erases keys in sessions older than
reject-after-time, so if a session is established the moment after it
runs, the keys might not be erased until (2-eps)*reject-after-time
seconds.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
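The (2-eps)*reject-after-time bound can be checked with a small worked model. The sketch below assumes, purely for illustration, that the sweep fires at exact multiples of the period and erases any session whose age is at least one period; the function name is hypothetical and 180 is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: a sweep fires at every multiple of `period`
 * seconds and erases any session whose age is at least `period`.
 * Returns the time of the sweep that erases a session established
 * at time `established`.
 */
static uint32_t
erase_time(uint32_t established, uint32_t period)
{
	uint32_t k;

	assert(period > 0);
	/* smallest k with k * period >= established + period */
	k = (established + period + (period - 1)) / period;
	return k * period;
}
```

With a period of 180, a session established at t=180 (exactly on a sweep) is erased at t=360, an age of 180. A session established at t=181, just after a sweep, survives the sweep at t=360 (age 179) and is erased at t=540, an age of 359, which is just under twice the period.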

wg(4): Reject rx on sessions older than reject-after-time sec.
Prompted by (but won't fix anything in):
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): On rx of valid ciphertext, make sure to update state machine.

Previously, we also required the plaintext to be a plausible-looking
IP packet before updating the state machine.

But keepalive packets are empty -- and if the peer initiated the
session to rekey after last tx but had no more data to tx, it will
send a keepalive to finish session initiation.
If we didn't update the state machine in that case, we would stay in
INIT_PASSIVE state unable to tx on the session, which would make
things hang.

So make sure to always update the state machine once we have accepted
a packet as genuine, even if it's genuine garbage on the inside.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make sure to update endpoint on keepalive packets too.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

Most of the timers in wg(4) have only 1sec resolution, which might be
rounded in either direction, so make sure there's a 2sec buffer on
either side of the event we care about (the point at which wg(4)
decides to stop retrying handshake).

Won't fix any bugs, but might make the tests slightly less flaky.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions

tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

Helpful for following the test log when things go wrong.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Tests should pass now.

PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Use 32-bit for times handled in rx/tx paths.

The rx and tx paths require unlocked access to wgs_time_established
(to decide whether it's time to rekey) and wgs_time_last_data_sent
(to decide whether we need to reply to incoming data with a keepalive
packet), so do it with atomic_load/store_*.

On 32-bit platforms, we may not be able to do that on time_t.

However, since sessions only last for a few minutes before
reject-after-time kicks in and they are erased, 32 bits is plenty to
record the durations that we need to record here, so this shouldn't
introduce any new bugs even on hosts that exceed 136 years of uptime.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.
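A minimal sketch of why 32 bits suffice for such durations: unsigned subtraction is modular, so the difference between two 32-bit timestamps is correct even if the counter wrapped in between, provided the true elapsed time is far below 2^32 seconds. The function name and parameters are assumptions for this example, not identifiers from if_wg.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if at least `limit` seconds separate
 * `then` from `now`, where both are the low 32 bits of an uptime
 * counter.  Correct across wraparound of the counter.
 */
static bool
elapsed_at_least(uint32_t now, uint32_t then, uint32_t limit)
{
	/* The argument only holds for limits well below 2^32. */
	assert(limit < UINT32_MAX / 2);
	return (uint32_t)(now - then) >= limit;
}
```

For example, with then = UINT32_MAX - 1 and now = 3 the counter has wrapped, yet the computed difference is 5 seconds, as expected.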

wg(4): Make time_uptime32 work in netbsd<=10.

This is the low 32 bits of time_uptime.
Will simplify pullups to 10 for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix quotation in comment.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Process all altq'd packets when deleting peer.

Can't just drop them because we can only go through all packets on an
interface at a time, for all peers -- so we'd either have to drop all
peers' packets, or requeue the packets for other peers. Probably not
worth the trouble, so let's just wait for all the packets currently
queued up to go through first.

This requires reordering teardown so that we wg_destroy_all_peers,
and thus wg_purge_pending_packets, _before_ we wg_if_detach, because
wg_if_detach -> if_detach destroys the lock that IFQ_DEQUEUE uses.

PR kern/58477: experimental wg(4) ALTQ support is probably buggy

wg(4): Tidy up error branches.
No functional change intended, except to add some log messages in
failure cases.
Cleanup after:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Be more consistent about #ifdef INET/INET6.
PR kern/58478: experimental wg(4) probably doesn't build with
INET6-only

wg(4): Parenthesize macro expansions properly.

PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Delete temporary hacks to dump keys and packets.
No longer useful for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Explain why gethexdump/puthexdump is there, and tidy.
This way I will not be tempted to replace it by in-line calls to
libkern hexdump.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Put force_rekey state in the session, not the peer.
That way, there is a time when one thread has exclusive access to the
state, in wg_destroy_session under the peer lock, when we can clear
the state without racing against the data tx path.
This will work more reliably than the atomic_swap_uint I used before.
Noted by kre@.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle static on fixed-size array parameters.

Let's make the static size declarations useful.
No functional change intended.

wg(4): Queue pending packet in FIFO order, not LIFO order.

Sometimes the session takes a few seconds to establish, for whatever
reason. It is better if the pending packet, which we queue up to
send as soon as we get the responder's handshake response, is the
most recent packet, rather than the first packet.

That way, we don't wind up with a weird multi-second-delayed ping,
followed by a bunch of dropped, followed by normal ping timings, or
wind up sending the first TCP SYN instead of the most recent, or what
have you. Senders need to be prepared to retransmit anyway if
packets are dropped.

PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
first handshake

wg(4): Sprinkle comments into wg_swap_sessions.
No functional change intended.
Prompted by:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): No need for atomic access to wgs_time_established in tx/rx.

This is stable while the session is visible to the tx/rx paths -- it
is initialized before the session is exposed to tx/rx, and doesn't
change until the session is no longer used by any tx/rx path and has
been recycled.

When I sprinkled atomic access to wgs_time_established in if_wg.c
rev. 1.104, it was a vestige of an uncommitted draft that did the
transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
an attempt to enable prompter tx on the new session as soon as it is
established. This turned out to be unnecessary, so I reverted most
of it, but forgot that wgs_time_established no longer needed atomic
treatment.

We could go back to using time_t and time_uptime, now that there's no
need to do atomic loads and stores on these quantities. But there's
no point in 64-bit arithmetic when the time differences are all
guaranteed bounded by a few minutes, so keeping it 32-bit is probably
a slight performance improvement on 32-bit systems.
(In contrast, wgs_time_last_data_sent is both written and read in the
tx path, which may run in parallel on multiple CPUs, so it still
requires the atomic treatment.)
Tidying up for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Fix memory ordering in detach.
PR kern/58510: experimental wg(4) lacks memory ordering between
wg_count_dec and module unload

wg(4): Fix typo in comment recently added.
Comment added in the service of:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Omit needless atomic_load.
wgs_local_index is only ever written to while only one thread has
access to it and it is not in the thmap -- before it is published in
wg_get_session_index, and after it is unpublished in
wg_destroy_session. So no need for atomic_load -- it is stable if we
observe it in thmap_get result.
(Of course this is only for an assertion, which if tripped obviously
indicates a violation of our assumptions. But if that happens, well,
in the worst case we'll see a weird assertion message claiming that
the index is not equal to itself, from which we can conclude
there must have been a concurrent update, which is good enough to
help diagnose that problem without any atomic_load.)

Tidying some of the changes for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle comments on internal sliding window API.
Post-fix tidying for:
PR kern/58480: experimental wg(4) sliding window logic has oopsie

wg(4): Deduplicate session establishment actions.
The actions to
(a) record the last handshake time,
(b) clear some handshake state,
(c) transmit first data if queued, or (if initiator) keepalive, and
(d) begin destroying the old session,
were formerly duplicated between wg_handle_msg_resp (for when we're
the initiator) and wg_task_establish_session (for when we're the
responder).

Instead, let's factor this out into wg_swap_session so there's only
one copy of the logic.
This requires moving wg_update_endpoint_if_necessary a little earlier
in wg_handle_msg_resp -- which should be done anyway so that the
endpoint is updated _before_ the session is published for the data tx
path to use.

Other than moving wg_update_endpoint_if_necessary a little earlier,
no functional change intended.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Read wgs_state atomically in wg_get_stable_session.
As noted in the comment above, it may concurrently transition from
ESTABLISHED to DESTROYING.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Force rekey on tx if session is older than reject-after-time.
One more corner case for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Add missing barriers around wgp_pending access.
PR kern/58520: experimental wg(4) lacks barriers around access to
packet pending initiation

wg(4): Trigger session initiation in wgintr, not in wg_output.

We have to look up the session in wgintr anyway, for
wg_send_data_msg. By triggering session initiation in wgintr instead
of wg_output, we can skip the stable session lookup and reference in
wg_output -- simpler that way.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Queue packet for post-handshake retransmit if limits are hit.
PR kern/58521: experimental wg(4) may drop packet after minutes of quiet

wg(4): When a session is established, send first packet directly.

Like we would do with the keepalive packet, if we had to send that
instead -- no need to defer it to the pktq. Keep it simple.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Sprinkle volatile on variables requiring atomic access.
No functional change intended, since the relevant access is always
done with atomic_* when it might race with concurrent access -- and
really this should be _Atomic or something. But for now our
atomic_ops(9) API is still spelled with volatile, so we'll use that.
Post-fix tidying for:
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

wg(4): Make a rule for who wins when both peers send INIT at once.
The rule is that the peer with the numerically smaller public key
hash, in little-endian, takes priority iff the low order bit of
H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)
is 0, and the peer with the lexicographically larger public key takes
priority iff the low-order bit is 1.
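
A minimal sketch of the tie-break rule above, not the wg(4) code itself. It assumes the caller has already computed the three hashes as equal-length byte strings, and that both clauses of the rule compare the public-key hashes as little-endian integers. The names le_cmp and init_local_wins are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compare two n-byte strings as little-endian integers: <0, 0 or >0. */
int
le_cmp(const uint8_t *a, const uint8_t *b, size_t n)
{
	while (n-- > 0) {
		/* Byte n-1 is the most significant, so start there. */
		if (a[n] != b[n])
			return a[n] < b[n] ? -1 : 1;
	}
	return 0;
}

/*
 * Decide whether the local peer takes priority when both sides sent
 * INIT at once.  h_local and h_remote are the hashes of the two public
 * keys, h_minutes the hash of the current minute count (all assumed to
 * be precomputed by the caller).
 */
bool
init_local_wins(const uint8_t *h_local, const uint8_t *h_remote,
    const uint8_t *h_minutes, size_t n)
{
	assert(n > 0);

	/* Low-order bit of the XOR; byte 0 is least significant. */
	unsigned bit = (h_local[0] ^ h_remote[0] ^ h_minutes[0]) & 1;
	int c = le_cmp(h_local, h_remote, n);

	/* bit 0: the smaller hash wins; bit 1: the larger hash wins. */
	return bit == 0 ? c < 0 : c > 0;
}
```

Because both peers evaluate the same bit and the same comparison, exactly one of them sees itself as the winner whenever the two hashes differ. Mixing in the minute count keeps the same peer from winning every time.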

Another case of:
PR kern/56252: wg(4) state machine has race conditions
PR kern/58463: if_wg does not work when idle.

This one is, as far as I can tell, simply a deadlock in the protocol
of the whitepaper -- until both sides give up on the handshake and
one of them (but not both) later decides to try sending data again.
(But not related to our t_misc:wg_rekey test, as far as I can tell,
and I haven't put enough thought into how to reliably trigger this
race to write a new automatic test for it.)
wg(4): Add Internet Archive links for the versions cited.
No functional change.

tests/net/if_wg/t_misc: Add some diagnostics.
PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails

wg(4): Test truncated UDP input from the network.
This triggers double-free in the IPv6 udp6_input path -- but,
confusingly, not the IPv4 udp_input path, even though the overudp_cb
interface ought to be the same:
    /* udp_input -- no further use of m if return is -1 */
    if ((n = udp4_realinput(&src, &dst, &m, iphlen)) == -1) {
        UDP_STATINC(UDP_STAT_HDROPS);
        return;
    }
    /* udp6_input -- m_freem if return is not 0 */
    if (udp6_realinput(AF_INET6, &src, &dst, &m, off) == 0) {
        ...
    }
bad:
    m_freem(m);
    return IPPROTO_DONE;

The subroutines udp4_realinput and udp6_realinput pass through the
return value of overudp_cb in essentially the same way:
    /* udp4_realinput */
    if (inp->inp_overudp_cb != NULL) {
        int ret;
        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
            sintosa(src), inp->inp_overudp_arg);
        switch (ret) {
        case -1: /* Error, m was freed */
            rcvcnt = -1;
            goto bad;
        ...
bad:
    return rcvcnt;
    /* udp6_realinput */
    if (inp->inp_overudp_cb != NULL) {
        int ret;
        ret = inp->inp_overudp_cb(mp, off, inp->inp_socket,
            sin6tosa(src), inp->inp_overudp_arg);
        switch (ret) {
        case -1: /* Error, m was freed */
            rcvcnt = -1;
            goto bad;
        ...
bad:
    return rcvcnt;

PR kern/58688: userland panic of kernel via wg(4)

wg(4): Fix wg_overudp_cb drop paths to null out *mp as caller needs.
PR kern/58688: userland panic of kernel via wg(4)
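
A toy model of the ownership rule behind this fix, assuming a caller that frees whatever packet pointer it still holds on its error path. struct pkt, drop_cb and toy_input are hypothetical stand-ins, not kernel interfaces. The point is only that a callback that frees the packet must also clear the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain. */
struct pkt {
	size_t len;
};

/*
 * Callback drop path: consume the packet and null out the caller's
 * pointer so the caller cannot free it a second time.
 */
int
drop_cb(struct pkt **pp)
{
	free(*pp);
	*pp = NULL;
	return -1;	/* error, packet was freed */
}

/*
 * Caller shaped like the udp6_input path quoted above: on a non-zero
 * result it frees whatever it still holds.
 */
int
toy_input(struct pkt *p)
{
	if (drop_cb(&p) == 0)
		return 0;
	free(p);	/* safe only because drop_cb cleared p */
	return -1;
}
```

If drop_cb left *pp untouched, the free() in toy_input would release the same allocation twice. That is the double-free shape the truncated-input test exposed.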
 1.71.2.3 11-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #628):

sys/net/if_wg.c: revision 1.78

wg(4): Bind to CPU in wg_handle_packet.

Required by use of psref there.
Assert we're bound up front so we catch mistakes early, rather than
later on if we get unlucky in preemption and scheduling.

PR bin/58021
 1.71.2.2 07-Jul-2023  martin Pull up following revision(s) (requested by jakllsch in ticket #228):

sys/net/if_wg.c: revision 1.76

Give scope and additional details to wg(4) diagnostic messages.
 1.71.2.1 13-Jan-2023  martin Pull up following revision(s) (requested by jakllsch in ticket #49):

sys/secmodel/suser/secmodel_suser.c: revision 1.57
sys/sys/kauth.h: revision 1.89
sys/net/if_wg.c: revision 1.72
sys/net/if_wg.c: revision 1.73
sys/net/if_wg.c: revision 1.74

Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4).
Addresses PR 57161.

wg(4): Allow non-root to retrieve information other than the private
key and the peer preshared key.

Add kauth(9) enums for wg(4) and use them in suser secmodel.

Refines fix for PR 57161.

centralize the kauth ugliness.
 1.77.2.1 14-Nov-2023  thorpej branches: 1.77.2.1.2;
Update for the new location of altq_flags (not in if_snd directly).
 1.77.2.1.2.1 15-Nov-2023  thorpej wg_output(): Use ifq_classify_packet(), and let that function check
for ALTQ-enabled. Acquire KERNEL_LOCK before calling ALTQ_ENQUEUE().
XXX The ALTQ integration here is a mess.
 1.78.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 20-Aug-2020  riastradh [ozaki-r] Add wg files
 1.41 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.40 31-Dec-2021  riastradh branches: 1.40.4; 1.40.10;
sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.39 25-Sep-2019  ozaki-r Make panic messages more informative
 1.38 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.37 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.36 06-Apr-2017  ozaki-r branches: 1.36.6; 1.36.14;
Revert "Make sure to hold if_ioctl_lock when calling ifp->if_ioctl"

As per pgoyette@ and riastradh@ requests; we shouldn't decide to
hold a lock based on if the lock is held or not.
 1.35 05-Apr-2017  ozaki-r Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
 1.34 11-Jan-2017  ozaki-r branches: 1.34.2;
Don't call ifa_remove while holding a psref
 1.33 26-Dec-2016  ozaki-r Use psz/psref to hold ifa
 1.32 19-Nov-2016  njoly Make fstat(2) work on AF_LINK socket descriptors.
 1.31 07-Jul-2016  ozaki-r branches: 1.31.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.30 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.29 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.28 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.27 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.26 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.25 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.24 09-Aug-2014  rtr branches: 1.24.2; 1.24.4; 1.24.6; 1.24.10;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.23 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.22 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.21 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.20 31-Jul-2014  rtr split PRU_CONNECT, PRU_RCVOOB and PRU_SENDOOB into separate functions
(all implemented as EOPNOTSUPP).
 1.19 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.18 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.17 21-Jul-2014  ozaki-r Don't assume if_init is always set

if_init may be NULL, e.g., if_vlan.

PR kern/48997
 1.16 15-Jul-2014  joerg socklen_t is not uint8_t, so don't print it as such.
 1.15 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.14 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.13 06-Jul-2014  rtr * split PRU_SENSE functionality out of link_usrreq() and place into
separate link_stat(struct socket *, struct stat *) function
 1.12 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.11 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.10 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.9 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.8 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.42!
 1.7 07-Oct-2011  dyoung branches: 1.7.8; 1.7.12; 1.7.14; 1.7.16; 1.7.22; 1.7.26;
Cosmetic: remove whitespace at the end of line.
 1.6 12-Nov-2010  roy Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.
 1.5 07-Nov-2008  dyoung branches: 1.5.8;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

    switch (...->sa_family) {
    case ...:
        ..._init();
        ...
        break;
    ...
    default:
        ..._init();
        ...
        break;
    }

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

    switch (x & (IFF_UP|IFF_RUNNING)) {
    case 0:
        ...
        break;
    case IFF_RUNNING:
        ...
        break;
    case IFF_UP:
        ...
        break;
    case IFF_UP|IFF_RUNNING:
        ...
        break;
    }

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
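
For context, a sketch of the standard modified EUI-64 construction from a 48-bit MAC address (RFC 4291, Appendix A). This is the value a stable factory address is useful for. It is an illustration only, not the body of in6_get_hw_ifid(), and mac_to_eui64 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a modified EUI-64 interface identifier from a 48-bit MAC:
 * insert ff:fe between the two halves and invert the universal/local
 * bit of the first octet.
 */
void
mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
	assert(mac != NULL && eui != NULL);

	memcpy(eui, mac, 3);		/* OUI */
	eui[3] = 0xff;
	eui[4] = 0xfe;
	memcpy(eui + 5, mac + 3, 3);	/* device-specific part */
	eui[0] ^= 0x02;			/* invert the universal/local bit */
}
```

Since the identifier is derived directly from the link-layer address, deriving it from a user-assigned address would change it whenever that address changes. Preferring the factory address avoids this.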

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
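
A minimal sketch of such a check, assuming a 6-byte Ethernet address. The function name lladdr_acceptable is hypothetical and this is not the ether_ioctl() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if addr may be assigned as an interface's link-layer
 * address: it must not be a group (multicast or broadcast) address
 * and must not be 00:00:00:00:00:00.
 */
bool
lladdr_acceptable(const uint8_t addr[6])
{
	static const uint8_t zero[6];	/* all-zero address */

	assert(addr != NULL);

	/* The low bit of the first octet marks group addresses. */
	if (addr[0] & 0x01)
		return false;
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;
	return true;
}
```

The broadcast address ff:ff:ff:ff:ff:ff has the group bit set, so the first test already rejects it.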

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.4 13-May-2008  dyoung branches: 1.4.4; 1.4.6;
Let us call ioctl(SIOC[ADG]LIFADDR) with a link-layer address on
an AF_LINK socket, only, to be consistent with SIOC[ADG]LIFADDR
behavior on AF_INET and AF_INET6 sockets. Let us create AF_LINK
sockets for this purpose. Note that most operations on AF_LINK
sockets are not implemented.
 1.3 30-Aug-2007  dyoung branches: 1.3.2; 1.3.22; 1.3.24; 1.3.26; 1.3.28;
Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.2 07-Aug-2007  dyoung branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Lengthen sockaddr_dl so that a 16-byte FireWire address will fit
into sdl_data[].

Move the macro satocsdl() to net/if_dl.h, and introduce satosdl().

Add some helpers for initializing sockaddr_dl (sockaddr_dl_init),
for finding out the length to put in a sockaddr_dl's sdl_len member
(sockaddr_dl_measure), and for setting the link-layer address in
a sockaddr_dl to a new value (sockaddr_dl_setaddr).

Make sockaddr_copy() panic if the caller tries to copy a sockaddr
to a destination where it will not fit.
 1.1 19-Jul-2007  dyoung branches: 1.1.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
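
The masking step can be pictured as a byte-wise AND of the destination with the netmask before the lookup. The sketch below works on raw address bytes only. apply_netmask is a hypothetical helper, not the kernel's rt_maskedcopy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear the host bits of dst in place, using an n-byte netmask. */
void
apply_netmask(uint8_t *dst, const uint8_t *mask, size_t n)
{
	assert(dst != NULL && mask != NULL);

	for (size_t i = 0; i < n; i++)
		dst[i] &= mask[i];
}
```

For example, 192.168.1.77 with mask 255.255.255.0 becomes 192.168.1.0, so a request that names a host inside a network matches the network route.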

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.1.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.1.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.2.8.2 07-Aug-2007  dyoung Lengthen sockaddr_dl so that a 16-byte FireWire address will fit
into sdl_data[].

Move the macro satocsdl() to net/if_dl.h, and introduce satosdl().

Add some helpers for initializing sockaddr_dl (sockaddr_dl_init),
for finding out the length to put in a sockaddr_dl's sdl_len member
(sockaddr_dl_measure), and for setting the link-layer address in
a sockaddr_dl to a new value (sockaddr_dl_setaddr).

Make sockaddr_copy() panic if the caller tries to copy a sockaddr
to a destination where it will not fit.
 1.2.8.1 07-Aug-2007  dyoung file link_proto.c was added on branch matt-mips64 on 2007-08-07 04:06:21 +0000
 1.2.6.1 06-Nov-2007  matt sync with HEAD
 1.2.4.3 09-Oct-2007  ad Sync with head.
 1.2.4.2 20-Aug-2007  ad Sync with HEAD.
 1.2.4.1 07-Aug-2007  ad file link_proto.c was added on branch vmlocking on 2007-08-20 22:07:08 +0000
 1.2.2.3 03-Sep-2007  skrll Sync with HEAD.
 1.2.2.2 15-Aug-2007  skrll Sync with HEAD.
 1.2.2.1 07-Aug-2007  skrll file link_proto.c was added on branch nick-csl-alignment on 2007-08-15 13:49:41 +0000
 1.3.28.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.3.26.2 04-May-2009  yamt sync with head.
 1.3.26.1 16-May-2008  yamt sync with head.
 1.3.24.1 18-May-2008  yamt sync with head.
 1.3.22.2 17-Jan-2009  mjf Sync with HEAD.
 1.3.22.1 02-Jun-2008  mjf Sync with HEAD.
 1.3.2.2 03-Sep-2007  yamt sync with head.
 1.3.2.1 30-Aug-2007  yamt file link_proto.c was added on branch yamt-lazymbuf on 2007-09-03 14:42:21 +0000
 1.4.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.4.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.5.8.1 05-Mar-2011  rmind sync with head
 1.7.26.1 10-Aug-2014  tls Rebase.
 1.7.22.1 07-Aug-2014  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1103):
sys/net/link_proto.c revision 1.17
Don't assume if_init is always set. if_init may be NULL, e.g., if_vlan.
PR kern/48997.
 1.7.16.2 18-May-2014  rmind sync with head
 1.7.16.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.7.14.1 07-Aug-2014  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1103):
sys/net/link_proto.c revision 1.17
Don't assume if_init is always set. if_init may be NULL, e.g., if_vlan.
PR kern/48997.
 1.7.12.2 03-Dec-2017  jdolecek update from HEAD
 1.7.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.8.1 07-Aug-2014  msaitoh Pull up following revision(s) (requested by ozaki-r in ticket #1103):
sys/net/link_proto.c revision 1.17
Don't assume if_init is always set. if_init may be NULL, e.g., if_vlan.
PR kern/48997.
 1.24.10.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.24.6.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.24.4.5 05-Feb-2017  skrll Sync with HEAD
 1.24.4.4 05-Dec-2016  skrll Sync with HEAD
 1.24.4.3 09-Jul-2016  skrll Sync with HEAD
 1.24.4.2 06-Jun-2015  skrll Sync with HEAD
 1.24.4.1 06-Apr-2015  skrll Sync with HEAD
 1.24.2.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.31.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.31.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.31.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.34.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.36.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.36.14.1 10-Jun-2019  christos Sync with HEAD
 1.36.6.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.40.10.1 02-Aug-2025  perseant Sync with HEAD
 1.40.4.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded lookups and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns the first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file mipsock.c was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file mipsock.h was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.8 18-Aug-2025  ozaki-r nd: fix the number of requests for address resolution

ARP is expected to send requests for address resolution
net.inet.arp.nd_bmaxtries times at most. However, it sends
one more. IPv6 ND also behaves the same way.

The fix requires nd_set_timer reorganization to handle
scheduling timer without sending an NS message.

PR kern/59596
 1.7 30-May-2024  riastradh branches: 1.7.4;
nd_timer: Update la_numheld when we clear la_hold (a.k.a. ln_hold).

Followup for PR kern/58297 fix. Patch by mlelstv@.

PR kern/58301
 1.6 28-May-2024  riastradh nd_resolve: Maintain la_numheld.

Otherwise lltable_drop_entry_queue never drops anything.

Addresses mbuf leak, PR kern/58297.
 1.5 19-Nov-2022  yamt branches: 1.5.2;
Make arp have its own mowner

This helped me to debug mbuf leaks in arp.
(if_arp.c rev. 1.298)
 1.4 15-Sep-2020  roy nd: give missed a default of ND_LLINFO_NOSTATE

It's impossible to miss from this state, whereas 0 is ND_LLINFO_INCOMPLETE
which we can miss from.
 1.3 15-Sep-2020  roy Implement RFC 7048, making Neighbor Unreachability Detection less impatient

RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then back off exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.

This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
 1.2 14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.1 11-Sep-2020  roy Implement address agnostic Neighbor Detection.

This is heavily based on IPv6 Neighbor Detection and allows per protocol
timers which also facilitate Neighbor Unreachability Detection.
 1.5.2.2 29-Aug-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1154):

sys/net/nd.c: revision 1.8
tests/net/arp/t_arp.sh: revision 1.49

nd: fix the number of requests for address resolution

ARP is expected to send requests for address resolution
net.inet.arp.nd_bmaxtries times at most. However, it sends
one more. IPv6 ND also behaves the same way.

The fix requires nd_set_timer reorganization to handle
scheduling timer without sending an NS message.
PR kern/59596

tests: add tests for ARP address resolution
 1.5.2.1 11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #827):

sys/net/nd.c: revision 1.6
sys/net/nd.c: revision 1.7

nd_resolve: Maintain la_numheld.

Otherwise lltable_drop_entry_queue never drops anything.

Addresses mbuf leak, PR kern/58297.

nd_timer: Update la_numheld when we clear la_hold (a.k.a. ln_hold).

Followup for PR kern/58297 fix. Patch by mlelstv@.
PR kern/58301
 1.7.4.1 29-Aug-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #22):

sys/net/nd.c: revision 1.8
tests/net/arp/t_arp.sh: revision 1.49

nd: fix the number of requests for address resolution

ARP is expected to send requests for address resolution
net.inet.arp.nd_bmaxtries times at most. However, it sends
one more. IPv6 ND also behaves the same way.

The fix requires nd_set_timer reorganization to handle
scheduling timer without sending an NS message.
PR kern/59596

tests: add tests for ARP address resolution
 1.3 15-Sep-2020  roy Implement RFC 7048, making Neighbor Unreachability Detection less impatient

RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then back off exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.

This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
 1.2 14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.1 11-Sep-2020  roy Implement address agnostic Neighbor Detection.

This is heavily based on IPv6 Neighbor Detection and allows per protocol
timers which also facilitate Neighbor Unreachability Detection.
 1.2 30-Mar-1996  christos Eliminate need for and remove net_conf.h
 1.1 13-Feb-1996  christos Net prototypes
 1.5 14-May-2003  itojun no need to compile net_osdep.c. simplify net_osdep.h conditions (remove
bsdi/freebsd/openbsd stuff)
 1.4 21-Dec-2001  itojun whitespace and comment. sync with kame
 1.3 12-Nov-2001  lukem add RCSIDs
 1.2 13-Dec-1999  itojun branches: 1.2.2; 1.2.8; 1.2.10; 1.2.12;
sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.1 30-Nov-1999  itojun branches: 1.1.2;
file net_osdep.c was initially added on branch kame.
 1.1.2.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.2.12.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.2.10.2 08-Jan-2002  nathanw Catch up to -current.
 1.2.10.1 14-Nov-2001  nathanw Catch up to -current.
 1.2.8.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.8.1 13-Dec-1999  bouyer file net_osdep.c was added on branch thorpej_scsipi on 2000-11-20 18:10:09 +0000
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.21 03-May-2018  maxv Remove net_osdep.h completely.
 1.20 01-May-2018  maxv Move if_name() from net_osdep.h to if.h. net_osdep.h is now unused and can
be removed - the other BSDs did the same.

Discussed with Kengo (if.h suggested by him).
 1.19 08-Feb-2018  maxv branches: 1.19.2;
Remove ovbcopy. It's long dead; only sparc has a reference to a function
of the same name, which too should be removed.
 1.18 06-May-2009  elad Provide privilege checking code snippets for all significant NetBSD
versions: < 2 (suser, proc), 2 & 3 (suser, lwp), >= 4 (kauth, lwp).

No functional change as it's all inside a big comment.
 1.17 04-Mar-2007  christos branches: 1.17.40; 1.17.56;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.16 04-Jan-2007  elad branches: 1.16.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.15 23-Sep-2006  elad PR/19795: Joel Wilsson: net_osdep.h is lying.
Sync comment with reality, thanks for the patch!
 1.14 23-Jul-2006  ad branches: 1.14.4; 1.14.6;
Use the LWP cached credentials where sane.
 1.13 14-May-2006  elad integrate kauth.
 1.12 28-Jan-2006  rpaulo branches: 1.12.2; 1.12.4; 1.12.6; 1.12.8; 1.12.10;
Reflect reality (ktrace-lwp).
 1.11 10-Dec-2005  elad branches: 1.11.2;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.10 26-Feb-2005  perry branches: 1.10.4;
nuke trailing whitespace
 1.9 04-Dec-2004  peter branches: 1.9.4; 1.9.6;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.8 14-May-2003  itojun branches: 1.8.2;
no need to compile net_osdep.c. simplify net_osdep.h conditions (remove
bsdi/freebsd/openbsd stuff)
 1.7 21-Dec-2001  itojun whitespace and comment. sync with kame
 1.6 07-Jul-2001  perry branches: 1.6.2;
add ovbcopy macro for KAME compat.
 1.5 07-Jul-2001  itojun have ovbcopy() macro, for cross-BSD compatibility only.
 1.4 08-Feb-2001  itojun branches: 1.4.2;
sync comment with latest kame
 1.3 19-Aug-2000  itojun branches: 1.3.2;
- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)
 1.2 13-Dec-1999  itojun branches: 1.2.2;
sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.1 30-Nov-1999  itojun branches: 1.1.2;
file net_osdep.h was initially added on branch kame.
 1.1.2.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.2.1 19-Aug-2000  bouyer file net_osdep.h was added on branch thorpej_scsipi on 2000-11-20 18:10:09 +0000
 1.4.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.4.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.6.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.2.3 11-Dec-2005  christos Sync with head.
 1.8.2.2 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.8.2.1 18-Dec-2004  skrll Sync with HEAD.
 1.9.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.4.1 29-Apr-2005  kent sync with -current
 1.10.4.4 03-Sep-2007  yamt sync with head.
 1.10.4.3 26-Feb-2007  yamt sync with head.
 1.10.4.2 30-Dec-2006  yamt sync with head.
 1.10.4.1 21-Jun-2006  yamt sync with head.
 1.11.2.1 01-Feb-2006  yamt sync with head.
 1.12.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.12.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.12.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.12.6.2 11-Aug-2006  yamt sync with head
 1.12.6.1 24-May-2006  yamt sync with head.
 1.12.4.1 01-Jun-2006  kardel Sync with head.
 1.12.2.1 09-Sep-2006  rpaulo sync with head
 1.14.6.1 22-Oct-2006  yamt sync with head
 1.14.4.2 12-Jan-2007  ad Sync with head.
 1.14.4.1 18-Nov-2006  ad Sync with head.
 1.16.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.17.56.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.17.40.1 16-May-2009  yamt sync with head
 1.19.2.2 21-May-2018  pgoyette Sync with HEAD
 1.19.2.1 02-May-2018  pgoyette Synch with HEAD
 1.6 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.5 01-Jun-2017  chs branches: 1.5.10; 1.5.16;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.4 04-May-2008  thorpej branches: 1.4.4; 1.4.6; 1.4.48; 1.4.68;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.3 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.2 26-Apr-2008  yamt branches: 1.2.2;
netstat_sysctl: set sysctl_size correctly. (fix netstat -s garbage output)
 1.1 23-Apr-2008  thorpej Add subroutines to support collating per-cpu-gathered network statistics.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.4.68.1 28-Aug-2017  skrll Sync with HEAD
 1.4.48.1 03-Dec-2017  jdolecek update from HEAD
 1.4.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.4.6.1 04-May-2008  mjf file net_stats.c was added on branch mjf-devfs2 on 2008-06-02 13:24:22 +0000
 1.4.4.2 18-May-2008  yamt sync with head.
 1.4.4.1 04-May-2008  yamt file net_stats.c was added on branch yamt-pf42 on 2008-05-18 12:35:28 +0000
 1.5.16.1 29-Feb-2020  ad Sync with head.
 1.5.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.6 29-Jun-2024  riastradh net_stats(9): Make this API slightly more type-safe.

TBD: Convert the macros to inline functions for better type-safety.

PR kern/58380
 1.5 29-Jan-2020  thorpej - Make _NET_STAT_GETREF()'s return value a net_stat_ref_t, which is
defined as a "void *" to prevent using a net_stat_ref_t as an array.
- For each _NET_STATADD(), etc. macro, also define a _NET_STATADD_REF()
macro that takes a ref returned by _NET_STAT_GETREF() as an argument.
This is intended to replace direct subscripting of the reference;
consumers of this API will be updated in future commits.
 1.4 05-Sep-2014  matt branches: 1.4.20; 1.4.26;
Cast return value of _NET_STAT_GETREF
 1.3 04-May-2008  thorpej branches: 1.3.4; 1.3.6; 1.3.48;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.2 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.1 23-Apr-2008  thorpej branches: 1.1.2;
Add subroutines to support collating per-cpu-gathered network statistics.
 1.1.2.1 16-May-2008  yamt sync with head.
 1.3.48.1 03-Dec-2017  jdolecek update from HEAD
 1.3.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.6.1 04-May-2008  mjf file net_stats.h was added on branch mjf-devfs2 on 2008-06-02 13:24:23 +0000
 1.3.4.2 18-May-2008  yamt sync with head.
 1.3.4.1 04-May-2008  yamt file net_stats.h was added on branch yamt-pf42 on 2008-05-18 12:35:28 +0000
 1.4.26.1 29-Feb-2020  ad Sync with head.
 1.4.20.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.47 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.46 06-Sep-2018  maxv Remove the network ATM code.
 1.45 27-May-2017  bouyer branches: 1.45.8; 1.45.10;
merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developed for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also included is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows using
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpdump(8) can also be used to record frames, which can be
decoded with ethereal.
 1.44 25-May-2015  ozaki-r branches: 1.44.4;
Remove leftover IPX-related stuffs

No objection on tech-kern and tech-net.
 1.43 20-May-2015  ozaki-r Remove leftover use of AF_NS and NS option

Unnecessary NETISR_NS is also removed.
 1.42 01-Mar-2013  joerg branches: 1.42.14;
Retire OSI network stack. OK core@
 1.41 27-Jun-2010  kefren branches: 1.41.8; 1.41.18;
Style fix: Tab consistency with the lines around it
 1.40 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.39 12-Nov-2008  ad branches: 1.39.6; 1.39.8;
Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.38 14-Oct-2008  pooka branches: 1.38.2;
Give maximum level of network softinterrupts a symbolic constant
(which happened to get bumped from 32 to 33 (AF_MAX) now).
 1.37 03-Dec-2007  ad branches: 1.37.14; 1.37.18; 1.37.24;
Interrupt handling changes, in discussion since February:

- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
 1.36 14-Jul-2007  ad branches: 1.36.6; 1.36.8; 1.36.14;
Generic soft interrupts are mandatory.
 1.35 07-Sep-2006  dogcow branches: 1.35.12;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.34 10-Dec-2005  elad branches: 1.34.4; 1.34.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.33 26-Feb-2005  perry branches: 1.33.4;
nuke trailing whitespace
 1.32 07-Aug-2003  agc branches: 1.32.8; 1.32.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.31 15-Mar-2003  matt branches: 1.31.2;
Allow a machine-dependent definition of schednetisr.
 1.30 12-May-2002  matt Eliminate more commons.
 1.29 06-Oct-2001  thorpej The bridge driver does all forwarding at interrupt level, and
does not use software interrupts; remove these bridge netisr
hooks left over from a previous incarnation of the bridge code.

Noted by Andrew Brown <atatat@atatdot.net>.
 1.28 11-Apr-2001  thorpej branches: 1.28.2; 1.28.4;
Add bridge netisr glue (only used if no __HAVE_GENERIC_SOFT_INTERRUPTS).
 1.27 15-Jan-2001  thorpej branches: 1.27.2;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.26 11-Jan-2001  thorpej Process STRIP software interrupts.
 1.25 09-Jan-2001  thorpej Fix oversight in slip softintr changes.
 1.24 09-Jan-2001  thorpej Add NETISRs for SLIP and STRIP. (Geez, I wish we had the softintr
API everywhere...)
 1.23 03-Jul-2000  cgd don't include the config-generated headers if _LKM defined
 1.22 02-Jul-2000  cgd oops! include arp.h and ppp.h even if _LOCORE defined
 1.21 02-Jul-2000  sommerfeld Reduce namespace pollution from netccitt-land
 1.20 02-Jul-2000  cgd Kwality control:
* put #includes of opt headers and headers to get protos used by
net/netisr_dispatch.h in net/netisr.h (if !defined(_LOCORE)) (rather than
in netisr_dispatch.h itself, and potentially nowhere, respectively).
* require netisr.h to be included before netisr_dispatch.h.
* minor additional cleanup of both netisr.h and netisr_dispatch.h.
* clean up uses to remove now-unnecessary header file inclusions, and
local prototypes of the fns.
* convert netisr dispatch implementations which didn't use
netisr_dispatch.h (pc532) to use it.
 1.19 21-Feb-2000  erh Remove NETISR_IMP. Make NETISR_ARP == AF_ARP, renumber NETISR_PPP to allow this.
 1.18 01-Jul-1999  itojun branches: 1.18.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.17 04-May-1998  christos branches: 1.17.10; 1.17.12;
Add IPX bits.
 1.16 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.15 02-Apr-1997  christos Add netatalk stubs.
 1.14 04-Jul-1996  chuck add native mode atm network interrupt
 1.13 01-Feb-1996  mycroft LOCORE -> _LOCORE
 1.12 12-Aug-1995  mycroft splnet --> splsoftnet
 1.11 04-Jul-1995  paulus Add definition for NETISR_PPP.
 1.10 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.9 26-Jul-1994  cgd kill vax code, at ragge's request.
 1.8 29-Jun-1994  cgd branches: 1.8.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 18-Apr-1994  mycroft NETISR_RAW is defunct.
 1.5 18-Apr-1994  mycroft Add NETISR_ARP.
 1.4 17-Dec-1993  mycroft From magnum branch:
Remove Jolitz's netisr kluge. Make sure cpl == 0 really means base priority.
Other minor cleanup.
 1.3 20-May-1993  cgd branches: 1.3.4;
add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.2 18-Oct-1993  mycroft Remove #ifdef i386 kluge, may it rot in Hell.
 1.3.4.1 14-Oct-1993  mycroft Remove part of the i386 kluge. softem is not used anywhere.
 1.8.2.1 14-Aug-1994  mycroft update from trunk (to remove ancient vax stuff)
 1.17.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.17.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.17.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.17.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.18.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.18.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.18.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.27.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.27.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.27.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.28.4.1 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.28.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.28.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.31.2.5 11-Dec-2005  christos Sync with head.
 1.31.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.31.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.31.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.31.2.1 03-Aug-2004  skrll Sync with HEAD
 1.32.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.32.8.1 29-Apr-2005  kent sync with -current
 1.33.4.4 07-Dec-2007  yamt sync with head
 1.33.4.3 03-Sep-2007  yamt sync with head.
 1.33.4.2 30-Dec-2006  yamt sync with head.
 1.33.4.1 21-Jun-2006  yamt sync with head.
 1.34.8.1 14-Sep-2006  yamt sync with head.
 1.34.4.1 09-Sep-2006  rpaulo sync with head
 1.35.12.2 15-Jul-2007  ad Sync with head.
 1.35.12.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.36.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.36.8.1 09-Jan-2008  matt sync with HEAD
 1.36.6.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.37.24.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.37.24.1 19-Oct-2008  haad Sync with HEAD.
 1.37.18.2 11-Aug-2010  yamt sync with head.
 1.37.18.1 04-May-2009  yamt sync with head.
 1.37.14.1 17-Jan-2009  mjf Sync with HEAD.
 1.38.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.39.8.1 03-Jul-2010  rmind sync with head
 1.39.6.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.41.18.2 03-Dec-2017  jdolecek update from HEAD
 1.41.18.1 23-Jun-2013  tls resync from head
 1.41.8.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.42.14.2 28-Aug-2017  skrll Sync with HEAD
 1.42.14.1 06-Jun-2015  skrll Sync with HEAD
 1.44.4.1 15-Jan-2017  bouyer Initial commit of a CAN socket layer, compatible with linux SocketCAN
(but incomplete). Based on work from Robert Swindells.
 1.45.10.1 10-Jun-2019  christos Sync with HEAD
 1.45.8.1 30-Sep-2018  pgoyette Sync with HEAD
 1.25 03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.24 03-Sep-2022  thorpej Convert MPLS from a legacy netisr to pktqueue.
 1.23 03-Sep-2022  thorpej Convert CAN from a legacy netisr to pktqueue.
 1.22 03-Sep-2022  thorpej Convert NETATALK from a legacy netisr to pktqueue.
 1.21 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.20 06-Sep-2018  maxv Remove the network ATM code.
 1.19 27-May-2017  bouyer branches: 1.19.8; 1.19.10;
merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developed for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameter of
CAN hardware. Also included is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows using
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpdump(8) can also be used to record frames, which can be
decoded with ethereal.
 1.18 05-Jun-2014  rmind branches: 1.18.4; 1.18.12;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.17 01-Mar-2013  joerg branches: 1.17.10;
Retire OSI network stack. OK core@
 1.16 30-Jun-2011  wiz branches: 1.16.2; 1.16.12;
dependant -> dependent
 1.15 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.14 14-Jul-2007  ad branches: 1.14.32; 1.14.54; 1.14.56;
Generic soft interrupts are mandatory.
 1.13 07-Sep-2006  dogcow branches: 1.13.12;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.12 10-Dec-2005  elad branches: 1.12.4; 1.12.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.11 26-Feb-2005  perry branches: 1.11.4;
nuke trailing whitespace
 1.10 02-Nov-2002  kristerw branches: 1.10.6; 1.10.14; 1.10.16;
Revert previous. Nested comments are evil.
 1.9 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.8 06-Oct-2001  thorpej The bridge driver does all forwarding at interrupt level, and
does not use software interrupts; remove these bridge netisr
hooks left over from a previous incarnation of the bridge code.

Noted by Andrew Brown <atatat@atatdot.net>.
 1.7 14-Apr-2001  augustss branches: 1.7.2; 1.7.4;
Only dispatch slnetisr & co if we don't have generic soft interrupts.
 1.6 11-Apr-2001  thorpej Add bridge netisr glue (only used if no __HAVE_GENERIC_SOFT_INTERRUPTS).
 1.5 15-Jan-2001  thorpej branches: 1.5.2;
For SLIP/STRIP/PPP, use generic soft interrupts, if available.
 1.4 11-Jan-2001  thorpej Process STRIP software interrupts.
 1.3 09-Jan-2001  thorpej Once we have a complete frame, schedule a SLIP software interrupt,
and manipulate ipintrq from there. This will allow us to clean up
the use of splimp() in this file later.
 1.2 02-Jul-2000  cgd branches: 1.2.2;
Kwality control:
* put #includes of opt headers and headers to get protos used by
net/netisr_dispatch.h in net/netisr.h (if !defined(_LOCORE)) (rather than
in netisr_dispatch.h itself, and potentially nowhere, respectively).
* require netisr.h to be included before netisr_dispatch.h.
* minor additional cleanup of both netisr.h and netisr_dispatch.h.
* clean up uses to remove now-unnecessary header file inclusions, and
local prototypes of the fns.
* convert netisr dispatch implementations which didn't use
netisr_dispatch.h (pc532) to use it.
 1.1 21-Feb-2000  erh This is a fragment of the network soft interrupt routine in MD code. DONETISR should be defined to do the appropriate thing for each port before including this. This file is to keep the available NETISRs the same across all ports.
 1.2.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.2.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.2.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.2.1 02-Jul-2000  bouyer file netisr_dispatch.h was added on branch thorpej_scsipi on 2000-11-20 18:10:09 +0000
 1.5.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.5.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.7.4.1 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.7.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.16.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.10.14.1 29-Apr-2005  kent sync with -current
 1.10.6.2 11-Dec-2005  christos Sync with head.
 1.10.6.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.11.4.3 03-Sep-2007  yamt sync with head.
 1.11.4.2 30-Dec-2006  yamt sync with head.
 1.11.4.1 21-Jun-2006  yamt sync with head.
 1.12.8.1 14-Sep-2006  yamt sync with head.
 1.12.4.1 09-Sep-2006  rpaulo sync with head
 1.13.12.1 15-Jul-2007  ad Sync with head.
 1.14.56.1 03-Jul-2010  rmind sync with head
 1.14.54.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.14.32.1 11-Aug-2010  yamt sync with head.
 1.16.12.3 03-Dec-2017  jdolecek update from HEAD
 1.16.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.16.12.1 23-Jun-2013  tls resync from head
 1.16.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.10.1 10-Aug-2014  tls Rebase.
 1.18.12.1 15-Jan-2017  bouyer Initial commit of a CAN socket layer, compatible with Linux SocketCAN
(but incomplete). Based on work from Robert Swindells.
 1.18.4.1 28-Aug-2017  skrll Sync with HEAD
 1.19.10.1 10-Jun-2019  christos Sync with HEAD
 1.19.8.1 30-Sep-2018  pgoyette Sync with HEAD
 1.42 16-Aug-2022  knakahara Micro-optimize pfil_run_hooks(), ok'ed by ozaki-r@n.o and ryo@n.o.

This can improve IPv4 forwarding throughput by 5% - 10%.
 1.41 17-May-2022  riastradh pfil(9): Assert pfil lists are not run in interrupt context.

All the paths leading to this should have been dispensed with by now.
The network stack runs in thread or softint context these days; hard
interrupt context is used only to put packets on queues deferred to
softint.
 1.40 17-May-2022  riastradh pfil(9): Assert sleepable when editing pfil lists.

These might sleep to wait for users to drain.
 1.39 22-Jun-2020  maxv pfil_psz gets dropped by the compiler because it is unused if !NET_MPSAFE,
so add an #ifdef around it, not to leak memory. Found by kLSan.
 1.38 27-Apr-2020  nat Remove inappropriate place for __predict_false.

Ok mrg@ maya@.
 1.37 27-Apr-2020  nat Skip pfil_run_hooks if no packet filter configured in kernel.
 1.36 01-Feb-2020  riastradh Fix wrong memory order and switch pfil to atomic_load/store_*.
 1.35 10-Mar-2017  ryo branches: 1.35.14; 1.35.20;
need to membar_producer() *before* switching.

pointed out by riastradh@, thanks
 1.34 23-Jan-2017  ozaki-r Call pserialize_perform and psref_target_destroy only if NET_MPSAFE

They shouldn't be used while holding softnet_lock.
 1.33 23-Jan-2017  ozaki-r Add curlwp_bind

It is necessary for example when we use tun(4). Without it the following
panic occurs:

panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, but preemption is enabled and the caller is not in a softint or CPU-bound LWP
Backtrace:
vpanic()
ch_voltag_convert_in()
psref_release()
pfil_run_arg.isra.0()
if_initialize()
if_attach()
tun_clone_create()
tunopen()
cdev_open()
spec_open()
VOP_OPEN()
vn_open()
do_open()
do_sys_openat()
sys_open()
syscall()
 1.32 16-Jan-2017  ryo Make pfil(9) MP-safe (applying psref(9))
 1.31 12-Jan-2017  ryo branches: 1.31.2;
* pfil_add_hook() no longer handles PFIL_IFADDR and PFIL_IFNET. delete them from pfil_flag_cases[].
* add/fix KASSERT
* fix comment
 1.30 04-Jan-2017  ryo Not to use ph_inout[2]. dir (= PFIL_IN or PFIL_OUT) is 1 or 2, not 0 or 1.
 1.29 26-Dec-2016  christos pfil(9) improvements to handle address changes:

Add:
PFIL_IFADDR call on interface reconfig (mbuf is ioctl #)
PFIL_IFNET call on interface attach/detach (mbuf is PFIL_IFNET_*)

from rmind@
 1.28 29-Jun-2013  rmind branches: 1.28.8; 1.28.12;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.27 23-Jun-2008  dyoung branches: 1.27.30; 1.27.40; 1.27.46;
Cosmetic: use LIST_FOREACH(). Join lines.
 1.26 23-Jun-2008  dyoung Cosmetic: use TAILQ_FOREACH(). Join lines.
 1.25 29-May-2008  mrg branches: 1.25.2;
remove clause #3 from my license where there are no other
copyright holders involved.
 1.24 11-Dec-2005  christos branches: 1.24.70; 1.24.72; 1.24.74; 1.24.76;
merge ktrace-lwp.
 1.23 27-Jul-2004  yamt - rename PFIL_NEWIF to PFIL_IFNET, and handle interface detach events
as well.
- use it for pf(4).

mostly from Peter Postma. PR/26403.
 1.22 18-Jul-2004  yamt pfil_run_hooks: don't dereference 'mp' unless it's a pointer.
 1.21 22-Jun-2004  itojun prepare PF-related hooks. reviewed by matt, perry, christos
 1.20 12-Nov-2001  lukem branches: 1.20.16;
add RCSIDs
 1.19 28-Dec-2000  thorpej branches: 1.19.2; 1.19.4;
Back out the sledgehammer damage applied by wiz while I was out for
the holiday.
 1.18 25-Dec-2000  wiz Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.
 1.17 22-Dec-2000  thorpej Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.
 1.16 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.15 23-Feb-2000  mycroft For pfil_add_hook(..., PFIL_ALL, ...), if we fail to add the output filter,
make sure to remove the input filter.
 1.14 22-Feb-2000  darrenr only call pfil_list_add with one of PFIL_IN or PFIL_OUT defined
 1.13 22-Feb-2000  darrenr return int from pfil_add_hook and pfil_remove_hook to indicate failure
or success, rather than panic'ing
 1.12 22-Feb-2000  darrenr fix from Mike Pelley to add filters in the reverse order for output
compared with input.
 1.11 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.10 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.9 10-Oct-1999  mrg branches: 1.9.2;
pass a pointer to the list, rather than passing a copy of it, when removing
functions from the pfil hook lists. this fixes the "missing function" problem.
also, re-add support for WAITOK that was lost several deltas ago.
 1.8 18-Jun-1999  mrg branches: 1.8.2;
call pfil_list_add with the right flag, to ensure it goes into the right list.
from mike@pelley.com in PR#7802.
 1.7 19-Mar-1998  mrg branches: 1.7.8; 1.7.10; 1.7.12;
convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.
 1.6 10-Oct-1997  mrg branches: 1.6.2;
remove advertising clause from all my licenses.
 1.5 20-Dec-1996  mrg branches: 1.5.10;
remove pfil_bad.
 1.4 13-Oct-1996  christos backout previous kprintf change
 1.3 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.2 05-Oct-1996  mrg minor copyright update.
 1.1 14-Sep-1996  mrg move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.5.10.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.6.2.1 23-Jul-1998  mellon Pull up 1.7 (veego)
 1.7.12.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.7.10.1 21-Jun-1999  thorpej Sync w/ -current.
 1.7.8.2 10-Oct-1999  cgd pull up rev 1.9 from trunk (requested by mrg):
Fix panic()s in pfil_list_remove() when running "ipf -D" a second
time with a DIAGNOSTIC kernel.
 1.7.8.1 24-Jun-1999  perry pullup 1.7->1.8 (mrg)
 1.8.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.9.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.9.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.9.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.20.16.1 03-Aug-2004  skrll Sync with HEAD
 1.24.76.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.24.76.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.24.74.1 04-May-2009  yamt sync with head.
 1.24.72.1 04-Jun-2008  yamt sync with head
 1.24.70.2 29-Jun-2008  mjf Sync with HEAD.
 1.24.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.25.2.1 27-Jun-2008  simonb Sync with head.
 1.27.46.1 28-Aug-2013  rmind sync with head
 1.27.40.2 03-Dec-2017  jdolecek update from HEAD
 1.27.40.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.27.30.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.12.2 20-Mar-2017  pgoyette Sync with HEAD
 1.28.12.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.28.8.2 28-Aug-2017  skrll Sync with HEAD
 1.28.8.1 05-Feb-2017  skrll Sync with HEAD
 1.31.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.35.20.1 29-Feb-2020  ad Sync with head.
 1.35.14.2 27-Apr-2020  nat Skip pfil_run_hooks if no packet filter enabled in the kernel.
 1.35.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.33 16-Jan-2017  ryo Make pfil(9) MP-safe (applying psref(9))
 1.32 26-Dec-2016  christos branches: 1.32.2;
pfil(9) improvements to handle address changes:

Add:
PFIL_IFADDR call on interface reconfig (mbuf is ioctl #)
PFIL_IFNET call on interface attach/detach (mbuf is PFIL_IFNET_*)

from rmind@
 1.31 29-Jun-2013  rmind branches: 1.31.8; 1.31.12;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.30 30-Sep-2012  dholland branches: 1.30.2;
u_long -> unsigned long, so this header compiles on its own like it
should. (and without adding <sys/types.h>)
 1.29 29-May-2008  mrg branches: 1.29.32; 1.29.42;
remove clause #3 from my license where there are no other
copyright holders involved.
 1.28 16-Feb-2006  perry branches: 1.28.64; 1.28.66; 1.28.68; 1.28.70;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.27 04-Jan-2006  perry branches: 1.27.2; 1.27.4;
#ifdef _KERNEL some function prototypes and an inline function
definition.

XXX It may be that this file needs more namespace cleaning (or the
files that include it, like if.h, might need it.)
 1.26 24-Dec-2005  perry branches: 1.26.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.25 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.24 27-Jul-2004  yamt branches: 1.24.12;
- rename PFIL_NEWIF to PFIL_IFNET, and handle interface detach events
as well.
- use it for pf(4).

mostly from Peter Postma. PR/26403.
 1.23 22-Jun-2004  itojun prepare PF-related hooks. reviewed by matt, perry, christos
 1.22 23-Jun-2003  martin branches: 1.22.2;
Protect kernel opt_*.h include by #ifdef _KERNEL_OPT
 1.21 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.20 30-May-2001  mrg use _KERNEL_OPT
 1.19 11-Apr-2001  itojun need to declare NULL for inline function.
 1.18 28-Dec-2000  thorpej branches: 1.18.2;
Back out the sledgehammer damage applied by wiz while I was out for
the holiday.
 1.17 25-Dec-2000  wiz Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.
 1.16 22-Dec-2000  thorpej Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.
 1.15 12-Dec-2000  thorpej Use <net/dlt.h>
 1.14 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.13 19-Apr-2000  itojun branches: 1.13.4;
remove extra memory region kept by "struct pfil_head pfil_head_t;".
it seems totally unnecessary, or seems to be a typo for typedef.
(correct me if i'm wrong)
 1.12 22-Feb-2000  darrenr return int from pfil_add_hook and pfil_remove_hook to indicate failure
or success, rather than panic'ing
 1.11 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.10 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.9 19-Mar-1998  mrg branches: 1.9.14;
convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.
 1.8 10-Oct-1997  mrg branches: 1.8.2;
remove advertising clause from all my licenses.
 1.7 29-Mar-1997  thorpej branches: 1.7.4;
Don't attempt to include config(8)-generated headers if we're included
by userland.
 1.6 22-Feb-1997  scottr Avoid duplicate definition of PFIL_HOOKS in the case that the config
file specifies that option.
 1.5 19-Feb-1997  scottr Don't include ipfilter.h if building an LKM.
 1.4 18-Feb-1997  mrg pseudo-device ipfilter brings in PFIL_HOOKS.
 1.3 20-Dec-1996  mrg branches: 1.3.4;
remove pfil_bad.
 1.2 05-Oct-1996  mrg minor copyright update.
 1.1 14-Sep-1996  mrg move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.3.4.1 12-Mar-1997  is Merge in changes from The Trunk
 1.7.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.8.2.1 23-Jul-1998  mellon Pull up 1.9 (veego)
 1.9.14.5 21-Apr-2001  bouyer Sync with HEAD
 1.9.14.4 05-Jan-2001  bouyer Sync with HEAD
 1.9.14.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.9.14.2 22-Nov-2000  bouyer Sync with HEAD.
 1.9.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.1 23-Apr-2001  he Pull up revision 1.19 (via patch, requested by itojun):
Include <sys/null.h> to define NULL for inline function.
 1.18.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.22.2.4 11-Dec-2005  christos Sync with head.
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.12.1 21-Jun-2006  yamt sync with head.
 1.26.2.2 18-Feb-2006  yamt sync with head.
 1.26.2.1 15-Jan-2006  yamt sync with head.
 1.27.4.1 22-Apr-2006  simonb Sync with head.
 1.27.2.1 09-Sep-2006  rpaulo sync with head
 1.28.70.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.28.68.1 04-May-2009  yamt sync with head.
 1.28.66.1 04-Jun-2008  yamt sync with head
 1.28.64.1 02-Jun-2008  mjf Sync with HEAD.
 1.29.42.3 03-Dec-2017  jdolecek update from HEAD
 1.29.42.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.42.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.29.32.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.32.1 30-Oct-2012  yamt sync with head
 1.30.2.1 28-Aug-2013  rmind sync with head
 1.31.12.2 20-Mar-2017  pgoyette Sync with HEAD
 1.31.12.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.31.8.1 05-Feb-2017  skrll Sync with HEAD
 1.32.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.35 04-Jan-2023  knakahara Fix libreswan build failure. Pointed out by Andrew Cagney, thanks.
 1.34 11-Oct-2022  knakahara branches: 1.34.2;
Add sadb_x_policy_flags to inform SP origination.

This extension(struct sadb_x_policy) is *not* defined by RFC2367.

OpenBSD does not have reserved fields in struct sadb_x_policy.
Linux does not use this field yet.
FreeBSD uses this field as "sadb_x_policy_scope"; the value range is
from 0x00 to 0x04.

We use from most significant bit to avoid the above usage.
 1.33 16-Apr-2022  andvar fix various typos in comments and log messages.
 1.32 04-Jul-2017  ozaki-r Introduce and use SADB_SASTATE_USABLE_P
 1.31 13-Apr-2017  christos branches: 1.31.4;
Redo the statistics through an indirection array and put the definitions
of the arrays in pfkeyv2.h so that they are next to the index definitions.
Remove "bogus" comment about compressing the statistics which is now fixed.
 1.30 09-Jun-2011  drochner branches: 1.30.12; 1.30.30; 1.30.34; 1.30.38;
more "const"
 1.29 26-May-2011  drochner branches: 1.29.2;
pull in AES-GCM/GMAC support from OpenBSD
This is still somewhat experimental. Tested between 2 similar boxes
so far. There is much potential for performance improvement. For now,
I've changed the gmac code to accept any data alignment, as the "char *"
pointer suggests. As the code is practically used, 32-bit alignment
can be assumed, at the cost of data copies. I don't know whether
bytewise access or copies are worse performance-wise. For efficient
implementations using SSE2 instructions on x86, even stricter
alignment requirements might arise.
 1.28 05-May-2011  drochner add IANA number for camellia-cbc, copied from FreeBSD
 1.27 05-Sep-2010  spz branches: 1.27.2;
fix two bugs in the PFKEY interface:

1) RFC2367 says in 2.3.3 Address Extension: "All non-address
information in the sockaddrs, such as sin_zero for AF_INET sockaddrs,
and sin6_flowinfo for AF_INET6 sockaddrs, MUST be zeroed out."
the IPSEC_NAT_T code was expecting the port information it needs
to be conveyed in the sockaddr instead of exclusively by
SADB_X_EXT_NAT_T_SPORT and SADB_X_EXT_NAT_T_DPORT,
and was not zeroing out the port information in the non-nat-traversal
case.
Since it was expecting the port information to reside in the sockaddr
it could get away with (re)setting the ports after starting to use them.
-> Set the natt ports before setting the SA mature.

2) RFC3947 has two Original Address fields, initiator and responder,
so we need SADB_X_EXT_NAT_T_OAI and SADB_X_EXT_NAT_T_OAR and not just
SADB_X_EXT_NAT_T_OA

The change has been created using vanhu's patch for FreeBSD as reference.

Note that establishing actual nat-t sessions has not yet been tested.

Likely fixes the following:
PR bin/41757
PR net/42592
PR net/42606
 1.26 20-Feb-2008  matt branches: 1.26.2; 1.26.10; 1.26.30; 1.26.32;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.25 10-Dec-2005  elad branches: 1.25.46;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.24 29-Oct-2005  yamt correct SADB_X_MIGRATE. pointed by Francis Dupont.
 1.23 25-Oct-2005  yamt add some #if 0'ed out SADB_X_* definitions found in kame tree
to avoid conflicting numbers.
 1.22 28-Jun-2005  christos branches: 1.22.2; 1.22.4;
Add some casts to appease lint
 1.21 26-Jun-2005  christos de-lint some pointer casts.
 1.20 26-Feb-2005  perry nuke trailing whitespace
 1.19 12-Feb-2005  manu Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.18 14-Jan-2005  itojun branches: 1.18.2; 1.18.4;
ESP AESCTR got an official protocol number
http://www.iana.org/assignments/isakmp-registry
 1.17 06-Dec-2004  itojun reqid (for unique policy) is u_int16_t quantity. from markus@openbsd
 1.16 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.15 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.14 25-Jul-2003  itojun add AH/ESP algorithms: hmac-ripemd160 (AH), AES XCBC MAC (AH),
AES counter mode (ESP)
 1.13 22-Jul-2003  itojun add hmac-sha2 support. various cleanups (like avoid hardcoding '16').
from kame
 1.12 02-Aug-2001  itojun branches: 1.12.20;
pass replay sequence number on sadb_x_sa2 (it's outside of PF_KEY standard
anyways).
 1.11 03-Oct-2000  itojun branches: 1.11.2; 1.11.4;
typo
 1.10 03-Oct-2000  itojun forgot to update maximum number of algorithms
 1.9 03-Oct-2000  itojun add official # for AES (12), and make it equal to rijndael.
Note that:
- IANA assignment was made for AES
- we still have some time window till AES gets finalized, so until it gets
finalized, we are not certain if AES == rijndael
but it should now be okay.
 1.8 03-Oct-2000  itojun get rid of RC5 algorithm # completely (#if 0'ed for a long time)
sync with kame.
 1.7 18-Jul-2000  itojun correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release)
 1.6 01-Jul-2000  itojun nuke sadb_x_ident_id, which violates pfkey standard.
correct get/set SA handling.
(from kame)
 1.5 12-Jun-2000  itojun branches: 1.5.2;
sync with almost-latest KAME IPsec. full changelog would be too big
to mention here. notable changes are like below.

kernel:
- make PF_KEY kernel interface more robust against broken input stream.
it includes complete internal structure change in sys/netkey/key.c.
- remove non-RFC compliant change in PF_KEY API, in particular,
in struct sadb_msg. we cannot just change these standard structs.
sadb_x_sa2 is introduced instead.
- remove prototypes for pfkey_xx functions from /usr/include/net/pfkeyv2.h.
these functions are not supplied in /usr/lib.

setkey(8):
- get/delete does not require "-m mode" (ignored with warning, if you
specify it)
- spddelete takes direction specification
 1.4 09-Feb-2000  itojun branches: 1.4.2;
for more strict rfc2367 conformance, move netkey/keyv2.h into net/pfkeyv2.h
(net/pfkeyv2.h used to just include netkey/keyv2.h).

netkey/keyv2.h includes #error only for several days, to inform
of file path change. after that I plan to nuke the file.
 1.3 06-Jul-1999  itojun branches: 1.3.2;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file pfkeyv2.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file pfkeyv2.h was added on branch chs-ubc2 on 1999-07-01 23:45:20 +0000
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.5.2.3 04-Oct-2000  itojun pullup (approved by releng-1-5)
rijndael-cbc kernel support.

sys/crypto/rijndael/* add tag for latest
sys/netinet6/esp_rijndael.[ch] add tag for latest
sys/netinet6/esp_core.c 1.9 -> 1.11
sys/conf/files 1.389 -> 1.390, 1.395 -> 1.396
sys/net/pfkeyv2.h 1.7 -> 1.11
 1.5.2.2 25-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

correct RFC2367 PF_KEY conformance (SADB_[AE]ALG_xx values and namespaces).
sync from kame.

WARNING: need recompilation of setkey(8) and pkgsrc/security/racoon.
(no ipsec-ready netbsd was released as official release, so binary backward
compatibility is less big issue)

(sys/netinet6/esp.h only, 1.10 -> 1.11)
wrap kernel function prototype by #ifdef _KERNEL.

--- revisions pulled up:
1.6 -> 1.7 syssrc/sys/net/pfkeyv2.h
1.10 -> 1.11 syssrc/sys/netinet6/ah.h
1.10 -> 1.11 syssrc/sys/netinet6/ah_output.c
1.19 -> 1.20 syssrc/sys/netinet6/ah_core.c
1.15 -> 1.16 syssrc/sys/netinet6/ah_input.c
1.8 -> 1.9 syssrc/sys/netinet6/esp.h
1.10 -> 1.11 syssrc/sys/netinet6/esp.h
1.1 -> 1.2 syssrc/sys/netinet6/esp_core.c
1.1 -> 1.2 syssrc/sys/netinet6/esp_input.c
1.2 -> 1.3 syssrc/sys/netinet6/esp_output.c
1.26 -> 1.27 syssrc/sys/netkey/key.c
 1.5.2.1 01-Jul-2000  itojun pullup (approved by releng-1-5)

nuke sadb_x_ident_id, which violates pfkey standard.
correct get/set SA handling.
(from kame)
 1.11.4.1 03-Aug-2001  lukem update to -current
 1.11.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.20.9 11-Dec-2005  christos Sync with head.
 1.12.20.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.20.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.20.6 15-Feb-2005  skrll Sync with HEAD.
 1.12.20.5 17-Jan-2005  skrll Sync with HEAD.
 1.12.20.4 18-Dec-2004  skrll Sync with HEAD.
 1.12.20.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.20.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.20.1 03-Aug-2004  skrll Sync with HEAD
 1.18.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.18.4.1 12-Feb-2005  yamt sync with head.
 1.18.2.1 29-Apr-2005  kent sync with -current
 1.22.4.2 02-Nov-2005  yamt sync with head.
 1.22.4.1 26-Oct-2005  yamt sync with head
 1.22.2.2 27-Feb-2008  yamt sync with head.
 1.22.2.1 21-Jun-2006  yamt sync with head.
 1.25.46.1 23-Mar-2008  matt sync with HEAD
 1.26.32.3 12-Jun-2011  rmind sync with head
 1.26.32.2 31-May-2011  rmind sync with head
 1.26.32.1 05-Mar-2011  rmind sync with head
 1.26.30.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.26.10.1 09-Oct-2010  yamt sync with head
 1.26.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.27.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.30.38.1 21-Apr-2017  bouyer Sync with HEAD
 1.30.34.1 26-Apr-2017  pgoyette Sync with HEAD
 1.30.30.1 28-Aug-2017  skrll Sync with HEAD
 1.30.12.1 03-Dec-2017  jdolecek update from HEAD
 1.31.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used together with the add
command in a setkey configuration. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
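
Illustrative sketch (not part of the commit): the copy-and-replace
pattern described above, reduced to plain C. The struct sa type, its
fields and sa_update() are hypothetical stand-ins, not the kernel's
secasvar or key.c code, and the reader synchronization the real code
needs is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified SA record (not struct secasvar). */
struct sa {
	uint32_t spi;
	uint64_t hard_lifetime;
};

/*
 * Replace *slot with an updated copy instead of modifying it in place.
 * Returns the old entry, which the caller releases once no reader can
 * still see it, or NULL on allocation failure (*slot is then untouched).
 */
static struct sa *
sa_update(struct sa **slot, uint64_t new_hard_lifetime)
{
	struct sa *old, *fresh;

	assert(slot != NULL && *slot != NULL);
	old = *slot;
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return NULL;
	*fresh = *old;				/* copy variables of the old entry */
	fresh->hard_lifetime = new_hard_lifetime;	/* apply the update */
	*slot = fresh;				/* replace old with new */
	return old;
}
```

Readers that obtained the old pointer before the swap keep seeing a
consistent, unmodified entry, which is the point of not updating it
directly.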
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is already sorted
as we need.

The newly added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
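
Illustrative sketch (not part of the commit): a sortedness check of the
kind described above. The sav_entry type and both function names are
hypothetical, and ascending order by add time is assumed purely for
illustration; the commit message does not state the direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node carrying only the sort key. */
struct sav_entry {
	uint64_t addtime;		/* stands in for lft_c->sadb_lifetime_addtime */
	struct sav_entry *next;
};

/* Return true if add times never decrease along the list (assumed order). */
static bool
savlist_is_sorted(const struct sav_entry *head)
{
	const struct sav_entry *e;

	for (e = head; e != NULL && e->next != NULL; e = e->next) {
		if (e->addtime > e->next->addtime)
			return false;
	}
	return true;
}

/* Debug-only wrapper: the walk is O(n), so it is kept off fast paths. */
static void
savlist_validate(const struct sav_entry *head)
{
	assert(savlist_is_sorted(head));
}
```

With the list known to be sorted, the packet path can pick an entry from
one end of the list without examining every sav.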
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav; however, an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
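
Illustrative sketch (not part of the commit): why scaling a zero-cleared
soft limit by itself is a no-op, and what deriving it from the hard limit
looks like. The comb_lifetime struct and both functions are hypothetical
simplifications, not the actual key_getcomb_setlifetime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lifetime fields of struct sadb_comb. */
struct comb_lifetime {
	uint64_t hard_addtime;	/* hard limit, in seconds */
	uint64_t soft_addtime;	/* soft limit, in seconds */
};

/* Original pattern: the soft limit is scaled from itself, so 0 stays 0. */
static void
set_lifetime_old(struct comb_lifetime *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->soft_addtime * 80 / 100;
}

/* Fixed pattern: the soft limit is 80% of the corresponding hard limit. */
static void
set_lifetime_new(struct comb_lifetime *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

For a zero-cleared struct and a hard limit of 3600 seconds, the old
pattern leaves the soft limit at 0 while the new one yields 2880.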
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course is useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP being broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.34.2.1 04-Jan-2023  martin Pull up following revision(s) (requested by knakahara in ticket #36):

sys/net/pfkeyv2.h: revision 1.35

Fix libreswan build failure. Pointed out by Andrew Cagney, thanks.
 1.22 28-May-2023  andvar s/explcit/explicit/ in comment.
 1.21 04-Sep-2022  thorpej In pktq_flush():
- Run a dummy softint at IPL_SOFTNET on all CPUs to ensure that the
ISR for this pktqueue is not running (addresses a pre-existing XXX).
- Hold the barrier lock around the critical section to ensure that
implicit pktq_barrier() calls via pktq_ifdetach() are held off during
the critical section.
- Ensure the critical section completes in minimal time by not freeing
memory during the critical section; instead, just build a list of the
packets pulled out of the per-CPU queues and free them after the critical
section is over.
 1.20 02-Sep-2022  thorpej Re-factor how pktq_barrier() is issued by if_detach().

Rather than explicitly referencing ip_pktq and ip6_pktq in if_detach(),
instead add all pktqueues to a global list. This list is then used in
the new pktq_ifdetach() function to issue a barrier on all pktqueues.

Note that the performance of this list is not critical; it will seldom
be accessed (when pktqueues are created/destroyed and when network
interfaces are detached), and so a simple synchronization strategy using
a rwlock is sufficient.
 1.19 02-Sep-2022  thorpej pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.
 1.18 01-Sep-2022  thorpej pktq_rps_hash(): Make the "funcp" argument const.
 1.17 01-Sep-2022  thorpej pktq_dequeue(): Prevent packets from getting stuck behind barrier markers.

pktq_barrier() ensures that all packets enqueued before the barrier have
been dequeued before the barrier returns. However, previously, pktq_dequeue()
would return NULL when a barrier marker was encountered. If there were
packets queued up behind the marker and no additional softint were scheduled
for the pktqueue, those packets would end up stranded. pktq_dequeue() now
continues to the next slot after the marker, ensuring that processing can
continue after the barrier has been signaled.
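
Illustrative sketch (not part of the commit): the difference between
stopping at a barrier marker and skipping past it, shown on a toy
single-threaded ring. The queue layout, the MARKER sentinel and the
q_* names are hypothetical and are not the pktqueue.c implementation.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8			/* toy capacity, power of two */

/* Hypothetical sentinel standing in for a barrier marker. */
static char barrier_marker;
#define MARKER ((void *)&barrier_marker)

struct queue {
	void *slot[QLEN];
	unsigned head;		/* next slot to dequeue */
	unsigned tail;		/* next free slot */
};

static int
q_enqueue(struct queue *q, void *item)
{
	if (q->tail - q->head == QLEN)
		return -1;	/* full */
	q->slot[q->tail++ % QLEN] = item;
	return 0;
}

/* Old behaviour: hitting a marker looks like an empty queue. */
static void *
q_dequeue_old(struct queue *q)
{
	void *item;

	if (q->head == q->tail)
		return NULL;
	item = q->slot[q->head++ % QLEN];
	return item == MARKER ? NULL : item;
}

/* New behaviour: consume the marker and continue to the next slot. */
static void *
q_dequeue_new(struct queue *q)
{
	while (q->head != q->tail) {
		void *item = q->slot[q->head++ % QLEN];

		if (item != MARKER)
			return item;
		/* Real code would signal the barrier waiter here. */
	}
	return NULL;
}
```

With the old variant, a consumer loop that stops at the first NULL leaves
anything queued behind the marker untouched until something schedules the
consumer again; the new variant keeps returning packets until the queue
is truly empty.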
 1.16 21-Dec-2021  knakahara Fix net.*.rps_hash=toeplitz-othercpus on one CPU systems.
 1.15 15-Dec-2021  knakahara Fix typo in comment.
 1.14 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.13 25-Mar-2021  skrll Remove strange padding #define and replace with anonymous struct/union
 1.12 11-Sep-2020  riastradh branches: 1.12.2; 1.12.4;
pktqueue(9): Use percpu_create to allow early initialization.

Otherwise pktqueues can't be created before all CPUs are detected --
they will have a queue only for the primary CPU, not for others.

This will also be necessary if we want to add CPU hotplug (still need
some way to block hotplug during pktq_set_maxlen but it's a start).
 1.11 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.10 10-Aug-2018  msaitoh branches: 1.10.6;
- Fix a bug that drop counter shows incorrect value like
"net.inet.ip.ifq.drops = 72059810241052672"
- Change pktq's length sysctl to uint64_t.
 1.9 01-Jun-2017  chs branches: 1.9.8; 1.9.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.8 04-Jul-2014  ozaki-r branches: 1.8.2; 1.8.6; 1.8.8;
Fix pktq_enqueue for rump

Add _RUMP_NATIVE_ABI to the macro condition for i386 and x86_64 because
_RUMPKERNEL is not defined for them. See sys/rump/Makefile.rump.

Found by ATF
 1.7 02-Jul-2014  ozaki-r Restore RPS of pktq_enqueue unless _RUMPKERNEL

It's a workaround and would be fixed in rump soon.

ok pooka@
 1.6 16-Jun-2014  ozaki-r Move sysctl_pktq_{maxlen,count} to pktqueue.c and make them global

They will be used by bridge.

ok rmind@
 1.5 16-Jun-2014  ozaki-r Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@
 1.4 09-Jun-2014  rmind pktqueue: add or fix some comments, remove some header inclusions.
 1.3 09-Jun-2014  rmind Restore the assert in RUMP's softint_schedule_cpu() and just ensure
curcpu() in the caller.
 1.2 09-Jun-2014  rmind Implement pktq_set_maxlen() and let sysctl net.inet.{ip,ip6}.ifq.maxlen be
changed on the fly again.
 1.1 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.8.8.1 28-Aug-2017  skrll Sync with HEAD
 1.8.6.3 03-Dec-2017  jdolecek update from HEAD
 1.8.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.8.6.1 04-Jul-2014  tls file pktqueue.c was added on branch tls-maxphys on 2014-08-20 00:04:34 +0000
 1.8.2.2 10-Aug-2014  tls Rebase.
 1.8.2.1 04-Jul-2014  tls file pktqueue.c was added on branch tls-earlyentropy on 2014-08-10 06:56:15 +0000
 1.9.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.10.1 10-Jun-2019  christos Sync with HEAD
 1.9.8.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.10.6.1 29-Feb-2020  ad Sync with head.
 1.12.4.1 03-Apr-2021  thorpej Sync with HEAD.
 1.12.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.8 02-Sep-2022  thorpej Re-factor how pktq_barrier() is issued by if_detach().

Rather than explicitly referencing ip_pktq and ip6_pktq in if_detach(),
instead add all pktqueues to a global list. This list is then used in
the new pktq_ifdetach() function to issue a barrier on all pktqueues.

Note that the performance of this list is not critical; it will seldom
be accessed (when pktqueues are created/destroyed and when network
interfaces are detached), and so a simple synchronization strategy using
a rwlock is sufficient.
 1.7 02-Sep-2022  thorpej pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.
 1.6 01-Sep-2022  thorpej pktq_rps_hash(): Make the "funcp" argument const.
 1.5 11-Oct-2021  knakahara Make pktq_rps_hash() pluggable for each interface type. Reviewed by gdt@n.o, thorpej@n.o, and riastradh@n.o, thanks.
 1.4 16-Jun-2014  ozaki-r branches: 1.4.2; 1.4.6;
Move sysctl_pktq_{maxlen,count} to pktqueue.c and make them global

They will be used by bridge.

ok rmind@
 1.3 16-Jun-2014  ozaki-r Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@
 1.2 09-Jun-2014  rmind Implement pktq_set_maxlen() and let sysctl net.inet.{ip,ip6}.ifq.maxlen be
changed on the fly again.
 1.1 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.4.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.6.1 16-Jun-2014  tls file pktqueue.h was added on branch tls-maxphys on 2014-08-20 00:04:34 +0000
 1.4.2.2 10-Aug-2014  tls Rebase.
 1.4.2.1 16-Jun-2014  tls file pktqueue.h was added on branch tls-earlyentropy on 2014-08-10 06:56:16 +0000
 1.16 29-Nov-2008  cube Fix handling of ppp compressor modules, from Andrew Doran's input.
- ref count each compressor
- allow {un,}registration of several modules at once
- use RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules init functions
are run before pseudo-devices attach (reported by Nick Hudson).
 1.15 25-Nov-2008  cube Rework the way PPP compressors are handled and allow them to be
automatically loaded when needed.
 1.14 20-Feb-2008  matt branches: 1.14.6; 1.14.10; 1.14.16; 1.14.18;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.13 11-Dec-2005  thorpej branches: 1.13.46;
ANSI function decls and application of static.
 1.12 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.11 20-Feb-2005  cube branches: 1.11.4;
Add MPPE definitions (from ppp-2.4.3).
 1.10 08-Jul-2003  itojun branches: 1.10.8; 1.10.10;
prototype must not have variable name
 1.9 27-Mar-2003  christos branches: 1.9.2;
PR/20844: Iain Hibbert: PPP Compressors cannot be loaded as LKM
 1.8 13-Sep-2002  itojun copyright clarification. from openbsd

1.
Paul Mackerras and the Australian National University have worked things
out, and as a result, Paul now owns copyright on all these files, with the
proper terms.

2.
and... we managed to contact "Eric Rosenquist" <eric@rosenquist.com> through
the help of people who found him: first one was nick.stott@cogeco.ca
This now has a better license. Two authors left to go.
 1.7 01-Jul-2002  itojun new copyright boilerplate from CMU. from openbsd
 1.6 29-May-2002  christos add 2 more CCP defines.
 1.5 23-Feb-2001  christos branches: 1.5.2; 1.5.4; 1.5.16;
change CCP maxlen to 64 to accommodate mschap-2.
 1.4 02-May-1998  christos branches: 1.4.14;
Merge changes from pppd-2.3.4; adds ppp-deflate-draft stuff and updates
zlib. Maybe we can merge our other copy of zlib with this one now and
avoid having two copies?
 1.3 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.2 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.1 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.4.14.1 12-Mar-2001  bouyer Sync with HEAD.
 1.5.16.2 15-Jul-2002  gehenna catch up with -current.
 1.5.16.1 30-May-2002  gehenna Catch up with -current.
 1.5.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.5.4.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.5.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.5.2.2 01-Aug-2002  nathanw Catch up to -current.
 1.5.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.9.2.5 11-Dec-2005  christos Sync with head.
 1.9.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.9.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.1 03-Aug-2004  skrll Sync with HEAD
 1.10.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.10.8.1 29-Apr-2005  kent sync with -current
 1.11.4.2 27-Feb-2008  yamt sync with head.
 1.11.4.1 21-Jun-2006  yamt sync with head.
 1.13.46.1 23-Mar-2008  matt sync with HEAD
 1.14.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.14.16.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.14.10.1 04-May-2009  yamt sync with head.
 1.14.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.24 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
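
A minimal userland sketch of why the caller-side guard is redundant, assuming a chain-freeing loop shaped like the one the message describes. `struct toy_mbuf`, `toy_freem()` and `toy_chain()` are hypothetical stand-ins, not the kernel's mbuf code:

```c
#include <stdlib.h>

/* Toy stand-in for an mbuf: only the chain pointer matters here. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
};

/*
 * Free a whole chain and return how many buffers were released.
 * The loop condition already covers m == NULL, so callers do not
 * need an "if (m != NULL)" guard of their own.
 */
static int
toy_freem(struct toy_mbuf *m)
{
	int n = 0;

	while (m != NULL) {
		struct toy_mbuf *next = m->m_next;

		free(m);
		m = next;
		n++;
	}
	return n;
}

/* Build a chain of up to len buffers (illustration only). */
static struct toy_mbuf *
toy_chain(int len)
{
	struct toy_mbuf *head = NULL;

	while (len-- > 0) {
		struct toy_mbuf *m = malloc(sizeof(*m));

		if (m == NULL)
			break;
		m->m_next = head;
		head = m;
	}
	return head;
}
```

Calling `toy_freem(NULL)` simply returns 0 without touching memory, which is the property the commit relies on.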
 1.23 06-Aug-2016  pgoyette branches: 1.23.52;
Catch up with the renaming of module ppp --> if_ppp and avoid warning
messages at boot (or module load) time.
 1.22 06-Aug-2016  pgoyette Modularize the ppp driver, and adjust dependencies of the compressor
modules.

For now, this is still included as a built-in module in GENERIC kernels.
 1.21 05-Apr-2016  pgoyette Add modular dependency on zlib module.
 1.20 17-Dec-2008  cegger branches: 1.20.24; 1.20.42;
kill MALLOC and FREE macros.
 1.19 29-Nov-2008  cube Fix handling of ppp compressor modules, from Andrew Doran's input.
- ref count each compressor
- allow {un,}registration of several modules at once
- use RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules' init functions
are run before pseudo-devices attach (reported by Nick Hudson).
 1.18 25-Nov-2008  cube Rework the way PPP compressors are handled and allow them to be
automatically loaded when needed.
 1.17 05-May-2008  ad branches: 1.17.6; 1.17.8;
Back out previous. It broke the build.
 1.16 04-May-2008  ad Move zlib out of net/ and into kern/. It would probably be better to use
the reachover Makefiles and libz, but this is already here and it works.
 1.15 16-Nov-2006  christos branches: 1.15.48; 1.15.52;
__unused removal on arguments; approved by core.
 1.14 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.13 11-Dec-2005  thorpej branches: 1.13.20; 1.13.22;
ANSI function decls and application of static.
 1.12 13-Sep-2002  itojun branches: 1.12.22;
copyright clarification. from openbsd

1.
Paul Mackerras and the Australian National University have worked things
out, and as a result, Paul now owns copyright on all these files, with the
proper terms.

2.
and... we managed to contact "Eric Rosenquist" <eric@rosenquist.com> through
the help of people who found him: first one was nick.stott@cogeco.ca
This now has a better license. Two authors left to go.
 1.11 13-Mar-2002  fvdl Fix what looks like a merge error: olen = 0 in z_decompress, not
PPP_HDRLEN, which caused lots of 'ppp_deflate0: exceeded mru (1508 > 1504)'
messages.
 1.10 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.9 12-Nov-2001  lukem add RCSIDs
 1.8 18-Jul-2001  thorpej bzero -> memset
 1.7 25-Aug-2000  thorpej branches: 1.7.2; 1.7.4;
Don't use MALLOC() for variable-sized allocations.
 1.6 02-May-1998  christos branches: 1.6.14; 1.6.24;
Merge changes from pppd-2.3.4; adds ppp-deflate-draft stuff and updates
zlib. Maybe we can merge our other copy of zlib with this one now and
avoid having two copies?
 1.5 17-May-1997  christos Update to ppp-2.3b5
 1.4 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.3 13-Oct-1996  christos backout previous kprintf change
 1.2 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.1 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.6.24.1 20-Mar-2002  he Pull up revision 1.11 (requested by fvdl):
Fix a problem related to compression which would cause a lot of
logged incorrect warnings.
 1.6.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.7.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.7.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.7.4.1 03-Aug-2001  lukem update to -current
 1.7.2.5 17-Sep-2002  nathanw Catch up to -current.
 1.7.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.7.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.7.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.7.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.12.22.1 21-Jun-2006  yamt sync with head.
 1.13.22.2 10-Dec-2006  yamt sync with head.
 1.13.22.1 22-Oct-2006  yamt sync with head
 1.13.20.1 18-Nov-2006  ad Sync with head.
 1.15.52.1 04-May-2009  yamt sync with head.
 1.15.48.1 17-Jan-2009  mjf Sync with HEAD.
 1.17.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.17.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.20.42.2 05-Oct-2016  skrll Sync with HEAD
 1.20.42.1 22-Apr-2016  skrll Sync with HEAD
 1.20.24.1 03-Dec-2017  jdolecek update from HEAD
 1.23.52.1 02-Aug-2025  perseant Sync with HEAD
 1.14 04-Apr-2020  is Multilink fragment protocol type.
 1.13 20-Feb-2008  matt branches: 1.13.98; 1.13.106;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.12 17-Feb-2007  dyoung branches: 1.12.18;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.11 10-Dec-2005  elad branches: 1.11.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.10 20-Feb-2005  cube branches: 1.10.4;
Add a couple of protocols (from ppp-2.4.3).
 1.9 26-Apr-2004  matt branches: 1.9.4; 1.9.6;
Remove #else of #if __STDC__
 1.8 13-Sep-2002  itojun branches: 1.8.6;
copyright clarification. from openbsd

1.
Paul Mackerras and the Australian National University have worked things
out, and as a result, Paul now owns copyright on all these files, with the
proper terms.

2.
and... we managed to contact "Eric Rosenquist" <eric@rosenquist.com> through
the help of people who found him: first one was nick.stott@cogeco.ca
This now has a better license. Two authors left to go.
 1.7 02-Jul-2000  sommerfeld branches: 1.7.2; 1.7.4;
Merge if_spppsubr.c PPP protocol declarations list with the one found
in ppp_defs.h, and have if_spppsubr.c include ppp_defs.h rather than
duplicate its definitions.

[This is a stopgap measure to clean up build lossage.]
 1.6 01-Jul-1999  itojun branches: 1.6.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.5 04-May-1998  christos branches: 1.5.10; 1.5.12;
Add IPX bits.
 1.4 09-Feb-1998  perry add multiple inclusion protection (and cleanup).
 1.3 17-May-1997  christos Update to ppp-2.3b5
 1.2 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.1 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.5.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.5.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.5.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.5.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.4.1 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.7.2.1 17-Sep-2002  nathanw Catch up to -current.
 1.8.6.5 11-Dec-2005  christos Sync with head.
 1.8.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.8.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.6.1 03-Aug-2004  skrll Sync with HEAD
 1.9.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.4.1 29-Apr-2005  kent sync with -current
 1.10.4.3 27-Feb-2008  yamt sync with head.
 1.10.4.2 26-Feb-2007  yamt sync with head.
 1.10.4.1 21-Jun-2006  yamt sync with head.
 1.11.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.12.18.1 23-Mar-2008  matt sync with HEAD
 1.13.106.1 07-Apr-2020  is Multilink fragment protocol type.
 1.13.98.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.73 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.72 21-Dec-2022  chs branches: 1.72.6;
ppp: remove ioctls that never worked and crash the kernel

Remove vestigial bits of PPP HDLC support that never worked on netbsd.
The TIOCRCVFRAME ioctl was apparently intended to be called only from
within the kernel, but nothing prevents user code from calling this ioctl
and crashing the kernel.

Reported-by: syzbot+53e4620d0d17a4dd08fa@syzkaller.appspotmail.com
Reported-by: syzbot+d3a8b784fed1e32e0768@syzkaller.appspotmail.com
Reported-by: syzbot+375bab63345a6a7a3331@syzkaller.appspotmail.com
Reported-by: syzbot+ba7ac85196274a20b54a@syzkaller.appspotmail.com
Reported-by: syzbot+57ddb63a3d1d3299ef18@syzkaller.appspotmail.com
 1.71 26-Oct-2022  riastradh branches: 1.71.2;
ppp(4): Convert to ttylock/ttyunlock.
 1.70 04-May-2022  andvar fix various typos in comments and log messages.
 1.69 13-Dec-2021  msaitoh Use unsigned to avoid undefined behavior. Found by kUBSan.

Reported-by: syzbot+699ce32cd32e2a670788@syzkaller.appspotmail.com
 1.68 27-Sep-2021  msaitoh Use unsigned to avoid undefined behavior in pppasyncstart().

Reported-by: syzbot+7c8c7977e2756ac13f0a@syzkaller.appspotmail.com
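
A rough illustration of this class of fix. The log does not say which expression was changed, so the bitmap helpers below are hypothetical. The point is only that shifting a signed `int` 1 left by 31 is undefined behavior in C, while the same shift on an unsigned operand is well defined:

```c
#include <stdint.h>

/*
 * Test bit c of a 256-bit map stored as eight 32-bit words.
 * With a plain int, "1 << 31" shifts into the sign bit, which is
 * undefined behavior; an unsigned operand makes every shift count
 * from 0 to 31 well defined.
 */
static int
toy_map_test(const uint32_t map[8], unsigned char c)
{
	return (map[c >> 5] & (UINT32_C(1) << (c & 0x1f))) != 0;
}

/* Set bit c in the same map, again using an unsigned shift. */
static void
toy_map_set(uint32_t map[8], unsigned char c)
{
	map[c >> 5] |= UINT32_C(1) << (c & 0x1f);
}
```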
 1.67 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.66 20-Sep-2019  maxv branches: 1.66.2;
dedup
 1.65 24-Jan-2019  knakahara branches: 1.65.4;
Add KERNEL_LOCK in ppptioctl() to protect struct ppp_softc members.

struct linesw.i_ioctl can be called without any preservation when the caller's
struct cdevsw is set D_MPSAFE such as ucom(4).
 1.64 07-Feb-2018  mrg branches: 1.64.2; 1.64.4;
ppprcvframe() has indentation issues.
 1.63 02-Oct-2016  christos branches: 1.63.6; 1.63.8;
MFREE -> m_free
 1.62 06-Aug-2016  pgoyette Modularize the ppp driver, and adjust dependencies of the compressor
modules.

For now, this is still included as a built-in module in GENERIC kernels.
 1.61 20-Jun-2016  knakahara branches: 1.61.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2 (or L3 tunneling).
 1.60 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.59 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.58 22-May-2014  dholland branches: 1.58.4;
Use accessor functions for the tty's table of control characters.
(at least from outside the core tty sources)

Move some xon/xoff code from net/ppp_tty.c to kern/tty.c.
 1.57 05-Apr-2010  joerg branches: 1.57.18; 1.57.32;
Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.56 19-Jan-2010  pooka branches: 1.56.2; 1.56.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.55 07-May-2009  elad Introduce actions/requests to handle authorization for ppp(4), sl(4),
strip(4), btuart(4) and bcsp(4) network interfaces and devices.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004955.html
 1.54 15-Apr-2009  elad Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
 1.53 25-May-2008  ad branches: 1.53.6; 1.53.12;
Properly fix the "hanging in tty" bug that was worked around with cv_wakeup()
some time ago.
 1.52 20-Feb-2008  matt branches: 1.52.6; 1.52.8; 1.52.10; 1.52.12;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.51 28-Nov-2007  ad Grab tty_lock in more places. Noted and tested by degroote@.
 1.50 12-Nov-2007  ad Call ttwakeup() with tty_lock held.
 1.49 10-Nov-2007  ad Call ttyflush() with tty_lock held.
 1.48 07-Nov-2007  ad Merge tty changes from the vmlocking branch.
 1.47 04-Mar-2007  christos branches: 1.47.2; 1.47.14; 1.47.16; 1.47.20; 1.47.22;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.46 04-Jan-2007  elad branches: 1.46.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.45 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.44 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.43 23-Jul-2006  ad branches: 1.43.4; 1.43.6;
Use the LWP cached credentials where sane.
 1.42 14-May-2006  elad integrate kauth.
 1.41 11-Dec-2005  thorpej branches: 1.41.4; 1.41.6; 1.41.8; 1.41.10; 1.41.12;
ANSI function decls and application of static.
 1.40 11-Dec-2005  christos merge ktrace-lwp.
 1.39 27-Nov-2005  thorpej Overhaul how TTY line disciplines are handled:
- Replace references to linesw[0] with a ttyldisc_default() function
that returns the default ("termios") line discipline.
- The linesw[] array is gone, replaced by a linked list.
- ttyldisc_add() and ttyldisc_remove() have been replaced by
ttyldisc_attach() and ttyldisc_detach().
- Things that provide line disciplines are now responsible for
registering those disciplines with the system. The linesw
structures are no longer declared in tty_conf.c
- Line disciplines are now refcounted; a lookup causes a reference to
be held. ttyldisc_release() releases the reference. Attempts to
detach an in-use line discipline result in EBUSY.
- Fix function signature lossage in if_sl.c, if_strip.c, and tty_tb.c
that was masked by the old tty_conf.c
- tty_init() is no longer necessary; delete it and its call from main().
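
The refcounting rule above (a lookup takes a reference, a release drops it, and detaching an in-use discipline fails with EBUSY) can be sketched in userland as follows. All `toy_*` names are hypothetical and the real ttyldisc_*() functions also take locks, which this sketch omits:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy line discipline record kept on a singly linked list. */
struct toy_ldisc {
	const char *name;
	int refcnt;
	struct toy_ldisc *next;
};

static struct toy_ldisc *toy_list;

/* Register a discipline; duplicate names are rejected. */
static int
toy_attach(struct toy_ldisc *d)
{
	struct toy_ldisc *p;

	for (p = toy_list; p != NULL; p = p->next)
		if (strcmp(p->name, d->name) == 0)
			return EEXIST;
	d->refcnt = 0;
	d->next = toy_list;
	toy_list = d;
	return 0;
}

/* Look a discipline up by name; a hit holds a reference. */
static struct toy_ldisc *
toy_lookup(const char *name)
{
	struct toy_ldisc *p;

	for (p = toy_list; p != NULL; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			p->refcnt++;
			return p;
		}
	}
	return NULL;
}

/* Drop a reference obtained from toy_lookup(). */
static void
toy_release(struct toy_ldisc *d)
{
	if (d->refcnt > 0)
		d->refcnt--;
}

/* Unregister; refuses with EBUSY while references are held. */
static int
toy_detach(struct toy_ldisc *d)
{
	struct toy_ldisc **pp;

	for (pp = &toy_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp != d)
			continue;
		if (d->refcnt > 0)
			return EBUSY;
		*pp = d->next;
		d->next = NULL;
		return 0;
	}
	return ENOENT;
}
```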
 1.38 11-Jun-2005  christos branches: 1.38.2; 1.38.8;
30393/Miles Nordin: PF/ALTQ does not work on ppp(4) interfaces
This is because the mbuf chain created did not have a header.
 1.37 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.36 17-May-2005  christos Yes, it was a cool trick >20 years ago to use "0123456789abcdef"[a] to
implement xtoa(), but I think defining the same string 50 times is a bit
too much. Defined HEXDIGITS and hexdigits in subr_prf.c and use it...
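
For readers unfamiliar with the trick, a small sketch of indexing one shared digit table instead of repeating the string literal at each call site. `toy_hexdigits` and `toy_hex_byte()` are hypothetical names, not the symbols added to subr_prf.c:

```c
#include <stddef.h>

/* One shared table instead of a string literal at every call site. */
static const char toy_hexdigits[] = "0123456789abcdef";

/*
 * Write the two hex digits of byte b, plus a terminating NUL,
 * into out, which must have room for 3 chars.
 */
static void
toy_hex_byte(unsigned char b, char out[3])
{
	out[0] = toy_hexdigits[b >> 4];    /* high nibble */
	out[1] = toy_hexdigits[b & 0x0f];  /* low nibble */
	out[2] = '\0';
}
```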
 1.35 26-Feb-2005  perry nuke trailing whitespace
 1.34 01-Sep-2003  christos branches: 1.34.8; 1.34.10;
Add a new ioctl PPPIOCGRAWIN to get the last characters we got from the
remote site.
 1.33 26-Feb-2003  matt branches: 1.33.2;
Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.32 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework refers to it to assign device major numbers dynamically.
 1.31 01-Jul-2002  itojun new copyright boilerplate from CMU. from openbsd
 1.30 17-Mar-2002  atatat branches: 1.30.4;
Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
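
A sketch of the pass-through convention, using the numeric values quoted in the message. The command numbers and `toy_*` handlers are hypothetical, and the final translation of an unclaimed command to ENOTTY at the top level is an assumption about how the dispatcher finishes:

```c
#include <errno.h>

/* Values as quoted in the commit message; names are local to this sketch. */
#define TOY_ERESTART     (-3)
#define TOY_EPASSTHROUGH (-4)

#define TOY_CMD_LDISC  1	/* handled by the line discipline */
#define TOY_CMD_DRIVER 2	/* handled by the device driver */

/* Each layer returns TOY_EPASSTHROUGH for commands it does not know. */
static int
toy_ldisc_ioctl(int cmd, int *out)
{
	if (cmd == TOY_CMD_LDISC) {
		*out = 10;
		return 0;
	}
	return TOY_EPASSTHROUGH;
}

static int
toy_driver_ioctl(int cmd, int *out)
{
	if (cmd == TOY_CMD_DRIVER) {
		*out = 20;
		return 0;
	}
	return TOY_EPASSTHROUGH;
}

/*
 * Try each layer in turn. Only the distinct pass-through value means
 * "not mine", so a real error (or a restart request) is never mistaken
 * for an unhandled command.
 */
static int
toy_ioctl(int cmd, int *out)
{
	int error;

	error = toy_ldisc_ioctl(cmd, out);
	if (error != TOY_EPASSTHROUGH)
		return error;
	error = toy_driver_ioctl(cmd, out);
	if (error != TOY_EPASSTHROUGH)
		return error;
	return ENOTTY;	/* nobody claimed the command */
}
```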
 1.29 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.28 13-Nov-2001  lukem remove unnecessary #if NFOO > 0 .... #endif wrappers
 1.27 12-Nov-2001  lukem add RCSIDs
 1.26 18-Jul-2001  thorpej branches: 1.26.2;
bzero -> memset
 1.25 14-Jun-2001  itojun branches: 1.25.2;
change the meaning of ifnet.if_lastchange to meet RFC1573 ifLastChange.
follows BSD/OS practice and ucd-snmp code (FreeBSD does it for specific
interfaces only).

was: if_lastchange is updated on every packet transmission/receipt.
now: if_lastchange is updated when IFF_UP is changed.
 1.24 31-Mar-2001  enami Remove unnecessary test of tp->t_linesw against NULL; they are results
of confusion while correcting compilation error after t_line is
replaced with t_linesw.
 1.23 18-Jan-2001  jdolecek branches: 1.23.2;
constify
 1.22 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.21 02-Nov-2000  itojun sync with struct tty change (does it look correct?)
 1.20 30-Mar-2000  augustss Kill some more register declarations.
 1.19 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.18 25-Aug-1999  christos branches: 1.18.2;
changes from ppp-2.3.9 [synchronous]
 1.17 24-May-1999  tron Fix kernel builds if ppp interface but no bpf filters are configured.
Patch supplied by Takahiro Kambe in PR kern/7639, also fixes PR kern/7632
by Bjoern Labitzke.
 1.16 11-May-1999  thorpej * Start out with a data link type of DLT_NULL. When we change an interface
to serial encap, change its data link type to DLT_PPP_SERIAL.
* Work around some serious bogosity in the filtering code which utterly
breaks proper functioning of BPF. The PPP code and pppd(8) WILL be changed
to fix this.
 1.15 12-Dec-1998  christos branches: 1.15.4;
#include "opt_ppp.h" otherwise struct ppp_softc can be the wrong size
(From mycroft)
 1.14 02-Aug-1998  sommerfe branches: 1.14.4;
Fix PR5898: ppp delays last packet.
 1.13 25-Mar-1997  christos make sure that the tty layer restarts the ppp layer when there is an error
such as out of buffer space.
 1.12 24-Mar-1997  christos Add missing splx(); from Bill Sommerfeld
 1.11 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.10 13-Oct-1996  christos backout previous kprintf change
 1.9 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.8 07-Sep-1996  mycroft Implement poll(2).
 1.7 14-Jun-1996  cgd avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.
 1.6 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.5 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.4 13-Feb-1996  christos Net prototypes
 1.3 05-Oct-1995  mycroft Add some missing statistics. From Thorsten Lockert.
 1.2 04-Jul-1995  paulus Change $Id to $NetBSD
 1.1 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.14.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.15.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.18.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.18.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.18.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.18.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.18.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.23.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.23.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.23.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.23.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.23.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.23.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.23.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.23.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.25.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.25.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.25.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.25.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.25.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.25.2.1 03-Aug-2001  lukem update to -current
 1.26.2.2 13-Oct-2001  fvdl Revert the t_dev -> t_devvp change in struct tty. The way that tty
structs are currently used (especially by console ttys) isn't
ready for it, and this will require quite a few changes.
 1.26.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.30.4.2 15-Jul-2002  gehenna catch up with -current.
 1.30.4.1 16-May-2002  gehenna Add the character device switch.
Replace the direct-access to devsw table with calling devsw APIs.
 1.33.2.6 11-Dec-2005  christos Sync with head.
 1.33.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.33.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.34.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.34.8.1 29-Apr-2005  kent sync with -current
 1.38.8.1 29-Nov-2005  yamt sync with head.
 1.38.2.7 27-Feb-2008  yamt sync with head.
 1.38.2.6 07-Dec-2007  yamt sync with head
 1.38.2.5 15-Nov-2007  yamt sync with head.
 1.38.2.4 03-Sep-2007  yamt sync with head.
 1.38.2.3 26-Feb-2007  yamt sync with head.
 1.38.2.2 30-Dec-2006  yamt sync with head.
 1.38.2.1 21-Jun-2006  yamt sync with head.
 1.41.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.41.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.41.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.41.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.41.8.2 11-Aug-2006  yamt sync with head
 1.41.8.1 24-May-2006  yamt sync with head.
 1.41.6.1 01-Jun-2006  kardel Sync with head.
 1.41.4.1 09-Sep-2006  rpaulo sync with head
 1.43.6.2 10-Dec-2006  yamt sync with head.
 1.43.6.1 22-Oct-2006  yamt sync with head
 1.43.4.2 12-Jan-2007  ad Sync with head.
 1.43.4.1 18-Nov-2006  ad Sync with head.
 1.46.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.47.22.2 08-Dec-2007  mjf Sync with HEAD.
 1.47.22.1 19-Nov-2007  mjf Sync with HEAD.
 1.47.20.1 13-Nov-2007  bouyer Sync with HEAD
 1.47.16.3 23-Mar-2008  matt sync with HEAD
 1.47.16.2 09-Jan-2008  matt sync with HEAD
 1.47.16.1 08-Nov-2007  matt sync with -HEAD
 1.47.14.3 03-Dec-2007  joerg Sync with HEAD.
 1.47.14.2 14-Nov-2007  joerg Sync with HEAD.
 1.47.14.1 11-Nov-2007  joerg Sync with HEAD.
 1.47.2.2 23-Oct-2007  ad Sync with head.
 1.47.2.1 05-Apr-2007  ad Compile fixes.
 1.52.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.52.10.4 11-Aug-2010  yamt sync with head.
 1.52.10.3 11-Mar-2010  yamt sync with head
 1.52.10.2 16-May-2009  yamt sync with head
 1.52.10.1 04-May-2009  yamt sync with head.
 1.52.8.1 04-Jun-2008  yamt sync with head
 1.52.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.53.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.53.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.56.4.1 30-May-2010  rmind sync with head
 1.56.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.57.32.1 10-Aug-2014  tls Rebase.
 1.57.18.2 03-Dec-2017  jdolecek update from HEAD
 1.57.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.58.4.3 05-Oct-2016  skrll Sync with HEAD
 1.58.4.2 09-Jul-2016  skrll Sync with HEAD
 1.58.4.1 22-Sep-2015  skrll Sync with HEAD
 1.61.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.61.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.63.8.1 29-Jan-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1727):

sys/net/ppp_tty.c: revision 1.68
sys/net/ppp_tty.c: revision 1.69

Use unsigned to avoid undefined behavior in pppasyncstart().

Use unsigned to avoid undefined behavior. Found by kUBSan.
 1.63.6.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.64.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.64.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.64.4.1 10-Jun-2019  christos Sync with HEAD
 1.64.2.1 26-Jan-2019  pgoyette Sync with HEAD
 1.65.4.1 29-Jan-2022  martin Pull up following revision(s) (requested by msaitoh in ticket #1411):

sys/net/ppp_tty.c: revision 1.68
sys/net/ppp_tty.c: revision 1.69

Use unsigned to avoid undefined behavior in pppasyncstart().

Use unsigned to avoid undefined behavior. Found by kUBSan.
 1.66.2.1 29-Feb-2020  ad Sync with head.
 1.71.2.1 21-Dec-2022  martin Pull up following revision(s) (requested by chs in ticket #19):

external/gpl3/gcc/dist/libsanitizer/sanitizer_common/sanitizer_interceptors_ioctl_netbsd.inc: revision 1.5
external/gpl3/gcc/dist/libsanitizer/sanitizer_common/sanitizer_platform_limits_netbsd.h: revision 1.8
sys/sys/ttycom.h: revision 1.22
sys/net/ppp_tty.c: revision 1.72
external/gpl3/gcc/dist/libsanitizer/sanitizer_common/sanitizer_platform_limits_netbsd.cc: revision 1.9

ppp: remove ioctls that never worked and crash the kernel

Remove vestigial bits of PPP HDLC support that never worked on netbsd.

The TIOCRCVFRAME ioctl was apparently intended to be called only from
within the kernel, but nothing prevents user code from calling this ioctl
and crashing the kernel.
 1.72.6.1 02-Aug-2025  perseant Sync with HEAD
 1.49 18-Oct-2020  gson Suppress the "rn_init: radix functions require max_keylen be set"
message when _KERNEL is defined, to avoid spurious messages from
kernels that have no routable network domains. Fixes PR kern/55691.
 1.48 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.47 12-Dec-2016  ozaki-r branches: 1.47.14; 1.47.16;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.46 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
 1.45 24-Aug-2015  pooka branches: 1.45.2;
sprinkle _KERNEL_OPT
 1.44 17-Jul-2011  joerg branches: 1.44.12; 1.44.30;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.43 27-May-2009  pooka Make it possible to register delayed radix tree head inits which
will be processed when the radix "subsystem" is initialized -- all
users must be attached before any inits to know the max keylength.
Use of link sets is no longer required, and only attached domains
need to be considered.
 1.42 15-Mar-2009  cegger ansify function definitions
 1.41 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.40 25-Nov-2008  pooka branches: 1.40.4;
Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.39 11-May-2008  dyoung branches: 1.39.4; 1.39.6;
Use memset, memmove, and memcmp instead of Bzero, Bcopy, and Bcmp,
respectively.
 1.38 12-Jul-2007  dyoung branches: 1.38.28; 1.38.30; 1.38.32; 1.38.34;
Cosmetic: KNF. Shorten a staircase.
 1.37 11-Jul-2007  dyoung Cosmetic: KNF.
 1.36 13-Jun-2007  dyoung Remove unnecessary __UNCONST().
 1.35 09-Jun-2007  dyoung Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.34 04-Mar-2007  christos branches: 1.34.2; 1.34.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.33 17-Feb-2007  dyoung branches: 1.33.2;
Clean this code up some.

Extract subroutine rn_delete1() to ease RADIX_MPATH integration,
should we ever do that.

Remove RN_DEBUG code that does not compile.

Join some lines of the type

type var1;
type var2;
type var3;

making

type var1, var2, var3.

Break lines of the type if (expr) stmt1; else stmt2; so that normal
people can read them.
 1.32 04-Dec-2006  dyoung Extract subroutines rn_walkfirst() and rn_walknext() from rn_walktree().
No functional change intended.

Add some new diagnostic code, bracketed by #ifdef RN_DEBUG, that
uses the two new subroutines to walk and print a tree.

XXX The format of the diagnostic print-outs needs improvement.
 1.31 25-Feb-2006  wiz branches: 1.31.14; 1.31.16;
Fix typos, reported by Alexey Dobriyan ("Gathered from Linux"),
forwarded by jmc@openbsd.
 1.30 11-Dec-2005  christos branches: 1.30.2; 1.30.4; 1.30.6;
merge ktrace-lwp.
 1.29 29-May-2005  christos branches: 1.29.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.28 26-Feb-2005  perry nuke trailing whitespace
 1.27 24-Jan-2005  enami branches: 1.27.2;
To fix bad pointer dereference on start up when gif is used,
- Allow rn_init() to be called multiple times, but do nothing except the
first call.
- Include opt_inet.h so that #ifdef INET works.
- Call rn_init() from encap_init() explicitly rather than depending on the
order of initialization.
 1.26 23-Jan-2005  matt Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.25 06-Dec-2004  christos branches: 1.25.4;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.24 17-Aug-2004  itojun initialize max_keylen for ip_encap.c earlier
 1.23 21-Apr-2004  matt ANSI-fy and some additional de-__P and constification.
 1.22 21-Apr-2004  christos fix constification botch. (hi gimpy)
 1.21 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.20 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 25-Nov-2002  thorpej branches: 1.19.6;
Avoid strict-alias warnings.
 1.18 12-Nov-2001  lukem add RCSIDs
 1.17 10-Jan-2001  itojun branches: 1.17.2; 1.17.4;
fix indentation
 1.16 04-Jan-2001  enami Missing newline in log message.
 1.15 17-Dec-2000  itojun fix typo in function name (rn_satsifies_leaf -> satisfies). indent.
split rn_inithead() into two functions - i'm putting some hook around here.
 1.14 30-Mar-2000  augustss Kill some more register declarations.
 1.13 01-Mar-1998  fvdl branches: 1.13.12; 1.13.14;
Merge with Lite2 + local changes
 1.12 02-Apr-1997  christos Sync with Lite2.
 1.11 16-Mar-1996  christos - fix misparenthesized ((a&(B|C) == 0))
- fix printf format arguments
 1.10 13-Feb-1996  christos Net prototypes
 1.9 17-May-1995  mycroft Newer version from CSRG.
 1.8 28-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 14-Mar-1994  glass put declarations in argument order
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 04-Sep-1993  jtc branches: 1.3.2;
include systm.h to get prototypes (and possibly inlines) of *max functions.
 1.2 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.4 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.2 14-Nov-1993  mycroft Munged a directory name in last change.
 1.3.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.13.14.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.13.14.2 05-Jan-2001  bouyer Sync with HEAD
 1.13.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.13.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.13.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.17.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.2.2 11-Dec-2002  thorpej Sync with HEAD.
 1.17.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.19.6.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.6.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.6.6 24-Jan-2005  skrll Sync with HEAD.
 1.19.6.5 18-Dec-2004  skrll Sync with HEAD.
 1.19.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.19.6.2 25-Aug-2004  skrll Sync with HEAD.
 1.19.6.1 03-Aug-2004  skrll Sync with HEAD
 1.25.4.1 29-Apr-2005  kent sync with -current
 1.27.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.29.2.4 03-Sep-2007  yamt sync with head.
 1.29.2.3 26-Feb-2007  yamt sync with head.
 1.29.2.2 30-Dec-2006  yamt sync with head.
 1.29.2.1 21-Jun-2006  yamt sync with head.
 1.30.6.1 22-Apr-2006  simonb Sync with head.
 1.30.4.1 09-Sep-2006  rpaulo sync with head
 1.30.2.1 01-Mar-2006  yamt sync with head.
 1.31.16.1 10-Dec-2006  yamt sync with head.
 1.31.14.1 12-Jan-2007  ad Sync with head.
 1.33.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.33.2.1 17-Feb-2007  rmind file radix.c was added on branch yamt-idlelwp on 2007-03-12 05:59:15 +0000
 1.34.4.1 11-Jul-2007  mjf Sync with head.
 1.34.2.1 15-Jul-2007  ad Sync with head.
 1.38.34.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.38.32.3 20-Jun-2009  yamt sync with head
 1.38.32.2 04-May-2009  yamt sync with head.
 1.38.32.1 16-May-2008  yamt sync with head.
 1.38.30.1 18-May-2008  yamt sync with head.
 1.38.28.2 17-Jan-2009  mjf Sync with HEAD.
 1.38.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.39.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.39.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.39.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.40.4.2 23-Jul-2009  jym Sync with HEAD.
 1.40.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.44.30.3 05-Feb-2017  skrll Sync with HEAD
 1.44.30.2 05-Dec-2016  skrll Sync with HEAD
 1.44.30.1 22-Sep-2015  skrll Sync with HEAD
 1.44.12.1 03-Dec-2017  jdolecek update from HEAD
 1.45.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.47.16.1 10-Jun-2019  christos Sync with HEAD
 1.47.14.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.23 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
 1.22 27-May-2009  pooka branches: 1.22.22; 1.22.40; 1.22.44;
Make it possible to register delayed radix tree head inits which
will be processed when the radix "subsystem" is initialized -- all
users must be attached before any inits to know the max keylength.
Use of link sets is no longer required, and only attached domains
need to be considered.
 1.21 05-Feb-2009  dyoung branches: 1.21.2;
Cosmetic: break a line, change some spaces to tabs, remove an extra
empty line.
 1.20 11-May-2008  dyoung branches: 1.20.6;
Bzero, Bcmp, and Bcopy are not used any more, so delete them.
 1.19 09-Jun-2007  dyoung branches: 1.19.28; 1.19.30; 1.19.32; 1.19.34;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.18 17-Feb-2007  dyoung branches: 1.18.2; 1.18.6; 1.18.8;
Clean this code up some.

Extract subroutine rn_delete1() to ease RADIX_MPATH integration,
should we ever do that.

Remove RN_DEBUG code that does not compile.

Join some lines of the type

type var1;
type var2;
type var3;

making

type var1, var2, var3.

Break lines of the type if (expr) stmt1; else stmt2; so that normal
people can read them.
 1.17 22-Oct-2006  christos don't leak kernel variable declarations to userland.
 1.16 10-Dec-2005  elad branches: 1.16.20; 1.16.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.15 29-May-2005  christos branches: 1.15.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.14 21-Apr-2004  matt ANSI-fy and some additional de-__P and constification.
 1.13 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.12 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 17-Dec-2000  itojun branches: 1.11.24;
fix typo in function name (rn_satsifies_leaf -> satisfies). indent.
split rn_inithead() into two functions - i'm putting some hook around here.
 1.10 06-Nov-2000  itojun avoid namespace pollution by radix.h. the #ifndef _KERNEL portion was to
use radix.c in userland compilation, however, no one is using it.
(routed has its own radix.c)
 1.9 02-Apr-1997  christos branches: 1.9.20; 1.9.22; 1.9.32;
Sync with Lite2.
 1.8 13-Feb-1996  christos Net prototypes
 1.7 17-May-1995  mycroft Newer version from CSRG.
 1.6 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.32.1 20-Feb-2002  he Pull up revision 1.10 (requested by he):
Avoid namespace pollution, only define certain macros under _KERNEL.
 1.9.22.2 05-Jan-2001  bouyer Sync with HEAD
 1.9.22.1 22-Nov-2000  bouyer Sync with HEAD.
 1.9.20.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.9.20.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.9.20.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.11.24.5 11-Dec-2005  christos Sync with head.
 1.11.24.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.24.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.24.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.24.1 03-Aug-2004  skrll Sync with HEAD
 1.15.2.4 03-Sep-2007  yamt sync with head.
 1.15.2.3 26-Feb-2007  yamt sync with head.
 1.15.2.2 30-Dec-2006  yamt sync with head.
 1.15.2.1 21-Jun-2006  yamt sync with head.
 1.16.22.1 10-Dec-2006  yamt sync with head.
 1.16.20.1 18-Nov-2006  ad Sync with head.
 1.18.8.1 11-Jul-2007  mjf Sync with head.
 1.18.6.1 15-Jul-2007  ad Sync with head.
 1.18.2.2 17-Feb-2007  dyoung Clean this code up some.

Extract subroutine rn_delete1() to ease RADIX_MPATH integration,
should we ever do that.

Remove RN_DEBUG code that does not compile.

Join some lines of the type

type var1;
type var2;
type var3;

making

type var1, var2, var3.

Break lines of the type if (expr) stmt1; else stmt2; so that normal
people can read them.
 1.18.2.1 17-Feb-2007  dyoung file radix.h was added on branch yamt-idlelwp on 2007-02-17 07:46:39 +0000
 1.19.34.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.19.32.3 20-Jun-2009  yamt sync with head
 1.19.32.2 04-May-2009  yamt sync with head.
 1.19.32.1 16-May-2008  yamt sync with head.
 1.19.30.1 18-May-2008  yamt sync with head.
 1.19.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.20.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.21.2.1 23-Jul-2009  jym Sync with HEAD.
 1.22.44.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.22.40.1 05-Dec-2016  skrll Sync with HEAD
 1.22.22.1 03-Dec-2017  jdolecek update from HEAD
 1.24 25-Sep-2017  ozaki-r Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
 1.23 27-Jul-2017  ozaki-r Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
 1.22 21-May-2014  rmind branches: 1.22.4; 1.22.20;
raw_detach: rawpcb may be embedded, free using the real size (saved in rcb).
 1.21 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.20 04-Aug-2008  matt branches: 1.20.38; 1.20.44; 1.20.54;
Remove the pcb from the rawcb list before sofree'ing it.
Don't reacquire softnet_lock until after we've freed the pcb.
 1.19 24-Apr-2008  ad branches: 1.19.2; 1.19.4; 1.19.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.18 04-Mar-2007  christos branches: 1.18.36; 1.18.38;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.17 11-Dec-2005  thorpej branches: 1.17.26;
ANSI function decls and application of static.
 1.16 11-Dec-2005  christos merge ktrace-lwp.
 1.15 26-Feb-2005  perry branches: 1.15.4;
nuke trailing whitespace
 1.14 07-Aug-2003  agc branches: 1.14.8; 1.14.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 12-May-2002  matt branches: 1.13.10;
Eliminate more commons.
 1.12 12-Nov-2001  lukem add RCSIDs
 1.11 30-Mar-2000  augustss branches: 1.11.6; 1.11.8;
Kill some more register declarations.
 1.10 23-May-1996  mycroft branches: 1.10.28;
We must indirect through the higher-level protocol for
PRU_{BIND,CONNECT} so that it can check the sockaddr.
 1.9 13-Feb-1996  christos branches: 1.9.4;
Net prototypes
 1.8 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 16-Jan-1994  cgd include <machine/cpu.h> not <machine/mtpr.h>
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd branches: 1.3.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.4.1 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.9.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.10.28.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.11.6.1 14-Nov-2001  nathanw Catch up to -current.
 1.13.10.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.13.10.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.10.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.10.1 03-Aug-2004  skrll Sync with HEAD
 1.14.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.14.8.1 29-Apr-2005  kent sync with -current
 1.15.4.2 03-Sep-2007  yamt sync with head.
 1.15.4.1 21-Jun-2006  yamt sync with head.
 1.17.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.18.38.1 18-May-2008  yamt sync with head.
 1.18.36.2 28-Sep-2008  mjf Sync with HEAD.
 1.18.36.1 02-Jun-2008  mjf Sync with HEAD.
 1.19.8.1 19-Oct-2008  haad Sync with HEAD.
 1.19.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.19.2.1 04-May-2009  yamt sync with head.
 1.20.54.1 10-Aug-2014  tls Rebase.
 1.20.44.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.20.38.2 03-Dec-2017  jdolecek update from HEAD
 1.20.38.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.22.20.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
    ah:
    esp:
    ipip:
    ipcomp:
ipsec6:
    ah:
    esp:
    ipip:
    ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE be the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten by
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
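
The idea can be sketched as a reference that is taken before the
asynchronous request is dispatched and dropped in its callback. All names
below are hypothetical; this is a simplified illustration, not the
opencrypto or key.c API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified security policy with a plain reference count. */
struct policy {
	unsigned refcnt;
	bool destroyed;		/* set when the last reference goes away */
};

void
policy_ref(struct policy *sp)
{
	sp->refcnt++;
}

void
policy_unref(struct policy *sp)
{
	assert(sp->refcnt > 0);
	if (--sp->refcnt == 0)
		sp->destroyed = true;	/* a real kernel would free it here */
}

/* Hypothetical asynchronous request that needs the policy in its callback. */
struct crypto_req {
	struct policy *sp;
	void (*callback)(struct crypto_req *);
};

/* Completion callback: the policy is still valid here; drop our reference. */
void
crypto_done(struct crypto_req *req)
{
	assert(!req->sp->destroyed);
	policy_unref(req->sp);
}

/* Dispatch: take a reference that lives until the callback has run. */
void
crypto_start(struct crypto_req *req, struct policy *sp)
{
	policy_ref(sp);
	req->sp = sp;
	req->callback = crypto_done;
}
```

Even if the owner drops its own reference while the request is pending, the
policy in this sketch survives until the callback has finished with it.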
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
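
The arithmetic can be illustrated with a minimal sketch. The struct and
function names are hypothetical stand-ins, not the definitions in key.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the addtime fields of struct sadb_comb. */
struct comb_lifetime {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

/* Set the hard limit, then derive the soft limit as 80% of it. */
void
comb_set_addtime(struct comb_lifetime *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}

/*
 * The pattern described as meaningless above: scaling a field of a
 * zero-cleared structure by 80% leaves it at zero.
 */
void
comb_scale_zeroed_soft(struct comb_lifetime *c)
{
	assert(c != NULL);
	c->soft_addtime = c->soft_addtime * 80 / 100;
}
```

For a hard limit of 86400 seconds, the first function yields a soft limit of
69120 seconds, while the second leaves a zero-cleared soft limit at zero.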
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

With this change KEY_NEWSP is no longer called from softint,
so we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
  - This allows omitting ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
  - We expect it to be expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
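
A minimal sketch of a bounded deferral queue of this kind. The names, the
cap value and the drop-on-full behaviour are assumptions made for
illustration, not the key.c implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_MAX 4	/* hypothetical cap; the real limit is not stated above */

/* Hypothetical message standing in for an mbuf. */
struct msg {
	int id;
	struct msg *next;
};

struct defer_queue {
	struct msg *head, *tail;
	unsigned len;
};

unsigned nsent;		/* number of messages handed to count_sent() */

/* Example send routine used by the timer path below. */
void
count_sent(struct msg *m)
{
	(void)m;
	nsent++;
}

/*
 * Packet-processing path: queue the message instead of sending it now.
 * Returns false when the queue is full; the caller then drops the message.
 */
bool
defer_enqueue(struct defer_queue *q, struct msg *m)
{
	assert(q != NULL && m != NULL);
	if (q->len >= DEFER_MAX)
		return false;
	m->next = NULL;
	if (q->tail != NULL)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	q->len++;
	return true;
}

/* Timer path: send everything queued so far and return how many were sent. */
unsigned
defer_drain(struct defer_queue *q, void (*send)(struct msg *))
{
	unsigned n = 0;
	struct msg *m;

	while ((m = q->head) != NULL) {
		q->head = m->next;
		m->next = NULL;
		send(m);
		n++;
	}
	q->tail = NULL;
	q->len = 0;
	return n;
}
```

The sender never blocks on the socket lock in the packet path; it only
appends to the queue, and the cap keeps memory use bounded under load.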
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course provides no protection.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
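
The shape of that fix can be sketched as a userland analogy using POSIX
threads. The struct, the flag and the function names are hypothetical, and
grace_period_wait() is only a stub standing in for pserialize_perform:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct sync_state {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool performing;	/* true while a thread is in the grace-period wait */
	unsigned nperformed;	/* number of completed grace-period waits */
};

/* Stub for pserialize_perform(); a real one waits for all readers. */
void
grace_period_wait(struct sync_state *st)
{
	(void)st;
}

/*
 * Called with st->lock held; returns with it held again.
 * The lock is released around the grace-period wait, and the flag plus
 * condvar keep a second caller from starting its own wait concurrently.
 */
void
serialized_perform(struct sync_state *st)
{
	assert(st != NULL);

	while (st->performing)
		pthread_cond_wait(&st->cv, &st->lock);
	st->performing = true;
	pthread_mutex_unlock(&st->lock);

	grace_period_wait(st);		/* runs without the mutex held */

	pthread_mutex_lock(&st->lock);
	st->performing = false;
	st->nperformed++;
	pthread_cond_broadcast(&st->cv);
}
```

A caller would remove the item from the list under the lock, call
serialized_perform(), and only then drain the item's references, still
under the lock.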
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP being broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.22.4.1 28-Aug-2017  skrll Sync with HEAD
 1.30 07-Sep-2018  maxv Make raw_input non-variadic.
 1.29 11-May-2018  roy branches: 1.29.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.28 25-Sep-2017  ozaki-r branches: 1.28.2;
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course provides no protection.

Fix the situation by having a discrete rawcb list for each.
 1.27 11-Apr-2017  roy branches: 1.27.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.
 1.26 20-Jan-2016  riastradh branches: 1.26.2; 1.26.4;
Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
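
The difference between a variadic hook and an explicit callback can be
sketched as follows. The types and function names are simplified,
hypothetical stand-ins, not the real raw_send, route_output or key_output
prototypes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for an mbuf and a raw socket. */
struct packet {
	size_t len;
};

struct rawsock {
	int proto;
};

/* One fixed, fully typed prototype instead of a variadic pr_output. */
typedef int (*raw_output_fn)(struct packet *, struct rawsock *);

/*
 * Send routine that takes its output function as an explicit argument,
 * so the compiler can check every call against the real prototype.
 */
int
raw_send_sketch(struct rawsock *so, struct packet *pkt, raw_output_fn output)
{
	assert(so != NULL && output != NULL);
	if (pkt == NULL || pkt->len == 0)
		return -1;
	return output(pkt, so);
}

/* Example output routine; it just reports the number of bytes "sent". */
int
route_output_sketch(struct packet *pkt, struct rawsock *so)
{
	(void)so;
	return (int)pkt->len;
}
```

With a typed function pointer, passing a routine with the wrong signature
becomes a compile-time diagnostic rather than undefined behaviour at run
time.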
 1.25 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.24 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.23 05-Aug-2014  rtr branches: 1.23.4;
split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.22 21-May-2014  rmind raw_detach: rawpcb may be embedded, free using the real size (saved in rcb).
 1.21 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.20 17-Feb-2007  dyoung branches: 1.20.88; 1.20.94; 1.20.104;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.19 11-Dec-2005  thorpej branches: 1.19.26;
ANSI function decls and application of static.
 1.18 11-Dec-2005  christos merge ktrace-lwp.
 1.17 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.16 07-Aug-2003  agc branches: 1.16.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15 29-Jun-2003  fvdl branches: 1.15.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.14 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.13 12-May-2002  matt Eliminate more commons.
 1.12 09-Feb-1998  perry branches: 1.12.26; 1.12.28;
add multiple inclusion protection (and cleanup).
 1.11 28-May-1996  pk Prototype new raw_*() functions.
 1.10 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.9 13-Feb-1996  christos branches: 1.9.4;
Net prototypes
 1.8 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.12.28.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.12.26.1 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.5 11-Dec-2005  christos Sync with head.
 1.15.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.15.2.2 03-Aug-2004  skrll Sync with HEAD
 1.15.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.16.16.2 26-Feb-2007  yamt sync with head.
 1.16.16.1 21-Jun-2006  yamt sync with head.
 1.19.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.20.104.1 10-Aug-2014  tls Rebase.
 1.20.94.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.20.88.2 03-Dec-2017  jdolecek update from HEAD
 1.20.88.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.23.4.3 28-Aug-2017  skrll Sync with HEAD
 1.23.4.2 19-Mar-2016  skrll Sync with HEAD
 1.23.4.1 06-Jun-2015  skrll Sync with HEAD
 1.26.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.26.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.27.4.2 12-May-2018  martin Pull up following revision(s) (requested by roy in ticket #821):

sys/netinet6/in6_proto.c: revision 1.125
sys/net/raw_cb.h: revision 1.29
sys/kern/uipc_usrreq.c: revision 1.186

Increase the default size of some receive buffers from 8k to 16k.

This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.27.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
    ah:
    esp:
    ipip:
    ipcomp:
ipsec6:
    ah:
    esp:
    ipip:
    ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used with the add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove code for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
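
The copy-and-replace pattern can be sketched as below. The struct and
function names are hypothetical, and the sketch omits the locking and
deferred destruction that a kernel implementation would need:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified SA with one updatable field. */
struct sa {
	uint32_t spi;
	uint64_t hard_lifetime;
};

/*
 * Apply an update without modifying the published SA in place:
 * allocate a new SA, copy the old contents, change the copy, then
 * publish it in *slot.  Returns the old SA (to be released once no
 * reader uses it anymore), or NULL if allocation fails.
 */
struct sa *
sa_update(struct sa **slot, uint64_t new_lifetime)
{
	struct sa *old, *nsa;

	assert(slot != NULL && *slot != NULL);
	old = *slot;
	nsa = malloc(sizeof(*nsa));
	if (nsa == NULL)
		return NULL;		/* *slot is left untouched */
	*nsa = *old;			/* copy variables of the old SA */
	nsa->hard_lifetime = new_lifetime;
	*slot = nsa;			/* replace the old one with the new one */
	return old;
}
```

Readers that already hold the old SA keep seeing a consistent object, and
new lookups through the slot see the fully initialized replacement.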
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is actually
sorted (run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. Also, having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
a new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
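The point about the zero-cleared comb can be shown with a small userland
sketch. The struct and function names below are hypothetical and model only
two lifetime fields; this is not the kernel's struct sadb_comb nor the
actual fix.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two addtime fields of a comb. */
struct comb_sketch {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

/* Old pattern: scales a field that is still zero, so it stays zero. */
void
set_soft_from_self(struct comb_sketch *comb)
{
	comb->soft_addtime = comb->soft_addtime * 80 / 100;
}

/* Pattern described above: derive the soft limit from the hard limit. */
void
set_soft_from_hard(struct comb_sketch *comb, uint64_t hard)
{
	comb->hard_addtime = hard;
	comb->soft_addtime = comb->hard_addtime * 80 / 100;
}
```

On a zero-cleared comb the first function leaves the soft limit at 0, while
the second yields 80% of whatever hard limit is passed in.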
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases the reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
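A bounded queue of this kind might look like the userland sketch below.
The cap value, the type names and the choice to reject (so the caller can
drop) at the limit are all assumptions for illustration, not the kernel's
actual code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEFERRED_MAX_SKETCH 3	/* hypothetical cap on queued packets */

/* Hypothetical stand-in for an mbuf on the deferred list. */
struct pkt_sketch {
	struct pkt_sketch *next;
};

struct deferred_queue_sketch {
	struct pkt_sketch *head;
	struct pkt_sketch *tail;
	size_t len;
};

/* Append p unless the queue is full; returns false when p is rejected. */
bool
deferred_enqueue(struct deferred_queue_sketch *q, struct pkt_sketch *p)
{
	if (q->len >= DEFERRED_MAX_SKETCH)
		return false;	/* caller is expected to drop the packet */
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
	return true;
}
```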
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
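The shape of the fix (drop the mutex around the perform step and use a
condvar-guarded flag to keep that step from running concurrently) can be
modelled in userland with pthreads. This is only a sketch: the struct,
the flag and perform_stub() are hypothetical stand-ins, and the kernel
code uses its own primitives.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland model; all names here are illustrative. */
struct drain_sketch {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool perform_in_progress;	/* true while the perform step runs */
	bool ran_unlocked;		/* set by the stub if mtx was free */
};

/* Stand-in for pserialize_perform(): records whether mtx was free. */
void
perform_stub(struct drain_sketch *d)
{
	if (pthread_mutex_trylock(&d->mtx) == 0) {
		d->ran_unlocked = true;
		pthread_mutex_unlock(&d->mtx);
	}
}

void
remove_and_wait(struct drain_sketch *d)
{
	pthread_mutex_lock(&d->mtx);
	/* An item would be removed from its list here. */

	/* Wait until no other thread is in the perform step. */
	while (d->perform_in_progress)
		pthread_cond_wait(&d->cv, &d->mtx);
	d->perform_in_progress = true;

	/* Drop the mutex so the perform step never runs with it held. */
	pthread_mutex_unlock(&d->mtx);
	perform_stub(d);
	pthread_mutex_lock(&d->mtx);

	d->perform_in_progress = false;
	pthread_cond_broadcast(&d->cv);

	/* The drain step would follow here, with mtx held again. */
	pthread_mutex_unlock(&d->mtx);
}
```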
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP being broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.28.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.28.2.1 21-May-2018  pgoyette Sync with HEAD
 1.29.2.1 10-Jun-2019  christos Sync with HEAD
 1.66 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.65 02-Sep-2022  thorpej branches: 1.65.10;
Remove unnecessary inclusion of <net/netisr.h>.
 1.64 02-Aug-2019  ozaki-r Fix typo (s/m_free/m_freem/) (one more)
 1.63 02-Aug-2019  ozaki-r Fix typo (s/m_free/m_freem/)

This fixes PR kern/54419 "mbuf leak when deleting route" from sc dying.
 1.62 07-Sep-2018  maxv branches: 1.62.4;
Make raw_input non-variadic.
 1.61 09-May-2018  maxv branches: 1.61.2;
Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
 1.60 26-Apr-2018  maxv m_copy -> m_copym
 1.59 19-Mar-2018  roy socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
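The three effects listed above can be modelled with a small userland
sketch. The struct and its fields are hypothetical stand-ins for the
socket state, and the wakeup is reduced to a counter; this is not the
kernel's soroverflow().

```c
#include <errno.h>

/* Hypothetical model of the socket state touched on overflow. */
struct sock_sketch {
	unsigned long overflowed;	/* receive buffer overflow counter */
	int error;			/* pending error, like so_error */
	int wakeups;			/* stand-in for waking the receiver */
};

/* Count the overflow, record ENOBUFS and wake the receiving side. */
void
soroverflow_sketch(struct sock_sketch *so)
{
	so->overflowed++;
	so->error = ENOBUFS;
	so->wakeups++;
}
```

A reader that sees ENOBUFS on its routing socket can then treat it as a
signal that messages were lost and re-read the full state.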
 1.58 25-Sep-2017  ozaki-r branches: 1.58.2;
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
 1.57 25-Sep-2017  ozaki-r Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
 1.56 11-Apr-2017  roy branches: 1.56.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.
 1.55 20-Jan-2016  riastradh branches: 1.55.2; 1.55.4;
Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.
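A minimal sketch of the explicit-callback shape, assuming a simplified
fixed prototype: the callback type, both function names and their
arguments are hypothetical and far simpler than the real raw_send(),
route_output and key_output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type; the real raw_send() arguments differ. */
typedef int (*output_cb_sketch)(const char *msg, size_t len);

/* Example callback standing in for route_output or key_output. */
int
route_output_sketch(const char *msg, size_t len)
{
	(void)msg;
	return (int)len;
}

/* The sender takes the output routine explicitly, with a fixed prototype. */
int
raw_send_sketch(const char *msg, output_cb_sketch output)
{
	assert(output != NULL);
	return (*output)(msg, strlen(msg));
}
```

Because the callback has one known prototype, the compiler can check every
call, which a variadic function pointer does not allow.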

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.54 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.53 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.52 09-Aug-2014  rtr branches: 1.52.4;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.51 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.50 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.49 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.48 31-Jul-2014  rtr fix missed conversion to call to pr_connect() from pr_generic() when
PRU_CONNECT split was done.

- error = (*so->so_proto->pr_usrreqs->pr_generic)(so,
- PRU_CONNECT, NULL, nam, NULL, l);
+ error = (*so->so_proto->pr_usrreqs->pr_connect)(so, nam);

without this change KASSERT() would be triggered if raw send needs to
perform a connect.
 1.47 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.46 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.45 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.44 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.43 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.42 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.41 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.40 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.39 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.38 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.37 17-Jul-2011  joerg branches: 1.37.12; 1.37.16; 1.37.26;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.36 11-Jan-2011  pooka Apply patch from PR kern/44369 by Wolfgang Stukenbrock.
 1.35 29-May-2008  dyoung branches: 1.35.8; 1.35.14; 1.35.20; 1.35.22;
Delete local variable 'sockets', whose value is never used. Reported
by J.T. Conklin.
 1.34 24-Apr-2008  ad branches: 1.34.2; 1.34.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.33 06-May-2007  dyoung branches: 1.33.28; 1.33.30;
Cosmetic: make the macro 'equal' into an inline subroutine, bcmp
-> memcmp, bcopy -> memcpy, 0 -> NULL, shorten staircases, remove
needless cast to int.
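The macro-to-inline conversion can be sketched as follows. The macro shown
is a hypothetical approximation of the old style, not the original
definition; the inline version gives the arguments real types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro form, similar in spirit to the old style. */
#define equal_macro(a1, a2, len)	(memcmp((a1), (a2), (len)) == 0)

/* Inline form: typed arguments, each evaluated exactly once. */
static inline bool
equal_inline(const void *a1, const void *a2, size_t len)
{
	return memcmp(a1, a2, len) == 0;
}
```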
 1.32 04-Mar-2007  christos branches: 1.32.2; 1.32.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.31 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.30 16-Nov-2006  christos branches: 1.30.4;
__unused removal on arguments; approved by core.
 1.29 25-Oct-2006  elad Introduce KAUTH_REQ_NETWORK_SOCKET_OPEN, to check if opening a socket is
allowed. It takes three int * arguments indicating domain, type, and
protocol. Replace previous KAUTH_REQ_NETWORK_SOCKET_RAWSOCK with it (but
keep it still).

Places that used to explicitly check for privileged context now don't
need it anymore, so I replaced these with XXX comment indicating it for
future reference.

Documented and updated examples as well.
 1.28 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.27 23-Jul-2006  ad branches: 1.27.4; 1.27.6;
Use the LWP cached credentials where sane.
 1.26 14-May-2006  elad integrate kauth.
 1.25 11-Dec-2005  thorpej branches: 1.25.4; 1.25.6; 1.25.8; 1.25.10; 1.25.12;
ANSI function decls and application of static.
 1.24 11-Dec-2005  christos merge ktrace-lwp.
 1.23 26-Feb-2005  perry branches: 1.23.4;
nuke trailing whitespace
 1.22 26-Apr-2004  matt branches: 1.22.4; 1.22.6;
Remove #else of #if __STDC__
 1.21 30-Sep-2003  christos Fix off-by-one in PRC_NCMDS check. From FreeBSD via OpenBSD
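A range check of this kind goes wrong by one when it uses > instead of >=
against a count. The sketch below uses a hypothetical constant and function
names; the exact condition in the original source is an assumption.

```c
#define PRC_NCMDS_SKETCH 4	/* hypothetical; valid commands are 0..3 */

/* Off by one: accepts cmd == PRC_NCMDS_SKETCH, which is out of range. */
int
cmd_in_range_buggy(int cmd)
{
	return !(cmd < 0 || cmd > PRC_NCMDS_SKETCH);
}

/* Corrected: rejects everything outside 0..PRC_NCMDS_SKETCH-1. */
int
cmd_in_range_fixed(int cmd)
{
	return !(cmd < 0 || cmd >= PRC_NCMDS_SKETCH);
}
```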
 1.20 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 29-Jun-2003  fvdl branches: 1.19.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.18 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.17 12-Nov-2001  lukem add RCSIDs
 1.16 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.15 30-Mar-2000  augustss branches: 1.15.6; 1.15.8; 1.15.12;
Kill some more register declarations.
 1.14 28-May-1996  pk branches: 1.14.28;
Remove unused variable.
 1.13 23-May-1996  mycroft Fix race condition in PRU_DISCONNECT.
Unimplement PRU_ABORT, as it's not needed and wasn't correct.
Some stylistic cleanup.
Make sure the control mbufs are freed in all cases.
We must indirect through the higher-level protocol for
PRU_{BIND,CONNECT} so that it can check the sockaddr.
 1.12 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.11 13-Feb-1996  christos branches: 1.11.4;
Net prototypes
 1.10 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.9 22-Apr-1995  cgd quiet compiler warning via (ugly) cast
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 16-Jan-1994  cgd include <machine/cpu.h> not <machine/mtpr.h>
 1.5 06-Jan-1994  deraadt don't need to #include <sys/socket.h> twice.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd branches: 1.3.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.4.1 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.11.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.14.28.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.12.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.15.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.6.1 14-Nov-2001  nathanw Catch up to -current.
 1.19.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.2 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.22.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.22.4.1 29-Apr-2005  kent sync with -current
 1.23.4.4 03-Sep-2007  yamt sync with head.
 1.23.4.3 26-Feb-2007  yamt sync with head.
 1.23.4.2 30-Dec-2006  yamt sync with head.
 1.23.4.1 21-Jun-2006  yamt sync with head.
 1.25.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.25.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.25.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.25.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.25.8.2 11-Aug-2006  yamt sync with head
 1.25.8.1 24-May-2006  yamt sync with head.
 1.25.6.1 01-Jun-2006  kardel Sync with head.
 1.25.4.1 09-Sep-2006  rpaulo sync with head
 1.27.6.2 10-Dec-2006  yamt sync with head.
 1.27.6.1 22-Oct-2006  yamt sync with head
 1.27.4.1 18-Nov-2006  ad Sync with head.
 1.30.4.3 07-May-2007  yamt sync with head.
 1.30.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.30.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.32.4.1 11-Jul-2007  mjf Sync with head.
 1.32.2.1 08-Jun-2007  ad Sync with head.
 1.33.30.2 04-Jun-2008  yamt sync with head
 1.33.30.1 18-May-2008  yamt sync with head.
 1.33.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.34.2.1 04-May-2009  yamt sync with head.
 1.35.22.1 16-Jan-2011  bouyer Pull up following revision(s) (requested by pooka in ticket #1529):
sys/net/raw_usrreq.c: revision 1.36
Apply patch from PR kern/44369 by Wolfgang Stukenbrock.
 1.35.20.1 05-Mar-2011  rmind sync with head
 1.35.14.1 16-Jan-2011  bouyer Pull up following revision(s) (requested by pooka in ticket #1529):
sys/net/raw_usrreq.c: revision 1.36
Apply patch from PR kern/44369 by Wolfgang Stukenbrock.
 1.35.8.1 16-Jan-2011  bouyer Pull up following revision(s) (requested by pooka in ticket #1529):
sys/net/raw_usrreq.c: revision 1.36
Apply patch from PR kern/44369 by Wolfgang Stukenbrock.
 1.37.26.1 10-Aug-2014  tls Rebase.
 1.37.16.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.37.12.2 03-Dec-2017  jdolecek update from HEAD
 1.37.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.52.4.3 28-Aug-2017  skrll Sync with HEAD
 1.52.4.2 19-Mar-2016  skrll Sync with HEAD
 1.52.4.1 06-Jun-2015  skrll Sync with HEAD
 1.55.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.55.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.56.4.3 04-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1318):

sys/net/raw_usrreq.c: revision 1.63
sys/net/raw_usrreq.c: revision 1.64

Fix typo (s/m_free/m_freem/)
This fixes PR kern/54419 "mbuf leak when deleting route" from sc dying.

-

Fix typo (s/m_free/m_freem/) (one more)
 1.56.4.2 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.56.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
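
The copy-and-replace pattern described above can be sketched in userland C
as follows. This is only an illustration under stated assumptions: struct
sa_entry and sa_replace are hypothetical names, not the ones used in key.c,
and the real code must also wait for readers before freeing the old entry.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an SA entry; not the kernel's struct secasvar. */
struct sa_entry {
	unsigned spi;
	unsigned state;
};

/*
 * Build an updated copy of *slot and publish it in place of the old entry.
 * Returns the old entry so the caller can release it once no reader still
 * uses it, or NULL if allocation fails (the slot is left untouched).
 */
static struct sa_entry *
sa_replace(struct sa_entry **slot, unsigned new_state)
{
	struct sa_entry *old = *slot;
	struct sa_entry *fresh = malloc(sizeof(*fresh));

	if (fresh == NULL)
		return NULL;
	memcpy(fresh, old, sizeof(*fresh));	/* copy variables of the old entry */
	fresh->state = new_state;		/* apply the update to the copy only */
	*slot = fresh;				/* replace the old one with the new one */
	return old;
}
```

The old entry is never modified, so a reader that still holds a pointer to
it sees a consistent object until the caller releases it.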
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten by
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
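
A minimal sketch of why the old expression had no effect, assuming a
zero-cleared structure as described above. struct comb_lifetime and both
helper names are hypothetical; only the 80% ratio comes from this log.

```c
#include <stdint.h>

/* Hypothetical stand-in for the add-time lifetime fields of a comb. */
struct comb_lifetime {
	uint64_t soft_addtime;
	uint64_t hard_addtime;
};

/* Old pattern: scales the soft limit by itself, so a zeroed comb stays 0. */
static void
set_lifetime_old(struct comb_lifetime *c, uint64_t hard)
{
	c->hard_addtime = hard;
	c->soft_addtime = c->soft_addtime * 80 / 100;
}

/* Fixed pattern: derive the soft limit as 80% of the hard limit. */
static void
set_lifetime_fixed(struct comb_lifetime *c, uint64_t hard)
{
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

With a hard limit of 86400 seconds, the old helper leaves the soft limit
at 0 while the fixed one yields 69120.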
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may call localcount_drain on the SP while
holding key_so_mtx, say in key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy for hundreds of mbufs to be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
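
The shape of the fix can be sketched in userland C with pthreads. This is
an analogy under stated assumptions, not the kernel code: slow_barrier
stands in for pserialize_perform, and in_progress, completed and
remove_item are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool in_progress = false;	/* true while someone runs the barrier */
static int completed = 0;		/* number of finished removals */

/* Stand-in for pserialize_perform; must not be called with mtx held. */
static void
slow_barrier(void)
{
}

static void
remove_item(void)
{
	pthread_mutex_lock(&mtx);
	/* The item would be unlinked from the list here, under the mutex. */

	/* Wait until no other thread is inside the barrier. */
	while (in_progress)
		pthread_cond_wait(&cv, &mtx);
	in_progress = true;
	pthread_mutex_unlock(&mtx);	/* release the mutex before the barrier */

	slow_barrier();

	pthread_mutex_lock(&mtx);
	in_progress = false;
	completed++;
	pthread_cond_broadcast(&cv);	/* let the next waiter proceed */
	pthread_mutex_unlock(&mtx);
}
```

The flag and condvar keep barrier calls from overlapping even though the
mutex is released, so a callback that needs the mutex can still make
progress while the barrier runs.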
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.58.2.4 30-Sep-2018  pgoyette Sync with HEAD
 1.58.2.3 21-May-2018  pgoyette Sync with HEAD
 1.58.2.2 02-May-2018  pgoyette Synch with HEAD
 1.58.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.61.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.61.2.1 10-Jun-2019  christos Sync with HEAD
 1.62.4.1 04-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #7):

sys/net/raw_usrreq.c: revision 1.63
sys/net/raw_usrreq.c: revision 1.64

Fix typo (s/m_free/m_freem/)
This fixes PR kern/54419 "mbuf leak when deleting route" from sc dying.

-

Fix typo (s/m_free/m_freem/) (one more)
 1.65.10.1 02-Aug-2025  perseant Sync with HEAD
 1.240 21-Sep-2025  christos Centralize all the "can't handle af%d\n", messages in one place and provide
more context. Now I get ad-nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.239 12-Jun-2025  ozaki-r route: lower the priority of the workqueues

PRI_SOFTNET makes the kthread of a workqueue SCHED_RR which can monopolize
a CPU if there are many rtentries to free in rt_free_work. So lower the
priority of the workqueues to PRI_USER which is the scheduling class for
time-sharing.

Also change rt_timer_wq as well just in case.
 1.238 12-Jun-2025  ozaki-r route: do ifa_rtrequest() before rt_addaddr()

ifa_rtrequest() could change a given rtentry in the routing table.
 1.237 05-Jun-2023  ozaki-r branches: 1.237.6;
route: run workqueue kthreads with KERNEL_LOCK unless NET_MPSAFE

Without KERNEL_LOCK, rt_timer_work and rt_free_work can run in parallel
with other LWPs running in the network stack, which eventually results
in say use-after-free of a deleted route.
 1.236 22-Dec-2022  riastradh route(4): Work around deadlock in rt_free wait path.

PR kern/56844

XXX pullup-8
XXX pullup-9
XXX pullup-10
 1.235 25-Nov-2022  knakahara branches: 1.235.2;
Support explicit unnumbered interface.

Currently, NetBSD supports implicit unnumbered interface by setting
the same IP address to two interfaces. However, such interface is not
treated as unnumbered when one of the interfaces is being changed and
has been changed IP address. That behavior can be harmful for some
routing daemons.
 1.234 20-Sep-2022  knakahara Remove routes on an address removal if the routes reference the address. Implemented by ozaki-r@n.o.

A route whose gateway is on a connected route can be invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destructed say by in_purgeaddr.

If the same address is assigned again in such a state, there can be two
different ifaddr objects with the same address. Until recently it's not a
big problem because we can send packets anyway. However after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.
 1.233 29-Aug-2022  knakahara Fix build failure when no options INET6.
 1.232 29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.
 1.231 26-Aug-2022  knakahara Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was committed.
 1.230 05-Dec-2021  msaitoh s/gurantee/guarantee/ in comment.
 1.229 08-Apr-2020  knakahara Fix typo in comment
 1.228 01-Apr-2020  knakahara Fix typo in comment.
 1.227 01-Feb-2020  riastradh Switch sys/net to percpu_create.
 1.226 13-Nov-2019  ozaki-r branches: 1.226.2;
Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.225 03-Oct-2019  knakahara Revert route.c:r1.224 to fix net/arp/t_arp and net/ndp/t_ndp failure.

And refactor a little. Discussed with ozaki-r@n.o.
 1.224 30-Sep-2019  knakahara Fix a ifa_release() leak for a specific struct rt_addrinfo.

ok by ozaki-r@n.o
 1.223 25-Sep-2019  ozaki-r Make panic messages more informative
 1.222 23-Sep-2019  rin Stop passing a large const structure by value, in order to avoid
possible kernel stack overflow; a const pointer suffices here.

Pointed out by the lgtm bot and kamil.

OK ozaki-r

XXX
pullup to netbsd-9
 1.221 19-Sep-2019  ozaki-r Add missing #include <sys/kmem.h>
 1.220 19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.219 17-May-2019  ozaki-r branches: 1.219.2;
Implement an aggressive psref leak detector

It is yet another psref leak detector that makes it possible to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.

Investigating psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.

The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.

The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.

Proposed on tech-kern
 1.218 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.217 11-Mar-2019  ozaki-r Add missing ifa_release on error paths
 1.216 30-Oct-2018  ozaki-r Use rt_update framework on updating a rtentry
 1.215 30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from the requested one. That is wasteful and, even worse, introduces a race
condition. rtrequest1 should just use the passed ifa as is if the caller wants that.
 1.214 30-Oct-2018  ozaki-r Avoid a dangling pointer during rt_replace_ifa
 1.213 05-Sep-2018  ozaki-r route: avoid overwriting rt_free_global.enqueued unexpectedly

rt_free_global.enqueued can be set to true by rt_free during rt_free_work
because rt_free_work releases rt_free_global.lock. So rt_free_work must update
it once and not update after releasing the lock.
 1.212 05-Sep-2018  ozaki-r route: don't take an extra reference of a rtentry for the delayed free mechanism

Because a reference is already taken at that point.
 1.211 12-Jul-2018  ozaki-r Don't use aprint_* functions for logging unrelated to autoconf(9)
 1.210 01-Jun-2018  ozaki-r branches: 1.210.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that keeps the
rtentry from being accidentally released. So rt_timer handlers must release the
reference to the passed rtentry themselves (but they didn't).
 1.209 12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of an update of a rtentry.
 1.208 05-Apr-2018  ozaki-r Kill remaining rt->rt_refcnt++
 1.207 23-Mar-2018  ozaki-r Don't take RT_LOCK in DDB

It definitely causes a diagnostic failure if LOCKDEBUG is enabled.
 1.206 30-Jan-2018  ozaki-r branches: 1.206.2;
Prevent rt_free_global.wk from being enqueued to workqueue doubly
 1.205 23-Jan-2018  ozaki-r Fix a return value of rt_update_prepare

Callers expect it to be an errno.
 1.204 19-Jan-2018  ozaki-r Suppress noisy debugging outputs

Even if DEBUG they are too noisy under load.
 1.203 09-Jan-2018  christos Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8
 1.202 05-Jan-2018  christos Don't stomp past the end of the array! need __arraycount not sizeof()
Found by chuq, while debugging the sdf.org crashes
XXX: pullup-8
Restructure a bit for readability.
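
The bug class named above can be illustrated with a small userspace sketch.
The array sketch_deferred and the macro SKETCH_ARRAYCOUNT (a stand-in for the
kernel's __arraycount) are hypothetical and are not taken from route.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's __arraycount(); the name is hypothetical. */
#define SKETCH_ARRAYCOUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table of 10 pointers. */
static void *sketch_deferred[10];

/* Wrong bound: sizeof() yields the size in bytes, not the element count. */
static size_t
sketch_bound_in_bytes(void)
{
	return sizeof(sketch_deferred);
}

/* Right bound: the number of elements. */
static size_t
sketch_bound_in_elements(void)
{
	return SKETCH_ARRAYCOUNT(sketch_deferred);
}

/* Access a slot, checking the index against the element count. */
static void **
sketch_slot(size_t i)
{
	assert(i < SKETCH_ARRAYCOUNT(sketch_deferred));
	return &sketch_deferred[i];
}
```

Using the byte size as a loop bound would walk several times past the last
element on any platform where a pointer is wider than one byte.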
 1.201 25-Sep-2017  ozaki-r Synchronize on rtcache_generation with rtlock

It's racy if NET_MPSAFE is enabled.

Pointed out by joerg@
 1.200 22-Sep-2017  ozaki-r Remove the global lock for rtcache

Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guaranteed to be not accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
 1.199 21-Sep-2017  ozaki-r Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid; otherwise it is invalidated.
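
A minimal userspace sketch of this scheme follows. It is not the kernel code:
every identifier (sketch_generation, struct sketch_route, and so on) is
hypothetical, and the sketch ignores the locking that the real counter needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global counter, bumped on every route add or delete. */
static uint64_t sketch_generation;

struct sketch_rtentry { int id; };

struct sketch_route {
	struct sketch_rtentry *rt;	/* cached entry, or NULL */
	uint64_t gen;			/* counter value when rt was cached */
};

static void
sketch_table_changed(void)
{
	sketch_generation++;
}

static void
sketch_cache_store(struct sketch_route *ro, struct sketch_rtentry *rt)
{
	assert(ro != NULL);
	ro->rt = rt;
	ro->gen = sketch_generation;
}

/* Return the cached entry only if no route changed since it was stored. */
static struct sketch_rtentry *
sketch_cache_lookup(struct sketch_route *ro)
{
	assert(ro != NULL);
	if (ro->rt != NULL && ro->gen != sketch_generation)
		ro->rt = NULL;	/* stale snapshot: invalidate */
	return ro->rt;
}
```

No list of caches has to be walked on a table change; each cache notices the
change lazily at its next lookup.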

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
 1.198 21-Sep-2017  ozaki-r Remove unnecessary NULL check of rt_ifp

It's always non-NULL.
 1.197 28-Jun-2017  ozaki-r Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes

They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
 1.196 22-Jun-2017  ozaki-r Purge all related L2 caches on removing a route

The change addresses situations similar to PR 51179.
 1.195 22-Jun-2017  ozaki-r Fix locking in rtalloc1 (affected only if NET_MPSAFE)
 1.194 24-Mar-2017  ozaki-r branches: 1.194.6;
Forbid installing a route whose gateway is unreachable

This change needs a tweak in route_output_change to unbreak route
change commands (e.g., route change -inet6 default -reject).

PR kern/52077 (s-yamaguchi@IIJ and ozaki-r@)
 1.193 22-Mar-2017  ozaki-r Tweak and KNF some functions
 1.192 20-Feb-2017  ozaki-r Make updating a rtentry in rtinit MP-safe
 1.191 17-Feb-2017  ozaki-r Make NOMPSAFE comments informative
 1.190 10-Feb-2017  ozaki-r Ensure that nobody references a rtentry that is passed to rt_setgate
 1.189 10-Feb-2017  ozaki-r Fix locking against myself in ifa_ifwithroute_psref

It happened on the path: rtrequest1 => rt_getifa => ifa_ifwithroute_psref.

Reported by ryo@
 1.188 19-Jan-2017  ozaki-r Disable rt_update mechanism by default

This is a workaround for PR kern/51877. Enable again once the issue
is fixed.
 1.187 17-Jan-2017  ozaki-r Fix typo in comments
 1.186 11-Jan-2017  ozaki-r branches: 1.186.2;
Get rid of unnecessary header inclusions
 1.185 21-Dec-2016  ozaki-r Don't call psref_target_destroy unless NET_MPSAFE

We don't need it if NET_MPSAFE is off, and it also sometimes causes a lockup
because it is called while holding softnet_lock.
 1.184 21-Dec-2016  ozaki-r Fix kernel build with RT_DEBUG and !NET_MPSAFE
 1.183 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.182 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
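
The search-then-delete pattern can be sketched in userspace on a plain linked
list. This is not the radix tree API: sketch_search_matched only mirrors the
idea of rn_search_matched, and all names and signatures here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	int key;
	struct sketch_node *next;
};

typedef int (*sketch_match_fn)(const struct sketch_node *, void *);

/* Return the first node accepted by the callback; never modifies the list. */
static struct sketch_node *
sketch_search_matched(struct sketch_node *head, sketch_match_fn match,
    void *arg)
{
	for (struct sketch_node *n = head; n != NULL; n = n->next)
		if (match(n, arg))
			return n;
	return NULL;
}

/* Unlink one node; called by the caller, outside any traversal. */
static void
sketch_remove(struct sketch_node **headp, struct sketch_node *victim)
{
	for (struct sketch_node **pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return;
		}
	}
}

/* Delete every matching node: search, delete, then search again. */
static int
sketch_delete_matched(struct sketch_node **headp, sketch_match_fn match,
    void *arg)
{
	struct sketch_node *n;
	int deleted = 0;

	assert(headp != NULL);
	while ((n = sketch_search_matched(*headp, match, arg)) != NULL) {
		sketch_remove(headp, n);
		deleted++;
	}
	return deleted;
}

/* Example callback: match nodes with an even key. */
static int
sketch_is_even(const struct sketch_node *n, void *arg)
{
	(void)arg;
	return n->key % 2 == 0;
}
```

The callback only selects an entry; the structure is never modified while a
traversal is in progress.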
 1.181 25-Oct-2016  ozaki-r Remove unnecessary argument

No functional change.
 1.180 24-Oct-2016  ozaki-r Revert v1.157

We need to hold the rtentry over rtrequest1 for info that dereferences
member variables of the rtentry after rtrequest1.
 1.179 21-Oct-2016  ozaki-r Delete rt_timers on RTM_DELETE surely

We want to ensure that a rtentry is referenced by nobody after
RTM_DELETE (except for the caller). However, rt_timer could
have a reference to the rtentry after that.
 1.178 21-Oct-2016  ozaki-r Remove unnecessary argument

No functional change.
 1.177 21-Oct-2016  ozaki-r Make some rt_timer functions and variables static

No functional change.
 1.176 21-Oct-2016  ozaki-r Avoid temporal dangling reference
 1.175 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.174 05-Aug-2016  ozaki-r CID 1364759: fix using uninitialized value
 1.173 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.172 15-Jul-2016  martin Mark the rt_timer callout MPSAFE and move the first reset a few lines
down so that the workqueue is properly prepared (the latter being more
of a cosmetic change). Ok: ozaki-r@
 1.171 13-Jul-2016  hannken branches: 1.171.2;
rtcache_clear_rtentry: use LIST_FOREACH_SAFE as the element gets
removed from the list.
 1.170 11-Jul-2016  ozaki-r Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).

Calling workqueue_enqueue on a work twice (i.e., calling workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
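
The flag logic can be sketched as follows. This is a single-threaded userspace
illustration with hypothetical names; a counter stands in for the call to
workqueue_enqueue, and the synchronization that real kernel code needs around
the flag is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-protocol state; not the kernel's names. */
struct sketch_timer {
	bool work_enqueued;	/* true between enqueue and work completion */
	int enqueue_count;	/* how many times the work was really queued */
	int run_count;		/* how many times the work body ran */
};

/* Callout side: fires periodically, regardless of work completion. */
static void
sketch_callout(struct sketch_timer *t)
{
	assert(t != NULL);
	if (t->work_enqueued)
		return;		/* previous work still pending: skip */
	t->work_enqueued = true;
	t->enqueue_count++;	/* stands in for workqueue_enqueue() */
}

/* Workqueue side: runs the timer body, then allows the next enqueue. */
static void
sketch_work(struct sketch_timer *t)
{
	assert(t != NULL);
	t->run_count++;
	t->work_enqueued = false;
}
```

If the callout fires twice before the work runs, only the first firing
enqueues; the second is dropped.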

Proposed on tech-net and tech-kern.
 1.169 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.168 01-Jul-2016  ozaki-r Make sure to free all interface addresses in if_detach

Addresses of an interface (struct ifaddr) have a (reverse) pointer of an
interface object (ifa->ifa_ifp). If the addresses are surely freed when
their interface is destroyed, the pointer is always valid and we don't
need a tweak of replacing the pointer to if_index like mbuf.

To make sure the assumption holds, the following changes are required:
- Deactivate the interface near the beginning of if_detach. This prevents
  in6_unlink_ifa from saving multicast addresses (wrongly)
- Invalidate rtcache(s) and clear a rtentry referencing an address on
  RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt of DAD timers to ensure stopping
  such timers in if_detach
 1.167 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.166 26-Apr-2016  ozaki-r Stop using rt_gwroute completely
 1.165 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes so that it appeared to
work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to take care of is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.164 25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.163 25-Apr-2016  ozaki-r Don't rt_setkey twice
 1.162 13-Apr-2016  ozaki-r ddb: rename show arptab to show routes

show arptab command of ddb is now inappropriate because it actually dumps
routes but arp entries aren't routes anymore. So rename it to show routes
and move the code from if_arp.c to route.c.

ok christos@
 1.161 11-Apr-2016  ozaki-r Remove out-dated comments and unnecessary splsoftnet for pool_{get,put}
 1.160 07-Apr-2016  christos remove useless cast.
 1.159 07-Apr-2016  christos Don't create an RTM_MISS message for every route allocation.
GC unused code and variables.
 1.158 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.157 01-Apr-2016  ozaki-r Don't request returning rtentry if not use it
 1.156 01-Apr-2016  ozaki-r Remove unnecessary RTTIMER_CALLOUT macro

rttimer#rtt_func is never NULL.
 1.155 01-Apr-2016  ozaki-r Make some global variables static
 1.154 24-Mar-2016  ozaki-r Constify rt_newmsg's arguments
 1.153 22-Dec-2015  ozaki-r Tweak return value handling

rtrequest1 guarantees that an rtentry is returned on success.
 1.152 07-Oct-2015  roy Remove rt_ifa_localrequest().
In its place, use rtrequest1() inside rt_ifa_addlocal() and
rtdeletemsg() inside rt_ifa_remlocal().

This removes the need for INET/INET6 specific code and allows
greater control over the creation of the local address route.
 1.151 03-Sep-2015  ozaki-r Add refcnt constraint checks for debugging

It's useful to know where the constraint is violated (by extra rtfree).
It's enabled only if DEBUG because it's heavy (O(n)).
 1.150 31-Aug-2015  ozaki-r Make rt_refcnt take into account rt_timer
 1.149 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.148 13-Aug-2015  ozaki-r Remove extra rt_refcnt++ in rtalloc1

rtrequest has already done it. So we don't need to do it once more.

This fixes a regression in ARP cache expiration in which an expired
cache entry doesn't disappear.
 1.147 13-Aug-2015  ozaki-r Move rtfree to a common place

This change also plugs a missing rtfree on an error path.
 1.146 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)
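
The conventions listed above can be sketched in userspace as follows. The
names are hypothetical and the sketch only models the counting discipline; it
does not reproduce the conditions under which the real rtfree releases an
entry.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_rtentry {
	int refcnt;
	int freed;	/* set when the last reference is dropped */
};

/* Functions that return an entry take a reference on the caller's behalf. */
static struct sketch_rtentry *
sketch_lookup(struct sketch_rtentry *rt)
{
	if (rt != NULL)
		rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; callers never touch refcnt directly. */
static void
sketch_rtfree(struct sketch_rtentry *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;	/* real code would release the entry here */
}
```

A caller pairs every sketch_lookup with exactly one sketch_rtfree and does not
touch the entry after the latter.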

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
 1.145 08-Jun-2015  roy Guard against the possibility that there is no ready address.
 1.144 30-Apr-2015  ozaki-r Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey
 1.143 28-Apr-2015  ozaki-r Fix previous

sockaddr_copy never fails here, so we can just return success.

The previous code worked anyway, so I didn't notice the mistake...
 1.142 07-Apr-2015  ozaki-r Clean up rtcache_setdst

We can assume sockaddr_copy never returns NULL when we pass
non-NULL dst (1st argument).
 1.141 06-Apr-2015  ozaki-r Make rt_maskedcopy static
 1.140 06-Apr-2015  ozaki-r Include sys/sysctl.h ifdef RTFLUSH_DEBUG
 1.139 06-Apr-2015  ozaki-r Remove unnecessary inclusions
 1.138 03-Apr-2015  ozaki-r Restructure rtcache_lookup2 to make it clear what it does

No functional change.
 1.137 26-Mar-2015  ozaki-r Remove redundant rtcache_invariants

It's done in rtcache_getdst.
 1.136 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.135 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.134 02-Dec-2014  christos zero out the sockaddrs when dup'ing.
 1.133 09-Sep-2014  rmind branches: 1.133.2;
Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.132 06-Jun-2014  rmind branches: 1.132.2;
rtfree: let's assert for a non-negative reference count and see what happens.
 1.131 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.130 26-Apr-2014  pooka It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.
 1.129 22-Mar-2014  maxv branches: 1.129.2;
'newrt' is not supposed to be NULL. Therefore, the NULL-check in the if()
is pointless; and even if 'newrt' were NULL, 'rt' would be dereferenced
later. This is not a bug.

CID 270855

ok christos@
 1.128 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.127 08-Jun-2013  christos branches: 1.127.2;
PR/44032: Proxy entries stopped working with pppd. The issue here is that
the route entry was added, but the RTF_LLINFO bit was not set, so arp -a did
not show the entry, while netstat -rn -f inet showed it with the missing
L bit. The order of resolution in ifa_ifwithroute() is that if a destination
address is found, then the interface chosen for the route is that of the
destination. This does not work for link-level addresses since the ppp
interface does not arp (uses link_rtrequest, not arp_rtrequest), so the
bit is never set. The easy solution here is to check that the gateway is
a link address, and use the interface which we chose for the link address
as opposed to the interface that routes to the destination. This restores
the previous behavior, but is it correct?
 1.126 30-Jan-2012  christos branches: 1.126.2; 1.126.6;
Count length from the beginning of the structure not the sa_data portion.
From skrll@
 1.125 31-Mar-2011  dyoung branches: 1.125.4; 1.125.8;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.124 01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.123 26-Jun-2010  kefren branches: 1.123.2; 1.123.4;
Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.122 02-May-2010  kefren Permit the existence of a route with unlinked ifp and ifa,
enabling this way the possibility to send a packet on an interface with
source address from another interface.
 1.121 03-Nov-2009  dyoung branches: 1.121.2; 1.121.4;
s/u_quad_t/uint64_t/
 1.120 03-Oct-2009  elad We only care about KAUTH_NETWORK_ROUTE.
 1.119 02-Oct-2009  elad Move routing socket security policy back to the subsystem.
 1.118 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.117 02-Apr-2009  christos Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
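
For illustration, the classic BSD round-up idiom that such a macro is built on
can be sketched as below. The names SKETCH_ROUNDUP and sketch_roundup are
hypothetical, and the alignment unit of sizeof(long) is an assumption; the
header's actual definition may differ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a length up to a multiple of n (n must be a power of two).
 * A zero length still occupies one unit.
 */
#define SKETCH_ROUNDUP2(a, n) \
	((a) > 0 ? (1 + (((a) - 1) | ((n) - 1))) : (n))
#define SKETCH_ROUNDUP(a) SKETCH_ROUNDUP2((a), sizeof(long))

/* Function wrapper so the rounding can be checked with plain calls. */
static size_t
sketch_roundup(size_t len)
{
	size_t r = SKETCH_ROUNDUP(len);

	assert(r % sizeof(long) == 0);
	return r;
}
```

A private copy of such a macro with a different alignment unit computes
different offsets between consecutive sockaddrs in a routing message, which is
the kind of mismatch that centralizing the definition avoids.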

XXX: All this should be pulled up to 5.0
 1.116 24-Mar-2009  roy When a route is deleted or its ifa is changed and it's the connected route
for the ifa we should ensure the IFA_ROUTE flag is removed from the ifa
and if applicable, added to the new ifa.
 1.115 20-Feb-2009  yamt - rtredirect: use sockaddr_cmp directly.
- remove now unused equal.
 1.114 07-Nov-2008  dyoung branches: 1.114.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
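
The check described above can be sketched in userspace as follows. The
function names are hypothetical and this is not the ether_ioctl() code; the
sketch relies only on the fact that broadcast and multicast Ethernet addresses
have the low bit of the first octet set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETHER_ADDR_LEN 6

/* The group bit (multicast and broadcast) is the low bit of octet 0. */
static bool
sketch_is_group_addr(const uint8_t *addr)
{
	assert(addr != NULL);
	return (addr[0] & 0x01) != 0;
}

/* True for 00:00:00:00:00:00. */
static bool
sketch_is_zero_addr(const uint8_t *addr)
{
	static const uint8_t zero[SKETCH_ETHER_ADDR_LEN];

	assert(addr != NULL);
	return memcmp(addr, zero, sizeof(zero)) == 0;
}

/* Accept only unicast, non-zero link-layer addresses. */
static bool
sketch_lladdr_acceptable(const uint8_t *addr)
{
	return !sketch_is_group_addr(addr) && !sketch_is_zero_addr(addr);
}
```

An address such as 02:de:ad:be:ef:02 passes both tests, while ff:ff:ff:ff:ff:ff
and 01:00:5e:00:00:01 are rejected by the group-bit test.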

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.113 04-Oct-2008  pooka branches: 1.113.2; 1.113.4;
POOL_INIT -> pool_init
 1.112 13-May-2008  dyoung branches: 1.112.4;
rtinit() should pass RTM_ADD to ifa->ifa_rtrequest instead of cmd,
after all.
 1.111 13-May-2008  dyoung In rtinit(), when cmd == RTM_ADD, pass cmd instead of RTM_ADD to
ifa->ifa_rtrequest(), in preparation for handling rtinit(RTM_CHANGE)
in the RTM_ADD branch.
 1.110 13-May-2008  dyoung Simplify the RT_DPRINTF() calls.
 1.109 11-May-2008  dyoung Use memset, memmove, and memcmp instead of Bzero, Bcopy, and Bcmp,
respectively.
 1.108 28-Apr-2008  martin branches: 1.108.2;
Remove clause 3 and 4 from TNF licenses
 1.107 10-Apr-2008  dyoung branches: 1.107.2; 1.107.4;
Add some assertions that will catch any exception to
ro->ro_sa == NULL implies ro->_ro_rt == NULL.
 1.106 26-Mar-2008  ad Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.
 1.105 21-Jan-2008  dyoung branches: 1.105.6;
In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.
 1.104 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.
 1.103 12-Jan-2008  dyoung _rtcache_init(): shorten this by getting out immediately if rtalloc1()
returns NULL.
rtcache_copy(): re-order operations a bit. KASSERT() that we are
not copying a route over itself.
 1.102 10-Jan-2008  dyoung Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.
 1.101 08-Jan-2008  dyoung Delete an unnecessary cast.
 1.100 04-Jan-2008  dyoung Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.
 1.99 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.98 10-Oct-2007  dyoung branches: 1.98.4; 1.98.6; 1.98.10;
Delete dead code.
 1.97 30-Aug-2007  dyoung branches: 1.97.2;
Make rtcache() and rtflush() block IPL_NET while they add/remove
a route from the cached routes list, so that the list won't change
out from under them.
 1.96 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.95 21-Jul-2007  dyoung branches: 1.95.4; 1.95.6; 1.95.8;
Cosmetic: remove superfluous parentheses. Compare pointers with
NULL instead of testing "truth." Remove unnecessary casts to void*
in memset() calls.
 1.94 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.93 09-Jul-2007  ad branches: 1.93.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.92 09-Jun-2007  dyoung Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.91 06-May-2007  dyoung Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.
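
A minimal userland sketch of the hit/miss out-parameter described in 1.91. All toy_* names, the string-keyed table, and the cache layout are illustrative assumptions; this is not the kernel's rtcache_lookup2().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's struct route or struct rtentry. */
struct toy_rtentry { char dst[16]; };
struct toy_route {
	char dst[16];			/* destination the cache was filled for */
	struct toy_rtentry *rt;		/* cached route, or NULL */
};

static struct toy_rtentry toy_table[] = { { "10.0.0.1" }, { "10.0.0.2" } };

/* Slow path: linear search of the toy forwarding table. */
static struct toy_rtentry *
toy_table_lookup(const char *dst)
{
	for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
		if (strcmp(toy_table[i].dst, dst) == 0)
			return &toy_table[i];
	return NULL;
}

/* Writes 1 to *hitp on a cache hit, 0 on a cache miss. */
static struct toy_rtentry *
toy_rtcache_lookup2(struct toy_route *ro, const char *dst, int *hitp)
{
	assert(hitp != NULL);
	if (ro->rt != NULL && strcmp(ro->dst, dst) == 0) {
		*hitp = 1;
		return ro->rt;
	}
	*hitp = 0;
	/* Miss: remember the new destination and refill from the table. */
	strncpy(ro->dst, dst, sizeof(ro->dst) - 1);
	ro->dst[sizeof(ro->dst) - 1] = '\0';
	ro->rt = toy_table_lookup(dst);
	return ro->rt;
}
```

A caller that only wants the route can ignore the flag; a caller that keeps statistics can count hits and misses from it.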
 1.90 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
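
The allocate/copy/duplicate/free semantics from item 1 of the 1.90 DETAILS can be sketched in userland roughly as follows. This is only an illustration under loud assumptions: it uses calloc(3) rather than a per-family pool, the toy_* names, the struct layout, the family constants, and the per-family sizes are all hypothetical, and 'flags' is accepted but ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSD sockaddr (length and family header). */
struct toy_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[30];
};

enum { TOY_AF_INET = 2, TOY_AF_INET6 = 24 };	/* illustrative values */

/* Per-family sockaddr size; 0 means unknown family. Sizes are assumptions. */
static size_t
toy_family_size(unsigned char af)
{
	switch (af) {
	case TOY_AF_INET:	return 16;
	case TOY_AF_INET6:	return 28;
	default:		return 0;
	}
}

/* Returns a zeroed sockaddr with sa_len/sa_family set, or NULL on failure. */
struct toy_sockaddr *
toy_sockaddr_alloc(unsigned char af, int flags)
{
	size_t len = toy_family_size(af);
	struct toy_sockaddr *sa;

	(void)flags;		/* the real routine passes this to the pool */
	if (len == 0 || (sa = calloc(1, sizeof(*sa))) == NULL)
		return NULL;
	sa->sa_len = (unsigned char)len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copies src over dst; the families must already match. */
struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocates a new sockaddr and copies src into it. */
struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src, int flags)
{
	struct toy_sockaddr *dst = toy_sockaddr_alloc(src->sa_family, flags);

	return dst == NULL ? NULL : toy_sockaddr_copy(dst, src);
}

void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

Note that 1.96 later replaced the pools with malloc(9) and added a length argument to sockaddr_alloc() and sockaddr_copy(), so the signatures above follow the 1.90 description only.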
 1.89 22-Apr-2007  xtraeme rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.
 1.88 12-Mar-2007  ad branches: 1.88.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.87 04-Mar-2007  christos branches: 1.87.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.86 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.85 17-Feb-2007  dyoung branches: 1.85.2;
Cosmetic: don't open-code LIST_FOREACH(). Remove extraneous
parentheses. Bzero -> memset. Shorten staircase in rt_timer_add().
 1.84 05-Jan-2007  joerg Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.
 1.83 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.82 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
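
The chaining and invalidation scheme described in 1.82 can be sketched as below. This is a hedged userland illustration, not the kernel code: the toy_* names and the hand-rolled list linkage are assumptions, and the real rtflush() also releases the route reference, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct route and struct domain. */
struct toy_route {
	void *ro_rt;			/* cached route; NULL when invalid */
	struct toy_route *next;		/* next cache on the domain's chain */
	struct toy_route **prevp;	/* address of the pointer to us */
};
struct toy_domain {
	struct toy_route *caches;	/* chain of caches with ro_rt != NULL */
};

/* Notify the domain of a newly cached route: put it on the chain. */
static void
toy_rtcache(struct toy_domain *d, struct toy_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->next = d->caches;
	if (ro->next != NULL)
		ro->next->prevp = &ro->next;
	ro->prevp = &d->caches;
	d->caches = ro;
}

/* Invalidate one cache: set ro_rt to NULL and unlink it from the chain. */
static void
toy_rtflush(struct toy_route *ro)
{
	if (ro->ro_rt == NULL)
		return;			/* not cached, so not on the chain */
	ro->ro_rt = NULL;
	*ro->prevp = ro->next;
	if (ro->next != NULL)
		ro->next->prevp = ro->prevp;
	ro->next = NULL;
	ro->prevp = NULL;
}

/* Invalidate every cache in the domain, e.g. when a route is added. */
static void
toy_rtflushall(struct toy_domain *d)
{
	while (d->caches != NULL)
		toy_rtflush(d->caches);
}
```

Because only caches with a non-NULL ro_rt sit on the chain, users of a route cache must expect ro_rt to turn to NULL after any flush and redo the lookup.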
 1.81 07-Dec-2006  joerg Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.
 1.80 07-Dec-2006  joerg Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.
 1.79 04-Dec-2006  dyoung Replace the temporary variable ndst with rt_key(rt). This will
simplify the application of RADIX_MPATH patches.

No functional change intended.
 1.78 04-Dec-2006  dyoung Paranoid protection against use after free: in rtfree(), set rt_ifa
and rt_ifp to NULL.
 1.77 04-Dec-2006  dyoung Cosmetic: remove extra empty line.
 1.76 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.75 13-Nov-2006  dyoung In rtalloc(), release our reference to the prior rtentry before
referencing a new rtentry.
 1.74 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.73 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.72 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.71 07-Sep-2006  dogcow branches: 1.71.2; 1.71.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.70 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.69 15-Apr-2006  christos branches: 1.69.2;
Coverity CID 855: Add a KASSERT for null route from successful rtrequest.
 1.68 10-Apr-2006  christos PR/33231: Anraud Degroote: Miscellaneous cleanups in the route code:
- use of 0 instead of NULL
- questionable macros
 1.67 11-Dec-2005  christos branches: 1.67.4; 1.67.6; 1.67.8; 1.67.10; 1.67.12;
merge ktrace-lwp.
 1.66 29-May-2005  christos branches: 1.66.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.65 26-Feb-2005  perry nuke trailing whitespace
 1.64 23-Jan-2005  matt branches: 1.64.2;
Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.63 30-Sep-2004  christos branches: 1.63.4;
Fix problem in previous commit; we need to create a new sockaddr.
 1.62 29-Sep-2004  christos PR/22849: Sean Boudreau: rtrequest() w/ RTM_DELETE not honouring netmask
as it does w/ RTM_ADD.
 1.61 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.60 21-Apr-2004  matt ANSI-fy and some additional de-__P and constification.
 1.59 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.58 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.57 29-Jun-2003  fvdl branches: 1.57.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.56 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.55 12-Nov-2002  itojun remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.54 12-Nov-2002  itojun add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.
 1.53 26-Aug-2002  thorpej Fix a signed/unsigned comparison warning from GCC 3.3.
 1.52 12-May-2002  matt branches: 1.52.2; 1.52.4;
Eliminate more commons.
 1.51 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.50 12-Nov-2001  lukem add RCSIDs
 1.49 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.48 16-Oct-2001  itojun branches: 1.48.2;
on RTM_DELETE, reduce refcnt on rt->rt_parent, to avoid leaks.
from IIJ seil team
 1.47 26-Jul-2001  itojun do not initialize rmx_mtu on RTM_ADD.

on gateway change, copy rmx_mtu from gateway only under the following condition:
- current MTU is not locked
- current MTU was discovered via PMTUD

XXX if gateway has MTU == 0, current MTU is set to 0 and we are going to
rediscover PMTU again. is it good or bad?
 1.46 25-Jul-2001  itojun do not copy rmx_mtu on RTM_ADD/RESOLVE. the fragment was mistakenly
introduced on 1.25, from other *bsd via kame. from thorpej
 1.45 20-Jul-2001  itojun validate sa_len on equal() macro. without the change we may touch the content
of a2 beyond a2->sa_len mistakenly. sync with kame
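
A sketch of the length check that 1.45 describes, written as a function rather than a macro. The struct and the name toy_sa_equal are hypothetical; the point is only that the lengths are compared before any bytes are, so the comparison never reads past the end of the shorter sockaddr.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a BSD sockaddr (length and family header). */
struct toy_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[14];
};

/* Nonzero if both sockaddrs have the same length and the same bytes. */
static int
toy_sa_equal(const struct toy_sockaddr *a1, const struct toy_sockaddr *a2)
{
	/* Check sa_len first; memcmp() runs only when the lengths agree. */
	return a1->sa_len == a2->sa_len &&
	    memcmp(a1, a2, a1->sa_len) == 0;
}
```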
 1.44 18-Jul-2001  thorpej bzero -> memset
 1.43 21-Feb-2001  itojun branches: 1.43.2; 1.43.4;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).
 1.42 27-Jan-2001  itojun change non-intuitive function name. s/rtflushit/rtflushclone1/
 1.41 27-Jan-2001  itojun cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.
 1.40 27-Jan-2001  itojun mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.
 1.39 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.38 11-Dec-2000  itojun do not touch region after free
 1.37 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.36 30-Mar-2000  augustss branches: 1.36.4;
Kill some more register declarations.
 1.35 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.34 22-Mar-2000  itojun remove bogus comment
 1.33 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.32 10-Mar-2000  itojun do not touch radix_node with RNF_ROOT on route_output(). this can
cause kernel panic (by non-root invocation of route(8)) on certain
routing table setup.
KAME PR: 217
 1.31 02-Feb-2000  thorpej Wrap a debugging printf in IFAREF_DEBUG.
 1.30 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.29 09-Oct-1999  sommerfeld branches: 1.29.2;
Fix PR7946 (neighbor discovery tries to block at interrupt level).
stack-allocate a sockaddr_storage for the temporary sockaddr rather
than putting it in an mbuf.

neighbor discovery wants to delete expired ifa's from a timeout
handler. allowing rtinit(RTM_DELETE, ...) to run at interrupt level
allows this to work.
i think we can afford the extra ~128 bytes of stack depth ..
 1.28 09-Oct-1999  erh Delay clearing of RTF_UP until after deleting rt_gwroute. Otherwise, if
rt_gwroute is the same as the original route it will get freed twice. It can
end up the same because of unusual "route" commands (PR4561) or certain icmp
redirects (PR4827).
 1.27 21-Aug-1999  matt branches: 1.27.2;
Cleanup a little kludge in mtu handling in route.c. Bring down FDDI
mtu to legal IP max but don't affect other protocols.
 1.26 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.25 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.24 27-Dec-1998  thorpej branches: 1.24.2; 1.24.4; 1.24.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.
 1.23 27-Dec-1998  veego Fix kern/6658 from Martin Husemann:
After booting a current kernel and receiving a few arp requests on the
network it panics (data modified on free list). The panic message is wrong,
as code inspection shows the memory pool for routing entries is initialized
twice, while the routing timer memory pool is never initialized.
 1.22 22-Dec-1998  thorpej Use pools for rtentry and rttimer structures.
 1.21 28-Oct-1998  kml branches: 1.21.4;
Add call to splsoftnet() in rt_timer_timer to avoid possible race
condition in deleting timer queue (PMTU) entries.
 1.20 15-Aug-1998  thorpej Explicitly dereference the route timer expiration function pointer.
 1.19 05-Jul-1998  jonathan defopt NS, NSIP.
 1.18 29-Apr-1998  kml Add generic route timeout functionality; used by path MTU discovery code
 1.17 02-Apr-1997  christos branches: 1.17.8;
Sync with Lite2.
 1.16 13-Oct-1996  christos backout previous kprintf change
 1.15 10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.14 13-Feb-1996  christos Net prototypes
 1.13 12-Aug-1995  mycroft splnet --> splsoftnet
 1.12 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.11 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9 11-May-1994  mycroft Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.
 1.8 23-Mar-1994  cgd two reference count fixes, and minor cleanup (to offset the added goto! 8-).
 1.7 10-Feb-1994  mycroft Deprecate af.h.
 1.6 16-Jan-1994  cgd include <machine/cpu.h> not <machine/mtpr.h>
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 22-May-1993  cgd branches: 1.4.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.3 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.2 21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.4.3 08-Nov-1993  mycroft Remove references to af.h.
 1.4.4.2 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.4.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.17.8.1 09-May-1998  mycroft Pull up patch from kml.
 1.21.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.24.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.24.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.24.4.2 02-Aug-1999  thorpej Update from trunk.
 1.24.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.24.2.1 10-Oct-1999  cgd pull up rev 1.28 from trunk (requested by erh):
Avoid duplicate free() calls if a route's gateway points to itself,
by marking a route down (~RTF_UP) _after_ calling RTFREE on its
gateway. Partial fix for PR#4561 and PR#4827 (the looped route can
still occur, but it won't cause a panic).
 1.27.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.29.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.29.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.29.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.29.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.36.4.3 13-Nov-2002  itojun sys/net/route.c 1.55 via patch
sys/net/route.h 1.32
sys/netinet/ip_input.c 1.163

Remove all entries on rt timer queue on ip_mtudisc change, instead
of destroying the queue.

(itojun, redo)
 1.36.4.2 13-Nov-2001  he Pull up revision 1.48 (requested by itojun):
Avoid memory leak on RTM_DELETE.
 1.36.4.1 05-Apr-2001  he Pull up revisions 1.40-1.41 (via patch, requested by itojun):
Mark cloned routes with RTF_CLONED. Present it in ``netstat -r''
output by ``c''.

Let static routes overwrite cloned routes, as cloned routes can
come back again if necessary. Should fix PR#11916 and maybe some
other PRs with ARP behavior.

Cleanup cloned route when parent route (RTF_CLONING) goes away.
Adds rt_parent to link parent from child (like NRL did, ours do
refcnt rt_refcnt properly).
 1.43.4.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.43.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.43.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.43.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.43.4.1 03-Aug-2001  lukem update to -current
 1.43.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.43.2.6 27-Aug-2002  nathanw Catch up to -current.
 1.43.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.43.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.43.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.43.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.43.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.48.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.52.4.2 17-Jun-2003  msaitoh Pullup rev. 1.55 (requested by itojun in ticket #984):
remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.52.4.1 11-Dec-2002  he Pull up revision 1.54 (requested by itojun in ticket #982):
Add an argument to rt_timer_remove_all(), to specify if we
need to call timeout routine on removal.
 1.52.2.1 29-Aug-2002  gehenna catch up with -current.
 1.57.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.57.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.57.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.57.2.5 19-Oct-2004  skrll Sync with HEAD
 1.57.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.57.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.57.2.2 03-Aug-2004  skrll Sync with HEAD
 1.57.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.63.4.1 29-Apr-2005  kent sync with -current
 1.64.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.66.2.7 04-Feb-2008  yamt sync with head.
 1.66.2.6 21-Jan-2008  yamt sync with head
 1.66.2.5 27-Oct-2007  yamt sync with head.
 1.66.2.4 03-Sep-2007  yamt sync with head.
 1.66.2.3 26-Feb-2007  yamt sync with head.
 1.66.2.2 30-Dec-2006  yamt sync with head.
 1.66.2.1 21-Jun-2006  yamt sync with head.
 1.67.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.67.10.1 19-Apr-2006  elad sync with head.
 1.67.8.4 14-Sep-2006  yamt sync with head.
 1.67.8.3 26-Jun-2006  yamt sync with head.
 1.67.8.2 24-May-2006  yamt sync with head.
 1.67.8.1 11-Apr-2006  yamt sync with head
 1.67.6.2 22-Apr-2006  simonb Sync with head.
 1.67.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.67.4.1 09-Sep-2006  rpaulo sync with head
 1.69.2.1 19-Jun-2006  chap Sync with head.
 1.71.4.3 18-Dec-2006  yamt sync with head.
 1.71.4.2 10-Dec-2006  yamt sync with head.
 1.71.4.1 22-Oct-2006  yamt sync with head
 1.71.2.2 12-Jan-2007  ad Sync with head.
 1.71.2.1 18-Nov-2006  ad Sync with head.
 1.85.2.5 07-May-2007  yamt sync with head.
 1.85.2.4 24-Mar-2007  yamt sync with head.
 1.85.2.3 12-Mar-2007  rmind Sync with HEAD.
 1.85.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.85.2.1 17-Feb-2007  yamt file route.c was added on branch yamt-idlelwp on 2007-02-27 16:54:46 +0000
 1.87.2.7 12-Oct-2007  ad Sync with head.
 1.87.2.6 09-Oct-2007  ad Sync with head.
 1.87.2.5 20-Aug-2007  ad Sync with HEAD.
 1.87.2.4 15-Jul-2007  ad Sync with head.
 1.87.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.87.2.2 08-Jun-2007  ad Sync with head.
 1.87.2.1 13-Mar-2007  ad Sync with head.
 1.88.2.1 11-Jul-2007  mjf Sync with head.
 1.93.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.93.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.95.8.2 21-Jul-2007  dyoung Cosmetic: remove superfluous parentheses. Compare pointers with
NULL instead of testing "truth." Remove unnecessary casts to void*
in memset() calls.
 1.95.8.1 21-Jul-2007  dyoung file route.c was added on branch matt-mips64 on 2007-07-21 03:12:11 +0000
 1.95.6.3 23-Mar-2008  matt sync with HEAD
 1.95.6.2 09-Jan-2008  matt sync with HEAD
 1.95.6.1 06-Nov-2007  matt sync with HEAD
 1.95.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.95.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.97.2.1 14-Oct-2007  yamt sync with head.
 1.98.10.5 23-Jan-2008  bouyer Sync with HEAD.
 1.98.10.4 19-Jan-2008  bouyer Sync with HEAD
 1.98.10.3 10-Jan-2008  bouyer Sync with HEAD
 1.98.10.2 08-Jan-2008  bouyer Sync with HEAD
 1.98.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.98.6.1 26-Dec-2007  ad Sync with head.
 1.98.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.105.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.105.6.3 05-Oct-2008  mjf Sync with HEAD.
 1.105.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.105.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.107.4.4 11-Aug-2010  yamt sync with head.
 1.107.4.3 11-Mar-2010  yamt sync with head
 1.107.4.2 04-May-2009  yamt sync with head.
 1.107.4.1 16-May-2008  yamt sync with head.
 1.107.2.1 18-May-2008  yamt sync with head.
 1.108.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.108.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.112.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.112.4.1 19-Oct-2008  haad Sync with HEAD.
 1.113.4.2 05-Feb-2012  bouyer Pull up following revision(s) (requested by christos in ticket #1721):
sys/net/route.c: revision 1.126
Count length from the beginning of the structure not the sa_data portion.
From skrll@
 1.113.4.1 03-Apr-2009  snj branches: 1.113.4.1.2; 1.113.4.1.4; 1.113.4.1.6;
Pull up following revision(s) (requested by christos in ticket #650):
sys/net/route.c: revision 1.117
sys/net/route.h: revision 1.73
sys/net/rtsock.c: revision 1.125
usr.sbin/arp/arp.c: revision 1.48
usr.sbin/pppd/pppd/sys-bsd.c: revision 1.59
Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
 1.113.4.1.6.1 05-Feb-2012  bouyer Pull up following revision(s) (requested by christos in ticket #1721):
sys/net/route.c: revision 1.126
Count length from the beginning of the structure not the sa_data portion.
From skrll@
 1.113.4.1.4.1 16-Aug-2010  matt Use uint64_t instead of u_quad_t
 1.113.4.1.2.1 05-Feb-2012  bouyer Pull up following revision(s) (requested by christos in ticket #1721):
sys/net/route.c: revision 1.126
Count length from the beginning of the structure not the sa_data portion.
From skrll@
 1.113.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.114.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.121.4.4 21-Apr-2011  rmind sync with head
 1.121.4.3 05-Mar-2011  rmind sync with head
 1.121.4.2 03-Jul-2010  rmind sync with head
 1.121.4.1 30-May-2010  rmind sync with head
 1.121.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.123.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.123.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.125.8.1 18-Feb-2012  mrg merge to -current.
 1.125.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.125.4.1 17-Apr-2012  yamt sync with head
 1.126.6.3 03-Dec-2017  jdolecek update from HEAD
 1.126.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.126.6.1 23-Jun-2013  tls resync from head
 1.126.2.1 29-Jul-2013  msaitoh Pull up following revision(s) (requested by christos in ticket #909):
sys/net/route.c: revision 1.127
PR/44032: Proxy entries stopped working with pppd. The issue here is that
the route entry was added, but the RTF_LLINFO bit was not set, making arp -a
not showing the entry, but netstat -rn -f inet showing it with the missing
L bit. The order of resolution in ifa_ifwithroute() is that if a destination
address is found, then the interface chosen for the route is that of the
destination. This does not work for link-level addresses since the ppp
interface does not arp (uses link_rtrequest, not arp_rtrequest), so the
bit is never set. The easy solution here is to check that the gateway is
a link address, and use the interface which we chose for the link address
as opposed to the interface that routes to the destination. This restores
the previous behavior, but is it correct?
 1.127.2.1 18-May-2014  rmind sync with head
 1.129.2.1 10-Aug-2014  tls Rebase.
 1.132.2.1 12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).
Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
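The per-protocol flag described above can be sketched in plain C. This is a single-threaded illustration only: the struct and function names are hypothetical stand-ins, the counter merely represents a call to workqueue_enqueue, and a real kernel version would also need a lock or atomics around the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; not the kernel's workqueue(9) or callout(9) types. */
struct slowtimo_state {
	bool work_enqueued;	/* true from enqueue until the work finishes */
	int  enqueue_count;	/* stands in for calls to workqueue_enqueue */
	int  run_count;		/* number of times the work body has run */
};

/*
 * Callout side: fires periodically, regardless of whether the previous
 * work has completed. Enqueue only if no work is outstanding.
 */
static bool
slowtimo_callout(struct slowtimo_state *st)
{
	if (st->work_enqueued)
		return false;		/* previous work still pending: skip */
	st->work_enqueued = true;
	st->enqueue_count++;
	return true;
}

/* Workqueue side: runs in normal thread context and clears the flag. */
static void
slowtimo_work(struct slowtimo_state *st)
{
	assert(st->work_enqueued);
	st->run_count++;
	st->work_enqueued = false;	/* the next callout may enqueue again */
}
```

Calling slowtimo_callout twice before slowtimo_work runs enqueues only once; the second call is skipped until the work clears the flag.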
 1.133.2.11 28-Aug-2017  skrll Sync with HEAD
 1.133.2.10 05-Feb-2017  skrll Sync with HEAD
 1.133.2.9 05-Dec-2016  skrll Sync with HEAD
 1.133.2.8 05-Oct-2016  skrll Sync with HEAD
 1.133.2.7 09-Jul-2016  skrll Sync with HEAD
 1.133.2.6 29-May-2016  skrll Sync with HEAD
 1.133.2.5 22-Apr-2016  skrll Sync with HEAD
 1.133.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.133.2.3 22-Sep-2015  skrll Sync with HEAD
 1.133.2.2 06-Jun-2015  skrll Sync with HEAD
 1.133.2.1 06-Apr-2015  skrll Sync with HEAD
 1.171.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.171.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.171.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.171.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.171.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.171.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.186.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.194.6.17 08-Jun-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1824):

sys/net/route.c: revision 1.237

route: run workqueue kthreads with KERNEL_LOCK unless NET_MPSAFE

Without KERNEL_LOCK, rt_timer_work and rt_free_work can run in parallel
with other LWPs running in the network stack, which eventually results
in say use-after-free of a deleted route.
 1.194.6.16 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1801):

sys/net/route.c: revision 1.236

route(4): Work around deadlock in rt_free wait path.
PR kern/56844
 1.194.6.15 04-Oct-2019  martin Pull up following revision(s) (requested by rin in ticket #1398):

sys/net/route.c: revision 1.222

Stop passing a large const structure by value, in order to avoid
possible kernel stack overflow; a const pointer suffices here.

Pointed out by the lgtm bot and kamil.
OK ozaki-r

XXX
pullup to netbsd-9
 1.194.6.14 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.194.6.13 15-Mar-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1215):

sys/net/route.c: revision 1.217

Add missing ifa_release on error paths
 1.194.6.12 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifaref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.

-

Use rt_update framework on updating a rtentry
 1.194.6.11 07-Sep-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1012):

sys/net/route.c: revision 1.212
sys/net/route.c: revision 1.213

route: don't take an extra reference of a rtentry for the delayed free mechanism
Because a reference is already taken at that point.

-

route: avoid overwriting rt_free_global.enqueued unexpectedly

rt_free_global.enqueued can be set to true by rt_free during rt_free_work
because rt_free_work releases rt_free_global.lock. So rt_free_work must update
it once and not update after releasing the lock.
 1.194.6.10 08-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #852):

sys/netinet6/icmp6.c: revision 1.238
sys/netinet/ip_icmp.c: revision 1.171
sys/net/route.c: revision 1.210

Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release
the reference of a passed rtentry by themselves (but they didn't).
 1.194.6.9 14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of a rtentry.
 1.194.6.8 05-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #697):

sys/net/route.c: revision 1.208

Kill remaining rt->rt_refcnt++
 1.194.6.7 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.194.6.6 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check whether an already-queued work is about to be enqueued again, which is not allowed
 1.194.6.5 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock. Because an updater tries to wait for references
to a target route (except for a reference by the updater itself) to be released.
 1.194.6.4 13-Jan-2018  snj Pull up following revision(s) (requested by christos in ticket #496):
sys/net/route.c: revision 1.202-1.203
sys/net/route.h: revision 1.117
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
--
Don't stomp past the end of the array! need __arraycount not sizeof()
Found by chuq, while debugging the sdf.org crashes
Restructure a bit for readability.
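The sizeof() versus __arraycount mix-up mentioned above can be illustrated as follows. This is a sketch, not the code from route.c: ARRAY_COUNT is a local stand-in for NetBSD's __arraycount macro, and the struct, array, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for __arraycount: number of elements, not bytes. */
#define ARRAY_COUNT(a)	(sizeof(a) / sizeof((a)[0]))

/* Hypothetical fixed-size table of deferred entries. */
struct deferred_sketch {
	void *rt;
	int flags;
};

static struct deferred_sketch deferred[10];

/* Wrong bound: sizeof() yields the size in bytes. */
static size_t
bound_in_bytes(void)
{
	return sizeof(deferred);
}

/* Right bound: the element count. */
static size_t
bound_in_elements(void)
{
	return ARRAY_COUNT(deferred);
}

/* Push with the correct bound; returns -1 instead of overrunning. */
static int
deferred_push(size_t *n, void *rt)
{
	assert(*n <= ARRAY_COUNT(deferred));
	if (*n >= ARRAY_COUNT(deferred))
		return -1;
	deferred[(*n)++].rt = rt;
	return 0;
}
```

Because each element is larger than one byte, a loop bounded by bound_in_bytes() would index well past the tenth element, which is the kind of overrun the message describes.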
 1.194.6.3 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid; otherwise it is invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
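The generation-counter check described above can be sketched roughly as follows. All names with a _sketch suffix are hypothetical and the code is single-threaded; the actual kernel structures and the locking around the counter are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a route entry and a route cache. */
struct rtentry_sketch {
	int id;
};

struct rtcache_sketch {
	struct rtentry_sketch *rt;	/* cached route, or NULL */
	uint64_t generation;		/* snapshot taken when rt was cached */
};

/* Global counter, bumped whenever any route is added or deleted. */
static uint64_t rtcache_generation_sketch;

static void
route_table_changed(void)
{
	rtcache_generation_sketch++;
}

/* Cache a route together with a snapshot of the global counter. */
static void
rtcache_store(struct rtcache_sketch *rc, struct rtentry_sketch *rt)
{
	assert(rt != NULL);
	rc->rt = rt;
	rc->generation = rtcache_generation_sketch;
}

/* Return the cached route only if no route change happened since. */
static struct rtentry_sketch *
rtcache_lookup(struct rtcache_sketch *rc)
{
	if (rc->rt != NULL && rc->generation == rtcache_generation_sketch)
		return rc->rt;
	rc->rt = NULL;			/* stale: caller must look up again */
	return NULL;
}
```

The drawback noted in the message is visible here: a single counter means any call to route_table_changed() invalidates every cache, whatever its protocol family.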
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guaranteed to be not accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.194.6.2 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.194.6.1 25-Jun-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #57):
sys/net/route.c: revision 1.195
Fix locking in rtalloc1 (affected only if NET_MPSAFE)
 1.206.2.7 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.206.2.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.206.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.206.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.206.2.3 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.206.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.206.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.210.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.210.2.1 10-Jun-2019  christos Sync with HEAD
 1.219.2.4 08-Jun-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1642):

sys/net/route.c: revision 1.237

route: run workqueue kthreads with KERNEL_LOCK unless NET_MPSAFE

Without KERNEL_LOCK, rt_timer_work and rt_free_work can run in parallel
with other LWPs running in the network stack, which eventually results
in say use-after-free of a deleted route.
 1.219.2.3 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1602):

sys/net/route.c: revision 1.236

route(4): Work around deadlock in rt_free wait path.
PR kern/56844
 1.219.2.2 03-Oct-2019  martin Pull up following revision(s) (requested by knakahara in ticket #272):

sys/net/route.c: revision 1.222
sys/net/route.c: revision 1.224
sys/net/route.c: revision 1.225

Stop passing a large const structure by value, in order to avoid
possible kernel stack overflow; a const pointer suffices here.
Pointed out by the lgtm bot and kamil.
OK ozaki-r

-

Fix an ifa_release() leak for a specific struct rt_addrinfo.
ok by ozaki-r@n.o

-

Revert route.c:r1.224 to fix net/arp/t_arp and net/ndp/t_ndp failure.
And refactor a little. Discussed with ozaki-r@n.o.
 1.219.2.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.226.2.1 29-Feb-2020  ad Sync with head.
 1.235.2.3 14-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1136):

sys/net/route.c: revision 1.238
sys/net/route.c: revision 1.239
sys/net/if.c: revision 1.535

route: do ifa_rtrequest() before rt_addaddr()

ifa_rtrequest() could change a given rtentry in the routing table.


route: lower the priority of the workqueues

PRI_SOFTNET makes the kthread of a workqueue SCHED_RR which can monopolize
a CPU if there are many rtentries to free in rt_free_work. So lower the
priority of the workqueues to PRI_USER which is the scheduling class for
time-sharing.

Also change rt_timer_wq as well just in case.


if: protect if_link_state_change_process with IFNET_LOCK

This change avoids race conditions between if_link_state_change handlers
and other operations on a target interface such as if_ioctl.
 1.235.2.2 08-Jun-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #195):

sys/net/route.c: revision 1.237

route: run workqueue kthreads with KERNEL_LOCK unless NET_MPSAFE

Without KERNEL_LOCK, rt_timer_work and rt_free_work can run in parallel
with other LWPs running in the network stack, which eventually results
in say use-after-free of a deleted route.
 1.235.2.1 22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #99):

sys/net/route.c: revision 1.236

route(4): Work around deadlock in rt_free wait path.
PR kern/56844
 1.237.6.1 02-Aug-2025  perseant Sync with HEAD
 1.135 21-Sep-2025  christos Centralize all the "can't handle af%d\n", messages in one place and provide
more context. Now I get ad-nauseam:
ether_output: wm1: can't handle af18 (link: link#2)
 1.134 16-Jun-2023  rin Align function name in its declaration consistently.
No binary changes.
 1.133 16-Jun-2023  rin Consistently use __inline instead of inline, as done for rev. 1.119:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/route.h#rev1.119
 1.132 20-Sep-2022  knakahara Remove routes on an address removal if the routes reference the address. Implemented by ozaki-r@n.o.

A route whose gateway is on a connected route can become invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destructed say by in_purgeaddr.

If the same address is assigned again in such a state, there can be two
different ifaddr objects with the same address. Until recently this was not a
big problem because we could send packets anyway. However after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.
 1.131 29-Aug-2022  knakahara Add a sysctl entry to control whether routing messages are sent for RTM_DYNAMIC.

Some routing daemons require such routing messages to keep coherency.

To let the kernel send such messages, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (=0) is the same as before, that is, no such routing message is sent.
 1.130 26-Aug-2022  knakahara Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was committed.
 1.129 09-Aug-2021  andvar fix various typos in compatibility, mainly in comments.
 1.128 22-Mar-2021  christos Add a list of names
 1.127 09-Mar-2020  roy branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.
 1.126 08-Feb-2020  roy route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.
 1.125 19-Sep-2019  ozaki-r branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
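
The indirection described in this commit can be illustrated with a user-space toy model. This is only a sketch, not percpu(9): sk_area, sk_area_grow, sk_slot_init and sk_slot_get are hypothetical names, and the "grow by allocating a larger area and destroying the old one" step is simulated with malloc/memcpy/free. It shows that when the area holds only a pointer, the object it points to keeps a stable address across a grow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache; not the kernel's struct route. */
struct sk_rtcache {
	int dummy;
};

/* Toy model of one CPU's storage area that grows by allocate-copy-free. */
struct sk_area {
	void *base;
	size_t size;
};

/* Replace the area with a larger one; the old memory is destroyed. */
static int
sk_area_grow(struct sk_area *a, size_t newsize)
{
	void *nb;

	assert(newsize >= a->size);
	nb = malloc(newsize);
	if (nb == NULL)
		return -1;
	if (a->size > 0)
		memcpy(nb, a->base, a->size);
	free(a->base);
	a->base = nb;
	a->size = newsize;
	return 0;
}

/* Store only a pointer in the area; the object itself lives on the heap. */
static int
sk_slot_init(struct sk_area *a)
{
	struct sk_rtcache *ro;

	assert(a->size >= sizeof(ro));
	ro = calloc(1, sizeof(*ro));
	if (ro == NULL)
		return -1;
	memcpy(a->base, &ro, sizeof(ro));
	return 0;
}

/* Fetch the pointer currently stored in the area. */
static struct sk_rtcache *
sk_slot_get(const struct sk_area *a)
{
	struct sk_rtcache *ro;

	memcpy(&ro, a->base, sizeof(ro));
	return ro;
}
```

A pointer obtained from sk_slot_get() before a grow still refers to live memory afterwards, whereas a pointer into the area itself would not.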
 1.124 22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.123 29-Apr-2019  roy branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.
 1.122 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.121 29-Apr-2019  pgoyette For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.
 1.120 30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
 1.119 19-Apr-2018  christos branches: 1.119.2;
s/static inline/static __inline/g for consistency.
 1.118 12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of an update of a rtentry.
 1.117 09-Jan-2018  christos branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8
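
The idea of replacing a fixed-size stack with a queue of deferred entries can be sketched as follows. This is a minimal user-space illustration under assumptions, not the code in route.c: sk_deferred, sk_defer_queue and the sk_defer_* functions are hypothetical names, and an integer route_id stands in for whatever the real entries carry.

```c
#include <assert.h>
#include <stdlib.h>

/* One deferred deletion request (hypothetical layout). */
struct sk_deferred {
	int route_id;
	struct sk_deferred *next;
};

/* FIFO of deferred requests; grows as needed, so it cannot overflow. */
struct sk_defer_queue {
	struct sk_deferred *head;
	struct sk_deferred **tailp;
};

static void
sk_defer_init(struct sk_defer_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Append a request; returns 0 on success, -1 if memory is exhausted. */
static int
sk_defer_add(struct sk_defer_queue *q, int route_id)
{
	struct sk_deferred *d = malloc(sizeof(*d));

	if (d == NULL)
		return -1;
	d->route_id = route_id;
	d->next = NULL;
	*q->tailp = d;
	q->tailp = &d->next;
	return 0;
}

/* Process and release every queued request; returns how many were handled. */
static int
sk_defer_drain(struct sk_defer_queue *q)
{
	struct sk_deferred *d;
	int n = 0;

	while ((d = q->head) != NULL) {
		q->head = d->next;
		free(d);	/* a real implementation would delete the route here */
		n++;
	}
	q->tailp = &q->head;
	return n;
}
```

Unlike an array of 10 slots, the queue accepts as many requests as memory allows.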
 1.116 18-Dec-2017  ozaki-r Show ARP/NDP caches as LLINFO not LLDATA for backward compatibility
 1.115 13-Dec-2017  christos Add bit definitions for flags so that route(8) can use them.
 1.114 21-Sep-2017  ozaki-r Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
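
The generation-counter scheme described above can be sketched in a few lines. This is an illustration only, with hypothetical names (sk_rtentry, sk_rtcache, sk_route_changed and friends) and no locking; it is not the kernel's struct route or its rtcache functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a routing table entry. */
struct sk_rtentry {
	int id;
};

/* A cache holds an entry plus the generation at which it was cached. */
struct sk_rtcache {
	struct sk_rtentry *rt;	/* cached entry, or NULL */
	uint64_t gen;		/* snapshot of the global counter */
};

/* Global counter, incremented whenever any route is added or deleted. */
static uint64_t sk_rtcache_generation;

static void
sk_route_changed(void)
{
	sk_rtcache_generation++;
}

/* Cache an entry and remember the current generation. */
static void
sk_rtcache_store(struct sk_rtcache *ro, struct sk_rtentry *rt)
{
	ro->rt = rt;
	ro->gen = sk_rtcache_generation;
}

/* Return the cached entry only if no route changed since it was stored. */
static struct sk_rtentry *
sk_rtcache_validate(struct sk_rtcache *ro)
{
	if (ro->rt != NULL && ro->gen == sk_rtcache_generation)
		return ro->rt;
	ro->rt = NULL;	/* stale: the caller must look the route up again */
	return NULL;
}
```

Because there is a single counter, any route change invalidates every cache, which is the drawback the commit message mentions.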
 1.113 16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.112 11-Apr-2017  roy branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.
 1.111 19-Dec-2016  roy branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explicitly marking as unsigned in RT_ROUNDUP2.
 1.110 16-Dec-2016  christos Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.
 1.109 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.108 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.107 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
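
The search-then-delete pattern can be sketched on a simple linked list. This is a hedged illustration, not the radix tree API: sk_node, sk_search_matched, sk_delete and sk_delete_matched are hypothetical names, and a list stands in for the routing table. The point is that the search never modifies the structure; the caller deletes the returned entry and then searches again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sk_node {
	int key;
	struct sk_node *next;
};

typedef int (*sk_match_fn)(const struct sk_node *, void *);

/* Return the first node the callback accepts; the list is not modified. */
static struct sk_node *
sk_search_matched(struct sk_node *head, sk_match_fn match, void *arg)
{
	for (struct sk_node *n = head; n != NULL; n = n->next)
		if (match(n, arg))
			return n;
	return NULL;
}

/* Unlink and free one node; called by the caller, outside any walk. */
static void
sk_delete(struct sk_node **headp, struct sk_node *victim)
{
	for (struct sk_node **pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Delete every matching node: search, delete, then search again. */
static int
sk_delete_matched(struct sk_node **headp, sk_match_fn match, void *arg)
{
	struct sk_node *n;
	int deleted = 0;

	while ((n = sk_search_matched(*headp, match, arg)) != NULL) {
		sk_delete(headp, n);
		deleted++;
	}
	return deleted;
}

/* Example matcher: accept nodes with an even key. */
static int
sk_match_even(const struct sk_node *n, void *arg)
{
	(void)arg;
	return n->key % 2 == 0;
}

/* Helper: push a new node at the front; returns the new head. */
static struct sk_node *
sk_push(struct sk_node *head, int key)
{
	struct sk_node *n = malloc(sizeof(*n));

	if (n == NULL)
		return head;
	n->key = key;
	n->next = head;
	return n;
}
```

Restarting the search after each deletion costs extra traversal, but the delete never runs inside the walk.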
 1.106 25-Oct-2016  ozaki-r Remove unnecessary argument

No functional change.
 1.105 21-Oct-2016  ozaki-r Make some rt_timer functions and variables static

No functional change.
 1.104 18-Oct-2016  ozaki-r Remove unused rtcache_lookup_noclone
 1.103 21-Sep-2016  roy Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
 1.102 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.101 28-Apr-2016  ozaki-r branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.100 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes so that it looked like it
worked (unexpectedly), and removing rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care about is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.99 11-Apr-2016  ozaki-r Don't use radix tree API directly
 1.98 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.97 24-Mar-2016  ozaki-r Constify rt_newmsg's arguments
 1.96 02-Sep-2015  ozaki-r Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.
 1.95 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.
 1.94 31-Aug-2015  ozaki-r Make rt_refcnt take into account rt_timer
 1.93 24-Aug-2015  ozaki-r Add an assertion; if rtcache has an rtentry, its refcnt must be > 0
 1.92 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
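
The three rules listed above can be sketched as a small reference-counting discipline. This is a user-space illustration under assumptions: sk_rtentry, sk_rt_create, sk_rt_lookup and sk_rt_free are hypothetical names, there is no locking, and sk_rt_destroyed exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a routing table entry. */
struct sk_rtentry {
	int refcnt;
};

/* Counts destroyed entries, for illustration only. */
static int sk_rt_destroyed;

/* Create an entry holding one reference (owned by the table). */
static struct sk_rtentry *
sk_rt_create(void)
{
	struct sk_rtentry *rt = calloc(1, sizeof(*rt));

	if (rt != NULL)
		rt->refcnt = 1;
	return rt;
}

/* Functions that return an entry take a reference on behalf of the caller. */
static struct sk_rtentry *
sk_rt_lookup(struct sk_rtentry *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; callers never do "refcnt--" directly. */
static void
sk_rt_free(struct sk_rtentry *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0) {
		sk_rt_destroyed++;
		free(rt);
	}
}
```

As the message notes, the count only keeps the entry from being freed; it does not stop concurrent updates.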
 1.91 30-Apr-2015  ozaki-r Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey
 1.90 06-Apr-2015  ozaki-r Classify and sort prototype declarations

No functional change.
 1.89 06-Apr-2015  ozaki-r Make rt_maskedcopy static
 1.88 23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.87 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.86 25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.85 24-Feb-2015  roy Clean comments and style.
 1.84 06-Jun-2014  rmind branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.83 26-Apr-2014  pooka It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.
 1.82 01-Mar-2013  joerg branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@
 1.81 18-Feb-2012  rmind branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.
 1.80 11-Nov-2011  gdt branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicate
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.
 1.79 31-Mar-2011  dyoung branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.78 01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.77 26-Jan-2011  dyoung Update comment on RTM_CHGADDR to describe better what it's for.
 1.76 12-Nov-2010  roy branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.
 1.75 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.74 03-Nov-2009  dyoung branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.
 1.73 02-Apr-2009  christos Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0
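
Why private copies of a round-up macro can break on 64-bit hosts is easier to see with a concrete helper. The sketch below is an assumption-laden illustration, not the RT_ROUNDUP definition from the header: sk_roundup is a hypothetical function that follows the traditional BSD routing-socket convention (a zero length still occupies one alignment unit) and takes the alignment as an explicit parameter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round len up to a multiple of align (a power of two).
 * A zero length still consumes one alignment unit.
 */
static size_t
sk_roundup(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return len > 0 ? 1 + ((len - 1) | (align - 1)) : align;
}
```

A 5-byte address rounds to 8 with 4-byte alignment as well as with 8-byte alignment, but a 9-byte one rounds to 12 versus 16, so a sender and a parser that assume different alignments disagree on where the next address starts.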
 1.72 11-Jan-2009  christos branches: 1.72.2;
merge christos-time_t
 1.71 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
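
The address check described above can be sketched as a small predicate. This is an illustration only, not the code in ether_ioctl(): sk_ether_addr_assignable and SK_ETHER_ADDR_LEN are hypothetical names. It relies on the Ethernet convention that the least significant bit of the first octet marks a group (multicast) address, which also covers the broadcast address ff:ff:ff:ff:ff:ff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN 6

/* Return true if addr may be assigned as an interface's link-layer address. */
static bool
sk_ether_addr_assignable(const uint8_t addr[SK_ETHER_ADDR_LEN])
{
	static const uint8_t zero[SK_ETHER_ADDR_LEN];

	/* Group bit set: multicast or broadcast. */
	if (addr[0] & 0x01)
		return false;
	/* All-zero address. */
	if (memcmp(addr, zero, SK_ETHER_ADDR_LEN) == 0)
		return false;
	return true;
}
```

The address 02:de:ad:be:ef:02 from the summary passes, while 00:00:00:00:00:00 and any address with the group bit set are rejected.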

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.70 26-Mar-2008  ad branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.
 1.69 20-Feb-2008  matt branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.68 11-Feb-2008  simonb Don't look for <stdbool.h> if compiling _STANDALONE as well.
 1.67 21-Jan-2008  dyoung struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.
 1.66 21-Jan-2008  dyoung Move struct route inside of #ifdef _KERNEL to protect userland from
it.
 1.65 21-Jan-2008  dyoung In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.
 1.64 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.
 1.63 12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.62 11-Jan-2008  dyoung Cosmetic: remove redundant 'not' from a comment, re-wrap lines.
 1.61 10-Jan-2008  dyoung Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.
 1.60 04-Jan-2008  dyoung Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.
 1.59 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.58 27-Aug-2007  dyoung branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.
 1.57 19-Jul-2007  dyoung branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.56 09-Jun-2007  dyoung branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.55 06-May-2007  dyoung Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.
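
A rough sketch of the hit/miss out-parameter convention described in 1.55,
with hypothetical names (toy_cache, toy_cache_lookup2(), toy_slow_lookup())
and an invented backing lookup; the real rtcache_lookup2() operates on
struct route and sockaddrs.

```c
/* Hypothetical one-entry cache; not the kernel's struct route. */
struct toy_cache {
	int key;
	int value;
	int valid;
};

/* Invented backing lookup: pretend the table maps key to key * 10. */
static int
toy_slow_lookup(int key)
{
	return key * 10;
}

/* Writes 1 to *hit on a cache hit, 0 on a miss (after refilling the cache). */
static int
toy_cache_lookup2(struct toy_cache *c, int key, int *hit)
{
	if (c->valid && c->key == key) {
		*hit = 1;
		return c->value;
	}
	*hit = 0;
	c->key = key;
	c->value = toy_slow_lookup(key);
	c->valid = 1;
	return c->value;
}
```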
 1.54 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
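
The alloc/copy/dup/free semantics from item 1 can be modelled in userland as
below. This is a sketch under loud assumptions: toy_sockaddr is a made-up
miniature structure, calloc()/free() stand in for the per-family pools, the
'flags' argument is omitted, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature sockaddr; the real one is in <sys/socket.h>. */
struct toy_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[14];
};

/* Return a zeroed sockaddr with sa_len/sa_family set, or NULL if exhausted. */
static struct toy_sockaddr *
toy_sockaddr_alloc(unsigned char af)
{
	struct toy_sockaddr *sa = calloc(1, sizeof(*sa));

	if (sa == NULL)
		return NULL;
	sa->sa_len = (unsigned char)sizeof(*sa);
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy src over dst and return dst; families must match. */
static struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a new sockaddr and copy src into it. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *sa = toy_sockaddr_alloc(src->sa_family);

	if (sa == NULL)
		return NULL;
	return toy_sockaddr_copy(sa, src);
}

/* Give the sockaddr back (here: to the heap rather than a pool). */
static void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

As with rtcache_setdst() in item 3, callers of the allocating routines have to
be prepared for a NULL return.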
 1.53 22-Apr-2007  xtraeme rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.
 1.52 04-Mar-2007  christos branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.51 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.50 05-Jan-2007  joerg branches: 1.50.2;
Add a debug option for the route cache to help track down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.
 1.49 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.48 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.47 07-Dec-2006  joerg Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.
 1.46 07-Dec-2006  joerg Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.
 1.45 13-Nov-2006  dyoung Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.
 1.44 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
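
The IPv4 predicates from item 3 can be sketched as below. The function names
are hypothetical, addresses are taken in host byte order, and the
link-local multicast range (224.0.0/24) is an assumption about what
"link-local multicast" covers; the actual macros may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link-local unicast, 169.254/16. */
static bool
toy_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private addresses: 10/8, 172.16/12, 192.168/16. */
static bool
toy_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/* Link-local unicast or (assumed) link-local multicast, 224.0.0/24. */
static bool
toy_in_any_local(uint32_t a)
{
	return toy_in_linklocal(a) || (a & 0xffffff00u) == 0xe0000000u;
}
```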
 1.43 11-Dec-2005  christos branches: 1.43.20; 1.43.22;
merge ktrace-lwp.
 1.42 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.41 22-Jun-2005  dyoung branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.40 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.39 26-Feb-2005  perry nuke trailing whitespace
 1.38 21-Apr-2004  matt branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.
 1.37 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.36 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.35 29-Jun-2003  fvdl branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.34 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.33 18-Jan-2003  wiz bandwidth, not bandwith.
 1.32 12-Nov-2002  itojun remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.31 12-Nov-2002  itojun add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.
 1.30 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.29 12-May-2002  matt branches: 1.29.4;
Eliminate more commons.
 1.28 08-Mar-2001  enami branches: 1.28.2;
- lineup comment.
- fix typo in comment.
 1.27 21-Feb-2001  itojun branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).
 1.26 27-Jan-2001  itojun cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.
 1.25 27-Jan-2001  itojun mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.
 1.24 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.23 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.22 04-May-2000  ragge branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.
 1.21 06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.20 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.19 30-Jul-1999  itojun branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwards)
 1.18 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.17 27-Dec-1998  thorpej branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.
 1.16 10-Dec-1998  christos IPX counters and centralize statistics routine.
 1.15 25-Aug-1998  thorpej Use do { ... } while (0) in RTFREE().
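
Why a multi-statement macro such as RTFREE() is wrapped in do { ... } while (0)
can be shown with a hypothetical macro (TOY_RTFREE is invented for this
sketch and is not the real RTFREE body): the wrapper makes the expansion a
single statement, so it stays correct inside an unbraced if/else.

```c
/* Hypothetical macro: drop one reference, or flag the entry as freed. */
#define TOY_RTFREE(refcnt, freed)			\
do {							\
	if (*(refcnt) <= 1)				\
		*(freed) = 1;				\
	else						\
		(*(refcnt))--;				\
} while (0)

static int
toy_release(int *refcnt, int have_route)
{
	int freed = 0;

	/* The macro expands to one statement, so the else binds as intended. */
	if (have_route)
		TOY_RTFREE(refcnt, &freed);
	else
		freed = -1;
	return freed;
}
```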
 1.14 02-May-1998  thorpej Need <sys/socket.h> to stand alone.
 1.13 29-Apr-1998  thorpej Oops, we depend on <sys/queue.h>.
 1.12 29-Apr-1998  kml Add generic route timeout functionality; used by path MTU discovery code
 1.11 02-Apr-1997  christos branches: 1.11.8;
Sync with Lite2.
 1.10 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.9 13-Feb-1996  christos branches: 1.9.4;
Net prototypes
 1.8 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 11-May-1994  mycroft Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.11.8.1 09-May-1998  mycroft Pull up patch from kml.
 1.17.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.17.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.17.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.17.4.2 02-Aug-1999  thorpej Update from trunk.
 1.17.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.19.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.19.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.19.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.19.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.19.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.19.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.4.2 13-Nov-2002  itojun sys/net/route.c 1.55 via patch
sys/net/route.h 1.32
sys/netinet/ip_input.c 1.163

Remove all entries on rt timer queue on ip_mtudisc change, instead
of destroying the queue.

(itojun, redo)
 1.22.4.1 05-Apr-2001  he Pull up revisions 1.25-1.26 (via patch, requested by itojun):
Mark cloned routes with RTF_CLONED. Present it in ``netstat -r''
output by ``c''.

Let static routes overwrite cloned routes, as cloned routes can
come back again if necessary. Should fix PR#11916 and maybe some
other PRs with ARP behavior.

Cleanup cloned route when parent route (RTF_CLONING) goes away.
Adds rt_parent to link parent from child (like NRL did, ours do
refcnt rt_refcnt properly).
 1.27.2.4 11-Dec-2002  thorpej Sync with HEAD.
 1.27.2.3 11-Nov-2002  nathanw Catch up to -current
 1.27.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.27.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.28.2.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.29.4.2 17-Jun-2003  msaitoh Pullup rev. 1.32 (requested by itojun in ticket #984):
remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.29.4.1 11-Dec-2002  he Pull up revision 1.31 (requested by itojun in ticket #982):
Add an argument to rt_timer_remove_all(), to specify if we
need to call timeout routine on removal.
 1.35.2.7 11-Dec-2005  christos Sync with head.
 1.35.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.35.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.35.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.35.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.35.2.2 03-Aug-2004  skrll Sync with HEAD
 1.35.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.38.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.38.4.1 29-Apr-2005  kent sync with -current
 1.41.2.8 27-Feb-2008  yamt sync with head.
 1.41.2.7 11-Feb-2008  yamt sync with head.
 1.41.2.6 04-Feb-2008  yamt sync with head.
 1.41.2.5 21-Jan-2008  yamt sync with head
 1.41.2.4 03-Sep-2007  yamt sync with head.
 1.41.2.3 26-Feb-2007  yamt sync with head.
 1.41.2.2 30-Dec-2006  yamt sync with head.
 1.41.2.1 21-Jun-2006  yamt sync with head.
 1.43.22.2 18-Dec-2006  yamt sync with head.
 1.43.22.1 10-Dec-2006  yamt sync with head.
 1.43.20.2 12-Jan-2007  ad Sync with head.
 1.43.20.1 18-Nov-2006  ad Sync with head.
 1.50.2.3 07-May-2007  yamt sync with head.
 1.50.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.50.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.52.4.1 11-Jul-2007  mjf Sync with head.
 1.52.2.4 09-Oct-2007  ad Sync with head.
 1.52.2.3 20-Aug-2007  ad Sync with HEAD.
 1.52.2.2 15-Jul-2007  ad Sync with head.
 1.52.2.1 08-Jun-2007  ad Sync with head.
 1.56.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.56.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.57.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.57.6.1 19-Jul-2007  dyoung file route.h was added on branch matt-mips64 on 2007-07-19 20:48:54 +0000
 1.57.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.58.14.6 23-Jan-2008  bouyer Sync with HEAD.
 1.58.14.5 19-Jan-2008  bouyer Sync with HEAD
 1.58.14.4 11-Jan-2008  bouyer Sync with HEAD
 1.58.14.3 10-Jan-2008  bouyer Sync with HEAD
 1.58.14.2 08-Jan-2008  bouyer Sync with HEAD
 1.58.14.1 02-Jan-2008  bouyer Sync with HEAD
 1.58.10.1 26-Dec-2007  ad Sync with head.
 1.58.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.58.2.2 23-Mar-2008  matt sync with HEAD
 1.58.2.1 09-Jan-2008  matt sync with HEAD
 1.69.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.69.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.69.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.70.16.1 03-Apr-2009  snj branches: 1.70.16.1.4;
Pull up following revision(s) (requested by christos in ticket #650):
sys/net/route.c: revision 1.117
sys/net/route.h: revision 1.73
sys/net/rtsock.c: revision 1.125
usr.sbin/arp/arp.c: revision 1.48
usr.sbin/pppd/pppd/sys-bsd.c: revision 1.59
Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
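
For illustration, the usual BSD round-up idiom for padding sockaddrs in
routing messages looks roughly like this. toy_roundup() and its explicit
alignment parameter are assumptions made for the sketch, not the definition
of RT_ROUNDUP in route.h; the point is that copies of the macro using
different alignments disagree about where the next sockaddr starts.

```c
#include <stddef.h>

/*
 * Round a sockaddr length up to a multiple of n (n must be a power of two).
 * A zero length still occupies one alignment unit.
 */
static size_t
toy_roundup(size_t a, size_t n)
{
	return a > 0 ? 1 + ((a - 1) | (n - 1)) : n;
}
```

With this sketch, toy_roundup(5, 4) and toy_roundup(5, 8) both give 8, but a
length of 9 rounds to 12 with 4-byte alignment and to 16 with 8-byte
alignment.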
 1.70.16.1.4.3 16-Aug-2010  matt Use uint64_t instead of u_quad_t
 1.70.16.1.4.2 13-May-2010  matt Add RTAX_NAMES macro to initialize an array of names for RTAX_*
 1.70.16.1.4.1 11-May-2010  matt A few changes that make the route interface and related sysctls 32/64 bit
independent so the netbsd32 userland can use them.
 1.70.14.2 28-Apr-2009  skrll Sync with HEAD.
 1.70.14.1 19-Jan-2009  skrll Sync with HEAD.
 1.70.12.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.70.6.3 11-Aug-2010  yamt sync with head.
 1.70.6.2 11-Mar-2010  yamt sync with head
 1.70.6.1 04-May-2009  yamt sync with head.
 1.70.2.4 28-Dec-2008  christos ort_metrics -> rt_metrics
rt_metrics -> nrt_metrics
for userland compatibility
 1.70.2.3 10-Nov-2008  christos add back RTM_IFINFO.
 1.70.2.2 09-Nov-2008  christos merge with head.
 1.70.2.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.72.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.74.4.3 21-Apr-2011  rmind sync with head
 1.74.4.2 05-Mar-2011  rmind sync with head
 1.74.4.1 03-Jul-2010  rmind sync with head
 1.74.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.76.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.76.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.79.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.79.4.1 17-Apr-2012  yamt sync with head
 1.80.4.1 24-Feb-2012  mrg sync to -current.
 1.81.2.3 03-Dec-2017  jdolecek update from HEAD
 1.81.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.81.2.1 23-Jun-2013  tls resync from head
 1.82.10.1 10-Aug-2014  tls Rebase.
 1.82.6.1 18-May-2014  rmind sync with head
 1.84.4.9 28-Aug-2017  skrll Sync with HEAD
 1.84.4.8 05-Feb-2017  skrll Sync with HEAD
 1.84.4.7 05-Dec-2016  skrll Sync with HEAD
 1.84.4.6 05-Oct-2016  skrll Sync with HEAD
 1.84.4.5 29-May-2016  skrll Sync with HEAD
 1.84.4.4 22-Apr-2016  skrll Sync with HEAD
 1.84.4.3 22-Sep-2015  skrll Sync with HEAD
 1.84.4.2 06-Jun-2015  skrll Sync with HEAD
 1.84.4.1 06-Apr-2015  skrll Sync with HEAD
 1.101.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.101.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.101.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.101.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.111.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.112.4.6 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.112.4.5 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifaref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller wants that.

-

Use rt_update framework on updating a rtentry
 1.112.4.4 14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of a rtentry.
 1.112.4.3 13-Jan-2018  snj Pull up following revision(s) (requested by christos in ticket #496):
sys/net/route.c: revision 1.202-1.203
sys/net/route.h: revision 1.117
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
--
Don't stomp past the end of the array! need __arraycount not sizeof()
Found by chuq, while debugging the sdf.org crashes
Restructure a bit for readability.
 1.112.4.2 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid, otherwise invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guaranteed not to be accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.112.4.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.117.2.5 14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.117.2.4 13-Jan-2019  pgoyette Add the required hooks for rtsock_50 and modify the COMPATCALL() macro
to use the hooks. While the rtsock_50 situation is still sub-optimal
(it includes the main rtsock.c with a whole bunch of function and
variable redefinitions via macros), this at least makes it possible to
load the rtsock_50 code separately from more recent code, rather than
the previous requirement that rtsock_50 be built-in.
 1.117.2.3 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.117.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.117.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.119.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.119.2.1 10-Jun-2019  christos Sync with HEAD
 1.123.2.2 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine (Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.123.2.1 26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.125.2.1 29-Feb-2020  ad Sync with head.
 1.127.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.127.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3 24-Sep-2021  knakahara Add RSS toeplitz hash functions which calculate from mbuf.
 1.2 20-Nov-2019  knakahara "rss_symmetric_key" initializer is too short. Pointed out by ryo@n.o, thanks.

It is not used yet.
 1.1 16-Feb-2018  knakahara branches: 1.1.2; 1.1.6;
Introduce very simple Receive Side Scaling (RSS) utility.

ok by msaitoh@n.o.
 1.1.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.2.2 26-Feb-2018  snj Pull up following revision(s) (requested by knakahara in ticket #567):
distrib/sets/lists/comp/mi: 1.2182-1.2183
sys/dev/pci/if_wm.c: 1.564
sys/dev/pci/ixgbe/ixgbe.c: 1.122
sys/dev/pci/ixgbe/ixgbe_rss.h: 1.3
sys/dev/pci/ixgbe/ixv.c: 1.78
sys/net/Makefile: 1.35-1.36
sys/net/files.net: 1.15
sys/net/rss_config.c: 1.1
sys/net/rss_config.h: 1.1
Introduce very simple Receive Side Scaling (RSS) utility.
ok by msaitoh@n.o.
--
Apply RSS utility to wm(4).
ok by msaitoh@n.o.
--
Apply RSS utility to ixg(4) and ixv(4).
ok by msaitoh@n.o.
--
Fix build failure, sorry.
--
Currently, it is not necessary to install rss_config.h. Pointed out by msaitoh@n.o.
 1.1.2.1 16-Feb-2018  snj file rss_config.c was added on branch netbsd-8 on 2018-02-26 00:25:16 +0000
 1.2 24-Sep-2021  knakahara Add RSS toeplitz hash functions which calculate from mbuf.
 1.1 16-Feb-2018  knakahara branches: 1.1.2;
Introduce very simple Receive Side Scaling (RSS) utility.

ok by msaitoh@n.o.
 1.1.2.2 26-Feb-2018  snj Pull up following revision(s) (requested by knakahara in ticket #567):
distrib/sets/lists/comp/mi: 1.2182-1.2183
sys/dev/pci/if_wm.c: 1.564
sys/dev/pci/ixgbe/ixgbe.c: 1.122
sys/dev/pci/ixgbe/ixgbe_rss.h: 1.3
sys/dev/pci/ixgbe/ixv.c: 1.78
sys/net/Makefile: 1.35-1.36
sys/net/files.net: 1.15
sys/net/rss_config.c: 1.1
sys/net/rss_config.h: 1.1
Introduce very simple Receive Side Scaling (RSS) utility.
ok by msaitoh@n.o.
--
Apply RSS utility to wm(4).
ok by msaitoh@n.o.
--
Apply RSS utility to ixg(4) and ixv(4).
ok by msaitoh@n.o.
--
Fix build failure, sorry.
--
Currently, it is not necessary to install rss_config.h. Pointed out by msaitoh@n.o.
 1.1.2.1 16-Feb-2018  snj file rss_config.h was added on branch netbsd-8 on 2018-02-26 00:25:16 +0000
 1.7 01-Jun-2017  chs remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.6 11-Jan-2017  ozaki-r Get rid of unnecessary header inclusions
 1.5 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.4 15-Nov-2016  ozaki-r Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete a matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.
 1.3 11-Apr-2016  ozaki-r branches: 1.3.2;
Don't use radix tree API directly
 1.2 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.1 31-Mar-2011  dyoung branches: 1.1.2; 1.1.6; 1.1.18; 1.1.36;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.1.36.5 28-Aug-2017  skrll Sync with HEAD
 1.1.36.4 05-Feb-2017  skrll Sync with HEAD
 1.1.36.3 05-Dec-2016  skrll Sync with HEAD
 1.1.36.2 22-Apr-2016  skrll Sync with HEAD
 1.1.36.1 22-Sep-2015  skrll Sync with HEAD
 1.1.18.1 03-Dec-2017  jdolecek update from HEAD
 1.1.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.1.6.1 31-Mar-2011  jruoho file rtbl.c was added on branch jruoho-x86intr on 2011-06-06 09:09:53 +0000
 1.1.2.2 21-Apr-2011  rmind sync with head
 1.1.2.1 31-Mar-2011  rmind file rtbl.c was added on branch rmind-uvmplock on 2011-04-21 01:42:13 +0000
 1.3.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.3.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.256 27-Aug-2022  skrll Add a little const. NFC.
 1.255 09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.
 1.254 03-Feb-2020  roy rtsock: favour ifatoia and ifatoia6 over direct struct casts
 1.253 29-Jan-2020  thorpej Do not reference ifp->if_data directly; use if_export_if_data().
 1.252 01-Sep-2019  roy branches: 1.252.2;
inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETE)
or has failed to be resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
 1.251 22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.250 27-May-2019  ozaki-r branches: 1.250.2;
Don't take softnet_lock in sysctl_rtable

Taking softnet_lock there can cause a locking error with nfs sosend, so we don't.
Having only KERNEL_LOCK is enough because now the routing table is protected by
KERNEL_LOCK that was introduced by the fix for PR 53043.

PR kern/54227 from Paul Ripke
 1.249 29-Apr-2019  pgoyette For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.
 1.248 01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.247 27-Feb-2019  ozaki-r Protect sysctl_rtable with KERNEL_LOCK and softnet_lock

In the function the routing table could be accessed without any locks, which was
unsafe. Actually, on netbsd-7, a kernel panic happened(*). The situation of
locking hasn't changed since netbsd-7 so we still need to hold the big locks on
-current (and netbsd-8) too.

Note that if NET_MPSAFE is enabled, the routing table is protected by its own
lock and we don't need the locks.

Reported and tested on netbsd-7 by sborrill@

(*) http://mail-index.netbsd.org/tech-net/2018/11/08/msg007153.html
 1.246 29-Jan-2019  pgoyette Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
 1.245 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.244 13-Nov-2018  maxv Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.

[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.243 07-Sep-2018  maxv Set unused pr_input field to NULL, discussed on tech-net@.
 1.242 31-Aug-2018  maxv Fix buffer overflow, detected by kASan.

ifconfig gif0 create
ifconfig gif0 up

[ 50.682919] kASan: Unauthorized Access In 0xffffffff80f22655: Addr 0xffffffff81b997a0 [8 bytes, read]
[ 50.682919] #0 0xffffffff8021ce6a in kasan_memcpy <netbsd>
[ 50.692999] #1 0xffffffff80f22655 in m_copyback_internal <netbsd>
[ 50.692999] #2 0xffffffff80f22e81 in m_copyback <netbsd>
[ 50.692999] #3 0xffffffff8103109a in rt_msg1 <netbsd>
[ 50.692999] #4 0xffffffff8159109a in compat_70_rt_newaddrmsg1 <netbsd>
[ 50.692999] #5 0xffffffff81031b0f in rt_newaddrmsg <netbsd>
[ 50.692999] #6 0xffffffff8102c35e in rt_ifa_addlocal <netbsd>
[ 50.692999] #7 0xffffffff80a5287c in in6_update_ifa1 <netbsd>
[ 50.692999] #8 0xffffffff80a54149 in in6_update_ifa <netbsd>
[ 50.692999] #9 0xffffffff80a59176 in in6_ifattach <netbsd>
[ 50.692999] #10 0xffffffff80a56dd4 in in6_if_up <netbsd>
[ 50.692999] #11 0xffffffff80fc5cb8 in if_up_locked <netbsd>
[ 50.703622] #12 0xffffffff80fcc4c1 in ifioctl_common <netbsd>
[ 50.703622] #13 0xffffffff80fde694 in gif_ioctl <netbsd>
[ 50.703622] #14 0xffffffff80fcdb1f in doifioctl <netbsd>
 1.241 25-Apr-2018  ozaki-r branches: 1.241.2;
Fix a deadlock (rt_free vs. route_intr on rt_so_mtx)

It occurs only if NET_MPSAFE is enabled.
 1.240 12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of an update of a rtentry.
 1.239 19-Mar-2018  roy rtsock: log dropped messages that we cannot report to userland
 1.238 25-Jan-2018  ozaki-r branches: 1.238.2;
Fix another deadlock

When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for a reference by the updater itself) to be released.
 1.237 19-Jan-2018  ozaki-r Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr

The deadlock happened only if NET_MPSAFE is on.
 1.236 18-Dec-2017  ozaki-r Fix compile error (may be used uninitialized)

Hmm, __noinline had hidden this error.
 1.235 18-Dec-2017  ozaki-r Revert "Sprinkle __noinline to some non-performance-sensitive functions for debugging"

We should do this kind of tweaks for debugging just locally and personally.

Requested by christos@
 1.234 14-Dec-2017  ozaki-r Fix a bug that tries to psref_acquire ifa with a psref used before

This fixes ATF tests that started to fail by a recent change to psref.
 1.233 14-Dec-2017  ozaki-r Protect ifp returned from route_output_get_ifa surely

An ifp returned from route_output_get_ifa was supposed to be protected
by a returned ifa; if the ifa belongs to ifp, holding the ifa prevents
the ifp from being freed. However route_output_get_ifa can return an ifp
to which a returned ifa doesn't belong. So we need to take a reference
to a returning ifp separately.
 1.232 14-Dec-2017  ozaki-r Sprinkle __noinline to some non-performance-sensitive functions for debugging
 1.231 19-Nov-2017  christos Avoid using a zero family mask.
 1.230 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.229 25-Sep-2017  ozaki-r Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
 1.228 25-Sep-2017  ozaki-r Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
 1.227 01-Jul-2017  christos put the code that returns the sizeof the socket by family in one place.
 1.226 30-Jun-2017  christos Avoid DIAGNOSTIC warning with previous fix and simplify it (don't require
memory alloc/free).
 1.225 30-Jun-2017  ozaki-r Restore the original length of a sockaddr for netmask

route(8) passes a sockaddr for netmask that is truncated with its
prefixlen. However the kernel basically doesn't expect such format
and may read beyond the data. So restore the original length of
the data at the beginning of the kernel for the rest components.

Failures of ATF tests such as route_flags_blackhole6 should
be fixed.
 1.224 28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.223 26-Jun-2017  ozaki-r Drop RTF_UP from a routing message of a deleted ARP/NDP entry
 1.222 26-Jun-2017  ozaki-r Fix ifdef; care about a case w/ INET6 and w/o INET
 1.221 26-Jun-2017  ozaki-r Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry

A message originally included only DST and GATEWAY. Restore it.
 1.220 26-Jun-2017  ozaki-r Fix usage of routing messages on arp -d and ndp -d

It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
 1.219 23-Jun-2017  ozaki-r Tweak lltable_sysctl_dumparp

- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
 1.218 23-Jun-2017  ozaki-r Fix build of kernels without both INET and INET6
 1.217 22-Jun-2017  ozaki-r Purge L2 caches on changing an interface of a route

The change addresses situations similar to PR 51179.
 1.216 16-Jun-2017  ozaki-r Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries

ARP/NDP entries aren't connected routes.

Reported by ryo@
 1.215 16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.214 15-Jun-2017  ozaki-r Simplify

We can assume that rt_ifp is always non-NULL.
 1.213 01-Jun-2017  chs branches: 1.213.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.212 11-Apr-2017  roy Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.
 1.211 24-Mar-2017  ozaki-r Forbid installing a route whose gateway is unreachable

This change needs a tweak in route_output_change to unbreak route
change commands (e.g., route change -inet6 default -reject).

PR kern/52077 (s-yamaguchi@IIJ and ozaki-r@)
 1.210 22-Mar-2017  ozaki-r Tweak and KNF some functions
 1.209 17-Mar-2017  ozaki-r Add missing NULL check

Fix PR kern/52083
 1.208 14-Mar-2017  ozaki-r Add missing pserialize_read_exit

Pointed out by riastradh@
 1.207 14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.206 14-Mar-2017  ozaki-r Fix use of curlwp_bind

There was an error path that returned without curlwp_bindx.
 1.205 14-Mar-2017  ozaki-r Fix race condition in sysctl_iflist

We need to use psref for the ifa iteration because iflist_addr can sleep.
 1.204 14-Mar-2017  ozaki-r Replace DIAGNOSTIC + panic with KASSERT
 1.203 14-Mar-2017  ozaki-r Avoid debug printf just if DIAGNOSTIC
 1.202 21-Feb-2017  ozaki-r Use kmem instead of malloc
 1.201 17-Feb-2017  ozaki-r Fill rmx_locks too

Otherwise userland sees garbage in it.

This should fix t_mtudisc6 failing on babylon5.
 1.200 19-Jan-2017  ozaki-r Disable rt_update mechanism by default

This is a workaround for PR kern/51877. Enable again once the issue
is fixed.
 1.199 12-Dec-2016  ozaki-r branches: 1.199.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.198 26-Oct-2016  ozaki-r Pull RTM_CHANGE code out of route_output to make further changes easy

No functional change.
 1.197 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
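
The locking pattern described above can be sketched in userland C. This is
only an illustration: struct pkt, struct pktq and the pktq_* functions are
hypothetical stand-ins for mbuf, ifqueue, IF_ENQUEUE and IF_DEQUEUE, and a
pthread mutex plays the role of ifq_lock. It is not the kernel code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for mbuf and ifqueue; not the kernel types. */
struct pkt {
	struct pkt *next;
	int id;
};

struct pktq {
	pthread_mutex_t lock;	/* plays the role of ifq_lock */
	struct pkt *head;
	struct pkt *tail;
	int len;
};

static void
pktq_init(struct pktq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->len = 0;
}

/* Producer side (the Layer 2 role): append under the queue lock. */
static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	pthread_mutex_lock(&q->lock);
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side (the Layer 3 role): remove the head under the same lock. */
static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p;

	pthread_mutex_lock(&q->lock);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
		q->len--;
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}
```

Because both sides take the same lock, the enqueue and dequeue paths may
run on different CPUs without corrupting the list, which is the property
that raising the interrupt priority level alone no longer provides.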
 1.196 21-Sep-2016  roy Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
 1.195 01-Sep-2016  roy Split out sysctl_iflist into sysctl_iflist_if and sysctl_iflist_addr.
Setup a command and function pointer in one case statement
instead of having a secondary case statement within a loop.
This makes the code much easier to follow, and possibly to add more compat
in the future.

Don't panic when running an old binary without compat support.
 1.194 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.193 28-Jul-2016  martin PR kern/51371: avoid shifting negative values
 1.192 21-Jul-2016  ozaki-r Make complex RTM_CHANGE code understandable

Tests for route change added recently would reduce the possibility of
regressions.

Reviewed by ryo@
 1.191 07-Jul-2016  ozaki-r branches: 1.191.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.190 16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.189 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.188 17-May-2016  ozaki-r Fix RT_IN_PRINT
 1.187 17-May-2016  ozaki-r Tidy up route_output

Avoid jumping into the middle of a switch statement, use a function instead.
 1.186 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.185 25-Apr-2016  roy Set rtm_pid = curproc->p_pid for a few more messages.
 1.184 25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.183 25-Apr-2016  ozaki-r Fix errno on rt_setgate error

I bet it's not EDQUOT (Disc quota exceeded).
 1.182 08-Apr-2016  christos - remove printf
- fix indent
 1.181 07-Apr-2016  christos Use sockaddr_dl_init
 1.180 06-Apr-2016  christos Don't interpret routing requests by interface index as arp entry additions!
 1.179 05-Apr-2016  ozaki-r Unbreak build of kernels without INET
 1.178 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value as RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  an L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can learn the details of the behavior changes from the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.177 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.176 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.175 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
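
The idea of replacing a variadic protocol hook with an explicit, fully
typed callback can be sketched as follows. All names here (struct msg,
send_cb_t, raw_send_sketch and the two *_output_sketch functions) are
hypothetical and chosen for illustration; the real raw_send, route_output
and key_output have different signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical message type standing in for an mbuf chain. */
struct msg {
	const char *data;
	size_t len;
};

/* One fixed prototype that every callback must match. */
typedef int (*send_cb_t)(struct msg *, void *);

/*
 * The sender no longer looks up a variadic hook in a protocol switch;
 * the caller hands it the output routine directly.
 */
static int
raw_send_sketch(struct msg *m, void *ctx, send_cb_t output)
{
	if (m == NULL || output == NULL)
		return EINVAL;
	return output(m, ctx);
}

/* Two illustrative callbacks with identical, checkable prototypes. */
static int
route_output_sketch(struct msg *m, void *ctx)
{
	int *count = ctx;

	(void)m;
	(*count)++;		/* count messages handled */
	return 0;
}

static int
key_output_sketch(struct msg *m, void *ctx)
{
	size_t *bytes = ctx;

	*bytes += m->len;	/* count bytes handled */
	return 0;
}
```

With a typed callback the compiler rejects a function whose real prototype
differs, which a variadic formal prototype cannot do.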
 1.174 13-Oct-2015  rjs Add core networking support for SCTP.
 1.173 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
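
A minimal sketch of the conversion described above, assuming the caller
samples both clocks at the same moment. The function names are
hypothetical; now_up stands for a monotonic uptime reading (the role of
time_uptime) and now_wall for a wall-clock reading (the role of
time_second).

```c
#include <time.h>

/*
 * Convert a timestamp kept on the monotonic uptime clock into wall-clock
 * time, e.g. before handing an expiration time to a userland program.
 */
static time_t
uptime_to_wall(time_t t_up, time_t now_up, time_t now_wall)
{
	return t_up + (now_wall - now_up);
}

/* The reverse direction, for a wall-clock value received from userland. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_up, time_t now_wall)
{
	return t_wall - (now_wall - now_up);
}

/*
 * Expiry checks compare uptime values only, so a step of the wall clock
 * (for example by settimeofday(2)) cannot expire an entry early or late.
 */
static int
entry_expired(time_t expire_up, time_t now_up)
{
	return now_up >= expire_up;
}
```

For example, an entry that expires at uptime 150, observed at uptime 100
while the wall clock reads 1000000, is reported to userland as expiring at
wall-clock time 1000050.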
 1.172 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
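
The three rules listed above can be sketched in userland C as follows.
struct rte and the rte_* functions are hypothetical stand-ins for rtentry,
rtfree and a lookup routine, and the single-entry "table" is an assumption
made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rtentry. */
struct rte {
	int refcnt;
	int dst;
};

static int rte_live;	/* entries allocated and not yet freed */

/* Allocate an entry; the creator holds the first reference. */
static struct rte *
rte_alloc(int dst)
{
	struct rte *rt = malloc(sizeof(*rt));

	if (rt == NULL)
		return NULL;
	rt->refcnt = 1;
	rt->dst = dst;
	rte_live++;
	return rt;
}

/* Drop one reference; the last one frees the entry. Never use refcnt--. */
static void
rte_free(struct rte *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0) {
		free(rt);
		rte_live--;
	}
}

/*
 * A function returning an entry increments its reference count, so the
 * caller owns a reference for as long as it touches the entry and must
 * call rte_free() when done.
 */
static struct rte *
rte_lookup(struct rte *table_entry, int dst)
{
	if (table_entry == NULL || table_entry->dst != dst)
		return NULL;
	table_entry->refcnt++;
	return table_entry;
}
```

As the commit message notes, a count like this only keeps the entry from
being freed; it does not stop another thread from updating it.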
 1.171 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.170 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.169 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.168 06-Apr-2015  ozaki-r Add hint comments for big ifdef
 1.167 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.166 02-Dec-2014  christos fix debugging printf.
 1.165 02-Dec-2014  christos use the new printing code.
 1.164 05-Sep-2014  matt branches: 1.164.2;
Don't use C++ new keyword
 1.163 09-Aug-2014  rtr branches: 1.163.2; 1.163.4; 1.163.8;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.162 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.161 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.160 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.159 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.158 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.157 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.156 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.155 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.154 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.153 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.152 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.151 07-Jul-2014  rtr return EOPNOTSUPP for pr_stat instead of returning success since we
don't fill in the struct stat passed to us.
 1.150 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.149 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.148 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.147 21-May-2014  rmind raw_detach: rawpcb may be embedded, free using the real size (saved in rcb).
 1.146 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.145 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.144 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.143 25-Feb-2014  pooka branches: 1.143.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.142 24-Jul-2013  kefren report about route tag in sysctl route walker
 1.141 01-Mar-2013  joerg branches: 1.141.6;
Retire OSI network stack. OK core@
 1.140 30-Jan-2012  christos branches: 1.140.6;
- don't copy past the end of sockaddr if we are rounding, zero it out instead,
from mlelstv@
- put a comment explaining the 6 nuls.
 1.139 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.138 12-Dec-2011  roy When adding or scrubbing a prefix, always notify userland even if the
prefix does not have IFA_ROUTE.
Don't scrub the interface in SIOCAIFADDR if the new address doesn't
have IFA_ROUTE. If more functions are added to in_ifscrub then this logic
might need to be revisited.

Fixes PR/26450.
 1.137 31-Oct-2011  yamt branches: 1.137.2; 1.137.6;
remove an unnecessary cast
 1.136 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.135 31-Mar-2011  dyoung Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.134 10-Feb-2011  kefren Allow changing route flags. Should fix PR/40455
OK'ed: dyoung@
 1.133 01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.132 25-Dec-2010  christos branches: 1.132.2; 1.132.4;
merge the length getting code from rt_msg1 and rt_msg2 and make it fail
when the compatibility ifinfo is missing instead of returning junk.
 1.131 12-Nov-2010  roy Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.
 1.130 28-Jun-2010  kefren we need to set rt_ifp even if ifa is the same. Fixes the case when one
changes route to a different ifp but wants to keep the same ifa
 1.129 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.128 02-May-2010  kefren Permit the existence of a route with unlinked ifp and ifa,
making it possible to send a packet on an interface with a
source address from another interface.
 1.127 16-Sep-2009  pooka branches: 1.127.2; 1.127.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.126 12-Sep-2009  tsutsui Make this compile with options RTSOCK_DEBUG.
Noticed by PR kern/41842, but fixed differently.
 1.125 02-Apr-2009  christos Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0
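
Why a shared rounding helper matters can be seen in a small sketch. The
function below is a hypothetical illustration of rounding a sockaddr
length up to an alignment boundary; it is not the RT_ROUNDUP macro itself,
and the alignment values used in the example are assumptions.

```c
#include <stddef.h>

/*
 * Round len up to the next multiple of align, where align is a power of
 * two. A zero length still occupies one full slot, matching the way
 * routing messages pad empty addresses.
 */
static size_t
sa_roundup(size_t len, size_t align)
{
	if (len == 0)
		return align;
	return 1 + ((len - 1) | (align - 1));
}
```

A 17-byte address rounds to 20 with 4-byte alignment but to 24 with 8-byte
alignment, so a program carrying its own copy of the rounding rule with a
different alignment from the kernel's would step through a routing message
at the wrong offsets.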
 1.124 11-Mar-2009  roy Revert r1.119 as the implementation is broken.
 1.123 20-Feb-2009  yamt remove inline from some functions which are not small or critical.
 1.122 14-Feb-2009  christos mention when this will really break, not 2038 but 2145.
 1.121 11-Jan-2009  christos branches: 1.121.2;
we need route_enqueue not to be static
 1.120 11-Jan-2009  christos merge christos-time_t
 1.119 21-Dec-2008  roy When removing routes automatically added, remove the flag from the associated
address.
When changing routes automatically added, move the flag to the new associated
address.
 1.118 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.117 12-Dec-2008  christos RTAX_GENMASK and RTAX_AUTHOR could cause kernel memory corruption because
info struct members could be pointing to free'd memory. Fix from dyoung.
XXX: Pullup to 5.0
 1.116 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

        switch (...->sa_family) {
        case ...:
                ..._init();
                ...
                break;
        ...
        default:
                ..._init();
                ...
                break;
        }

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

        switch (x & (IFF_UP|IFF_RUNNING)) {
        case 0:
                ...
                break;
        case IFF_RUNNING:
                ...
                break;
        case IFF_UP:
                ...
                break;
        case IFF_UP|IFF_RUNNING:
                ...
                break;
        }
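
A compilable version of this four-way switch might look like the sketch
below. The flag values, the enum and the action chosen for each case are
assumptions for illustration only; real drivers use IFF_UP and IFF_RUNNING
from the system headers and do driver-specific work in each arm.

```c
/* Hypothetical flag bits standing in for IFF_UP and IFF_RUNNING. */
#define SK_IFF_UP	0x01u
#define SK_IFF_RUNNING	0x02u

/* Hypothetical actions a driver might take for each combination. */
enum sk_action {
	SK_NOTHING,	/* down and stopped: nothing to do */
	SK_STOP,	/* marked down but still running: stop the device */
	SK_INIT,	/* marked up but not running: initialize it */
	SK_RELOAD	/* up and running: reprogram filters and modes */
};

static enum sk_action
sk_flags_action(unsigned int flags)
{
	/* Mask first so unrelated flag bits cannot miss every case. */
	switch (flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_NOTHING;
	case SK_IFF_RUNNING:
		return SK_STOP;
	case SK_IFF_UP:
		return SK_INIT;
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_RELOAD;
	}
	return SK_NOTHING;	/* not reached: all four cases are covered */
}
```

Listing all four combinations as cases makes it obvious that none has been
forgotten, which is harder to verify in a chain of nested if-else tests.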

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
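
The address check can be sketched as below. The function name and the
ETHER_LEN constant are hypothetical; the test itself relies only on the
standard Ethernet convention that the least significant bit of the first
octet marks a group (multicast or broadcast) address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_LEN 6	/* octets in an Ethernet address */

/*
 * Return true if addr may be assigned to an interface as its own
 * link-layer address: it must not be all zeros and must not be a group
 * address (broadcast ff:ff:ff:ff:ff:ff is a group address too).
 */
static bool
lladdr_acceptable(const uint8_t addr[ETHER_LEN])
{
	static const uint8_t zero[ETHER_LEN] = { 0 };

	if (memcmp(addr, zero, ETHER_LEN) == 0)
		return false;
	if ((addr[0] & 0x01) != 0)
		return false;
	return true;
}
```

The address 02:de:ad:be:ef:02 from the summary passes this check, since
its first octet has the group bit clear.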

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.115 28-Oct-2008  christos branches: 1.115.2;
Fold long lines created by the previous commit. No functional change.
 1.114 28-Oct-2008  dyoung Stop the "Sleazy use of local variables throughout file", replace
'dst' with 'info.rti_info[RTAX_DST]', et cetera.
 1.113 25-Oct-2008  christos branches: 1.113.2;
Fix handling of RTAX_GENMASK. Since this has been removed, userland programs
that set it, ended up causing the kernel to reference random garbage. Ignore
it for compatibility, but add a DIAGNOSTIC message so that userland programs
that set it can be fixed. The only one so far is pppd. Hi dyoung!
 1.112 24-Oct-2008  dyoung Do not gratuitously cast to void *. Remove excess parenthesization.
Do not "test truth" of pointers, but compare with NULL.

No functional change intended.
 1.111 28-Aug-2008  christos - more void * removal
- bcopy -> memcpy
- memmove -> memcpy
- explicitly initialize size to 0 on memory allocation failure.
 1.110 28-Aug-2008  dyoung Do not cast to void * unnecessarily.
 1.109 15-Jun-2008  cube branches: 1.109.2;
Fix previous: a well hidden assignment was lost.
 1.108 15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.107 01-Jun-2008  christos branches: 1.107.2;
Don't obliterate the whole message, preserve the data we have just written
and only zero out the rest.
 1.106 29-May-2008  christos PR/38791: J.T. Conklin: routing socket event header not cleared
 1.105 25-May-2008  dholland fix typo
 1.104 24-May-2008  christos Coverity CID 5013: Add diagnostic test for bad cmd parameter.
 1.103 13-May-2008  dyoung Replace a call to rtrequest() with single dst, mask, gateway
arguments, with a call to rtrequest1() with the rt_addrinfo those
single arguments come from. No functional change intended.
 1.102 11-May-2008  dyoung Use memset, memmove, and memcmp instead of Bzero, Bcopy, and Bcmp,
respectively.
 1.101 24-Apr-2008  ad branches: 1.101.2; 1.101.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.100 29-Mar-2008  yamt branches: 1.100.2; 1.100.4;
route_intr: fill a correct member of sockproto. (sp_family -> sp_protocol)
 1.99 26-Mar-2008  ad Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.
 1.98 20-Feb-2008  matt branches: 1.98.2; 1.98.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
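
As an illustration only, the sed expression above can be reproduced with Python's re module. The function name modernize_types is hypothetical and not part of the commit; this is a sketch of the substitution, not the tool that was actually run.

```python
import re

# Python equivalent of the sed expression s/u_\(int[0-9]*_t\)/u\1/g.
# sed's basic-regex group \( \) becomes a plain ( ) group here.
_PATTERN = re.compile(r"u_(int[0-9]*_t)")


def modernize_types(source: str) -> str:
    """Rewrite BSD-style u_intN_t names to the C99 uintN_t spelling.

    Hypothetical helper for illustration; names without the _t suffix
    (for example u_int) are left untouched, as with the sed expression.
    """
    return _PATTERN.sub(r"u\1", source)


if __name__ == "__main__":
    print(modernize_types("u_int32_t flags; u_int8_t ttl; u_int count;"))
```

Like the original expression, this is a purely textual rewrite, so it also touches comments and strings that mention the old type names.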
 1.97 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.96 05-Dec-2007  dyoung branches: 1.96.4;
Use IFADDR_FIRST(), IFADDR_NEXT().
 1.95 19-Jul-2007  dyoung branches: 1.95.4; 1.95.6; 1.95.12; 1.95.14; 1.95.16;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
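
A minimal sketch of that functional change, written in Python for brevity rather than kernel C. The names masked_destination and lookup, and the dictionary standing in for the forwarding table, are assumptions made for illustration; the kernel operates on sockaddrs and the radix tree, not on strings.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def masked_destination(dst: str, netmask: Optional[str] = None) -> str:
    """Apply the netmask (if supplied) to an IPv4 destination."""
    if netmask is None:
        return dst
    dst_bits = int(ipaddress.IPv4Address(dst))
    mask_bits = int(ipaddress.IPv4Address(netmask))
    return str(ipaddress.IPv4Address(dst_bits & mask_bits))


def lookup(table: Dict[Tuple[str, Optional[str]], str],
           dst: str, netmask: Optional[str] = None) -> Optional[str]:
    """Toy exact-match lookup keyed on (masked destination, netmask)."""
    return table.get((masked_destination(dst, netmask), netmask))


if __name__ == "__main__":
    # Hypothetical one-entry table: 192.168.1.0/24 via 10.0.0.1.
    table = {("192.168.1.0", "255.255.255.0"): "10.0.0.1"}
    # The host bits in the request are masked off before the search.
    print(lookup(table, "192.168.1.77", "255.255.255.0"))
```

Without the masking step, a request naming 192.168.1.77 with a /24 mask would not match the entry stored under 192.168.1.0.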

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.94 09-Jun-2007  dyoung branches: 1.94.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.93 04-Mar-2007  christos branches: 1.93.2; 1.93.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.92 18-Feb-2007  matt Initialize routeswitch with structure initializers.
 1.91 13-Nov-2006  dyoung branches: 1.91.4;
make the routing socket report the right source address in RTM_GET
responses when a source-address selection policy is in use.
 1.90 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.
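
The address predicates in item 3 can be sketched as follows. This is a Python illustration, not the kernel macros: the function names are lowercase stand-ins, and the link-local multicast range is assumed here to be 224.0.0.0/24, which the commit message does not spell out.

```python
import ipaddress

# 169.254/16, as stated in the commit message.
LINKLOCAL = ipaddress.ip_network("169.254.0.0/16")
# The three RFC 1918 private blocks.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Assumption: link-local multicast taken to be 224.0.0.0/24.
LOCAL_GROUP = ipaddress.ip_network("224.0.0.0/24")


def in_linklocal(addr: str) -> bool:
    """True for link-local unicast addresses (169.254/16)."""
    return ipaddress.ip_address(addr) in LINKLOCAL


def in_private(addr: str) -> bool:
    """True for RFC 1918 private addresses."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def in_any_local(addr: str) -> bool:
    """True for link-local unicast and link-local multicast addresses."""
    return in_linklocal(addr) or ipaddress.ip_address(addr) in LOCAL_GROUP


if __name__ == "__main__":
    for a in ("169.254.10.1", "172.31.255.255", "224.0.0.5", "8.8.8.8"):
        print(a, in_linklocal(a), in_private(a), in_any_local(a))
```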

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.89 19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.88 08-Sep-2006  elad branches: 1.88.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.87 03-Sep-2006  christos branches: 1.87.2;
use c99 initializers
 1.86 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.85 27-May-2006  elad add sysctl for routing stats
 1.84 14-May-2006  elad branches: 1.84.2;
integrate kauth.
 1.83 15-Apr-2006  christos Coverity CID 854: Add KASSERT before deref.
 1.82 15-Apr-2006  christos Coverity CID 853: Prevent NULL deref.
 1.81 21-Feb-2006  rpaulo branches: 1.81.2; 1.81.4; 1.81.6;
In sysctl_iflist() don't assume TAILQ_FIRST() will never be NULL.
Prevents crash found by Uwe and fix confirmed working by Jeff Ito (all
on tech-net).
 1.80 24-Dec-2005  perry branches: 1.80.2; 1.80.4; 1.80.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.79 11-Dec-2005  christos merge ktrace-lwp.
 1.78 22-Jun-2005  dyoung branches: 1.78.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.77 09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.76 29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.75 26-Feb-2005  perry nuke trailing whitespace
 1.74 24-Jan-2005  matt branches: 1.74.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.73 23-Jan-2005  matt Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.72 23-Oct-2004  christos branches: 1.72.4;
PR/27286: Tom Ivar Helbekkmo: Allow RTM_GET to work with RTA_IFA|RTA_IFP set.

Quoting Tom: The problem is the special case of an RTM_GET message
that wants interface information included in the response, and
therefore includes the RTA_IFA or RTA_IFP (or both) flags in the
bitmask that says what addresses are supplied in the message. For
the RTM_GET message, it doesn't make sense to supply addresses
other than the one you're asking about, so those two other bits
are, in that specific case, overloaded with this meaning.

There is code in sys/net/rtsock.c to handle the case, but at some
time, extra sanity checking of the received message was added, that
failed to take this possibility into account.

The patch is needed for the Asterisk software PBX to work properly
when it has multiple interfaces active: it needs to ask the kernel
for the IP address of the interface that will be used to communicate
with a given host.
 1.71 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.70 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.69 21-Apr-2004  matt ANSI-fy and some additional de-__P and constification.
 1.68 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.67 24-Mar-2004  atatat branches: 1.67.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.66 28-Dec-2003  atatat Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.
 1.65 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.64 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.63 29-Jun-2003  fvdl branches: 1.63.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.62 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.61 24-Jun-2003  itojun recover code that requires exact match on rtm_change/lock (lost in 1.16).
without it "route change X" would change less-specific route by mistake.
reported by jinmei@kame
 1.60 16-May-2003  itojun use strlcpy
 1.59 02-May-2003  itojun KNF
 1.58 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.57 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.56 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.55 22-Feb-2002  christos branches: 1.55.10;
PR/15703: Sean Boudreau: Case in route_output() where struct rtentry *rt
dereferenced after free.
 1.54 12-Nov-2001  lukem add RCSIDs
 1.53 05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.52 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.51 16-Sep-2001  wiz branches: 1.51.2;
Spell 'occurred' with two 'r's.
 1.50 21-Jul-2001  itojun branches: 1.50.2;
repair validation on RTAX_GENMASK insertion. has been broken since 44bsd.
(freebsd3 has a fix since 1999, but has insufficient validation on sa_len)
 1.49 19-Jul-2001  enami No need to clear part of struct rt_addrinfo in rt_xaddrs() since the only
caller clears the whole struct.
 1.48 18-Jul-2001  thorpej bzero -> memset
 1.47 04-Jun-2001  itojun branches: 1.47.2;
simplify previous change (mbuf length adjustment for rtsock response).
 1.46 04-Jun-2001  itojun adjust routing socket response mbufs to the correct length. sync with kame.
 1.45 17-Jan-2001  itojun branches: 1.45.2;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.44 10-Nov-2000  enami Don't require the size of sockaddr to be rounded up if it was the last one
and was netmask.
 1.43 19-Oct-2000  itojun prevent stack overwrite due to bzero() arg mistake. from msaitoh.
 1.42 28-Sep-2000  erh When grabbing address structures out of a character array make sure that the number of addresses and length of each match up with the size of the data we're handed. Fixes arp on the alpha.
 1.41 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.40 15-Apr-2000  simonb branches: 1.40.4;
Remove some routing specific sysctl function declarations from
<sys/sysctl.h> and make them static in net/rtsock.c.
 1.39 30-Mar-2000  augustss Kill some more register declarations.
 1.38 12-Mar-2000  itojun initialize rn with 0, just to be sure
 1.37 10-Mar-2000  itojun do not touch radix_node with RNF_ROOT on route_output(). this can
cause kernel panic (by non-root invocation of route(8)) on certain
routing table setup.
KAME PR: 217
 1.36 06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.35 17-Feb-2000  itojun backout incomplete hack from KAME codebase (originally from bbn).

the hack tries to respect ifa or ifp passed to RTM_ADD. However, the change
broke certain link-layers. They include:
- midway ethernet card (en*), which uses sockaddr_dl in gateway portion
to pass PVC information. with the patch, the gateway portion will be
overwritten by empty sockaddr_dl and PVC initialization will fail.
- IPv6, which can't set static ND table with the patch (ndp -s), for the
similar reason as above.

There may be an improved hack coming soon; hope the new one does not break others.
 1.34 11-Feb-2000  itojun make assumption in rt_msg1 (len <= MHLEN + MLEN) explicit.
panic if not satisfied.
 1.33 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.32 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.31 09-Jul-1999  thorpej branches: 1.31.2; 1.31.8;
defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.30 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.29 02-Apr-1999  chopps deal with failure of malloc NOWAIT by restarting after mallocing with WAIT.
don't write beyond the users given buffer size (this happened if there was
enough space for the initial malloc to succeed).
 1.28 12-Dec-1998  christos branches: 1.28.2;
fix thinko in previous change.
 1.27 10-Dec-1998  christos IPX counters and centralize statistics routine.
 1.26 01-Mar-1998  fvdl branches: 1.26.6;
Merge with Lite2 + local changes
 1.25 10-Dec-1997  christos PR/2733: Bill Sommerfeld: route change command can crash system. Actually
the case mentioned in the PR was fixed as part of PR/2582. There was a similar
case though that was not handled as part of my initial fix, which was fixed
in FreeBSD. I applied the remaining part from FreeBSD and the code now
matches the respective FreeBSD version. [this probably should be pulled up for 1.3]
 1.24 27-Mar-1997  thorpej branches: 1.24.8;
m_copyback() is now in uipc_mbuf.c
 1.23 22-Feb-1997  thorpej Allow non-superuser to open, listen to, and send safe commands on the
routing socket. Superuser privilege is required for all commands
but RTM_GET.
 1.22 11-Dec-1996  mycroft branches: 1.22.4;
Undo silly part of previous change.
 1.21 01-Jul-1996  christos - Fix PR/2582: default route change without specifying gateway kills system.

While I was there:
- Fix KNF style problem.
- Remove bogus casts to 0, and (caddr_t).
 1.20 23-May-1996  mycroft We must indirect through the higher-level protocol for
PRU_{BIND,CONNECT} so that it can check the sockaddr.
 1.19 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.18 29-Mar-1996  cgd branches: 1.18.4;
make this version of ROUNDUP() consistent with the others in this directory.
(only makes a diff on the alpha.)
 1.17 13-Feb-1996  christos Net prototypes
 1.16 19-Aug-1995  cgd Update to latest code from CSRG.
 1.15 17-Aug-1995  mycroft so_pcb should be a void *.
 1.14 12-Aug-1995  mycroft splnet --> splsoftnet
 1.13 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.12 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.11 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9 11-May-1994  mycroft Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.
 1.8 07-May-1994  cgd kill kinfo stuff, for now
 1.7 10-Feb-1994  mycroft Deprecate af.h.
 1.6 16-Jan-1994  cgd include <machine/cpu.h> not <machine/mtpr.h>
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 04-Sep-1993  jtc branches: 1.4.2;
include systm.h to get prototypes (and possibly inlines) of *max functions.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.4 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.3 08-Nov-1993  mycroft Remove references to af.h.
 1.4.2.2 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.4.2.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.18.4.2 11-Dec-1996  mycroft From trunk:
Fix null pointer dereference when attempting to change the default route
without specifying a gateway.
 1.18.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.22.4.1 12-Mar-1997  is Merge in changes from The Trunk
 1.24.8.1 15-Dec-1997  mellon Pull rev 1.25 up from trunk (christos)
 1.26.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.28.2.1 02-Apr-1999  chopps branches: 1.28.2.1.2; 1.28.2.1.4;
pull-up revision 1.29
 1.28.2.1.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.28.2.1.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.28.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.28.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.28.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.28.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.31.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.31.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.31.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.31.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.40.4.2 25-Jun-2003  msaitoh Pullup revision 1.60 (requested by itojun in ticket #48):
recover code that requires exact match on rtm_change/lock (lost in 1.16).
without it "route change X" would change less-specific route by mistake.
reported by jinmei@kame
 1.40.4.1 19-Oct-2000  he Pull up revision 1.43 (requested by itojun):
Prevent stack overwrite due to bzero() argument mistake.
 1.45.2.11 11-Dec-2002  thorpej Sync with HEAD.
 1.45.2.10 11-Nov-2002  nathanw Catch up to -current
 1.45.2.9 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.45.2.8 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.45.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.45.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.45.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.45.2.4 24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.45.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.45.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.45.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.47.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.47.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.47.2.1 03-Aug-2001  lukem update to -current
 1.50.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.51.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.10.1 24-Jun-2003  grant Pull up revision 1.61 (requested by itojun in ticket #1336):

recover code that requires exact match on rtm_change/lock (lost in
1.16). without it "route change X" would change less-specific route by
mistake. reported by jinmei@kame
 1.63.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.63.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.63.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.63.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.63.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.63.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.63.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.63.2.2 03-Aug-2004  skrll Sync with HEAD
 1.63.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.67.2.1 28-May-2004  tron branches: 1.67.2.1.2;
Pull up revision 1.71 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.67.2.1.2.1 18-May-2005  riz Pull up revision 1.72 via patch (requested by christos in ticket #961):
PR/27286: Tom Ivar Helbekkmo: Allow RTM_GET to work with RTA_IFA|RTA_IFP set.
Quoting Tom: The problem is the special case of an RTM_GET message
that wants interface information included in the response, and
therefore includes the RTA_IFA or RTA_IFP (or both) flags in the
bitmask that says what addresses are supplied in the message. For
the RTM_GET message, it doesn't make sense to supply addresses
other than the one you're asking about, so those two other bits
are, in that specific case, overloaded with this meaning.
There is code in sys/net/rtsock.c to handle the case, but at some
time, extra sanity checking of the received message was added, that
failed to take this possibility into account.
The patch is needed for the Asterisk software PBX to work properly
when it has multiple interfaces active: it needs to ask the kernel
for the IP address of the interface that will be used to communicate
with a given host.
 1.72.4.1 29-Apr-2005  kent sync with -current
 1.74.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.78.2.7 27-Feb-2008  yamt sync with head.
 1.78.2.6 21-Jan-2008  yamt sync with head
 1.78.2.5 07-Dec-2007  yamt sync with head
 1.78.2.4 03-Sep-2007  yamt sync with head.
 1.78.2.3 26-Feb-2007  yamt sync with head.
 1.78.2.2 30-Dec-2006  yamt sync with head.
 1.78.2.1 21-Jun-2006  yamt sync with head.
 1.80.6.2 01-Jun-2006  kardel Sync with head.
 1.80.6.1 22-Apr-2006  simonb Sync with head.
 1.80.4.1 09-Sep-2006  rpaulo sync with head
 1.80.2.1 01-Mar-2006  yamt sync with head.
 1.81.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.81.4.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.81.4.3 19-Apr-2006  elad sync with head.
 1.81.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.81.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.81.2.5 14-Sep-2006  yamt sync with head.
 1.81.2.4 03-Sep-2006  yamt sync with head.
 1.81.2.3 11-Aug-2006  yamt sync with head
 1.81.2.2 26-Jun-2006  yamt sync with head.
 1.81.2.1 24-May-2006  yamt sync with head.
 1.84.2.1 19-Jun-2006  chap Sync with head.
 1.87.2.1 18-Nov-2006  ad Sync with head.
 1.88.2.2 10-Dec-2006  yamt sync with head.
 1.88.2.1 22-Oct-2006  yamt sync with head
 1.91.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.91.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.93.4.1 11-Jul-2007  mjf Sync with head.
 1.93.2.2 20-Aug-2007  ad Sync with HEAD.
 1.93.2.1 15-Jul-2007  ad Sync with head.
 1.94.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.95.16.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.95.16.1 19-Jul-2007  dyoung file rtsock.c was added on branch matt-mips64 on 2007-07-19 20:48:54 +0000
 1.95.14.2 26-Dec-2007  ad Sync with head.
 1.95.14.1 08-Dec-2007  ad Sync with head.
 1.95.12.2 27-Dec-2007  mjf Sync with HEAD.
 1.95.12.1 08-Dec-2007  mjf Sync with HEAD.
 1.95.6.2 23-Mar-2008  matt sync with HEAD
 1.95.6.1 09-Jan-2008  matt sync with HEAD
 1.95.4.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.96.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.98.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.98.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.98.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.98.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.98.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.98.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.100.4.3 17-Jun-2008  yamt sync with head.
 1.100.4.2 04-Jun-2008  yamt sync with head
 1.100.4.1 18-May-2008  yamt sync with head.
 1.100.2.7 29-Dec-2008  christos protect with _KERNEL_OPT the compat netbsd option.
 1.100.2.6 28-Dec-2008  christos ort_metrics -> rt_metrics
rt_metrics -> nrt_metrics
for userland compatibility
 1.100.2.5 27-Dec-2008  christos merge with head.
 1.100.2.4 09-Nov-2008  christos merge with head.
 1.100.2.3 01-Nov-2008  christos Sync with head.
 1.100.2.2 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.100.2.1 29-Mar-2008  christos file rtsock.c was added on branch christos-time_t on 2008-03-29 20:47:02 +0000
 1.101.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.101.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.101.2.5 11-Aug-2010  yamt sync with head.
 1.101.2.4 11-Mar-2010  yamt sync with head
 1.101.2.3 16-Sep-2009  yamt sync with head
 1.101.2.2 04-May-2009  yamt sync with head.
 1.101.2.1 16-May-2008  yamt sync with head.
 1.107.2.1 18-Jun-2008  simonb Sync with head.
 1.109.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.109.2.1 19-Oct-2008  haad Sync with HEAD.
 1.113.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.115.2.4 03-Apr-2009  snj branches: 1.115.2.4.4;
Pull up following revision(s) (requested by christos in ticket #650):
sys/net/route.c: revision 1.117
sys/net/route.h: revision 1.73
sys/net/rtsock.c: revision 1.125
usr.sbin/arp/arp.c: revision 1.48
usr.sbin/pppd/pppd/sys-bsd.c: revision 1.59
Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
 1.115.2.3 15-Mar-2009  snj Pull up following revision(s) (requested by roy in ticket #560):
sys/net/rtsock.c: revision 1.124
Revert r1.119 as the implementation is broken.
 1.115.2.2 09-Jan-2009  snj Pull up following revision(s) (requested by roy in ticket #239):
sys/net/rtsock.c: revision 1.119
When removing routes automatically added, remove the flag from the
associated address.
When changing routes automatically added, move the flag to the new
associated address.
 1.115.2.1 23-Dec-2008  snj Pull up following revision(s) (requested by christos in ticket #202):
sys/net/rtsock.c: revision 1.117
RTAX_GENMASK and RTAX_AUTHOR could cause kernel memory corruption because
info struct members could be pointing to free'd memory. Fix from dyoung.
XXX: Pullup to 5.0
 1.115.2.4.4.2 13-May-2010  matt Make sure all structure lengths are rounded via RT_ROUNDUP in routing messages.
This simplifies the protocol since all items will now start on a RT_ROUNDUP
aligned address independent of the structure.
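
The rounding rule described above can be sketched as follows. This is a minimal illustration, not the kernel macro itself: the names ex_roundup and ex_advance and the choice of sizeof(uint64_t) as the alignment unit are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment unit for this sketch (assumed, not taken from the log). */
#define EX_ALIGN	sizeof(uint64_t)

/*
 * Round a structure length up to the next multiple of EX_ALIGN.
 * A zero length still occupies one aligned slot, so that every item
 * in a message starts on an aligned offset.
 */
static size_t
ex_roundup(size_t len)
{
	if (len == 0)
		return EX_ALIGN;
	return (len + EX_ALIGN - 1) & ~(EX_ALIGN - 1);
}

/* Offset of the item that follows an item of length len placed at off. */
static size_t
ex_advance(size_t off, size_t len)
{
	return off + ex_roundup(len);
}
```

With every length rounded this way, a reader of the message can step from one item to the next without knowing the type of the item it is skipping.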
 1.115.2.4.4.1 27-Apr-2010  matt Make sure each rt_msg has an aligned length.
 1.121.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.127.4.4 21-Apr-2011  rmind sync with head
 1.127.4.3 05-Mar-2011  rmind sync with head
 1.127.4.2 03-Jul-2010  rmind sync with head
 1.127.4.1 30-May-2010  rmind sync with head
 1.127.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.132.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.132.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.132.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.137.6.1 18-Feb-2012  mrg merge to -current.
 1.137.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.137.2.1 17-Apr-2012  yamt sync with head
 1.140.6.3 03-Dec-2017  jdolecek update from HEAD
 1.140.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.140.6.1 23-Jun-2013  tls resync from head
 1.141.6.3 18-May-2014  rmind sync with head
 1.141.6.2 28-Aug-2013  rmind sync with head
 1.141.6.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.143.2.1 10-Aug-2014  tls Rebase.
 1.163.8.2 23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.8.1 28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
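
The class of bug reported above (compiler-inserted padding copied out to userland uninitialized) can be sketched as follows. The structure and function here are hypothetical stand-ins, not the real struct if_msghdr or the actual fix; the sketch only shows the usual remedy of zeroing the whole object before filling in its fields.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical message header for this sketch. On common ABIs the
 * compiler inserts 2 bytes of padding after 'len' so that 'value'
 * is 4-byte aligned.
 */
struct ex_msghdr {
	unsigned short	len;
	unsigned int	value;
};

/*
 * Zero the whole object first so that padding bytes do not carry
 * stale stack or heap contents, then set the real fields.
 */
static void
ex_fill(struct ex_msghdr *m, unsigned short len, unsigned int value)
{
	memset(m, 0, sizeof(*m));
	m->len = len;
	m->value = value;
}
```

Assigning the fields one by one without the memset would leave the padding bytes with whatever the memory held before, and a later copy of sizeof(struct ex_msghdr) bytes would expose them.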
 1.163.4.2 23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.4.1 28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.163.2.2 23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.2.1 28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.164.2.12 28-Aug-2017  skrll Sync with HEAD
 1.164.2.11 05-Feb-2017  skrll Sync with HEAD
 1.164.2.10 05-Dec-2016  skrll Sync with HEAD
 1.164.2.9 05-Oct-2016  skrll Sync with HEAD
 1.164.2.8 09-Jul-2016  skrll Sync with HEAD
 1.164.2.7 29-May-2016  skrll Sync with HEAD
 1.164.2.6 22-Apr-2016  skrll Sync with HEAD
 1.164.2.5 19-Mar-2016  skrll Sync with HEAD
 1.164.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.164.2.3 22-Sep-2015  skrll Sync with HEAD
 1.164.2.2 06-Jun-2015  skrll Sync with HEAD
 1.164.2.1 06-Apr-2015  skrll Sync with HEAD
 1.191.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.191.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.191.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.191.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.191.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.191.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.199.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.213.2.13 29-May-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1276):

sys/net/rtsock.c: revision 1.250

Don't take softnet_lock in sysctl_rtable

Taking softnet_lock there can cause a deadlock with nfs sosend, so we don't.
Having only KERNEL_LOCK is enough because now the routing table is protected by
KERNEL_LOCK that was introduced by the fix for PR 53043.

PR kern/54227 from Paul Ripke
 1.213.2.12 07-Mar-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1203):

sys/net/rtsock.c: revision 1.247

Protect sysctl_rtable with KERNEL_LOCK and softnet_lock

In the function the routing table could be accessed without any locks, which was
unsafe. Actually, on netbsd-7, a kernel panic happened(*). The situation of
locking hasn't changed since netbsd-7 so we still need to hold the big locks on
-current (and netbsd-8) too.

Note that if NET_MPSAFE is enabled, the routing table is protected by its own
lock and we don't need the locks.

Reported and tested on netbsd-7 by sborrill@
(*) http://mail-index.netbsd.org/tech-net/2018/11/08/msg007153.html
 1.213.2.11 21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1101):

sys/net/rtsock.c: revision 1.244

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.213.2.10 05-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #788):

sys/net/rtsock.c: revision 1.241

Fix a deadlock (rt_free vs. route_intr on rt_so_mtx)
It occurs only if NET_MPSAFE is enabled.
 1.213.2.9 14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of a rtentry.
 1.213.2.8 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.213.2.7 28-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #595):
sys/net/if.c: revision 1.398
sys/net/rtsock.c: revision 1.231
remove useless cast, initialize family.
Avoid using a zero family mask.
 1.213.2.6 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for the reference held by the updater itself) to be released.
 1.213.2.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #457):
sys/net/rtsock.c: revision 1.233-1.234, 1.236
Protect ifp returned from route_output_get_ifa surely
An ifp returned from route_output_get_ifa was supposed to be protected
by a returned ifa; if the ifa belongs to ifp, holding the ifa prevents
the ifp from being freed. However route_output_get_ifa can return an ifp
to which a returned ifa doesn't belong. So we need to take a reference
to a returning ifp separately.
--
Fix a bug that tries to psref_acquire ifa with a psref used before
This fixes ATF tests that started to fail by a recent change to psref.
--
Fix compile error (may be used uninitialized)
Hmm, __noinline had hidden this error.
 1.213.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.213.2.3 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
	ah:
	esp:
	ipip:
	ipcomp:
ipsec6:
	ah:
	esp:
	ipip:
	ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten by
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
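
The kind of check described above can be sketched as follows. This is an illustration only: the list node type, the field name, and the newest-first ordering are assumptions made for this example and are not taken from the log or from the real key_validate_savlist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; addtime stands in for lft_c->sadb_lifetime_addtime. */
struct ex_sav {
	uint64_t	addtime;
	struct ex_sav	*next;
};

/*
 * Walk the list once and report whether it is sorted newest first
 * (an ordering assumed for this sketch). The walk is O(n), which is
 * why such a check would only be worth running in debug builds.
 */
static bool
ex_savlist_sorted(const struct ex_sav *head)
{
	const struct ex_sav *s;

	for (s = head; s != NULL && s->next != NULL; s = s->next) {
		if (s->addtime < s->next->addtime)
			return false;
	}
	return true;
}
```

Once the ordering is guaranteed at insertion time, the packet path can take the first suitable entry without comparing timestamps across the whole list.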
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
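
The arithmetic behind this fix can be sketched as follows. This is a minimal illustration, not the code in key.c: struct comb_lifetime and set_lifetime are hypothetical stand-ins for the lifetime fields of the proposal and for key_getcomb_setlifetime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lifetime fields of a proposal (comb). */
struct comb_lifetime {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

/*
 * Derive the soft limit from the hard limit. Scaling the zero-cleared
 * soft field by itself, as the original code did, always yields 0.
 */
static void
set_lifetime(struct comb_lifetime *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

With a hard limit of 86400 seconds, the soft limit becomes 69120 seconds.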
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
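
A capped deferral queue of this kind might look like the sketch below. It is only an illustration: the names, the cap value and the drop-on-full policy are assumptions, not taken from key.c.

```c
#include <stddef.h>

#define DEFER_QUEUE_MAX 4	/* hypothetical cap, not the kernel's value */

/* Hypothetical queued item; stands in for a deferred mbuf. */
struct deferred {
	struct deferred *next;
};

struct defer_queue {
	struct deferred *head;
	struct deferred **tailp;
	size_t len;
};

static void
defer_init(struct defer_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

/* Enqueue for the timer to send later; returns -1 when the cap is hit. */
static int
defer_enqueue(struct defer_queue *q, struct deferred *d)
{
	if (q->len >= DEFER_QUEUE_MAX)
		return -1;	/* assumed policy: the caller drops the item */
	d->next = NULL;
	*q->tailp = d;
	q->tailp = &d->next;
	q->len++;
	return 0;
}
```

Bounding the length keeps memory use predictable when the timer runs less often than packets arrive.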
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It never changes, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, blocks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
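
The shape of the fix can be sketched in userland with POSIX threads. This is an analogy under stated assumptions, not the kernel code: slow_barrier stands in for pserialize_perform, and remove_item, perform_in_progress and perform_calls are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool perform_in_progress = false;
static int perform_calls = 0;

/* Stand-in for pserialize_perform(); it must run without mtx held. */
static void
slow_barrier(void)
{
	perform_calls++;
}

static void
remove_item(void)
{
	pthread_mutex_lock(&mtx);
	/* ... unlink the item from the list here ... */

	/* Wait until no other thread is inside the barrier. */
	while (perform_in_progress)
		pthread_cond_wait(&cv, &mtx);
	perform_in_progress = true;

	/* Drop the mutex so a callback that needs it cannot block on us. */
	pthread_mutex_unlock(&mtx);
	slow_barrier();
	pthread_mutex_lock(&mtx);

	perform_in_progress = false;
	pthread_cond_broadcast(&cv);

	/* ... drain references (localcount_drain in the kernel) ... */
	pthread_mutex_unlock(&mtx);
}
```

The flag and condvar serialize the barrier without adding a second mutex, which is the trade-off the commit message describes.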
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.213.2.2 25-Jul-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #140):
sys/kern/uipc_domain.c: 1.97-1.99
sys/net/rtsock.c: 1.225-1.227
sys/sys/socket.h: 1.123
Restore the original length of a sockaddr for netmask
route(8) passes a sockaddr for netmask that is truncated with its
prefixlen. However the kernel basically doesn't expect such format
and may read beyond the data. So restore the original length of the
data at the beginning of the kernel for the rest of the components.
Failures of ATF tests such as route_flags_blackhole6 should
be fixed.
--
Avoid DIAGNOSTIC warning with previous fix and simplify it (don't require
memory alloc/free).
--
put the code that returns the sizeof the socket by family in one place.
--
don't warn about AF_LINK sockets with sa_len less than the size of the sockaddr
--
don't print diagnostic for AF_LINK
 1.213.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.238.2.25 22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
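
The difference between an expression-style and a statement-style hook call can be sketched as below. HOOK_CALL, struct hook and the helper functions are hypothetical and much simpler than the real MODULE_HOOK_CALL machinery; the sketch only shows why the result must now be returned through a variable.

```c
#include <stddef.h>

/* Hypothetical hook slot: a function pointer that may be unset. */
struct hook {
	int (*fn)(int);
};

/*
 * Statement-style call macro: expands in-line and stores the result
 * in `ret`, so it cannot be used as an expression inside an if ().
 */
#define HOOK_CALL(h, arg, dflt, ret)			\
do {							\
	if ((h).fn != NULL)				\
		(ret) = (h).fn(arg);			\
	else						\
		(ret) = (dflt);				\
} while (0)

/* Example hook implementation. */
static int
twice(int x)
{
	return 2 * x;
}

/* Caller pattern: declare ret, invoke the macro, then test ret. */
static int
call_hook(struct hook h, int arg)
{
	int ret;

	HOOK_CALL(h, arg, -1, ret);
	return ret;
}
```

When the hook is unset, the caller sees the default value passed to the macro.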
 1.238.2.24 21-Jan-2019  pgoyette No need to declare the hook_call() function for void hooks. So
remove and simplify.
 1.238.2.23 18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.238.2.22 15-Jan-2019  pgoyette Remove a couple of unneeded #include-s

XXX There's probably a lot more clean-up that could happen here!
 1.238.2.21 15-Jan-2019  pgoyette Add vectors for sctp_{add,delete}_ipaddr() so we can check them
in rtsock.c rather than depending on the SCTP kernel compile
option. This is similar to what was done previously with NTP.
 1.238.2.20 15-Jan-2019  pgoyette Split sys/net/rtsock.c into two pieces, one of which is applicable only
to -current and one which is shared between -current and COMPAT_50.
 1.238.2.19 14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.238.2.18 13-Jan-2019  pgoyette Add the required hooks for rtsock_50 and modify the COMPATCALL() macro
to use the hooks. While the rtsock_50 situation is still sub-optimal
(it includes the main rtsock.c with a whole bunch of function and
variable redefinitions via macros), this at least makes it possible to
load the rtsock_50 code separately from more recent code, rather than
the previous requirement that rtsock_50 be built-in.
 1.238.2.17 13-Jan-2019  pgoyette Remove the HOOK2 versions of the MODULE_HOOK macros. There were
only a few uses, and using them led to some lack of clarity in the
code. Instead, we now use two separate hooks, with names that
make it clear(er) what we're doing.

This also positions us to start unraveling some of the rtsock_50
mess, which will need (at least) five hooks.
 1.238.2.16 13-Jan-2019  pgoyette Rearrange a bit, put all the sysctl-related stuff at the end of the
file, and enclose it in a single ``#ifdef COMPAT_RTSOCK ... #endif''
block.

XXX Arguably, this code might better belong in its own source file,
but I'll leave that for a future project.
 1.238.2.15 11-Jan-2019  pgoyette Don't accept OIFLIST operation unless the rtsock_70_hook is loaded,
even though the results are otherwise identical to those on current.
 1.238.2.14 11-Jan-2019  pgoyette Rework the various sysctl-related routines to call the correct code
for each version. While here, extract the 5.0 specific code instead
of including in the main rtsock.c code.

Also, clean up all the sysctl-related routines to prevent building
more than one copy, no matter how many places rtsock.c gets #include'd
into!
 1.238.2.13 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.238.2.12 15-Oct-2018  pgoyette Convert a couple more hooks to the MP-safe mechanism.

While here, clean up some headers and remove any that are now empty.
 1.238.2.11 30-Sep-2018  pgoyette Sync with HEAD
 1.238.2.10 29-Sep-2018  pgoyette In MODULE_HOOK_CALL_DECL we don't need to provide the actual argument
list for calling the hook function, nor do we need to provide the
default value (for when the hook has not been set).
 1.238.2.9 18-Sep-2018  pgoyette The COMPAT_HOOK macros were renamed to MODULE_HOOK, adjust all callers
 1.238.2.8 18-Sep-2018  pgoyette Split the COMPAT_CALL_HOOK to separate the declaration from the
implementation. Some hooks are called from multiple source files,
and the old method resulted in duplicate implementations.

Implement MP-safe hooks for the usb_subr_30 code. Pass the helper
functions as arguments to the compat code so it does not have to
determine if the kernel contains usb code.
 1.238.2.7 17-Sep-2018  pgoyette Adapt (most of) the indirect function pointers to the new MP-safe
mechanism. Still remaining are the compat_netbsd32 stuff, and
some usb subroutines.
 1.238.2.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.238.2.5 02-May-2018  pgoyette Synch with HEAD
 1.238.2.4 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.238.2.3 30-Mar-2018  pgoyette Extract compat_14 stuff into its own module
 1.238.2.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.238.2.1 15-Mar-2018  pgoyette Create a separate module for COMPAT_70 code only, and untangle the
70 compat code from the current.
 1.241.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.241.2.1 10-Jun-2019  christos Sync with HEAD
 1.250.2.2 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #168):

sys/net/rtsock.c: revision 1.252
sys/netinet6/nd6_nbr.c: revision 1.168 - 1.172
sys/netinet6/nd6.c: revision 1.262

inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted
(RTM_DELETE) or has failed to be resolved (RTM_MISS).

The latter case can be interpreted as unreachable.

inet6: change rt_announce and llchange to bool in nd6_na_input()
more bool
 1.250.2.1 26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.252.2.1 29-Feb-2020  ad Sync with head.
 1.1 20-Nov-2000  bouyer branches: 1.1.2;
file rtsock.c.old was initially added on branch thorpej_scsipi.
 1.1.2.2 22-Nov-2000  bouyer Should not be there
 1.1.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.23 04-Oct-2022  msaitoh Fix comment (!COMPAT_RTSOCK case). No functional change.
 1.22 01-Jul-2022  riastradh route(4): Use m_copydata, not misaligned mtod struct access.

XXX Maybe this should check rtm_len too like route_output does.

Reported-by: syzbot+d37eaf0a26097572bbbc@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=3cdfefd8b7938c9606ed68b4191e97fabdbd7b08
 1.21 29-Jun-2022  riastradh route(4): Avoid unaligned access to struct rt_msghdr, take two.

Can't even take the address of the misaligned struct member for
memcpy. Just copy the header out into a stack variable instead.

Reported-by: syzbot+083d9be5cb3c2e78ed1c@syzkaller.appspotmail.com
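
The copy-out approach can be sketched in userland as below. In the kernel the data sits in an mbuf chain and is copied with m_copydata; this sketch uses a flat buffer and memcpy instead, and struct msg_hdr and read_hdr are hypothetical stand-ins for struct rt_msghdr and its caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message header; stands in for struct rt_msghdr. */
struct msg_hdr {
	uint16_t len;
	uint8_t version;
	uint8_t type;
	uint32_t seq;
};

/*
 * Copy the header out of a possibly misaligned buffer into a local
 * variable, then read fields from the aligned copy. Returns -1 if the
 * buffer is too short to hold a header.
 */
static int
read_hdr(const unsigned char *buf, size_t buflen, struct msg_hdr *out)
{
	if (buflen < sizeof(*out))
		return -1;
	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Reading fields from the copy avoids both dereferencing a misaligned struct pointer and taking the address of a misaligned member.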
 1.20 26-Jun-2022  riastradh route(4): Avoid unaligned access to struct rt_msghdr.

Reported-by: syzbot+e0048186a5cc97b1c5a6@syzkaller.appspotmail.com
 1.19 26-Jun-2020  roy Adjust prior so that the enforced minimum socket length includes sa_family

Not that the code strictly needs it, but if the macro is ever used
elsewhere then it makes sense as every sockaddr must have it.
The rest of the structure is dictated by the family and in some cases,
truncated on purpose so this is fine.
 1.18 24-Jun-2020  roy Ensure sockaddrs have valid lengths for RO_MISSFILTER.

Thanks to maxv@ for spotting this.
 1.17 13-Mar-2020  christos Use the socket credentials that are established during the socket creation
instead of the current process credentials (which can change via
set{e,}{u,g}id(2) and by passing the fd to a different process). This makes
the routing socket behave like other file descriptors. Proposed in tech-kern.
 1.16 12-Mar-2020  christos move debugging code after the NULL check.
 1.15 22-Feb-2020  maxv pass the address of the field, instead of relying on it being the first
field of the structure, no functional change
 1.14 09-Feb-2020  roy route(4): dst addr could be in a different mbuf for RO_MISSFILTER

While here, the correct assertion is RTAX_DST == 0.
RTA_DST is just a flag.
 1.13 08-Feb-2020  roy route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.
 1.12 29-Jan-2020  thorpej Do not reference ifp->if_data directly; use if_export_if_data().
 1.11 14-Oct-2019  maxv branches: 1.11.2;
Error out if the type is beyond the storage size. No functional change,
since the shift would otherwise 'and' against zero, returning EEXIST.

Reported-by: syzbot+cb68ccdc1ef3aca2d679@syzkaller.appspotmail.com
 1.10 19-Aug-2019  ozaki-r Initialize dom_mowner for MBUFTRACE
 1.9 03-May-2019  pgoyette branches: 1.9.2; 1.9.4;
Only initialize the NET_MPSAFE stuff once, for the non-compat version
of route_init().
 1.8 29-Apr-2019  roy Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.
 1.7 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.6 29-Apr-2019  pgoyette For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.
 1.5 10-Apr-2019  thorpej Avoid a maybe-uninitialized warning by checking for an error return
that might indicate that 'len' was not initialized.
 1.4 01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.3 29-Jan-2019  pgoyette Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
 1.2 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.1 15-Jan-2019  pgoyette branches: 1.1.2;
file rtsock_shared.c was initially added on branch pgoyette-compat.
 1.1.2.7 22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
 1.1.2.6 21-Jan-2019  pgoyette No need to declare the hook_call() function for void hooks. So
remove and simplify.
 1.1.2.5 18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.1.2.4 15-Jan-2019  pgoyette More #include removal
 1.1.2.3 15-Jan-2019  pgoyette Add vectors for sctp_{add,delete}_ipaddr() so we can check them
in rtsock.c rather than depending on the SCTP kernel compile
option. This is similar to what was done previously with NTP.
 1.1.2.2 15-Jan-2019  pgoyette Split sys/net/rtsock.c into two pieces, one of which is applicable only
to -current and one which is shared between -current and COMPAT_50.
 1.1.2.1 15-Jan-2019  pgoyette First pass at extracting the "shared" compat code into its own source
file, rather than burying it in sys/net/rtsock.c and conditionalizing
various pieces.

XXX Not yet used - it will eventually be #include-d by sys/net/rtsock.c
XXX and compat/common/rtsock_50.c
 1.9.4.1 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.9.2.4 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.9.2.3 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.2.2 10-Jun-2019  christos Sync with HEAD
 1.9.2.1 03-May-2019  christos file rtsock_shared.c was added on branch phil-wifi on 2019-06-10 22:09:45 +0000
 1.11.2.1 29-Feb-2020  ad Sync with head.
 1.42 02-Sep-2024  andvar s/compess/compress/
 1.41 06-Apr-2019  msaitoh branches: 1.41.36;
KNF. No functional change.
 1.40 05-Aug-2016  pgoyette branches: 1.40.16;
Actually commit the changes for making this into a loadable module. The
module infrastructure was committed earlier, but the "guts" of the commit
were somehow missed.
 1.39 24-Aug-2015  pooka branches: 1.39.2;
sprinkle _KERNEL_OPT
 1.38 18-Apr-2009  tsutsui branches: 1.38.22; 1.38.40;
Use memcmp(9) and memcpy(9) directly rather than via
local BCMP() and BCOPY() macro.
 1.37 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.36 18-Mar-2009  cegger bcopy -> memcpy
 1.35 18-Mar-2009  cegger bcmp -> memcmp
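For reference, the bcopy/bcmp conversions in revisions 1.35 through 1.38 follow the usual mapping, sketched here with a hypothetical helper (demo_copy_and_compare is not from the tree). The point to watch is that bcopy() takes the source first while memcpy() takes the destination first.

```c
#include <assert.h>
#include <string.h>

/*
 * Old form:  bcopy(src, dst, len);   bcmp(a, b, len)
 * New form:  memcpy(dst, src, len);  memcmp(a, b, len)
 *
 * Returns 1 if the copy compares equal to the source, else 0.
 */
static int
demo_copy_and_compare(const char *src, char *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, len);		/* was: bcopy(src, dst, len) */
	return memcmp(dst, src, len) == 0;	/* was: bcmp(dst, src, len) == 0 */
}
```

Note that bcopy() is defined to handle overlapping buffers and memcpy() is not, so a conversion like this is only safe where the regions cannot overlap; memmove() is the overlap-safe replacement.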
 1.34 15-Jun-2008  christos branches: 1.34.4; 1.34.10;
remove unnecessary casts.
 1.33 20-Feb-2008  matt branches: 1.33.6; 1.33.8; 1.33.10; 1.33.12; 1.33.14;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.32 04-Mar-2007  christos branches: 1.32.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.31 19-Apr-2006  christos branches: 1.31.14;
Perry reports that buf can be NULL, so deal with it.
 1.30 15-Apr-2006  christos Coverity CID 756: Remove bogus NULL checks.
 1.29 15-Apr-2006  christos Coverity CID 755: Protect against NULL deref.
 1.28 11-Dec-2005  thorpej branches: 1.28.4; 1.28.6; 1.28.8; 1.28.10; 1.28.12;
ANSI function decls and application of static.
 1.27 11-Dec-2005  christos merge ktrace-lwp.
 1.26 06-Dec-2004  christos branches: 1.26.12;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.25 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.24 02-May-2003  itojun branches: 1.24.2;
KNF
 1.23 12-Nov-2001  lukem add RCSIDs
 1.22 18-Jul-2001  thorpej bzero -> memset
 1.21 30-Mar-2000  augustss branches: 1.21.6; 1.21.8;
Kill some more register declarations.
 1.20 13-Mar-1999  drochner branches: 1.20.8;
make this compile again
 1.19 12-Mar-1999  perry exterminate ovbcopy. patches provided by Erik Bertelsen, pr-7145
 1.18 12-Dec-1998  christos Synchronize with the Ultrix version of the ppp release.
 1.17 17-May-1997  christos Update to ppp-2.3b5
 1.16 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.15 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.14 13-Feb-1996  christos Net prototypes
 1.13 20-Nov-1995  cgd fix casts; should cast pointers to longs, not ints.
 1.12 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.11 28-Mar-1995  jtc KERNEL -> _KERNEL
 1.10 08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 08-May-1994  paulus Version from ppp-2.1 release.
 1.6 21-Jan-1994  glass got rid of a warning reported by Bill Sommerfeld
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 14-Aug-1993  deraadt branches: 1.4.2;
ppp from paul mackerras
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.20.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.8.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.21.8.1 03-Aug-2001  lukem update to -current
 1.21.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.21.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.24.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.24.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.24.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.24.2.1 03-Aug-2004  skrll Sync with HEAD
 1.26.12.3 27-Feb-2008  yamt sync with head.
 1.26.12.2 03-Sep-2007  yamt sync with head.
 1.26.12.1 21-Jun-2006  yamt sync with head.
 1.28.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.28.10.2 11-May-2006  elad sync with head
 1.28.10.1 19-Apr-2006  elad sync with head.
 1.28.8.1 24-May-2006  yamt sync with head.
 1.28.6.1 22-Apr-2006  simonb Sync with head.
 1.28.4.1 09-Sep-2006  rpaulo sync with head
 1.31.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.32.16.1 23-Mar-2008  matt sync with HEAD
 1.33.14.1 18-Jun-2008  simonb Sync with head.
 1.33.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.33.10.1 04-May-2009  yamt sync with head.
 1.33.8.1 17-Jun-2008  yamt sync with head.
 1.33.6.1 29-Jun-2008  mjf Sync with HEAD.
 1.34.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.34.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.38.40.2 05-Oct-2016  skrll Sync with HEAD
 1.38.40.1 22-Sep-2015  skrll Sync with HEAD
 1.38.22.1 03-Dec-2017  jdolecek update from HEAD
 1.39.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.40.16.1 10-Jun-2019  christos Sync with HEAD
 1.41.36.1 02-Aug-2025  perseant Sync with HEAD
 1.20 05-Mar-2020  riastradh Need opt_inet.h for #ifdef INET, INET6.
 1.19 12-Dec-2016  maya branches: 1.19.16; 1.19.20;
acknowleg -> acknowledg, proceedure -> procedure.
only comments were changed.

from miod
 1.18 20-Feb-2008  matt branches: 1.18.54; 1.18.74; 1.18.78;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.17 11-Dec-2005  thorpej branches: 1.17.46;
ANSI function decls and application of static.
 1.16 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.15 26-Feb-2005  perry branches: 1.15.4;
nuke trailing whitespace
 1.14 06-Dec-2004  christos branches: 1.14.4; 1.14.6;
Sprinkle #ifdef INET to make a GENERIC kernel compile with INET undefined.
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.12 09-Feb-1998  perry branches: 1.12.48;
add multiple inclusion protection (and cleanup).
 1.11 17-May-1997  christos Update to ppp-2.3b5
 1.10 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.9 04-Jul-1995  paulus Latest version of PPP stuff, with packet compression and other
improvements. The PPP kernel code is now split into if_ppp.c,
containing generic PPP support, and ppp_tty.c, which specifically
supports PPP on async tty devices (as a line discipline). This is
so that other devices can be supported without making them look
like ttys.
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 08-May-1994  paulus Version from ppp-2.1 release.
 1.5 15-Jan-1994  deraadt multiple inclusion protection
 1.4 14-Aug-1993  deraadt ppp from paul mackerras
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.12.48.6 11-Dec-2005  christos Sync with head.
 1.12.48.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.48.4 18-Dec-2004  skrll Sync with HEAD.
 1.12.48.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.48.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.48.1 03-Aug-2004  skrll Sync with HEAD
 1.14.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.14.4.1 29-Apr-2005  kent sync with -current
 1.15.4.2 27-Feb-2008  yamt sync with head.
 1.15.4.1 21-Jun-2006  yamt sync with head.
 1.17.46.1 23-Mar-2008  matt sync with HEAD
 1.18.78.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.18.74.1 05-Feb-2017  skrll Sync with HEAD
 1.18.54.1 03-Dec-2017  jdolecek update from HEAD
 1.19.20.1 19-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #787):

sys/altq/altq_flowvalve.h: revision 1.4
sys/net/zlib.h: revision 1.15
sys/dist/pf/net/pfvar.h: revision 1.23
sys/external/bsd/drm2/dist/include/drm/drmP.h: revision 1.38
sys/external/bsd/drm2/dist/drm/drm_drv.c: revision 1.13
sys/net/slcompress.h: revision 1.20

Need opt_inet.h for #ifdef INET, INET6.

Avoid duplicate definition of internal_state struct.

Avoid struct inode.

This is an fs-independent structure in Linux. We don't actually use
it as such; it's just a dummy struct tag. But we do have an actual
struct inode in ufs and in lfs, and using the same struct tag here
confuses ctf leading to four copies of pretty much every drm data
structure.
 1.19.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.10 06-Sep-2015  dholland More on PR 41200: headers that declare ioctls should include sys/ioccom.h.
This covers (I think) all the MI headers outside of external/ (and dist/).
 1.9 10-Dec-2005  elad branches: 1.9.120; 1.9.140;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.8 07-Aug-2003  agc branches: 1.8.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.7 09-Feb-1998  perry branches: 1.7.48;
add multiple inclusion protection (and cleanup).
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 10-Feb-1994  cgd mccanne convinced me that slip.h *should* exist. this is what
i "implemented" for 4.4, and the adjustments to the other files to
match.
 1.3 20-May-1993  cgd add rcs ids to everything, and clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 25-Mar-1993  cgd branches: 1.1.1;
added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.48.4 11-Dec-2005  christos Sync with head.
 1.7.48.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.48.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.48.1 03-Aug-2004  skrll Sync with HEAD
 1.8.16.1 21-Jun-2006  yamt sync with head.
 1.9.140.1 22-Sep-2015  skrll Sync with HEAD
 1.9.120.1 03-Dec-2017  jdolecek update from HEAD
 1.5 07-Feb-2024  msaitoh Remove ryo@'s mail addresses.
 1.4 24-Sep-2021  knakahara Add copyright for no-memcpy toeplitz hash, pointed out by wiz@n.o, thanks.
 1.3 24-Sep-2021  knakahara Import asymmetric toeplitz hash without memcpy implemented by ryo@n.o.

This implementation has better performance than memcpy'ed one.
(30%-60% improvement in micro benchmark)

import from
https://github.com/ryo/l2pkt/blob/master/l2pkt/toeplitz_hash.c
 1.2 05-Apr-2021  yamaguchi s/nitems/__arraycount/
 1.1 30-Jan-2021  jmcneill branches: 1.1.2; 1.1.4;
Add symmetric toeplitz implementation with integration for NICs, from OpenBSD.
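For readers unfamiliar with the hash named in these entries, a bit-at-a-time Toeplitz hash can be sketched as below. This is the textbook form, not the toeplitz.c implementation (which the log says is optimized differently); toeplitz_hash_sketch is a hypothetical name. The symmetric behaviour is obtained here by assuming a key that repeats with a 16-bit period, which is one common construction and is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For every set bit in the input (most significant bit first), XOR in
 * the 32-bit window of the key that starts at the same bit offset.
 * The key must be at least datalen + 4 bytes long.
 */
static uint32_t
toeplitz_hash_sketch(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int b;

	assert(keylen >= datalen + 4);

	/* Initial window: the first 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return hash;
}
```

With a key whose bits repeat every 16 bits, the window at bit offset p equals the window at offset p + 32, so swapping two 4-byte fields (for example source and destination addresses) leaves the hash unchanged.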
 1.1.4.2 03-Apr-2021  thorpej Sync with HEAD.
 1.1.4.1 30-Jan-2021  thorpej file toeplitz.c was added on branch thorpej-futex on 2021-04-03 22:29:01 +0000
 1.1.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.3 24-Sep-2021  knakahara Import asymmetric toeplitz hash without memcpy implemented by ryo@n.o.

This implementation has better performance than memcpy'ed one.
(30%-60% improvement in micro benchmark)

import from
https://github.com/ryo/l2pkt/blob/master/l2pkt/toeplitz_hash.c
 1.2 05-Apr-2021  yamaguchi Include opt_inet.h for INET6
 1.1 30-Jan-2021  jmcneill branches: 1.1.2; 1.1.4;
Add symmetric toeplitz implementation with integration for NICs, from OpenBSD.
 1.1.4.2 03-Apr-2021  thorpej Sync with HEAD.
 1.1.4.1 30-Jan-2021  thorpej file toeplitz.h was added on branch thorpej-futex on 2021-04-03 22:29:01 +0000
 1.1.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.2 10-Sep-2016  pgoyette Move tun.c into the module's own directory, since it is specific to the
module subsystem.
 1.1 10-Sep-2016  pgoyette Add a dummy "tun" module, whose only job is to trigger an autoload of
required module "if_tun". This allows access to /dev/tunN to autoload
the required interface module.

XXX There might be a better place/name for net/tun.c
 1.39 04-Jul-2024  rin net/zlib.c: Add apparent /* FALLTHROUGH */'s to appease compilers

Ideas from t-kusaba at iij, thanks!
 1.38 12-Apr-2022  andvar branches: 1.38.4; 1.38.10;
s/similarily/similarly/
 1.37 11-Jul-2019  msaitoh Fix typo (s/supress/suppress/).
 1.36 19-Feb-2019  christos revert previous. we don't want to change upstream code.
 1.35 18-Feb-2019  christos add fallthrough's
 1.34 29-Dec-2013  pgoyette branches: 1.34.30;
Modularize net/zlib so it can be used by the vnd module (and, eventually,
by an opencrypto module).
 1.33 18-Mar-2009  cegger branches: 1.33.12; 1.33.22; 1.33.26;
Ansify function definitions w/o arguments. Generated with sed.
 1.32 16-Mar-2009  cegger ansify function definitions
 1.31 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.30 05-May-2008  ad branches: 1.30.8; 1.30.14;
Back out previous. It broke the build.
 1.29 04-May-2008  ad Move zlib out of net/ and into kern/. It would probably be better to use
the reachover Makefiles and libz, but this is already here and it works.
 1.28 16-Nov-2006  christos branches: 1.28.52;
__unused removal on arguments; approved by core.
 1.27 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.26 10-May-2006  mrg branches: 1.26.8; 1.26.10;
quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree.
 1.25 15-Apr-2006  christos Don't use KASSERT, return an error instead to fix the build.
 1.24 15-Apr-2006  christos Coverity CID 1193: Add KASSERT before negative array deref.
 1.23 14-Jan-2006  christos branches: 1.23.2; 1.23.4; 1.23.6; 1.23.8; 1.23.10;
prepare for userland compilation.
 1.22 11-Dec-2005  christos branches: 1.22.2;
merge ktrace-lwp.
 1.21 29-May-2005  christos branches: 1.21.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.20 26-Feb-2005  perry nuke trailing whitespace
 1.19 20-Aug-2002  kristerw branches: 1.19.6; 1.19.14; 1.19.16;
#if 0 a couple of unused functions.
 1.18 07-May-2002  tron branches: 1.18.2;
Fix memory leak which occurs after an allocation failure.
 1.17 13-Mar-2002  fvdl Upgrade this generated version to be based on zlib-1.1.4
 1.16 23-Dec-2001  thorpej Do not provide memcpy()/memset()/memcmp() macros as wrappers
around b*() functions (!!).
 1.15 12-Nov-2001  lukem add RCSIDs
 1.14 14-Oct-2001  simonb Put the storage class first in an array declaration.
 1.13 05-Feb-2001  chs branches: 1.13.2; 1.13.4;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.12 18-Jan-2001  jdolecek constify
 1.11 17-Jan-2001  jdolecek mark local constant stuff as const, so that it's pushed to text segment
 1.10 30-Mar-2000  augustss branches: 1.10.4;
Kill some more register declarations.
 1.9 19-Nov-1999  ragge Include param.h instead of types.h, to get mem* macros on vax.
 1.8 15-Feb-1999  hubertf branches: 1.8.8; 1.8.14;
RCS ID police
 1.7 02-May-1998  christos Merge changes from pppd-2.3.4; adds ppp-deflate-draft stuff and updates
zlib. Maybe we can merge our other copy of zlib with this one now and
avoid having two copies?
 1.6 17-May-1997  christos Update to ppp-2.3b5
 1.5 13-Mar-1997  fvdl Avoid 'unused variable' warning for copyright string, like in the
previous zlib.c version in the tree.
 1.4 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.3 18-Sep-1996  scottr Use sys/types.h and sys/systm.h to bring in prototypes for bzero() and
bcopy(), instead of string.h
 1.2 16-Mar-1996  christos branches: 1.2.4;
#if 0 unused string
 1.1 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.2.4.1 11-Dec-1996  mycroft From trunk:
Use sys/types.h and sys/systm.h to bring in prototypes for bzero() and
bcopy(), instead of string.h.
 1.8.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.8.2 11-Feb-2001  bouyer Sync with HEAD.
 1.8.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 21-Mar-2002  he Revert pull-up of revision 1.13 (requested by he):
Need to reexpose local definition of MAX.
 1.10.4.1 20-Mar-2002  he Pull up revisions 1.11-1.17 (requested by fvdl):
Upgrade libz to 1.1.4 due to a possible security bug.
 1.13.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.13.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.4.2 16-Mar-2002  jdolecek Catch up with -current.
 1.13.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.2.6 27-Aug-2002  nathanw Catch up to -current.
 1.13.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.13.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.13.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.1 22-Oct-2001  nathanw Catch up to -current.
 1.18.2.1 29-Aug-2002  gehenna catch up with -current.
 1.19.16.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.19.14.1 29-Apr-2005  kent sync with -current
 1.19.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.6.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.21.2.2 30-Dec-2006  yamt sync with head.
 1.21.2.1 21-Jun-2006  yamt sync with head.
 1.22.2.1 15-Jan-2006  yamt sync with head.
 1.23.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.8.2 11-May-2006  elad sync with head
 1.23.8.1 19-Apr-2006  elad sync with head.
 1.23.6.1 24-May-2006  yamt sync with head.
 1.23.4.2 01-Jun-2006  kardel Sync with head.
 1.23.4.1 22-Apr-2006  simonb Sync with head.
 1.23.2.1 09-Sep-2006  rpaulo sync with head
 1.26.10.2 10-Dec-2006  yamt sync with head.
 1.26.10.1 22-Oct-2006  yamt sync with head
 1.26.8.1 18-Nov-2006  ad Sync with head.
 1.28.52.1 04-May-2009  yamt sync with head.
 1.30.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.30.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.33.26.1 18-May-2014  rmind sync with head
 1.33.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.33.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.34.30.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.34.30.1 10-Jun-2019  christos Sync with HEAD
 1.38.10.1 02-Aug-2025  perseant Sync with HEAD
 1.38.4.1 20-Jul-2024  martin Pull up following revision(s) (requested by rin in ticket #739):

sys/net/zlib.c: revision 1.39
sys/conf/copts.mk: revision 1.12 (patch)

net/zlib.c: Add apparent /* FALLTHROUGH */'s to appease compilers
Ideas from t-kusaba at iij, thanks!

sys/conf/copts.mk: Drop fallthrough hack for zlib.c
 1.15 05-Mar-2020  riastradh Avoid duplicate definition of internal_state struct.
 1.14 25-Mar-2009  darran branches: 1.14.64; 1.14.68;
Fixes PR kern/41069 and PR kern/41070.

Extends the Opencrypto API to allow the destination buffer size to be
specified when it's not the same size as the input buffer (i.e. for
operations like compress and decompress).
The crypto_op and crypt_n_op structures gain a u_int dst_len field.
The session_op structure gains a comp_alg field to specify a compression
algorithm.
Moved four ioctls to new ids; CIOCGSESSION, CIOCNGSESSION, CIOCCRYPT,
and CIOCNCRYPTM.
Added four backward compatible ioctls; OCIOCGSESSION, OCIOCNGSESSION,
OCIOCCRYPT, and OCIOCNCRYPTM.

Backward compatibility is maintained in ocryptodev.h and ocryptodev.c which
implement the original ioctls and set dst_len and comp_alg to 0.

Adds user-space access to compression features.

Adds software gzip support (CRYPTO_GZIP_COMP).

Adds the fast version of crc32 from zlib to libkern. This should be generally
useful and provide a place to start normalizing the various crc32 routines
in the kernel. The crc32 routine is used in this patch to support GZIP.

With input and support from tls@NetBSD.org.
 1.13 14-Mar-2009  dsl Remove all the __P() from sys (excluding sys/dist)
Diff checked with grep and MK1 eyeball.
i386 and amd64 GENERIC and sys still build.
 1.12 14-Jan-2006  christos branches: 1.12.72; 1.12.82; 1.12.84; 1.12.88; 1.12.92;
zlib 1.2.3 changed the include protection variable names; adjust.
 1.11 11-Dec-2005  christos branches: 1.11.2;
Protect zlib.h with the same symbol as userland.
XXX: We should either not install this, or have only one copy.
 1.10 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.9 29-May-2005  christos branches: 1.9.2;
- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.8 26-Feb-2005  perry nuke trailing whitespace
 1.7 08-Jul-2003  itojun branches: 1.7.8; 1.7.10;
prototype must not have variable name
 1.6 13-Mar-2002  fvdl branches: 1.6.12;
Upgrade this generated version to be based on zlib-1.1.4
 1.5 15-Feb-1999  hubertf branches: 1.5.18; 1.5.20; 1.5.22;
RCS ID police
 1.4 02-May-1998  christos Merge changes from pppd-2.3.4; adds ppp-deflate-draft stuff and updates
zlib. Maybe we can merge our other copy of zlib with this one now and
avoid having two copies?
 1.3 17-May-1997  christos Update to ppp-2.3b5
 1.2 12-Mar-1997  christos Update to ppp-2.3b4; from Paul Mackerras
 1.1 15-Mar-1996  paulus Added packet filtering, support for "PPP Deflate" packet compression,
trivial multicast support, and support for xon/xoff output flow
control to the PPP subsystem. Fixed several bugs, including making
the accumulation and resetting of statistics more consistent. State
for the VJ compressor is now dynamically allocated.
 1.5.22.1 16-Mar-2002  jdolecek Catch up with -current.
 1.5.20.1 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.5.18.1 20-Mar-2002  he Pull up revision 1.6 (requested by fvdl):
Upgrade libz to 1.1.4 due to a possible security bug.
 1.6.12.6 11-Dec-2005  christos Sync with head.
 1.6.12.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.12.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.6.12.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.12.2 18-Sep-2004  skrll Sync with HEAD.
 1.6.12.1 03-Aug-2004  skrll Sync with HEAD
 1.7.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.8.1 29-Apr-2005  kent sync with -current
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.11.2.1 15-Jan-2006  yamt sync with head.
 1.12.92.1 21-Apr-2010  matt sync to netbsd-5
 1.12.88.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.84.1 03-May-2009  snj Pull up following revision(s) (requested by tls in ticket #611):
sys/lib/libkern/Makefile: patch
sys/lib/libkern/crc32.c: revision 1.1
sys/lib/libkern/crc32.h: revision 1.1
sys/lib/libkern/libkern.h: revision 1.89
sys/lib/libkern/arch/i386/Makefile.inc: revision 1.28
sys/net/zlib.h: revision 1.14 via patch
sys/opencrypto/crypto.c: revision 1.33
sys/opencrypto/cryptodev.c: revision 1.46
sys/opencrypto/cryptodev.h: revision 1.16
sys/opencrypto/cryptosoft.c: revision 1.24
sys/opencrypto/cryptosoft.h: revision 1.6
sys/opencrypto/deflate.h: revision 1.6
sys/opencrypto/cryptosoft_xform.c: revision 1.12
sys/opencrypto/deflate.c: revision 1.13
sys/opencrypto/files.opencrypto: revision 1.20
sys/opencrypto/ocryptodev.c: revision 1.1
sys/opencrypto/ocryptodev.h: revision 1.1
sys/opencrypto/xform.c: revision 1.18
sys/opencrypto/xform.h: revision 1.10
Fixes PR kern/41069 and PR kern/41070.

Extends the Opencrypto API to allow the destination buffer size to be
specified when it's not the same size as the input buffer (i.e. for
operations like compress and decompress).
The crypto_op and crypt_n_op structures gain a u_int dst_len field.
The session_op structure gains a comp_alg field to specify a compression
algorithm.
Moved four ioctls to new ids; CIOCGSESSION, CIOCNGSESSION, CIOCCRYPT,
and CIOCNCRYPTM.
Added four backward compatible ioctls; OCIOCGSESSION, OCIOCNGSESSION,
OCIOCCRYPT, and OCIOCNCRYPTM.

Backward compatibility is maintained in ocryptodev.h and ocryptodev.c which
implement the original ioctls and set dst_len and comp_alg to 0.

Adds user-space access to compression features.

Adds software gzip support (CRYPTO_GZIP_COMP).

Adds the fast version of crc32 from zlib to libkern. This should be generally
useful and provide a place to start normalizing the various crc32 routines
in the kernel. The crc32 routine is used in this patch to support GZIP.

With input and support from tls@NetBSD.org.
 1.12.82.1 28-Apr-2009  skrll Sync with HEAD.
 1.12.72.1 04-May-2009  yamt sync with head.
 1.14.68.1 19-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #787):

sys/altq/altq_flowvalve.h: revision 1.4
sys/net/zlib.h: revision 1.15
sys/dist/pf/net/pfvar.h: revision 1.23
sys/external/bsd/drm2/dist/include/drm/drmP.h: revision 1.38
sys/external/bsd/drm2/dist/drm/drm_drv.c: revision 1.13
sys/net/slcompress.h: revision 1.20

Need opt_inet.h for #ifdef INET, INET6.

Avoid duplicate definition of internal_state struct.

Avoid struct inode.

This is an fs-independent structure in Linux. We don't actually use
it as such; it's just a dummy struct tag. But we do have an actual
struct inode in ufs and in lfs, and using the same struct tag here
confuses ctf leading to four copies of pretty much every drm data
structure.
 1.14.64.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1 26-May-2009  pooka branches: 1.1.2;
Install agr ioctl header and stop putting our hand under the sys skirt
in ifconfig.
 1.1.2.2 20-Jun-2009  yamt sync with head
 1.1.2.1 26-May-2009  yamt file Makefile was added on branch yamt-nfs-mp on 2009-06-20 07:20:33 +0000
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file files.agr was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file files.agr was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file files.agr was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the added file.

Note: currently, there are just LACP and Marker protocol,
however slowprotocols is independent of them.
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023_slowprotocols.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023_slowprotocols.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023_slowprotocols.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.2 11-Dec-2005  christos branches: 1.2.26;
merge ktrace-lwp.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 26-Feb-2007  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023_tlv.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023_tlv.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023_tlv.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.2 10-Dec-2005  elad branches: 1.2.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023_tlv.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023_tlv.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023_tlv.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_impl.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_impl.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_impl.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.13 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.12 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Neither agr(4) nor lagg(4) is running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
 1.11 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.10 01-Jul-2011  joerg branches: 1.10.54; 1.10.60;
Fix memset usage.
 1.9 29-May-2009  darran Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.

Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).

Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).

In consultation with tls@.
 1.8 26-Aug-2007  dyoung branches: 1.8.26; 1.8.36; 1.8.40; 1.8.44;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.7 09-Jul-2007  ad branches: 1.7.2; 1.7.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.6 20-May-2007  yamt use mutex.
 1.5 22-Feb-2007  thorpej branches: 1.5.4; 1.5.6;
TRUE -> true, FALSE -> false
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.3 03-Sep-2007  yamt sync with head.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.5.6.1 11-Jul-2007  mjf Sync with head.
 1.5.4.3 09-Oct-2007  ad Sync with head.
 1.5.4.2 01-Jul-2007  ad Adapt to callout API change.
 1.5.4.1 08-Jun-2007  ad Sync with head.
 1.7.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.7.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.8.44.1 21-Apr-2010  matt sync to netbsd-5
 1.8.40.1 23-Jul-2009  jym Sync with HEAD.
 1.8.36.1 05-Jun-2009  snj Pull up following revision(s) (requested by 792):
sys/dev/pci/if_wm.c: revision 1.175 via patch
sys/net/if_ethersubr.c: revision 1.172 via patch
sys/net/agr/ieee8023ad_lacp.c: revision 1.9 via patch
sys/net/agr/if_agr.c: revision 1.23 via patch
sys/net/agr/if_agrether.c: revision 1.7 via patch
sys/net/agr/if_agrvar_impl.h: revision 1.8 via patch
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCSIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
 1.8.26.1 20-Jun-2009  yamt sync with head
 1.10.60.1 29-Feb-2020  ad Sync with head.
 1.10.54.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3 16-Dec-2008  christos replace bitmask_snprintf(9) with snprintb(3)
 1.2 10-Dec-2005  elad branches: 1.2.70; 1.2.74; 1.2.84;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.2.74.1 04-May-2009  yamt sync with head.
 1.2.70.1 17-Jan-2009  mjf Sync with HEAD.
 1.7 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.6 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.5 16-Dec-2008  christos replace bitmask_snprintf(9) with snprintb(3)
 1.4 11-Dec-2005  christos branches: 1.4.70; 1.4.74; 1.4.84;
merge ktrace-lwp.
 1.3 12-Aug-2005  yamt include callout.h explicitly.
 1.2 01-Jun-2005  yamt branches: 1.2.2;
constify.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_debug.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_debug.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_debug.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.2.1 21-Jun-2006  yamt sync with head.
 1.4.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.4.74.1 04-May-2009  yamt sync with head.
 1.4.70.1 17-Jan-2009  mjf Sync with HEAD.
 1.3 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.2 01-Jun-2005  yamt branches: 1.2.2;
constify.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_debug.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.4 11-Dec-2005  christos Sync with head.
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_debug.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_debug.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.2.1 21-Jun-2006  yamt sync with head.
 1.5 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.4 29-Oct-2006  yamt branches: 1.4.4;
make agr headers include lock.h and queue.h by themselves.
 1.3 10-Dec-2005  elad branches: 1.3.20; 1.3.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.2 01-Jun-2005  yamt branches: 1.2.2;
make lacp_timer_funcs static.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_impl.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.4 11-Dec-2005  christos Sync with head.
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_impl.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_impl.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.2.3 26-Feb-2007  yamt sync with head.
 1.2.2.2 30-Dec-2006  yamt sync with head.
 1.2.2.1 21-Jun-2006  yamt sync with head.
 1.3.22.1 10-Dec-2006  yamt sync with head.
 1.3.20.1 18-Nov-2006  ad Sync with head.
 1.4.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.6 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.5 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_select.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_select.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_select.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_sm.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_sm.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_sm.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.5 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_sm_mux.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_sm_mux.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_sm_mux.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.4 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.3 11-Dec-2005  christos merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_sm_ptx.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_sm_ptx.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_sm_ptx.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.5 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_sm_rx.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_sm_rx.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_sm_rx.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.5 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.4 15-May-2020  maxv hardclock_ticks -> getticks()
 1.3 11-Dec-2005  christos merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_sm_tx.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_sm_tx.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_sm_tx.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.6 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.5 22-Oct-2006  uebayasi const static -> static const
 1.4 11-Dec-2005  christos branches: 1.4.20; 1.4.22;
merge ktrace-lwp.
 1.3 12-Aug-2005  yamt include callout.h explicitly.
 1.2 01-Jun-2005  yamt branches: 1.2.2;
make lacp_timer_funcs static.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_lacp_timer.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_lacp_timer.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_lacp_timer.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.2.2 30-Dec-2006  yamt sync with head.
 1.2.2.1 21-Jun-2006  yamt sync with head.
 1.4.22.1 22-Oct-2006  yamt sync with head
 1.4.20.1 18-Nov-2006  ad Sync with head.
 1.6 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the newly added file.

Note: currently, there are just the LACP and Marker protocols;
however, slowprotocols is independent of them.
 1.5 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Neither agr(4) nor lagg(4) is running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
 1.4 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_marker.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_marker.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_marker.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file ieee8023ad_marker.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file ieee8023ad_marker.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file ieee8023ad_marker.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.57 29-Jun-2024  riastradh if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.56 18-Sep-2022  thorpej Eliminate use of IFF_OACTIVE.
 1.55 20-Jun-2022  yamaguchi Handle frames whose VLAN ID is 0 as non-VLAN frames,
even if the VLAN tag is stripped by hardware offloading
 1.54 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.53 30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Neither agr(4) nor lagg(4) is running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
 1.52 02-Aug-2021  andvar fix various typos in comments and log messages.
 1.51 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.50 06-Oct-2019  uwe branches: 1.50.2;
xc_barrier - convenience function to xc_broadcast() a nop.

Make the intent more clear and also avoid a bunch of (xcfunc_t)nullop
casts that gcc 8 -Wcast-function-type is not happy about.
 1.49 26-Apr-2019  pgoyette Some more empty-string --> NULL conversions for module dependencies
 1.48 23-Mar-2019  pgoyette Replace compile-time checking for vlan code with a module hook.

Should resolve the errors reported on irc when booting a kernel which
has agr without vlan:


[ 1.0000000] WARNING: module error: built-in module if_agr can't find builtin dependency `if_vlan'
[ 1.0000000] WARNING: module error: built-in module if_agr prerequisite if_vlan failed, error 2
 1.47 26-Jun-2018  msaitoh branches: 1.47.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misinterpreted in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.46 25-Jan-2018  christos branches: 1.46.2;
Add locking.
 1.45 16-Jan-2018  knakahara Fix agr(4) module build. Reviewed by pgoyette@n.o, thanks.
 1.44 15-Jan-2018  maxv Mmh, fix a weird mistake: the guy who added #if NVLAN > 0 forgot to
actually include vlan.h, so the branches are never compiled.

They don't compile, by the way, so fix that too, by reproducing the vlan
input path of ether_input().
 1.43 06-Dec-2017  ozaki-r Ensure to not turn on IFF_RUNNING of an interface until its initialization completes

And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
 1.42 06-Dec-2017  ozaki-r Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
 1.41 28-Jan-2017  maya branches: 1.41.6;
Switch agr(4) to use a workqueue. This is necessary because during
a callout, it allocates memory with M_WAITOK, which triggers a
DEBUG assert.

XXX we should drain the workqueue.

ok riastradh
 1.40 15-Dec-2016  ozaki-r branches: 1.40.2;
Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce the amount of code
- We can provide the same behavior between drivers
- Where/When if_ipackets is counted up
- Note that some drivers still update packet statistics in their own
way (periodical update)
- bpf_mtap now runs in softint
- This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.39 07-Aug-2016  christos modularize some more drivers and merge the module glue
 1.38 20-Jul-2016  ozaki-r Apply pserialize to some iterations of IP address lists
 1.37 07-Jul-2016  ozaki-r branches: 1.37.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.36 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.35 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit, hide the internals of rcvif operations, and reduce the diff of
the upcoming change.

No functional change.
 1.34 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.33 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.32 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.31 12-Sep-2013  martin branches: 1.31.6;
Remove unused variable
 1.30 19-Oct-2011  dyoung branches: 1.30.2; 1.30.12; 1.30.16;
Use if_flags_set() and if_addr_init() instead of ifp->if_ioctl().
 1.29 11-Aug-2010  pgoyette Keep condvar wmesg within 8 char limit
 1.28 26-May-2010  dyoung Change sc_wrports from an int to a bool and "test truth" instead of
comparing with 0.

Add 'volatile' to several other state variables that need it.
 1.27 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.26 08-Feb-2010  dyoung branches: 1.26.2;
Take another stab at fixing the LOCKDEBUG panic reported in PR
kern/39940 and by Martti Kuparinen on current-users@: replace the
ioctl lock with finer-grained locking. Lock the ports list and
wait to if_clone_destroy() until all threads are out of the softc.

Thanks to Martti Kuparinen for testing these changes.
 1.25 19-Jan-2010  pooka branches: 1.25.2;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.24 09-Jun-2009  yamt comment style. no functional change.
 1.23 29-May-2009  darran Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.

Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).

Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).

In consultation with tls@.
 1.22 07-Nov-2008  dyoung branches: 1.22.4;
*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
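
As a compilable illustration of the switch shape above (a sketch only: the
flag values and the action chosen for each case are assumptions made for
this example, and the ex_* names are hypothetical):

```c
#include <assert.h>

/* Illustrative flag bits; a kernel takes IFF_UP/IFF_RUNNING from <net/if.h>. */
#define EX_IFF_UP	0x0001
#define EX_IFF_RUNNING	0x0040

/* Hypothetical actions a driver might take for each combination. */
enum ex_action { EX_NOTHING, EX_STOP, EX_INIT, EX_RECONFIG };

static enum ex_action
ex_flags_action(unsigned flags)
{
	switch (flags & (EX_IFF_UP|EX_IFF_RUNNING)) {
	case 0:				/* down and not running */
		return EX_NOTHING;
	case EX_IFF_RUNNING:		/* marked down, still running */
		return EX_STOP;
	case EX_IFF_UP:			/* marked up, not yet running */
		return EX_INIT;
	case EX_IFF_UP|EX_IFF_RUNNING:	/* up and running */
		return EX_RECONFIG;
	default:	/* not reachable: the mask leaves only the four cases */
		assert(0);
		return EX_NOTHING;
	}
}
```

Masking first means unrelated flag bits never affect which case is taken.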

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
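
The callback contract described above (ENETRESET means "reinitialize", 0
means success, anything else is an error) might be consumed roughly as
follows. This is a userland sketch with hypothetical ex_* names, not the
ether_ioctl() code; falling back to a full init when no callback is
registered is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_eth {
	int	(*ifflags_cb)(struct ex_eth *);	/* registered by the driver */
	int	(*init)(struct ex_eth *);	/* full reinitialization */
	int	init_calls;			/* bookkeeping for the example */
};

/* Called when the interface flags change. */
static int
ex_flags_changed(struct ex_eth *ec)
{
	int error;

	assert(ec != NULL && ec->init != NULL);
	if (ec->ifflags_cb == NULL)
		return ec->init(ec);	/* assumption: no callback, reset */
	error = ec->ifflags_cb(ec);
	if (error == ENETRESET)
		return ec->init(ec);	/* callback asked for a reset */
	return error;			/* 0 on success, else the failure */
}

/* Sample callbacks and init routine for the sketch. */
static int ex_cb_ok(struct ex_eth *ec) { (void)ec; return 0; }
static int ex_cb_needs_reset(struct ex_eth *ec) { (void)ec; return ENETRESET; }
static int ex_init(struct ex_eth *ec) { ec->init_calls++; return 0; }
```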

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
no longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
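
A check of this kind can be sketched as below. It is an illustration under
stated assumptions, not the ether_ioctl() code: ex_lladdr_ok and
EX_ETHER_ADDR_LEN are hypothetical names. It relies on the fact that the
least significant bit of the first octet marks a group (multicast) address,
which also covers the broadcast address ff:ff:ff:ff:ff:ff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_ETHER_ADDR_LEN 6

/* Return true if addr may be assigned as an interface's unicast address. */
static bool
ex_lladdr_ok(const uint8_t addr[EX_ETHER_ADDR_LEN])
{
	static const uint8_t zero[EX_ETHER_ADDR_LEN];

	assert(addr != NULL);
	if (addr[0] & 0x01)	/* group bit: multicast or broadcast */
		return false;
	if (memcmp(addr, zero, sizeof(zero)) == 0)	/* 00:00:00:00:00:00 */
		return false;
	return true;
}
```

With this check, 02:de:ad:be:ef:02 is accepted, while the all-zero address
and any address with the group bit set are refused.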

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.21 19-May-2008  yamt branches: 1.21.4; 1.21.6; 1.21.8; 1.21.12;
agr_ioctl_filter: comment the intention.
 1.20 20-Dec-2007  dyoung branches: 1.20.6; 1.20.8; 1.20.10; 1.20.12;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.19 08-Dec-2007  ad branches: 1.19.4;
Unbork
 1.18 08-Dec-2007  elad Replace usage of p_cred in kauth(9) call with kauth_cred_get().

okay yamt@.
 1.17 05-Dec-2007  dyoung Use IFADDR_FIRST(), IFADDR_NEXT().
 1.16 02-Sep-2007  dyoung branches: 1.16.6; 1.16.8;
We cannot sleep in a software interrupt, so do not sockaddr_dl_alloc(...,
M_WAITOK). Instead, sockaddr_dl_init() a sockaddr_dl on the stack.
 1.15 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
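
The idea of passing the sockaddr length so that a setter can refuse to
overrun it can be sketched as follows. struct ex_sdl and ex_sdl_setaddr are
hypothetical stand-ins for illustration; they are not the real sockaddr_dl
layout or the sockaddr_dl_setaddr() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical link-layer sockaddr: name bytes, then address bytes. */
struct ex_sdl {
	uint8_t	nlen;		/* length of the interface name in data[] */
	uint8_t	alen;		/* length of the address after the name */
	uint8_t	data[12];
};

/*
 * Store addr after the name. socklen is the number of bytes the caller
 * really has for *sdl. Returns 0 on success, -1 if the address would not
 * fit; nothing is written in that case.
 */
static int
ex_sdl_setaddr(struct ex_sdl *sdl, size_t socklen, const void *addr,
    size_t addrlen)
{
	size_t need;

	assert(sdl != NULL && addr != NULL);
	need = offsetof(struct ex_sdl, data) + sdl->nlen + addrlen;
	if (addrlen > UINT8_MAX || need > socklen)
		return -1;
	memcpy(&sdl->data[sdl->nlen], addr, addrlen);
	sdl->alen = (uint8_t)addrlen;
	return 0;
}
```

Unlike a bare memcpy(LLADDR(), ...), the length check happens before any
byte is copied.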
 1.14 26-Aug-2007  dyoung branches: 1.14.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.13 07-Aug-2007  dyoung branches: 1.13.2;
Use sockaddr_dl_measure() and sockaddr_dl_init(). Erase unnecessary
cast.
 1.12 20-May-2007  yamt branches: 1.12.2; 1.12.6;
use mutex.
 1.11 04-Mar-2007  christos branches: 1.11.2; 1.11.4; 1.11.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.9 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.8 29-Oct-2006  yamt branches: 1.8.4;
agr_setconfig: ensure ifname is NUL terminated. PR/34894 from Michael Pounov.
 1.7 25-Oct-2006  elad Kill KAUTH_GENERIC_ISSUSER use.
 1.6 08-Jul-2006  yamt branches: 1.6.4; 1.6.6;
agr_ioctl: wrap a long line after kauth merge.
 1.5 15-May-2006  yamt branches: 1.5.4;
include sys/kauth.h for kauth_authorize_generic.
 1.4 14-May-2006  elad integrate kauth.
 1.3 11-Dec-2005  christos branches: 1.3.4; 1.3.6; 1.3.8; 1.3.10; 1.3.12;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.6 21-Jan-2008  yamt sync with head
 1.1.8.5 07-Dec-2007  yamt sync with head
 1.1.8.4 03-Sep-2007  yamt sync with head.
 1.1.8.3 26-Feb-2007  yamt sync with head.
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agr.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agr.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agr.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.3.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.3.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.3.8.2 11-Aug-2006  yamt sync with head
 1.3.8.1 24-May-2006  yamt sync with head.
 1.3.6.1 01-Jun-2006  kardel Sync with head.
 1.3.4.1 09-Sep-2006  rpaulo sync with head
 1.5.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.6.6.1 10-Dec-2006  yamt sync with head.
 1.6.4.1 18-Nov-2006  ad Sync with head.
 1.8.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.8.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.11.6.1 09-Dec-2007  reinoud Pullup to HEAD
 1.11.4.1 11-Jul-2007  mjf Sync with head.
 1.11.2.3 09-Oct-2007  ad Sync with head.
 1.11.2.2 20-Aug-2007  ad Sync with HEAD.
 1.11.2.1 08-Jun-2007  ad Sync with head.
 1.12.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.12.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.12.6.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.12.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.12.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.13.2.2 07-Aug-2007  dyoung Use sockaddr_dl_measure() and sockaddr_dl_init(). Erase unnecessary
cast.
 1.13.2.1 07-Aug-2007  dyoung file if_agr.c was added on branch matt-mips64 on 2007-08-07 04:27:45 +0000
 1.14.2.2 09-Jan-2008  matt sync with HEAD
 1.14.2.1 06-Nov-2007  matt sync with HEAD
 1.16.8.2 26-Dec-2007  ad Sync with head.
 1.16.8.1 08-Dec-2007  ad Sync with head.
 1.16.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.16.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.19.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.20.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.20.10.5 09-Oct-2010  yamt sync with head
 1.20.10.4 11-Aug-2010  yamt sync with head.
 1.20.10.3 11-Mar-2010  yamt sync with head
 1.20.10.2 20-Jun-2009  yamt sync with head
 1.20.10.1 04-May-2009  yamt sync with head.
 1.20.8.1 04-Jun-2008  yamt sync with head
 1.20.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.20.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.21.12.1 21-Apr-2010  matt sync to netbsd-5
 1.21.8.1 05-Jun-2009  snj Pull up following revision(s) (requested by 792):
sys/dev/pci/if_wm.c: revision 1.175 via patch
sys/net/if_ethersubr.c: revision 1.172 via patch
sys/net/agr/ieee8023ad_lacp.c: revision 1.9 via patch
sys/net/agr/if_agr.c: revision 1.23 via patch
sys/net/agr/if_agrether.c: revision 1.7 via patch
sys/net/agr/if_agrvar_impl.h: revision 1.8 via patch
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCSIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
 1.21.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.21.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.22.4.1 23-Jul-2009  jym Sync with HEAD.
 1.25.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.25.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.26.2.2 05-Mar-2011  rmind sync with head
 1.26.2.1 30-May-2010  rmind sync with head
 1.30.16.1 18-May-2014  rmind sync with head
 1.30.12.2 03-Dec-2017  jdolecek update from HEAD
 1.30.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.30.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.31.6.5 05-Feb-2017  skrll Sync with HEAD
 1.31.6.4 05-Oct-2016  skrll Sync with HEAD
 1.31.6.3 09-Jul-2016  skrll Sync with HEAD
 1.31.6.2 19-Mar-2016  skrll Sync with HEAD
 1.31.6.1 22-Sep-2015  skrll Sync with HEAD
 1.37.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.37.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.37.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.40.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.41.6.2 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held in these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
 1.41.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires adding KERNEL_LOCK to functions called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no one else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. Avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.46.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.47.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.47.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.47.2.1 10-Jun-2019  christos Sync with HEAD
 1.50.2.1 29-Feb-2020  ad Sync with head.
 1.12 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023ad_slowprotocols.h and lagg/if_lagg_lacp.h.
Therefore, the contents are moved to the added file.

Note: currently, there are just LACP and Marker protocol,
however slowprotocols is independent of them.
 1.11 10-Nov-2019  chs in many device attach paths, allocate memory with M_WAITOK instead of M_NOWAIT
and remove code to handle failures that can no longer happen.
 1.10 06-Dec-2017  ozaki-r branches: 1.10.4;
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
 1.9 19-Oct-2011  dyoung branches: 1.9.46;
Use if_flags_set() and if_addr_init() instead of ifp->if_ioctl().
 1.8 09-Jun-2009  yamt comment style. no functional change.
 1.7 29-May-2009  darran Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.

Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).

Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).

In consultation with tls@.
 1.6 26-Aug-2007  dyoung branches: 1.6.26; 1.6.36; 1.6.40; 1.6.44;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.5 04-Mar-2007  christos branches: 1.5.2; 1.5.10; 1.5.14;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.3 03-Sep-2007  yamt sync with head.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrether.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrether.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrether.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.2 12-Mar-2007  rmind Sync with HEAD.
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.5.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.5.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.5.2.1 09-Oct-2007  ad Sync with head.
 1.6.44.1 21-Apr-2010  matt sync to netbsd-5
 1.6.40.1 23-Jul-2009  jym Sync with HEAD.
 1.6.36.1 05-Jun-2009  snj Pull up following revision(s) (requested by 792):
sys/dev/pci/if_wm.c: revision 1.175 via patch
sys/net/if_ethersubr.c: revision 1.172 via patch
sys/net/agr/ieee8023ad_lacp.c: revision 1.9 via patch
sys/net/agr/if_agr.c: revision 1.23 via patch
sys/net/agr/if_agrether.c: revision 1.7 via patch
sys/net/agr/if_agrvar_impl.h: revision 1.8 via patch
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCSIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
 1.6.26.1 20-Jun-2009  yamt sync with head
 1.9.46.1 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held in these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
 1.10.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4 26-Sep-2017  knakahara VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.

I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html

XXX need pullup to -8 branch
 1.3 05-May-2007  yamt branches: 1.3.78; 1.3.114;
agrether_hashmbuf: feed ipv6 flowlabel to hash calculation.
 1.2 11-Dec-2005  christos branches: 1.2.26; 1.2.30; 1.2.32;
merge ktrace-lwp.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 03-Sep-2007  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrether_hash.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrether_hash.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrether_hash.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.32.1 11-Jul-2007  mjf Sync with head.
 1.2.30.1 08-Jun-2007  ad Sync with head.
 1.2.26.1 07-May-2007  yamt sync with head.
 1.3.114.1 24-Oct-2017  snj Pull up following revision(s) (requested by knakahara in ticket #302):
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.30-1.31
sys/arch/x86/pci/if_vmx.c: 1.20
sys/dev/ic/i82557.c: 1.148
sys/dev/ic/rtl8169.c: 1.152
sys/dev/pci/cxgb/cxgb_sge.c: 1.5
sys/dev/pci/if_age.c: 1.51
sys/dev/pci/if_alc.c: 1.25
sys/dev/pci/if_ale.c: 1.23
sys/dev/pci/if_bge.c: 1.311
sys/dev/pci/if_bge.c: 1.312
sys/dev/pci/if_bnx.c: 1.62
sys/dev/pci/if_jme.c: 1.32
sys/dev/pci/if_nfe.c: 1.64
sys/dev/pci/if_sip.c: 1.167
sys/dev/pci/if_stge.c: 1.63-1.64
sys/dev/pci/if_ti.c: 1.102
sys/dev/pci/if_txp.c: 1.48
sys/dev/pci/if_vge.c: 1.61
sys/dev/pci/if_wm.c: 1.538
sys/dev/pci/ixgbe/ix_txrx.c: 1.29 via patch
sys/net/agr/if_agrether_hash.c: 1.4
sys/net/if_ether.h: 1.67-1.68
sys/net/if_ethersubr.c: 1.244
sys/net/if_vlan.c: 1.100
sys/net80211/ieee80211_input.c: 1.89
sys/net80211/ieee80211_output.c: 1.59
sys/sys/mbuf.h: 1.171
VLAN ID uses pkthdr instead of mtag now. Contributed by s-yamaguchi@IIJ.
I just commit by proxy. Reviewed by joerg@n.o and christos@n.o, thanks.
See http://mail-index.netbsd.org/tech-net/2017/09/26/msg006459.html
--
only get vtag when we have vtag like the other drivers.
--
- only get the vtag if we have it like the other drivers
- mask the hardware vlan tag
--
- add a constant for the vlan mask.
- enforce that we have a tag before we get it.
only get vtag when we have vtag like the other drivers.
like if_bge.c:1.312 and if_stge.c:1.64.
fixed by s-yamaguchi@IIJ, thanks.
 1.3.78.1 03-Dec-2017  jdolecek update from HEAD
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrethervar.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrethervar.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrethervar.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3 08-Mar-2024  rillig ifconfig: fix agr status

Previously, when the interface was collecting, its status was reported
as '<COLLECTING,DISTRIBUTING>', even when it was not distributing.

sbin/ifconfig/agr.c(170): warning: 'b\0DISTRIBUTING\0' overlaps earlier
'b\0COLLECTING\0' on bit 0 [376]
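
The overlap reported by lint above can be illustrated with a minimal sketch. This is not the ifconfig(8) code and does not use snprintb(3); format_bits() and both tables are hypothetical, and placing DISTRIBUTING on bit 1 in the fixed table is an assumption made only for the example. When two names are bound to the same bit, setting that one bit prints both names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct bitname {
	unsigned bit;		/* zero-based bit position */
	const char *name;
};

/* Format the names of all set bits as "<A,B>" into buf. */
static void
format_bits(char *buf, size_t len, unsigned val,
    const struct bitname *tab, size_t n)
{
	size_t off = 0;
	int first = 1;

	off += (size_t)snprintf(buf + off, len - off, "<");
	for (size_t i = 0; i < n && off < len; i++) {
		if ((val & (1u << tab[i].bit)) == 0)
			continue;
		off += (size_t)snprintf(buf + off, len - off, "%s%s",
		    first ? "" : ",", tab[i].name);
		first = 0;
	}
	if (off < len)
		snprintf(buf + off, len - off, ">");
}

/* Buggy table: both names claim bit 0. */
static const struct bitname buggy[] = {
	{ 0, "COLLECTING" }, { 0, "DISTRIBUTING" },
};

/* Fixed table: each name has its own bit (bit 1 is an assumption). */
static const struct bitname fixed[] = {
	{ 0, "COLLECTING" }, { 1, "DISTRIBUTING" },
};
```

With the buggy table, a value with only bit 0 set is reported as both collecting and distributing; with the fixed table, only the name bound to the set bit is printed.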
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrioctl.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrioctl.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrioctl.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.6 24-Apr-2019  msaitoh KNF. No functional change.
 1.5 28-Jan-2017  maya branches: 1.5.14;
Switch agr(4) to use a workqueue. This is necessary because during
a callout, it allocates memory with M_WAITOK, which triggers a
DEBUG assert.

XXX we should drain the workqueue.

ok riastradh
 1.4 24-Mar-2008  yamt branches: 1.4.48; 1.4.68; 1.4.72; 1.4.76;
agrport_monitor: map IFM_NONE to IFM_NONE|IFM_ETHER and add a comment.
 1.3 11-Dec-2005  christos branches: 1.3.70;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 24-Mar-2008  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrmonitor.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrmonitor.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrmonitor.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.70.1 03-Apr-2008  mjf Sync with HEAD.
 1.4.76.1 21-Apr-2017  bouyer Sync with HEAD
 1.4.72.1 20-Mar-2017  pgoyette Sync with HEAD
 1.4.68.1 05-Feb-2017  skrll Sync with HEAD
 1.4.48.1 03-Dec-2017  jdolecek update from HEAD
 1.5.14.1 10-Jun-2019  christos Sync with HEAD
 1.4 15-Mar-2009  cegger ansify function definitions
 1.3 23-Nov-2005  yamt branches: 1.3.74; 1.3.84; 1.3.90;
fix a typo in a comment.
 1.2 12-Aug-2005  yamt branches: 1.2.6;
include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrsoftc.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.4 11-Dec-2005  christos Sync with head.
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrsoftc.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrsoftc.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.6.1 29-Nov-2005  yamt sync with head.
 1.3.90.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.3.84.1 28-Apr-2009  skrll Sync with HEAD.
 1.3.74.1 04-May-2009  yamt sync with head.
 1.13 10-Nov-2019  chs in many device attach paths, allocate memory with M_WAITOK instead of M_NOWAIT
and remove code to handle failures that can no longer happen.
 1.12 25-Jan-2018  christos branches: 1.12.4;
Add locking.
 1.11 06-Dec-2017  ozaki-r Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
 1.10 24-Aug-2015  pooka branches: 1.10.10;
sprinkle _KERNEL_OPT
 1.9 19-Jan-2010  pooka branches: 1.9.22; 1.9.40;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.8 01-Sep-2007  dyoung branches: 1.8.24;
Use ifreq_getaddr(). Pass a sockaddr instead of ifreq where sockaddr
will suffice.
 1.7 02-Aug-2007  yamt branches: 1.7.2; 1.7.4; 1.7.6;
don't forget to maintain ama_addrs. reported by Coverity via Arnaud Lacombe.
 1.6 02-Aug-2007  yamt agrport_mc_del_callback: s/SIOCADDMULTI/SIOCDELMULTI/
 1.5 04-Mar-2007  christos branches: 1.5.2; 1.5.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 11-Dec-2005  christos branches: 1.3.26;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.3 03-Sep-2007  yamt sync with head.
 1.1.8.2 26-Feb-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrsubr.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrsubr.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrsubr.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.26.2 12-Mar-2007  rmind Sync with HEAD.
 1.3.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.5.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.5.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.5.2.2 09-Oct-2007  ad Sync with head.
 1.5.2.1 20-Aug-2007  ad Sync with HEAD.
 1.7.6.2 02-Aug-2007  yamt don't forget to maintain ama_addrs. reported by Coverity via Arnaud Lacombe.
 1.7.6.1 02-Aug-2007  yamt file if_agrsubr.c was added on branch matt-mips64 on 2007-08-02 12:37:48 +0000
 1.7.4.1 06-Nov-2007  matt sync with HEAD
 1.7.2.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.8.24.1 11-Mar-2010  yamt sync with head
 1.9.40.1 22-Sep-2015  skrll Sync with HEAD
 1.9.22.1 03-Dec-2017  jdolecek update from HEAD
 1.10.10.1 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check whether an already-queued work item is being enqueued again, which is not allowed
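
The double-enqueue guard described in the last two items can be sketched as follows. This is a single-threaded illustration only, not the route.c or workqueue(9) code: struct work_item, struct work_queue, wq_enqueue_once() and wq_dequeue() are hypothetical names, and a kernel version would need a lock or atomic operations around the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's workqueue(9) types. */
struct work_item {
	bool enqueued;			/* true while sitting on the queue */
	struct work_item *next;
};

struct work_queue {
	struct work_item *head;
	size_t len;
};

/* Enqueue only if not already pending; returns true if it was queued. */
static bool
wq_enqueue_once(struct work_queue *wq, struct work_item *wk)
{
	if (wk->enqueued)
		return false;		/* already pending: skip the duplicate */
	wk->enqueued = true;
	wk->next = wq->head;
	wq->head = wk;
	wq->len++;
	return true;
}

/* Dequeue one item and clear its flag so it may be queued again. */
static struct work_item *
wq_dequeue(struct work_queue *wq)
{
	struct work_item *wk = wq->head;

	if (wk == NULL)
		return NULL;
	wq->head = wk->next;
	wk->next = NULL;
	wq->len--;
	assert(wk->enqueued);
	wk->enqueued = false;
	return wk;
}
```

A second wq_enqueue_once() on a pending item is a no-op, so the same embedded work structure is never linked into the queue twice; once it has been dequeued it can be queued again.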
 1.12.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.5 06-Dec-2017  ozaki-r Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
 1.4 21-Feb-2007  thorpej branches: 1.4.124;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 29-Oct-2006  yamt branches: 1.3.4;
make agr headers include lock.h and queue.h by themselves.
 1.2 10-Dec-2005  elad branches: 1.2.20; 1.2.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.3 26-Feb-2007  yamt sync with head.
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrsubr.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrsubr.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrsubr.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.22.1 10-Dec-2006  yamt sync with head.
 1.2.20.1 18-Nov-2006  ad Sync with head.
 1.3.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.4.124.1 05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check whether an already-queued work item is being enqueued again, which is not allowed
 1.7 28-Jan-2017  maya Switch agr(4) to use a workqueue. This is necessary because during
a callout, it allocates memory with M_WAITOK, which triggers a
DEBUG assert.

XXX we should drain the workqueue.

ok riastradh
 1.6 08-Feb-2010  dyoung branches: 1.6.20; 1.6.38; 1.6.42; 1.6.46;
Take another stab at fixing the LOCKDEBUG panic reported in PR
kern/39940 and by Martti Kuparinen on current-users@: replace the
ioctl lock with finer-grained locking. Lock the ports list and
wait to if_clone_destroy() until all threads are out of the softc.

Thanks to Martti Kuparinen for testing these changes.
 1.5 09-Jul-2007  ad branches: 1.5.32; 1.5.54;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.4 20-May-2007  yamt use mutex.
 1.3 11-Dec-2005  christos branches: 1.3.30; 1.3.32;
merge ktrace-lwp.
 1.2 12-Aug-2005  yamt include callout.h explicitly.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.2 03-Sep-2007  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrtimer.c was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrtimer.c was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrtimer.c was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.3.32.1 11-Jul-2007  mjf Sync with head.
 1.3.30.2 01-Jul-2007  ad Adapt to callout API change.
 1.3.30.1 08-Jun-2007  ad Sync with head.
 1.5.54.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.5.32.1 11-Mar-2010  yamt sync with head
 1.6.46.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.42.1 20-Mar-2017  pgoyette Sync with HEAD
 1.6.38.1 05-Feb-2017  skrll Sync with HEAD
 1.6.20.1 03-Dec-2017  jdolecek update from HEAD
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrvar.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrvar.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrvar.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.12 26-Mar-2023  andvar fix various typos in documentation, comments and sysctl device description.
mainly aion -> ation and inlude -> include.
 1.11 28-Jan-2017  maya Switch agr(4) to use a workqueue. This is necessary because during
a callout, it allocates memory with M_WAITOK, which triggers a
DEBUG assert.

XXX we should drain the workqueue.

ok riastradh
 1.10 26-May-2010  dyoung branches: 1.10.18; 1.10.36; 1.10.40; 1.10.44;
Change sc_wrports from an int to a bool and "test truth" instead of
comparing with 0.

Add 'volatile' to several other state variables that need it.
 1.9 08-Feb-2010  dyoung branches: 1.9.2;
Take another stab at fixing the LOCKDEBUG panic reported in PR
kern/39940 and by Martti Kuparinen on current-users@: replace the
ioctl lock with finer-grained locking. Lock the ports list and
wait to if_clone_destroy() until all threads are out of the softc.

Thanks to Martti Kuparinen for testing these changes.
 1.8 29-May-2009  darran branches: 1.8.2;
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.

Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).

Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).

In consultation with tls@.
 1.7 20-May-2007  yamt branches: 1.7.32; 1.7.44; 1.7.48; 1.7.52;
use mutex.
 1.6 04-Mar-2007  christos branches: 1.6.2; 1.6.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.5 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.4 29-Oct-2006  yamt branches: 1.4.4;
make agr headers include lock.h and queue.h by themselves.
 1.3 08-Jul-2006  yamt branches: 1.3.4; 1.3.6;
make a multiple inclusion protection macro match with the filename.
 1.2 10-Dec-2005  elad branches: 1.2.4; 1.2.8; 1.2.16;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 18-Mar-2005  yamt branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
add agr(4), a pseudo network device driver for link aggregation.
 1.1.8.4 03-Sep-2007  yamt sync with head.
 1.1.8.3 26-Feb-2007  yamt sync with head.
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 21-Jun-2006  yamt sync with head.
 1.1.6.2 29-Apr-2005  kent sync with -current
 1.1.6.1 18-Mar-2005  kent file if_agrvar_impl.h was added on branch kent-audio2 on 2005-04-29 11:29:32 +0000
 1.1.4.3 11-Dec-2005  christos Sync with head.
 1.1.4.2 01-Apr-2005  skrll Sync with HEAD.
 1.1.4.1 18-Mar-2005  skrll file if_agrvar_impl.h was added on branch ktrace-lwp on 2005-04-01 14:31:50 +0000
 1.1.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.1 18-Mar-2005  yamt file if_agrvar_impl.h was added on branch yamt-km on 2005-03-19 08:36:35 +0000
 1.2.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.2.8.1 11-Aug-2006  yamt sync with head
 1.2.4.1 09-Sep-2006  rpaulo sync with head
 1.3.6.1 10-Dec-2006  yamt sync with head.
 1.3.4.1 18-Nov-2006  ad Sync with head.
 1.4.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.4.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.6.4.1 11-Jul-2007  mjf Sync with head.
 1.6.2.1 08-Jun-2007  ad Sync with head.
 1.7.52.1 21-Apr-2010  matt sync to netbsd-5
 1.7.48.1 23-Jul-2009  jym Sync with HEAD.
 1.7.44.1 05-Jun-2009  snj Pull up following revision(s) (requested by 792):
sys/dev/pci/if_wm.c: revision 1.175 via patch
sys/net/if_ethersubr.c: revision 1.172 via patch
sys/net/agr/ieee8023ad_lacp.c: revision 1.9 via patch
sys/net/agr/if_agr.c: revision 1.23 via patch
sys/net/agr/if_agrether.c: revision 1.7 via patch
sys/net/agr/if_agrvar_impl.h: revision 1.8 via patch
Add vlan support and hardware offload capabilities to agr.
These changes allow vlans to be layered above agr, with the attach
and detach propagated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCSIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
 1.7.32.3 11-Aug-2010  yamt sync with head.
 1.7.32.2 11-Mar-2010  yamt sync with head
 1.7.32.1 20-Jun-2009  yamt sync with head
 1.8.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.8.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.9.2.1 30-May-2010  rmind sync with head
 1.10.44.1 21-Apr-2017  bouyer Sync with HEAD
 1.10.40.1 20-Mar-2017  pgoyette Sync with HEAD
 1.10.36.1 05-Feb-2017  skrll Sync with HEAD
 1.10.18.1 03-Dec-2017  jdolecek update from HEAD
 1.1 17-May-2021  yamaguchi branches: 1.1.2; 1.1.6;
Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.1.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.1.6.1 17-May-2021  thorpej file Makefile was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.1.2.2 31-May-2021  cjep sync with head
 1.1.2.1 17-May-2021  cjep file Makefile was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.4 31-Mar-2022  yamaguchi Added a kernel option to run LACP on a half duplex interface
 1.3 16-Nov-2021  yamaguchi Added a kernel option to set SYNC bit of LACP
while the lagg interface is in STANDBY state
 1.2 12-Oct-2021  yamaguchi lagg: update capabilities of ifnet and ethercom

Capabilities common to all child interfaces are configured
on the lagg interface.
 1.1 17-May-2021  yamaguchi branches: 1.1.2; 1.1.6;
Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.1.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.1.6.1 17-May-2021  thorpej file files.lagg was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.1.2.2 31-May-2021  cjep sync with head
 1.1.2.1 17-May-2021  cjep file files.lagg was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.74 30-Jul-2025  ozaki-r lagg: fix locking against myself in lagg_linkstate_changed

Since if.c v1.535, linkstate processing is done with IFNET_LOCK held, so
lagg no longer needs to take it itself.

Reported by mlelstv@
Acked by yamaguchi@
 1.73 25-Apr-2025  andvar s/cahanged/changed/ in comment.
 1.72 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument since at least 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.71 29-Jun-2024  riastradh branches: 1.71.2;
if_stats(9): Add ifp argument to if_stat..._ref.

This will enable us to pass the ifp through to a dtrace probe inside.

No functional change intended in this change, but this is an API
change visible to modules so it shouldn't be pulled up.

PR kern/58377
 1.70 05-Apr-2024  yamaguchi lagg(4): Added vlan check
 1.69 05-Apr-2024  yamaguchi lagg(4): release lock before pserialize_perform() if possible
 1.68 05-Apr-2024  yamaguchi lagg(4): added __predict_true
 1.67 04-Apr-2024  yamaguchi Added comments to lagg(4)
 1.66 04-Apr-2024  yamaguchi lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL
 1.65 04-Apr-2024  yamaguchi lagg(4): increase output packets and bytes only if no error occurred

pointed out by ozaki-r@, thanks.
 1.64 04-Apr-2024  yamaguchi lagg(4): change errno

suggested by ozaki-r@, thanks.
 1.63 04-Apr-2024  yamaguchi lagg(4): added NULL check for pfil_run_hooks

pointed out by ozaki-r@, thanks.
 1.62 04-Apr-2024  yamaguchi lagg(4): move comment about IFF_PROMISC

pointed out by ozaki-r@, thanks.
 1.61 04-Apr-2024  yamaguchi lagg(4): added size check to SIOCSLAGG

pointed out by ozaki-r@, thanks.
 1.60 04-Apr-2024  yamaguchi added missing LAGG_UNLOCK()
 1.59 04-Apr-2024  yamaguchi lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding

lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.
The lock was nevertheless held for no reason, which caused a deadlock:
the caller of workqueue_wait held LAGG_LOCK
while the worker waited for the same lock.
 1.58 04-Apr-2024  yamaguchi lagg(4): use flexible array member
 1.57 01-Dec-2023  yamaguchi lagg(4): eliminate unnecessary reset by the change of if_flags
 1.56 01-Dec-2023  yamaguchi lagg(4): use sadl for lagg(4) configured by a user
 1.55 28-Nov-2023  yamaguchi lagg(4): Fix missing IFNET_LOCK acquirement
 1.54 22-Nov-2023  yamaguchi Set the fastest linkspeed in each physical interface to lagg(4)
 1.53 22-Nov-2023  yamaguchi Set ETHERCAP_VLAN_HWTAGGING on lagg(4)
that doesn't have physical interfaces
 1.52 22-Nov-2023  yamaguchi lagg(4): Fix missing pfil_run_hooks() and bpf_mtap()
 1.51 18-Oct-2023  yamaguchi copy MTU of lagg to an interface added to lagg
even if the interface is the first member of the lagg

This change breaks ATF test case for lagg MTU
 1.50 16-Oct-2023  yamaguchi Fix missing IFNET_LOCK holding while destroying the lagg interface
 1.49 16-Oct-2023  yamaguchi lagg(4): release LAGG_LOCK before mtu changing

PR kern/57650
 1.48 26-Jun-2022  riastradh branches: 1.48.4;
lagg(4): Safely handle misaligned mbufs.

Optimizing for non-strict-alignment architectures -- without falling
afoul of alignment sanitizers or overeager compilers -- is left as an
exercise for the reader.

PR kern/56894
 1.47 04-Apr-2022  martin Avoid signed/unsigned comparison by casting the sizeof expression.
 1.46 04-Apr-2022  yamaguchi Move input processing of lagg(4) before ether_input
to get rid of dependence.

This implementation is similar with that of bridge(4).
 1.45 01-Apr-2022  yamaguchi lagg(4): reimplement add and delete port

The IFNET_LOCK of the port being added or deleted is now
held the whole time while the ifnet of the port is changed.
 1.44 31-Mar-2022  yamaguchi rename lagg_enqueue to lagg_output

NFC
 1.43 31-Mar-2022  yamaguchi Use ether_ioctl to change mtu of lagg(4)
 1.42 31-Mar-2022  yamaguchi Use addlog(4) for putting 2 messages to one line
 1.41 31-Mar-2022  yamaguchi Allow a lagg interface configured with "laggproto none" to be brought up
 1.40 31-Mar-2022  yamaguchi added log when ifpromisc fails
 1.39 31-Mar-2022  yamaguchi Set flags related to MTU on adding l2tp(4) to lagg(4)
 1.38 31-Mar-2022  yamaguchi fix coding style
 1.37 31-Mar-2022  yamaguchi lagg(4): remove duplicated bpf_mtap
 1.36 31-Mar-2022  yamaguchi Change error code to ENOBUFS on lack of buffer memory

pointed out by k-goda@IIJ
 1.35 31-Mar-2022  yamaguchi Fix missing freeing resource related to protocol

pointed out by k-goda@IIJ
 1.34 31-Mar-2022  yamaguchi Switch ifp->if_output along with configuring ifp->if_lagg

lagg_port_output stored to ifp->if_output uses ifp->if_lagg.
Therefore, ifp->if_output switches to lagg_port_output after
ifp->if_lagg is configured, and restores in reverse order.

This missing order is pointed out by k-goda@IIJ
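
The ordering described in this entry can be sketched as follows. This is a simplified, single-threaded illustration and not the lagg(4) code: struct iface, struct port, port_attach(), port_detach() and lp_output_saved are hypothetical names, and a kernel version would additionally need the appropriate locking or memory barriers.

```c
#include <assert.h>
#include <stddef.h>

struct port;				/* hypothetical per-port state */

struct iface {
	int (*if_output)(struct iface *);
	struct port *if_lagg;		/* NULL unless a lagg member */
};

struct port {
	int (*lp_output_saved)(struct iface *);	/* original if_output */
};

static int
plain_output(struct iface *ifp)
{
	(void)ifp;
	return 0;
}

/* Port output hook: relies on if_lagg, so it must already be set. */
static int
port_output(struct iface *ifp)
{
	assert(ifp->if_lagg != NULL);
	return 1;
}

static void
port_attach(struct iface *ifp, struct port *lp)
{
	lp->lp_output_saved = ifp->if_output;
	ifp->if_lagg = lp;		/* 1. publish the port state */
	ifp->if_output = port_output;	/* 2. then switch the hook */
}

static void
port_detach(struct iface *ifp)
{
	struct port *lp = ifp->if_lagg;

	ifp->if_output = lp->lp_output_saved;	/* 1. restore the hook */
	ifp->if_lagg = NULL;			/* 2. then clear the state */
}
```

Because the hook is installed last and removed first, there is no window in which port_output() can run while if_lagg is still NULL.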
 1.33 31-Mar-2022  yamaguchi Added missing NULL check

pointed out by k-goda@IIJ
 1.32 31-Mar-2022  yamaguchi lagg(4): commonize the error handling
 1.31 31-Mar-2022  yamaguchi lagg(4): fix typo

pointed out by k-goda@IIJ
 1.30 12-Jan-2022  yamaguchi Fix to call lacp_linkstate with IFNET_LOCK held

The network stack calls lacp_linkstate through lagg_port_ioctl when
doing "ifconfig up" or "ifconfig down" on an interface that is
a member of lagg(4), and IFNET_LOCK of the member interface
is held during the ioctl.
Therefore, lacp_linkstate is renamed to
lacp_linkstate_ifnet_locked and is always called with IFNET_LOCK
held. This avoids locking against myself.
 1.29 12-Jan-2022  riastradh lagg(4): Need to take IFNET_LOCK around if_init.

This should really just avoid dropping IFNET_LOCK before it's done
changing the port interface's configuration, but this stop-gap change
will serve provisionally to reduce crashes until we can confirm that
there's no deadlock lurking in the time this logic drops IFNET_LOCK.
 1.28 31-Dec-2021  riastradh sys: Use if_init wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.27 31-Dec-2021  riastradh sys: Use if_stop wrapper function.

Exception: Not in kern_pmf.c, for the kind of silly reason that it
avoids having kern_pmf.c refer to symbols defined only in net; this
avoids a pain in the rump.
 1.26 15-Nov-2021  yamaguchi introduced APIs to configure VLAN TAG to ethernet devices
 1.25 12-Nov-2021  yamaguchi Configure vlan to an added interface after setting ifnet::if_lagg

The configuration uses the ioctl of the interface, and the ioctl
for the port (lagg_port_ioctl) needs ifnet::if_lagg to be set.
 1.24 12-Nov-2021  yamaguchi lagg: Add vid to vlanid_list in ethercom
 1.23 12-Nov-2021  yamaguchi Fix the wrong check of interface type

- lp->lp_iftype: original ifnet::if_type
- lp->lp_ifp->if_type: current ifnet::if_type
- always IFT_IEEE8023ADLAG
 1.22 12-Nov-2021  yamaguchi lagg: Notify the changes of capenables of interface
to child interfaces
 1.21 11-Nov-2021  yamaguchi lagg: Use promiscuous mode instead of if_init() to avoid panic
when the interface has no if_init()
 1.20 08-Nov-2021  yamaguchi remove unused ioctl command named SIOCGLAGGPORT
to get status of l2tp(4) added to lagg

NOTE:
SIOCGLAGGPORT is based on FreeBSD implementation.
And, currently, it is not used in NetBSD kernel/userland.
 1.19 08-Nov-2021  yamaguchi lagg: renew MAC addresses to change the value of interface type

The interface type(ifnet::if_type) is changed on adding to lagg(4)
and deleting from it.
 1.18 08-Nov-2021  yamaguchi Update the MAC address of all child interfaces
when that of lagg is changed.
 1.17 22-Oct-2021  yamaguchi lagg: change hash logic to generate the same value
when pairs of source and destination are the same
 1.16 19-Oct-2021  yamaguchi lagg: reject a vlan interface that is not configured

The vlan I/F has no MAC address used in LACP.
 1.15 19-Oct-2021  yamaguchi lagg: support l2tp(4) aggregation

- Accept "ifconfig lagg* laggport l2tp*"
- Set promiscuous mode when the added interface is l2tp*
- check IFF_UP in addition to IFF_RUNNING on
SIOCSIFFLAGS to a child interface.
 1.14 19-Oct-2021  yamaguchi lagg: clear I/G bit and set G/L bit in a generated MAC address
 1.13 12-Oct-2021  yamaguchi Set a port interface of lagg(4) in promiscuous mode
when the lagg(4) is in promiscuous mode.
 1.12 12-Oct-2021  yamaguchi lagg: update capabilities of ifnet and ethercom

The capabilities common to all child interfaces are configured
on the lagg interface.
 1.11 05-Oct-2021  yamaguchi Drop unicast packets that are not for us
when lagg(4) is not in promisc
 1.10 30-Sep-2021  yamaguchi lagg: Register lagg_ifdetach to ether_ifdetach hook
 1.9 30-Sep-2021  yamaguchi Make a link-layer address of lagg(4) configurable by ifconfig(8)

lagg(4) uses a configured link-layer (MAC) address instead
of the random MAC address generated at creation.
The configured MAC address is copied to all child interfaces
and used as the system ID of LACP.
 1.8 30-Sep-2021  yamaguchi Fix to acquire LAGG_LOCK without psref
to remove possibility of deadlock

The deadlock could happen between lagg_ifdetach()
and lagg_delport():

1. lagg_ifdetach calls psref_target_acquire()
2. lagg_delport calls LAGG_LOCK()
3. lagg_ifdetach calls LAGG_LOCK()
- wait for lagg_delport
4. lagg_delport calls psref_target_destroy()
- wait for lagg_ifdetach
 1.7 30-Sep-2021  yamaguchi lagg: Register lagg_linkstate_changed to link-state change hook
 1.6 13-Jul-2021  ozaki-r lagg: fix typo for ALTQ
 1.5 16-Jun-2021  riastradh branches: 1.5.2;
if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.4 24-May-2021  thorpej branches: 1.4.2;
Move __KERNEL_RCSID() to the traditional location.
 1.3 24-May-2021  yamaguchi Added missing copyright and license notice

pointed out by thorpej@n.o., Thanks.
 1.2 19-May-2021  rillig if_lagg: fix format string incompatibility

In struct ifnet, the member if_mtu has type uint64_t, which differs from
struct ifreq, where the member ifru_mtu has type int.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.4.2.2 31-May-2021  cjep sync with head
 1.4.2.1 24-May-2021  cjep file if_lagg.c was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.5.2.3 01-Aug-2021  thorpej Sync with HEAD.
 1.5.2.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.5.2.1 16-Jun-2021  thorpej file if_lagg.c was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.48.4.5 01-Aug-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1144):

sys/net/lagg/if_lagg.c: revision 1.74

lagg: fix locking against myself in lagg_linkstate_changed

As of if.c v1.535, linkstate processing is done with IFNET_LOCK held, so
lagg doesn't need to take it by itself anymore.

Reported by mlelstv@
Acked by yamaguchi@
 1.48.4.4 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from LACP.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to receive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.48.4.3 12-Dec-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #491):

sys/net/lagg/if_lagg.c: revision 1.56
sys/net/lagg/if_lagg.c: revision 1.57
sbin/ifconfig/lagg.c: revision 1.4

lagg(4): use sadl for lagg(4) configured by a user

lagg(4): eliminate unnecessary reset by the change of if_flags

Fix "ifconfig lagg* lagglacp -maxports" command

This command clears the setting of the maximum number of
lacp active ports. The command was accepted but it did not
work until this change.
 1.48.4.2 27-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #476):

sys/net/lagg/if_lagg.c: revision 1.52
sys/net/lagg/if_lagg.c: revision 1.53
sys/net/lagg/if_lagg_lacp.c: revision 1.26
sys/net/lagg/if_lagg_lacp.c: revision 1.27

Change LACPDU sending interval by TIMEOUT bit in partner's state

Update sending interval when the partner's state is changed

lagg(4): Fix missing pfil_run_hooks() and bpf_mtap()

Set ETHERCAP_VLAN_HWTAGGING on lagg(4)
that doesn't have physical interfaces
 1.48.4.1 19-Oct-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #429):

sys/net/lagg/if_lagg.c: revision 1.50
sys/net/lagg/if_lagg.c: revision 1.51
tests/net/if_lagg/t_lagg.sh: revision 1.10
sys/net/lagg/if_lagg.c: revision 1.49
tests/net/if_lagg/t_lagg.sh: revision 1.9
share/man/man4/lagg.4: revision 1.5

lagg(4): release LAGG_LOCK before mtu changing
PR kern/57650

Bring the lagg interface up before changing its MTU
This change is related to PR kern/57650

Fix missing IFNET_LOCK holding while destroying the lagg interface
copy MTU of lagg to an interface added to lagg
even if the interface is the first member of the lagg

This change breaks ATF test case for lagg MTU

Update the test case for MTU of lagg to adapt to the new behavior

Update lagg(4) manual
1. corrected the wrong example
- lagg(4) cannot add multiple ports and set their priority at once
- This is the restriction of ifconfig(8)
2. adapted to changed behavior related to MTU
- Changed not to copy MTU of the 1st physical interface
to lagg(4) to prevent locking against myself
 1.71.2.1 02-Aug-2025  perseant Sync with HEAD
 1.4 04-Apr-2024  yamaguchi lagg(4): use flexible array member
 1.3 08-Nov-2021  yamaguchi remove unused ioctl command named SIOCGLAGGPORT
to get status of l2tp(4) added to lagg

NOTE:
SIOCGLAGGPORT is based on FreeBSD implementation.
And, currently, it is not used in NetBSD kernel/userland.
 1.2 24-May-2021  yamaguchi branches: 1.2.2; 1.2.6;
Added missing copyright and license notice

pointed out by thorpej@n.o., Thanks.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.2.6.1 24-May-2021  thorpej file if_lagg.h was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 24-May-2021  cjep file if_lagg.h was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.43 05-Apr-2024  yamaguchi lagg(4): move reply limitation to receive processing
 1.42 05-Apr-2024  yamaguchi lagg(4): set suppress at the same time with distribution state
 1.41 05-Apr-2024  yamaguchi added missing workq_wait for lacp_tick_work()
 1.40 05-Apr-2024  yamaguchi lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from LACP.
Therefore, the added checks are just for safety.
 1.39 05-Apr-2024  yamaguchi fix missing LACP_LOCK
 1.38 05-Apr-2024  yamaguchi lagg(4): added missing pserialize_read_enter
 1.37 04-Apr-2024  yamaguchi Added comments to lagg(4)
 1.36 04-Apr-2024  yamaguchi lagg(4): move allocate memory before ioctl
 1.35 04-Apr-2024  yamaguchi Added KASSERT for LACP_LOCK
 1.34 04-Apr-2024  yamaguchi lagg(4): Use CTASSERT
 1.33 04-Apr-2024  yamaguchi lagg(4): change errno

suggested by ozaki-r@, thanks.
 1.32 04-Apr-2024  yamaguchi lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding

lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.
But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
 1.31 04-Apr-2024  yamaguchi lagg(4): added log on detaching a port from SELECTED state to STANDBY
 1.30 04-Apr-2024  yamaguchi lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED
 1.29 22-Nov-2023  yamaguchi lagg(4): Added logs about LACP processing
 1.28 22-Nov-2023  yamaguchi Set the fastest linkspeed in each physical interface to lagg(4)
 1.27 22-Nov-2023  yamaguchi Update sending interval when the partner's state is changed
 1.26 22-Nov-2023  yamaguchi Change LACPDU sending interval by TIMEOUT bit in partner's state
 1.25 10-Apr-2022  andvar branches: 1.25.4;
fix various typos in comments and output/log messages.
 1.24 04-Apr-2022  yamaguchi Fix missing m_reset_rcvif for allocated mbuf
 1.23 04-Apr-2022  yamaguchi Move input processing of lagg(4) before ether_input
to get rid of dependence.

This implementation is similar to that of bridge(4).
 1.22 01-Apr-2022  yamaguchi lagg(4): reimplement add and delete port

The IFNET_LOCK of the port being added or deleted is now
held the whole time while the ifnet of the port is changed.
 1.21 31-Mar-2022  yamaguchi rename lagg_enqueue to lagg_output

NFC
 1.20 31-Mar-2022  yamaguchi Use addlog(4) for putting 2 messages to one line
 1.19 31-Mar-2022  yamaguchi update state of aggregator on multi-speed changing
 1.18 31-Mar-2022  yamaguchi handle LACPDU and MarkerDU in thread context

Those handlers move from softint to thread context to
improve throughput under high load, because they hold LACP_LOCK.

pointed out by k-goda@IIJ
 1.17 31-Mar-2022  yamaguchi fix coding style
 1.16 31-Mar-2022  yamaguchi Added length check for safety

pointed out by k-goda@IIJ
 1.15 31-Mar-2022  yamaguchi Added missing kmem_free

pointed out by k-goda@IIJ
 1.14 31-Mar-2022  yamaguchi Added a kernel option to run LACP on a half duplex interface
 1.13 16-Jan-2022  rillig lagg: remove stray semicolon

No binary change.
 1.12 12-Jan-2022  yamaguchi Fix to call lacp_linkstate with IFNET_LOCK held

Network stack calls lacp_linkstate through lagg_port_ioctl when
doing "ifconfig up" or "ifconfig down" to an interface that is
a member of lagg(4). IFNET_LOCK of the member interface
is held during the ioctl.
Therefore, lacp_linkstate is renamed to
lacp_linkstate_ifnet_locked, and always called with IFNET_LOCK
held. This avoids locking against myself.
 1.11 06-Jan-2022  riastradh lagg(4): Take lock as required around if ioctl.

Note: There are some calls to SIOCADDMULTI/SIOCDELMULTI that take the
lock when they don't need it, but it's not clear it's harmful either
unless they come via a caller that holds softnet_lock.

candidate fix for
https://mail-index.netbsd.org/current-users/2021/12/31/msg041876.html

ok yamaguchi
 1.10 31-Dec-2021  riastradh sys: Use if_ioctl wrapper function.
 1.9 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023ad_slowprotocols.h and lagg/if_lagg_lacp.h
Therefore, the contents are moved to added file.

Note: currently, there are just LACP and Marker protocol,
however slowprotocols is independent of them.
 1.8 16-Nov-2021  yamaguchi Added a kernel option to set SYNC bit of LACP
while the lagg interface is in STANDBY state
 1.7 16-Nov-2021  yamaguchi Set SYNC bit of LACPDU when the interface is on STANDBY state
 1.6 19-Oct-2021  yamaguchi lagg: support l2tp(4) aggregation

- Accept "ifconfig lagg* laggport l2tp*"
- Set promiscuous mode when the added interface is l2tp*
- check IFF_UP in addition to IFF_RUNNING on
SIOCSIFFLAGS to a child interface.
 1.5 02-Oct-2021  mrg avoid set-but-unused-variable warnings.
 1.4 30-Sep-2021  yamaguchi Make a link-layer address of lagg(4) configurable by ifconfig(8)

lagg(4) uses a configured link-layer (MAC) address instead
of the random MAC address generated at creation.
The configured MAC address is copied to all child interfaces
and used as the system ID of LACP.
 1.3 30-Jun-2021  yamaguchi lagg: fix an uninitialized variable

pointed out by tnn@n.o., thanks.
 1.2 18-May-2021  hannken branches: 1.2.2; 1.2.6;
Make this compile without DIAGNOSTIC.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.6.3 01-Aug-2021  thorpej Sync with HEAD.
 1.2.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.2.6.1 18-May-2021  thorpej file if_lagg_lacp.c was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 18-May-2021  cjep file if_lagg_lacp.c was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.25.4.2 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from LACP.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to receive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.25.4.1 27-Nov-2023  martin Pull up following revision(s) (requested by yamaguchi in ticket #476):

sys/net/lagg/if_lagg.c: revision 1.52
sys/net/lagg/if_lagg.c: revision 1.53
sys/net/lagg/if_lagg_lacp.c: revision 1.26
sys/net/lagg/if_lagg_lacp.c: revision 1.27

Change LACPDU sending interval by TIMEOUT bit in partner's state

Update sending interval when the partner's state is changed

lagg(4): Fix missing pfil_run_hooks() and bpf_mtap()

Set ETHERCAP_VLAN_HWTAGGING on lagg(4)
that doesn't have physical interfaces
 1.5 22-Nov-2023  yamaguchi Set the fastest linkspeed in each physical interface to lagg(4)
 1.4 31-Mar-2022  yamaguchi branches: 1.4.4;
handle LACPDU and MarkerDU in thread context

Those handlers move from softint to thread context to
improve throughput under high load, because they hold LACP_LOCK.

pointed out by k-goda@IIJ
 1.3 30-Nov-2021  yamaguchi Move net/agr/ieee8023_slowprotocols.h to net/ether_slowprotocols.h

Definitions related to slowprotocols are duplicated between
agr/ieee8023ad_slowprotocols.h and lagg/if_lagg_lacp.h
Therefore, the contents are moved to added file.

Note: currently, there are just LACP and Marker protocol,
however slowprotocols is independent of them.
 1.2 24-May-2021  yamaguchi branches: 1.2.2; 1.2.6;
Added missing copyright and license notice

pointed out by thorpej@n.o., Thanks.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.2.6.1 24-May-2021  thorpej file if_lagg_lacp.h was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 24-May-2021  cjep file if_lagg_lacp.h was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.4.4.1 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from LACP.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to receive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.16 26-Sep-2024  rin lagg: fill name of workqueue correctly

Found by KASSERT failure for DIAGNOSTIC kernel.

Authored by ozaki-r@.
 1.15 05-Apr-2024  yamaguchi branches: 1.15.2;
lagg(4): Fix missing destroy for list and entry
 1.14 05-Apr-2024  yamaguchi lagg(4): release lock before pserialize_perform() if possible
 1.13 05-Apr-2024  yamaguchi lagg(4): remove unnecessary masking

pointed out by ozaki-r@, thanks.
 1.12 04-Apr-2024  yamaguchi acquire LAGG_PROTO_LOCK instead of pserialize read section
 1.11 04-Apr-2024  yamaguchi lagg(4): Added 0 length check
 1.10 04-Apr-2024  yamaguchi lagg(4): fix missing update of the number of active ports
 1.9 04-Apr-2024  yamaguchi lagg(4): update link speed when a physical interface is removed
 1.8 28-Nov-2023  yamaguchi lagg(4): Fix missing IFNET_LOCK acquirement
 1.7 22-Nov-2023  yamaguchi Set the fastest linkspeed in each physical interface to lagg(4)
 1.6 31-Mar-2022  yamaguchi branches: 1.6.4;
rename lagg_enqueue to lagg_output

NFC
 1.5 31-Mar-2022  yamaguchi Make a lagg interface configured with "laggproto none" able to come up
 1.4 31-Mar-2022  yamaguchi set active when the port is distributing
 1.3 31-Mar-2022  yamaguchi lagg(4): use KASSERT
 1.2 24-May-2021  thorpej branches: 1.2.2; 1.2.6;
Remove leading blank line.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.2.6.1 24-May-2021  thorpej file if_laggproto.c was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 24-May-2021  cjep file if_laggproto.c was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.6.4.1 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from lacp.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to receive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.15.2.1 02-Aug-2025  perseant Sync with HEAD
 1.20 28-Nov-2023  yamaguchi lagg(4): Fix missing IFNET_LOCK acquirement
 1.19 22-Nov-2023  yamaguchi Set the fastest linkspeed in each physical interface to lagg(4)
 1.18 26-Jun-2022  riastradh branches: 1.18.4; 1.18.8;
lagg(4): Safely handle misaligned mbufs.

Optimizing for non-strict-alignment architectures -- without falling
afoul of alignment sanitizers or overeager compilers -- is left as an
exercise for the reader.

PR kern/56894
 1.17 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.16 04-Apr-2022  yamaguchi Move input processing of lagg(4) before ether_input
to get rid of dependence.

This implementation is similar to that of bridge(4).
 1.15 31-Mar-2022  yamaguchi rename lagg_enqueue to lagg_output

NFC
 1.14 31-Mar-2022  yamaguchi Use addlog(4) for putting 2 messages to one line
 1.13 31-Mar-2022  yamaguchi Allow a lagg interface configured with "laggproto none" to be brought up
 1.12 31-Mar-2022  yamaguchi added log when ifpromisc fails
 1.11 31-Mar-2022  yamaguchi fix coding style
 1.10 12-Jan-2022  yamaguchi Fix to call lacp_linkstate with IFNET_LOCK held

Network stack calls lacp_linkstate through lagg_port_ioctl when
doing "ifconfig up" or "ifconfig down" to an interface that is
a member of lagg(4). And IFNET_LOCK in the member interface
is held during the ioctl.
Therefore, lacp_linkstate is renamed to
lacp_linkstate_ifnet_locked, and always called with IFNET_LOCK
held. This avoids locking against itself.
 1.9 19-Oct-2021  yamaguchi lagg: support l2tp(4) aggregation

- Accept "ifconfig lagg* laggport l2tp*"
- Set promiscuous mode when the added interface is l2tp*
- check IFF_UP in addition to IFF_RUNNING on
SIOCSIFFLAGS to a child interface.
 1.8 12-Oct-2021  yamaguchi Set a port interface of lagg(4) in promiscuous mode
when the lagg(4) is in promiscuous mode.
 1.7 12-Oct-2021  yamaguchi lagg: update capabilities of ifnet and ethercom

The capabilities common to all child interfaces are configured
on the lagg interface.
 1.6 30-Sep-2021  yamaguchi lagg: Register lagg_ifdetach to ether_ifdetach hook
 1.5 30-Sep-2021  yamaguchi Make a link-layer address of lagg(4) configurable by ifconfig(8)

lagg(4) uses a configured link-layer (MAC) address instead
of the random MAC address generated on creation.
The configured MAC address is copied to all child interfaces
and used as the LACP system ID.
 1.4 30-Sep-2021  yamaguchi lagg: Register lagg_linkstate_changed to link-state change hook
 1.3 24-May-2021  yamaguchi branches: 1.3.2; 1.3.6;
Added missing copyright and license notice

pointed out by thorpej@n.o., Thanks.
 1.2 19-May-2021  rillig if_lagg: fix Clang build

Clang is stricter than GCC when it comes to nonliteral format strings.

sys/net/lagg/if_lagg.c:2372:12: error:
format string is not a string literal [-Werror,-Wformat-nonliteral]
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.3.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.3.6.1 24-May-2021  thorpej file if_laggproto.h was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.3.2.2 31-May-2021  cjep sync with head
 1.3.2.1 24-May-2021  cjep file if_laggproto.h was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.18.8.1 16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.18.4.1 03-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #916):

sys/net/lagg/if_laggproto.c: revision 1.15
sys/net/lagg/if_lagg_lacp.c: revision 1.36
sys/net/lagg/if_laggproto.c: revision 1.16
sys/net/lagg/if_lagg_lacp.c: revision 1.37
sys/net/lagg/if_lagg_lacp.c: revision 1.38
sys/net/lagg/if_lagg_lacp.c: revision 1.39
sys/net/lagg/if_lagg.c: revision 1.54
sys/net/lagg/if_lagg.c: revision 1.55
sys/net/lagg/if_lagg.c: revision 1.59
sys/net/lagg/if_lagg.c: revision 1.70
sys/net/lagg/if_laggproto.h: revision 1.19
sys/net/lagg/if_lagg_lacp.c: revision 1.28
sys/net/lagg/if_lagg_lacp.c: revision 1.29
sys/net/lagg/if_laggproto.c: revision 1.7
sys/net/lagg/if_lagg_lacp.h: revision 1.5
sys/net/lagg/if_laggproto.c: revision 1.8
sys/net/lagg/if_laggproto.c: revision 1.9
sys/net/lagg/if_lagg_lacp.c: revision 1.40
sys/net/lagg/if_lagg_lacp.c: revision 1.41
sys/net/lagg/if_lagg_lacp.c: revision 1.42
sys/net/lagg/if_lagg_lacp.c: revision 1.43
tests/net/if_lagg/t_lagg.sh: revision 1.11
sys/net/lagg/if_lagg.c: revision 1.60
sys/net/lagg/if_lagg.c: revision 1.62
sys/net/lagg/if_lagg.c: revision 1.63
sys/net/lagg/if_lagg.c: revision 1.64
sys/net/lagg/if_laggproto.h: revision 1.20
sys/net/lagg/if_lagg.c: revision 1.65
sys/net/lagg/if_lagg.c: revision 1.66
sys/net/lagg/if_lagg.c: revision 1.67
sys/net/lagg/if_lagg_lacp.c: revision 1.30
sys/net/lagg/if_lagg.c: revision 1.68
sys/net/lagg/if_laggproto.c: revision 1.10
sys/net/lagg/if_lagg_lacp.c: revision 1.31
sys/net/lagg/if_lagg.c: revision 1.69
sys/net/lagg/if_laggproto.c: revision 1.11
sys/net/lagg/if_lagg_lacp.c: revision 1.32
sys/net/lagg/if_laggproto.c: revision 1.12
sys/net/lagg/if_lagg_lacp.c: revision 1.33
sys/net/lagg/if_laggproto.c: revision 1.13
sys/net/lagg/if_lagg_lacp.c: revision 1.34
sys/net/lagg/if_laggproto.c: revision 1.14
sys/net/lagg/if_lagg_lacp.c: revision 1.35

Set the fastest linkspeed in each physical interface to lagg(4)

lagg(4): Added logs about LACP processing

lagg(4): Fix missing IFNET_LOCK acquirement

lagg(4): update link speed when a physical interface is removed

lagg(4): fix missing update of the number of active ports

lagg(4): Added 0 length check

lagg(4): Added LACP_READY state for logging
when a port turns SELECTED or UNSELECTED

lagg(4): added log on detaching a port from SELECTED state to STANDBY
acquire LAGG_PROTO_LOCK instead of pserialize read section

lagg(4): Remove unnecessary LAGG_LOCK holding while lagg_proto_detach()
to avoid deadlock in workqueue_wait due to LAGG_LOCK holding
lagg_proto_detach does not need to hold LAGG_LOCK because only one
context can access a detaching protocol after sc->sc_var is updated.

But it was held without any reason. And it had caused a deadlock by
holding LAGG_LOCK in caller of workqueue_wait
and waiting for the lock in worker.
added missing LAGG_UNLOCK()

lagg(4): move comment about IFF_PROMISC
pointed out by ozaki-r@, thanks.

lagg(4): added NULL check for pfil_run_hooks
pointed out by ozaki-r@, thanks.

lagg(4): change errno
suggested by ozaki-r@, thanks.

lagg(4): increase output packets and bytes only if no error occurred
pointed out by ozaki-r@, thanks.

lagg(4): replace NULL check with KASSERT because lp_softc is always non-NULL

lagg(4): Use CTASSERT
Added KASSERT for LACP_LOCK

lagg(4): move allocate memory before ioctl
Added comments to lagg(4)

lagg(4): added __predict_true

lagg(4): added missing pserialize_read_enter
fix missing LACP_LOCK

lagg(4): added check of LACP running state for safety

When LACP stops, the callout handler does nothing
because all ports are already detached from lacp.

Therefore, the added checks are just for safety.
added missing workq_wait for lacp_tick_work()

lagg(4): set suppress at the same time with distribution state

lagg(4): remove unnecessary masking
pointed out by ozaki-r@, thanks.

lagg(4): move reply limitation to receive processing

lagg(4): release lock before pserialize_perform() if possible

lagg(4): Added vlan check

lagg(4): Fix missing destroy for list and entry

lagg(4) test: Fix typo and old comment

lagg: fill name of workqueue correctly
Found by KASSERT failure for DIAGNOSTIC kernel.
Authored by ozaki-r@.
 1.6 04-Apr-2022  yamaguchi Move input processing of lagg(4) before ether_input
to get rid of dependence.

This implementation is similar to that of bridge(4).
 1.5 31-Mar-2022  yamaguchi fix coding style
 1.4 30-Sep-2021  yamaguchi lagg: Register lagg_ifdetach to ether_ifdetach hook
 1.3 30-Sep-2021  yamaguchi lagg: Register lagg_linkstate_changed to link-state change hook
 1.2 24-May-2021  yamaguchi branches: 1.2.2; 1.2.6;
Added missing copyright and license notice

pointed out by thorpej@n.o., Thanks.
 1.1 17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.2.6.2 17-Jun-2021  thorpej Sync w/ HEAD.
 1.2.6.1 24-May-2021  thorpej file if_laggvar.h was added on branch thorpej-i2c-spi-conf on 2021-06-17 04:46:35 +0000
 1.2.2.2 31-May-2021  cjep sync with head
 1.2.2.1 24-May-2021  cjep file if_laggvar.h was added on branch cjep_staticlib_x on 2021-05-31 22:15:21 +0000
 1.2 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.1 22-Aug-2010  rmind branches: 1.1.2; 1.1.4; 1.1.10; 1.1.14; 1.1.24; 1.1.28;
Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.1.28.1 18-May-2014  rmind sync with head
 1.1.24.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.14.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.10.2 05-Mar-2011  rmind sync with head
 1.1.10.1 22-Aug-2010  rmind file Makefile was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.1.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.1.4.1 22-Aug-2010  uebayasi file Makefile was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.1.2.2 09-Oct-2010  yamt sync with head
 1.1.2.1 22-Aug-2010  yamt file Makefile was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.2 17-Apr-2025  gdt npf: Adjust README following tech-kern@ discussion

This text has been adjusted to follow the rough consensus of the
public comments and a number of off-list comments.
 1.1 29-Sep-2018  rmind branches: 1.1.2; 1.1.6; 1.1.40;
NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.1.40.1 02-Aug-2025  perseant Sync with HEAD
 1.1.6.2 10-Jun-2019  christos Sync with HEAD
 1.1.6.1 29-Sep-2018  christos file README was added on branch phil-wifi on 2019-06-10 22:09:46 +0000
 1.1.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.1.2.1 29-Sep-2018  pgoyette file README was added on branch pgoyette-compat on 2018-09-30 01:45:56 +0000
 1.24 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.23 30-May-2020  rmind branches: 1.23.26;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.22 23-Jul-2019  rmind branches: 1.22.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.21 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.20 02-Jan-2017  rmind branches: 1.20.14; 1.20.16;
NPF: implement dynamic handling of interface addresses (the kernel part).
 1.19 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.18 09-Dec-2016  christos This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
 1.17 19-Jul-2014  rmind branches: 1.17.2; 1.17.4; 1.17.8; 1.17.10;
NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.16 08-Nov-2013  rmind branches: 1.16.2;
NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.15 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.14 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.13 02-Jun-2013  rmind branches: 1.13.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.12 12-Mar-2013  christos normali{s,z}e
 1.11 10-Mar-2013  christos Split the npflog cloner and auto-load the extensions.
 1.10 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.9 10-Dec-2012  rmind Add NPF "rndblock" extension to randomly drop packets (using a random function
with a percentage or modulo operation). This is a demo module, although it can
be used for packet loss simulation. Example of a procedure in npf.conf:

procedure "somedrop" {
# Drop 1.9% of the traffic
rndblock: percentage 1.9
}
 1.8 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.7 15-Jul-2012  rmind branches: 1.7.2;
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.6 06-Feb-2012  rmind branches: 1.6.2;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.5 29-Nov-2011  rmind branches: 1.5.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.4 18-Dec-2010  rmind branches: 1.4.6; 1.4.10;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file files.npf was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file files.npf was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.10.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.3 16-Jan-2013  yamt sync with (a bit old) head
 1.4.10.2 30-Oct-2012  yamt sync with head
 1.4.10.1 17-Apr-2012  yamt sync with head
 1.4.6.2 05-Mar-2011  rmind sync with head
 1.4.6.1 18-Dec-2010  rmind file files.npf was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.5.2.1 18-Feb-2012  mrg merge to -current.
 1.6.2.4 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.6.2.3 15-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #745):
distrib/sets/lists/comp/shl.mi: revision 1.241
distrib/sets/lists/modules/mi: revision 1.49
distrib/sets/lists/base/md.sparc64: revision 1.171
lib/npf/ext_rndblock/npfext_rndblock.c: revision 1.1
distrib/sets/lists/base/ad.mips64eb: revision 1.106
distrib/sets/lists/modules/md.evbppc: revision 1.29
sys/net/npf/npf_ext_rndblock.c: revision 1.1
lib/npf/Makefile: revision 1.2
sys/modules/npf_ext_rndblock/Makefile: revision 1.1
lib/npf/ext_rndblock/Makefile: revision 1.1
distrib/sets/lists/base/ad.mips64el: revision 1.106
lib/npf/ext_rndblock/shlib_version: revision 1.1
distrib/sets/lists/base/md.amd64: revision 1.182
distrib/sets/lists/base/shl.mi: revision 1.643
sys/net/npf/files.npf: revision 1.9
sys/modules/Makefile: revision 1.117
Add NPF "rndblock" extension to randomly drop packets (using a random function
with a percentage or modulo operation). This is a demo module, although it can
be used for packet loss simulation. Example of a procedure in npf.conf:
procedure "somedrop" {
# Drop 1.9% of the traffic
rndblock: percentage 1.9
}
 1.6.2.2 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.6.2.1 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.7.2.5 03-Dec-2017  jdolecek update from HEAD
 1.7.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.3 23-Jun-2013  tls resync from head
 1.7.2.2 25-Feb-2013  tls resync with head
 1.7.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.13.2.1 18-May-2014  rmind sync with head
 1.16.2.1 10-Aug-2014  tls Rebase.
 1.17.10.1 18-Jan-2017  skrll Sync with netbsd-5
 1.17.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.17.4.1 05-Feb-2017  skrll Sync with HEAD
 1.17.2.1 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
 1.20.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.20.16.1 10-Jun-2019  christos Sync with HEAD
 1.20.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.22.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.23.26.1 02-Aug-2025  perseant Sync with HEAD
 1.6 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.5 29-Jan-2017  christos branches: 1.5.12; 1.5.14;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.4 26-Dec-2016  christos branches: 1.4.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.3 13-Mar-2013  christos branches: 1.3.6; 1.3.14; 1.3.18; 1.3.22;
add missing argument
 1.2 13-Mar-2013  christos don't auto-unload
 1.1 10-Mar-2013  christos Split the npflog cloner and auto-load the extensions.
 1.3.22.2 20-Mar-2017  pgoyette Sync with HEAD
 1.3.22.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.3.18.1 05-Feb-2017  skrll Sync with HEAD
 1.3.14.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.3.14.1 13-Mar-2013  yamt file if_npflog.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.3.6.3 03-Dec-2017  jdolecek update from HEAD
 1.3.6.2 23-Jun-2013  tls resync from head
 1.3.6.1 13-Mar-2013  tls file if_npflog.c was added on branch tls-maxphys on 2013-06-23 06:20:25 +0000
 1.4.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.5.14.1 10-Jun-2019  christos Sync with HEAD
 1.5.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.2 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.1 29-Jan-2017  christos branches: 1.1.2; 1.1.4; 1.1.8; 1.1.18; 1.1.20; 1.1.22;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.20.1 30-Sep-2018  pgoyette Sync with HEAD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 29-Jan-2017  jdolecek file if_npflog.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.8.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.8.1 29-Jan-2017  bouyer file if_npflog.h was added on branch bouyer-socketcan on 2017-04-21 16:54:05 +0000
 1.1.4.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.4.1 29-Jan-2017  pgoyette file if_npflog.h was added on branch pgoyette-localcount on 2017-03-20 06:57:50 +0000
 1.1.2.2 05-Feb-2017  skrll Sync with HEAD
 1.1.2.1 29-Jan-2017  skrll file if_npflog.h was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.6 12-Jun-2019  christos Avoid LOCKDEBUG pserialize panic by implementing suggestion #1 from

http://mail-index.netbsd.org/current-users/2019/02/24/msg035220.html:

Convert the mutex to spin-lock at IPL_NET (but it is excessive) and
convert the memory allocations in that code path to KM_NOSLEEP.
 1.5 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.4 01-Jun-2017  chs branches: 1.4.8; 1.4.10; 1.4.12;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.3 26-Dec-2016  rmind branches: 1.3.2; 1.3.6; 1.3.8;
Fix kmem_free() in hashmap_remove().
 1.2 26-Dec-2016  rmind Fix kmem_free() sizes in hashmap_rehash() and lpm_clear().
 1.1 09-Dec-2016  christos branches: 1.1.2;
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
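The longest-prefix-match approach described above can be sketched roughly
as follows. This is a minimal userland illustration, not the code in lpm.c:
it is IPv4-only, uses a fixed-size open-addressing table per prefix length,
and the names lpm_sketch_t, lpm_mask(), lpm_hash(), lpm_insert() and
lpm_lookup() are hypothetical. The point it shows is the "linear scan of the
hash tables": try each prefix length from longest to shortest, mask the
address, and probe the table for that length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LPM_MAX_PREFIX	32	/* IPv4 only in this sketch */
#define LPM_BUCKETS	64	/* slots per prefix length; power of two */

typedef struct {
	uint32_t	key;	/* masked network address, host byte order */
	int		used;
	void *		val;
} lpm_slot_t;

typedef struct {
	/* One small hash table per prefix length, 0..32. */
	lpm_slot_t	tbl[LPM_MAX_PREFIX + 1][LPM_BUCKETS];
	/* Number of prefixes stored for each length. */
	unsigned	count[LPM_MAX_PREFIX + 1];
} lpm_sketch_t;

static uint32_t
lpm_mask(unsigned plen)
{
	assert(plen <= LPM_MAX_PREFIX);
	/* Shifting a 32-bit value by 32 is undefined, so handle /0 here. */
	return plen == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - plen));
}

static unsigned
lpm_hash(uint32_t key)
{
	/* Multiplicative hash; any reasonable hash would do. */
	return (unsigned)((uint32_t)(key * 2654435761u) >> 26) &
	    (LPM_BUCKETS - 1);
}

static int
lpm_insert(lpm_sketch_t *lpm, uint32_t addr, unsigned plen, void *val)
{
	if (plen > LPM_MAX_PREFIX)
		return -1;

	const uint32_t key = addr & lpm_mask(plen);
	const unsigned h = lpm_hash(key);

	/* Open addressing with linear probing. */
	for (unsigned i = 0; i < LPM_BUCKETS; i++) {
		lpm_slot_t *s = &lpm->tbl[plen][(h + i) & (LPM_BUCKETS - 1)];
		if (!s->used || s->key == key) {
			if (!s->used)
				lpm->count[plen]++;
			s->key = key;
			s->used = 1;
			s->val = val;
			return 0;
		}
	}
	return -1;	/* table for this prefix length is full */
}

static void *
lpm_lookup(const lpm_sketch_t *lpm, uint32_t addr)
{
	/* Linear scan over the prefix lengths, longest first. */
	for (int plen = LPM_MAX_PREFIX; plen >= 0; plen--) {
		if (lpm->count[plen] == 0)
			continue;	/* no prefixes of this length */

		const uint32_t key = addr & lpm_mask((unsigned)plen);
		const unsigned h = lpm_hash(key);

		for (unsigned i = 0; i < LPM_BUCKETS; i++) {
			const lpm_slot_t *s =
			    &lpm->tbl[plen][(h + i) & (LPM_BUCKETS - 1)];
			if (!s->used)
				break;	/* empty slot ends the probe */
			if (s->key == key)
				return s->val;
		}
	}
	return NULL;
}
```

With only a few distinct prefix lengths in use, the outer loop skips most
lengths via the count[] check, which is why the scan stays cheap in the case
the commit message describes.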
 1.1.2.4 27-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1340):
sys/net/npf/lpm.c: revision 1.3
Fix kmem_free() in hashmap_remove().
 1.1.2.3 26-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1339):
sys/net/npf/lpm.c: revision 1.2
Fix kmem_free() sizes in hashmap_rehash() and lpm_clear().
 1.1.2.2 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
 1.1.2.1 09-Dec-2016  snj file lpm.c was added on branch netbsd-7 on 2016-12-18 07:40:50 +0000
 1.3.8.3 28-Aug-2017  skrll Sync with HEAD
 1.3.8.2 05-Feb-2017  skrll Sync with HEAD
 1.3.8.1 26-Dec-2016  skrll file lpm.c was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.3.6.2 18-Jan-2017  skrll Sync with netbsd-5
 1.3.6.1 26-Dec-2016  skrll file lpm.c was added on branch netbsd-7-nhusb on 2017-01-18 08:46:46 +0000
 1.3.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.3.2.1 26-Dec-2016  pgoyette file lpm.c was added on branch pgoyette-localcount on 2017-01-07 08:56:50 +0000
 1.4.12.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4.12.1 10-Jun-2019  christos Sync with HEAD
 1.4.10.1 30-Sep-2018  pgoyette Sync with HEAD
 1.4.8.2 03-Dec-2017  jdolecek update from HEAD
 1.4.8.1 01-Jun-2017  jdolecek file lpm.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.3 12-Jun-2019  christos Avoid LOCKDEBUG pserialize panic by implementing suggestion #1 from

http://mail-index.netbsd.org/current-users/2019/02/24/msg035220.html:

Convert the mutex to spin-lock at IPL_NET (but it is excessive) and
convert the memory allocations in that code path to KM_NOSLEEP.
 1.2 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.1 09-Dec-2016  christos branches: 1.1.2; 1.1.4; 1.1.8; 1.1.10; 1.1.22; 1.1.24; 1.1.26;
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
 1.1.26.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1.26.1 10-Jun-2019  christos Sync with HEAD
 1.1.24.1 30-Sep-2018  pgoyette Sync with HEAD
 1.1.22.2 03-Dec-2017  jdolecek update from HEAD
 1.1.22.1 09-Dec-2016  jdolecek file lpm.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.10.2 05-Feb-2017  skrll Sync with HEAD
 1.1.10.1 09-Dec-2016  skrll file lpm.h was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.1.8.2 18-Jan-2017  skrll Sync with netbsd-5
 1.1.8.1 09-Dec-2016  skrll file lpm.h was added on branch netbsd-7-nhusb on 2017-01-18 08:46:46 +0000
 1.1.4.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.4.1 09-Dec-2016  pgoyette file lpm.h was added on branch pgoyette-localcount on 2017-01-07 08:56:50 +0000
 1.1.2.2 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
 1.1.2.1 09-Dec-2016  snj file lpm.h was added on branch netbsd-7 on 2016-12-18 07:40:50 +0000
 1.44 27-Aug-2020  riastradh npf: Make sure to initialize portmap_lock only once.

PR kern/55586
 1.43 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.42 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.41 25-Aug-2019  rmind branches: 1.41.2;
- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.40 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.39 06-Aug-2019  christos - npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.38 23-Jul-2019  rmind branches: 1.38.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.37 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
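For the net-to-net (NETMAP) translation mentioned in the list above, the
address arithmetic can be sketched as below. This is only an illustration of
the NETMAP idea as known from iptables (network bits replaced, host bits
preserved), assuming IPv4 addresses in host byte order; prefix_to_mask() and
netmap_translate() are hypothetical names and are not taken from the NPF
sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: IPv4 netmask for a prefix length in 0..32. */
static uint32_t
prefix_to_mask(unsigned plen)
{
	assert(plen <= 32);
	/* Shifting a 32-bit value by 32 is undefined, so handle /0 here. */
	return plen == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - plen));
}

/*
 * Hypothetical NETMAP translation: take the network bits from 'tnet'
 * (the translation network) and keep the host bits of 'addr'.
 */
static uint32_t
netmap_translate(uint32_t addr, uint32_t tnet, unsigned plen)
{
	const uint32_t mask = prefix_to_mask(plen);

	return (tnet & mask) | (addr & ~mask);
}
```

For example, under these assumptions 10.0.1.5 mapped onto 192.168.0.0/16
becomes 192.168.1.5: the upper 16 bits come from the translation network and
the lower 16 bits are carried over unchanged.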
 1.36 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.35 12-Sep-2018  christos Fix lockdebug diagnostic error of trying to acquire an rw_lock from a
pserialized active context. From riastradh@
 1.34 01-Jun-2017  chs branches: 1.34.8; 1.34.10;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.33 26-Dec-2016  christos branches: 1.33.6;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.32 10-Dec-2016  christos add functionality to lookup a nat entry from the connection list.
 1.31 29-Oct-2015  christos branches: 1.31.2;
Simplify even further and fix non-modular kernels:
We cannot use the init at attach() trick, because other npf ext modules
will load before the attach function is called on non modular kernels.
 1.30 27-Oct-2015  christos modules don't define MODULAR.
 1.29 27-Oct-2015  christos simplify (and fix) logic.
 1.28 19-Oct-2015  martin Ifdef npf_init() the same way as all its callers are protected.
 1.27 19-Oct-2015  christos Fix the code so that it works in all 3 cases: non-modular, modular/builtin,
modular/filesystem. In the non-modular case we initialize through attach.
In the modular/builtin case we define the module to be class misc so it
attaches late (after percpu is initialized) since driver modules attach
too early. In the modular/filesystem case we define it to be a driver
module since we autoload it via /dev/npf open.
 1.26 18-Oct-2015  jmcneill Defer initialization of built-in npf module until other pseudo-devices
are initialized. MODULE_CLASS_DRIVER modules are now initialized before
autoconfiguration starts, but npf_init has a dependency on percpu(9) which
doesn't work until CPUs have attached (at least on ARM).
 1.25 18-Oct-2015  christos needs to be driver, otherwise it will not load!
 1.24 17-Oct-2015  jmcneill mark this MODULE_CLASS_MISC as npf_init cannot run when builtin driver modules are initialized
 1.23 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.22 25-Jul-2014  dholland branches: 1.22.4;
Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.21 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.20 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.19 16-Mar-2014  dholland branches: 1.19.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.18 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.17 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.16 02-Jun-2013  rmind branches: 1.16.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.15 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.14 29-Oct-2012  rmind Implement NPF table listing and preservation of entries on reload.
Bump the version.
 1.13 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of a
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.12 15-Jul-2012  rmind branches: 1.12.2;
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.11 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.9 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.8 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.7 15-Jan-2012  rmind branches: 1.7.2;
- Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.6 06-Nov-2011  tron branches: 1.6.4;
Change module class to driver as npf(4) is a pseudo device.
 1.5 25-Apr-2011  yamt branches: 1.5.4;
fix module build
 1.4 02-Feb-2011  rmind branches: 1.4.2;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
 1.3 18-Jan-2011  rmind branches: 1.3.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.2 18-Dec-2010  rmind branches: 1.2.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.1 22-Aug-2010  rmind branches: 1.1.2; 1.1.4;
Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.1.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.1.4.1 22-Aug-2010  uebayasi file npf.c was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.1.2.2 09-Oct-2010  yamt sync with head
 1.1.2.1 22-Aug-2010  yamt file npf.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.2.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.3.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.4.2.3 31-May-2011  rmind sync with head
 1.4.2.2 05-Mar-2011  rmind sync with head
 1.4.2.1 02-Feb-2011  rmind file npf.c was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.5.4.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.4.3 30-Oct-2012  yamt sync with head
 1.5.4.2 17-Apr-2012  yamt sync with head
 1.5.4.1 10-Nov-2011  yamt sync with head
 1.6.4.3 05-Apr-2012  mrg sync to latest -current.
 1.6.4.2 24-Feb-2012  mrg sync to -current.
 1.6.4.1 18-Feb-2012  mrg merge to -current.
 1.7.2.7 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.7.2.6 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.7.2.5 19-Nov-2012  msaitoh Fix a bug that the patch was incorrectly applied with last commit.
 1.7.2.4 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.7.2.3 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.7.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.7.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.12.2.5 03-Dec-2017  jdolecek update from HEAD
 1.12.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.3 23-Jun-2013  tls resync from head
 1.12.2.2 25-Feb-2013  tls resync with head
 1.12.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.16.2.1 18-May-2014  rmind sync with head
 1.19.2.1 10-Aug-2014  tls Rebase.
 1.22.4.4 28-Aug-2017  skrll Sync with HEAD
 1.22.4.3 05-Feb-2017  skrll Sync with HEAD
 1.22.4.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.22.4.1 22-Sep-2015  skrll Sync with HEAD
 1.31.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.31.2.3 26-Jul-2016  pgoyette Rename LOCALCOUNT_INITIALIZER to DEVSW_MODULE_INIT. This better describes
what we're doing, and why.
 1.31.2.2 19-Jul-2016  pgoyette Instead of repeatedly typing the conditional initialization of the
.d_localcount members in the various {b,c}devsw, define an initializer
macro and use it. This also removes the need for defining new symbols
for each 'struct localcount'.

As suggested by riastradh@
 1.31.2.1 18-Jul-2016  pgoyette Rump drivers are always installed via devsw_attach() so we need to
always allocate a 'struct localcount' for these drivers whenever they
are built as modules.
 1.33.6.2 29-Apr-2017  pgoyette Remove more unnecessary #include for sys/localcount.h
 1.33.6.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.34.10.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.34.10.1 10-Jun-2019  christos Sync with HEAD
 1.34.8.2 26-Jan-2019  pgoyette Sync with HEAD
 1.34.8.1 30-Sep-2018  pgoyette Sync with HEAD
 1.38.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.38.2.3 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.38.2.2 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.38.2.1 07-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #25):

sys/net/npf/npf_conn.h: revision 1.17
sys/net/npf/npf.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.28
sys/net/npf/npf_conn.c: revision 1.29

Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.

- npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.41.2.1 29-Feb-2020  ad Sync with head.
 1.68 09-Oct-2025  joe PR kern/59615 introduce layer checks for 10 userland 11 kernel
 1.67 01-Jul-2025  joe branches: 1.67.2;
kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.66 01-Jun-2025  joe NPF copyright 2025
 1.65 31-May-2025  joe Userland: npf rule parser for user and group id
 1.64 12-Feb-2023  kardel branches: 1.64.6;
PR kern/56052:
allow block-return packets passed through without rule matching.
Included up-stream as https://github.com/rmind/npf/pull/115
 1.63 30-May-2020  rmind branches: 1.63.20;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.62 25-Aug-2019  rmind Move PACKET_TAG_NPF where it belongs to.
 1.61 21-Aug-2019  rmind npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.60 23-Jul-2019  rmind branches: 1.60.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.59 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
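
The net-to-net (NETMAP) translation mentioned above can be sketched as
follows. This is only an illustration of the general idea, assuming IPv4
addresses in host byte order; netmap_translate() and its parameters are
hypothetical names and are not taken from the NPF sources. The host bits of
the original address are kept and the network bits are replaced with those
of the target network.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not NPF code): map addr into new_net/prefix_len
 * by replacing the network bits and preserving the host bits.
 * All addresses are IPv4 in host byte order.
 */
static uint32_t
netmap_translate(uint32_t addr, uint32_t new_net, unsigned prefix_len)
{
	assert(prefix_len <= 32);

	/* Build the netmask; a zero-length prefix keeps the whole address. */
	uint32_t mask = (prefix_len == 0) ?
	    0 : ~(uint32_t)0 << (32 - prefix_len);

	return (new_net & mask) | (addr & ~mask);
}
```

For instance, under these assumptions 10.0.1.5 mapped into 192.168.0.0/16
keeps its low 16 bits and becomes 192.168.1.5.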
 1.58 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.57 19-Apr-2018  christos branches: 1.57.2;
s/static inline/static __inline/g for consistency.
 1.56 08-Mar-2018  maxv Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.

Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.
 1.55 15-Dec-2017  maxv branches: 1.55.2;
Fix a vulnerability in NPF, that allows whatever incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap to zero if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules to.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
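
The arithmetic in the example packet above can be reproduced with a small
standalone sketch. It is not the NPF code: the function names are
hypothetical, and it assumes only that each extension header occupies
(len + 1) * 8 bytes, which matches the offsets listed in the example
(40 -> 296 -> 304). The 8-bit accumulator wraps and ends up at offset 48,
the crafted UDP header, while the 32-bit one reaches the real header at 304.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of an IPv6 extension header with length field len8. */
static uint32_t
ext_hdr_bytes(uint8_t len8)
{
	return ((uint32_t)len8 + 1) << 3;
}

/* Hypothetical pre-fix model: the offset is kept in 8 bits and wraps. */
static uint8_t
l4_offset_u8(const uint8_t *ext_len, size_t n)
{
	uint8_t hlen = 40;	/* fixed IPv6 header */

	assert(ext_len != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		hlen = (uint8_t)(hlen + ext_hdr_bytes(ext_len[i]));
	return hlen;
}

/* Hypothetical post-fix model: the offset is kept in 32 bits. */
static uint32_t
l4_offset_u32(const uint8_t *ext_len, size_t n)
{
	uint32_t hlen = 40;	/* fixed IPv6 header */

	assert(ext_len != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		hlen += ext_hdr_bytes(ext_len[i]);
	return hlen;
}
```

Calling both functions with the length fields { 31, 0 } from the example
(routing header, then destination options header) shows the two offsets
diverging.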
 1.54 29-Jan-2017  christos branches: 1.54.6;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.53 26-Dec-2016  rmind branches: 1.53.2;
Bump NPF_VERSION to 19.
 1.52 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.51 10-Dec-2016  christos Welcome to version 18:
- Connection state keys are not stored and loaded using the logical key
contents.
- connection finder key is stored in a map that contains the key and the
direction.
 1.50 10-Dec-2016  christos add functionality to lookup a nat entry from the connection list.
 1.49 09-Dec-2016  christos make this compile again
 1.48 08-Dec-2016  rmind NPF: adjust the 'stateful-ends' mechanism to tag the packets and thus
pass-through them on other interfaces. Per discussion with christos@.
 1.47 10-Aug-2014  rmind branches: 1.47.2; 1.47.4; 1.47.6; 1.47.8; 1.47.12;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.46 23-Jul-2014  rmind npf_iscached: add an assert.
 1.45 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.44 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.43 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.42 29-Jun-2014  rmind NPF:
- Populate the BPF external memory store with L3 information.
- Eliminate NPF_COP_L3 call and just use the data in the memstore.
- Bump NPF_VERSION.
 1.41 25-Jun-2014  rmind Adjust NPF to the recent BPF / BPF JIT changes and make it work again.
All regression tests are happy now (hi alnsn!).
 1.40 30-May-2014  rmind - npf_nat_freepolicy: handle a race condition when a new connection might
be associated with a NAT policy which is going away and npfctl reload
would wait for its natural expiration (potentially long time).
- Remove npf_ruleset_natreload() by merging into npf_ruleset_reload().
- npf_ruleset_reload: eliminate a small time period when a valid NAT
policy might be inactive during the reload operation.
 1.39 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.38 14-Mar-2014  rmind branches: 1.38.2;
NPF: add support for "stateful-ends".
 1.37 13-Feb-2014  rmind NPF: add support for IPv6-to-IPv6 Network Prefix Translation (NPTv6),
as per RFC 6296. Add a unit test. Also, bump NPF_VERSION.

Thanks to S.P.Zeidler for the help with NPTv6 work!
 1.36 07-Feb-2014  rmind NPF: add support for static (stateless) NAT.
 1.35 06-Feb-2014  rmind Add support for CDB based NPF tables.
 1.34 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.33 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.32 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.31 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.30 11-Mar-2013  christos branches: 1.30.6;
use sizeof(req) to find if it was empty or not (from uwe)
 1.29 11-Mar-2013  christos *"" is not constant according to gcc. So we move the responsibility for adding
a , to the users of the macro.
 1.28 11-Mar-2013  christos - avoid trailing , in dependencies when there are none other the npf module
itself.
- remove if_npflog dependency from npf_ext_log.
 1.27 10-Feb-2013  rmind - Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
 1.26 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.25 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.24 23-Dec-2012  rmind - Add NPF version check in proplist as well, not only ioctl. Bump the version.
- Fix a bug in table entry lookup.
- Updates/fixes to the man pages. Misc.
 1.23 10-Dec-2012  rmind npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.22 29-Oct-2012  rmind Implement NPF table listing and preservation of entries on reload.
Bump the version.
 1.21 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.20 19-Jul-2012  spz branches: 1.20.2;
teach npf ipv6-icmp
reviewed by rmind@
 1.19 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.18 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.17 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.16 14-Apr-2012  rmind Update rumpdev_npf; use WARNS=4.
 1.15 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.14 06-Feb-2012  rmind branches: 1.14.2;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.13 05-Feb-2012  rmind Multiple NPF fixes, add better error reporting from kernel side, add some
asserts, bump the version.
 1.12 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.11 29-Nov-2011  rmind branches: 1.11.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.10 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.9 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.8 02-Feb-2011  rmind branches: 1.8.2; 1.8.6;
Bump NPF_VERSION.
 1.7 02-Feb-2011  rmind NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
 1.6 18-Jan-2011  rmind branches: 1.6.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 18-Dec-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.4 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.3 25-Sep-2010  rmind branches: 1.3.2; 1.3.4;
Add nbuf_advfetch() and simplify some code slightly.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.3.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.4.1 25-Sep-2010  uebayasi file npf.h was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.3.2.2 09-Oct-2010  yamt sync with head
 1.3.2.1 25-Sep-2010  yamt file npf.h was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.8.6.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.6.5 23-Jan-2013  yamt sync with head
 1.8.6.4 16-Jan-2013  yamt sync with (a bit old) head
 1.8.6.3 30-Oct-2012  yamt sync with head
 1.8.6.2 17-Apr-2012  yamt sync with head
 1.8.6.1 10-Nov-2011  yamt sync with head
 1.8.2.2 05-Mar-2011  rmind sync with head
 1.8.2.1 02-Feb-2011  rmind file npf.h was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.11.2.3 29-Apr-2012  mrg sync to latest -current.
 1.11.2.2 05-Apr-2012  mrg sync to latest -current.
 1.11.2.1 18-Feb-2012  mrg merge to -current.
 1.14.2.13 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1542:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap to zero if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
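
The offsets in the example above can be reproduced with a small sketch. It
is not the NPF code: ext_hdr_bytes(), hlen_narrow() and hlen_wide() are
hypothetical names used only for illustration, and the sketch assumes each
extension header occupies (len + 1) * 8 bytes, as the routing and
destination headers in the example do.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from NPF): size in bytes of an IPv6 extension
 * header whose length field counts 8-octet units beyond the first 8 octets.
 */
static uint32_t
ext_hdr_bytes(uint8_t len_field)
{
	return ((uint32_t)len_field + 1) * 8;
}

/* Header offset accumulated in an 8-bit variable: models the wrap. */
static uint8_t
hlen_narrow(const uint8_t *len_fields, int n)
{
	uint8_t hlen = 40;	/* fixed IPv6 header */

	for (int i = 0; i < n; i++)
		hlen = (uint8_t)(hlen + ext_hdr_bytes(len_fields[i]));
	return hlen;
}

/* Same walk with a 32-bit accumulator: no wrap for these sizes. */
static uint32_t
hlen_wide(const uint8_t *len_fields, int n)
{
	uint32_t hlen = 40;	/* fixed IPv6 header */

	for (int i = 0; i < n; i++)
		hlen += ext_hdr_bytes(len_fields[i]);
	return hlen;
}
```

With the length fields { 31, 0 } from the example, the 8-bit accumulator
ends at 48 (the crafted UDP header) while the 32-bit one ends at 304 (the
real UDP header).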
 1.14.2.12 11-Feb-2013  riz branches: 1.14.2.12.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.14.2.11 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.14.2.10 07-Jan-2013  riz Pull up following revision(s) (requested by rmind in ticket #776):
usr.sbin/npf/npfctl/npf.conf.5: revision 1.26
usr.sbin/npf/npfctl/npfctl.c: revision 1.26
dist/pf/usr.sbin/ftp-proxy/npf.c: revision 1.2
lib/libnpf/npf.c: revision 1.15
sys/net/npf/npf_ctl.c: revision 1.20
lib/libnpf/npf.h: revision 1.12
lib/libnpf/npf.3: revision 1.6
lib/libnpf/npf.3: revision 1.7
usr.sbin/npf/npfctl/npf_build.c: revision 1.17
sys/net/npf/npf.h: revision 1.24
- Add NPF version check in proplist as well, not only ioctl. Bump the version.
- Fix a bug in table entry lookup.
- Updates/fixes to the man pages. Misc.
Remove a superfluous quote and fix a recurring typo.
ftp-proxy: disable NPF bits for now; it will be re-done.
 1.14.2.9 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.14.2.8 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.14.2.7 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.14.2.6 25-Jul-2012  jdc branches: 1.14.2.6.4;
Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.14.2.5 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.14.2.4 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.14.2.3 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.14.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.14.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.14.2.12.2.1 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1542:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.14.2.6.4.2 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1542:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.14.2.6.4.1 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.20.2.5 03-Dec-2017  jdolecek update from HEAD
 1.20.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.20.2.3 23-Jun-2013  tls resync from head
 1.20.2.2 25-Feb-2013  tls resync with head
 1.20.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.30.6.1 18-May-2014  rmind sync with head
 1.38.2.1 10-Aug-2014  tls Rebase.
 1.47.12.1 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1593:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.47.8.2 20-Mar-2017  pgoyette Sync with HEAD
 1.47.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.47.6.1 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1593:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.47.4.1 05-Feb-2017  skrll Sync with HEAD
 1.47.2.1 05-Apr-2018  martin Pullup the following revision, requested by maxv in ticket #1593:

sys/net/npf/npf.h 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:
packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)
Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.53.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.54.6.2 09-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #817):

sys/net/npf/npf_inet.c: revision 1.38-1.44
sys/net/npf/npf_handler.c: revision 1.38-1.39
sys/net/npf/npf_alg_icmp.c: revision 1.26
sys/net/npf/npf.h: revision 1.56
sys/net/npf/npf_sendpkt.c: revision 1.17-1.18

Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.
Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.

Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.

Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).

Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:
"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:
- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.

Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.

Fix the "return-rst" rule on IPv6 packets.
The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.
I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).

Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.

Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.

Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.
AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.

Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
 1.54.6.1 04-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #693):

sys/net/npf/npf.h: revision 1.55

Fix a vulnerability in NPF that allows any incoming IPv6 packet to
bypass a certain number of filtering rules.

Basically there is an integer overflow in npf_cache_ip: npc_hlen is an
8-bit unsigned int, and can wrap around if the IPv6 packet being processed
has large extensions.

As a result of an overflow, (mbuf + npc_hlen) won't point at the real
protocol header, but instead at some garbage within the packet. That
garbage is what NPF applies its rules on.

If these filtering rules allow the packet to enter, that packet is given
to the main IPv6 entry point. This entry point, however, is not subject to
an integer overflow, so it will actually parse the correct protocol header.

The result is: NPF read a wrong header, allowed the packet to enter, the
kernel read the correct header, and delivered the packet depending on this
correct header. So the offending packet was supposed to be kicked, but
still went through the firewall.

Simple example, a packet with:

packet + 0 = IP6 Header
packet + 40 = IP6 Routing header (ip6r_len = 31)
packet + 48 = Crafted UDP header (uh_dport = 7777)
packet + 296 = IP6 Dest header (ip6e_len = 0)
packet + 304 = Real UDP header (uh_dport = 6666)

Will bypass a rule of the kind "block port 6666". Here NPF reads the
crafted UDP header, sees 7777, lets the packet in; later the kernel reads
the real UDP header, and delivers it on port 6666.

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.
 1.55.2.4 26-Jan-2019  pgoyette Sync with HEAD
 1.55.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.55.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.55.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.57.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.57.2.1 10-Jun-2019  christos Sync with HEAD
 1.60.2.4 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.60.2.3 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.60.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.60.2.1 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #139):

lib/libnpf/npf.c: revision 1.47
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.10
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.10
sys/net/npf/npf.h: revision 1.61
sys/net/npf/npf_ctl.c: revision 1.56
sys/net/npf/npf_os.c: revision 1.15
lib/libnpf/libnpf.3: revision 1.10
sys/net/npf/npf_tableset.c: revision 1.34
usr.sbin/npf/npfctl/npfctl.c: revision 1.61
sys/net/npf/npf_impl.h: revision 1.77
lib/libnpf/npf.h: revision 1.37

- npftest: fix a memleak in a unit test (standalone path only).
- Minor style fixes. No functional change.
npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.63.20.1 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.64.6.1 02-Aug-2025  perseant Sync with HEAD
 1.67.2.1 13-Oct-2025  martin Pull up following revision(s) (requested by joe in ticket #53):

sys/net/npf/npf.h: revision 1.68
sys/net/npf/npf_ruleset.c: revision 1.57

PR kern/59615 introduce layer checks for 10 userland 11 kernel
 1.22 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.21 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.20 23-Jul-2019  rmind branches: 1.20.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.19 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.18 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.17 12-Sep-2018  christos Fix lockdebug diagnostic error of trying to acquire an rw_lock from a
pserialized active context. From riastradh@
 1.16 26-Dec-2016  christos branches: 1.16.14; 1.16.16;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.15 11-Aug-2014  rmind branches: 1.15.2; 1.15.4;
- Add and use npf_alg_export().
- npf_conn_import: handle NAT metadata correctly.
- npf_nat_newpolicy: restore the policy ID.
- npfctl_load: fix error code handling for the limit cases.
- npf_config_import: fix the inverted logic.
- npfctl_load: improve error handling.
 1.14 20-Jul-2014  rmind branches: 1.14.2;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.13 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.12 17-Feb-2014  rmind branches: 1.12.2;
npf_alg_session: fix inverted logic in the previous commit.
 1.11 16-Feb-2014  rmind NPF: pass ALG functions via npfa_funcs_t structure.
 1.10 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.9 02-Jun-2013  rmind branches: 1.9.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.8 20-Mar-2013  christos Make ALG's autoloadable by providing in the config file:
alg "algname"
 1.7 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.6 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.5 15-Jul-2012  rmind branches: 1.5.2;
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.4 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.3 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.2 11-Nov-2010  rmind branches: 1.2.6; 1.2.10; 1.2.14; 1.2.16;
NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.1 22-Aug-2010  rmind branches: 1.1.2; 1.1.4;
Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.1.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.1.4.1 22-Aug-2010  uebayasi file npf_alg.c was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.1.2.2 09-Oct-2010  yamt sync with head
 1.1.2.1 22-Aug-2010  yamt file npf_alg.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.2.16.5 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.2.16.4 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.2.16.3 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.2.16.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.2.16.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.2.14.1 24-Feb-2012  mrg sync to -current.
 1.2.10.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.10.3 23-Jan-2013  yamt sync with head
 1.2.10.2 30-Oct-2012  yamt sync with head
 1.2.10.1 17-Apr-2012  yamt sync with head
 1.2.6.2 05-Mar-2011  rmind sync with head
 1.2.6.1 11-Nov-2010  rmind file npf_alg.c was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.5.2.4 03-Dec-2017  jdolecek update from HEAD
 1.5.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.5.2.2 23-Jun-2013  tls resync from head
 1.5.2.1 25-Feb-2013  tls resync with head
 1.9.2.1 18-May-2014  rmind sync with head
 1.12.2.1 10-Aug-2014  tls Rebase.
 1.14.2.1 29-Aug-2014  martin Pull up following revision(s) (requested by rmind in ticket #56):
sys/net/npf/npf_ctl.c: revision 1.39
usr.sbin/npf/npfctl/npfctl.c: revision 1.43
lib/libnpf/npf.c: revision 1.33
lib/libnpf/npf.c: revision 1.34
sys/net/npf/npf_impl.h: revision 1.59
sys/net/npf/npf_ctl.c: revision 1.40
sys/net/npf/npf_conn.c: revision 1.11
sys/net/npf/npf_alg.c: revision 1.15
sys/net/npf/npf_conn.c: revision 1.12
sys/net/npf/npf_nat.c: revision 1.33
sys/net/npf/npf_nat.c: revision 1.34
Add and use npf_alg_export().
npf_conn_import: handle NAT metadata correctly.
npf_nat_newpolicy: restore the policy ID.
npfctl_load: fix error code handling for the limit cases.
npf_config_import: fix the inverted logic.
npfctl_load: improve error handling.
npf_conn_import: add a missing stat counter increment.
npf_nat_import: add a missing reference and make a comment.
npf_config_submit: finally, include the saved connections.
 1.15.4.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.2.1 05-Feb-2017  skrll Sync with HEAD
 1.16.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.16.16.1 10-Jun-2019  christos Sync with HEAD
 1.16.14.2 26-Jan-2019  pgoyette Sync with HEAD
 1.16.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.20.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.20.2.1 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.33 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.32 23-Jul-2019  rmind branches: 1.32.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.31 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.30 23-Mar-2018  maxv branches: 1.30.2;
In addition to checking L4 in the cache, here we also need to check the
protocol. The NPF entry point does not ensure that

ICMPv6 can be set only in IPv6
ICMPv4 can be set only in IPv4

So we could have ICMPv6 in IPv4.
 1.29 22-Mar-2018  maxv Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.28 22-Mar-2018  maxv Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.

Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
 1.27 22-Mar-2018  maxv Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).

Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.
 1.26 12-Mar-2018  maxv Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).
 1.25 10-Dec-2017  rmind branches: 1.25.2;
- npf_cop_table: handle non-IP packets in the ether (fixes PR/52290).
- npfa_icmp_nat: do not recompute the checksum if no port translation.
- npf_normalize (MSS clamping): fix the checksum handling on PFIL_OUT.
- npflog: report the packet direction correctly.
 1.24 26-Dec-2016  christos branches: 1.24.8;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.23 20-Jul-2014  rmind branches: 1.23.2; 1.23.4; 1.23.6; 1.23.8; 1.23.12;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.22 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.21 08-Jun-2014  spz fix typo in comment
 1.20 19-Feb-2014  rmind branches: 1.20.2;
NPF: fix the recent breakage of the traceroute ALG. Also, simplify and
refactor a little bit.
 1.19 16-Feb-2014  rmind NPF: pass ALG functions via npfa_funcs_t structure.
 1.18 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.17 02-Jun-2013  rmind branches: 1.17.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.16 20-Mar-2013  christos Make ALGs autoloadable by providing in the config file:
alg "algname"
 1.15 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.14 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.13 16-Sep-2012  rmind npf_icmp_uniqid: split into npf_icmp_uniqid4() and npf_icmp_uniqid6() parts.
 1.12 10-Sep-2012  rmind branches: 1.12.2;
npf_icmp_uniqid: inspect the correct npc_info for IPv4/v6.
 1.11 19-Jul-2012  spz teach npf ipv6-icmp
reviewed by rmind@
 1.10 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.9 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.8 29-Nov-2011  rmind branches: 1.8.2; 1.8.4;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.7 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.6 18-Jan-2011  rmind branches: 1.6.4; 1.6.8;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 18-Dec-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.4 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.3 25-Sep-2010  rmind branches: 1.3.2; 1.3.4;
Add nbuf_advfetch() and simplify some code slightly.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.3.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.4.1 25-Sep-2010  uebayasi file npf_alg_icmp.c was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.3.2.2 09-Oct-2010  yamt sync with head
 1.3.2.1 25-Sep-2010  yamt file npf_alg_icmp.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.8.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.8.4 23-Jan-2013  yamt sync with head
 1.6.8.3 30-Oct-2012  yamt sync with head
 1.6.8.2 17-Apr-2012  yamt sync with head
 1.6.8.1 10-Nov-2011  yamt sync with head
 1.6.4.2 05-Mar-2011  rmind sync with head
 1.6.4.1 18-Jan-2011  rmind file npf_alg_icmp.c was added on branch rmind-uvmplock on 2011-03-05 20:55:54 +0000
 1.8.4.8 17-May-2018  martin Pull up following revision(s) via patch (requested by maxv in ticket #1549):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27,1.28

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).

Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
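
The recaching rule described in this entry can be illustrated with a small userland sketch. It is not NPF code: toy_buf, toy_cache, toy_recache and toy_grow are hypothetical stand-ins for the nbuf, the npc cache and the operations on them, and realloc() plays the role of an mbuf reallocation. The assumption is only that cached pointers are derived from the buffer's current storage and must be re-derived after anything that may move it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the nbuf and the npc cache; not the NPF types. */
struct toy_buf {
	unsigned char *data;
	size_t len;
};

struct toy_cache {
	unsigned char *l4;	/* points into buf->data; dangles if storage moves */
	size_t l4_off;		/* offset the pointer was derived from */
};

/* Derive (or re-derive) the cached pointer from the current storage. */
static void
toy_recache(struct toy_cache *c, const struct toy_buf *b, size_t l4_off)
{
	assert(l4_off < b->len);
	c->l4_off = l4_off;
	c->l4 = b->data + l4_off;
}

/*
 * Grow the buffer.  The storage may move, so every pointer cached before
 * this call must be treated as stale and recached unconditionally.
 */
static int
toy_grow(struct toy_buf *b, size_t newlen)
{
	unsigned char *p = realloc(b->data, newlen);

	if (p == NULL)
		return -1;
	if (newlen > b->len)
		memset(p + b->len, 0, newlen - b->len);
	b->data = p;
	b->len = newlen;
	return 0;
}
```

A caller would invoke toy_recache() again after every toy_grow(), without trying to detect whether the storage actually moved, which mirrors the "recache unconditionally" choice in the commit message.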
 1.8.4.7 11-Feb-2013  riz branches: 1.8.4.7.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.8.4.6 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
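
The difference in failure behaviour between the two routines mentioned in this entry can be sketched in userland C. This is a toy model under stated assumptions: toy_chain, toy_pullup and toy_ensure_contig are hypothetical names, and the real kernel interfaces operate on struct mbuf with their own signatures. Only the contract is modelled: one variant frees the chain on failure, the other leaves it valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy chain; not the kernel's struct mbuf. */
struct toy_chain {
	size_t total;	/* bytes available across the whole chain */
	size_t contig;	/* bytes contiguous at the front of the chain */
};

/* m_pullup-like contract: on failure the chain is freed and *cp is NULL. */
static int
toy_pullup(struct toy_chain **cp, size_t len)
{
	if (len > (*cp)->total) {
		free(*cp);
		*cp = NULL;
		return -1;
	}
	if ((*cp)->contig < len)
		(*cp)->contig = len;
	return 0;
}

/* m_ensure_contig-like contract: on failure the chain is left untouched. */
static int
toy_ensure_contig(struct toy_chain **cp, size_t len)
{
	if (len > (*cp)->total)
		return -1;
	if ((*cp)->contig < len)
		(*cp)->contig = len;
	return 0;
}
```

With the second contract a caller can still inspect, pass on, or free the packet itself after a failed request, which is the property the nbuf_ensure_contig rework relies on.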
 1.8.4.5 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #679):
sys/net/npf/npf_session.c: revision 1.18
usr.sbin/npf/npftest/npftest.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.7
usr.sbin/npf/npftest/npftest.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.5
sys/net/npf/npf_alg_icmp.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.3
npftest:
- Do not stop running other tests, if some tests fail.
- Fix some endianness bugs in the test cases.
Tested on sparc64 by martin@, all tests pass.
Add two new command line options to help integration into ATF:
-L lists the available test cases, -T executes a single named test.
Fix printf format
Mark npf_session_worker as __dead.
More __dead
npf_icmp_uniqid: split into npf_icmp_uniqid4() and npf_icmp_uniqid6() parts.
 1.8.4.4 13-Sep-2012  riz Pull up following revision(s) (requested by rmind in ticket #555):
sys/net/npf/npf_alg_icmp.c: revision 1.12
npf_icmp_uniqid: inspect the correct npc_info for IPv4/v6.
 1.8.4.3 25-Jul-2012  jdc branches: 1.8.4.3.2;
Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.8.4.2 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.8.4.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in a few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.8.4.7.2.1 17-May-2018  martin Pull up following revision(s) via patch (requested by maxv in ticket #1549):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27,1.28

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).

Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
 1.8.4.3.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.8.2.1 24-Feb-2012  mrg sync to -current.
 1.12.2.5 03-Dec-2017  jdolecek update from HEAD
 1.12.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.3 23-Jun-2013  tls resync from head
 1.12.2.2 25-Feb-2013  tls resync with head
 1.12.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.17.2.1 18-May-2014  rmind sync with head
 1.20.2.1 10-Aug-2014  tls Rebase.
 1.23.12.1 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.23.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.23.6.1 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.23.4.1 05-Feb-2017  skrll Sync with HEAD
 1.23.2.1 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.24.8.2 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #823):

sys/net/npf/npf_inet.c: revision 1.45-1.47
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.30
sys/net/npf/npf_sendpkt.c: revision 1.19

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Retrieve the complete IPv4 header right away, and make sure we did retrieve
the IPv6 option header we were iterating on.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.

If we fail to advance inside TCP/UDP/ICMPv4/ICMPv6, stop pretending L4
is unknown, and error out right away.

This prevents bugs in machinery, if a place looks for L4 in 'npc_proto'
without checking the cache too. I've seen a ~similar problem already.

In addition to checking L4 in the cache, here we also need to check the
protocol. The NPF entry point does not ensure that
ICMPv6 can be set only in IPv6
ICMPv4 can be set only in IPv4
So we could have ICMPv6 in IPv4.

apply some INET6 so this compiles in INET6-less kernels again.
 1.24.8.1 09-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #817):

sys/net/npf/npf_inet.c: revision 1.38-1.44
sys/net/npf/npf_handler.c: revision 1.38-1.39
sys/net/npf/npf_alg_icmp.c: revision 1.26
sys/net/npf/npf.h: revision 1.56
sys/net/npf/npf_sendpkt.c: revision 1.17-1.18

Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.
Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.

Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.

Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).

Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:
"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:
- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.

Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.

Fix the "return-rst" rule on IPv6 packets.
The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.
I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).

Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.

Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.

Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.
AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.

Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
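
The AH rule described in this entry (stop at AH and match rules against AH itself, not against whatever follows it) can be sketched as a walk over a list of next-header values. This is a simplified illustration, not the npf_inet.c logic: toy_l4_proto and the TOY_* names are hypothetical, the chain is given as a plain array instead of being parsed from a packet, and fragment handling is deliberately left out. The numeric values are the IANA protocol numbers.

```c
#include <assert.h>
#include <stddef.h>

/* IANA protocol numbers, under hypothetical TOY_ names. */
enum {
	TOY_HOPOPTS = 0,
	TOY_TCP = 6,
	TOY_ROUTING = 43,
	TOY_AH = 51,
	TOY_DSTOPTS = 60
};

/*
 * chain[i] is the protocol number of the i-th header following the fixed
 * IPv6 header.  Returns the protocol that rules should be matched against:
 * skippable extension headers are stepped over, anything else (including
 * AH) ends the walk and is treated as the payload protocol.
 */
static int
toy_l4_proto(const int *chain, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		switch (chain[i]) {
		case TOY_HOPOPTS:
		case TOY_ROUTING:
		case TOY_DSTOPTS:
			continue;	/* skippable extension header */
		default:
			return chain[i];	/* AH, TCP, ...: stop here */
		}
	}
	/* Only extension headers were present; report the last one. */
	return chain[n - 1];
}
```

Under this model a packet carrying AH followed by TCP is classified as AH, so a "pass in proto ah" rule matches it and the unauthenticated inner header is never used to drive connection state.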
 1.25.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.25.2.2 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.25.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.30.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.30.2.1 10-Jun-2019  christos Sync with HEAD
 1.32.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.14 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.13 10-Dec-2017  rmind branches: 1.13.2; 1.13.4;
- npf_cop_table: handle non-IP packets in the ether (fixes PR/52290).
- npfa_icmp_nat: do not recompute the checksum if no port translation.
- npf_normalize (MSS clamping): fix the checksum handling on PFIL_OUT.
- npflog: report the packet direction correctly.
 1.12 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.11 20-Jul-2014  rmind branches: 1.11.4; 1.11.6; 1.11.10;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.10 30-Jun-2014  rmind NPF: use BPF JIT by default.
 1.9 29-Jun-2014  rmind NPF:
- Populate the BPF external memory store with L3 information.
- Eliminate NPF_COP_L3 call and just use the data in the memstore.
- Bump NPF_VERSION.
 1.8 25-Jun-2014  rmind Adjust NPF to the recent BPF / BPF JIT changes and make it work again.
All regression tests are happy now (hi alnsn!).
 1.7 24-Jun-2014  alnsn Fix signatures of copfuncs.
 1.6 06-Dec-2013  rmind branches: 1.6.2; 1.6.4; 1.6.6;
NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.5 23-Nov-2013  rmind Move initialisation of bpf_args_t into the npf_ruleset_inspect().
This allows us to reuse the BPF memory store as a cache.
 1.4 16-Nov-2013  rmind NPF: convert to bpf_jit_generate()/bpf_jit_freecode().
 1.3 15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.2 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.1 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.6.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.6.1 06-Dec-2013  yamt file npf_bpf.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.6.4.2 18-May-2014  rmind sync with head
 1.6.4.1 06-Dec-2013  rmind file npf_bpf.c was added on branch rmind-smpnet on 2014-05-18 17:46:13 +0000
 1.6.2.1 10-Aug-2014  tls Rebase.
 1.11.10.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.11.6.1 05-Feb-2017  skrll Sync with HEAD
 1.11.4.3 03-Dec-2017  jdolecek update from HEAD
 1.11.4.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.4.1 20-Jul-2014  tls file npf_bpf.c was added on branch tls-maxphys on 2014-08-20 00:04:35 +0000
 1.13.4.1 10-Jun-2019  christos Sync with HEAD
 1.13.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.18 13-Feb-2022  riastradh npf(4): Use atomic_store_release and atomic_load_consume for config.

...or atomic_load_relaxed, when the config is locked. (Not necessary
to use atomic_* at all in NetBSD, but in C11 it will be cheaper to
say atomic_load_relaxed explicitly so an _Atomic-qualified object
doesn't cause the load to be surrounded by unnecessary membars.)

No need for store-before-load ordering here, so no need to
membar_sync.
 1.17 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.16 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.15 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.14 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.13 23-Jul-2019  rmind branches: 1.13.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.12 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.11 03-Jan-2017  rmind branches: 1.11.14; 1.11.16;
NPF: fix the interface table initialisation on load.
 1.10 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.9 30-Nov-2014  rmind branches: 1.9.2;
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.8 11-Aug-2014  rmind branches: 1.8.2; 1.8.4;
NPF: finish up the rework of npfctl_save() mechanism.
 1.7 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.6 30-May-2014  rmind - npf_nat_freepolicy: handle a race condition when a new connection might
be associated with a NAT policy which is going away and npfctl reload
would wait for its natural expiration (potentially long time).
- Remove npf_ruleset_natreload() by merging into npf_ruleset_reload().
- npf_ruleset_reload: eliminate a small time period when a valid NAT
policy might be inactive during the reload operation.
 1.5 22-Nov-2013  rmind branches: 1.5.2; 1.5.4;
Add npf_tableset_syncdict() to sync the table IDs in the proplib dictionary,
as they can change on reload now. Also, fix table name checking in npfctl.
 1.4 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.3 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.2 10-Feb-2013  rmind branches: 1.2.2; 1.2.4; 1.2.6;
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
 1.1 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.2.6.4 03-Dec-2017  jdolecek update from HEAD
 1.2.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.2 25-Feb-2013  tls resync with head
 1.2.6.1 10-Feb-2013  tls file npf_conf.c was added on branch tls-maxphys on 2013-02-25 00:30:02 +0000
 1.2.4.1 18-May-2014  rmind sync with head
 1.2.2.2 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.2.2.1 10-Feb-2013  riz file npf_conf.c was added on branch netbsd-6 on 2013-02-11 21:49:48 +0000
 1.5.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.4.1 22-Nov-2013  yamt file npf_conf.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.5.2.1 10-Aug-2014  tls Rebase.
 1.8.4.2 05-Feb-2017  skrll Sync with HEAD
 1.8.4.1 06-Apr-2015  skrll Sync with HEAD
 1.8.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.9.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.11.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.11.16.1 10-Jun-2019  christos Sync with HEAD
 1.11.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.13.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.13.2.3 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.13.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.13.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.35 22-Jan-2023  riastradh npf(9): Update comment to reduce diff from upstream.

No functional change.
 1.34 13-Feb-2022  riastradh npf(4): Use atomic_store_release and atomic_load_consume for conn_db.

...or atomic_load_relaxed, when npf->conn_lock is held, for the sake
of C11.

No need for store-before-load implied by membar_sync.
 1.33 25-Jan-2021  christos s/npf_config_lock/npf->config_lock/ in the comments
 1.32 30-May-2020  rmind branches: 1.32.2;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.31 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.30 29-Sep-2019  rmind NPF ifmap: rework and fix a few small bugs.
 1.29 06-Aug-2019  christos - npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.28 06-Aug-2019  christos Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.
 1.27 23-Jul-2019  rmind branches: 1.27.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.26 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.25 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.24 10-Dec-2017  rmind branches: 1.24.2; 1.24.4;
- npf_cop_table: handle non-IP packets in the ether (fixes PR/52290).
- npfa_icmp_nat: do not recompute the checksum if no port translation.
- npf_normalize (MSS clamping): fix the checksum handling on PFIL_OUT.
- npflog: report the packet direction correctly.
 1.23 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.22 26-Dec-2016  christos branches: 1.22.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.21 10-Dec-2016  christos revert dir hack.
 1.20 10-Dec-2016  christos Welcome to version 18:
- Connection state keys are not stored and loaded using the logical key
contents.
- connection finder key is stored in a map that contains the key and the
direction.
 1.19 10-Dec-2016  kre Remove what looks like remnant (partly removed already) debug code,
which could not possibly compile as it was.
 1.18 10-Dec-2016  christos add functionality to lookup a nat entry from the connection list.
 1.17 08-Dec-2016  rmind NPF: adjust the 'stateful-ends' mechanism to tag the packets and thus
pass-through them on other interfaces. Per discussion with christos@.
 1.16 05-Feb-2015  rmind branches: 1.16.2;
npf_conn_establish: fix the previous change - drop the reference on error.
 1.15 01-Feb-2015  rmind - npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
 1.14 20-Dec-2014  rmind NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.13 30-Nov-2014  rmind NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
 1.12 24-Aug-2014  rmind branches: 1.12.2;
- npf_conn_import: add a missing stat counter increment.
- npf_nat_import: add a missing reference and make a comment.
 1.11 11-Aug-2014  rmind branches: 1.11.2;
- Add and use npf_alg_export().
- npf_conn_import: handle NAT metadata correctly.
- npf_nat_newpolicy: restore the policy ID.
- npfctl_load: fix error code handling for the limit cases.
- npf_config_import: fix the inverted logic.
- npfctl_load: improve error handling.
 1.10 10-Aug-2014  rmind branches: 1.10.2;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.9 26-Jul-2014  rmind branches: 1.9.2;
npf_conn_conkey: fix a comment.
 1.8 25-Jul-2014  rmind npf_conn_conkey: adjust to return the key length and add a comment
describing the key layout.
 1.7 25-Jul-2014  rmind npf_mk_connlist: destroy the connections on error path.
 1.6 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.5 20-Jul-2014  joerg Drop variable only used in return.
 1.4 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.3 19-Jul-2014  christos gcc-4.8 complains about not being able to inline
 1.2 19-Jul-2014  rmind Fix gcc warnings.
 1.1 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.9.2.2 10-Aug-2014  tls Rebase.
 1.9.2.1 26-Jul-2014  tls file npf_conn.c was added on branch tls-earlyentropy on 2014-08-10 06:56:16 +0000
 1.10.2.5 15-Mar-2015  snj Pull up following revision(s) (requested by rmind in ticket #586):
sys/net/npf/npf_conn.c: revision 1.16
npf_conn_establish: fix the previous change - drop the reference on error.
 1.10.2.4 04-Feb-2015  snj Pull up following revision(s) (requested by rmind in ticket #479):
lib/libnpf/npf.c: revision 1.35
lib/libnpf/npf.h: revision 1.28
sys/net/npf/npf_conn.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.61
sys/net/npf/npf_ruleset.c: revision 1.41
usr.sbin/npf/npfctl/npf.conf.5: revision 1.44
usr.sbin/npf/npfctl/npf_parse.y: revision 1.37
usr.sbin/npf/npfctl/npf_show.c: revisions 1.16, 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.46
load the config file before bpfjit so that we can disable the warning.
--
Don't depend on yacc to include stdlib.h or string.h.
--
- npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
--
npf.conf(5): mention alg, include in the example, minor fix.
--
npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
--
libnpf: add npf_rule_getid() and npf_rule_getcode().
Missed in the previous commit.
--
npfctl_print_rule: print the ID in hex, not decimal.
 1.10.2.3 22-Dec-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #347):
sys/net/npf/npf_nat.c: revision 1.38
sys/net/npf/npf_conn.h: revision 1.8
sys/net/npf/npf_conn.c: revision 1.14
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.10.2.2 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.10.2.1 29-Aug-2014  martin Pull up following revision(s) (requested by rmind in ticket #56):
sys/net/npf/npf_ctl.c: revision 1.39
usr.sbin/npf/npfctl/npfctl.c: revision 1.43
lib/libnpf/npf.c: revision 1.33
lib/libnpf/npf.c: revision 1.34
sys/net/npf/npf_impl.h: revision 1.59
sys/net/npf/npf_ctl.c: revision 1.40
sys/net/npf/npf_conn.c: revision 1.11
sys/net/npf/npf_alg.c: revision 1.15
sys/net/npf/npf_conn.c: revision 1.12
sys/net/npf/npf_nat.c: revision 1.33
sys/net/npf/npf_nat.c: revision 1.34
Add and use npf_alg_export().
npf_conn_import: handle NAT metadata correctly.
npf_nat_newpolicy: restore the policy ID.
npfctl_load: fix error code handling for the limit cases.
npf_config_import: fix the inverted logic.
npfctl_load: improve error handling.
npf_conn_import: add a missing stat counter increment.
npf_nat_import: add a missing reference and make a comment.
npf_config_submit: finally, include the saved connections.
 1.11.2.3 03-Dec-2017  jdolecek update from HEAD
 1.11.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.2.1 11-Aug-2014  tls file npf_conn.c was added on branch tls-maxphys on 2014-08-20 00:04:35 +0000
 1.12.2.2 05-Feb-2017  skrll Sync with HEAD
 1.12.2.1 06-Apr-2015  skrll Sync with HEAD
 1.16.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.16.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.22.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.24.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.24.4.1 10-Jun-2019  christos Sync with HEAD
 1.24.2.2 26-Jan-2019  pgoyette Sync with HEAD
 1.24.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.27.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.27.2.3 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.27.2.2 04-Oct-2019  martin Pull up following revision(s) (requested by rmind in ticket #282):

usr.sbin/npf/npfctl/npf_build.c: revision 1.53
lib/libnpf/npf.c: revision 1.48
usr.sbin/npf/npfctl/npfctl.h: revision 1.50
sys/net/npf/npf_impl.h: revision 1.80
usr.sbin/npf/npfctl/npfctl.h: revision 1.51
sys/net/npf/npf_ruleset.c: revision 1.49
usr.sbin/npf/npfctl/npf.conf.5: revision 1.90
sys/net/npf/npf_ctl.c: revision 1.59
lib/libnpf/libnpf.3: revision 1.11
usr.sbin/npf/npfctl/npf_parse.y: revision 1.50
usr.sbin/npf/npftest/npftest.conf: revision 1.8
usr.sbin/npf/npfctl/npfctl.c: revision 1.62
usr.sbin/npf/npfctl/npfctl.c: revision 1.63
usr.sbin/npf/npfctl/npf_scan.l: revision 1.30
usr.sbin/npf/npfctl/npfctl.8: revision 1.22
lib/libnpf/npf.h: revision 1.38
usr.sbin/npf/npfctl/npfctl.8: revision 1.23
usr.sbin/npf/npfctl/npfctl.8: revision 1.24
sys/net/npf/npf_if.c: revision 1.11
sys/net/npf/npf_if.c: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.89
sys/net/npf/npf_conn.c: revision 1.30
usr.sbin/npf/npfctl/npf_build.c: revision 1.52

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.

NPF ifmap: rework and fix a few small bugs.

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.
(missed a file in previous commit; cvs is so helpful..)

libnpf/npfctl: support dynamic NAT rulesets using a name prefix.

Use -width Pa for FILES.

Fix pasto in table replace -t type

Use -width Pa for FILES.

npf_ifmap_copylogname: be more defensive.
 1.27.2.1 07-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #25):

sys/net/npf/npf_conn.h: revision 1.17
sys/net/npf/npf.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.28
sys/net/npf/npf_conn.c: revision 1.29

Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.

- npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.32.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.20 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.19 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.18 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.17 06-Aug-2019  christos - npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.16 23-Jul-2019  rmind branches: 1.16.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.15 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.14 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.13 10-Dec-2017  rmind branches: 1.13.2; 1.13.4;
- npf_cop_table: handle non-IP packets in the ether (fixes PR/52290).
- npfa_icmp_nat: do not recompute the checksum if no port translation.
- npf_normalize (MSS clamping): fix the checksum handling on PFIL_OUT.
- npflog: report the packet direction correctly.
 1.12 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.11 26-Dec-2016  christos branches: 1.11.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.10 10-Dec-2016  christos Welcome to version 18:
- Connection state keys are not stored and loaded using the logical key
contents.
- connection finder key is stored in a map that contains the key and the
direction.
 1.9 10-Dec-2016  christos Add functionality to look up a NAT entry from the connection list.
 1.8 20-Dec-2014  rmind branches: 1.8.2;
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.7 30-Nov-2014  rmind NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
 1.6 10-Aug-2014  rmind branches: 1.6.2; 1.6.4; 1.6.6;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.5 25-Jul-2014  rmind branches: 1.5.2;
npf_conn_conkey: adjust to return the key length and add a comment
describing the key layout.
 1.4 25-Jul-2014  rmind npf_mk_connlist: destroy the connections on error path.
 1.3 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.2 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.1 19-Jul-2014  rmind Add npf_conn.h missed in the previous commit.
 1.5.2.2 10-Aug-2014  tls Rebase.
 1.5.2.1 25-Jul-2014  tls file npf_conn.h was added on branch tls-earlyentropy on 2014-08-10 06:56:16 +0000
 1.6.6.2 05-Feb-2017  skrll Sync with HEAD
 1.6.6.1 06-Apr-2015  skrll Sync with HEAD
 1.6.4.3 03-Dec-2017  jdolecek update from HEAD
 1.6.4.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.4.1 10-Aug-2014  tls file npf_conn.h was added on branch tls-maxphys on 2014-08-20 00:04:35 +0000
 1.6.2.2 22-Dec-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #347):
sys/net/npf/npf_nat.c: revision 1.38
sys/net/npf/npf_conn.h: revision 1.8
sys/net/npf/npf_conn.c: revision 1.14
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.6.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.8.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.8.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.11.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.13.4.1 10-Jun-2019  christos Sync with HEAD
 1.13.2.2 26-Jan-2019  pgoyette Sync with HEAD
 1.13.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.16.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.16.2.3 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.16.2.2 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.16.2.1 07-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #25):

sys/net/npf/npf_conn.h: revision 1.17
sys/net/npf/npf.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.28
sys/net/npf/npf_conn.c: revision 1.29

Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.

- npf_conn_init(): fix a race when initialising the G/C thread.
- Fix a bug when partially initialised connection is destroyed on error.
(from rmind@)
 1.9 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.8 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.7 14-Dec-2019  riastradh Skip npf_config_sync if nothing to do.

Saves an unnecessary pserialize_perform every second.
 1.6 23-Jul-2019  rmind branches: 1.6.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.5 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.4 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.3 26-Dec-2016  christos branches: 1.3.14; 1.3.16;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.2 23-Jul-2014  rmind branches: 1.2.2; 1.2.6; 1.2.8; 1.2.12;
NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.1 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.2.12.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.8.1 05-Feb-2017  skrll Sync with HEAD
 1.2.6.3 03-Dec-2017  jdolecek update from HEAD
 1.2.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.6.1 23-Jul-2014  tls file npf_conndb.c was added on branch tls-maxphys on 2014-08-20 00:04:35 +0000
 1.2.2.2 10-Aug-2014  tls Rebase.
 1.2.2.1 23-Jul-2014  tls file npf_conndb.c was added on branch tls-earlyentropy on 2014-08-10 06:56:16 +0000
 1.3.16.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.16.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.16.1 10-Jun-2019  christos Sync with HEAD
 1.3.14.2 26-Jan-2019  pgoyette Sync with HEAD
 1.3.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.6.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.6.2.1 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.2 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.1 23-Jul-2019  rmind branches: 1.1.2; 1.1.10;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.1.10.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1.10.1 23-Jul-2019  martin file npf_connkey.c was added on branch phil-wifi on 2020-04-13 08:05:15 +0000
 1.1.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.62 01-Jun-2025  joe NPF copyright 2025
 1.61 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.60 30-May-2020  rmind branches: 1.60.26;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.59 30-Sep-2019  rmind libnpf/npfctl: support dynamic NAT rulesets using a name prefix.
 1.58 25-Aug-2019  rmind Make npfctl_switch() and pfil private to OS-specific module.
 1.57 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.56 21-Aug-2019  rmind npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.55 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.54 23-Jul-2019  rmind branches: 1.54.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.53 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.52 29-Oct-2018  christos We need to have rump tests work in two modes:

1. npf unit tests. In this case only the npf subsystem is created
and dictionaries are passed directly.
2. kernel system tests (like the ipsec natt test). In this case, npf is
instantiated regularly as part of the kernel and dictionaries are
passed via ioctl.

We differentiate between the two cases by checking the "mbufops" member
which is NULL in the regular case and non-NULL in the npf unit tests. Previously
this was done using an ifdef which obviously can't work for both cases.
 1.51 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.50 10-Dec-2017  rmind branches: 1.50.2; 1.50.4;
- npf_mk_rules: enforce unique names for the dynamic rulesets.
- npf_worker_unregister: merge fix for the standalone NPF.
 1.49 30-Oct-2017  ozaki-r Fix npfctl reload on rump kernels

It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't do so. It happens because of mishandling
of complicated npf kernel options.

PR kern/52643
 1.48 17-May-2017  christos branches: 1.48.2;
Allow npf to be used "normally" from a rump kernel, not just from the
test harness (problem reported by Frank Kardel)
 1.47 29-Jan-2017  christos branches: 1.47.4;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.46 02-Jan-2017  rmind branches: 1.46.2;
NPF: implement dynamic handling of interface addresses (the kernel part).
 1.45 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.44 10-Dec-2016  christos add functionality to lookup a nat entry from the connection list.
 1.43 28-Oct-2015  christos branches: 1.43.2;
remove bogus KASSERT, there are error paths that don't satisfy this.
XXX: should improve error reporting to userland.
 1.42 08-Jun-2015  rmind - npfctl: fix the confusion in the parser (0/0 case with no other filter).
- Always populate the error dictionary, not only for DEBUG/DIAGNOSTIC.
 1.41 20-Mar-2015  rmind NPF: replace the TAILQ of the dynamic rules with a linked list and fix the
inheriting of the active dynamic rules during the reload; also, fix a bug
in the insert path by putting a memory barrier in the right place.
 1.40 24-Aug-2014  rmind branches: 1.40.2;
- npf_conn_import: add a missing stat counter increment.
- npf_nat_import: add a missing reference and make a comment.
 1.39 11-Aug-2014  rmind - Add and use npf_alg_export().
- npf_conn_import: handle NAT metadata correctly.
- npf_nat_newpolicy: restore the policy ID.
- npfctl_load: fix error code handling for the limit cases.
- npf_config_import: fix the inverted logic.
- npfctl_load: improve error handling.
 1.38 11-Aug-2014  rmind branches: 1.38.2;
NPF: finish up the rework of npfctl_save() mechanism.
 1.37 10-Aug-2014  rmind - Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.36 25-Jul-2014  rmind npf_mk_connlist: destroy the connections on error path.
 1.35 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.34 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.33 06-Feb-2014  rmind branches: 1.33.2;
Add support for CDB based NPF tables.
 1.32 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.31 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.30 27-Oct-2013  rmind Add NPF_MAX_RULES, an artificial limit (set it to 1M).
 1.29 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.28 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.27 19-Sep-2013  rmind npfctl_rule: fixes for the dynamic rules.
 1.26 02-Jun-2013  rmind branches: 1.26.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.25 19-May-2013  rmind - Add NPF table flushing functionality.
- Fix line numbering for npfctl debug command.
 1.24 20-Mar-2013  christos Make ALG's autoloadable by providing in the config file:
alg "algname"
 1.23 16-Feb-2013  rmind - Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
 1.22 10-Feb-2013  rmind - Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
 1.21 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.20 23-Dec-2012  rmind - Add NPF version check in proplist as well, not only ioctl. Bump the version.
- Fix a bug in table entry lookup.
- Updates/fixes to the man pages. Misc.
 1.19 29-Oct-2012  rmind Implement NPF table listing and preservation of entries on reload.
Bump the version.
 1.18 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.17 15-Aug-2012  rmind branches: 1.17.2;
- {npf_mk_rproc,npf_nat_save}: fix the fetching of {rproc-ptr,id_ptr}.
- npf_rproc_setlog: initialise variables to 0, as keys may not exist.

Bugs found by mlelstv@ while testing on Amiga.
 1.16 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.15 30-May-2012  rmind npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
 1.14 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.13 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.12 05-Feb-2012  rmind branches: 1.12.2;
Multiple NPF fixes, add better error reporting from kernel side, add some
asserts, bump the version.
 1.11 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.10 29-Nov-2011  rmind branches: 1.10.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.9 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.8 04-Nov-2011  jakllsch Use uint8_t instead of npf_netmask_t, as npf_netmask_t is a uint_fast8_t,
which in many places is actually a uint32_t and thus incompatible with
prop_dictionary_get_uint8(). The correct type is noted in a comment.
 1.7 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.6 02-Feb-2011  rmind branches: 1.6.2; 1.6.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ finishes it.
- Add more documentation.
- Various fixes.
 1.5 18-Jan-2011  rmind branches: 1.5.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.4 18-Dec-2010  rmind branches: 1.4.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(5).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file npf_ctl.c was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_ctl.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.6.6.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.6.4 23-Jan-2013  yamt sync with head
 1.6.6.3 30-Oct-2012  yamt sync with head
 1.6.6.2 17-Apr-2012  yamt sync with head
 1.6.6.1 10-Nov-2011  yamt sync with head
 1.6.2.2 05-Mar-2011  rmind sync with head
 1.6.2.1 02-Feb-2011  rmind file npf_ctl.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.10.2.4 02-Jun-2012  mrg sync to latest -current.
 1.10.2.3 05-Apr-2012  mrg sync to latest -current.
 1.10.2.2 24-Feb-2012  mrg sync to -current.
 1.10.2.1 18-Feb-2012  mrg merge to -current.
 1.12.2.10 22-Sep-2013  riz Pull up following revision(s) (requested by rmind in ticket #952):
sys/net/npf/npf_ctl.c: revision 1.27
npfctl_rule: fixes for the dynamic rules.
 1.12.2.9 18-Feb-2013  riz branches: 1.12.2.9.2;
Pull up following revision(s) (requested by rmind in ticket #829):
usr.sbin/npf/npfctl/npfctl.8: revision 1.13
usr.sbin/npf/npfctl/npf_build.c: revision 1.21
lib/libnpf/npf.c: revision 1.18
sys/net/npf/npf_ctl.c: revision 1.23
usr.sbin/npf/npfctl/npfctl.h: revision 1.27
lib/libnpf/npf.h: revision 1.15
sys/net/npf/npf_ruleset.c: revision 1.19
sys/net/npf/npf_impl.h: revision 1.28
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.31
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.6
- Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
 1.12.2.8 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.12.2.7 07-Jan-2013  riz Pull up following revision(s) (requested by rmind in ticket #776):
usr.sbin/npf/npfctl/npf.conf.5: revision 1.26
usr.sbin/npf/npfctl/npfctl.c: revision 1.26
dist/pf/usr.sbin/ftp-proxy/npf.c: revision 1.2
lib/libnpf/npf.c: revision 1.15
sys/net/npf/npf_ctl.c: revision 1.20
lib/libnpf/npf.h: revision 1.12
lib/libnpf/npf.3: revision 1.6
lib/libnpf/npf.3: revision 1.7
usr.sbin/npf/npfctl/npf_build.c: revision 1.17
sys/net/npf/npf.h: revision 1.24
- Add NPF version check in proplist as well, not only ioctl. Bump the version.
- Fix a bug in table entry lookup.
- Updates/fixes to the man pages. Misc.
Remove a superfluous quote and fix a recurring typo.
ftp-proxy: disable NPF bits for now; it will be re-done.
 1.12.2.6 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.12.2.5 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of a
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.12.2.4 19-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #511):
lib/libnpf/npf.c: revision 1.12
sys/net/npf/npf_ctl.c: revision 1.17
sys/net/npf/npf_nat.c: revision 1.17
- {npf_mk_rproc,npf_nat_save}: fix the fetching of {rproc-ptr,id_ptr}.
- npf_rproc_setlog: initialise variables to 0, as keys may not exist.
Bugs found by mlelstv@ while testing on Amiga.
 1.12.2.3 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.12.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.12.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.12.2.9.2.1 22-Sep-2013  riz Pull up following revision(s) (requested by rmind in ticket #952):
sys/net/npf/npf_ctl.c: revision 1.27
npfctl_rule: fixes for the dynamic rules.
 1.17.2.5 03-Dec-2017  jdolecek update from HEAD
 1.17.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.2.3 23-Jun-2013  tls resync from head
 1.17.2.2 25-Feb-2013  tls resync with head
 1.17.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.26.2.1 18-May-2014  rmind sync with head
 1.33.2.1 10-Aug-2014  tls Rebase.
 1.38.2.4 30-Oct-2018  martin Pull up following revision(s) (requested by sborrill in ticket #1646):

sys/net/npf/npf_ctl.c: revision 1.47 (partial, via patch)

- Increase copyin buffer size to 4M
 1.38.2.3 10-Jun-2015  snj Pull up following revision(s) (requested by rmind in ticket #835):
sys/net/npf/npf_ctl.c: revision 1.42
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.40
- npfctl: fix the confusion in the parser (0/0 case with no other filter).
- Always populate the error dictionary, not only for DEBUG/DIAGNOSTIC.
 1.38.2.2 21-Mar-2015  snj Pull up following revision(s) (requested by rmind in ticket #630):
sys/net/npf/npf_ctl.c: revision 1.41
sys/net/npf/npf_ruleset.c: revision 1.42
usr.sbin/npf/npfctl/npf_build.c: revision 1.39
usr.sbin/npf/npfctl/npf_show.c: revision 1.18
NPF: replace the TAILQ of the dynamic rules with a linked list and fix the
inheriting of the active dynamic rules during the reload; also, fix a bug
in the insert path by putting a memory barrier in the right place.
--
npfctl:
- Fix the filter criteria when to/from is omitted but port used.
- Print more user-friendly error if an NPF table has a duplicate entry.
 1.38.2.1 29-Aug-2014  martin Pull up following revision(s) (requested by rmind in ticket #56):
sys/net/npf/npf_ctl.c: revision 1.39
usr.sbin/npf/npfctl/npfctl.c: revision 1.43
lib/libnpf/npf.c: revision 1.33
lib/libnpf/npf.c: revision 1.34
sys/net/npf/npf_impl.h: revision 1.59
sys/net/npf/npf_ctl.c: revision 1.40
sys/net/npf/npf_conn.c: revision 1.11
sys/net/npf/npf_alg.c: revision 1.15
sys/net/npf/npf_conn.c: revision 1.12
sys/net/npf/npf_nat.c: revision 1.33
sys/net/npf/npf_nat.c: revision 1.34
Add and use npf_alg_export().
npf_conn_import: handle NAT metadata correctly.
npf_nat_newpolicy: restore the policy ID.
npfctl_load: fix error code handling for the limit cases.
npf_config_import: fix the inverted logic.
npfctl_load: improve error handling.
npf_conn_import: add a missing stat counter increment.
npf_nat_import: add a missing reference and make a comment.
npf_config_submit: finally, include the saved connections.
 1.40.2.5 28-Aug-2017  skrll Sync with HEAD
 1.40.2.4 05-Feb-2017  skrll Sync with HEAD
 1.40.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.40.2.2 22-Sep-2015  skrll Sync with HEAD
 1.40.2.1 06-Apr-2015  skrll Sync with HEAD
 1.43.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.43.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.46.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.47.4.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.48.2.1 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #357):
distrib/sets/lists/debug/mi: 1.228
distrib/sets/lists/tests/mi: 1.765-1.766
etc/mtree/NetBSD.dist.tests: 1.149
sys/net/npf/npf_ctl.c: 1.49
tests/net/ipsec/Makefile: 1.10
tests/net/ipsec/algorithms.sh: 1.6
tests/net/ipsec/natt_terminator.c: 1.1
tests/net/ipsec/t_ipsec_natt.sh: 1.1
tests/net/net_common.sh: 1.23-1.24
usr.sbin/npf/npfctl/npfctl.c: 1.54
Handle esp-udp for NAT-T
--
Fix npfctl reload on rump kernels
It fails because npfctl cannot get an errno when it calls ioctl to the (rump)
kernel; npfctl (libnpf) expects that an errno is returned via proplib,
however, the rump library of npf doesn't do so. It happens because of mishandling
of complicated npf kernel options.
PR kern/52643
--
Fix showing translated port (ntohs-ed twice wrongly)
--
Add test cases of NAT-T (transport mode)
A small C program is added to make a special socket (UDP_ENCAP_ESPINUDP)
and keep it to handle UDP-encapsulated ESP packets.
--
Add net/ipsec debug lib directory
--
Add ./usr/libdata/debug/usr/tests/net/ipsec
--
Stop using bpfjit
Because most architectures don't support it and npf still works without it.
 1.50.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.50.4.1 10-Jun-2019  christos Sync with HEAD
 1.50.2.3 26-Jan-2019  pgoyette Sync with HEAD
 1.50.2.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.50.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.54.2.5 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.54.2.4 04-Oct-2019  martin Pull up following revision(s) (requested by rmind in ticket #282):

usr.sbin/npf/npfctl/npf_build.c: revision 1.53
lib/libnpf/npf.c: revision 1.48
usr.sbin/npf/npfctl/npfctl.h: revision 1.50
sys/net/npf/npf_impl.h: revision 1.80
usr.sbin/npf/npfctl/npfctl.h: revision 1.51
sys/net/npf/npf_ruleset.c: revision 1.49
usr.sbin/npf/npfctl/npf.conf.5: revision 1.90
sys/net/npf/npf_ctl.c: revision 1.59
lib/libnpf/libnpf.3: revision 1.11
usr.sbin/npf/npfctl/npf_parse.y: revision 1.50
usr.sbin/npf/npftest/npftest.conf: revision 1.8
usr.sbin/npf/npfctl/npfctl.c: revision 1.62
usr.sbin/npf/npfctl/npfctl.c: revision 1.63
usr.sbin/npf/npfctl/npf_scan.l: revision 1.30
usr.sbin/npf/npfctl/npfctl.8: revision 1.22
lib/libnpf/npf.h: revision 1.38
usr.sbin/npf/npfctl/npfctl.8: revision 1.23
usr.sbin/npf/npfctl/npfctl.8: revision 1.24
sys/net/npf/npf_if.c: revision 1.11
sys/net/npf/npf_if.c: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.89
sys/net/npf/npf_conn.c: revision 1.30
usr.sbin/npf/npfctl/npf_build.c: revision 1.52

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.

NPF ifmap: rework and fix a few small bugs.

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.
(missed a file in previous commit; cvs is so helpful..)

libnpf/npfctl: support dynamic NAT rulesets using a name prefix.

Use -width Pa for FILES.

Fix pasto in table replace -t type

Use -width Pa for FILES.

npf_ifmap_copylogname: be more defensive.
 1.54.2.3 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.54.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #139):

lib/libnpf/npf.c: revision 1.47
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.10
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.10
sys/net/npf/npf.h: revision 1.61
sys/net/npf/npf_ctl.c: revision 1.56
sys/net/npf/npf_os.c: revision 1.15
lib/libnpf/libnpf.3: revision 1.10
sys/net/npf/npf_tableset.c: revision 1.34
usr.sbin/npf/npfctl/npfctl.c: revision 1.61
sys/net/npf/npf_impl.h: revision 1.77
lib/libnpf/npf.h: revision 1.37

- npftest: fix a memleak in a unit test (standalone path only).
- Minor style fixes. No functional change.
npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.54.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.60.26.1 02-Aug-2025  perseant Sync with HEAD
 1.17 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.16 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.15 29-Sep-2018  rmind branches: 1.15.4; 1.15.6;
NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.14 26-Jun-2018  msaitoh branches: 1.14.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction is misinterpreted in some
environments by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.13 18-Feb-2017  christos branches: 1.13.12;
provide a copy function used for logging that does not lock, but can return
trash.
 1.12 18-Feb-2017  mlelstv npf_ifmap_getname requires the config to be locked. For now, just prevent the
crash.
 1.11 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.10 26-Dec-2016  christos branches: 1.10.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.9 16-Jun-2016  ozaki-r branches: 1.9.2;
Use if_get_byindex instead of if_byindex for MP-safe
 1.8 20-Jul-2014  rmind branches: 1.8.4;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.7 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.6 11-Mar-2013  christos branches: 1.6.10;
*"" is not constant according to gcc. So we move the responsibility for adding
a , to the users of the macro.
 1.5 11-Mar-2013  christos - avoid trailing , in dependencies when there are none other than the npf module
itself.
- remove if_npflog dependency from npf_ext_log.
 1.4 11-Mar-2013  christos remove the detach that does not belong here anymore.
 1.3 10-Mar-2013  christos Split the npflog cloner and auto-load the extensions.
 1.2 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.1 16-Sep-2012  rmind branches: 1.1.2; 1.1.4; 1.1.6;
Implement dynamic NPF extensions interface. An extension consists of
a dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.1.6.6 03-Dec-2017  jdolecek update from HEAD
 1.1.6.5 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.4 23-Jun-2013  tls resync from head
 1.1.6.3 25-Feb-2013  tls resync with head
 1.1.6.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.1 16-Sep-2012  tls file npf_ext_log.c was added on branch tls-maxphys on 2012-11-20 03:02:47 +0000
 1.1.4.3 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.1.4.2 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
a dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.1.4.1 16-Sep-2012  riz file npf_ext_log.c was added on branch netbsd-6 on 2012-11-18 22:38:26 +0000
 1.1.2.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.3 23-Jan-2013  yamt sync with head
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 16-Sep-2012  yamt file npf_ext_log.c was added on branch yamt-pagecache on 2012-10-30 17:22:44 +0000
 1.6.10.1 10-Aug-2014  tls Rebase.
 1.8.4.3 28-Aug-2017  skrll Sync with HEAD
 1.8.4.2 05-Feb-2017  skrll Sync with HEAD
 1.8.4.1 09-Jul-2016  skrll Sync with HEAD
 1.9.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.9.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.12.2 30-Sep-2018  pgoyette Sync with HEAD
 1.13.12.1 28-Jul-2018  pgoyette Sync with HEAD
 1.14.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.14.2.1 10-Jun-2019  christos Sync with HEAD
 1.15.6.1 29-Feb-2020  ad Sync with head.
 1.15.4.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.5 12-Mar-2013  christos normali{s,z}e
 1.4 11-Mar-2013  christos *"" is not constant according to gcc. So we move the responsibility for adding
a , to the users of the macro.
 1.3 11-Mar-2013  christos - avoid trailing , in dependencies when there are none other than the npf module
itself.
- remove if_npflog dependency from npf_ext_log.
 1.2 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.1 16-Sep-2012  rmind branches: 1.1.2; 1.1.4; 1.1.6;
Implement dynamic NPF extensions interface. An extension consists of
a dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.1.6.4 23-Jun-2013  tls resync from head
 1.1.6.3 25-Feb-2013  tls resync with head
 1.1.6.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.1.6.1 16-Sep-2012  tls file npf_ext_normalise.c was added on branch tls-maxphys on 2012-11-20 03:02:47 +0000
 1.1.4.3 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.1.4.2 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
a dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.1.4.1 16-Sep-2012  riz file npf_ext_normalise.c was added on branch netbsd-6 on 2012-11-18 22:38:27 +0000
 1.1.2.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.3 23-Jan-2013  yamt sync with head
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 16-Sep-2012  yamt file npf_ext_normalise.c was added on branch yamt-pagecache on 2012-10-30 17:22:44 +0000
 1.11 08-Mar-2021  christos reinstate a simple version of ip_randomid()
 1.10 30-May-2020  rmind branches: 1.10.2;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.9 29-Sep-2018  rmind branches: 1.9.4;
NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.8 31-Aug-2018  maxv Introduce npf_set_mss(). When the MSS is not 16bit-aligned, it sets:

0 8 16 24 32
+------+-----------+-----------+------+
| data | MSS (low) | MSS (hig) | data |
+------+-----------+-----------+------+
^ ^
old[0] old[1]

And sets new[0,1] accordingly with the new value. The MSS-clamping code
then adjusts the checksum twice, each time on a 16-bit boundary:

from old[0] to new[0]
from old[1] to new[1]

Fixes PR/53479, opened by myself. Tested with wireshark and kASan.
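
The two-step adjustment described above can be sketched in user-space C.
This is only an illustration under stated assumptions, not the NPF code:
the helper names (fold16, cksum_full, cksum_fixup16, set_mss_unaligned)
are hypothetical, and the incremental update follows the general RFC 1624
formula HC' = ~(~HC + ~m + m'), which may differ from what the kernel
actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Reference: full Internet checksum over an even-length buffer. */
static uint16_t
cksum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return (uint16_t)~fold16(sum);
}

/* Incremental update of one 16-bit word (RFC 1624, equation 3). */
static uint16_t
cksum_fixup16(uint16_t cksum, uint16_t oldw, uint16_t neww)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~oldw;
	sum += neww;
	return (uint16_t)~fold16(sum);
}

/*
 * Hypothetical helper: overwrite a 16-bit MSS stored at an odd offset.
 * The MSS straddles two aligned 16-bit words, so both words are
 * captured before and after the store and the checksum is fixed up
 * twice: old[0] -> new[0], then old[1] -> new[1].
 */
static uint16_t
set_mss_unaligned(uint8_t *buf, size_t off, uint16_t mss, uint16_t cksum)
{
	uint16_t old0, old1, new0, new1;

	assert(off % 2 == 1);
	old0 = (uint16_t)((buf[off - 1] << 8) | buf[off]);
	old1 = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);

	buf[off] = (uint8_t)(mss >> 8);       /* MSS high byte */
	buf[off + 1] = (uint8_t)(mss & 0xff); /* MSS low byte */

	new0 = (uint16_t)((buf[off - 1] << 8) | buf[off]);
	new1 = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);

	cksum = cksum_fixup16(cksum, old0, new0);
	return cksum_fixup16(cksum, old1, new1);
}
```

For a buffer whose MSS sits at byte offset 3, calling
set_mss_unaligned(buf, 3, 536, old_cksum) should return the same value
as recomputing cksum_full() over the modified buffer.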
 1.7 07-Apr-2018  maxv branches: 1.7.2;
Fix an inverted logic.

nbuf_cksum_barrier returns true when the direction is PFIL_OUT and TSO is
active; that is to say, it returns true when the checksum was already
recomputed by the function.

The check should be !nbuf_cksum_barrier, because otherwise we're wrongfully
checksumming twice, and it causes the packet to be kicked later in
tcp_input.

This can be seen with a configuration of the type:

procedure "norm" {
normalize: "max-mss" 15000
}
group default {
pass all apply "norm"
}

The packets systematically get dropped because the checksum validation in
tcp_input fails. With this patch in place, it works.
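
A toy model of the corrected check, assuming only what the message above
states. The types and names here (struct pkt, cksum_barrier, clamp_mss,
DIR_IN/DIR_OUT) are hypothetical stand-ins, not the NPF or pfil
definitions; the counters exist only to make the double-checksumming
visible.

```c
#include <assert.h>
#include <stdbool.h>

enum { DIR_IN, DIR_OUT };	/* stand-ins for PFIL_IN / PFIL_OUT */

struct pkt {
	int dir;		/* packet direction */
	bool tso;		/* TSO active on this packet */
	int full_cksums;	/* times the barrier recomputed the checksum */
	int fixups;		/* times the caller adjusted the checksum */
};

/*
 * Stand-in for the barrier: returns true only when it has already
 * recomputed the checksum itself (outbound direction with TSO active).
 */
static bool
cksum_barrier(struct pkt *p)
{
	if (p->dir == DIR_OUT && p->tso) {
		p->full_cksums++;
		return true;
	}
	return false;
}

/*
 * Corrected caller: adjust the checksum only when the barrier did NOT
 * recompute it. Without the negation, the outbound TSO case would be
 * checksummed twice and the inbound case not at all.
 */
static void
clamp_mss(struct pkt *p)
{
	if (!cksum_barrier(p))
		p->fixups++;
}
```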
 1.6 10-Dec-2017  rmind branches: 1.6.2;
- npf_cop_table: handle non-IP packets in the ether (fixes PR/52290).
- npfa_icmp_nat: do not recompute the checksum if no port translation.
- npf_normalize (MSS clamping): fix the checksum handling on PFIL_OUT.
- npflog: report the packet direction correctly.
 1.5 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.4 26-Dec-2016  christos branches: 1.4.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.3 20-Jul-2014  rmind branches: 1.3.4; 1.3.8;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.2 19-May-2014  jakllsch branches: 1.2.2;
Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.1 12-Mar-2013  christos branches: 1.1.6; 1.1.12;
normali{s,z}e
 1.1.12.1 10-Aug-2014  tls Rebase.
 1.1.6.4 03-Dec-2017  jdolecek update from HEAD
 1.1.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.2 23-Jun-2013  tls resync from head
 1.1.6.1 12-Mar-2013  tls file npf_ext_normalize.c was added on branch tls-maxphys on 2013-06-23 06:20:25 +0000
 1.2.2.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.2.1 19-May-2014  yamt file npf_ext_normalize.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.3.8.2 20-Mar-2017  pgoyette Sync with HEAD
 1.3.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.3.4.1 05-Feb-2017  skrll Sync with HEAD
 1.4.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.6.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.6.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.7.2.1 10-Jun-2019  christos Sync with HEAD
 1.9.4.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.10.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.9 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.8 29-Sep-2018  rmind branches: 1.8.4;
NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.7 29-Jan-2017  christos branches: 1.7.12; 1.7.14;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.6 26-Dec-2016  christos branches: 1.6.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.5 20-Jul-2014  rmind branches: 1.5.4; 1.5.8;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.4 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.3 11-Mar-2013  christos branches: 1.3.10;
*"" is not constant according to gcc. So we move the responsibility for adding
a , to the users of the macro.
 1.2 11-Mar-2013  christos - avoid trailing , in dependencies when there are none other than the npf module
itself.
- remove if_npflog dependency from npf_ext_log.
 1.1 10-Dec-2012  rmind branches: 1.1.2; 1.1.4; 1.1.8;
Add NPF "rndblock" extension to randomly drop packets (using a random function
with a percentage or modulo operation). This is a demo module, although it can
be used for packet loss simulation. Example of a procedure in npf.conf:

procedure "somedrop" {
# Drop 1.9% of the traffic
rndblock: percentage 1.9
}
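
The drop decision described above might look roughly like the following
user-space sketch. Everything in it is an assumption made for
illustration: the structure, the function names, the reading of "modulo"
as "drop every Nth packet", the percentage stored in hundredths of a
percent, and the xorshift PRNG, which stands in for whatever random
source the kernel module actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical settings; not the structure used by the NPF module. */
struct rndblock_cfg {
	unsigned mod;		/* drop every mod-th packet; 0 disables */
	unsigned percentage;	/* drop chance in 1/100 of a percent; 0 disables */
};

/* Small xorshift32 PRNG for illustration; state must be non-zero. */
static uint32_t
xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Return true if the packet should be dropped. */
static bool
rndblock_should_drop(const struct rndblock_cfg *cfg, uint64_t *counter,
    uint32_t *rng)
{
	assert(cfg->percentage <= 10000);

	/* Modulo mode: count packets and drop every mod-th one. */
	if (cfg->mod != 0 && ++(*counter) % cfg->mod == 0)
		return true;

	/* Percentage mode: 1.9% is stored as 190 out of 10000. */
	if (cfg->percentage != 0 && xorshift32(rng) % 10000 < cfg->percentage)
		return true;

	return false;
}
```

Under these assumptions, the "somedrop" procedure above would correspond
to a configuration with mod = 0 and percentage = 190.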
 1.1.8.4 03-Dec-2017  jdolecek update from HEAD
 1.1.8.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.8.2 25-Feb-2013  tls resync with head
 1.1.8.1 10-Dec-2012  tls file npf_ext_rndblock.c was added on branch tls-maxphys on 2013-02-25 00:30:02 +0000
 1.1.4.2 16-Jan-2013  yamt sync with (a bit old) head
 1.1.4.1 10-Dec-2012  yamt file npf_ext_rndblock.c was added on branch yamt-pagecache on 2013-01-16 05:33:49 +0000
 1.1.2.2 15-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #745):
distrib/sets/lists/comp/shl.mi: revision 1.241
distrib/sets/lists/modules/mi: revision 1.49
distrib/sets/lists/base/md.sparc64: revision 1.171
lib/npf/ext_rndblock/npfext_rndblock.c: revision 1.1
distrib/sets/lists/base/ad.mips64eb: revision 1.106
distrib/sets/lists/modules/md.evbppc: revision 1.29
sys/net/npf/npf_ext_rndblock.c: revision 1.1
lib/npf/Makefile: revision 1.2
sys/modules/npf_ext_rndblock/Makefile: revision 1.1
lib/npf/ext_rndblock/Makefile: revision 1.1
distrib/sets/lists/base/ad.mips64el: revision 1.106
lib/npf/ext_rndblock/shlib_version: revision 1.1
distrib/sets/lists/base/md.amd64: revision 1.182
distrib/sets/lists/base/shl.mi: revision 1.643
sys/net/npf/files.npf: revision 1.9
sys/modules/Makefile: revision 1.117
Add NPF "rndblock" extension to randomly drop packets (using a random function
with a percentage or modulo operation). This is a demo module, although it can
be used for packet loss simulation. Example of a procedure in npf.conf:
procedure "somedrop" {
# Drop 1.9% of the traffic
rndblock: percentage 1.9
}
 1.1.2.1 10-Dec-2012  riz file npf_ext_rndblock.c was added on branch netbsd-6 on 2012-12-15 23:45:58 +0000
 1.3.10.1 10-Aug-2014  tls Rebase.
 1.5.8.2 20-Mar-2017  pgoyette Sync with HEAD
 1.5.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.5.4.1 05-Feb-2017  skrll Sync with HEAD
 1.6.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.7.14.1 10-Jun-2019  christos Sync with HEAD
 1.7.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.8.4.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.54 08-Jul-2025  joe Pass frames directly when no layer 2 rules are set

NPF's original default is to block, i.e. a packet is blocked when it matches no rule at all,
not even one in the default group. We cannot use that default for layer 2 as well: all frames
would be blocked when no layer 2 rules are set, which would not be good, since NPF is primarily
a layer 3 filter.

Greg@ Markus@
 1.53 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.52 01-Jun-2025  joe NPF copyright 2025
 1.51 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.50 05-Jul-2024  rin npf: Drop redundant NULL check before m_freem(9)

XXX
Their standalone version of m_freem() does not work for NULL input.
I will send pullreq to upstream soon.
 1.49 30-May-2020  rmind branches: 1.49.26;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.48 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.47 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.46 23-Jul-2019  rmind branches: 1.46.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.45 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.44 10-Jul-2018  maxv Modify the logic in npf_reassembly. Don't call nbuf_reset, we don't need
it since we don't read the IPv4 header anymore.

If ip{6}_reass_packet fails, always free 'm', and always clear the nbuf.

We want to avoid the case where

- 'm' was reallocated
- the nbuf pointer was not updated accordingly
- the caller tried to use the nbuf pointer

This case doesn't happen right now, but the code is fragile, so strengthen
it.
 1.43 10-Jul-2018  maxv Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.
 1.42 10-Jul-2018  maxv Simplify the pointer handling. Set *mp = NULL at the beginning of the
function. In npf_reassembly, pass a simple boolean instead of a ** mbuf
pointer. Add a KASSERT for IPv4, we don't want (error && !m). Remove
the 'fastout' label, use 'out'.
 1.41 10-Jul-2018  maxv Update the pointer when fast-kicking, because it may have been freed.
Before my changes the nonsensical pointer initialization held, but
when I started introducing sanity checks the whole thing collapsed.

Need pullup-8.
 1.40 10-Jul-2018  maxv Set con = NULL just once, instead of doing it in each branch.
 1.39 13-Mar-2018  maxv branches: 1.39.2;
Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:

"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:

- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.
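The point about stopping on the fragment header can be illustrated with a small standalone walk over an IPv6 extension header chain. This is a sketch over a raw byte buffer using the standard IPv6 header layout, not the NPF code: the function name and constants are assumptions made for illustration. It returns the offset at which the fragment header starts, which is the kind of offset a reassembly routine needs, without advancing past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN	40	/* fixed IPv6 header length */
#define FRAG_HDR_LEN	8	/* fragment header is always 8 bytes */
#define PROTO_HOPOPTS	0
#define PROTO_ROUTING	43
#define PROTO_FRAGMENT	44
#define PROTO_DSTOPTS	60

/*
 * Return the offset of the fragment header inside 'pkt', or -1 if the
 * packet has no fragment header or is truncated.  Hypothetical helper.
 */
static long
find_frag_offset(const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL);

	if (len < IP6_HDR_LEN)
		return -1;

	uint8_t proto = pkt[6];		/* Next Header of the fixed header */
	size_t off = IP6_HDR_LEN;

	for (;;) {
		switch (proto) {
		case PROTO_FRAGMENT:
			/* Stop ON the fragment header: advance by zero. */
			return (off + FRAG_HDR_LEN <= len) ? (long)off : -1;
		case PROTO_HOPOPTS:
		case PROTO_ROUTING:
		case PROTO_DSTOPTS: {
			if (off + 2 > len)
				return -1;
			/* Length field counts 8-byte units beyond the first 8. */
			size_t hlen = ((size_t)pkt[off + 1] + 1) * 8;
			if (off + hlen > len)
				return -1;
			proto = pkt[off];
			off += hlen;
			break;
		}
		default:
			/* Upper-layer header reached: not a fragment. */
			return -1;
		}
	}
}
```

For a packet with one 8-byte destination options header followed by a fragment header, the function returns 48: the chain is walked, but the offset is left pointing at the fragment header itself.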
 1.38 08-Mar-2018  maxv Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.

Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.
 1.37 19-Feb-2017  christos branches: 1.37.6; 1.37.12;
Don't reassemble ipv6 fragments, instead treat the first fragment as a regular
packet (subject to filtering rules), and pass subsequent fragments in the
same group unconditionally.
 1.36 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.35 26-Dec-2016  christos branches: 1.35.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.34 08-Dec-2016  rmind NPF: adjust the 'stateful-ends' mechanism to tag the packets and thus
pass-through them on other interfaces. Per discussion with christos@.
 1.33 23-Jul-2014  rmind branches: 1.33.4; 1.33.8;
NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.32 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.31 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.30 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.29 14-Mar-2014  rmind branches: 1.29.2;
NPF: add support for "stateful-ends".
 1.28 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.27 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.26 09-Feb-2013  rmind branches: 1.26.2;
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.25 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.24 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.23 06-Oct-2012  rmind npf_packet_handler: drop the packet if IPv6 reassembly did not work.
 1.22 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.21 12-Aug-2012  rmind branches: 1.21.2;
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.20 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.19 02-Jul-2012  rmind npf_packet_handler: fix gcc unused warning.
 1.18 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.17 30-May-2012  rmind npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
 1.16 06-May-2012  rmind - Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
 1.15 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.14 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.13 06-Feb-2012  rmind branches: 1.13.2;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.12 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.11 29-Nov-2011  rmind branches: 1.11.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.10 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.9 05-Nov-2011  zoltan When building the kernel without IPv6 support, compilation failed.
Fix that.
 1.8 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.7 02-Feb-2011  rmind branches: 1.7.2; 1.7.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
 1.6 18-Jan-2011  rmind branches: 1.6.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 18-Dec-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.4 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.3 10-Oct-2010  rmind branches: 1.3.2;
npf_packet_handler: clear M_CANFASTFWD flag, so inspection would work when
fast forwarding is enabled (e.g. with GATEWAY kernel option). Thanks matt@
for the tip.
 1.2 16-Sep-2010  rmind branches: 1.2.2;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_handler.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.3.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.2.1 10-Oct-2010  uebayasi file npf_handler.c was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.7.6.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.6.5 23-Jan-2013  yamt sync with head
 1.7.6.4 30-Oct-2012  yamt sync with head
 1.7.6.3 23-May-2012  yamt sync with head.
 1.7.6.2 17-Apr-2012  yamt sync with head
 1.7.6.1 10-Nov-2011  yamt sync with head
 1.7.2.2 05-Mar-2011  rmind sync with head
 1.7.2.1 02-Feb-2011  rmind file npf_handler.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.11.2.4 02-Jun-2012  mrg sync to latest -current.
 1.11.2.3 05-Apr-2012  mrg sync to latest -current.
 1.11.2.2 24-Feb-2012  mrg sync to -current.
 1.11.2.1 18-Feb-2012  mrg merge to -current.
 1.13.2.9 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.13.2.8 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.13.2.7 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.13.2.6 09-Oct-2012  riz Pull up following revision(s) (requested by rmind in ticket #594):
sys/net/npf/npf_handler.c: revision 1.23
npf_packet_handler: drop the packet if IPv6 reassembly did not work.
 1.13.2.5 13-Aug-2012  riz branches: 1.13.2.5.2;
Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.13.2.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.13.2.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.13.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.13.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.13.2.5.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.21.2.4 03-Dec-2017  jdolecek update from HEAD
 1.21.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.2.2 25-Feb-2013  tls resync with head
 1.21.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.26.2.2 18-May-2014  rmind sync with head
 1.26.2.1 28-Aug-2013  rmind sync with head
 1.29.2.1 10-Aug-2014  tls Rebase.
 1.33.8.2 20-Mar-2017  pgoyette Sync with HEAD
 1.33.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.33.4.2 28-Aug-2017  skrll Sync with HEAD
 1.33.4.1 05-Feb-2017  skrll Sync with HEAD
 1.35.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.37.12.3 30-Sep-2018  pgoyette Sync with HEAD
 1.37.12.2 28-Jul-2018  pgoyette Sync with HEAD
 1.37.12.1 15-Mar-2018  pgoyette Synch with HEAD
 1.37.6.2 10-Jul-2018  martin Pull up following revision(s) (requested by maxv in ticket #919):

sys/net/npf/npf_handler.c: revision 1.41

Update the pointer when fast-kicking, because it may have been freed.

Before my changes the nonsensical pointer initialization held, but
when I started introducing sanity checks the whole thing collapsed.

Need pullup-8.
 1.37.6.1 09-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #817):

sys/net/npf/npf_inet.c: revision 1.38-1.44
sys/net/npf/npf_handler.c: revision 1.38-1.39
sys/net/npf/npf_alg_icmp.c: revision 1.26
sys/net/npf/npf.h: revision 1.56
sys/net/npf/npf_sendpkt.c: revision 1.17-1.18

Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.
Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.

Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.

Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).

Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:
"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:
- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.
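
The idea of stopping the header walk on the fragment header can be sketched in userland C. This is a minimal illustration, not the NPF code: find_frag_offset() and the PROTO_* and IP6_* names are hypothetical, and the packet is assumed to be a flat byte buffer rather than an mbuf chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the headers this sketch understands. */
#define PROTO_HOPOPTS   0
#define PROTO_UDP       17
#define PROTO_ROUTING   43
#define PROTO_FRAGMENT  44
#define PROTO_DSTOPTS   60

#define IP6_HDR_LEN     40	/* fixed IPv6 header */
#define IP6_FRAG_LEN    8	/* the fragment header is always 8 bytes */

/*
 * Hypothetical helper: walk the extension header chain that follows the
 * fixed IPv6 header and return the offset of the fragment header, or -1
 * if the chain ends (or is truncated) before one is found.
 */
static int
find_frag_offset(const uint8_t *pkt, size_t len, uint8_t first_proto)
{
	size_t off = IP6_HDR_LEN;
	uint8_t proto = first_proto;

	assert(pkt != NULL);
	while (off + 2 <= len) {
		if (proto == PROTO_FRAGMENT) {
			/* Stop ON the fragment header; do not step past it. */
			return (off + IP6_FRAG_LEN <= len) ? (int)off : -1;
		}
		if (proto != PROTO_HOPOPTS && proto != PROTO_ROUTING &&
		    proto != PROTO_DSTOPTS) {
			/* Reached an upper-layer protocol: no fragment header. */
			return -1;
		}
		/* Byte 0: next header. Byte 1: length in 8-byte units, minus 1. */
		uint8_t next = pkt[off];
		size_t hlen = ((size_t)pkt[off + 1] + 1) * 8;
		proto = next;
		off += hlen;
	}
	return -1;
}
```

A caller would hand the returned offset to the reassembly routine, so that the routine sees the fragment header itself and not whichever header happened to be visited last.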

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.

Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.

Fix the "return-rst" rule on IPv6 packets.
The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.
I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).

Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.

Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.

Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.
AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.
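
A rough sketch of the intended behaviour, under the assumption that the protocol used for rule matching is whatever the header walk ends on: AH is deliberately left out of the set of headers that are skipped, so a chain ending in AH reports AH. The names match_proto() and is_skippable() are hypothetical and do not come from NPF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_HOPOPTS   0
#define PROTO_TCP       6
#define PROTO_ROUTING   43
#define PROTO_AH        51
#define PROTO_DSTOPTS   60

/* Extension headers the walk steps over. AH is intentionally absent. */
static int
is_skippable(uint8_t proto)
{
	return proto == PROTO_HOPOPTS || proto == PROTO_ROUTING ||
	    proto == PROTO_DSTOPTS;
}

/*
 * Hypothetical helper: 'ext' points at the first header after the fixed
 * IPv6 header. Returns the protocol a rule such as "pass in proto ah"
 * would be compared against. If the buffer is truncated, the last
 * protocol seen is returned.
 */
static uint8_t
match_proto(const uint8_t *ext, size_t len, uint8_t first_proto)
{
	size_t off = 0;
	uint8_t proto = first_proto;

	assert(ext != NULL);
	while (is_skippable(proto) && off + 2 <= len) {
		uint8_t next = ext[off];
		off += ((size_t)ext[off + 1] + 1) * 8;
		proto = next;
	}
	/* AH ends the walk, so it is treated as the payload protocol. */
	return proto;
}
```

With this shape, a packet carrying TCP inside AH matches "proto ah" rules and never reaches TCP state tracking through the filter's own parsing.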

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.

Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
 1.39.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.39.2.1 10-Jun-2019  christos Sync with HEAD
 1.46.2.3 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.
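
As an illustration of the "C11-style atomic primitives" item above, here is a small userland sketch using standard <stdatomic.h> rather than the kernel's atomic_loadstore(9) interface. It shows the release/acquire pairing commonly used to publish a pointer to readers; struct config, active_config, config_publish() and config_get() are hypothetical names, not NPF symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
	int version;
};

/* Pointer to the currently active configuration; NULL until published. */
static _Atomic(struct config *) active_config;

/*
 * Publish a fully initialized configuration. The release store orders
 * the initialization of *cfg before the pointer becomes visible.
 */
static void
config_publish(struct config *cfg)
{
	assert(cfg != NULL);
	atomic_store_explicit(&active_config, cfg, memory_order_release);
}

/*
 * Fetch the active configuration. The acquire load pairs with the
 * release store above, so a reader that sees the pointer also sees
 * the fields written before it was published.
 */
static struct config *
config_get(void)
{
	return atomic_load_explicit(&active_config, memory_order_acquire);
}
```

The sketch covers only the ordering of the pointer hand-off; reclaiming an old configuration safely needs a separate mechanism and is out of scope here.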

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.46.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.46.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.49.26.1 02-Aug-2025  perseant Sync with HEAD
 1.13 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.12 30-Sep-2019  rmind npf_ifmap_copylogname: be more defensive.
 1.11 29-Sep-2019  rmind NPF ifmap: rework and fix a few small bugs.
 1.10 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.9 29-Sep-2018  rmind branches: 1.9.4;
NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.8 18-Feb-2017  christos branches: 1.8.12; 1.8.14;
provide a copy function used for logging that does not lock, but can return
trash.
 1.7 26-Dec-2016  christos branches: 1.7.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.6 12-May-2016  ozaki-r branches: 1.6.2;
Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.5 12-Jul-2015  rmind npfkern: eliminate INACTIVE_ID and use 0 for unregistered interfaces.
 1.4 10-Aug-2014  rmind branches: 1.4.2; 1.4.4; 1.4.6;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.3 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.2 11-Nov-2013  martin branches: 1.2.2; 1.2.4; 1.2.6;
Add missing [0] (check for unused entries) when matching interface
names.
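One plausible reading of this fix, sketched in userland C: interface names live in fixed-size slots, an unused slot has an empty name, and the lookup has to test the first character (name[0]) instead of the array itself. The names struct ifmap_slot and ifmap_lookup() are hypothetical, and the 1-based ID with 0 meaning "not found" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFNAME_MAX  16
#define NSLOTS      4

struct ifmap_slot {
	char name[IFNAME_MAX];	/* empty string marks an unused slot */
};

/*
 * Hypothetical lookup: return the 1-based slot ID for 'ifname',
 * or 0 if no registered slot carries that name.
 */
static unsigned
ifmap_lookup(const struct ifmap_slot *map, size_t n, const char *ifname)
{
	assert(map != NULL && ifname != NULL);
	for (size_t i = 0; i < n; i++) {
		/*
		 * Testing 'map[i].name' alone is always true, since an
		 * array is never NULL; the [0] is what detects an unused
		 * entry.
		 */
		if (map[i].name[0] == '\0')
			continue;
		if (strcmp(map[i].name, ifname) == 0)
			return (unsigned)i + 1;
	}
	return 0;
}
```

Skipping unused slots also keeps an empty lookup string from matching an empty slot.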
 1.1 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.2.6.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.6.1 11-Nov-2013  yamt file npf_if.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.2.4.2 18-May-2014  rmind sync with head
 1.2.4.1 11-Nov-2013  rmind file npf_if.c was added on branch rmind-smpnet on 2014-05-18 17:46:13 +0000
 1.2.2.1 10-Aug-2014  tls Rebase.
 1.4.6.4 28-Aug-2017  skrll Sync with HEAD
 1.4.6.3 05-Feb-2017  skrll Sync with HEAD
 1.4.6.2 29-May-2016  skrll Sync with HEAD
 1.4.6.1 22-Sep-2015  skrll Sync with HEAD
 1.4.4.3 03-Dec-2017  jdolecek update from HEAD
 1.4.4.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.4.4.1 10-Aug-2014  tls file npf_if.c was added on branch tls-maxphys on 2014-08-20 00:04:35 +0000
 1.4.2.1 17-Jul-2015  snj Pull up following revision(s) (requested by rmind in ticket #880):
sys/net/npf/npf_if.c: revision 1.5
sys/net/npf/npf_mbuf.c: revision 1.14
usr.sbin/npf/npf.7: revision 1.3
usr.sbin/npf/npfctl/npf_var.c: revision 1.9
npfkern: eliminate INACTIVE_ID and use 0 for unregistered interfaces.
--
- npfvar_get_type1: check for NULL first.
- Minor fix for the npf(7) man page.
 1.6.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.7.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.8.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.8.14.1 10-Jun-2019  christos Sync with HEAD
 1.8.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.9.4.3 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.9.4.2 04-Oct-2019  martin Pull up following revision(s) (requested by rmind in ticket #282):

usr.sbin/npf/npfctl/npf_build.c: revision 1.53
lib/libnpf/npf.c: revision 1.48
usr.sbin/npf/npfctl/npfctl.h: revision 1.50
sys/net/npf/npf_impl.h: revision 1.80
usr.sbin/npf/npfctl/npfctl.h: revision 1.51
sys/net/npf/npf_ruleset.c: revision 1.49
usr.sbin/npf/npfctl/npf.conf.5: revision 1.90
sys/net/npf/npf_ctl.c: revision 1.59
lib/libnpf/libnpf.3: revision 1.11
usr.sbin/npf/npfctl/npf_parse.y: revision 1.50
usr.sbin/npf/npftest/npftest.conf: revision 1.8
usr.sbin/npf/npfctl/npfctl.c: revision 1.62
usr.sbin/npf/npfctl/npfctl.c: revision 1.63
usr.sbin/npf/npfctl/npf_scan.l: revision 1.30
usr.sbin/npf/npfctl/npfctl.8: revision 1.22
lib/libnpf/npf.h: revision 1.38
usr.sbin/npf/npfctl/npfctl.8: revision 1.23
usr.sbin/npf/npfctl/npfctl.8: revision 1.24
sys/net/npf/npf_if.c: revision 1.11
sys/net/npf/npf_if.c: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.89
sys/net/npf/npf_conn.c: revision 1.30
usr.sbin/npf/npfctl/npf_build.c: revision 1.52

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.

NPF ifmap: rework and fix a few small bugs.

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.
(missed a file in previous commit; cvs is so helpful..)

libnpf/npfctl: support dynamic NAT rulesets using a name prefix.

Use -width Pa for FILES.

Fix pasto in table replace -t type

Use -width Pa for FILES.

npf_ifmap_copylogname: be more defensive.
 1.9.4.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.8 13-Feb-2022  riastradh npf(4): Use atomic_store_release and atomic_load_consume for config.

...or atomic_load_relaxed, when the config is locked. (Not necessary
to use atomic_* at all in NetBSD, but in C11 it will be cheaper to
say atomic_load_relaxed explicitly so an _Atomic-qualified object
doesn't cause the load to be surrounded by unnecessary membars.)

No need for store-before-load ordering here, so no need to
membar_sync.
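
As a rough userland illustration of the release/consume pairing described in the entry above, here is a minimal sketch using C11 <stdatomic.h>. It is not the NPF code: the kernel uses atomic_store_release() and atomic_load_consume() from atomic_loadstore(9), and the names struct config, active_config, config_publish(), config_enter() and config_peek_locked() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a configuration object. */
struct config {
	int generation;
	int nrules;
};

/* Pointer to the currently active configuration (hypothetical name). */
static _Atomic(struct config *) active_config;

/*
 * Writer: fully initialise *nc first, then publish the pointer with
 * release semantics so readers that see the pointer also see the fields.
 */
static void
config_publish(struct config *nc)
{
	atomic_store_explicit(&active_config, nc, memory_order_release);
}

/*
 * Lock-free reader: a consume load orders the dependent field accesses
 * after the pointer load. No store-before-load ordering is needed.
 */
static struct config *
config_enter(void)
{
	return atomic_load_explicit(&active_config, memory_order_consume);
}

/*
 * Reader that already holds the (not shown) config lock: no ordering is
 * required, so a relaxed load avoids any unnecessary barriers.
 */
static struct config *
config_peek_locked(void)
{
	return atomic_load_explicit(&active_config, memory_order_relaxed);
}
```

The relaxed variant only makes sense when the caller is serialised against writers by some other means, which is the "config is locked" case mentioned in the commit message.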
 1.7 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.6 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.5 19-Jan-2019  rmind branches: 1.5.4;
Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.4 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.3 11-Dec-2017  ozaki-r branches: 1.3.2; 1.3.4;
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK

IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
 1.2 03-Jan-2017  rmind branches: 1.2.2; 1.2.6; 1.2.12; 1.2.18;
NPF: fix the interface table initialisation on load.
 1.1 02-Jan-2017  rmind NPF: implement dynamic handling of interface addresses (the kernel part).
 1.2.18.2 03-Dec-2017  jdolecek update from HEAD
 1.2.18.1 03-Jan-2017  jdolecek file npf_ifaddr.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.2.12.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10

kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match

Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start; however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change

Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is not set, hold the lock; otherwise don't.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@

Ensure to hold if_ioctl_lock when calling if_flags_set

Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called while holding if_ioctl_lock.

Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)

Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.

Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).

Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract, so avoiding it makes life easier.

Ensure to call if_addr_init with holding if_ioctl_lock

Get rid of outdated comments

Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790

Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).

Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...

Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)

Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.

Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held

Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@

Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@

Note that IFNET_LOCK must not be held in softint

Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.2.6.2 05-Feb-2017  skrll Sync with HEAD
 1.2.6.1 03-Jan-2017  skrll file npf_ifaddr.c was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.2.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.2.2.1 03-Jan-2017  pgoyette file npf_ifaddr.c was added on branch pgoyette-localcount on 2017-01-07 08:56:50 +0000
 1.3.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.3.4.1 10-Jun-2019  christos Sync with HEAD
 1.3.2.2 26-Jan-2019  pgoyette Sync with HEAD
 1.3.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.5.4.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.5.4.1 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.85 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.84 01-Jun-2025  joe NPF copyright 2025
 1.83 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.82 27-Aug-2020  riastradh branches: 1.82.26;
npf: Make sure to initialize portmap_lock only once.

PR kern/55586
 1.81 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.80 29-Sep-2019  rmind NPF ifmap: rework and fix a few small bugs.
 1.79 25-Aug-2019  rmind Make npfctl_switch() and pfil private to OS-specific module.
 1.78 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.77 21-Aug-2019  rmind npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.76 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.75 23-Jul-2019  rmind branches: 1.75.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.74 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.73 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.72 12-Sep-2018  christos Fix lockdebug diagnostic error of trying to acquire an rw_lock from a
pserialized active context. From riastradh@
 1.71 31-Aug-2018  maxv Introduce npf_set_mss(). When the MSS is not 16bit-aligned, it sets:

0      8           16          24    32
+------+-----------+-----------+------+
| data | MSS (low) | MSS (hig) | data |
+------+-----------+-----------+------+
^                  ^
old[0]             old[1]

It then sets new[0,1] accordingly with the new value. The MSS-clamping code
then adjusts the checksum twice, each time on a 16bit boundary:

from old[0] to new[0]
from old[1] to new[1]

Fixes PR/53479, opened by myself. Tested with wireshark and kASan.
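
The two-step checksum adjustment described in the entry above can be sketched in plain C as follows. This is an illustration only, not the actual npf_set_mss() code: the helper names fold16(), cksum_full(), cksum_fixup16() and set_be16_fixup() are hypothetical, the per-word update uses the standard RFC 1624 formula, and the checksum here covers just the example buffer rather than a real TCP segment with its pseudo-header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over an even-length buffer, read as big-endian words. */
static uint16_t
cksum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return (uint16_t)~fold16(sum);
}

/* Incremental update for one word (RFC 1624): HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_fixup16(uint16_t cksum, uint16_t oldw, uint16_t neww)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~oldw;
	sum += neww;
	return (uint16_t)~fold16(sum);
}

/*
 * Store a big-endian 16-bit value at byte offset 'off' and return the
 * adjusted checksum.  If 'off' is odd, the value straddles two 16-bit
 * words, so both old[0]->new[0] and old[1]->new[1] are fixed up.
 */
static uint16_t
set_be16_fixup(uint8_t *buf, size_t len, size_t off, uint16_t val,
    uint16_t cksum)
{
	size_t w = off & ~(size_t)1;		/* first affected word */
	size_t nwords = (off & 1) ? 2 : 1;	/* unaligned: two words */
	uint16_t oldw[2], neww[2];

	assert(w + 2 * nwords <= len);
	for (size_t i = 0; i < nwords; i++)
		oldw[i] = (uint16_t)(buf[w + 2 * i] << 8 | buf[w + 2 * i + 1]);
	buf[off] = (uint8_t)(val >> 8);
	buf[off + 1] = (uint8_t)(val & 0xff);
	for (size_t i = 0; i < nwords; i++) {
		neww[i] = (uint16_t)(buf[w + 2 * i] << 8 | buf[w + 2 * i + 1]);
		cksum = cksum_fixup16(cksum, oldw[i], neww[i]);
	}
	return cksum;
}
```

For example, with a NOP option followed by an MSS option, the MSS value starts at an odd offset (3), so clamping it with set_be16_fixup() touches two 16-bit words and the result should match a full recomputation with cksum_full().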
 1.70 10-Dec-2017  rmind branches: 1.70.2; 1.70.4;
- npf_mk_rules: enforce unique names for the dynamic rulesets.
- npf_worker_unregister: merge fix for the standalone NPF.
 1.69 19-Feb-2017  christos forgot to commit this (new prototype)
 1.68 29-Jan-2017  christos - Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.67 03-Jan-2017  rmind branches: 1.67.2;
NPF: fix the interface table initialisation on load.
 1.66 02-Jan-2017  rmind NPF: implement dynamic handling of interface addresses (the kernel part).
 1.65 28-Dec-2016  christos export rprocs too so we don't lose them.
 1.64 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.63 10-Dec-2016  christos add functionality to lookup a nat entry from the connection list.
 1.62 09-Dec-2016  christos This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
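
To illustrate the idea in the entry above (one hash table per prefix length, scanned linearly from the longest length to the shortest), here is a small IPv4-only sketch. It is not the lpm.c implementation: the fixed-size open-addressed tables, the multiplicative hash, and the names struct lpm, lpm_insert() and lpm_lookup() are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LPM_SLOTS 16	/* toy per-length table size (assumption) */

struct lpm_slot {
	bool used;
	uint32_t key;	/* masked network address */
	int val;
};

struct lpm {
	struct lpm_slot tbl[33][LPM_SLOTS];	/* one table per length 0..32 */
	uint64_t lens;				/* bitmap of lengths in use */
};

static uint32_t
lpm_mask(unsigned len)
{
	return len ? ~(uint32_t)0 << (32 - len) : 0;
}

/* Multiplicative hash; the top four bits select one of 16 slots. */
static unsigned
lpm_hash(uint32_t key)
{
	return (uint32_t)(key * 2654435761u) >> 28;
}

static bool
lpm_insert(struct lpm *t, uint32_t addr, unsigned len, int val)
{
	assert(len <= 32);
	uint32_t key = addr & lpm_mask(len);
	unsigned h = lpm_hash(key);

	for (unsigned i = 0; i < LPM_SLOTS; i++) {
		struct lpm_slot *s = &t->tbl[len][(h + i) % LPM_SLOTS];
		if (!s->used || s->key == key) {
			s->used = true;
			s->key = key;
			s->val = val;
			t->lens |= UINT64_C(1) << len;
			return true;
		}
	}
	return false;	/* table for this length is full */
}

/* Scan prefix lengths from longest to shortest; first hit wins. */
static bool
lpm_lookup(const struct lpm *t, uint32_t addr, int *val)
{
	for (int len = 32; len >= 0; len--) {
		if (!(t->lens & (UINT64_C(1) << len)))
			continue;	/* no prefixes of this length */
		uint32_t key = addr & lpm_mask((unsigned)len);
		unsigned h = lpm_hash(key);
		for (unsigned i = 0; i < LPM_SLOTS; i++) {
			const struct lpm_slot *s =
			    &t->tbl[len][(h + i) % LPM_SLOTS];
			if (!s->used)
				break;
			if (s->key == key) {
				*val = s->val;
				return true;
			}
		}
	}
	return false;
}
```

With only a few distinct prefix lengths in use, the outer loop probes only a few tables per lookup, which is the reason the commit message gives for the simple scan being fast enough in practice.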
 1.61 02-Feb-2015  rmind branches: 1.61.2;
npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
 1.60 30-Nov-2014  rmind - npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.59 11-Aug-2014  rmind branches: 1.59.2;
- Add and use npf_alg_export().
- npf_conn_import: handle NAT metadata correctly.
- npf_nat_newpolicy: restore the policy ID.
- npfctl_load: fix error code handling for the limit cases.
- npf_config_import: fix the inverted logic.
- npfctl_load: improve error handling.
 1.58 11-Aug-2014  rmind branches: 1.58.2;
NPF: finish up the rework of npfctl_save() mechanism.
 1.57 10-Aug-2014  rmind - Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.56 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.55 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.54 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.53 25-Jun-2014  rmind Adjust NPF to the recent BPF / BPF JIT changes and make it work again.
All regression tests are happy now (hi alnsn!).
 1.52 30-May-2014  rmind - npf_nat_freepolicy: handle a race condition when a new connection might
be associated with a NAT policy which is going away and npfctl reload
would wait for its natural expiration (potentially long time).
- Remove npf_ruleset_natreload() by merging into npf_ruleset_reload().
- npf_ruleset_reload: eliminate a small time period when a valid NAT
policy might be inactive during the reload operation.
 1.51 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.50 14-Mar-2014  rmind branches: 1.50.2;
NPF: add support for "stateful-ends".
 1.49 19-Feb-2014  rmind NPF: fix the recent breakage of the traceroute ALG. Also, simplify and
refactor a little bit.
 1.48 16-Feb-2014  rmind NPF: pass ALG functions via npfa_funcs_t structure.
 1.47 13-Feb-2014  rmind NPF: add support for IPv6-to-IPv6 Network Prefix Translation (NPTv6),
as per RFC 6296. Add a unit test. Also, bump NPF_VERSION.

Thanks to S.P.Zeidler for the help with NPTv6 work!
 1.46 06-Feb-2014  rmind Add support for CDB based NPF tables.
 1.45 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.44 04-Dec-2013  rmind - npf_do_nat: fix a race condition and simplify the logic.
- npf_session_setnat: clear the NAT association on failure.
 1.43 23-Nov-2013  rmind Move initialisation of bpf_args_t into the npf_ruleset_inspect().
This allows us to reuse the BPF memory store as a cache.
 1.42 22-Nov-2013  rmind npf_addr_mix: use xor rather than sum.
 1.41 22-Nov-2013  rmind Add npf_tableset_syncdict() to sync the table IDs in the proplib dictionary,
as they can change on reload now. Also, fix table name checking in npfctl.
 1.40 16-Nov-2013  rmind NPF: convert to bpf_jit_generate()/bpf_jit_freecode().
 1.39 15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.38 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.37 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.36 04-Nov-2013  rmind npf_generic_fsm and npf_tcp_fsm: use uint8_t and make the arrays more dense.
 1.35 29-Oct-2013  rmind npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.34 27-Oct-2013  rmind Add NPF_MAX_RULES, an artificial limit (set it to 1M).
 1.33 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.32 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.31 02-Jun-2013  rmind branches: 1.31.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.30 19-May-2013  rmind - Add NPF table flushing functionality.
- Fix line numbering for npfctl debug command.
 1.29 20-Mar-2013  christos Make ALGs autoloadable by providing in the config file:
alg "algname"
 1.28 16-Feb-2013  rmind - Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
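As an illustration of an "incremented 64-bit counter" used as an identifier, a sketch under stated assumptions follows. The names ruleset_sketch_t and rule_id_alloc_sketch() are hypothetical and do not come from the NPF sources; locking, which a kernel implementation would need, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ruleset state: only the ID counter is modelled here. */
typedef struct {
	uint64_t last_id;	/* last ID handed out; 0 means none yet */
} ruleset_sketch_t;

/*
 * Hand out the next rule ID.  IDs start at 1 and are never reused,
 * so 0 can serve as a "no ID" value in this sketch.
 */
static uint64_t
rule_id_alloc_sketch(ruleset_sketch_t *rs)
{
	assert(rs != NULL);
	assert(rs->last_id != UINT64_MAX);	/* would wrap around */

	return ++rs->last_id;
}
```

A caller would keep the returned value and pass it back later to remove the rule.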
 1.27 10-Feb-2013  rmind - Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
 1.26 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.25 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.24 29-Oct-2012  rmind Implement NPF table listing and preservation of entries on reload.
Bump the version.
 1.23 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.22 15-Aug-2012  rmind branches: 1.22.2;
Add npf_state_setsampler() for _NPF_TESTING case. This also fixes the build.
 1.21 12-Aug-2012  rmind - Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.20 28-Jul-2012  matt Fix -fno-common found by building i386/conf/ALL
 1.19 19-Jul-2012  spz teach npf ipv6-icmp
reviewed by rmind@
 1.18 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.17 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.16 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.15 30-May-2012  rmind npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
 1.14 06-May-2012  rmind - Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
 1.13 14-Apr-2012  rmind Update rumpdev_npf; use WARNS=4.
 1.12 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.11 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.10 06-Feb-2012  rmind branches: 1.10.2;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.9 29-Nov-2011  rmind branches: 1.9.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.8 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.7 02-Feb-2011  rmind branches: 1.7.2; 1.7.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ finishes it.
- Add more documentation.
- Various fixes.
 1.6 18-Jan-2011  rmind branches: 1.6.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 18-Dec-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.4 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
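The TCP MSS "clamping" mentioned in the 1.4 entry can be sketched roughly as below. This is an illustration only, not the NPF code: mss_clamp_sketch() is a hypothetical helper that walks a raw TCP options buffer and lowers the Maximum Segment Size option (kind 2, length 4) when it exceeds a limit. A real packet filter would also have to adjust the TCP checksum after rewriting the option, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_TCPOPT_EOL		0	/* end of option list */
#define	SK_TCPOPT_NOP		1	/* one-byte padding */
#define	SK_TCPOPT_MAXSEG	2	/* maximum segment size */
#define	SK_TCPOLEN_MAXSEG	4	/* kind + len + 16-bit value */

/*
 * Returns 1 if the MSS option was lowered to max_mss, 0 if nothing was
 * changed, and -1 if the options are malformed.
 */
static int
mss_clamp_sketch(uint8_t *opts, size_t len, uint16_t max_mss)
{
	size_t i = 0;

	assert(opts != NULL);

	while (i < len) {
		uint8_t kind = opts[i];

		if (kind == SK_TCPOPT_EOL)
			break;
		if (kind == SK_TCPOPT_NOP) {
			i++;
			continue;
		}
		/* Every other option carries a length byte. */
		if (i + 1 >= len)
			return -1;
		uint8_t olen = opts[i + 1];
		if (olen < 2 || i + olen > len)
			return -1;

		if (kind == SK_TCPOPT_MAXSEG && olen == SK_TCPOLEN_MAXSEG) {
			/* The MSS value is stored in network byte order. */
			uint16_t mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);

			if (mss <= max_mss)
				return 0;
			opts[i + 2] = (uint8_t)(max_mss >> 8);
			opts[i + 3] = (uint8_t)(max_mss & 0xff);
			return 1;
		}
		i += olen;
	}
	return 0;
}
```

The length checks before each read keep the walk inside the buffer even when the options are truncated.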
 1.3 25-Sep-2010  matt branches: 1.3.2; 1.3.4;
Rename rb.h to rbtree.h, as it is more appropriate (c.f. ptree.h). Also
helps find code that hasn't been updated to use the new rbtree API.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.3.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.4.1 25-Sep-2010  uebayasi file npf_impl.h was added on branch uebayasi-xip on 2010-10-22 09:23:14 +0000
 1.3.2.2 09-Oct-2010  yamt sync with head
 1.3.2.1 25-Sep-2010  yamt file npf_impl.h was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.7.6.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.6.5 23-Jan-2013  yamt sync with head
 1.7.6.4 30-Oct-2012  yamt sync with head
 1.7.6.3 23-May-2012  yamt sync with head.
 1.7.6.2 17-Apr-2012  yamt sync with head
 1.7.6.1 10-Nov-2011  yamt sync with head
 1.7.2.2 05-Mar-2011  rmind sync with head
 1.7.2.1 02-Feb-2011  rmind file npf_impl.h was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.9.2.5 02-Jun-2012  mrg sync to latest -current.
 1.9.2.4 29-Apr-2012  mrg sync to latest -current.
 1.9.2.3 05-Apr-2012  mrg sync to latest -current.
 1.9.2.2 24-Feb-2012  mrg sync to -current.
 1.9.2.1 18-Feb-2012  mrg merge to -current.
 1.10.2.15 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.10.2.14 18-Feb-2013  riz branches: 1.10.2.14.2;
Pull up following revision(s) (requested by rmind in ticket #829):
usr.sbin/npf/npfctl/npfctl.8: revision 1.13
usr.sbin/npf/npfctl/npf_build.c: revision 1.21
lib/libnpf/npf.c: revision 1.18
sys/net/npf/npf_ctl.c: revision 1.23
usr.sbin/npf/npfctl/npfctl.h: revision 1.27
lib/libnpf/npf.h: revision 1.15
sys/net/npf/npf_ruleset.c: revision 1.19
sys/net/npf/npf_impl.h: revision 1.28
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.31
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.6
- Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
 1.10.2.13 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.10.2.12 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.10.2.11 26-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #718):
usr.sbin/npf/npfctl/npfctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.c: revision 1.23
usr.sbin/npf/npfctl/npf_parse.y: revision 1.15
usr.sbin/npf/npfctl/npfctl.c: revision 1.24
usr.sbin/npf/npfctl/npf_parse.y: revision 1.16
usr.sbin/npf/npfctl/npfctl.h: revision 1.22
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.14
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.12
usr.sbin/npf/npfctl/npf_scan.l: revision 1.7
usr.sbin/npf/npfctl/npf_scan.l: revision 1.8
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.2
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.3
usr.sbin/npf/npfctl/npf_var.c: revision 1.6
usr.sbin/npf/npfctl/npf_var.c: revision 1.7
gcc 4.1 is not smart enough to notice "arg" is only used when initialized
correctly and produces a "might be used uninitialized" warning.
npfctl: switch to efun(3) routines.
npfctl: switch to ecalloc(3).
 1.10.2.10 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.10.2.9 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.10.2.8 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #678):
sys/rump/librump/rumpkern/rump.c: revision 1.243
sys/rump/librump/rumpkern/rump.c: revision 1.244
sys/rump/librump/rumpkern/rump.c: revision 1.245
sys/rump/librump/rumpkern/rump.c: revision 1.246
usr.sbin/npf/npftest/npftest.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.2
usr.sbin/npf/npftest/npftest.h: revision 1.5
sys/rump/net/Makefile.rumpnetcomp: revision 1.5
sys/rump/net/lib/libnpf/shlib_version: revision 1.1
sys/net/npf/npf_impl.h: revision 1.22
sys/rump/dev/lib/libnpf/Makefile: file removal
usr.sbin/npf/npftest/Makefile: revision 1.3
sys/rump/dev/lib/libnpf/component.c: file removal
sys/rump/dev/lib/libnpf/shlib_version: file removal
sys/net/npf/npf_state.c: revision 1.12
sys/rump/net/lib/libnpf/component.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.6
sys/rump/net/lib/libnpf/Makefile: revision 1.1
Move and rename librumpdev_npf to librumpnet_npf.
Enable the build of librumpnet_npf.
Add npf_state_setsampler() for _NPF_TESTING case. This also fixes the build.
Call pserialize_init() during rump start-up, since librump/net/npf
uses it.
It helps to include the declaration of the routine being called.
We also need kcpuset_init() now.
Use correct routine name - kcpuset_sysinit() vs kcpuset_init()
 1.10.2.7 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.10.2.6 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.10.2.5 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.10.2.4 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.10.2.3 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.10.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.10.2.14.2.1 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.22.2.5 03-Dec-2017  jdolecek update from HEAD
 1.22.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.22.2.3 23-Jun-2013  tls resync from head
 1.22.2.2 25-Feb-2013  tls resync with head
 1.22.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.31.2.1 18-May-2014  rmind sync with head
 1.50.2.1 10-Aug-2014  tls Rebase.
 1.58.2.4 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
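The linear scan described in this pull-up can be sketched roughly as follows. This is a minimal illustration, not the code in lpm.c: it handles IPv4 only, replaces each per-prefix-length hash table with a small fixed array, and every toy_* name is hypothetical. The point it shows is that checking prefix lengths from longest to shortest makes the first hit the longest match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 8	/* hypothetical fixed capacity per prefix length */

/* One bucket per prefix length 0..32; a real table would hash here. */
typedef struct {
	uint32_t net[TOY_MAX_ENTRIES];
	int      val[TOY_MAX_ENTRIES];
	size_t   count;
} toy_bucket_t;

typedef struct {
	toy_bucket_t len[33];
} toy_lpm_t;

static uint32_t
toy_mask(unsigned plen)
{
	/* Shifting a 32-bit value by 32 is undefined, so special-case 0. */
	return plen == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - plen));
}

int
toy_lpm_insert(toy_lpm_t *t, uint32_t addr, unsigned plen, int val)
{
	assert(plen <= 32);
	toy_bucket_t *b = &t->len[plen];

	if (b->count == TOY_MAX_ENTRIES)
		return -1;
	b->net[b->count] = addr & toy_mask(plen);
	b->val[b->count] = val;
	b->count++;
	return 0;
}

/* Scan from the longest prefix length down; the first hit is the best match. */
int
toy_lpm_lookup(const toy_lpm_t *t, uint32_t addr, int *val)
{
	for (int plen = 32; plen >= 0; plen--) {
		const toy_bucket_t *b = &t->len[plen];
		const uint32_t key = addr & toy_mask((unsigned)plen);

		for (size_t i = 0; i < b->count; i++) {
			if (b->net[i] == key) {
				*val = b->val[i];
				return 0;
			}
		}
	}
	return -1;
}
```

With only a few distinct prefix lengths populated, most of the outer loop iterations touch an empty bucket, which is consistent with the remark above that the simple approach is fast enough in practice.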
 1.58.2.3 04-Feb-2015  snj branches: 1.58.2.3.4;
Pull up following revision(s) (requested by rmind in ticket #479):
lib/libnpf/npf.c: revision 1.35
lib/libnpf/npf.h: revision 1.28
sys/net/npf/npf_conn.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.61
sys/net/npf/npf_ruleset.c: revision 1.41
usr.sbin/npf/npfctl/npf.conf.5: revision 1.44
usr.sbin/npf/npfctl/npf_parse.y: revision 1.37
usr.sbin/npf/npfctl/npf_show.c: revisions 1.16, 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.46
load the config file before bpfjit so that we can disable the warning.
--
Don't depend on yacc to include stdlib.h or string.h.
--
- npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
--
npf.conf(5): mention alg, include in the example, minor fix.
--
npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
--
libnpf: add npf_rule_getid() and npf_rule_getcode().
Missed in the previous commit.
--
npfctl_print_rule: print the ID in hex, not decimal.
 1.58.2.2 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.58.2.1 29-Aug-2014  martin Pull up following revision(s) (requested by rmind in ticket #56):
sys/net/npf/npf_ctl.c: revision 1.39
usr.sbin/npf/npfctl/npfctl.c: revision 1.43
lib/libnpf/npf.c: revision 1.33
lib/libnpf/npf.c: revision 1.34
sys/net/npf/npf_impl.h: revision 1.59
sys/net/npf/npf_ctl.c: revision 1.40
sys/net/npf/npf_conn.c: revision 1.11
sys/net/npf/npf_alg.c: revision 1.15
sys/net/npf/npf_conn.c: revision 1.12
sys/net/npf/npf_nat.c: revision 1.33
sys/net/npf/npf_nat.c: revision 1.34
Add and use npf_alg_export().
npf_conn_import: handle NAT metadata correctly.
npf_nat_newpolicy: restore the policy ID.
npfctl_load: fix error code handling for the limit cases.
npf_config_import: fix the inverted logic.
npfctl_load: improve error handling.
npf_conn_import: add a missing stat counter increment.
npf_nat_import: add a missing reference and make a comment.
npf_config_submit: finally, include the saved connections.
 1.58.2.3.4.1 18-Jan-2017  skrll Sync with netbsd-5
 1.59.2.3 28-Aug-2017  skrll Sync with HEAD
 1.59.2.2 05-Feb-2017  skrll Sync with HEAD
 1.59.2.1 06-Apr-2015  skrll Sync with HEAD
 1.61.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.61.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.67.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.70.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.70.4.1 10-Jun-2019  christos Sync with HEAD
 1.70.2.3 26-Jan-2019  pgoyette Sync with HEAD
 1.70.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.70.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.75.2.5 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.75.2.4 04-Oct-2019  martin Pull up following revision(s) (requested by rmind in ticket #282):

usr.sbin/npf/npfctl/npf_build.c: revision 1.53
lib/libnpf/npf.c: revision 1.48
usr.sbin/npf/npfctl/npfctl.h: revision 1.50
sys/net/npf/npf_impl.h: revision 1.80
usr.sbin/npf/npfctl/npfctl.h: revision 1.51
sys/net/npf/npf_ruleset.c: revision 1.49
usr.sbin/npf/npfctl/npf.conf.5: revision 1.90
sys/net/npf/npf_ctl.c: revision 1.59
lib/libnpf/libnpf.3: revision 1.11
usr.sbin/npf/npfctl/npf_parse.y: revision 1.50
usr.sbin/npf/npftest/npftest.conf: revision 1.8
usr.sbin/npf/npfctl/npfctl.c: revision 1.62
usr.sbin/npf/npfctl/npfctl.c: revision 1.63
usr.sbin/npf/npfctl/npf_scan.l: revision 1.30
usr.sbin/npf/npfctl/npfctl.8: revision 1.22
lib/libnpf/npf.h: revision 1.38
usr.sbin/npf/npfctl/npfctl.8: revision 1.23
usr.sbin/npf/npfctl/npfctl.8: revision 1.24
sys/net/npf/npf_if.c: revision 1.11
sys/net/npf/npf_if.c: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.89
sys/net/npf/npf_conn.c: revision 1.30
usr.sbin/npf/npfctl/npf_build.c: revision 1.52

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.

NPF ifmap: rework and fix a few small bugs.

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.
(missed a file in previous commit; cvs is so helpful..)

libnpf/npfctl: support dynamic NAT rulesets using a name prefix.

Use -width Pa for FILES.

Fix pasto in table replace -t type

Use -width Pa for FILES.

npf_ifmap_copylogname: be more defensive.
 1.75.2.3 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.75.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #139):

lib/libnpf/npf.c: revision 1.47
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.10
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.10
sys/net/npf/npf.h: revision 1.61
sys/net/npf/npf_ctl.c: revision 1.56
sys/net/npf/npf_os.c: revision 1.15
lib/libnpf/libnpf.3: revision 1.10
sys/net/npf/npf_tableset.c: revision 1.34
usr.sbin/npf/npfctl/npfctl.c: revision 1.61
sys/net/npf/npf_impl.h: revision 1.77
lib/libnpf/npf.h: revision 1.37

- npftest: fix a memleak in a unit test (standalone path only).
- Minor style fixes. No functional change.
npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.75.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.82.26.1 02-Aug-2025  perseant Sync with HEAD
 1.58 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.57 30-May-2020  rmind branches: 1.57.26;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.56 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.55 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.54 23-Jul-2019  rmind branches: 1.54.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.53 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.52 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.51 31-Aug-2018  maxv Introduce npf_set_mss(). When the MSS is not 16bit-aligned, it sets:

0      8           16          24    32
+------+-----------+-----------+------+
| data | MSS (low) | MSS (hig) | data |
+------+-----------+-----------+------+
^                  ^
old[0]             old[1]

And sets new[0,1] accordingly with the new value. The MSS-clamping code
then adjusts twice the checksum on a 16bit boundary:

from old[0] to new[0]
from old[1] to new[1]

Fixes PR/53479, opened by myself. Tested with wireshark and kASan.
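The two-step adjustment described above can be illustrated with a small sketch. It is not the npf_set_mss() implementation: the toy_* names are hypothetical, the buffer is a plain byte array rather than an nbuf, and the per-word update uses the RFC 1624 formula HC' = ~(~HC + ~m + m'). The MSS is stored in network byte order at an odd offset, so it straddles two aligned 16-bit words, and the checksum is fixed once for each of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1624 incremental update of a 16-bit one's complement checksum. */
uint16_t
toy_fixup16(uint16_t cksum, uint16_t oldw, uint16_t neww)
{
	uint32_t sum = (uint16_t)~cksum;

	sum += (uint16_t)~oldw;
	sum += neww;
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Reference Internet checksum over an even-length buffer (for comparison). */
uint16_t
toy_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Read one aligned 16-bit word in network byte order. */
static uint16_t
toy_word(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/*
 * Store a new MSS at the odd offset 'off' and return the checksum
 * adjusted for the two aligned words that the field straddles.
 */
uint16_t
toy_set_mss_unaligned(uint8_t *buf, size_t off, uint16_t mss, uint16_t cksum)
{
	assert((off & 1) == 1);
	uint8_t *w0 = &buf[off - 1], *w1 = &buf[off + 1];
	const uint16_t old0 = toy_word(w0), old1 = toy_word(w1);

	buf[off] = (uint8_t)(mss >> 8);
	buf[off + 1] = (uint8_t)(mss & 0xff);

	cksum = toy_fixup16(cksum, old0, toy_word(w0));	/* old[0] -> new[0] */
	cksum = toy_fixup16(cksum, old1, toy_word(w1));	/* old[1] -> new[1] */
	return cksum;
}
```

Comparing the returned value against a full recomputation with toy_cksum() over the same buffer is a cheap way to check the adjustment in a unit test.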
 1.50 08-Apr-2018  maxv branches: 1.50.2;
Fix bug I introduced in previous commit.
 1.49 07-Apr-2018  maxv Rewrite npf_fetch_tcpopts:

* Instead of doing several nbuf_advance/nbuf_ensure_contig and
playing with gotos, fetch the TCP options only once, and iterate over
the (safe) area. The code is similar to tcp_dooptions.

* When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is
the one we're expecting. If it isn't, then skip the option. This
wasn't done before, and not doing it allowed a packet to bypass the
max-mss clamping procedure. Discussed on tech-net@.
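A rough sketch of the approach described in this commit: iterate once over an already contiguous option area and skip any MAXSEG or WINDOW option whose length is not the expected one. This is not the npf_fetch_tcpopts() code; the toy_* names and the result structure are hypothetical, and the option kinds and lengths are the standard TCP values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option kinds and lengths as defined by the TCP specification. */
enum {
	TOY_TCPOPT_EOL = 0, TOY_TCPOPT_NOP = 1,
	TOY_TCPOPT_MAXSEG = 2, TOY_TCPOLEN_MAXSEG = 4,
	TOY_TCPOPT_WINDOW = 3, TOY_TCPOLEN_WINDOW = 3,
};

typedef struct {
	bool     has_mss, has_wscale;
	uint16_t mss;
	uint8_t  wscale;
} toy_tcpopts_t;

/* Returns false if the option area itself is malformed. */
bool
toy_fetch_tcpopts(const uint8_t *opts, size_t len, toy_tcpopts_t *out)
{
	size_t i = 0;

	assert(out != NULL);
	*out = (toy_tcpopts_t){0};
	while (i < len) {
		const uint8_t kind = opts[i];

		if (kind == TOY_TCPOPT_EOL)
			break;
		if (kind == TOY_TCPOPT_NOP) {
			i++;
			continue;
		}
		/* Every other option carries a length byte covering itself. */
		if (i + 1 >= len)
			return false;
		const uint8_t olen = opts[i + 1];
		if (olen < 2 || olen > len - i)
			return false;

		if (kind == TOY_TCPOPT_MAXSEG && olen == TOY_TCPOLEN_MAXSEG) {
			out->mss = (uint16_t)(opts[i + 2] << 8 | opts[i + 3]);
			out->has_mss = true;
		} else if (kind == TOY_TCPOPT_WINDOW &&
		    olen == TOY_TCPOLEN_WINDOW) {
			out->wscale = opts[i + 2];
			out->has_wscale = true;
		}
		/* An option with an unexpected length is simply skipped. */
		i += olen;
	}
	return true;
}
```

In this sketch a MAXSEG option with a length other than 4 is stepped over using its own length byte, so the parser never reads MSS bytes from an option that does not actually contain them.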
 1.48 06-Apr-2018  maxv If we're trying to read the mss on a packet that for some reason has two
MAXSEG options, we find ourselves patching the second option with the
value of the first one.

Fix that by using a local variable.
 1.47 23-Mar-2018  maxv If we fail to advance inside TCP/UDP/ICMPv4/ICMPv6, stop pretending L4
is unknown, and error out right away.

This prevents bugs in machinery, if a place looks for L4 in 'npc_proto'
without checking the cache too. I've seen a ~similar problem already.
 1.46 22-Mar-2018  maxv Retrieve the complete IPv4 header right away, and make sure we did retrieve
the IPv6 option header we were iterating on.
 1.45 22-Mar-2018  maxv Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.

Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
 1.44 21-Mar-2018  maxv Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
 1.43 21-Mar-2018  maxv Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.

AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.
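The behaviour described above, where AH is treated as the payload instead of being stepped over, can be sketched as a walk over the IPv6 extension header chain. This is a simplified illustration with hypothetical toy_* names, not NPF's npf_cache_ip(): it only steps over hop-by-hop, routing and destination options headers, and it deliberately leaves out fragment handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the headers this sketch knows about. */
enum {
	TOY_HOPOPTS = 0, TOY_TCP = 6, TOY_ROUTING = 43,
	TOY_AH = 51, TOY_DSTOPTS = 60,
};

/*
 * Walk the extension headers in 'p', where 'first' is the Next Header
 * value from the fixed IPv6 header.  Returns the protocol to filter on,
 * or -1 if the chain is truncated.
 */
int
toy_ip6_l4proto(const uint8_t *p, size_t len, uint8_t first)
{
	uint8_t proto = first;
	size_t off = 0;

	assert(p != NULL || len == 0);
	for (;;) {
		switch (proto) {
		case TOY_HOPOPTS:
		case TOY_ROUTING:
		case TOY_DSTOPTS: {
			if (len - off < 8)
				return -1;
			/* Hdr Ext Len counts 8-octet units past the first. */
			const size_t hlen = ((size_t)p[off + 1] + 1) * 8;
			if (hlen > len - off)
				return -1;
			proto = p[off];
			off += hlen;
			break;
		}
		default:
			/* AH lands here too: it is the payload, not skipped. */
			return proto;
		}
	}
}
```

For a chain of destination options followed by AH carrying TCP, the function reports AH, which is the protocol a rule such as "pass in proto ah from any" needs to see.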
 1.42 17-Mar-2018  maxv Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.
 1.41 13-Mar-2018  maxv Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.
 1.40 13-Mar-2018  maxv Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:

"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:

- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.
 1.39 08-Mar-2018  maxv Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.
 1.38 08-Mar-2018  maxv Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.

Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.
 1.37 19-Feb-2017  christos branches: 1.37.6; 1.37.12;
Don't reassemble ipv6 fragments, instead treat the first fragment as a regular
packet (subject to filtering rules), and pass subsequent fragments in the
same group unconditionally.
 1.36 26-Dec-2016  christos branches: 1.36.2;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.35 07-Nov-2016  jnemeth fixup misplaced #endif
 1.34 18-Mar-2016  mrg branches: 1.34.2;
minimal changes necessary to link into an INET6-less kernel.
 1.33 17-Dec-2015  mlelstv make DDB print ipv6 addresses too
 1.32 20-Jul-2014  rmind branches: 1.32.2; 1.32.4; 1.32.6; 1.32.10;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.31 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.30 19-Feb-2014  rmind branches: 1.30.2;
NPF: fix the recent breakage of the traceroute ALG. Also, simplify and
refactor a little bit.
 1.29 13-Feb-2014  rmind NPF: add support for IPv6-to-IPv6 Network Prefix Translation (NPTv6),
as per RFC 6296. Add a unit test. Also, bump NPF_VERSION.

Thanks to S.P.Zeidler for the help with NPTv6 work!
 1.28 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.27 22-Nov-2013  rmind Optimise checksum fixup routines:
- npf_fixup16_cksum: 1's complement sum is endian-independent.
- npf_fixup32_cksum: the first 32->16 bit reduction is not needed.

Pointed out by Valery Ushakov.
 1.26 22-Nov-2013  rmind npf_addr_mix: use xor rather than sum.
 1.25 30-Oct-2013  mrg used __diagused where appropriate.
 1.24 25-Oct-2013  martin Turn a few __unused into __diagused
 1.23 23-Aug-2013  rmind - npf_cache_ip: re-fetch IPv6 header since nbufs might have been reallocated.
- npf_cache_all: clear NBUF_DATAREF_RESET since npf_cache_ip() handles it.
 1.22 02-Jun-2013  rmind branches: 1.22.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.21 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.20 24-Dec-2012  rmind Silence gcc in npf_recache().
 1.19 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.18 10-Dec-2012  rmind npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.17 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.16 21-Jul-2012  rmind branches: 1.16.2;
- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
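
The log does not show the exact comparison that was wrong in npf_fetch_tcpopts, so the following is only a sketch of where an off-by-one of this kind tends to live. It assumes a 40-byte maximum for the TCP options area; the function name ex_fetch_tcpopts and the EX_ constants are hypothetical and are not the NPF definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (values per the TCP specification). */
#define EX_TCPOPT_EOL		0
#define EX_TCPOPT_NOP		1
#define EX_TCPOPT_MAXSEG	2
#define EX_TCPOPT_WINDOW	3
#define EX_MAX_TCPOPTLEN	40

/*
 * Walk a TCP options area and extract MSS and window scale if present.
 * Returns false on a malformed or oversized options area.
 */
static bool
ex_fetch_tcpopts(const uint8_t *opts, size_t optlen, uint16_t *mss,
    uint8_t *wscale)
{
	size_t i = 0;

	/* A length equal to the maximum is valid; only larger values are not. */
	if (optlen > EX_MAX_TCPOPTLEN)
		return false;

	while (i < optlen) {
		uint8_t kind = opts[i], len;

		if (kind == EX_TCPOPT_EOL)
			break;
		if (kind == EX_TCPOPT_NOP) {
			i++;
			continue;
		}
		/* The length byte must exist and the option must fit. */
		if (i + 1 >= optlen)
			return false;
		len = opts[i + 1];
		if (len < 2 || len > optlen - i)
			return false;

		switch (kind) {
		case EX_TCPOPT_MAXSEG:
			if (len != 4)
				return false;
			*mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);
			break;
		case EX_TCPOPT_WINDOW:
			if (len != 3)
				return false;
			*wscale = opts[i + 2];
			break;
		default:
			/* Unknown option: skip it using its length. */
			break;
		}
		i += len;
	}
	return true;
}
```

Writing the first check with ">=" instead of ">" would reject a segment that carries exactly the maximum number of option bytes, which is one way such a validation can be off by one.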
 1.15 19-Jul-2012  spz teach npf ipv6-icmp
reviewed by rmind@
 1.14 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.13 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.12 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.11 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.10 29-Nov-2011  rmind branches: 1.10.2; 1.10.4;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.9 12-Nov-2011  jakllsch Make a comment consistent with the code.
 1.8 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.7 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.6 18-Jan-2011  rmind branches: 1.6.4; 1.6.8;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 18-Dec-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.4 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.3 25-Sep-2010  rmind branches: 1.3.2; 1.3.4;
Add nbuf_advfetch() and simplify some code slightly.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.3.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.4.1 25-Sep-2010  uebayasi file npf_inet.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.3.2.2 09-Oct-2010  yamt sync with head
 1.3.2.1 25-Sep-2010  yamt file npf_inet.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.8.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.8.5 23-Jan-2013  yamt sync with head
 1.6.8.4 16-Jan-2013  yamt sync with (a bit old) head
 1.6.8.3 30-Oct-2012  yamt sync with head
 1.6.8.2 17-Apr-2012  yamt sync with head
 1.6.8.1 10-Nov-2011  yamt sync with head
 1.6.4.2 05-Mar-2011  rmind sync with head
 1.6.4.1 18-Jan-2011  rmind file npf_inet.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.10.4.11 17-May-2018  martin Pull up following revision(s) via patch (requested by maxv in ticket #1549):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27,1.28

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).

Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relatively to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
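
The recaching rule described in this entry can be pictured with a toy model. This is a sketch under stated assumptions, not the NPF data structures: struct ex_buf stands in for the nbuf, struct ex_cache for npc, and ex_ensure for any operation that may reallocate the underlying storage. All of these names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer whose storage may move. */
struct ex_buf {
	uint8_t	*data;
	size_t	 len;
};

/* Hypothetical stand-in for a cache of pointers into that buffer. */
struct ex_cache {
	const uint8_t	*l3;	/* start of the layer 3 header */
	const uint8_t	*l4;	/* start of the layer 4 header */
};

/* Recompute every cached pointer from the buffer's current storage. */
static void
ex_recache(struct ex_cache *c, const struct ex_buf *b, size_t l4off)
{
	c->l3 = b->data;
	c->l4 = b->data + l4off;
}

/*
 * Make sure at least 'need' bytes are available. This sketch always
 * moves the data to new storage when it grows, to mimic the worst case
 * in which every previously cached pointer becomes stale.
 */
static int
ex_ensure(struct ex_buf *b, size_t need)
{
	uint8_t *p;

	if (need <= b->len)
		return 0;
	if ((p = malloc(need)) == NULL)
		return -1;	/* the old storage stays valid on failure */
	memcpy(p, b->data, b->len);
	memset(p + b->len, 0, need - b->len);
	free(b->data);
	b->data = p;
	b->len = need;
	return 0;
}
```

In this model the caller cannot tell from the cache alone whether ex_ensure moved the storage, so it calls ex_recache unconditionally after any operation that might have done so, which mirrors the reasoning given in the commit message.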
 1.10.4.10 13-Sep-2013  msaitoh Pull up following revision (requested by riz in ticket #942):
/sys/net/npf/npf_inet.c revision 1.23
Fix bugs to prevent panic:
- npf_cache_ip: re-fetch IPv6 header since nbufs might have been reallocated.
- npf_cache_all: clear NBUF_DATAREF_RESET since npf_cache_ip() handles it.
 1.10.4.9 11-Feb-2013  riz branches: 1.10.4.9.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.10.4.8 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.10.4.7 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.10.4.6 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.10.4.5 25-Jul-2012  jdc branches: 1.10.4.5.4;
Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.10.4.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.10.4.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.10.4.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10.4.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.10.4.9.2.2 17-May-2018  martin Pull up following revision(s) via patch (requested by maxv in ticket #1549):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27,1.28

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).

Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relatively to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.
 1.10.4.9.2.1 13-Sep-2013  msaitoh Pull up following revision (requested by riz in ticket #942):
/sys/net/npf/npf_inet.c revision 1.23
Fix bugs to prevent panic:
- npf_cache_ip: re-fetch IPv6 header since nbufs might have been reallocated.
- npf_cache_all: clear NBUF_DATAREF_RESET since npf_cache_ip() handles it.
 1.10.4.5.4.1 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.10.2.1 24-Feb-2012  mrg sync to -current.
 1.16.2.5 03-Dec-2017  jdolecek update from HEAD
 1.16.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.16.2.3 23-Jun-2013  tls resync from head
 1.16.2.2 25-Feb-2013  tls resync with head
 1.16.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.22.2.2 18-May-2014  rmind sync with head
 1.22.2.1 28-Aug-2013  rmind sync with head
 1.30.2.1 10-Aug-2014  tls Rebase.
 1.32.10.1 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relatively to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.32.6.1 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relatively to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf. In such a way that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.32.4.5 28-Aug-2017  skrll Sync with HEAD
 1.32.4.4 05-Feb-2017  skrll Sync with HEAD
 1.32.4.3 05-Dec-2016  skrll Sync with HEAD
 1.32.4.2 19-Mar-2016  skrll Sync with HEAD
 1.32.4.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.32.2.3 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1605):

sys/net/npf/npf_inet.c: revision 1.45
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.29

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.
We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.
Discussed with rmind@.

Change npf_cache_all to ensure that the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.
In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).
This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.
 1.32.2.2 22-May-2017  martin Pull up missing part of rev 1.33, partly pulled up in ticket #1394:
make DDB print ipv6 addresses too
 1.32.2.1 12-May-2017  snj Pull up following revision(s) (requested by jnemeth in ticket #1394):
sys/net/npf/npf_inet.c: revisions 1.34, 1.35 via patch
sys/net/npf/npf_mbuf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.40
minimal changes necessary to link into an INET6-less kernel.
--
fixup misplaced #endif
 1.34.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.34.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.36.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.37.12.8 26-Jan-2019  pgoyette Sync with HEAD
 1.37.12.7 30-Sep-2018  pgoyette Sync with HEAD
 1.37.12.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.37.12.5 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.37.12.4 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.37.12.3 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.37.12.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.37.12.1 15-Mar-2018  pgoyette Synch with HEAD
 1.37.6.2 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #823):

sys/net/npf/npf_inet.c: revision 1.45-1.47
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.30
sys/net/npf/npf_sendpkt.c: revision 1.19

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relative to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.
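
The recaching rule described above can be illustrated with a minimal sketch.
This is not NPF code: toy_nbuf, toy_cache, toy_ensure and toy_recache are
hypothetical names, and toy_ensure always moves the storage so that the hazard
is deterministic, whereas the real nbuf only may be reallocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an nbuf: one heap block holding packet bytes. */
struct toy_nbuf {
	uint8_t *data;
	size_t   len;
};

/* Hypothetical stand-in for 'npc': a cached pointer into the buffer. */
struct toy_cache {
	const uint8_t *l4;	/* points inside toy_nbuf.data */
	size_t         l4_off;	/* offset the pointer was derived from */
};

/* (Re)derive the cached pointer from the buffer's current storage. */
static void
toy_recache(struct toy_cache *c, const struct toy_nbuf *nb, size_t off)
{
	assert(off < nb->len);
	c->l4_off = off;
	c->l4 = nb->data + off;
}

/*
 * Grow the buffer to at least 'want' bytes.  Assumption for the sketch:
 * the storage always moves, so every pointer cached earlier is stale
 * once this returns 0.  Returns -1 and leaves the buffer alone on failure.
 */
static int
toy_ensure(struct toy_nbuf *nb, size_t want)
{
	size_t newlen = want > nb->len ? want : nb->len;
	uint8_t *p = calloc(1, newlen);

	if (p == NULL)
		return -1;
	memcpy(p, nb->data, nb->len);
	free(nb->data);		/* old storage is gone from here on */
	nb->data = p;
	nb->len = newlen;
	return 0;
}
```

A caller that cached at some offset and then called toy_ensure() has to call
toy_recache() again before dereferencing the cached pointer. Doing so
unconditionally mirrors the choice described in the commit, since the cache
itself carries no flag saying whether the storage moved.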

Change npf_cache_all to ensure that the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Retrieve the complete IPv4 header right away, and make sure we did retrieve
the IPv6 option header we were iterating on.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.

If we fail to advance inside TCP/UDP/ICMPv4/ICMPv6, stop pretending L4
is unknown, and error out right away.

This prevents bugs in machinery, if a place looks for L4 in 'npc_proto'
without checking the cache too. I've seen a ~similar problem already.

In addition to checking L4 in the cache, here we also need to check the
protocol. The NPF entry point does not ensure that
ICMPv6 can be set only in IPv6
ICMPv4 can be set only in IPv4
So we could have ICMPv6 in IPv4.
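
The extra protocol check can be sketched roughly as follows. This is an
illustration only, not the NPF code: l4_proto_consistent is a hypothetical
helper, and the IP version is passed as a plain integer (4 or 6) for
simplicity.

```c
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Return true if the L4 protocol is plausible for the given IP version.
 * ICMPv4 is only accepted inside IPv4 and ICMPv6 only inside IPv6;
 * every other protocol is left to the remaining checks.
 */
static bool
l4_proto_consistent(int ip_version, int proto)
{
	switch (proto) {
	case IPPROTO_ICMP:
		return ip_version == 4;
	case IPPROTO_ICMPV6:
		return ip_version == 6;
	default:
		return true;
	}
}
```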

apply some INET6 so this compiles in INET6-less kernels again.
 1.37.6.1 09-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #817):

sys/net/npf/npf_inet.c: revision 1.38-1.44
sys/net/npf/npf_handler.c: revision 1.38-1.39
sys/net/npf/npf_alg_icmp.c: revision 1.26
sys/net/npf/npf.h: revision 1.56
sys/net/npf/npf_sendpkt.c: revision 1.17-1.18

Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.
Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.

Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.

Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).

Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:
"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:
- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.
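
The effect of stopping on the fragment header can be shown with a simplified
model of the extension-header walk. This is a sketch under stated assumptions,
not the npf_inet.c loop: find_frag_offset and the PROTO_* names are
hypothetical, the buffer is assumed to start right after the fixed IPv6
header, and only Hop-by-Hop, Routing and Destination Options headers are
walked.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_HOPOPTS	0
#define PROTO_ROUTING	43
#define PROTO_FRAGMENT	44
#define PROTO_DSTOPTS	60

/*
 * Walk the extension headers in 'ext' (the bytes following the fixed IPv6
 * header), starting with protocol 'first_proto'.  Return the byte offset of
 * the Fragment header, or -1 if the chain has none or is truncated.
 */
static int
find_frag_offset(const uint8_t *ext, size_t len, uint8_t first_proto)
{
	size_t off = 0;
	uint8_t proto = first_proto;

	for (;;) {
		size_t hlen;

		switch (proto) {
		case PROTO_HOPOPTS:
		case PROTO_ROUTING:
		case PROTO_DSTOPTS:
			if (off + 2 > len)
				return -1;
			/* Length is in 8-octet units, not counting the first. */
			hlen = ((size_t)ext[off + 1] + 1) * 8;
			break;
		case PROTO_FRAGMENT:
			/* Do not advance: the offset must stay on this header. */
			return (off + 8 <= len) ? (int)off : -1;
		default:
			return -1;
		}
		if (off + hlen > len)
			return -1;
		proto = ext[off];	/* Next Header field */
		off += hlen;
	}
}
```

The returned offset is what a reassembly routine would need to locate the
fragment header; advancing past it, as with a non-zero length for the
fragment case, would hand over the offset of whatever follows instead.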

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.

Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.

Fix the "return-rst" rule on IPv6 packets.
The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.
I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).

Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.

Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.

Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.
AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.

Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
 1.50.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.50.2.1 10-Jun-2019  christos Sync with HEAD
 1.54.2.3 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.54.2.2 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.54.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.57.26.1 02-Aug-2025  perseant Sync with HEAD
 1.17 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.16 09-Feb-2013  rmind branches: 1.16.2;
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.15 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.14 19-Jul-2012  spz branches: 1.14.2;
teach npf ipv6-icmp
reviewed by rmind@
 1.13 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.12 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.11 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusions.
 1.9 15-Jan-2012  rmind branches: 1.9.2;
- Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.8 29-Nov-2011  rmind branches: 1.8.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.7 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.6 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.5 18-Jan-2011  rmind branches: 1.5.4; 1.5.8;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.4 11-Nov-2010  rmind branches: 1.4.2;
NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.3 25-Sep-2010  rmind branches: 1.3.2; 1.3.4;
Add nbuf_advfetch() and simplify some code slightly.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.3.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.3.4.1 25-Sep-2010  uebayasi file npf_instr.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.3.2.2 09-Oct-2010  yamt sync with head
 1.3.2.1 25-Sep-2010  yamt file npf_instr.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.8.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.8.4 23-Jan-2013  yamt sync with head
 1.5.8.3 30-Oct-2012  yamt sync with head
 1.5.8.2 17-Apr-2012  yamt sync with head
 1.5.8.1 10-Nov-2011  yamt sync with head
 1.5.4.2 05-Mar-2011  rmind sync with head
 1.5.4.1 18-Jan-2011  rmind file npf_instr.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.8.2.2 24-Feb-2012  mrg sync to -current.
 1.8.2.1 18-Feb-2012  mrg merge to -current.
 1.9.2.8 17-Nov-2013  bouyer Apply patch, requested by rmind in ticket 986:
usr.sbin/npf/npfctl/npf_ncgen.c patch
sys/net/npf/npf_instr.c patch
fix the byteorder for port range comparison
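
The patch itself is not shown in this log, so the following is only a sketch
of the general pitfall the message names: a port taken from a packet is in
network byte order and has to be converted before it is compared against
range bounds held in host order. port_in_range is a hypothetical helper, and
the choice of which side to convert is an assumption.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check whether a port read from a TCP/UDP header (network byte order)
 * lies within [lo_host, hi_host], both given in host byte order.
 */
static bool
port_in_range(uint16_t port_net, uint16_t lo_host, uint16_t hi_host)
{
	const uint16_t port = ntohs(port_net);

	return lo_host <= port && port <= hi_host;
}
```

Comparing port_net directly against the bounds would appear to work on
big-endian machines and give wrong answers on little-endian ones.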
 1.9.2.7 11-Feb-2013  riz branches: 1.9.2.7.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.9.2.6 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.9.2.5 25-Jul-2012  jdc branches: 1.9.2.5.4;
Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
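
The npf_fetch_tcpopts item above concerns a boundary check of the following
kind. The actual diff is not part of this log, so this is a sketch of the
class of bug rather than the fix itself: tcp_options_len and the TOY_*
constants are hypothetical, and the limits follow from the TCP header format
(20-byte base header, at most 60 bytes in total).

```c
#include <stdint.h>

#define TOY_TCP_HDR_MIN		20	/* TCP header without options */
#define TOY_TCP_OPTLEN_MAX	40	/* 60-byte maximum header - 20 */

/*
 * Given the TCP data offset (header length in 32-bit words), return the
 * length of the options area in bytes, or -1 if the value is invalid.
 */
static int
tcp_options_len(unsigned data_off_words)
{
	const int hlen = (int)data_off_words * 4;
	int optlen;

	if (hlen < TOY_TCP_HDR_MIN)
		return -1;
	optlen = hlen - TOY_TCP_HDR_MIN;
	/* '>' rather than '>=': exactly 40 bytes of options is legal. */
	if (optlen > TOY_TCP_OPTLEN_MAX)
		return -1;
	return optlen;
}
```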
 1.9.2.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.9.2.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.9.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.9.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.9.2.7.2.1 17-Nov-2013  bouyer Apply patch, requested by rmind in ticket 986:
usr.sbin/npf/npfctl/npf_ncgen.c patch
sys/net/npf/npf_instr.c patch
fix the byteorder for port range comparison
 1.9.2.5.4.1 17-Nov-2013  bouyer Apply patch, requested by rmind in ticket 986:
usr.sbin/npf/npfctl/npf_ncgen.c patch
sys/net/npf/npf_instr.c patch
fix the byteorder for port range comparison
 1.14.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.14.2.1 25-Feb-2013  tls resync with head
 1.16.2.1 18-May-2014  rmind sync with head
 1.5 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.4 22-Jun-2012  rmind branches: 1.4.2;
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.3 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.2 18-Jan-2011  rmind branches: 1.2.4; 1.2.8; 1.2.12; 1.2.14;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.1 18-Dec-2010  rmind branches: 1.1.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.1.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.2.14.3 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.2.14.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.2.14.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.2.12.1 24-Feb-2012  mrg sync to -current.
 1.2.8.2 30-Oct-2012  yamt sync with head
 1.2.8.1 17-Apr-2012  yamt sync with head
 1.2.4.2 05-Mar-2011  rmind sync with head
 1.2.4.1 18-Jan-2011  rmind file npf_log.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.4.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.25 12-Feb-2023  kardel PR kern/56052:
allow block-return packets passed through without rule matching.
Included up-stream as https://github.com/rmind/npf/pull/115
 1.24 30-May-2020  rmind branches: 1.24.20;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.23 26-Sep-2019  christos Cast m_mbuflen() result to "size_t". It could also be "u_int" since it is
assigned to "u_int", but all the other "standalone" equivalent functions return
"size_t".
 1.22 15-Nov-2018  maxv branches: 1.22.4;
Remove the 't' argument from m_tag_find().
 1.21 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.20 10-Aug-2018  maxv Rename

ip6_undefer_csum -> in6_undefer_cksum
in6_delayed_cksum -> in6_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in6_offload.c. Add comments to explain what
we're doing.

Same as IPv4.
 1.19 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.18 26-Dec-2016  christos branches: 1.18.14; 1.18.16;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.17 08-Dec-2016  rmind NPF: adjust the 'stateful-ends' mechanism to tag the packets and thus
pass-through them on other interfaces. Per discussion with christos@.
 1.16 18-Mar-2016  mrg branches: 1.16.2;
minimal changes necessary to link into an INET6-less kernel.
 1.15 17-Dec-2015  mlelstv handle delayed cksums also for ipv6
 1.14 12-Jul-2015  rmind npfkern: eliminate INACTIVE_ID and use 0 for unregistered interfaces.
 1.13 10-Aug-2014  rmind branches: 1.13.2; 1.13.4;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.12 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.11 19-Feb-2013  rmind branches: 1.11.2;
nbuf_ensure_contig: fix assert (can be equal if there is zero-length mbuf).
Found by npftest on sparc64.
 1.10 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.9 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.8 10-Dec-2012  rmind npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.7 14-Apr-2012  rmind branches: 1.7.2;
Update rumpdev_npf; use WARNS=4.
 1.6 18-Jan-2011  rmind branches: 1.6.4; 1.6.8; 1.6.12; 1.6.14;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 11-Nov-2010  rmind branches: 1.5.2;
NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.4 03-Oct-2010  rmind branches: 1.4.2; 1.4.4;
nbuf_advfetch: fix bug and change behaviour on error case.
 1.3 25-Sep-2010  rmind Add nbuf_advfetch() and simplify some code slightly.
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.4.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.4.4.1 03-Oct-2010  uebayasi file npf_mbuf.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.4.2.2 09-Oct-2010  yamt sync with head
 1.4.2.1 03-Oct-2010  yamt file npf_mbuf.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.14.3 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.6.14.2 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.6.14.1 26-Jun-2012  riz branches: 1.6.14.1.4;
Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
- only port ranges.
remove extra rule that got merged...
 1.6.14.1.4.1 16-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #746):
sys/net/npf/npf_inet.c: revision 1.18
sys/net/npf/npf_mbuf.c: revision 1.8
sys/net/npf/npf.h: revision 1.23
npf_rwrcksum: handle delayed checksums in the network stack; also fix
non-NPF_NAT_PORTS case and add some comments. PR/47235.
 1.6.12.1 29-Apr-2012  mrg sync to latest -current.
 1.6.8.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.8.3 23-Jan-2013  yamt sync with head
 1.6.8.2 16-Jan-2013  yamt sync with (a bit old) head
 1.6.8.1 17-Apr-2012  yamt sync with head
 1.6.4.2 05-Mar-2011  rmind sync with head
 1.6.4.1 18-Jan-2011  rmind file npf_mbuf.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.7.2.3 03-Dec-2017  jdolecek update from HEAD
 1.7.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.2.1 25-Feb-2013  tls resync with head
 1.11.2.1 18-May-2014  rmind sync with head
 1.13.4.4 05-Feb-2017  skrll Sync with HEAD
 1.13.4.3 19-Mar-2016  skrll Sync with HEAD
 1.13.4.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.13.4.1 22-Sep-2015  skrll Sync with HEAD
 1.13.2.3 12-May-2017  snj Pull up following revision(s) (requested by jnemeth in ticket #1394):
sys/net/npf/npf_inet.c: revisions 1.34, 1.35 via patch
sys/net/npf/npf_mbuf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.40
minimal changes necessary to link into an INET6-less kernel.
--
fixup misplaced #endif
 1.13.2.2 26-Jan-2016  riz Pull up following revision(s) (requested by mlelstv in ticket #1065):
sys/net/npf/npf_mbuf.c: revision 1.15
handle delayed cksums also for ipv6
 1.13.2.1 17-Jul-2015  snj Pull up following revision(s) (requested by rmind in ticket #880):
sys/net/npf/npf_if.c: revision 1.5
sys/net/npf/npf_mbuf.c: revision 1.14
usr.sbin/npf/npf.7: revision 1.3
usr.sbin/npf/npfctl/npf_var.c: revision 1.9
npfkern: eliminate INACTIVE_ID and use 0 for unregistered interfaces.
--
- npfvar_get_type1: check for NULL first.
- Minor fix for the npf(7) man page.
 1.16.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.18.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18.16.1 10-Jun-2019  christos Sync with HEAD
 1.18.14.4 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.18.14.3 30-Sep-2018  pgoyette Sync with HEAD
 1.18.14.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.18.14.1 28-Jul-2018  pgoyette Sync with HEAD
 1.22.4.2 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.22.4.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.24.20.1 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.54 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.53 24-Feb-2023  riastradh branches: 1.53.6;
npf: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html

Requested by rmind@:
https://github.com/rmind/npf/pull/127#issuecomment-1399573125
 1.52 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.51 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                    Thread B
obj->foo = 42;              obj->baz = 73;
mumble(&obj->bar);          grumble(&obj->quux);
/* membar_exit(); */        /* membar_exit(); */
atomic_dec -- not last      atomic_dec -- last
                            /* membar_enter(); */
                            KASSERT(invariant(obj->foo,
                                obj->bar));
                            free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
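
The release pattern described in this log entry can be sketched in portable C11, as a rough userland analogue rather than the kernel code itself. The assumption here is that a release-ordered decrement followed by an acquire fence plays the role of membar_exit()/membar_enter() (later renamed membar_release()/membar_acquire() in revision 1.52) around atomic_dec_uint_nv(). The struct and the obj_create(), obj_hold() and obj_rele() helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct obj *
obj_create(int foo)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = foo;
	return o;
}

/* Take an additional reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this was the last one, in which
 * case the object has been freed and must not be touched again.
 */
static bool
obj_rele(struct obj *o)
{
	/* Release: our earlier accesses to *o are ordered before the drop. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/* Acquire: every other thread's accesses are ordered before free. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 pays for the acquire fence; every other caller returns right after the release-ordered decrement.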
 1.50 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.49 23-May-2020  rmind Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.48 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.47 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.46 23-Jul-2019  rmind branches: 1.46.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.45 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
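
The net-to-net (NETMAP) translation mentioned above is commonly described as replacing the network bits of an address while keeping its host bits. A minimal sketch of that idea for IPv4 follows; it assumes addresses in host byte order and is not taken from the NPF sources. The function name netmap_translate() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map an IPv4 address (host byte order) into the
 * network "net" with the given prefix length (0..32), preserving the
 * host part of the original address.
 */
static uint32_t
netmap_translate(uint32_t addr, uint32_t net, unsigned prefixlen)
{
	/* Avoid an undefined 32-bit shift when the prefix length is 0. */
	uint32_t mask = (prefixlen == 0) ? 0 :
	    (uint32_t)(~(uint32_t)0 << (32 - prefixlen));

	return (net & mask) | (addr & ~mask);
}
```

For example, mapping 10.0.0.5 into 192.168.1.0/24 under this scheme yields 192.168.1.5, since only the last octet is host bits.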
 1.44 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.43 11-May-2018  maxv branches: 1.43.2;
Improve comment, it's not just IPv4.
 1.42 23-Apr-2018  christos PR/53207: David Binderman: Use logical and
 1.41 26-Dec-2016  christos branches: 1.41.8; 1.41.14;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.40 18-Mar-2016  mrg branches: 1.40.2;
minimal changes necessary to link into an INET6-less kernel.
 1.39 30-Dec-2014  christos Don't forget to destroy the mutex before freeing the nat struct on a failed
load.
XXX: pullup -7
 1.38 20-Dec-2014  rmind NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.37 30-Nov-2014  rmind - npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.36 30-Nov-2014  rmind NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
 1.35 26-Nov-2014  rmind branches: 1.35.2;
NPF: fix the reference counting and share the active NAT portmap correctly
when performing the reload. Should fix PR/49412, reported by kardel@.
 1.34 24-Aug-2014  rmind - npf_conn_import: add a missing stat counter increment.
- npf_nat_import: add a missing reference and make a comment.
 1.33 11-Aug-2014  rmind - Add and use npf_alg_export().
- npf_conn_import: handle NAT metadata correctly.
- npf_nat_newpolicy: restore the policy ID.
- npfctl_load: fix error code handling for the limit cases.
- npf_config_import: fix the inverted logic.
- npfctl_load: improve error handling.
 1.32 10-Aug-2014  rmind branches: 1.32.2;
- Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.31 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.30 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.29 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.28 30-May-2014  rmind - npf_nat_freepolicy: handle a race condition when a new connection might
be associated with a NAT policy which is going away and npfctl reload
would wait for its natural expiration (potentially long time).
- Remove npf_ruleset_natreload() by merging into npf_ruleset_reload().
- npf_ruleset_reload: eliminate a small time period when a valid NAT
policy might be inactive during the reload operation.
 1.27 14-Mar-2014  rmind branches: 1.27.2;
NPF: add support for "stateful-ends".
 1.26 19-Feb-2014  rmind NPF: fix the recent breakage of the traceroute ALG. Also, simplify and
refactor a little bit.
 1.25 13-Feb-2014  rmind NPF: add support for IPv6-to-IPv6 Network Prefix Translation (NPTv6),
as per RFC 6296. Add a unit test. Also, bump NPF_VERSION.

Thanks to S.P.Zeidler for the help with NPTv6 work!
 1.24 07-Feb-2014  rmind NPF: add support for static (stateless) NAT.
 1.23 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.22 04-Dec-2013  rmind - npf_do_nat: fix a race condition and simplify the logic.
- npf_session_setnat: clear the NAT association on failure.
 1.21 29-Oct-2013  rmind npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.20 02-Jun-2013  rmind branches: 1.20.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.19 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.18 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.17 15-Aug-2012  rmind branches: 1.17.2;
- {npf_mk_rproc,npf_nat_save}: fix the fetching of {rproc-ptr,id_ptr}.
- npf_rproc_setlog: initialise variables to 0, as keys may not exist.

Bugs found by mlelstv@ while testing on Amiga.
 1.16 12-Aug-2012  rmind - Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.15 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.14 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.13 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.12 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.11 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.10 05-Feb-2012  rmind branches: 1.10.2;
Multiple NPF fixes, add better error reporting from kernel side, add some
asserts, bump the version.
 1.9 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.8 19-Nov-2011  tls branches: 1.8.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.
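
As an illustration of what such a statistical test looks like, here is a minimal sketch of the monobit test, one of the tests defined by FIPS 140-2. It is not the code in libkern/rngtest.c. It assumes the published parameters: a 20,000-bit sample passes when the number of one bits X satisfies 9,725 < X < 10,275. The name monobit_ok() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MONOBIT_BYTES	2500	/* 20,000 bits per sample (assumed) */

/*
 * Hypothetical helper: count the one bits in a 20,000-bit sample and
 * check that the count falls strictly inside the assumed bounds.
 */
static bool
monobit_ok(const uint8_t *buf)
{
	unsigned ones = 0;

	for (size_t i = 0; i < MONOBIT_BYTES; i++) {
		/* Clear the lowest set bit until none remain. */
		for (unsigned b = buf[i]; b != 0; b &= b - 1)
			ones++;
	}
	return ones > 9725 && ones < 10275;
}
```

A stuck-at-zero or stuck-at-one source fails immediately, which is the kind of fault the attach-time checks described later in this entry are meant to catch. A balanced count alone does not imply good randomness, so the standard pairs this test with others.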

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.7 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.6 02-Feb-2011  rmind branches: 1.6.2; 1.6.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
 1.5 18-Jan-2011  rmind branches: 1.5.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.4 18-Dec-2010  rmind branches: 1.4.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file npf_nat.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_nat.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.6.6.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.6.4 23-Jan-2013  yamt sync with head
 1.6.6.3 30-Oct-2012  yamt sync with head
 1.6.6.2 17-Apr-2012  yamt sync with head
 1.6.6.1 10-Nov-2011  yamt sync with head
 1.6.2.2 05-Mar-2011  rmind sync with head
 1.6.2.1 02-Feb-2011  rmind file npf_nat.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.8.2.3 05-Apr-2012  mrg sync to latest -current.
 1.8.2.2 24-Feb-2012  mrg sync to -current.
 1.8.2.1 18-Feb-2012  mrg merge to -current.
 1.10.2.9 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.10.2.8 11-Feb-2013  riz branches: 1.10.2.8.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.10.2.7 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.10.2.6 19-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #511):
lib/libnpf/npf.c: revision 1.12
sys/net/npf/npf_ctl.c: revision 1.17
sys/net/npf/npf_nat.c: revision 1.17
- {npf_mk_rproc,npf_nat_save}: fix the fetching of {rproc-ptr,id_ptr}.
- npf_rproc_setlog: initialise variables to 0, as keys may not exist.
Bugs found by mlelstv@ while testing on Amiga.
 1.10.2.5 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.10.2.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.10.2.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.10.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix a few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in a few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.10.2.8.2.1 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.17.2.4 03-Dec-2017  jdolecek update from HEAD
 1.17.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.2.2 23-Jun-2013  tls resync from head
 1.17.2.1 25-Feb-2013  tls resync with head
 1.20.2.1 18-May-2014  rmind sync with head
 1.27.2.1 10-Aug-2014  tls Rebase.
 1.32.2.6 12-May-2017  snj Pull up following revision(s) (requested by jnemeth in ticket #1394):
sys/net/npf/npf_inet.c: revisions 1.34, 1.35 via patch
sys/net/npf/npf_mbuf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.40
minimal changes necessary to link into an INET6-less kernel.
--
fixup misplaced #endif
 1.32.2.5 04-Jan-2015  martin Pull up following revision(s) (requested by rmind in ticket #374):
sys/net/npf/npf_nat.c: revision 1.39
Don't forget to destroy the mutex before freeing the nat struct on a failed
load.
 1.32.2.4 22-Dec-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #347):
sys/net/npf/npf_nat.c: revision 1.38
sys/net/npf/npf_conn.h: revision 1.8
sys/net/npf/npf_conn.c: revision 1.14
NPF: set the connection flags atomically in the post-creation logic and
fix a tiny race condition window. Might fix PR/49488.
 1.32.2.3 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.32.2.2 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #274):
sys/net/npf/npf_nat.c: revision 1.35
sys/net/npf/npf_ruleset.c: revision 1.38
NPF: fix the reference counting and share the active NAT portmap correctly
when performing the reload. Should fix PR/49412, reported by kardel@.
 1.32.2.1 29-Aug-2014  martin Pull up following revision(s) (requested by rmind in ticket #56):
sys/net/npf/npf_ctl.c: revision 1.39
usr.sbin/npf/npfctl/npfctl.c: revision 1.43
lib/libnpf/npf.c: revision 1.33
lib/libnpf/npf.c: revision 1.34
sys/net/npf/npf_impl.h: revision 1.59
sys/net/npf/npf_ctl.c: revision 1.40
sys/net/npf/npf_conn.c: revision 1.11
sys/net/npf/npf_alg.c: revision 1.15
sys/net/npf/npf_conn.c: revision 1.12
sys/net/npf/npf_nat.c: revision 1.33
sys/net/npf/npf_nat.c: revision 1.34
Add and use npf_alg_export().
npf_conn_import: handle NAT metadata correctly.
npf_nat_newpolicy: restore the policy ID.
npfctl_load: fix error code handling for the limit cases.
npf_config_import: fix the inverted logic.
npfctl_load: improve error handling.
npf_conn_import: add a missing stat counter increment.
npf_nat_import: add a missing reference and make a comment.
npf_config_submit: finally, include the saved connections.
 1.35.2.3 05-Feb-2017  skrll Sync with HEAD
 1.35.2.2 19-Mar-2016  skrll Sync with HEAD
 1.35.2.1 06-Apr-2015  skrll Sync with HEAD
 1.40.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.41.14.4 26-Jan-2019  pgoyette Sync with HEAD
 1.41.14.3 30-Sep-2018  pgoyette Sync with HEAD
 1.41.14.2 21-May-2018  pgoyette Sync with HEAD
 1.41.14.1 02-May-2018  pgoyette Synch with HEAD
 1.41.8.1 05-May-2018  martin Pull up following revision(s) (requested by prlw1 in ticket #795):

sys/net/npf/npf_nat.c: revision 1.42

PR/53207: David Binderman: Use logical and
 1.43.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.43.2.1 10-Jun-2019  christos Sync with HEAD
 1.46.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.46.2.3 25-May-2020  martin Pull up following revision(s) (requested by rmind in ticket #930):

usr.sbin/npf/npfctl/npf_build.c: revision 1.54
sys/net/npf/npf_conn.h: revision 1.19
usr.sbin/npf/npfctl/npfctl.h: revision 1.52
usr.sbin/npf/npfctl/npf_show.c: revision 1.31
sys/net/npf/npf_conf.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.56
sys/net/npf/npf_conndb.c: revision 1.8
sys/net/npf/npf_conn.c: revision 1.31

Backport selected NPF fixes from the upstream (to be pulled up):

- npf_conndb_lookup: protect the connection lookup with pserialize(9),
instead of incorrectly assuming that the handler always runs at IPL_SOFTNET.
Should fix crashes reported on high load (PR/55182).

- npf_config_destroy: handle partially initialized config; fixes crashes
with some invalid configurations.

- NAT policy creation / destruction: set the initial reference and do not
wait for reference draining on destruction; destroy the policy on the
last reference drop instead. Fixes a lockup with the dynamic NAT rules.

- npf_nat_{export,import}: fix a regression since dynamic NAT rules.

- npfctl: fix a regression and restore the default group behaviour.

- Add npf_cache_tcp() and validate the TCP data offset (from maxv@).
 1.46.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF to where it belongs.
Make npfctl_switch() and pfil private to OS-specific module.
 1.46.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.53.6.1 02-Aug-2025  perseant Sync with HEAD
 1.12 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.11 09-Feb-2013  rmind branches: 1.11.2;
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is the SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.10 19-Jul-2012  spz branches: 1.10.2;
teach npf ipv6-icmp
reviewed by rmind@
 1.9 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
 1.8 15-Jun-2012  rmind - Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
 1.7 14-Apr-2012  rmind Update rumpdev_npf; use WARNS=4.
 1.6 10-Mar-2012  christos definitions used by the disassembler.
 1.5 04-Nov-2011  zoltan branches: 1.5.4; 1.5.6;
Add IPv6 support for NPF.
 1.4 18-Dec-2010  rmind branches: 1.4.6; 1.4.10;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(5).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file npf_ncode.h was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_ncode.h was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.10.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.3 30-Oct-2012  yamt sync with head
 1.4.10.2 17-Apr-2012  yamt sync with head
 1.4.10.1 10-Nov-2011  yamt sync with head
 1.4.6.2 05-Mar-2011  rmind sync with head
 1.4.6.1 18-Dec-2010  rmind file npf_ncode.h was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.5.6.5 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is the SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.5.6.4 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.5.6.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.5.6.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.5.6.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.5.4.2 29-Apr-2012  mrg sync to latest -current.
 1.5.4.1 11-Mar-2012  mrg sync to latest -current
 1.10.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.1 25-Feb-2013  tls resync with head
 1.11.2.1 18-May-2014  rmind sync with head
 1.23 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.22 20-Mar-2025  pgoyette Disable autounload for the npf module, until we can figure out why
it's causing panic during system startup.
 1.21 27-Jan-2021  christos branches: 1.21.24;
Don't silently ignore the errors from npfctl_run_op. We end up returning
packets to userland that are missing required fields (like in rule_add the
id of the rule) and npfctl aborts.
 1.20 25-Jan-2021  christos Fix locking issue: npf_default_pass needs to be called with the config lock
held.
 1.19 18-Aug-2020  maxv branches: 1.19.2;
Add missing cases, to prevent memory corruption.

Reported-by: syzbot+f8b8a689a3560dda27f7@syzkaller.appspotmail.com
 1.18 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.17 25-Aug-2019  rmind Make npfctl_switch() and pfil private to OS-specific module.
 1.16 25-Aug-2019  rmind - npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
 1.15 21-Aug-2019  rmind npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.14 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.13 10-Aug-2019  rmind Add the ifnet_t::if_npf_private field. Bump the kernel version.
Fixes PR/54098.
 1.12 23-Jul-2019  rmind branches: 1.12.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.11 27-Feb-2019  mrg work around a GCC 7 vs sparc (32 bit) issue i haven't figured out
the real cause of yet.

mark npf_init() as non-static. for a yet-unknown reason, when this
function is inlined by the compiler (or a human!) into the single
caller, some CPUs end up in a hung state that can't be interrupted
eventually leading to system hang. eg:

[ 8.9693040] root on hme0
[ 8.9862690] nfs_boot: trying DHCP/BOOTP
xcall(cpu2,0xf0240ac8) from 0xf0241170: couldn't ping cpus: cpu1

is the symptom though sometimes nfs_boot is actually able to
complete mountroot before it hangs.


this may be a compiler bug but the symptom and the trigger are
far removed and my so-far reading of the "broken" npf_init
inlining has shown no issues, however, i haven't completed a
full scan of this asm in the past month so i'm committing this
workaround for now.
 1.10 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.9 11-Dec-2017  ozaki-r branches: 1.9.2; 1.9.4;
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK

IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
 1.8 17-Nov-2017  ozaki-r branches: 1.8.2;
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.7 20-Jul-2017  pgoyette The npf module depends on some stuff from the bpf module, so set the
required modules list accordingly.
 1.6 27-Jan-2017  ryo branches: 1.6.2; 1.6.6; 1.6.8;
Don't hold softnet_lock if NET_MPSAFE.

Some functions lock softnet_lock while waiting in pserialize_perform() in pfil_add_hook().
(e.g. key_timehandler(), etc)
 1.5 03-Jan-2017  rmind branches: 1.5.2; 1.5.4;
NPF: fix the interface table initialisation on load.
 1.4 02-Jan-2017  christos make this compile as a module.
 1.3 02-Jan-2017  rmind NPF: implement dynamic handling of interface addresses (the kernel part).
 1.2 26-Dec-2016  rmind Convert NPF to the latest pfil(9) changes.
 1.1 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.5.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.5.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.5.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.5.2.1 03-Jan-2017  pgoyette file npf_os.c was added on branch pgoyette-localcount on 2017-01-07 08:56:50 +0000
 1.6.8.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little benefit on performance because
the network stack is still serialized by the big kernel locks by default.
 1.6.8.1 25-Jul-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #155):
sys/net/npf/npf_os.c: revision 1.7
The npf module depends on some stuff from the bpf module, so set the
required modules list accordingly.
 1.6.6.2 29-Apr-2017  pgoyette Remove explicit inclusion of <sys/localcount.h> since there is no
explicit usage of localcounts here. <sys/conf.h> will take care of
including as needed.
 1.6.6.1 28-Apr-2017  pgoyette The npf device may be loaded as a (rump) module, so make sure we have a
localcount in its devsw
 1.6.2.3 28-Aug-2017  skrll Sync with HEAD
 1.6.2.2 05-Feb-2017  skrll Sync with HEAD
 1.6.2.1 27-Jan-2017  skrll file npf_os.c was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.8.2.2 03-Dec-2017  jdolecek update from HEAD
 1.8.2.1 17-Nov-2017  jdolecek file npf_os.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.9.4.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.9.4.1 10-Jun-2019  christos Sync with HEAD
 1.9.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.12.2.4 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.12.2.3 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #141):

usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.15
sys/net/npf/npf_alg.c: revision 1.21
sys/net/npf/npf.h: revision 1.62
sys/net/npf/npf_ctl.c: revision 1.57
sys/net/npf/npf_ctl.c: revision 1.58
sys/net/npf/npf_os.c: revision 1.16
sys/net/npf/npf_os.c: revision 1.17
sys/net/npf/npf_conf.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.78
sys/sys/mbuf.h: revision 1.220
sys/net/npf/npf_impl.h: revision 1.79
sys/net/npf/npf.c: revision 1.41
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.19
sys/net/npf/npf_nat.c: revision 1.48
sys/net/npf/npf_handler.c: revision 1.48
sys/net/npf/npf_ifaddr.c: revision 1.6

- npfctl_load_nvlist: simplify the config loading logic.
- Fix a small race condition in npf_nat_getaddr().
- Rework pserialize/EBR wrappers, make it easier to maintain.
Move PACKET_TAG_NPF where it belongs to.
Make npfctl_switch() and pfil private to OS-specific module.
 1.12.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #139):

lib/libnpf/npf.c: revision 1.47
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.10
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.10
sys/net/npf/npf.h: revision 1.61
sys/net/npf/npf_ctl.c: revision 1.56
sys/net/npf/npf_os.c: revision 1.15
lib/libnpf/libnpf.3: revision 1.10
sys/net/npf/npf_tableset.c: revision 1.34
usr.sbin/npf/npfctl/npfctl.c: revision 1.61
sys/net/npf/npf_impl.h: revision 1.77
lib/libnpf/npf.h: revision 1.37

- npftest: fix a memleak in a unit test (standalone path only).
- Minor style fixes. No functional change.
npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.12.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.19.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.21.24.1 02-Aug-2025  perseant Sync with HEAD
 1.6 12-Feb-2023  kardel PR kern/55654:
Switch default for parameter npf ip4.reassembly to 1.
This makes the NPF default configuration comply with host
requirements for IPv4.
 1.5 28-Apr-2022  martin branches: 1.5.4;
Make the thmap(9) used for params use sleepable allocations,
suggested by rmind@. Should fix PR 56802.
 1.4 28-Apr-2022  martin Temporary hack to make PR 56802 (when it happens) tell us for sure that
it is caused by KM_NOSLEEP memory allocation failure.
 1.3 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.2 11-Aug-2019  rmind branches: 1.2.8;
Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.1 23-Jul-2019  rmind branches: 1.1.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.1.2.3 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #1612):

usr.sbin/npf/npf-params.7: revision 1.9
sys/net/npf/npf_params.c: revision 1.6

PR kern/55654:

Switch default for parameter npf ip4.reassembly to 1.

This makes the NPF default configuration comply with host
requirements for IPv4.
 1.1.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.1.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.2.8.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.2.8.1 11-Aug-2019  martin file npf_params.c was added on branch phil-wifi on 2020-04-13 08:05:15 +0000
 1.5.4.1 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #118):

usr.sbin/npf/npf-params.7: revision 1.9
sys/net/npf/npf_params.c: revision 1.6

PR kern/55654:

Switch default for parameter npf ip4.reassembly to 1.

This makes the NPF default configuration comply with host
requirements for IPv4.
 1.7 28-Aug-2020  riastradh npf: Remove harmless vestiges of debugging hacks.
 1.6 27-Aug-2020  riastradh npf: Make sure to initialize portmap_lock only once.

PR kern/55586
 1.5 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.4 11-Aug-2019  rmind branches: 1.4.8;
Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.3 25-Jul-2019  rmind branches: 1.3.2;
npf_portmap_flush: remove invalid assert (this routine can be called via
the npf_destroy() path where the constraint is not applicable).
 1.2 23-Jul-2019  rmind NPF portmap: add a workaround for archs without 64-bit CAS.
 1.1 23-Jul-2019  rmind NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.3.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.3.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.4.8.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4.8.1 11-Aug-2019  martin file npf_portmap.c was added on branch phil-wifi on 2020-04-13 08:05:15 +0000
 1.16 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.15 09-Feb-2013  rmind branches: 1.15.2;
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.14 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.13 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.12 19-Jul-2012  spz branches: 1.12.2;
teach npf ipv6-icmp
reviewed by rmind@
 1.11 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.10 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.9 05-Feb-2012  rmind branches: 1.9.2;
Multiple NPF fixes, add better error reporting from kernel side, add some
asserts, bump the version.
 1.8 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.7 29-Nov-2011  rmind branches: 1.7.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.6 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.5 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.4 18-Dec-2010  rmind branches: 1.4.6; 1.4.10;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file npf_processor.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_processor.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.10.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.10.4 23-Jan-2013  yamt sync with head
 1.4.10.3 30-Oct-2012  yamt sync with head
 1.4.10.2 17-Apr-2012  yamt sync with head
 1.4.10.1 10-Nov-2011  yamt sync with head
 1.4.6.2 05-Mar-2011  rmind sync with head
 1.4.6.1 18-Dec-2010  rmind file npf_processor.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.7.2.2 24-Feb-2012  mrg sync to -current.
 1.7.2.1 18-Feb-2012  mrg merge to -current.
 1.9.2.5 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.9.2.4 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.9.2.3 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.9.2.2 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.9.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in a few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusions.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.12.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.1 25-Feb-2013  tls resync with head
 1.15.2.1 18-May-2014  rmind sync with head
 1.23 24-Feb-2023  riastradh npf: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html

Requested by rmind@:
https://github.com/rmind/npf/pull/127#issuecomment-1399573125
 1.22 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.21 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)
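
As a rough userland analogue of the release pattern described above, the sketch below uses C11 <stdatomic.h> rather than the kernel's membar_*/atomic_dec_uint_nv primitives. The struct and the obj_create/obj_hold/obj_release names are hypothetical and exist only for illustration; this is not the code touched by the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; field names are illustrative. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an extra reference; no ordering is needed for the increment. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this call freed the object. */
static bool
obj_release(struct obj *o)
{
	/* Release: this thread's earlier stores precede the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last reference */

	/* Acquire: see every other thread's stores before its release. */
	atomic_thread_fence(memory_order_acquire);
	assert(atomic_load_explicit(&o->refcnt, memory_order_relaxed) == 0);
	free(o);
	return true;
}
```

Here the release ordering on the decrement plays the role of membar_exit (later membar_release), and the acquire fence on the last-reference path plays the role of membar_enter (later membar_acquire). Only the thread that observes the count reaching zero pays for the acquire fence.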

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
 1.20 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.19 23-Jul-2019  rmind branches: 1.19.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.18 11-Apr-2019  kamil Fix CVS Id usage
 1.17 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.16 29-Jan-2017  christos branches: 1.16.12; 1.16.14;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.15 28-Dec-2016  christos branches: 1.15.2;
export rprocs too so we don't lose them.
 1.14 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.13 10-Dec-2016  christos Add missing extcalls array. This is currently a no-op, but this is what
userland does too. Allows npfctl save; npfctl load to work again.
 1.12 11-Aug-2014  rmind branches: 1.12.4; 1.12.8;
NPF: finish up the rework of npfctl_save() mechanism.
 1.11 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.10 19-May-2014  jakllsch Add ability to have mbufs disappear (to another interface) during
npf_rproc_run(). For upcoming npf_ext_route extension.

Guidance and ok by rmind@.
 1.9 11-Mar-2013  christos branches: 1.9.10;
prevent the lookup function from autoloading recursively.
 1.8 11-Mar-2013  christos move the module loading in the correct place.
 1.7 10-Mar-2013  christos Split the npflog cloner and auto-load the extensions.
 1.6 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.5 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.4 03-Oct-2012  mlelstv ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.3 16-Sep-2012  rmind Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
 1.2 20-Feb-2012  rmind branches: 1.2.2; 1.2.4;
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in a few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusions.
 1.1 06-Feb-2012  rmind branches: 1.1.2; 1.1.4;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.1.4.3 24-Feb-2012  mrg sync to -current.
 1.1.4.2 18-Feb-2012  mrg merge to -current.
 1.1.4.1 06-Feb-2012  mrg file npf_rproc.c was added on branch jmcneill-usbmp on 2012-02-18 07:35:38 +0000
 1.1.2.4 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.1.2.3 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.1.2.2 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #693):
lib/npf/ext_normalise/shlib_version: revision 1.1
lib/libnpf/npf.c: revision 1.13
distrib/sets/lists/modules/mi: revision 1.48
sys/net/npf/npf_rproc.c: revision 1.3
sys/net/npf/npf_rproc.c: revision 1.4
sys/modules/npf/Makefile: revision 1.11
usr.sbin/npf/npfctl/npfctl.h: revision 1.20
lib/npf/ext_log/npfext_log.c: revision 1.1
lib/libnpf/npf.h: revision 1.11
sys/net/npf/npf_inet.c: revision 1.17
sys/net/npf/npf_log.c: file removal
sys/net/npf/npf_handler.c: revision 1.22
distrib/sets/lists/base/shl.mi: revision 1.636
sys/net/npf/npf_impl.h: revision 1.23
usr.sbin/npf/npfctl/Makefile: revision 1.8
lib/npf/Makefile: revision 1.1
lib/npf/ext_log/shlib_version: revision 1.1
lib/Makefile: revision 1.189
distrib/sets/lists/comp/shl.mi: revision 1.236
usr.sbin/npf/npfctl/npf_build.c: revision 1.14
distrib/sets/lists/base/mi: revision 1.1007
usr.sbin/npf/npfctl/npf_scan.l: revision 1.6
distrib/sets/lists/base/mi: revision 1.1009
sys/net/npf/npf.h: revision 1.21
lib/npf/ext_normalise/npfext_normalise.c: revision 1.1
etc/mtree/NetBSD.dist.base: revision 1.105
lib/libnpf/Makefile: revision 1.3
etc/mtree/NetBSD.dist.base: revision 1.106
usr.sbin/npf/npfctl/npf_extmod.c: revision 1.1
sys/net/npf/npf_ctl.c: revision 1.18
lib/npf/ext_log/Makefile: revision 1.1
distrib/sets/lists/comp/mi: revision 1.1781
usr.sbin/npf/npfctl/npf_var.h: revision 1.4
sys/net/npf/npf.c: revision 1.13
sys/modules/Makefile: revision 1.111
sys/net/npf/npf_ext_log.c: revision 1.1
lib/npf/Makefile.inc: revision 1.1
sys/net/npf/npf_ext_normalise.c: revision 1.1
sys/net/npf/files.npf: revision 1.8
sys/rump/net/lib/libnpf/Makefile: revision 1.2
sys/modules/npf_ext_log/Makefile: revision 1.1
lib/npf/ext_normalise/Makefile: revision 1.1
usr.sbin/npf/npfctl/npfctl.c: revision 1.20
usr.sbin/npf/npfctl/npf_parse.y: revision 1.13
sys/modules/npf_ext_normalise/Makefile: revision 1.1
Implement dynamic NPF extensions interface. An extension consists of
dynamically loaded module (.so) supplementing npfctl(8) and a kernel
module. Move normalisation and logging functionality into their own
extensions. More improvements to come.
Add /usr/lib/npf.
Add ./usr/libdata/debug/usr/lib/npf for rmind
Fix MKDEBUG set lists
ext_ops does not change during the life cycle and can be fetched without
the mutex held. This avoids confusion in the compiler about an uninitialized
variable ext_ops.
ok rmind@
 1.1.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in a few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusions.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.2.4.5 03-Dec-2017  jdolecek update from HEAD
 1.2.4.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.4.3 23-Jun-2013  tls resync from head
 1.2.4.2 25-Feb-2013  tls resync with head
 1.2.4.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.2.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.2.4 23-Jan-2013  yamt sync with head
 1.2.2.3 30-Oct-2012  yamt sync with head
 1.2.2.2 17-Apr-2012  yamt sync with head
 1.2.2.1 20-Feb-2012  yamt file npf_rproc.c was added on branch yamt-pagecache on 2012-04-17 00:08:39 +0000
 1.9.10.1 10-Aug-2014  tls Rebase.
 1.12.8.2 20-Mar-2017  pgoyette Sync with HEAD
 1.12.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.12.4.1 05-Feb-2017  skrll Sync with HEAD
 1.15.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.14.1 10-Jun-2019  christos Sync with HEAD
 1.16.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.19.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.57 09-Oct-2025  joe PR kern/59615 introduce layer checks for 10 userland 11 kernel
 1.56 01-Jul-2025  joe branches: 1.56.2;
kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.55 01-Jun-2025  joe NPF copyright 2025
 1.54 01-Jun-2025  joe npfctl: show user/group in retrieved rule
 1.53 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.52 08-Aug-2023  kardel branches: 1.52.6;
The analysis documented in PR misc/56990 is correct.
Fix by not returning when encountering a ruleset rule.

The code up to now would stop at any group rule.

ruleset rules are marked as group rule and a dynamic rule.

processing is only finished when a result is present AND
we are looking at a plain group rule.
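
The termination condition described in this fix can be sketched as a small predicate. The flag names and values below (EX_RULE_GROUP, EX_RULE_DYNAMIC) and both helper functions are hypothetical stand-ins, since the log does not show the real NPF attribute bits or the inspection loop; this only illustrates the stated logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits; the real NPF values are not in this log. */
#define EX_RULE_GROUP	0x1u
#define EX_RULE_DYNAMIC	0x2u

/* A plain group rule has the group bit set and the dynamic bit clear. */
static bool
ex_is_plain_group(unsigned attr)
{
	return (attr & (EX_RULE_GROUP | EX_RULE_DYNAMIC)) == EX_RULE_GROUP;
}

/*
 * Inspection is finished only when a result is present AND the rule
 * being looked at is a plain group rule.  A ruleset rule carries both
 * the group and the dynamic bit, so it no longer ends processing.
 */
static bool
ex_inspect_done(bool have_result, unsigned attr)
{
	return have_result && ex_is_plain_group(attr);
}
```

Under these assumptions, the pre-fix behaviour corresponds to testing only the group bit, which also matched ruleset rules and stopped too early.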
 1.51 30-May-2020  rmind branches: 1.51.20;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.50 12-Feb-2020  christos PR/54950: Lloyd Parkes: Avoid NULL deref.
 1.49 29-Sep-2019  rmind branches: 1.49.2;
NPF ifmap: rework and fix a few small bugs.
 1.48 23-Jul-2019  rmind branches: 1.48.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.47 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.46 10-Dec-2017  rmind branches: 1.46.2; 1.46.4;
- npf_mk_rules: enforce unique names for the dynamic rulesets.
- npf_worker_unregister: merge fix for the standalone NPF.
 1.45 29-Jan-2017  christos branches: 1.45.6;
- Increase copyin buffer size to 4M
- Change log output format to be like the OpenBSD's pf including in
the header the matching rule etc, and fill in the matching info.
 1.44 28-Dec-2016  christos branches: 1.44.2;
export rprocs too so we don't lose them.
 1.43 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.42 20-Mar-2015  rmind branches: 1.42.2;
NPF: replace the TAILQ of the dynamic rules with a linked list and fix the
inheriting of the active dynamic rules during the reload; also, fix a bug
in the insert path by putting a memory barrier in the right place.
 1.41 02-Feb-2015  rmind npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
 1.40 30-Nov-2014  rmind - npf_config_load: if loading the connections, do not perform any active
NAT policy takeover or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.39 30-Nov-2014  rmind NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
 1.38 26-Nov-2014  rmind branches: 1.38.2;
NPF: fix the reference counting and share the active NAT portmap correctly
when performing the reload. Should fix PR/49412, reported by kardel@.
 1.37 11-Aug-2014  rmind branches: 1.37.2;
NPF: finish up the rework of npfctl_save() mechanism.
 1.36 10-Aug-2014  rmind - Add npf_ruleset_export(), npf_rule_export() and npf_nat_policyexport().
- Split off npf_conn_export(). Add npf_ifmap_getname() and use it to save
the interface name; pick it up on npf_conn_import().
- Misc fixes. Bump NPF_VERSION.
 1.35 23-Jul-2014  rmind NPF: rework of the connection saving and restoring:
- Add support for saving a snapshot of the current connections together
with a full configuration. Support a reverse load operation. Eliminate
the old 'sess-save' and 'sess-load' in favour of the new mechanism.
- Share code between load and reload operations: the latter performs
load from npf.conf without affecting the connections.
- Simplify and fix races with connection loading.
- Bump NPF_VERSION.
 1.34 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.33 25-Jun-2014  rmind Adjust NPF to the recent BPF / BPF JIT changes and make it work again.
All regression tests are happy now (hi alnsn!).
 1.32 24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.31 30-May-2014  rmind - npf_nat_freepolicy: handle a race condition when a new connection might
be associated with a NAT policy which is going away and npfctl reload
would wait for its natural expiration (potentially long time).
- Remove npf_ruleset_natreload() by merging into npf_ruleset_reload().
- npf_ruleset_reload: eliminate a small time period when a valid NAT
policy might be inactive during the reload operation.
 1.30 04-Dec-2013  rmind branches: 1.30.2;
- npf_do_nat: fix a race condition and simplify the logic.
- npf_session_setnat: clear the NAT association on failure.
 1.29 23-Nov-2013  rmind Move initialisation of bpf_args_t into the npf_ruleset_inspect().
This allows us to reuse the BPF memory store as a cache.
 1.28 16-Nov-2013  rmind NPF: convert to bpf_jit_generate()/bpf_jit_freecode().
 1.27 15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.26 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.25 19-Sep-2013  rmind NPF: G/C n-code in favour of BPF byte-code. Delete lots of code, mmm!
 1.24 19-Sep-2013  rmind - Convert NPF to use BPF byte-code by default. Compile BPF byte-code in
npfctl(8) and generate separate marks to describe the filter criteria.
- Rewrite 'npfctl show' functionality and fix some of the bugs.
- npftest: add a test for BPF COP.
- Bump NPF_VERSION.
 1.23 18-Sep-2013  rmind Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.22 30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.21 29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought of as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted at /dev/bpf. It is primarily
targeted at the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.20 18-Mar-2013  rmind branches: 1.20.6;
Always use BPF JIT for NPF rules (using BPF code) if it is available.
 1.19 16-Feb-2013  rmind - Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
 1.18 10-Feb-2013  rmind - Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
 1.17 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
the npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is the SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.16 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.15 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.14 12-Aug-2012  rmind branches: 1.14.2;
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.13 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.12 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
 1.11 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.10 06-Feb-2012  rmind branches: 1.10.2;
- Split NPF rule procedure code into a separate module (no functional changes).
- Simplify some code, add more comments, some asserts.
- G/C unused rule hook code.
 1.9 15-Jan-2012  rmind - Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.8 08-Dec-2011  rmind - Explain the magic in npf_tcpfl2case().
- Use __unused instead of (void)cast; fix comment.
 1.7 02-Feb-2011  rmind branches: 1.7.2; 1.7.6; 1.7.10;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ finishes it.
- Add more documentation.
- Various fixes.
 1.6 18-Jan-2011  rmind branches: 1.6.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.5 27-Dec-2010  uebayasi branches: 1.5.2;
Fix build.
 1.4 18-Dec-2010  rmind NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 16-Sep-2010  rmind branches: 1.2.2; 1.2.4;
NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 16-Sep-2010  uebayasi file npf_ruleset.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 16-Sep-2010  yamt file npf_ruleset.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.5.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.7.10.2 24-Feb-2012  mrg sync to -current.
 1.7.10.1 18-Feb-2012  mrg merge to -current.
 1.7.6.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.6.3 23-Jan-2013  yamt sync with head
 1.7.6.2 30-Oct-2012  yamt sync with head
 1.7.6.1 17-Apr-2012  yamt sync with head
 1.7.2.2 05-Mar-2011  rmind sync with head
 1.7.2.1 02-Feb-2011  rmind file npf_ruleset.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.10.2.7 18-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #829):
usr.sbin/npf/npfctl/npfctl.8: revision 1.13
usr.sbin/npf/npfctl/npf_build.c: revision 1.21
lib/libnpf/npf.c: revision 1.18
sys/net/npf/npf_ctl.c: revision 1.23
usr.sbin/npf/npfctl/npfctl.h: revision 1.27
lib/libnpf/npf.h: revision 1.15
sys/net/npf/npf_ruleset.c: revision 1.19
sys/net/npf/npf_impl.h: revision 1.28
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.31
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.6
- Convert NPF dynamic rule ID to just incremented 64-bit counter.
- Fix multiple bugs. Also, update the man page.
 1.10.2.6 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through the npf(3) library or via
the npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is the SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.10.2.5 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.10.2.4 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.10.2.3 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.10.2.2 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in the session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.10.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.14.2.4 03-Dec-2017  jdolecek update from HEAD
 1.14.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.14.2.2 23-Jun-2013  tls resync from head
 1.14.2.1 25-Feb-2013  tls resync with head
 1.20.6.1 18-May-2014  rmind sync with head
 1.30.2.1 10-Aug-2014  tls Rebase.
 1.37.2.4 21-Mar-2015  snj Pull up following revision(s) (requested by rmind in ticket #630):
sys/net/npf/npf_ctl.c: revision 1.41
sys/net/npf/npf_ruleset.c: revision 1.42
usr.sbin/npf/npfctl/npf_build.c: revision 1.39
usr.sbin/npf/npfctl/npf_show.c: revision 1.18
NPF: replace the TAILQ of the dynamic rules with a linked list and fix the
inheriting of the active dynamic rules during the reload; also, fix a bug
in the insert path by putting a memory barrier in the right place.
--
npfctl:
- Fix the filter criteria when to/from is omitted but port used.
- Print more user-friendly error if an NPF table has a duplicate entry.
 1.37.2.3 04-Feb-2015  snj Pull up following revision(s) (requested by rmind in ticket #479):
lib/libnpf/npf.c: revision 1.35
lib/libnpf/npf.h: revision 1.28
sys/net/npf/npf_conn.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.61
sys/net/npf/npf_ruleset.c: revision 1.41
usr.sbin/npf/npfctl/npf.conf.5: revision 1.44
usr.sbin/npf/npfctl/npf_parse.y: revision 1.37
usr.sbin/npf/npfctl/npf_show.c: revisions 1.16, 1.17
usr.sbin/npf/npfctl/npfctl.c: revision 1.46
load the config file before bpfjit so that we can disable the warning.
--
Don't depend on yacc to include stdlib.h or string.h.
--
- npf_conn_establish: remove a rare race condition when we might destroy a
connection when it is still referenced by another thread.
- npf_conn_destroy: remove the backwards entry using the saved key, PR/49488.
- Sprinkle some asserts.
--
npf.conf(5): mention alg, include in the example, minor fix.
--
npfctl(8): report dynamic rule ID in a comment, print the case when libpcap
is used correctly. Also, add npf_ruleset_dump() helper in the kernel.
--
libnpf: add npf_rule_getid() and npf_rule_getcode().
Missed in the previous commit.
--
npfctl_print_rule: print the ID in hex, not decimal.
 1.37.2.2 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #280):
sys/net/npf/npf_ruleset.c: revision 1.40
sys/net/npf/npf_nat.c: revision 1.36
sys/net/npf/npf_nat.c: revision 1.37
sys/net/npf/npf_conn.h: revision 1.7
sys/net/npf/npf_conf.c: revision 1.9
sys/net/npf/npf_ruleset.c: revision 1.39
sys/net/npf/npf_conn.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.60
NPF:
- npf_nat_import: take the port only if using the portmap.
- Sprinkle some comments and asserts.
- npf_config_load: if loading the connections, do not perform any active
NAT policy take over or portmap sharing - just replace them all.
- npf_config_fini: flush with the empty connection database.
- npf_nat_import: fix the stat counter.
 1.37.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by rmind in ticket #274):
sys/net/npf/npf_nat.c: revision 1.35
sys/net/npf/npf_ruleset.c: revision 1.38
NPF: fix the reference counting and share the active NAT portmap correctly
when performing the reload. Should fix PR/49412, reported by kardel@.
 1.38.2.2 05-Feb-2017  skrll Sync with HEAD
 1.38.2.1 06-Apr-2015  skrll Sync with HEAD
 1.42.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.42.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.44.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.45.6.1 23-Aug-2023  martin Pull up following revision(s) (requested by kardel in ticket #1893):

sys/net/npf/npf_ruleset.c: revision 1.52

The analysis documented in PR misc/56990 is correct.

Fix by not returning when encountering a ruleset rule.

The code up to now would stop at any group rule.
Ruleset rules are marked as both a group rule and a dynamic rule.
Processing is only finished when a result is present AND
we are looking at a plain group rule.
 1.46.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.46.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.46.4.1 10-Jun-2019  christos Sync with HEAD
 1.46.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.48.2.4 23-Aug-2023  martin Pull up following revision(s) (requested by kardel in ticket #1725):

sys/net/npf/npf_ruleset.c: revision 1.52

The analysis documented in PR misc/56990 is correct.

Fix by not returning when encountering a ruleset rule.

The code up to now would stop at any group rule.
Ruleset rules are marked as both a group rule and a dynamic rule.
Processing is only finished when a result is present AND
we are looking at a plain group rule.
 1.48.2.3 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.48.2.2 12-Feb-2020  martin Pull up following revision(s) (requested by christos in ticket #699):

sys/net/npf/npf_ruleset.c: revision 1.50

PR/54950: Lloyd Parkes: Avoid NULL deref.
 1.48.2.1 04-Oct-2019  martin Pull up following revision(s) (requested by rmind in ticket #282):

usr.sbin/npf/npfctl/npf_build.c: revision 1.53
lib/libnpf/npf.c: revision 1.48
usr.sbin/npf/npfctl/npfctl.h: revision 1.50
sys/net/npf/npf_impl.h: revision 1.80
usr.sbin/npf/npfctl/npfctl.h: revision 1.51
sys/net/npf/npf_ruleset.c: revision 1.49
usr.sbin/npf/npfctl/npf.conf.5: revision 1.90
sys/net/npf/npf_ctl.c: revision 1.59
lib/libnpf/libnpf.3: revision 1.11
usr.sbin/npf/npfctl/npf_parse.y: revision 1.50
usr.sbin/npf/npftest/npftest.conf: revision 1.8
usr.sbin/npf/npfctl/npfctl.c: revision 1.62
usr.sbin/npf/npfctl/npfctl.c: revision 1.63
usr.sbin/npf/npfctl/npf_scan.l: revision 1.30
usr.sbin/npf/npfctl/npfctl.8: revision 1.22
lib/libnpf/npf.h: revision 1.38
usr.sbin/npf/npfctl/npfctl.8: revision 1.23
usr.sbin/npf/npfctl/npfctl.8: revision 1.24
sys/net/npf/npf_if.c: revision 1.11
sys/net/npf/npf_if.c: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.89
sys/net/npf/npf_conn.c: revision 1.30
usr.sbin/npf/npfctl/npf_build.c: revision 1.52

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.

NPF ifmap: rework and fix a few small bugs.

npfctl: implement table replace subcommand.
Contributed by Timshel Knoll-Miller.
(missed a file in previous commit; cvs is so helpful..)

libnpf/npfctl: support dynamic NAT rulesets using a name prefix.

Use -width Pa for FILES.

Fix pasto in table replace -t type

Use -width Pa for FILES.

npf_ifmap_copylogname: be more defensive.
 1.49.2.1 29-Feb-2020  ad Sync with head.
 1.51.20.1 23-Aug-2023  martin Pull up following revision(s) (requested by kardel in ticket #340):

sys/net/npf/npf_ruleset.c: revision 1.52

The analysis documented in PR misc/56990 is correct.

Fix by not returning when encountering a ruleset rule.

The code up to now would stop at any group rule.
Ruleset rules are marked as both a group rule and a dynamic rule.
Processing is only finished when a result is present AND
we are looking at a plain group rule.
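
The termination condition described above can be sketched as a small predicate. This is a minimal illustration only: the EX_* attribute bits and the ex_* function names are hypothetical and are not taken from the NPF sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits for illustration; not NPF's real constants. */
#define EX_RULE_GROUP	0x1u
#define EX_RULE_DYNAMIC	0x2u

/* A plain group has the group bit set and the dynamic bit clear. */
static bool
ex_is_plain_group(unsigned attr)
{
	return (attr & EX_RULE_GROUP) != 0 && (attr & EX_RULE_DYNAMIC) == 0;
}

/*
 * Decide whether inspection may stop at this rule: only when a result
 * is already present AND the rule is a plain group rule.
 */
static bool
ex_inspection_done(bool have_result, unsigned attr)
{
	return have_result && ex_is_plain_group(attr);
}

/* Small self-check covering the cases named in the commit message. */
static void
ex_demo(void)
{
	/* Plain group with a result: inspection is finished. */
	assert(ex_inspection_done(true, EX_RULE_GROUP));
	/* Ruleset rule (group + dynamic): keep going even with a result. */
	assert(!ex_inspection_done(true, EX_RULE_GROUP | EX_RULE_DYNAMIC));
	/* Plain group but no result yet: keep going. */
	assert(!ex_inspection_done(false, EX_RULE_GROUP));
}
```

Under these assumptions, a rule carrying both bits never ends the walk by itself, which matches the "not returning when encountering a ruleset rule" wording of the fix.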
 1.52.6.1 02-Aug-2025  perseant Sync with HEAD
 1.56.2.1 13-Oct-2025  martin Pull up following revision(s) (requested by joe in ticket #53):

sys/net/npf/npf.h: revision 1.68
sys/net/npf/npf_ruleset.c: revision 1.57

PR kern/59615 introduce layer checks for 10 userland 11 kernel
 1.23 12-Feb-2023  kardel PR kern/56052:
allow block-return packets passed through without rule matching.
Included up-stream as https://github.com/rmind/npf/pull/115
 1.22 30-May-2020  rmind branches: 1.22.20;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.21 29-Sep-2018  rmind branches: 1.21.4;
npf_return_tcp: fix no-INET6 case.
 1.20 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.19 10-Apr-2018  mrg branches: 1.19.2;
apply some INET6 so this compiles in INET6-less kernels again.
 1.18 17-Mar-2018  maxv Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.
 1.17 14-Mar-2018  maxv Fix the "return-rst" rule on IPv6 packets.

The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.

I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).
 1.16 26-Dec-2016  christos branches: 1.16.8; 1.16.14;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.15 20-Jul-2014  rmind branches: 1.15.4; 1.15.8;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.14 09-Feb-2013  rmind branches: 1.14.10;
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.13 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.12 15-Jul-2012  rmind branches: 1.12.2;
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.11 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10 06-May-2012  rmind - Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
 1.9 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.8 29-Nov-2011  rmind branches: 1.8.2; 1.8.4;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.7 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.6 05-Nov-2011  zoltan When building the kernel without IPv6 support, compilation failed.
Fix that.
 1.5 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.4 18-Jan-2011  rmind branches: 1.4.4; 1.4.8;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.3 11-Nov-2010  rmind branches: 1.3.2;
NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 25-Sep-2010  rmind branches: 1.2.2; 1.2.4;
Add nbuf_advfetch() and simplify some code slightly.
 1.1 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 25-Sep-2010  uebayasi file npf_sendpkt.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 25-Sep-2010  yamt file npf_sendpkt.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.3.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.4.8.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.8.5 23-Jan-2013  yamt sync with head
 1.4.8.4 30-Oct-2012  yamt sync with head
 1.4.8.3 23-May-2012  yamt sync with head.
 1.4.8.2 17-Apr-2012  yamt sync with head
 1.4.8.1 10-Nov-2011  yamt sync with head
 1.4.4.2 05-Mar-2011  rmind sync with head
 1.4.4.1 18-Jan-2011  rmind file npf_sendpkt.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.8.4.6 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.8.4.5 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
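
The difference between the two failure behaviours mentioned above (m_pullup destroying the chain on failure, m_ensure_contig keeping it valid) can be sketched in userland with a toy buffer chain. Everything here is an assumption made for illustration: struct toy_buf and the toy_* functions are hypothetical stand-ins, not the kernel's struct mbuf or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an mbuf; hypothetical, not the kernel structure. */
struct toy_buf {
	struct toy_buf *next;
	size_t len;
	unsigned char data[64];
};

static struct toy_buf *
toy_new(const char *bytes, size_t len, struct toy_buf *next)
{
	struct toy_buf *b = calloc(1, sizeof(*b));

	assert(b != NULL && len <= sizeof(b->data));
	memcpy(b->data, bytes, len);
	b->len = len;
	b->next = next;
	return b;
}

static size_t
toy_chain_len(const struct toy_buf *b)
{
	size_t total = 0;

	for (; b != NULL; b = b->next)
		total += b->len;
	return total;
}

static void
toy_chain_free(struct toy_buf *b)
{
	while (b != NULL) {
		struct toy_buf *next = b->next;
		free(b);
		b = next;
	}
}

/*
 * Make the first 'want' bytes contiguous in the head buffer.
 * On failure, return false and leave the chain exactly as it was.
 */
static bool
toy_ensure_contig(struct toy_buf *head, size_t want)
{
	if (head == NULL || want > sizeof(head->data) ||
	    want > toy_chain_len(head))
		return false;	/* nothing modified, chain still valid */

	while (head->len < want) {
		struct toy_buf *n = head->next;
		size_t take = want - head->len;

		if (take > n->len)
			take = n->len;
		memcpy(head->data + head->len, n->data, take);
		head->len += take;
		memmove(n->data, n->data + take, n->len - take);
		n->len -= take;
		if (n->len == 0) {	/* drained: unlink and release */
			head->next = n->next;
			free(n);
		}
	}
	return true;
}

/* Pullup-style variant: on failure the whole chain is freed. */
static struct toy_buf *
toy_pullup(struct toy_buf *head, size_t want)
{
	if (!toy_ensure_contig(head, want)) {
		toy_chain_free(head);
		return NULL;
	}
	return head;
}
```

With the ensure-style call, a caller that fails to get contiguous data can still inspect or pass on the original chain; with the pullup-style call, the caller must treat the chain as gone once NULL is returned.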
 1.8.4.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.8.4.3 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.8.4.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.8.4.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.8.2.2 02-Jun-2012  mrg sync to latest -current.
 1.8.2.1 24-Feb-2012  mrg sync to -current.
 1.12.2.3 03-Dec-2017  jdolecek update from HEAD
 1.12.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.1 25-Feb-2013  tls resync with head
 1.14.10.1 10-Aug-2014  tls Rebase.
 1.15.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.4.1 05-Feb-2017  skrll Sync with HEAD
 1.16.14.4 30-Sep-2018  pgoyette Sync with HEAD
 1.16.14.3 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.16.14.2 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.16.14.1 15-Mar-2018  pgoyette Synch with HEAD
 1.16.8.2 14-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #823):

sys/net/npf/npf_inet.c: revision 1.45-1.47
sys/net/npf/npf_alg_icmp.c: revision 1.27-1.30
sys/net/npf/npf_sendpkt.c: revision 1.19

Fix use-after-free.

The nbuf can be reallocated as a result of caching 'enpc', so it is
necessary to recache 'npc', otherwise it contains pointers to the freed
mbuf - pointers which are then used in the ruleset machinery.

We recache 'npc' when we are sure we won't use 'enpc' anymore, because
'enpc' can be clobbered as a result of caching 'npc' (in other words,
only one of the two can be cached at the same time).
Also, we recache 'npc' unconditionally, because there is no way to know
whether the nbuf got clobbered relatively to it. We can't use the
NBUF_DATAREF_RESET flag, because it is stored in the nbuf and not in the
cache.

Discussed with rmind@.

Change npf_cache_all so that it ensures the potential ICMP Query Id is in
the nbuf, so that we don't need to ensure that later.
Change npfa_icmp4_inspect and npfa_icmp6_inspect so that they touch neither
the nbuf nor npc. Adapt their callers accordingly.

In the end, if a packet has a Query Id, we set NPC_ICMP_ID in npc and leave
right away, without recaching npc (not needed since we didn't touch the
nbuf).

This fixes the handling of Query Id packets (that I broke in my previous
commit), and also fixes another possible use-after-free.

Retrieve the complete IPv4 header right away, and make sure we did retrieve
the IPv6 option header we were iterating on.

Ah, fix compilation. I tested my previous change by loading the kernel
module from the filesystem, but the Makefile didn't have DIAGNOSTIC
enabled, and the two KASSERTs I added did not compile properly.

If we fail to advance inside TCP/UDP/ICMPv4/ICMPv6, stop pretending L4
is unknown, and error out right away.

This prevents bugs in machinery, if a place looks for L4 in 'npc_proto'
without checking the cache too. I've seen a ~similar problem already.

In addition to checking L4 in the cache, here we also need to check the
protocol. The NPF entry point does not ensure that
ICMPv6 can be set only in IPv6
ICMPv4 can be set only in IPv4
So we could have ICMPv6 in IPv4.

apply some INET6 so this compiles in INET6-less kernels again.
 1.16.8.1 09-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #817):

sys/net/npf/npf_inet.c: revision 1.38-1.44
sys/net/npf/npf_handler.c: revision 1.38-1.39
sys/net/npf/npf_alg_icmp.c: revision 1.26
sys/net/npf/npf.h: revision 1.56
sys/net/npf/npf_sendpkt.c: revision 1.17-1.18

Declare NPC_FMTERR, and use it to kick malformed packets. Several sanity
checks are added in IPv6; after we see the first IPPROTO_FRAGMENT header,
we are allowed to fail to advance, otherwise we kick the packet.
Sent on tech-net@ a few days ago, no response, but I'm committing it now
anyway.

Switch nptr to uint8_t, and use nbuf_ensure_contig. Makes us use fewer
magic values.

Remove dead branches, 'npc' can't be NULL (and it is dereferenced
earlier).

Fix two consecutive mistakes.

The first mistake was npf_inet.c rev1.37:
"Don't reassemble ipv6 fragments, instead treat the first fragment
as a regular packet (subject to filtering rules), and pass
subsequent fragments in the same group unconditionally."

Doing this was entirely wrong, because then a packet just had to push
the L4 payload in a secondary fragment, and NPF wouldn't apply rules on
it - meaning any IPv6 packet could bypass >=L4 filtering. This mistake
was supposed to be a fix for the second mistake.

The second mistake was that ip6_reass_packet (in npf_reassembly) was
getting called with npc->npc_hlen. But npc_hlen pointed to the last
encountered header in the IPv6 chain, which was not necessarily the
fragment header. So ip6_reass_packet was given garbage, and would fail,
resulting in the packet getting kicked. So basically IPv6 was broken by
NPF.

The first mistake is reverted, and the second one is fixed by doing:
- hlen = sizeof(struct ip6_frag);
+ hlen = 0;

Now the iteration stops on the fragment header, and the call to
ip6_reass_packet is valid.
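
As an illustration of "the iteration stops on the fragment header", here is a minimal, self-contained sketch of walking an IPv6 extension header chain and reporting where the fragment header starts. It is not the NPF code: the function name, the constants and the flat-buffer interface are assumptions made for the example, and only the hop-by-hop, routing and destination options headers are skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values used by this sketch. */
#define PROTO_HOPOPTS	0
#define PROTO_ROUTING	43
#define PROTO_FRAGMENT	44
#define PROTO_DSTOPTS	60

/*
 * Hypothetical helper: 'buf' points just past the 40-byte fixed IPv6
 * header and 'first_proto' is that header's next-header field.
 * Returns the offset of the fragment header within 'buf', or -1 if
 * there is none or the chain is truncated.
 */
static long
find_frag_header(const uint8_t *buf, size_t len, uint8_t first_proto)
{
	uint8_t proto = first_proto;
	size_t off = 0;

	for (;;) {
		/* Stop ON the fragment header; it is always 8 bytes long. */
		if (proto == PROTO_FRAGMENT)
			return (len - off >= 8) ? (long)off : -1;

		/* Anything else we do not skip ends the search. */
		if (proto != PROTO_HOPOPTS && proto != PROTO_ROUTING &&
		    proto != PROTO_DSTOPTS)
			return -1;

		/* Need the next-header and length bytes. */
		if (len - off < 2)
			return -1;

		/* Length is in 8-byte units, not counting the first 8. */
		size_t hlen = ((size_t)buf[off + 1] + 1) * 8;
		if (len - off < hlen)
			return -1;

		proto = buf[off];
		off += hlen;
	}
}
```

A caller would hand the returned offset (plus the fixed header length) to the reassembly routine, rather than the offset of the last header seen.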

My npf_inet.c rev1.38 is partially reverted: we don't need to worry
about failing properly to advance; once the packet is reassembled
npf_cache_ip gets called again, and this time the whole chain should be
there.

Tested with a simple UDPv6 server - send a 3000-byte-sized buffer, the
packet gets correctly reassembled by NPF now.

Mmh, put back the RFC6946 check (about dummy fragments), otherwise NPF
is not happy in npf_reassembly, because NPC_IPFRAG is again returned after
the packet was reassembled.

I'm wondering whether it would not be better to just remove the fragment
header in frag6_input directly.

Fix the "return-rst" rule on IPv6 packets.
The scopes needed to be set on the addresses before invoking ip6_output,
because ip6_output needs them. The reason they are not here already is
because pfil_run_hooks (in ip6_input) is called _before_ the kernel
initializes the scopes.

Until now ip6_output was always failing, and the IPv6-TCP-RST packet was
never actually sent.

Perhaps it would be better to have the kernel initialize the scopes
before invoking pfil_run_hooks, but several things will need to be fixed
in several places.

Tested with a simple TCPv6 server. Until now the client would block
waiting for an answer that never came; now it receives an RST right away
and closes the connection, as expected.
I believe that the same problem exists in the "return-icmp" rules, but I
can't investigate this right now (some problems with wireshark).

Fix the IPv6 payload computation in npf_tcpsaw. It was incorrect, and this
caused the "return-rst" rules to send back an RST with the wrong ACK when
the received SYN had an IPv6 option.

Set the scopes before calling icmp6_error(). This fixes a bug similar to
the one I fixed in rev1.17: since the scopes were not set the packet was
never actually sent.

Tested with wireshark, now the ICMPv6 reply is correctly sent, as
expected.

Don't read the L4 payload after IPPROTO_AH when handling IPv6 packets.
AH must be considered as the payload, otherwise a

block all
pass in proto ah from any
pass out proto ah from any

configuration will actually block everything, because NPF checks the
protocol against the one found after AH, and not AH itself.

In addition it may have been a problem for stateful connections; an AH
packet sent by an attacker with an incorrect authentication and a correct
TCP/UDP/whatever payload from an active connection could manage to change
NPF's FSM state, which would perhaps have altered the legitimate
connection with the authenticated remote IPsec host.

Note that IPv4 already doesn't go beyond AH, which is the correct
behavior.

Add XXX (we don't handle IPv6 Jumbograms), and whitespace.
 1.19.2.1 10-Jun-2019  christos Sync with HEAD
 1.21.4.2 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.21.4.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.22.20.1 14-Mar-2023  martin Pull up following revision(s) (requested by kardel in ticket #119):

sys/net/npf/npf_mbuf.c: revision 1.25
sys/net/npf/npf.h: revision 1.64
sys/net/npf/npf_sendpkt.c: revision 1.23

PR kern/56052:
allow block-return packets passed through without rule matching.
Included upstream as https://github.com/rmind/npf/pull/115
 1.34 20-Jul-2014  rmind Bye bye npf_session.c
 1.33 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.32 14-May-2014  rmind npf_session_inspect: do not silently drop the packet on state check failure.
Let the rules deal with it (e.g. we may want to log it).
 1.31 14-Mar-2014  rmind branches: 1.31.2;
NPF: add support for "stateful-ends".
 1.30 06-Dec-2013  rmind NPF:
- Adjust NAT to not assume flow direction in some cases and thus support
less usual setups which are possible when using 'map' with a custom
filter criteria.
- Introduce NPF_SRC/NPF_DST and replace npc_src/npc_dst with npc_ips[2]
for more convenient handling.
- ICMP ALG: restrict matching only to the outgoing traffic, but be more
direction-agnostic elsewhere.
 1.29 04-Dec-2013  rmind - npf_do_nat: fix a race condition and simplify the logic.
- npf_session_setnat: clear the NAT association on failure.
 1.28 22-Nov-2013  rmind npf_addr_mix: use xor rather than sum.
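
For readers unfamiliar with the idea, the following is a minimal sketch of mixing two addresses into one 32-bit value with XOR. The type `addr_t` and the function `addr_mix_xor` are hypothetical stand-ins, not the NPF definitions; the commit only states that XOR replaced a sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address container: 16 bytes covers IPv4 and IPv6. */
typedef struct {
	uint8_t bytes[16];
} addr_t;

/*
 * Fold two addresses of 'len' bytes (4 or 16) into one 32-bit value
 * by XOR-ing their 32-bit words together.
 */
static uint32_t
addr_mix_xor(size_t len, const addr_t *a1, const addr_t *a2)
{
	uint32_t mix = 0;

	assert(len == 4 || len == 16);
	for (size_t i = 0; i < len; i += 4) {
		uint32_t w1, w2;

		/* memcpy avoids unaligned or aliasing-unsafe loads. */
		memcpy(&w1, &a1->bytes[i], sizeof(w1));
		memcpy(&w2, &a2->bytes[i], sizeof(w2));
		mix ^= w1 ^ w2;
	}
	return mix;
}
```

The result does not depend on the order of the two arguments, and mixing an address with itself yields zero.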
 1.27 08-Nov-2013  rmind NPF: add support for specifying the interfaces before they are attached.
If an interface is or gets detached, all associated rules and connections
will be deactivated (it might be useful to have an option to invalidate
the associated connections). Once the interface is reattached they will
become active.

Bump NPF_VERSION.
 1.26 29-Oct-2013  rmind npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.25 26-Sep-2013  rmind sess_hash_bucket: convert to murmurhash2, include ports, use random seed.
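
A rough sketch of what a seeded MurmurHash2 bucket selection over addresses and ports can look like is given below. The hash body follows the well-known public-domain 32-bit MurmurHash2; `flow_bucket`, its IPv4-only key layout and its parameters are assumptions for illustration and do not reproduce sess_hash_bucket.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit MurmurHash2 over 'len' bytes of 'key', seeded with 'seed'. */
static uint32_t
murmurhash2(const void *key, size_t len, uint32_t seed)
{
	const uint32_t m = 0x5bd1e995;
	const int r = 24;
	const uint8_t *data = key;
	uint32_t h = seed ^ (uint32_t)len;

	/* Mix four bytes at a time. */
	while (len >= 4) {
		uint32_t k;

		memcpy(&k, data, sizeof(k));
		k *= m;
		k ^= k >> r;
		k *= m;
		h *= m;
		h ^= k;
		data += 4;
		len -= 4;
	}

	/* Fold in the trailing 1..3 bytes, if any. */
	if (len == 3)
		h ^= (uint32_t)data[2] << 16;
	if (len >= 2)
		h ^= (uint32_t)data[1] << 8;
	if (len >= 1) {
		h ^= data[0];
		h *= m;
	}

	/* Final avalanche. */
	h ^= h >> 13;
	h *= m;
	h ^= h >> 15;
	return h;
}

/*
 * Hypothetical bucket selector: the key is built as a packed byte
 * array so that struct padding never enters the hash.
 */
static unsigned
flow_bucket(uint32_t src, uint32_t dst, uint16_t sport, uint16_t dport,
    uint32_t seed, unsigned nbuckets)
{
	uint8_t key[12];

	memcpy(&key[0], &src, 4);
	memcpy(&key[4], &dst, 4);
	memcpy(&key[8], &sport, 2);
	memcpy(&key[10], &dport, 2);
	return murmurhash2(key, sizeof(key), seed) % nbuckets;
}
```

In such a scheme the seed would be chosen randomly once at start-up, so that bucket placement cannot be predicted from the packet fields alone.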
 1.24 02-Jun-2013  rmind branches: 1.24.2;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.23 18-Mar-2013  rmind npf_session_establish: fix previous.
 1.22 18-Mar-2013  rmind Add npf_session_trackable_p() and npf_session_fillent() for the common code.
Simplify. No functional change.
 1.21 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.20 20-Jan-2013  rmind - nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.19 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.18 13-Sep-2012  joerg Mark npf_session_worker as __dead.
 1.17 12-Aug-2012  rmind branches: 1.17.2;
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.16 19-Jul-2012  spz teach npf ipv6-icmp
reviewed by rmind@
 1.15 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.14 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix a few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.13 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix a few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.12 11-Mar-2012  rmind - Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
 1.11 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.10 29-Nov-2011  rmind branches: 1.10.2; 1.10.4;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.9 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.8 02-Feb-2011  rmind branches: 1.8.2; 1.8.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ finishes it.
- Add more documentation.
- Various fixes.
 1.7 18-Jan-2011  rmind branches: 1.7.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.6 18-Dec-2010  rmind branches: 1.6.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.5 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.4 03-Oct-2010  rmind branches: 1.4.2; 1.4.4;
- npf_session_gc: fix for previous RB-tree conversion.
- npf_session_free: rename (to singular).
 1.3 24-Sep-2010  rmind Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
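
Points 1 to 4 can be pictured with a toy example. The sketch below is an unbalanced binary search tree, not the kernel's red-black tree, and `struct item`, `item_compare` and `tree_insert` are hypothetical names. It only shows the conventions described above: a comparator that receives a context and returns a negative value for "sorts to the left", and an insert that works on the object itself and returns either the new object or the one already present.

```c
#include <stddef.h>

struct item {
	int key;
	struct item *left, *right;	/* toy links, not a real rb_node */
};

/*
 * Comparator with a context argument. Here the context is assumed to
 * be a counter of comparisons. A negative result means 'a' sorts
 * lower (left) of 'b', a positive result means higher (right).
 */
static int
item_compare(void *ctx, const void *a, const void *b)
{
	const struct item *ia = a, *ib = b;
	unsigned long *ncmp = ctx;

	if (ncmp != NULL)
		(*ncmp)++;
	return (ia->key > ib->key) - (ia->key < ib->key);
}

/*
 * Insert 'obj' and return it, or return the already existing object
 * with an equal key and leave the tree unchanged.
 */
static struct item *
tree_insert(struct item **rootp, struct item *obj, void *ctx)
{
	struct item **pp = rootp;

	while (*pp != NULL) {
		const int diff = item_compare(ctx, obj, *pp);

		if (diff == 0)
			return *pp;
		pp = (diff < 0) ? &(*pp)->left : &(*pp)->right;
	}
	obj->left = obj->right = NULL;
	*pp = obj;
	return obj;
}
```

A caller detects a duplicate by comparing the returned pointer with the object it passed in.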
 1.2 16-Sep-2010  rmind NPF checkpoint:
- Add support for bi-directional NAT and redirection / port forwarding.
- Finish filtering on ICMP type/code and add filtering on TCP flags.
- Add support for TCP reset (RST) or ICMP destination unreachable on block.
- Fix a bunch of bugs; misc cleanup.
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(5).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.4.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.4.4.1 03-Oct-2010  uebayasi file npf_session.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.4.2.2 09-Oct-2010  yamt sync with head
 1.4.2.1 03-Oct-2010  yamt file npf_session.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.6.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.7.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.8.6.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.6.4 23-Jan-2013  yamt sync with head
 1.8.6.3 30-Oct-2012  yamt sync with head
 1.8.6.2 17-Apr-2012  yamt sync with head
 1.8.6.1 10-Nov-2011  yamt sync with head
 1.8.2.2 05-Mar-2011  rmind sync with head
 1.8.2.1 02-Feb-2011  rmind file npf_session.c was added on branch rmind-uvmplock on 2011-03-05 20:55:55 +0000
 1.10.4.10 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.10.4.9 11-Feb-2013  riz branches: 1.10.4.9.2;
Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.10.4.8 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.10.4.7 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #679):
sys/net/npf/npf_session.c: revision 1.18
usr.sbin/npf/npftest/npftest.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.7
usr.sbin/npf/npftest/npftest.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.5
sys/net/npf/npf_alg_icmp.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.3
npftest:
- Do not stop running other tests, if some tests fail.
- Fix some endianness bugs in the test cases.
Tested on sparc64 by martin@, all tests pass.
Add two new command line options to help integration into ATF:
-L lists the available test cases, -T executes a single named test.
Fix printf format
Mark npf_session_worker as __dead.
More __dead
npf_icmp_uniqid: split into npf_icmp_uniqid4() and npf_icmp_uniqid6() parts.
 1.10.4.6 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.10.4.5 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.10.4.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.10.4.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.10.4.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10.4.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.10.4.9.2.1 17-Nov-2013  bouyer Pull up following revision(s) (requested by rmind in ticket #985):
sys/net/npf/npf_impl.h: revision 1.35
sys/net/npf/npf_nat.c: revision 1.21
sys/net/npf/npf_session.c: revision 1.26
npf_session_setnat: fix the race condition when the old connection is still
being expired while a new/duplicate is being created.
 1.10.2.2 05-Apr-2012  mrg sync to latest -current.
 1.10.2.1 24-Feb-2012  mrg sync to -current.
 1.17.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.2.3 23-Jun-2013  tls resync from head
 1.17.2.2 25-Feb-2013  tls resync with head
 1.17.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.24.2.1 18-May-2014  rmind sync with head
 1.31.2.1 10-Aug-2014  tls Rebase.
 1.4 03-Oct-2025  joe hold locks in socket access in npf PR kern/59681
 1.3 02-Jun-2025  joe branches: 1.3.2; 1.3.4;
fix build for non-INET6 kernels : martin@
 1.2 02-Jun-2025  joe remove headers from INET6 options: martin@
 1.1 01-Jun-2025  joe kernel: extract rules, lookup socket, process filtering, reviews by christos@
 1.3.4.2 02-Aug-2025  perseant Sync with HEAD
 1.3.4.1 02-Jun-2025  perseant file npf_socket.c was added on branch perseant-exfatfs on 2025-08-02 05:57:48 +0000
 1.3.2.1 13-Oct-2025  martin Pull up following revision(s) (requested by joe in ticket #52):

sys/net/npf/npf_socket.c: revision 1.4

hold locks in socket access in npf PR kern/59681
 1.23 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.22 23-Jul-2019  rmind branches: 1.22.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.21 29-Oct-2018  christos We need to have rump tests work in two modes:

1. npf unit tests. In this case only the npf subsystem is created
and dictionaries are passed directly.
2. kernel system tests (like the ipsec natt test). In this case, npf is
instantiated regularly as part of the kernel and dictionaries are
passed via ioctl.

We differentiate between the two cases by checking the "mbufops" member,
which is NULL in regular use and non-NULL in the npf unit tests. Previously
this was done using an ifdef which obviously can't work for both cases.
 1.20 26-Oct-2018  christos enable the sampling function for _NPF_RUMP
 1.19 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.18 26-Dec-2016  christos branches: 1.18.14; 1.18.16;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.17 20-Jul-2014  rmind branches: 1.17.4; 1.17.8;
NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.16 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.15 04-Nov-2013  rmind branches: 1.15.2;
npf_generic_fsm and npf_tcp_fsm: use uint8_t and make the arrays more dense.
 1.14 09-Feb-2013  rmind branches: 1.14.2;
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.13 24-Dec-2012  rmind - Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.12 15-Aug-2012  rmind branches: 1.12.2;
Add npf_state_setsampler() for _NPF_TESTING case. This also fixes the build.
 1.11 12-Aug-2012  rmind - Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.10 21-Jul-2012  rmind - npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.9 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.8 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.7 30-May-2012  rmind npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
 1.6 29-Nov-2011  rmind branches: 1.6.2; 1.6.4;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.5 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.4 25-Apr-2011  yamt branches: 1.4.4;
fix module build
 1.3 18-Jan-2011  rmind branches: 1.3.4;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.2 18-Dec-2010  rmind branches: 1.2.2;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.1 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij's paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.3.4.3 31-May-2011  rmind sync with head
 1.3.4.2 05-Mar-2011  rmind sync with head
 1.3.4.1 18-Jan-2011  rmind file npf_state.c was added on branch rmind-uvmplock on 2011-03-05 20:55:56 +0000
 1.4.4.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.4.4 23-Jan-2013  yamt sync with head
 1.4.4.3 30-Oct-2012  yamt sync with head
 1.4.4.2 17-Apr-2012  yamt sync with head
 1.4.4.1 10-Nov-2011  yamt sync with head
 1.6.4.8 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.6.4.7 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.6.4.6 18-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #678):
sys/rump/librump/rumpkern/rump.c: revision 1.243
sys/rump/librump/rumpkern/rump.c: revision 1.244
sys/rump/librump/rumpkern/rump.c: revision 1.245
sys/rump/librump/rumpkern/rump.c: revision 1.246
usr.sbin/npf/npftest/npftest.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.2
usr.sbin/npf/npftest/npftest.h: revision 1.5
sys/rump/net/Makefile.rumpnetcomp: revision 1.5
sys/rump/net/lib/libnpf/shlib_version: revision 1.1
sys/net/npf/npf_impl.h: revision 1.22
sys/rump/dev/lib/libnpf/Makefile: file removal
usr.sbin/npf/npftest/Makefile: revision 1.3
sys/rump/dev/lib/libnpf/component.c: file removal
sys/rump/dev/lib/libnpf/shlib_version: file removal
sys/net/npf/npf_state.c: revision 1.12
sys/rump/net/lib/libnpf/component.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.6
sys/rump/net/lib/libnpf/Makefile: revision 1.1
Move and rename librumpdev_npf to librumpnet_npf.
Enable the build of librumpnet_npf.
Add npf_state_setsampler() for _NPF_TESTING case. This also fixes the build.
Call pserialize_init() during rump start-up, since librump/net/npf
uses it.
It helps to include the declaration of the routine being called.
We also need kcpuset_init() now.
Use correct routine name - kcpuset_sysinit() vs kcpuset_init()
 1.6.4.5 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.6.4.4 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.6.4.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.6.4.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.6.4.1 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.6.2.1 02-Jun-2012  mrg sync to latest -current.
 1.12.2.3 03-Dec-2017  jdolecek update from HEAD
 1.12.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.2.1 25-Feb-2013  tls resync with head
 1.14.2.1 18-May-2014  rmind sync with head
 1.15.2.1 10-Aug-2014  tls Rebase.
 1.17.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.17.4.1 05-Feb-2017  skrll Sync with HEAD
 1.18.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18.16.1 10-Jun-2019  christos Sync with HEAD
 1.18.14.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.18.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.22.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.21 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.20 23-Jul-2019  rmind branches: 1.20.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.19 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.18 26-Dec-2016  rmind branches: 1.18.14; 1.18.16;
npf_tcp_fsm: fix for the NPF_TCPS_SYN_RECEIVED state.

SYN re-transmission after SYN-ACK was seen by NPF should not terminate
the connection. Thanks to: Alexander Kiselev <kiselev99 at gmail com>
 1.17 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.16 25-Jul-2014  rmind branches: 1.16.4; 1.16.8;
npf_tcp_inwindow: enable strict RST check by default.
 1.15 20-Jul-2014  rmind NPF: add nbuf_t * into npf_cache_t and remove unnecessary carrying by argument.
 1.14 19-Jul-2014  rmind NPF: partially rewrite the connection tracking mechanism:
- Separate the tracking interface from the storage (state table)
and thus prepare to use a new data structure for the storage.
- Fix some race conditions in NAT association logic.
 1.13 04-Nov-2013  rmind branches: 1.13.2;
npf_generic_fsm and npf_tcp_fsm: use uint8_t and make the arrays more dense.
 1.12 24-Dec-2012  rmind branches: 1.12.2;
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
 1.11 06-Oct-2012  rmind npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
 1.10 21-Jul-2012  rmind branches: 1.10.2;
- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.9 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.8 01-Jul-2012  rmind npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
 1.7 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.6 05-Jun-2012  rmind npf_state_tcp: add an assert; fix some comments while here.
 1.5 30-May-2012  rmind npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
 1.4 03-Apr-2012  rmind branches: 1.4.2;
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).

PR/46265 from Changli Gao.
 1.3 08-Dec-2011  rmind branches: 1.3.2;
- Explain the magic in npf_tcpfl2case().
- Use __unused instead of (void)cast; fix comment.
 1.2 05-Dec-2011  rmind - Add npf_tcpfl2case() and make TCP state table more compact.
- Adjust the state for FIN case on sim-SYN and SYN-RECEIVED.
 1.1 29-Nov-2011  rmind branches: 1.1.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.1.2.3 02-Jun-2012  mrg sync to latest -current.
 1.1.2.2 05-Apr-2012  mrg sync to latest -current.
 1.1.2.1 18-Feb-2012  mrg merge to -current.
 1.3.2.7 08-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.3.2.6 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.3.2.5 25-Jul-2012  jdc Pull up revisions:
src/usr.sbin/npf/npfctl/npfctl.c revisions 1.16,1.17
src/sys/net/npf/npf.h revision 1.20
src/sys/net/npf/npf_alg_icmp.c revision 1.11
src/sys/net/npf/npf_impl.h revision 1.19
src/sys/net/npf/npf_inet.c revisions 1.15,1.16
src/sys/net/npf/npf_instr.c revision 1.14
src/sys/net/npf/npf_ncode.h revision 1.10
src/sys/net/npf/npf_processor.c revision 1.12
src/sys/net/npf/npf_session.c revision 1.16
src/usr.sbin/npf/npfctl/npf_build.c revision 1.12
src/usr.sbin/npf/npfctl/npf_data.c revisions 1.16,1.17
src/usr.sbin/npf/npfctl/npf_disassemble.c revision 1.8
src/usr.sbin/npf/npfctl/npf_ncgen.c revision 1.13
src/usr.sbin/npf/npfctl/npf_parse.y revision 1.11
src/usr.sbin/npf/npfctl/npf_scan.l revision 1.5
src/usr.sbin/npf/npfctl/npf_var.h revision 1.3
src/usr.sbin/npf/npfctl/npfctl.h revision 1.18
src/sys/net/npf/npf_state.c revision 1.10
src/sys/net/npf/npf_state_tcp.c revision 1.10
src/usr.sbin/npf/npftest/npfstream.c revision 1.2
src/usr.sbin/npf/npftest/libnpftest/npf_test_subr.c revision 1.2
(requested by rmind in ticket #435).

Add missing __dead.

teach npf ipv6-icmp
reviewed by rmind@

- npfctl_print_stats: beautification a la French style.
- npfctl_icmpcode: fix the build break.

- npf_fetch_tcpopts: fix off-by-one when validating TCP option length
against the maximum allowed.
- npf_tcp_inwindow: be more liberal with npf_fetch_tcpopts().
- Few minor improvements to npftest.
 1.3.2.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.3.2.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.3.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.3.2.1 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #354):
sys/net/npf/npf_state_tcp.c: revision 1.4
sys/net/npf/npf_state_tcp.c: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.6
usr.sbin/npf/npftest/npftest.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.1
usr.sbin/npf/npftest/npftest.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.2
usr.sbin/npf/npfctl/npf_data.c: revision 1.11
usr.sbin/npf/npftest/npftest.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.12
usr.sbin/npf/npftest/npftest.h: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.5
usr.sbin/npf/npfctl/npf_data.c: revision 1.13
sys/net/npf/npf.h: revision 1.16
usr.sbin/npf/npftest/npftest.h: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.6
usr.sbin/npf/npftest/npftest.h: revision 1.3
usr.sbin/npf/npfctl/npf_parse.y: revision 1.7
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.10
usr.sbin/npf/npfctl/npf_build.c: revision 1.6
usr.sbin/npf/npfctl/npf_parse.y: revision 1.8
usr.sbin/npf/npfctl/npf_build.c: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.9
usr.sbin/npf/npfctl/npf.conf.5: revision 1.10
usr.sbin/npf/npfctl/npf.conf.5: revision 1.11
usr.sbin/npf/npfctl/npf.conf.5: revision 1.12
sys/net/npf/npf_state.c: revision 1.7
usr.sbin/npf/npfctl/npfctl.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.12
usr.sbin/npf/npfctl/Makefile: revision 1.7
sys/rump/net/lib/libnet/Makefile: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.7
usr.sbin/npf/npftest/Makefile: revision 1.1
usr.sbin/npf/npftest/Makefile: revision 1.2
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.2
usr.sbin/npf/npftest/npfstream.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.2
usr.sbin/npf/npfctl/npf_scan.l: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.12
sys/rump/dev/lib/libnpf/Makefile: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.14
sys/rump/dev/lib/libnpf/Makefile: revision 1.3
usr.sbin/npf/npfctl/npfctl.h: revision 1.15
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.9
sys/net/npf/npf_ctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_var.c: revision 1.4
usr.sbin/npf/npfctl/npf_var.h: revision 1.2
usr.sbin/npf/npfctl/npf_var.c: revision 1.5
sys/net/npf/npf_impl.h: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.10
sys/net/npf/npf_impl.h: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.4
sys/net/npf/npf_impl.h: revision 1.15
sys/net/npf/npf_handler.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.5
sys/net/npf/npf_handler.c: revision 1.17
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.2
sys/net/npf/npf_ncode.h: revision 1.7
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.1
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.3
sys/net/npf/npf_ncode.h: revision 1.8
npf_tcp_inwindow: in a case of negative skew, bump the maximum seen value of
SEQ+LEN in the receiver's side correctly (using ACK from the sender's side).
PR/46265 from Changli Gao.
rumpnet_net: add pfil.c
Update rumpdev_npf; use WARNS=4.
Add initial NPF regression tests integrated with RUMP framework (running the
kernel part of NPF in userland). Other tests will be added once converted to
RUMP framework. All tests are in the public domain.
Some Makefile fixes from christos@.
- Fix double-free case on ICMP return case.
- npf_pfil_register: handle kernels without INET6 option correctly.
- Reduce some #ifdefs.
npfctl(8): add show-config command. Also, update syntax.
npftest: add a stream processor, which prints out the TCP state information.
A tool for debugging connection tracking from tcpdump -w captured data.
npftest: add a module for TCP state tracking and add few test cases.
npf_state_tcp: add an assert; fix some comments while here.
- Rework NPF NAT syntax to be more structured and support future additions
of different types and configurations of NAT.
- npfctl: improve disassemble and show-config command functionality.
- Fix custom ICMP code and type filtering.
make this compile again.
remove error(1) output
Remove superfluous Pp
- make each element of a variable hold a type
- change get_type to take an index, so we can get the individual types of
each element (since primitive elements can be in lists)
- make port_range primitive
- add a routine to convert a variable of primitives to a variable containing
only port ranges.
remove extra rule that got merged...
 1.4.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.4.2.4 23-Jan-2013  yamt sync with head
 1.4.2.3 30-Oct-2012  yamt sync with head
 1.4.2.2 17-Apr-2012  yamt sync with head
 1.4.2.1 03-Apr-2012  yamt file npf_state_tcp.c was added on branch yamt-pagecache on 2012-04-17 00:08:39 +0000
 1.10.2.4 03-Dec-2017  jdolecek update from HEAD
 1.10.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.2 25-Feb-2013  tls resync with head
 1.10.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.12.2.1 18-May-2014  rmind sync with head
 1.13.2.1 10-Aug-2014  tls Rebase.
 1.16.8.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.16.4.1 05-Feb-2017  skrll Sync with HEAD
 1.18.16.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.18.16.1 10-Jun-2019  christos Sync with HEAD
 1.18.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.20.2.1 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.43 07-Feb-2025  joe introduce a kernel boolean assertion to ensure the running thread holds the mutex
 1.42 24-Feb-2023  riastradh branches: 1.42.6;
npf: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html

Requested by rmind@:
https://github.com/rmind/npf/pull/127#issuecomment-1399573125
 1.41 23-Jan-2023  riastradh npf(9): Drop table lock around copyout.

It is forbidden to hold a spin lock around copyout, and t_lock is a
spin lock.

We need t_lock in order to iterate over the list of entries.
However, during copyout itself, we only need to ensure that the
object we're copying out isn't freed by npf_table_remove or
npf_table_gc.

Fortunately, the only caller of npf_table_list, npf_table_remove, and
npf_table_gc is npfctl_table, and it serializes all of them by the
npf config lock. So we can safely drop t_lock across copyout.

PR kern/57136
PR kern/57181
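
The locking pattern described in revision 1.41 can be sketched in portable C as follows. This is only an illustration under stated assumptions: toy_lock, toy_copyout, struct table, and table_list are hypothetical stand-ins, not the NPF code, and the toy lock merely records whether it is held so that the stand-in for copyout(9) can assert the rule. As in the commit message, dropping the lock is only safe because some outer serialization (there, the npf config lock) is assumed to keep entries from being freed in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a spin lock: it only tracks whether it is held. */
struct toy_lock { bool held; };

static void toy_enter(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_exit(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical table: a singly linked list protected by the toy lock. */
struct entry { int addr; struct entry *next; };
struct table { struct toy_lock lock; struct entry *head; };

/*
 * Stand-in for copyout(9).  The real call may sleep, so it must not
 * run with a spin lock held; the assertion models that rule.
 */
static int
toy_copyout(const struct table *t, const void *src, void *dst, size_t len)
{
	assert(!t->lock.held);
	memcpy(dst, src, len);
	return 0;
}

/*
 * List up to nslots entries into ubuf.  The lock covers list traversal
 * only; it is dropped across each copy.  The caller is assumed to hold
 * an outer lock that prevents removal of entries while we are unlocked.
 */
static int
table_list(struct table *t, int *ubuf, size_t nslots, size_t *ncopied)
{
	size_t n = 0;
	int error = 0;

	toy_enter(&t->lock);
	for (struct entry *e = t->head; e != NULL && n < nslots; e = e->next) {
		int tmp = e->addr;	/* snapshot while locked */

		toy_exit(&t->lock);
		error = toy_copyout(t, &tmp, &ubuf[n], sizeof(tmp));
		toy_enter(&t->lock);
		if (error)
			break;
		n++;
	}
	toy_exit(&t->lock);
	*ncopied = n;
	return error;
}
```

Without the outer serialization, the pointer e could be stale after the lock is retaken, so this shape should not be copied into code that lacks such a guarantee.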
 1.40 22-Jan-2023  riastradh npf(9): Another comment tweak to match upstream.

No functional change.
 1.39 22-Jan-2023  riastradh npf(9): Use __HAVE_ATOMIC_AS_MEMBAR around refcnt consistently.
 1.38 09-Apr-2022  riastradh branches: 1.38.4;
sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.37 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)
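
As a rough userland analogue of the release pattern above, the same ordering can be expressed with C11 atomics. This is a sketch, not the kernel code: struct obj, obj_create, obj_acquire, and obj_release are hypothetical names, and the release decrement plus acquire fence stand in for the membar_exit/membar_enter pair around atomic_dec_uint_nv.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	return o;
}

static void
obj_acquire(struct obj *o)
{
	assert(o != NULL);
	/* The caller already holds a reference, so no ordering is needed. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed o. */
static bool
obj_release(struct obj *o)
{
	/*
	 * Release ordering: everything this thread did to *o happens
	 * before the decrement becomes visible to other threads.
	 */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last one out */

	/*
	 * Acquire fence: the last thread out observes all stores made
	 * by the other threads before their release decrements.
	 */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

The fence sits only on the path that frees the object, so callers that are not the last one out pay for the release decrement alone.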

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
 1.36 25-Jan-2021  christos s/npf_config_lock/npf->config_lock/ in the comments
 1.35 30-May-2020  rmind branches: 1.35.2;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.34 21-Aug-2019  rmind npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.33 23-Jul-2019  rmind branches: 1.33.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.32 20-Jun-2019  christos Add error checking for previous memory allocation failure.
 1.31 20-Jun-2019  christos PR/54314: Frank Kardel: LOCKDEBUG: Mutex error: assert_sleepable,70:
spin lock held when loading NPF
 1.30 12-Jun-2019  christos Avoid LOCKDEBUG pserialize panic by implementing suggestion #1 from

http://mail-index.netbsd.org/current-users/2019/02/24/msg035220.html:

Convert the mutex to spin-lock at IPL_NET (but it is excessive) and
convert the memory allocations in that code path to KM_NOSLEEP.
 1.29 19-Jan-2019  rmind Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.28 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.27 10-Mar-2017  christos branches: 1.27.12; 1.27.14;
fix MIN/MAX confusion.
 1.26 02-Jan-2017  rmind branches: 1.26.2;
NPF: implement dynamic handling of interface addresses (the kernel part).
 1.25 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.24 09-Dec-2016  christos This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
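
The idea of a linear scan for longest-prefix match can be sketched as below. This is a simplified illustration for IPv4 only and is not the lpm.c implementation: struct lpm, lpm_insert, and lpm_lookup here are hypothetical, and a small flat array stands in for the per-prefix-length hash tables. The lookup tries the longest prefix length first and stops at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LPM_MAX_ENTRIES	16	/* arbitrary limit for this sketch */

struct lpm_entry {
	uint32_t net;	/* network address, already masked */
	unsigned len;	/* prefix length, 0..32 */
	int val;	/* value associated with the prefix */
};

struct lpm {
	struct lpm_entry e[LPM_MAX_ENTRIES];
	size_t n;
};

/* Netmask for a prefix length; /0 is special-cased to avoid a 32-bit shift. */
static uint32_t
lpm_mask(unsigned len)
{
	return len == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - len));
}

static int
lpm_insert(struct lpm *t, uint32_t addr, unsigned len, int val)
{
	if (len > 32 || t->n == LPM_MAX_ENTRIES)
		return -1;
	t->e[t->n].net = addr & lpm_mask(len);
	t->e[t->n].len = len;
	t->e[t->n].val = val;
	t->n++;
	return 0;
}

/* Scan prefix lengths from /32 down to /0; the first hit is the longest match. */
static int
lpm_lookup(const struct lpm *t, uint32_t addr, int *val)
{
	for (int len = 32; len >= 0; len--) {
		uint32_t key = addr & lpm_mask((unsigned)len);

		for (size_t i = 0; i < t->n; i++) {
			if (t->e[i].len == (unsigned)len &&
			    t->e[i].net == key) {
				*val = t->e[i].val;
				return 0;
			}
		}
	}
	return -1;
}
```

With only a few distinct prefix lengths in use, most iterations of the outer loop find nothing to compare, which is consistent with the commit's remark that the simple scan is fast enough in practice.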
 1.23 20-Apr-2016  christos branches: 1.23.2;
/32 and /128 are valid netmasks.
 1.22 11-Aug-2014  rmind branches: 1.22.2; 1.22.4; 1.22.8;
NPF: finish up the rework of npfctl_save() mechanism.
 1.21 06-Feb-2014  rmind Add support for CDB based NPF tables.
 1.20 22-Nov-2013  rmind Add npf_tableset_syncdict() to sync the table IDs in the proplib dictionary,
as they can change on reload now. Also, fix table name checking in npfctl.
 1.19 12-Nov-2013  rmind NPF: add support for table naming and remove NPF_TABLE_SLOTS (there is
just an arbitrary sanity limit of NPF_MAX_TABLES currently set to 128).

Few misc fixes. Bump NPF_VERSION.
 1.18 19-May-2013  rmind branches: 1.18.2;
- Add NPF table flushing functionality.
- Fix line numbering for npfctl debug command.
 1.17 09-Feb-2013  rmind NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
 1.16 04-Dec-2012  rmind npf_table_list: avoid triggering assert on diagnostic.
 1.15 29-Oct-2012  rmind Implement NPF table listing and preservation of entries on reload.
Bump the version.
 1.14 12-Aug-2012  rmind branches: 1.14.2;
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.13 15-Jul-2012  rmind - Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.12 01-Jul-2012  rmind NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
 1.11 22-Jun-2012  rmind NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.10 20-Feb-2012  rmind - Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
 1.9 15-Jan-2012  rmind branches: 1.9.2;
- Expire all sessions on flush.
- Enable checking for zero mask in IP{4,6}MATCH after npfctl changes.
- Make locking symmetric for npf_ruleset_inspect().
- Sync function prototypes in npf(3) man page with reality.
- Rename NPF_TABLE_RBTREE to NPF_TABLE_TREE.
 1.8 29-Nov-2011  rmind branches: 1.8.2;
- Rework and improve TCP state tracking.
- Fix regressions after IPv6 patch merge.

Note: npfctl(8) rework will come soon.
 1.7 06-Nov-2011  rmind Few fixes, KNF/style, bump the NPF version.
 1.6 04-Nov-2011  zoltan Add IPv6 support for NPF.
 1.5 02-Feb-2011  rmind branches: 1.5.2; 1.5.6;
NPF checkpoint:
- Add libnpf(3) - a library to control NPF (configuration, ruleset, etc).
- Add NPF support for ftp-proxy(8).
- Add rc.d script for NPF.
- Convert npfctl(8) to use libnpf(3) and thus make it less depressive.
Note: next clean-up step should be a parser, once dholland@ will finish it.
- Add more documentation.
- Various fixes.
 1.4 18-Dec-2010  rmind branches: 1.4.2; 1.4.4;
NPF checkpoint:
- Add support for session saving/restoring.
- Add packet logging support (can tcpdump a pseudo-interface).
- Support reload without flushing of sessions; rework some locking.
- Revisit session management, replace linking with npf_sentry_t entries.
- Add some counters for statistics, using percpu(9).
- Add IP_DF flag cleansing.
- Fix various bugs; misc clean-up.
 1.3 11-Nov-2010  rmind NPF checkpoint:
- Add proper TCP state tracking as described in Guido van Rooij paper,
plus handle TCP Window Scaling option.
- Completely rework npf_cache_t, reduce granularity, simplify code.
- Add npf_addr_t as an abstraction, amend session handling code, as well
as NAT code et al, to use it. Now design is prepared for IPv6 support.
- Handle IPv4 fragments i.e. perform packet reassembly.
- Add support for IPv4 ID randomization and minimum TTL enforcement.
- Add support for TCP MSS "clamping".
- Random bits for IPv6. Various fixes and clean-up.
 1.2 24-Sep-2010  rmind branches: 1.2.2; 1.2.4;
Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
 1.1 22-Aug-2010  rmind Import NPF - a packet filter. Some features:

- Designed to be fully MP-safe and highly efficient.

- Tables/IP sets (hash or red-black tree) for high performance lookups.

- Stateful filtering and Network Address Port Translation (NAPT).
Framework for application level gateways (ALGs).

- Packet inspection engine called n-code processor - inspired by BPF -
supporting generic RISC-like and specific CISC-like instructions for
common patterns (e.g. IPv4 address matching). See npf_ncode(9) manual.

- Convenient userland utility npfctl(8) with npf.conf(8).

NOTE: This is not yet a fully capable alternative to PF or IPFilter.
Further work (support for binat/rdr, return-rst/return-icmp, common ALGs,
state saving/restoring, logging, etc) is in progress.

Thanks a lot to Matt Thomas for various useful comments and code review.
Aye by: board@
 1.2.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.1 24-Sep-2010  uebayasi file npf_tableset.c was added on branch uebayasi-xip on 2010-10-22 09:23:15 +0000
 1.2.2.2 09-Oct-2010  yamt sync with head
 1.2.2.1 24-Sep-2010  yamt file npf_tableset.c was added on branch yamt-nfs-mp on 2010-10-09 03:32:37 +0000
 1.4.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.6.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.6.4 16-Jan-2013  yamt sync with (a bit old) head
 1.5.6.3 30-Oct-2012  yamt sync with head
 1.5.6.2 17-Apr-2012  yamt sync with head
 1.5.6.1 10-Nov-2011  yamt sync with head
 1.5.2.2 05-Mar-2011  rmind sync with head
 1.5.2.1 02-Feb-2011  rmind file npf_tableset.c was added on branch rmind-uvmplock on 2011-03-05 20:55:56 +0000
 1.8.2.2 24-Feb-2012  mrg sync to -current.
 1.8.2.1 18-Feb-2012  mrg merge to -current.
 1.9.2.8 11-Feb-2013  riz Pull up following revision(s) (requested by rmind in ticket #817):
usr.sbin/npf/npfctl/npfctl.8: revision 1.12
usr.sbin/npf/npfctl/npf.conf.5: revision 1.27
usr.sbin/npf/npfctl/npf_parse.y: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.28
lib/libnpf/npf.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.c: revision 1.29
lib/libnpf/npf.c: revision 1.17
sys/modules/npf/Makefile: revision 1.12
sys/net/npf/npf_rproc.c: revision 1.6
usr.sbin/npf/npftest/README: revision 1.4
sys/net/npf/npf_tableset.c: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.21
sys/net/npf/npf_ctl.c: revision 1.22
usr.sbin/npf/npfctl/npfctl.h: revision 1.25
lib/libnpf/npf.h: revision 1.13
usr.sbin/npf/npftest/npftest.conf: revision 1.2
usr.sbin/npf/npfctl/npfctl.h: revision 1.26
sys/net/npf/npf_ruleset.c: revision 1.17
lib/libnpf/npf.h: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.18
sys/net/npf/npf_conf.c: revision 1.1
usr.sbin/npf/npfctl/npf_scan.l: revision 1.10
sys/net/npf/npf_conf.c: revision 1.2
sys/net/npf/npf_instr.c: revision 1.16
sys/net/npf/npf_handler.c: revision 1.26
sys/net/npf/npf_impl.h: revision 1.26
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.14
sys/net/npf/npf_processor.c: revision 1.15
sys/net/npf/npf_impl.h: revision 1.27
sys/net/npf/npf_alg_icmp.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.15
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.16
sys/net/npf/npf_ncode.h: revision 1.11
sys/net/npf/files.npf: revision 1.10
usr.sbin/npf/npftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.c: revision 1.30
lib/libnpf/npf.3: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.5
usr.sbin/npf/npfctl/npf_build.c: revision 1.18
usr.sbin/npf/npfctl/npf_build.c: revision 1.19
sys/net/npf/npf_alg.c: revision 1.7
usr.sbin/npf/npfctl/Makefile: revision 1.10
sys/net/npf/npf_inet.c: revision 1.21
sys/net/npf/npf.h: revision 1.26
sys/net/npf/npf.h: revision 1.27
usr.sbin/pf/ftp-proxy/Makefile: revision 1.8
sys/net/npf/npf_nat.c: revision 1.19
sys/net/npf/npf.c: revision 1.15
sys/net/npf/npf_state.c: revision 1.14
sys/net/npf/npf_sendpkt.c: revision 1.14
sys/rump/net/lib/libnpf/Makefile: revision 1.4
IPv6 linklocal address printing cosmetics
NPF:
- Implement dynamic NPF rules. Controlled through npf(3) library or via
npfctl rule command. A rule can be removed using a unique identifier,
returned on addition, or using a key which is SHA1 hash of the rule.
Adjust npftest and add a regression test.
- Improvements to rule inspection mechanism.
- Initial BPF support as an alternative to n-code.
- Minor fixes; bump the version.
Disable -DWITH_NPF for now; will be converted to BPF mechanism.
- Fix NPF config reload with dynamic rules present.
- Implement list and flush commands on a dynamic ruleset.
Allow filtering on IP addresses even if the L4 protocol is unknown.
Patch from spz@.
npftest: adjust for recent change.
 1.9.2.7 11-Dec-2012  riz Pull up following revision(s) (requested by rmind in ticket #736):
usr.sbin/npf/npfctl/npf_parse.y: revision 1.17
sys/net/npf/npf_tableset.c: revision 1.16
usr.sbin/npf/npfctl/npfctl.h: revision 1.23
usr.sbin/npf/npfctl/npf_data.c: revision 1.19
usr.sbin/npf/npfctl/npf_build.c: revision 1.15
share/examples/npf/host-npf.conf: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.9
share/examples/npf/soho_gw-npf.conf: revision 1.3
usr.sbin/npf/npfctl/npf_var.h: revision 1.6
usr.sbin/npf/npfctl/npf.conf.5: revision 1.24
npfctl: extend syntax for extracting interface IP address(es) by the family.
adjust to current npf.conf syntax
npf_table_list: avoid triggering assert on diagnostic.
 1.9.2.6 24-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #702):
sys/net/npf/npf_tableset.c: revision 1.15
usr.sbin/npf/npfctl/npfctl.h: revision 1.21
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.6
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.10
sys/net/npf/npf_state_tcp.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.24
sys/net/npf/npf.h: revision 1.22
sys/net/npf/npf_ctl.c: revision 1.19
sys/net/npf/npf.c: revision 1.14
usr.sbin/npf/npfctl/npfctl.8: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.21
npf_tcp_inwindow: inspect the sequence numbers even if the packet contains no
data, fixing up only the RST to the initial SYN. This makes off-path attacks
more difficult. For the reference, see "Reflection Scan: an Off-Path Attack
on TCP" by Jan Wrobel.
Implement NPF table listing and preservation of entries on reload.
Bump the version.
npfctl(8): mention table listing.
 1.9.2.5 13-Aug-2012  riz Pull up following revision(s) (requested by rmind in ticket #485):
lib/libnpf/npf.c: revision 1.11
sys/net/npf/npf_session.c: revision 1.17
sys/modules/npf/Makefile: revision 1.10
usr.sbin/npf/npftest/npftest.c: revision 1.4
usr.sbin/npf/npftest/README: revision 1.1
sys/net/npf/npf_tableset.c: revision 1.14
usr.sbin/npf/npftest/npftest.h: revision 1.4
lib/libnpf/npf.h: revision 1.10
sys/net/npf/npf_ruleset.c: revision 1.14
usr.sbin/npf/npfctl/npf_data.c: revision 1.18
usr.sbin/npf/npftest/npftest.conf: revision 1.1
sys/net/npf/npf_handler.c: revision 1.21
sys/net/npf/npf_impl.h: revision 1.21
usr.sbin/npf/npfctl/npfctl.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_build.c: revision 1.13
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.1
usr.sbin/npf/npftest/npfstream.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.19
sys/net/npf/npf_nat.c: revision 1.16
sys/net/npf/npf_state.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.5
usr.sbin/npf/npfctl/npf_parse.y: revision 1.12
- Extend npftest: add ruleset inspection testing from the config generated
by npfctl debug functionality. Auto-create npftest interfaces for this.
- NPF sessions: combine protocol and interface into a separate substructure,
share between the entries and thus fix the handling of them. Constify.
- npftest: add regression tests for NAT policies.
- npf_build_nat: simplify and fix bi-NAT regression.
- Bump yacc stack size for npfctl.
 1.9.2.4 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix few bugs when using kernel modules and handle module autounloader.
- Few other fixes and misc cleanups.
- Bump the version.
 1.9.2.3 05-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #399):
sys/net/npf/npf_session.c: revision 1.14
sys/net/npf/npf_tableset.c: revision 1.12
sys/net/npf/npf_state_tcp.c: revision 1.8
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.3
usr.sbin/npf/npfctl/npf_data.c: revision 1.14
sys/net/npf/npf_inet.c: revision 1.13
sys/net/npf/npf_ruleset.c: revision 1.12
sys/net/npf/npf.h: revision 1.18
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.8: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.2
usr.sbin/npf/npfctl/npfctl.8: revision 1.8
sys/net/npf/npf_instr.c: revision 1.12
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.3
usr.sbin/npf/npfctl/npf.conf.5: revision 1.13
usr.sbin/npf/npfctl/npf.conf.5: revision 1.14
sys/net/npf/npf_state.c: revision 1.9
sys/net/npf/npf_processor.c: revision 1.11
usr.sbin/npf/npfctl/npfctl.c: revision 1.13
usr.sbin/npf/npfctl/npfctl.c: revision 1.14
usr.sbin/npf/npfctl/npf_build.c: revision 1.10
lib/libnpf/npf.3: revision 1.5
lib/libnpf/npf.h: revision 1.8
share/man/man9/npf_ncode.9: revision 1.9
usr.sbin/npf/npfctl/npf_scan.l: revision 1.4
lib/libnpf/npf.c: revision 1.9
usr.sbin/npf/npfctl/npfctl.h: revision 1.16
sys/net/npf/npf_nat.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.2
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.6
sys/net/npf/npf_impl.h: revision 1.17
sys/net/npf/npf_handler.c: revision 1.18
sys/net/npf/npf_handler.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.4
sys/net/npf/npf_ncode.h: revision 1.9
Fix and update npf.conf(5), npfctl(8) and its usage message.
npf_state_tcp: fix for FIN retransmission and out-of-order ACK case.
NPF improvements:
- Add NPF_OPCODE_PROTO to match the address and/or protocol only.
- Update parser to support arbitrary "pass proto <name/number>".
- Fix IPv6 address and protocol handling (add a regression test).
- Fix few theoretical races in session handling module.
- Misc fixes, simplifications and some clean up.
npf_packet_handler: fix gcc unused warning.
 1.9.2.2 26-Jun-2012  riz Pull up following revision(s) (requested by rmind in ticket #365):
sys/rump/librump/rumpkern/rumpcpu_generic.c: revision 1.4
sys/net/npf/npf_session.c: revision 1.13
sys/net/npf/npf_tableset.c: revision 1.11
sys/net/npf/npf_state_tcp.c: revision 1.7
sys/net/npf/npf_inet.c: revision 1.12
sys/net/npf/npf.h: revision 1.17
sys/net/npf/npf_instr.c: revision 1.11
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.2
sys/net/npf/npf_state.c: revision 1.8
sys/net/npf/npf_log.c: revision 1.4
sys/net/npf/npf_alg.c: revision 1.4
sys/rump/librump/rumpkern/Makefile.rumpkern: revision 1.118
sys/net/npf/npf_nat.c: revision 1.13
sys/net/npf/npf.c: revision 1.11
sys/net/npf/npf_sendpkt.c: revision 1.11
sys/net/npf/npf_impl.h: revision 1.16
sys/rump/librump/rumpkern/scheduler.c: revision 1.28
rumpkern:
- Add subr_kcpuset.c and subr_pserialize.c modules.
- Add kcpuset_{running,attached} for RUMP env.
NPF:
- Rename some functions for consistency and de-inline them.
- Fix few invalid asserts (add regression test).
- Use pserialize(9) for ALG interface.
- Minor fixes, sprinkle many comments.
 1.9.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by rmind in ticket #158):
sys/net/npf/npf_session.c: revision 1.12
sys/net/npf/npf_tableset.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.2
usr.sbin/npf/npfctl/npf_parse.y: revision 1.4
sys/net/npf/npf_inet.c: revision 1.11
sys/net/npf/npf.h: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.5
sys/net/npf/npf_ruleset.c: revision 1.11
sys/net/npf/npf_instr.c: revision 1.10
usr.sbin/npf/npfctl/Makefile: revision 1.6
sys/net/npf/npf_processor.c: revision 1.10
sys/net/npf/npf_log.c: revision 1.3
lib/libnpf/npf.h: revision 1.7
sys/net/npf/npf_alg.c: revision 1.3
sys/net/npf/npf_sendpkt.c: revision 1.9
lib/libnpf/npf.c: revision 1.8
usr.sbin/npf/npfctl/npfctl.h: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.13
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.8
sys/net/npf/npf_ctl.c: revision 1.14
sys/net/npf/npf_nat.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.12
sys/net/npf/npf_impl.h: revision 1.11
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.1
sys/net/npf/npf_impl.h: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.2
sys/net/npf/npf_handler.c: revision 1.14
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.3
sys/net/npf/npf_handler.c: revision 1.15
sys/net/npf/npf_ncode.h: revision 1.6
sys/net/npf/npf.c: revision 1.8
sys/net/npf/npf.c: revision 1.9
sys/net/npf/npf_alg_icmp.c: revision 1.9
sys/net/npf/npf_session.c: revision 1.11
- Add NPF_DECISION_BLOCK and NPF_DECISION_PASS. Be more defensive in the
packet handler. Change the default policy to block when the config is
loaded and set it to pass when flush operation is performed.
- Use kmem_zalloc(9) instead of kmem_alloc(9) in few places.
- npf_rproc_{create,release}: use kmem_intr_{alloc,free} as the destruction
of rule procedure might happen in the interrupt handler (under a very rare
condition, if config reload races with the handler).
- npf_session_establish: check whether layer 3 and 4 are cached.
- npfctl_build_group: do not make groups as passing rules.
- Remove some unnecessary header inclusion.
Simplify slightly: merge iface into addr_or_iface, use it in filt_addr.
Add a small disassembler.
definitions used by the disassembler.
- better printing of type/code flags/mask
- pass the instruction start pointer, instead of subtracting 1 to account for it
- Save active config in proplib dictionary; add GETCONF ioctl to retrieve.
- Few fixes. Improve some comments.
don't leak the branch target array.
Add NPF config retrieval routines.
 1.14.2.5 03-Dec-2017  jdolecek update from HEAD
 1.14.2.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.14.2.3 23-Jun-2013  tls resync from head
 1.14.2.2 25-Feb-2013  tls resync with head
 1.14.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.18.2.1 18-May-2014  rmind sync with head
 1.22.8.1 18-Jan-2017  skrll Sync with netbsd-5
 1.22.4.3 28-Aug-2017  skrll Sync with HEAD
 1.22.4.2 05-Feb-2017  skrll Sync with HEAD
 1.22.4.1 22-Apr-2016  skrll Sync with HEAD
 1.22.2.1 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
 1.23.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.23.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.26.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.27.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.27.14.1 10-Jun-2019  christos Sync with HEAD
 1.27.12.2 26-Jan-2019  pgoyette Sync with HEAD
 1.27.12.1 30-Sep-2018  pgoyette Sync with HEAD
 1.33.2.3 21-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1718):

sys/net/npf/npf_tableset.c: revision 1.41

npf(9): Drop table lock around copyout.

It is forbidden to hold a spin lock around copyout, and t_lock is a
spin lock.

We need t_lock in order to iterate over the list of entries.
However, during copyout itself, we only need to ensure that the
object we're copying out isn't freed by npf_table_remove or
npf_table_gc.

Fortunately, the only caller of npf_table_list, npf_table_remove, and
npf_table_gc is npfctl_table, and it serializes all of them by the
npf config lock. So we can safely drop t_lock across copyout.

PR kern/57136
PR kern/57181
 1.33.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.33.2.1 01-Sep-2019  martin Pull up following revision(s) (requested by rmind in ticket #139):

lib/libnpf/npf.c: revision 1.47
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.10
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.10
sys/net/npf/npf.h: revision 1.61
sys/net/npf/npf_ctl.c: revision 1.56
sys/net/npf/npf_os.c: revision 1.15
lib/libnpf/libnpf.3: revision 1.10
sys/net/npf/npf_tableset.c: revision 1.34
usr.sbin/npf/npfctl/npfctl.c: revision 1.61
sys/net/npf/npf_impl.h: revision 1.77
lib/libnpf/npf.h: revision 1.37

- npftest: fix a memleak in a unit test (standalone path only).
- Minor style fixes. No functional change.
npfkern/libnpf: Add support for the table replace/swap operation.
Contributed by Timshel Knoll-Miller.
 1.35.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.38.4.1 21-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #332):

sys/net/npf/npf_tableset.c: revision 1.41

npf(9): Drop table lock around copyout.

It is forbidden to hold a spin lock around copyout, and t_lock is a
spin lock.

We need t_lock in order to iterate over the list of entries.
However, during copyout itself, we only need to ensure that the
object we're copying out isn't freed by npf_table_remove or
npf_table_gc.

Fortunately, the only caller of npf_table_list, npf_table_remove, and
npf_table_gc is npfctl_table, and it serializes all of them by the
npf config lock. So we can safely drop t_lock across copyout.

PR kern/57136
PR kern/57181
 1.42.6.1 02-Aug-2025  perseant Sync with HEAD
 1.2 09-Dec-2016  christos This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
 1.1 15-Jul-2012  rmind branches: 1.1.2; 1.1.4; 1.1.6; 1.1.18; 1.1.20; 1.1.24; 1.1.26;
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.1.26.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.1.24.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.20.1 05-Feb-2017  skrll Sync with HEAD
 1.1.18.1 18-Dec-2016  snj Pull up following revision(s) (requested by rmind in ticket #1319):
sys/modules/npf/Makefile: revision 1.19
sys/net/npf/files.npf: revision 1.18
sys/net/npf/lpm.c: revision 1.1
sys/net/npf/lpm.h: revision 1.1
sys/net/npf/npf_impl.h: revision 1.62
sys/net/npf/npf_tableset.c: revision 1.24
sys/net/npf/npf_tableset_ptree.c: file removal
sys/rump/net/lib/libnpf/Makefile: revision 1.18
This patch ditches the ptree(3) library, because it is broken (you
can get missing entries!). Instead, as a temporary solution, we switch
to a simple linear scan of the hash tables for the longest-prefix-match
(lpm.c lpm.h) algorithm. In fact, with few unique prefixes in the set,
on modern hardware this simple algorithm is pretty fast anyway!
--
ditch ptree and use lpm
--
remove ptree add lpm
 1.1.6.2 30-Oct-2012  yamt sync with head
 1.1.6.1 15-Jul-2012  yamt file npf_tableset_ptree.c was added on branch yamt-pagecache on 2012-10-30 17:22:45 +0000
 1.1.4.1 03-Dec-2017  jdolecek update from HEAD
 1.1.2.2 16-Jul-2012  riz Pull up following revision(s) (requested by rmind in ticket #421):
lib/libnpf/npf.c: revision 1.10
sys/net/npf/npf_session.c: revision 1.15
sys/net/npf/npf_tableset.c: revision 1.13
sys/net/npf/npf_state_tcp.c: revision 1.9
usr.sbin/npf/npfctl/npf_data.c: revision 1.15
sys/net/npf/npf_inet.c: revision 1.14
sys/net/npf/npf_ruleset.c: revision 1.13
sys/net/npf/npf.h: revision 1.19
usr.sbin/npf/npfctl/npf_ncgen.c: revision 1.12
sys/net/npf/npf_instr.c: revision 1.13
sys/net/npf/npf_handler.c: revision 1.20
usr.sbin/npf/npftest/libnpftest/npf_table_test.c: revision 1.4
sys/net/npf/npf_alg_icmp.c: revision 1.10
usr.sbin/npf/npfctl/npfctl.c: revision 1.15
usr.sbin/npf/npfctl/npf_build.c: revision 1.11
lib/libnpf/npf.h: revision 1.9
sys/net/npf/npf_alg.c: revision 1.5
sys/rump/dev/lib/libnpf/Makefile: revision 1.4
usr.sbin/npf/npfctl/npfctl.h: revision 1.17
sys/net/npf/npf_ctl.c: revision 1.16
sys/net/npf/npf_nat.c: revision 1.15
sys/net/npf/npf_tableset_ptree.c: revision 1.1
sys/net/npf/npf.c: revision 1.12
sys/net/npf/npf_sendpkt.c: revision 1.12
usr.sbin/npf/npfctl/npf_disassemble.c: revision 1.7
sys/net/npf/npf_impl.h: revision 1.18
sys/net/npf/files.npf: revision 1.7
usr.sbin/npf/npfctl/npf_parse.y: revision 1.10
- Rework NPF tables and fix support for IPv6. Implement tree table type
using radix / Patricia tree. Universal IPv4/IPv6 comparator for ptree(3)
was contributed by Matt Thomas.
- NPF tables: update regression tests, improve npfctl(8) error messages.
- Fix a few bugs when using kernel modules and handle module autounloader.
- A few other fixes and misc cleanups.
- Bump the version.
 1.1.2.1 15-Jul-2012  riz file npf_tableset_ptree.c was added on branch netbsd-6 on 2012-07-16 22:13:27 +0000
 1.10 27-Aug-2020  riastradh npf: Don't stop early after sleeping and before processing instances.

We already check winfo->exit below, after processing instances and
before sleeping again.

Candidate fix for:

panic: kernel diagnostic assertion "LIST_EMPTY(&winfo->instances)" failed: file "/home/riastradh/netbsd/current/src/sys/rump/net/lib/libnpf/../../../..//net/npf/npf_worker.c", line 300 NPF instances must be discharged before the npfk_sysfini() call
 1.9 30-May-2020  rmind npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.
 1.8 30-May-2020  rmind Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.7 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.6 19-Jan-2019  rmind branches: 1.6.4;
Major NPF improvements:
- Convert NPF connection table to thmap. State lookup is now lock-free.
- Improve connection state G/C: it is now incremental and tunable.
- Add support for dynamic NAT address. Translation addresses can now be
selected from a pool of addresses. There are two selection algorithms,
"ip-hash" and "round-robin" (see the man page).
- Translation address can be specified as e.g. ifaddrs(wm0) in npf.conf
to dynamically choose an IP from the interface address(es).
- Add support for the NETMAP algorithm with static NAT for net-to-net
translation (it is equivalent to iptables NETMAP logic).
- Convert 'ipset' tables to use thmap; the table lookup is now lock-free.
- Misc improvements, bug fixes and more unit tests.
- Bump NPF_VERSION (will also bump libnpf).
 1.5 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.4 10-Dec-2017  rmind branches: 1.4.2; 1.4.4;
- npf_mk_rules: enforce unique names for the dynamic rulesets.
- npf_worker_unregister: merge fix for the standalone NPF.
 1.3 02-Jan-2017  rmind NPF: implement dynamic handling of interface addresses (the kernel part).
 1.2 26-Dec-2016  christos Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.1 02-Jun-2013  rmind branches: 1.1.2; 1.1.10; 1.1.14; 1.1.18;
- NPF connection tracking: rework synchronisation on tracking disable/enable
points and document it. Split the worker thread into a separate module
with an interface, so it could be re-used for other tasks.
- Replace ALG list with arrays and thus hit fewer cache lines.
- Misc bug fixes.
 1.1.18.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.14.1 05-Feb-2017  skrll Sync with HEAD
 1.1.10.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.10.1 02-Jun-2013  yamt file npf_worker.c was added on branch yamt-pagecache on 2014-05-22 11:41:09 +0000
 1.1.2.3 03-Dec-2017  jdolecek update from HEAD
 1.1.2.2 23-Jun-2013  tls resync from head
 1.1.2.1 02-Jun-2013  tls file npf_worker.c was added on branch tls-maxphys on 2013-06-23 06:20:25 +0000
 1.4.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.4.4.1 10-Jun-2019  christos Sync with HEAD
 1.4.2.2 26-Jan-2019  pgoyette Sync with HEAD
 1.4.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.6.4.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.6.4.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.6 01-Jul-2025  joe kernel code for layer 2 filtering in NPF

reviewed by christos@
 1.5 30-May-2020  rmind branches: 1.5.26;
Major NPF improvements (merge from upstream):

- Switch to the C11-style atomic primitives using atomic_loadstore(9).

- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.

- npfkern: rewrite the G/C worker logic and make it self-tuning.

- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.

- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.

- Amend and improve the manual pages.
 1.4 11-Aug-2019  rmind Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.
 1.3 23-Jul-2019  rmind branches: 1.3.2;
NPF improvements:
- Add support for dynamic NETMAP algorithm (stateful net-to-net).
- Add most of the support for the dynamic NAT rules; a little bit more
userland work is needed to finish this up and enable.
- Replace 'stateful-ends' with more permissive 'stateful-all'.
- Add various tunable parameters and document them, see npf-params(7).
- Reduce the memory usage of the connection state table (conndb).
- Portmap rewrite: use memory more efficiently, handle addresses dynamically.
- Bug fix: add splsoftnet()/splx() around the thmap writers and comment.
- npftest: clean up and simplify; fix some memleaks to make ASAN happy.
 1.2 29-Sep-2018  rmind NPF: Major rework -- migrate NPF to the libnv library.
- This conversion significantly simplifies the code and moves NPF to
a binary serialisation format (replacing the XML-like format).
- Fix some memory/reference leaks and possibly use-after-free bugs.
- Bump NPF_VERSION as this change makes libnpf incompatible with the
previous versions. Also, different serialisation format means NPF
connection/config saving and loading is not compatible with the
previous versions either.

Thanks to christos@ for extra testing.
 1.1 26-Dec-2016  christos branches: 1.1.2; 1.1.6; 1.1.18; 1.1.20; 1.1.22;
Sync NPF with the version on github: backport standalone NPF changes,
which allow us to create and run separate NPF instances. Minor fixes.
(from rmind@)
 1.1.22.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.20.1 30-Sep-2018  pgoyette Sync with HEAD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 26-Dec-2016  jdolecek file npfkern.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.6.2 05-Feb-2017  skrll Sync with HEAD
 1.1.6.1 26-Dec-2016  skrll file npfkern.h was added on branch nick-nhusb on 2017-02-05 13:40:58 +0000
 1.1.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.1.2.1 26-Dec-2016  pgoyette file npfkern.h was added on branch pgoyette-localcount on 2017-01-07 08:56:50 +0000
 1.3.2.2 20-Jun-2020  martin Pull up following revision(s) (requested by rmind in ticket #956):

usr.sbin/npf/npf-params.7: revision 1.4
sys/net/npf/npf_worker.c: revision 1.9
usr.sbin/npf/npftest/npftest.h: revision 1.17
usr.sbin/npf/npfctl/npf_bpf_comp.c: revision 1.16
usr.sbin/npf/npf-params.7: revision 1.5
sys/net/npf/npf_state_tcp.c: revision 1.21
usr.sbin/npf/npfctl/npf_build.c: revision 1.55
usr.sbin/npf/npf-params.7: revision 1.6
sys/net/npf/npfkern.h: revision 1.5
lib/libnpf/npf.c: revision 1.49
usr.sbin/npf/npf-params.7: revision 1.7
sys/net/npf/npf_impl.h: revision 1.81
sys/net/npf/npf_ext_log.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.h: revision 1.53
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.11
sys/net/npf/npf_nat.c: revision 1.50
sys/net/npf/npf_mbuf.c: revision 1.24
sys/net/npf/npf_alg.c: revision 1.22
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: file removal
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.10
sys/net/npf/npf.h: revision 1.63
usr.sbin/npf/npftest/libnpftest/npf_test.h: revision 1.21
usr.sbin/npf/npfctl/npf_var.c: revision 1.13
sys/net/npf/files.npf: revision 1.23
usr.sbin/npf/npfctl/npf_show.c: revision 1.32
usr.sbin/npf/npfctl/npf.conf.5: revision 1.91
sys/net/npf/npf_os.c: revision 1.18
sys/net/npf/npf_connkey.c: revision 1.2
sys/net/npf/npf_conf.c: revision 1.17
lib/libnpf/libnpf.3: revision 1.12
usr.sbin/npf/npftest/npftest.c: revision 1.25
usr.sbin/npf/npftest/libnpftest/npf_gc_test.c: revision 1.1
usr.sbin/npf/npfctl/npf_parse.y: revision 1.51
sys/net/npf/npf_tableset.c: revision 1.35
usr.sbin/npf/npftest/npftest.conf: revision 1.9
sys/net/npf/npf_sendpkt.c: revision 1.22
usr.sbin/npf/npfctl/npf_var.h: revision 1.10
sys/net/npf/npf_state.c: revision 1.23
sys/net/npf/npf_conn.h: revision 1.20
usr.sbin/npf/npfctl/npfctl.c: revision 1.64
usr.sbin/npf/npfctl/npf_cmd.c: revision 1.1
sys/net/npf/npf_portmap.c: revision 1.5
sys/net/npf/npf_params.c: revision 1.3
usr.sbin/npf/npfctl/npf_scan.l: revision 1.32
tests/net/npf/t_npf.sh: revision 1.4
sys/net/npf/npf_ext_rndblock.c: revision 1.9
lib/libnpf/npf.h: revision 1.39
sys/net/npf/npf_ruleset.c: revision 1.51
sys/net/npf/npf_alg_icmp.c: revision 1.33
sys/net/npf/npf.c: revision 1.43
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.17
usr.sbin/npf/npfctl/npfctl.8: revision 1.25
sys/net/npf/npf_ctl.c: revision 1.60
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.18
usr.sbin/npf/npftest/libnpftest/Makefile: revision 1.11
sys/net/npf/npf_handler.c: revision 1.49
sys/net/npf/npf_inet.c: revision 1.57
sys/net/npf/npf_ifaddr.c: revision 1.7
sys/net/npf/npf_conndb.c: revision 1.9
sys/net/npf/npf_if.c: revision 1.13
usr.sbin/npf/npfctl/Makefile: revision 1.15
sys/net/npf/npf_conn.c: revision 1.32
sys/net/npf/npf_ext_normalize.c: revision 1.10
sys/net/npf/npf_rproc.c: revision 1.20
sys/net/npf/npf_worker.c: revision 1.8

Major NPF improvements (merge from upstream):
- Switch to the C11-style atomic primitives using atomic_loadstore(9).
- npfkern: introduce the 'state.key.interface' and 'state.key.direction'
settings. Users can now choose whether the connection state should be
strictly per-interface or global at the configuration level. Keep NAT
logic to be always per-interface, though.
- npfkern: rewrite the G/C worker logic and make it self-tuning.
- npfkern and libnpf: multiple bug fixes; add param exporting; introduce
more parameters. Remove npf_nvlist_{copyin,copyout}() functions and
refactor npfctl_load_nvlist() with others; add npfctl_run_op() to have
a single entry point for operations. Introduce npf_flow_t and clean up
some code.
- npfctl: lots of fixes for the 'npfctl show' logic; make 'npfctl list'
more informative; misc usability improvements and more user-friendly
error messages.
- Amend and improve the manual pages.

npf_worker_sys{init,fini}: initialize/destroy the exit_cv condvar.

npftest -- npf_test_init(): add a workaround for NetBSD.

npf-params(7): fix the state.key defaults.

npf-params.7: s/filer/filter/

Adjust to "npfctl debug" command line changes, from rmind@.

Use more markup.
 1.3.2.1 13-Aug-2019  martin Pull up following revision(s) (requested by rmind in ticket #49):

usr.sbin/npf/npf.7: revision 1.7
sys/net/npf/npfkern.h: revision 1.4
sys/net/npf/npf_conn.h: revision 1.18
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.13
sys/net/npf/npf_ctl.c: revision 1.55
sys/net/npf/npf_os.c: revision 1.14
sys/net/npf/npf_conf.c: revision 1.14
usr.sbin/npf/npftest/libnpftest/npf_conn_test.c: revision 1.3
usr.sbin/npf/npftest/libnpftest/npf_perf_test.c: revision 1.9
sys/net/npf/npf_impl.h: revision 1.76
sys/net/npf/npf_portmap.c: revision 1.4
sys/net/npf/npf_params.c: revision 1.2
sys/net/npf/npf.c: revision 1.40
usr.sbin/npf/npftest/libnpftest/npf_test_subr.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.18
sys/net/npf/npf_nat.c: revision 1.47
sys/net/npf/npf_handler.c: revision 1.47
sys/net/npf/npf_inet.c: revision 1.55
sys/net/npf/npf_if.c: revision 1.10
sys/net/npf/npf_worker.c: revision 1.7
usr.sbin/npf/npf-params.7: revision 1.3

npf-params(7): add more bpf.jit details.
From David H. Gutteridge.

Adjust some internal NPF APIs:
* npfkern: use the npfk_ prefix.
* NPF portmap: amend the API so it could be used elsewhere.
* Make npf_connkey_t public.

npf.7: add xref to npf-params.7
(Adding directly here since this particular file isn't included in
rmind@'s upstream GitHub repo at present.)
 1.5.26.1 02-Aug-2025  perseant Sync with HEAD
