History log of /src/sys/net/if.h
Revision  Date  Author  Comments
 1.308  05-Jun-2025  ozaki-r if: remove unused ifa_ifwithaf()
 1.307  05-Jun-2025  ozaki-r if: introduce if_first_addr() (and psref variant)

It returns the first address (ifa) of a given family on a given interface.

It will replace a bunch of open-coded lookups and make their intent clear.
 1.306  22-Sep-2024  andvar s/remvoed/removed/ in comment.
 1.305  09-Oct-2023  riastradh branches: 1.305.2;
net/if.h: Explain the IFF_ALLMULTI situation.

No functional change intended.
 1.304  25-Nov-2022  knakahara Support explicit unnumbered interface.

Currently, NetBSD supports implicit unnumbered interfaces by setting
the same IP address on two interfaces. However, such an interface is not
treated as unnumbered while the IP address of one of the interfaces is
being changed or after it has been changed. That behavior can be harmful
for some routing daemons.
 1.303  24-Oct-2022  msaitoh Make ifq_drops in struct ifqueue and struct ifaltq 64 bit.
 1.302  18-Sep-2022  martin Typo in comment
 1.301  03-Sep-2022  thorpej Garbage-collect the remaining vestiges of netisr.
 1.300  20-Aug-2022  riastradh ifnet(9): Defer if_watchdog (a.k.a. if_slowtimo) to workqueue.

This is necessary to allow mii_down, and the *_init/stop routines that
call it, to sleep waiting for MII callouts on other CPUs.

Mark the workqueue and callout MP-safe; only take the kernel lock
around the callback.

No kernel bump despite change to struct ifnet because the change is
ABI-compatible and using the callout outside net/if.c has never been
kosher.
 1.299  28-Jul-2022  skrll Trailing whitespace
 1.298  20-Jun-2022  yamaguchi bpf(4): added support for VLAN hardware offloading of ethernet devices
 1.297  20-Jun-2022  yamaguchi Handling frames that vlan id is 0 as non-VLAN frames
even if a vlan tag is stripped by hardware offloading
 1.296  31-Dec-2021  riastradh sys/net: New functions if_ioctl, if_init, and if_stop.

These are wrappers, suitable for inserting appropriate kasserts
regarding the API's locking contract, for the corresponding functions
in struct ifnet.

Since these are intended to commit configuration changes to the
interface, which may involve resetting the device, the caller should
hold IFNET_LOCK. However, I can't straightforwardly prove that all
callers do yet, so the assertion is disabled for now.
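
The wrapper idea described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct toy_ifnet`, `toy_if_init`, `toy_driver_init`, and `check_lock_contract` are hypothetical names, and a userland `assert` stands in for a kernel assertion. The check is kept switchable, mirroring the note that the assertion is disabled for now.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy interface object; the field names here are illustrative only. */
struct toy_ifnet {
	bool locked;				/* stands in for "IFNET_LOCK is held" */
	int (*init_cb)(struct toy_ifnet *);	/* driver-supplied callback */
};

/* Flip to true once every caller is known to hold the lock. */
static const bool check_lock_contract = false;

/* Wrapper: one place to check the locking contract before dispatching. */
static int
toy_if_init(struct toy_ifnet *ifp)
{
	if (check_lock_contract)
		assert(ifp->locked);
	return (*ifp->init_cb)(ifp);
}

/* Example driver callback that always succeeds. */
static int
toy_driver_init(struct toy_ifnet *ifp)
{
	(void)ifp;
	return 0;
}
```

Routing every call through one wrapper means the contract check can be enabled later in a single place rather than at each call site.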
 1.295  30-Sep-2021  yamaguchi net: obsolete ifnet::if_link_state_changed
that was used for updating link-state of vlan I/F

The obsoleted function is replaced with
ifnet::if_linkstate_hooks
 1.294  30-Sep-2021  yamaguchi Provide a hook point called at change of link state
 1.293  30-Sep-2021  yamaguchi Replace ifnet::if_agrprivate with ifnet::if_lagg

agr(4) and lagg(4) cannot be used on the same interface, so
if_agrprivate and if_lagg are never used at the same time.
To eliminate this waste, if_lagg is now used not only by lagg(4)
but also by agr(4).

After this modification, if_lagg has 3 states:
1. if_lagg == NULL
- Neither agr(4) nor lagg(4) is running on the interface
2. if_lagg != NULL && ifp->if_type != IFT_IEEE8023ADLAG
- agr(4) is running on the I/F
3. if_lagg != NULL && ifp->if_type == IFT_IEEE8023ADLAG
- lagg(4) is running on the I/F
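
The three states listed above can be expressed as a small classifier. This is a minimal sketch, not code from the tree: `struct ifnet_sketch`, `enum agg_kind`, and `agg_kind_of` are hypothetical names, and the value given for `IFT_IEEE8023ADLAG` is assumed from the IANA ifType registry.

```c
#include <stddef.h>

#define IFT_IEEE8023ADLAG 0xa1	/* value assumed from the IANA ifType registry */

/* Cut-down stand-in for struct ifnet; only the two fields the rule needs. */
struct ifnet_sketch {
	void *if_lagg;
	unsigned char if_type;
};

enum agg_kind { AGG_NONE, AGG_AGR, AGG_LAGG };

/* Decide which aggregation driver, if any, owns the interface. */
static enum agg_kind
agg_kind_of(const struct ifnet_sketch *ifp)
{
	if (ifp->if_lagg == NULL)
		return AGG_NONE;		/* state 1: neither driver */
	return ifp->if_type == IFT_IEEE8023ADLAG
	    ? AGG_LAGG				/* state 3: lagg(4) */
	    : AGG_AGR;				/* state 2: agr(4) */
}
```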
 1.292  09-Aug-2021  andvar fix various typos in compatibility, mainly in comments.
 1.291  29-Jun-2021  riastradh Make if_stats_init, if_attach, if_initialize return void.

percpu_alloc can't fail.


Author: Maya Rashish <maya@NetBSD.org>
Committer: Taylor R Campbell <riastradh@NetBSD.org>
 1.290  17-May-2021  yamaguchi Add a new link-aggregation pseudo interface named lagg(4)

- FreeBSD's lagg(4) based implementation
- MP-safe and MP-scalable
 1.289  15-Oct-2020  roy branches: 1.289.6; 1.289.8;
net: remove IFEF_NO_LINK_STATE_CHANGE

This flag was only set for virtual interfaces.
All virtual interfaces have a means of knowing if they are going to work
or not and as such now support link state changes.

If we want this flag back, it should be used as an indicator that
the interface does not support link state changes that userland can use
so it can make a decision on what to do when the link state is UNKNOWN.
 1.288  27-Sep-2020  roy bridge: When an interface joins then mark addresses on it as tentative

The exact flow is detach addresses, join bridge and then mark detached
addresses as tentative.
This ensures that Duplicate Address Detection for the joining interface
is performed across all members of the bridge.
 1.287  26-Sep-2020  roy net: Add a callback to ifnet to notify of link state changes
 1.286  26-Sep-2020  roy net: Fix the setting of if_link_state

Link state changes are not dependent on the interface being up, but we also
need to guard against more link state changes being scheduled when the
interface is being detached.

We do this by clearing the link queue but keeping if_link_scheduled = true.
We can check for this in both if_link_state_change() and
if_link_state_change_work() to abort early as there is no point in doing
anything if the interface is being detached because if_down() is called
in if_detach() after the workqueue has been drained to the same overall
effect.
 1.285  22-Sep-2020  roy ifconfig: Report link state even if media is not supported

For AF_LINK addrs from getifaddrs(2), ifa_data is struct if_data.
This in turn holds ifi_link_state which we can use to report
link status if the interface does not support media where it's normally
reported.

Based on OpenBSD.
 1.284  28-Aug-2020  ozaki-r net: introduce IFQ_ENQUEUE_ISR to assemble packet queuing routines (NFCI)
 1.283  05-May-2020  jdolecek remove struct ifnet if_mcastop, it's not used by anything
 1.282  14-Feb-2020  thorpej Remove the conditional __IF_STATS_PERCPU.
 1.281  06-Feb-2020  thorpej Perform link state change processing on a work queue, rather than in a
softint.
 1.280  01-Feb-2020  thorpej Make if_stats completely opaque to user-space.
 1.279  01-Feb-2020  thorpej Flip the switch to the per-cpu implementation in <net/if_stats.h>. Leave
the conditional in place for a time in case serious problems are discovered,
so that the Old Way can be re-enabled quickly. After some time, the Old
Way will be removed completely.
 1.278  29-Jan-2020  thorpej Add support for MP-safe network interface statistics by maintaining them
in per-cpu storage, and collecting them for export in an if_data structure
when user-space wants them.

The new if_stat API is structured to make a gradual transition to the
new way in network drivers possible, and per-cpu stats are currently
disabled (thus there is no kernel ABI change). Once all drivers have
been converted, the old ABI will be removed, and per-cpu stats will be
enabled universally.
 1.277  19-Sep-2019  knakahara branches: 1.277.2;
Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.276  13-Sep-2019  msaitoh if_flags is neither int nor short. It's unsigned short.
 1.275  10-Aug-2019  rmind Add the ifnet_t::if_npf_private field. Bump the kernel version.
Fixes PR/54098.
 1.274  04-Jul-2019  ozaki-r branches: 1.274.2;
Add support for a network interface description.

ioctl(2):
- Add SIOCGIFDESCR/SIOCSIFDESCR commands to get/set the description.

This makes it possible to attach a memo to an interface, like "Home network" or "Remote VPN".

From t-kusaba@IIJ
 1.273  24-Jun-2019  skrll Fix 'unknown' spellos
 1.272  10-May-2019  msaitoh Remove extra parentheses. No functional change.
 1.271  10-May-2019  msaitoh Add missing parentheses for IFQ_CLASSIFY macro's argument.
 1.270  10-May-2019  msaitoh Modify comment to make the data structure clear. No functional change.
 1.269  23-Mar-2019  pgoyette Replace compile-time checking for vlan code with a module hook.

Should resolve the errors reported on irc when booting a kernel which
has agr without vlan:


[ 1.0000000] WARNING: module error: built-in module if_agr can't find builtin dependency `if_vlan'
[ 1.0000000] WARNING: module error: built-in module if_agr prerequisite if_vlan failed, error 2
 1.268  05-Feb-2019  msaitoh Remove NOTRAILERS from IFFBITS.
 1.267  05-Feb-2019  msaitoh Remove very old IFF_NOTRAILERS flag.
 1.266  18-Oct-2018  knakahara fix panic when do ifconfig -vlanif and ifconfig vlanif again. advised by ozaki-r@.

e.g. do the following commands.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlanif wm0
# ifconfig vlan0 -vlanif wm0
# ifconfig vlan0 vlan 100 vlanif wm0
====================

The ATF net/if_vlan tests run this sequence but cannot detect this bug,
because shmif(4)'s ifp->if_hwdl is always NULL: shmif(4)'s Ethernet
address has the U/L bit set.
See: https://nxr.netbsd.org/xref/src/sys/net/if_ethersubr.c#997
 1.265  22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.264  03-Jul-2018  ozaki-r Fix net.inet6.ip6.ifq node doesn't exist

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.

sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.263  21-Jun-2018  knakahara branches: 1.263.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: in the !defined(NET_MPSAFE) case it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.262  12-Jun-2018  ozaki-r Check if ether_ifdetach is called without INET_LOCK
 1.261  01-May-2018  maxv Move if_name() from net_osdep.h to if.h. net_osdep.h is now unused and can
be removed - the other BSDs did the same.

Discussed with Kengo (if.h suggested by him).
 1.260  19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.259  12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of rtentry updates.
 1.258  15-Jan-2018  maxv branches: 1.258.2;
Add a KASSERT in IFQ_CLASSIFY, we really need to make sure the given
mbuf is the top of the chain.
 1.257  18-Dec-2017  ozaki-r Note that IFNET_LOCK must not be held in softint
 1.256  15-Dec-2017  ozaki-r Write a guideline for converting an interface to IFEF_MPSAFE

Requested by skrll@
 1.255  15-Dec-2017  ozaki-r Describe which lock is used to protect each member variable of struct ifnet

Requested by skrll@
 1.254  15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.253  11-Dec-2017  ozaki-r Wrap if_ioctl_lock with IFNET_* macros (NFC)

Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
 1.252  11-Dec-2017  ozaki-r Rename IFNET_LOCK to IFNET_GLOBAL_LOCK

IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
 1.251  08-Dec-2017  ozaki-r Revert "Make if_timer MP-safe if IFEF_MPSAFE"

Because it has decreased the performance of wm. And also I found that
wm_watchdog doesn't work well with if_watchdog framework at all. Sharing one
counter (if_timer) with multiple instances (hardware multi-queues) can't detect
a single (or some) stall of them because other instances reset the counter even
if the stalled one wants the watchdog to fire.

Interfaces without IFEF_MPSAFE work safely with the original if_watchdog thanks
to KERNEL_LOCK. OTOH, interfaces with IFEF_MPSAFE shouldn't use if_watchdog and
should implement their own watchdog timer that works with multiple instances.
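
The stall-masking problem described above can be shown with a toy simulation. This is a sketch under stated assumptions, not driver code: the queue count, the timeout, and both function names are hypothetical, and one loop iteration stands in for one watchdog tick.

```c
#include <stdbool.h>

#define NQUEUES 2
#define TIMEOUT 5

/* One counter shared by all queues: any healthy queue rearms it. */
static bool
shared_timer_fires(const bool stalled[NQUEUES], int ticks)
{
	int timer = TIMEOUT;

	for (int t = 0; t < ticks; t++) {
		for (int q = 0; q < NQUEUES; q++)
			if (!stalled[q])
				timer = TIMEOUT;	/* progress resets the shared counter */
		if (--timer == 0)
			return true;
	}
	return false;
}

/* One counter per queue: a stalled queue's counter runs down on its own. */
static bool
per_queue_timer_fires(const bool stalled[NQUEUES], int ticks)
{
	int timer[NQUEUES];

	for (int q = 0; q < NQUEUES; q++)
		timer[q] = TIMEOUT;
	for (int t = 0; t < ticks; t++) {
		for (int q = 0; q < NQUEUES; q++) {
			if (!stalled[q])
				timer[q] = TIMEOUT;
			if (--timer[q] == 0)
				return true;
		}
	}
	return false;
}
```

With one healthy queue and one stalled queue, the shared counter never reaches zero, while the per-queue counter for the stalled queue expires after `TIMEOUT` ticks.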
 1.250  08-Dec-2017  ozaki-r Fix build of kernels without ether

By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.

PR kern/52790
 1.249  06-Dec-2017  ozaki-r Make if_timer MP-safe if IFEF_MPSAFE

if_timer, a counter used by if_watchdog (if_slowtimo), can be modified in
if_watchdog and if_start and/or interrupt handlers of some device drivers. All
such accesses were serialized by KERNEL_LOCK. If IFEF_MPSAFE is enabled,
KERNEL_LOCK of if_start (and perhaps interrupt handlers) is omitted and if_timer
becomes racy.

Fix the race condition by protecting if_timer by a spin mutex. if_watchdog_reset
and if_watchdog_stop are introduced to ensure to take the mutex on accessing
if_timer. Interface with IFEF_MPSAFE enabled must use the functions.

In addition, if_watchdog callout is now set CALLOUT_MPSAFE if IFEF_MPSAFE. It
means that if_watchdog implemented by a driver must be MP-safe if the driver is
set IFEF_MPSAFE.

Currently the only interface with IFEF_MPSAFE that implements if_watchdog and
accesses if_timer in if_start and interrupt handlers is wm(4). wm is changed to
use the functions. (Its watchdog handler (wm_watchdog) is already MP-safe.)

These contracts will be written somewhere in a further commit.

Note that the spin mutex is now ifp->if_snd.ifq_lock to avoid adding another
spin mutex to each interface. For now reusing it isn't problematic (see the
comment to know why), though if that does matter in the future, feel free to
replace it with a new spin mutex. It's easy to do.
 1.248  06-Dec-2017  knakahara unify processing to check nesting count for some tunnel protocols.
 1.247  06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock on if_up and if_down

One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
 1.246  06-Dec-2017  ozaki-r Fix locking against myself on ifpromisc

vlan_unconfig_locked could be called with holding if_ioctl_lock.
 1.245  06-Dec-2017  ozaki-r Ensure to hold if_ioctl_lock when calling if_flags_set
 1.244  22-Nov-2017  ozaki-r Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE

If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.

This change requires additions of KERNEL_LOCK to subsequent functions called
from if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.

Proposed on tech-kern@ and tech-net@
 1.243  17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.242  16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.241  23-Oct-2017  msaitoh if_initialize() and if_attach() could fail when resource allocation failed
(e.g. allocating a softint), and before this change they panicked. That is bad
because resource shortages really do occur when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. Caller functions can recover from the
error cleanly by checking the return value.
 1.240  27-Jun-2017  roy Introduce if_get_bylla to find an interface with the active
local link address.
 1.239  19-May-2017  ozaki-r branches: 1.239.2;
Allow CARP to call the link_state_change handler immediately

If the handler is delayed because of the indirection call via softint,
some operations are executed in reverse and may cause unexpected
behaviors. For example, due to the issue a GARP packet wasn't sent on
a transition from the BACKUP state to the MASTER state; this happened
because IN_IFF_DETACHED flag wasn't cleared on arpannounce, which
had been cleared in the link_state_change handler.

This fixes an issue reported by sborrill@ on tech-net:
http://mail-index.netbsd.org/tech-net/2017/03/14/msg006283.html
 1.238  06-Apr-2017  ozaki-r Revert "Make sure to hold if_ioctl_lock when calling ifp->if_ioctl"

As per pgoyette@ and riastradh@ requests; we shouldn't decide to
hold a lock based on if the lock is held or not.
 1.237  05-Apr-2017  ozaki-r Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
 1.236  14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.235  23-Feb-2017  ozaki-r Remove mkludge stuffs

For unknown reasons, IPv6 multicast addresses are linked to a first
IPv6 address assigned to an interface. Due to the design, when removing
a first address having multicast addresses, we need to save them to
somewhere and later restore them once a new IPv6 address is activated.
mkludge stuffs support the operations.

This change links multicast addresses to an interface directly and
throws the kludge away.

Note that as usual some obsolete member variables remain for kvm(3)
users. And also sysctl net.inet6.multicast_kludge remains to avoid
breaking old ifmcstat.

TODO: currently ifnet has a list of in6_multi but obviously the list
should be protocol independent. Provide a common structure (if_multi
or something) to handle in6_multi and in_multi together as well as
ifaddr does for in_ifaddr and in6_ifaddr.
 1.234  17-Feb-2017  ozaki-r Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.233  22-Dec-2016  ozaki-r branches: 1.233.2;
Remove assertion that the lock isn't held

It's useless in this case, because without it we can know that
the lock is held or not on a next lock acquisition and even more
if LOCKDEBUG is enabled a failure on the acquisition will provide
useful information for debugging while an assertion failure will
provide just the fact that the assertion failed.
 1.232  13-Dec-2016  ozaki-r Constify ifp of if_is_deactivated
 1.231  12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.230  08-Dec-2016  ozaki-r Introduce deferred if_start framework

The framework provides a means to schedule if_start that will be executed
in softint later. It intends to be used to avoid calling if_start,
especially bpf_mtap, in hardware interrupt.

It adds a dedicated softint to a driver if the driver requests to use the
framework via if_deferred_start_init. The driver can schedule deferred
if_start by if_schedule_deferred_start.

Proposed and discussed on tech-kern and tech-net
 1.229  22-Nov-2016  ozaki-r Make lortrequest static and rename it to loop_rtrequest

No functional change.
 1.228  08-Oct-2016  joerg Since IFF_MULTICAST's value can't be represented without implicit cast
as signed short, make if_flags unsigned.
 1.227  03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is shared data between a Layer 2 routine and a Layer 3 routine,
namely ifqueue, and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.226  21-Sep-2016  roy Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
 1.225  10-Aug-2016  kre On the first day (that being the eighth day of the eighth month,) the
building was completed only to discover that within there lay havoc.

On the second day all just groaned and moaned, and it must be someone
else's problem.

On the third day, St. Martin stepped in and traced the culprit, which
provided inspiration, and a correction was made.

Forevermore all were agog at just how such a trivial thing could do
so much damage...


OK... to be a little less vague. The loopback interface is a truly
"special" thing, and rump knew that - and treated it very specially.
Unfortunately, when the loopback interface is changed, and rump does
not keep up, bad things happen.

This (overall) might, or might not, be the correct fix - but for now
it appears to work. If someone, sometime, finds a better way to
deal with the issues of the loopback interface's true majesty, feel
free to revert this and do it another way.
 1.224  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.223  01-Aug-2016  ozaki-r Revert "Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)"

netstat now uses sysctl instead of kvm(3) to get address information from
the kernel. So we can avoid the issue introduced by the reverted commit
(PR kern/51325) by updating netstat with the latest source code.
 1.222  22-Jul-2016  knakahara Toward NET_MPSAFE-on in future, if_snd uses if_snd->ifq_lock by default.

That can reduce confusing difference between NET_MPSAFE on and off.
 1.221  11-Jul-2016  ozaki-r branches: 1.221.2;
Revert part of "Switch the address list of interfaces to pslist(9)" (r1.220)

Reverting the whole change set just messes up many files uselessly
because changes to them (except for if.h) are proper.

- Remove ifa_pslist_entry that breaks kvm(3) users (e.g., netstat -ia)
- Change IFADDR_{READER,WRITER}_* macros to use old IFADDR_* (or just NOP)
for now

Fix PR kern/51325
 1.220  07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.219  30-Jun-2016  ozaki-r Get rid of duplicate prototype of ifafree
 1.218  28-Jun-2016  ozaki-r Introduce if_is_deactivated

Checking ifp->if_output == if_nulloutput is too implicit.

No functional change.
 1.217  27-Jun-2016  knakahara fix spelling mistake pointed out by roy@n.o
 1.216  27-Jun-2016  knakahara reduce link state changing softint if it is not required

ok by ozaki-r@n.o
 1.215  22-Jun-2016  knakahara fix: locking about IFQ_ENQUEUE and ALTQ

- If NET_MPSAFE is not defined, IFQ_LOCK is a nop. Currently, that means
IFQ_ENQUEUE() on some paths such as bridge_enqueue() is wrongly called
in parallel.
- If ALTQ is enabled, Tx processing should call if_transmit() (= IFQ_ENQUEUE
+ ifp->if_start()) instead of ifp->if_transmit() to call ALTQ_ENQUEUE()
and ALTQ_DEQUEUE().
Furthermore, ALTQ processing currently always requires KERNEL_LOCK.
 1.214  21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.213  21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of an ifnet, so an ifnet object obtained
via them can disappear at any time. Thus we need to look up the ifnet
object by if_index every time to be safe.
 1.212  21-Jun-2016  ozaki-r Introduce if_index_t
 1.211  20-Jun-2016  knakahara introduce if_start_lock()

if_start_lock() calls ifp->if_start() holding KERNEL_LOCK if it is required.
 1.210  20-Jun-2016  knakahara fix: i386 build failure
 1.209  20-Jun-2016  knakahara introduce if_output_lock()

if_output_lock() calls ifp->if_output() holding KERNEL_LOCK if it is required.
 1.208  20-Jun-2016  knakahara introduce if_extflags (was if__pad1)
 1.207  10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.206  16-May-2016  ozaki-r Replace ifnet_lock with if_get and if_put

ifnet_lock is a dedicated method to safely destroy an interface over running
ioctl operations. Replace it with a more generic mechanism using psref(9).
 1.205  16-May-2016  ozaki-r Introduce if_get, if_get_byindex and if_put

The new API makes it possible to obtain an ifnet object protected by psref(9).
It is intended to be used where an obtained ifnet object is used over
sleepable operations.
 1.204  12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.203  28-Apr-2016  knakahara introduce new ifnet MP-scalable sending interface "if_transmit".
 1.202  28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.201  20-Apr-2016  knakahara IFQ_ENQUEUE refactor (3/3) : eliminate pktattr argument from IFQ_ENQUEUE caller
 1.200  20-Apr-2016  knakahara IFQ_ENQUEUE refactor (2/3) : eliminate pktattr argument from altq implementation
 1.199  20-Apr-2016  knakahara IFQ_ENQUEUE refactor (1/3) : add altq_pktattr fields to m_pkthdr

Reviewed by joerg@n.o and tls@n.o, thanks.
 1.198  19-Feb-2016  roy Implement a queue for if_link_state_change() calls to fix a race condition
introduced in the prior patch.

The queue has capacity to store 8 link state changes, if it overflows then
the oldest state change is lost, but the oldest DOWN state change is
preserved to ensure any subsequent UP state changes reflect properly.

Because there are only 3 states to queue, the queue itself is implemented
by storing 2-bit numbers in a bigger one.
To increase the size of the queue, just increase the size of the backing
store to a bigger number.
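
The packing scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in net/if.c: the `LQ_*` names, the state encoding, and the use of `uint16_t` (8 slots of 2 bits) are hypothetical, and the overflow policy that preserves the oldest DOWN change is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical state encoding; 0 is reserved to mean "empty slot". */
enum { LQ_EMPTY = 0, LQ_UNKNOWN = 1, LQ_DOWN = 2, LQ_UP = 3 };

#define LQ_BITS  2			/* bits per queued state */
#define LQ_MASK  0x3u
#define LQ_SLOTS (16 / LQ_BITS)		/* 8 slots in a uint16_t */

/* Store a state in the first empty slot; returns 0, or -1 if the queue is full. */
static int
lq_push(uint16_t *q, unsigned state)
{
	for (int i = 0; i < LQ_SLOTS; i++) {
		if (((*q >> (i * LQ_BITS)) & LQ_MASK) == LQ_EMPTY) {
			*q |= (uint16_t)((state & LQ_MASK) << (i * LQ_BITS));
			return 0;
		}
	}
	return -1;
}

/* Remove and return the oldest state, or LQ_EMPTY if nothing is queued. */
static unsigned
lq_pop(uint16_t *q)
{
	unsigned state = *q & LQ_MASK;

	*q >>= LQ_BITS;
	return state;
}
```

Widening the backing type (for example to `uint32_t`, with `LQ_SLOTS` adjusted) gives more slots without changing the logic, which matches the remark about increasing the size of the backing store.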
 1.197  16-Feb-2016  ozaki-r Remove workaround for GATEWAY

The workaround was introduced because lltable/llentry uses rwlock
but it may be executed in hardware interrupt due to fast forward.
Now we don't run fast forward in hardware interrupt anymore, so
we can remove the workaround.
 1.196  15-Feb-2016  ozaki-r Run if_link_state_change in softint

if_link_state_change can execute the network stack that is expected to
not run in hardware interrupt (at least now), however network drivers
may call it in hardware interrupt. Avoid that by introducing a new
softint for if_link_state_change.

The original patch is provided by mlelstv@ and tweaked a bit by me.

Should fix PR kern/50602.
 1.195  09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
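
The split described above (the receive interrupt only queues and schedules, the softint dequeues and processes) can be sketched as follows. This is a single-threaded toy model under stated assumptions, not the percpuq implementation: `toy_pktq`, `toy_rxintr` and `toy_softint` are hypothetical names, packets are plain integers, and "scheduling a softint" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 4

/* Hypothetical stand-in for one per-CPU input queue. */
struct toy_pktq {
    int pkts[TOY_QLEN];     /* ring buffer of queued "packets" */
    size_t head, count;
    int softint_pending;    /* set when a softint has been requested */
    int processed;          /* packets handled in softint context */
    int drops;              /* packets dropped because the queue was full */
};

/* "Hardware interrupt" side: enqueue and request a softint, nothing more. */
void toy_rxintr(struct toy_pktq *q, int pkt)
{
    if (q->count == TOY_QLEN) {
        q->drops++;
        return;
    }
    q->pkts[(q->head + q->count) % TOY_QLEN] = pkt;
    q->count++;
    q->softint_pending = 1;
}

/* "Softint" side: drain the queue and do the real per-packet work. */
void toy_softint(struct toy_pktq *q)
{
    assert(q->count <= TOY_QLEN);
    q->softint_pending = 0;
    while (q->count > 0) {
        q->head = (q->head + 1) % TOY_QLEN;
        q->count--;
        q->processed++;     /* Layer 2 input would run here */
    }
}
```

In this model no packet is processed until `toy_softint()` runs, which is the property the commit is after.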
 1.194  04-Jan-2016  ozaki-r Fix the destruction of the afdata lock

Pointed out by mlelstv@
 1.193  02-Oct-2015  ozaki-r Fix typo
 1.192  30-Sep-2015  ozaki-r Make GATEWAY (fastforward) work again

With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.

This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.

I hope fastforward runs in softint some day in the future...
 1.191  31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD, as is lltable/llentry.
 1.190  18-May-2015  martin Implement SIOCIFGCLONERS for netbsd32, so ifconfig -C works.
 1.189  02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.188  20-Apr-2015  roy Introduce p2p_rtrequest() so that IFF_POINTOPOINT interfaces can work
with RTF_LOCAL.
Fixes PR kern/49829.
 1.187  07-Apr-2015  roy Move in6if_do_dad() to if_do_dad() as the routine is not INET6 specific
and could equally be used by INET.
 1.186  03-Apr-2015  msaitoh Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
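
The overflow mentioned above is easy to reproduce. The sketch below assumes a platform with 32-bit `unsigned int`; the `SK_*` macros are illustrative stand-ins modelled on the commit message, not copied from if.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative macros: the ULL suffix forces 64-bit arithmetic. */
#define SK_Kbps(x) ((x) * 1000ULL)
#define SK_Mbps(x) (SK_Kbps((x) * 1000ULL))
#define SK_Gbps(x) (SK_Mbps((x) * 1000ULL))

/* Same product in 32-bit unsigned arithmetic: wraps modulo 2^32. */
uint32_t toy_gbps_32bit(uint32_t x)
{
    return x * 1000u * 1000u * 1000u;
}

/* 64-bit version; the assert guards against overflowing even 64 bits. */
uint64_t toy_gbps_64bit(uint64_t x)
{
    assert(x <= UINT64_MAX / 1000000000ULL);
    return SK_Gbps(x);
}
```

For 10 Gb/s the 64-bit version yields 10000000000, while the 32-bit product wraps to 1410065408.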
 1.185  16-Jan-2015  ozaki-r Remove an outdated snippet for NET_MPSAFE
 1.184  15-Dec-2014  ozaki-r Introduce if_initialize and if_register as an alternative to if_attach

if_attach initializes an ifnet object and registers it to the system
(e.g., ifnet_list), however, if_attach doesn't complete the
initialization and the rest of it will be done by if_alloc_sadl
that is normally directly called by device drivers or called via
functions like ether_ifattach. So there is a race between
if_attach and if_alloc_sadl (A half-baked ifnet object may be
accessed, for example, via ioctl between them).

The aim of this fix is to register an initializing ifnet object
after completing its initializations. To this end, this fix
separates if_attach into an initialization part (if_initialize)
and a registration part (if_register) and calls the latter after
if_alloc_sadl (ether_ifattach). So a typical usage of the two
new APIs is like this:

    if_initialize(ifp); // was if_attach
    ether_ifattach(ifp, enaddr);
    if_register(ifp);

Nonetheless, changing every driver to do so at once isn't
feasible. So we keep if_attach working as it used to be and
will change only some drivers that we need at this point.
Once we know the fix really works well, we'll change all
the others.

Some more information of the fix can be found here:
http://mail-index.netbsd.org/tech-kern/2014/12/10/msg018242.html

No objection on tech-kern and tech-net.
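
The ordering argument above can be illustrated with a toy model. Everything here is hypothetical (`toy_ifnet`, `toy_if_initialize`, `toy_ether_ifattach`, `toy_if_register`, `toy_visible`); the sketch only shows that an object published last is never visible in a half-initialized state, and it does not reflect the real struct ifnet or its locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy interface object. */
struct toy_ifnet {
    int initialized;    /* set by toy_if_initialize() */
    int has_sadl;       /* set by toy_ether_ifattach() */
};

#define TOY_MAXIF 4
static struct toy_ifnet *toy_ifnet_list[TOY_MAXIF];
static int toy_nif;

void toy_if_initialize(struct toy_ifnet *ifp)
{
    ifp->initialized = 1;
    ifp->has_sadl = 0;
}

void toy_ether_ifattach(struct toy_ifnet *ifp)
{
    assert(ifp->initialized);
    ifp->has_sadl = 1;      /* link-layer address is now set up */
}

/* Publish the object only once it is complete. */
int toy_if_register(struct toy_ifnet *ifp)
{
    if (!ifp->initialized || !ifp->has_sadl || toy_nif == TOY_MAXIF)
        return -1;
    toy_ifnet_list[toy_nif++] = ifp;
    return 0;
}

/* What a lookup (for example from an ioctl path) would be able to find. */
int toy_visible(const struct toy_ifnet *ifp)
{
    for (int i = 0; i < toy_nif; i++)
        if (toy_ifnet_list[i] == ifp)
            return 1;
    return 0;
}
```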
 1.183  02-Dec-2014  ozaki-r Revert "Pull if_drain routine out of m_reclaim"

The commit broke dlopen()'d rumpnet on platforms where ld.so does not
override weak aliases (e.g. musl, Solaris, potentially OS X, ...).

Requested by pooka@.
 1.182  01-Dec-2014  ozaki-r Make more functions static

No functional change.
 1.181  28-Nov-2014  ozaki-r branches: 1.181.2;
Remove dead code and make if_free_sadl static

No functional change.
 1.180  27-Nov-2014  ozaki-r Pull if_drain routine out of m_reclaim

It's if-specific and should be in if.c.

No functional change.
 1.179  26-Nov-2014  ozaki-r Change if_slowtimo_ch to a pointer

One benefit of doing so is to reduce the memory used for struct callout;
we can avoid allocating a struct callout for interfaces that don't
use callout.

Requested by uebayasi@.
 1.178  26-Nov-2014  ozaki-r Create if_slowtimo (if_watchdog) callout for each interface

This change is to obviate the need to run if_slowtimo callbacks that
may sleep inside IFNET_FOREACH. And also by this change we can turn
on MPSAFE of callouts individually.

Discussed with uebayasi@ and riastradh@.
 1.177  26-Nov-2014  ozaki-r Rename if_watchdog to if_slowtimo

if_watchdog callbacks do a little more than what "watchdog" suggests.

Discussed with uebayasi@ (the idea originally from openbsd-tech).
 1.176  26-Nov-2014  ozaki-r Make if_slowtimo static
 1.175  09-Sep-2014  rmind Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.174  31-Jul-2014  ozaki-r branches: 1.174.2;
Define IFADDR_FOREACH_SAFE for on-the-fly element removal in a loop

We have to use it when we purge an address element in an ifaddr loop.

This change restores the original behavior that was accidentally degraded.
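
Why a "safe" iterator is needed when elements are removed inside the loop can be shown with a small stand-alone list. The macro and types below are hypothetical (`TOY_FOREACH_SAFE`, `toy_ifaddr`) and use a singly linked list rather than the kernel's queue macros; the point is only that the successor is fetched before the loop body may free the current element.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_ifaddr {
    int family;
    struct toy_ifaddr *next;
};

/* Hypothetical macro: save the successor before the body runs. */
#define TOY_FOREACH_SAFE(var, head, tmp)                            \
    for ((var) = (head);                                            \
         (var) != NULL && ((tmp) = (var)->next, 1);                 \
         (var) = (tmp))

/* Prepend a new address of the given family. */
void toy_push(struct toy_ifaddr **headp, int family)
{
    struct toy_ifaddr *ifa = malloc(sizeof(*ifa));

    if (ifa == NULL)
        abort();
    ifa->family = family;
    ifa->next = *headp;
    *headp = ifa;
}

/* Remove and free every address of one family; return how many were purged. */
int toy_purge_family(struct toy_ifaddr **headp, int family)
{
    struct toy_ifaddr *ifa, *next;
    struct toy_ifaddr **linkp = headp;
    int purged = 0;

    assert(headp != NULL);
    TOY_FOREACH_SAFE(ifa, *headp, next) {
        if (ifa->family == family) {
            *linkp = next;      /* unlink before freeing */
            free(ifa);
            purged++;
        } else {
            linkp = &ifa->next;
        }
    }
    return purged;
}
```

A plain loop that advanced with `ifa = ifa->next` would read freed memory after the `free(ifa)` call.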
 1.173  31-Jul-2014  ozaki-r Define IFNET_EMPTY() and replace !IFNET_FIRST() with it

No functional change.
 1.172  16-Jul-2014  ozaki-r Kill void * for bridge in struct ifnet

No functional change.
 1.171  14-Jul-2014  ozaki-r Make bridge MPSAFE

- Introduce BRIDGE_MPSAFE
  - It's enabled only when NET_MPSAFE is defined
    in if.h or the kernel config
- Add iflist and rtlist mutex locks
  - Locking iflist is performance sensitive,
    so it's not used when !BRIDGE_MPSAFE
- Add bif object reference counting
  - It enables fine-grain locking for bridge member lists
    by allowing to not hold a lock during touching a bif
  - bridge_release_member is added to decrement the
    reference count
  - A condition variable is added to do bridge_delete_member
    gracefully
- Add if_bridgeif to ifnet
  - It's a shortcut to a bif object of a bridge member
  - It reduces a bif lookup cost and so lock contention on iflist
- Make bridgestp MPSAFE too
 1.170  01-Jul-2014  ozaki-r Unbreak lib/libc/net/getifaddrs.c

--- getifaddrs.o ---
In file included from /tmp/bracket/build/2014.07.01.10.35.18-i386/src/lib/libc/net/getifaddrs.c:39:0:
/tmp/bracket/build/2014.07.01.10.35.18-i386/src/sys/net/if.h:208:2: error: unknown type name 'kmutex_t'
kmutex_t *ifq_lock;
^
 1.169  01-Jul-2014  ozaki-r Lock IFQ operations when NET_MPSAFE

- Introduce NET_MPSAFE
  - not defined by default
- Add ifq_lock to protect ifnet#if_snd
- Initialize ifq_lock and lock IFQ operations
  when NET_MPSAFE

When NET_MPSAFE isn't defined, this modification
doesn't change its behavior and adds trivial
performance overheads.

Discussed with matt@ on tech-net
 1.168  01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.167  16-Jun-2014  ozaki-r Include pktqueue.h only if _KERNEL
 1.166  16-Jun-2014  ozaki-r Move sysctl_pktq_{maxlen,count} to pktqueue.c and make them global

They will be used by bridge.

ok rmind@
 1.165  18-May-2014  rmind - Move ifnet_list (and lo0ifp while here) under #ifdef _KERNEL.
- Make ifindex2ifnet, if_indexlim and some other variables static.
- Move if_index generation into its own function.
- if_alloc/if_free: replace malloc with kmem.
 1.164  17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.163  26-Apr-2014  pooka Decouple sockets linkage from interface code by making ifioctl() a pointer.
 1.162  17-Apr-2014  christos add LRO
 1.161  12-Mar-2014  pooka branches: 1.161.2;
add a mask for valid capabilities

also add a comment stating why capabilities start from 0x80
 1.160  25-Jan-2014  christos add a lint comment
 1.159  28-Oct-2013  christos add an alias for the linux name for the interface index
 1.158  05-Oct-2013  christos fix the source too, not just the doc.
 1.157  05-Oct-2013  christos Add SIOCGIFINDEX from Ty Sarna and Matthew Sporleder.
 1.156  29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.155  25-Oct-2012  msaitoh branches: 1.155.2;
Move the prototype definition of ether_input() from if.h to if_ether.h.
 1.154  25-Oct-2011  dyoung branches: 1.154.2; 1.154.8; 1.154.12;
Document the ifioctl locking in comments.

Add a missing percpu_free(9) call.
 1.153  19-Oct-2011  dyoung Fix userland compilation: pull the ifioctl lock-related data members
into a struct ifnet_lock that the ifnet has a pointer to. In a
non-_KERNEL environment, don't #include <sys/percpu.h> et cetera, and
don't define the struct ifnet_lock but *do* declare it.
 1.152  19-Oct-2011  dyoung Start to untangle the ifnet ioctls mess.

Add ifnet functions, if_mcast_op(), if_flags_set(), and if_addr_init()
for adding/deleting multicast addresses, modifying the if_flags,
and initializing local/remote addresses. Make ifpromisc() use
if_flags_set(). Protocols and network drivers should use these
instead of ifp->if_ioctl() calls. Subsequent commits will
replace ifp->if_ioctl(SIOCADDMULTI| SIOCDELMULTI| SIOCSIFDSTADDR|
SIOCINITIFADDR| SIOCSIFFLAGS) calls with calls to the new functions.

Use a mutex(9) to synchronize ifp->if_ioctl() calls originating in
userland. Also synchronize ifp->if_ioctl() calls with ifnet detachment
and reclamation.
 1.151  12-Aug-2011  dyoung Declare if_free().
 1.150  01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.149  18-Jan-2011  rmind branches: 1.149.2;
NPF checkpoint:
- Add the concept of rule procedure: separate normalization, logging and
potentially other functions from the rule structure. Rule procedure can be
shared amongst the rules. Separation is both at kernel level (npf_rproc_t)
and configuration ("procedure" + "apply").
- Fix portmap sharing for NAT policy.
- Update TCP state tracking logic. Use TCP FSM definitions.
- Add if_byindex(), OK by matt@. Use in logging for the lookup.
- Fix traceroute ALG and many other bugs; misc clean-up.
 1.148  15-Nov-2010  pooka branches: 1.148.2;
Implement ifconfig linkstr as proposed on tech-net.
 1.147  20-Oct-2010  pooka Remove XXX comment with the text "going away soon". It was added
in September 1989 -- I think we passed "soon" around last week.
 1.146  17-Jan-2010  pooka branches: 1.146.2; 1.146.4;
Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *". Buys us oo times the type-safety for 0 times
the price.
(no functional change)
 1.145  05-Oct-2009  dyoung Replace u_quad_t with uint64_t. u_quad_t is just a typedef for
uint64_t, so no ABI/API breakage will result from this change.
 1.144  11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.143  13-Aug-2009  dyoung Use sysctl(9) to expose to userland each interface transmission
queue's maximum length, current length, and number of drops. E.g.,

% sysctl net.interfaces.bnx0
net.interfaces.bnx0.sndq.len = 0
net.interfaces.bnx0.sndq.maxlen = 509
net.interfaces.bnx0.sndq.drops = 0

Let userland adjust the maximum queue length.

While I'm here, add a 64-bit generation number, if_index_gen, to
ifnet; the pair [ifp->if_index, ifp->if_index_gen] can serve to
identify an ifnet for the lifetime of the system. I will use this
in an upcoming change.

Ok matt@.
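
The value of pairing an index with a generation number is that indexes can be reused while the pair cannot. A minimal sketch, assuming the generation comes from a global counter that only increases (the commit does not say how if_index_gen is produced); `toy_ifid` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifier: a reusable index plus a never-reused generation. */
struct toy_ifid {
    unsigned index;
    uint64_t gen;
};

static uint64_t toy_index_gen;  /* assumed: monotonically increasing */

struct toy_ifid toy_ifid_alloc(unsigned index)
{
    struct toy_ifid id;

    assert(toy_index_gen != UINT64_MAX);    /* the counter must not wrap */
    id.index = index;
    id.gen = toy_index_gen++;
    return id;
}

int toy_ifid_equal(struct toy_ifid a, struct toy_ifid b)
{
    return a.index == b.index && a.gen == b.gen;
}
```

Two interfaces that occupy index 1 at different times compare unequal, so a stale identifier cannot be mistaken for the new interface.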
 1.142  11-Jan-2009  christos merge christos-time_t
 1.141  07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

    switch (...->sa_family) {
    case ...:
        ..._init();
        ...
        break;
    ...
    default:
        ..._init();
        ...
        break;
    }

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

    switch (x & (IFF_UP|IFF_RUNNING)) {
    case 0:
        ...
        break;
    case IFF_RUNNING:
        ...
        break;
    case IFF_UP:
        ...
        break;
    case IFF_UP|IFF_RUNNING:
        ...
        break;
    }
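
A compilable version of the four-way switch might look like the sketch below. The flag values and the action chosen in each case are assumptions made for illustration (real code takes the flags from <net/if.h>, and each driver decides what to do); the `TOY_*` names are hypothetical.

```c
#include <assert.h>

/* Illustrative flag values, not taken from <net/if.h>. */
#define TOY_IFF_UP      0x01
#define TOY_IFF_RUNNING 0x40

enum toy_action { TOY_NOP, TOY_STOP, TOY_INIT, TOY_RECONF };

/* Map the (UP, RUNNING) combination to one action, one case per permutation. */
enum toy_action toy_flags_action(int flags)
{
    switch (flags & (TOY_IFF_UP | TOY_IFF_RUNNING)) {
    case 0:
        return TOY_NOP;         /* down and stopped */
    case TOY_IFF_RUNNING:
        return TOY_STOP;        /* marked down but still running */
    case TOY_IFF_UP:
        return TOY_INIT;        /* marked up but not yet running */
    case TOY_IFF_UP | TOY_IFF_RUNNING:
        return TOY_RECONF;      /* up and running: apply new settings */
    }
    assert(!"unreachable: the mask admits only four values");
    return TOY_NOP;
}
```

Compared with nested if-else clauses, every permutation appears exactly once and a missing one is easy to spot.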

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.140  24-Oct-2008  dyoung branches: 1.140.2; 1.140.8;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.139  18-Jun-2008  yamt branches: 1.139.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.138  15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.137  13-May-2008  dyoung branches: 1.137.2;
Let us call ioctl(SIOC[ADG]LIFADDR) with a link-layer address on
an AF_LINK socket, only, to be consistent with SIOC[ADG]LIFADDR
behavior on AF_INET and AF_INET6 sockets. Let us create AF_LINK
sockets for this purpose. Note that most operations on AF_LINK
sockets are not implemented.
 1.136  11-May-2008  dyoung Add kernel support for adding/removing link-layer addresses using
SIOCALIFADDR and SIOCDLIFADDR, respectively. Corresponding
ifconfig(8) changes are coming soon.
 1.135  28-Apr-2008  martin branches: 1.135.2;
Remove clause 3 and 4 from TNF licenses
 1.134  07-Feb-2008  dyoung branches: 1.134.6; 1.134.8; 1.134.10; 1.134.12;
Start patching up the kernel so that a network driver always has
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.

First steps: Make the signature of ifioctl_common() match struct
ifnet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
 1.133  22-Jan-2008  dyoung Take two steps toward adding and deleting link-layer addresses.

1 Extract subroutine if_dl_create() from if_alloc_sadl().
if_dl_create() allocates a link-layer ifaddr.

2 Extract subroutine ifioctl_common() from ifioctl(). ifioctl_common()
will be the basis for an ifnet "superclass" whose functions
drivers may inherit. Very simple drivers may set ifnet->if_ioctl
= ifioctl_common. More sophisticated drivers will set ifnet->if_ioctl
= driver_ioctl. driver_ioctl() will call ifioctl_common() to
re-use the common code.
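
The "superclass" arrangement in step 2 can be sketched with a function pointer and a fallback call. The command numbers, the toy struct, and the choice of ENOTTY for unhandled commands are assumptions for illustration; only the names ifioctl_common and driver_ioctl come from the text above, and they are prefixed with `toy_` here to make clear this is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers. */
enum { TOY_SIOCSIFMTU = 1, TOY_SIOCDRVPRIV = 2, TOY_SIOCUNKNOWN = 3 };

struct toy_ifnet {
    int mtu;
    int priv;
    int (*if_ioctl)(struct toy_ifnet *, int, int);
};

/* Generic handler shared by all drivers. */
int toy_ifioctl_common(struct toy_ifnet *ifp, int cmd, int arg)
{
    assert(ifp != NULL);
    switch (cmd) {
    case TOY_SIOCSIFMTU:
        ifp->mtu = arg;
        return 0;
    default:
        return ENOTTY;      /* assumed error for unhandled commands */
    }
}

/* Driver handler: deal with private commands, inherit everything else. */
int toy_driver_ioctl(struct toy_ifnet *ifp, int cmd, int arg)
{
    switch (cmd) {
    case TOY_SIOCDRVPRIV:
        ifp->priv = arg;
        return 0;
    default:
        return toy_ifioctl_common(ifp, cmd, arg);
    }
}
```

A very simple driver would set `if_ioctl = toy_ifioctl_common` directly; a more sophisticated one installs `toy_driver_ioctl` and still gets the common behavior.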
 1.132  20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.131  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.130  06-Dec-2007  dyoung branches: 1.130.4;
Add ifa_insert() and ifa_remove() that add/remove an ifaddr to/from
an interface and increase/decrease its reference count.
 1.129  05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
 1.128  05-Dec-2007  dyoung Add IFNET_FIRST(), IFNET_NEXT(), IFADDR_FIRST(), IFADDR_NEXT(),
IFADDR_EMPTY().

Call the IF{NET,ADDR}_FOREACH() macro arguments __ifp and __ifa
instead of ifp and ifa.
 1.127  13-Sep-2007  gdt branches: 1.127.6; 1.127.8;
Add a define for the ifru_space union member.

Copy the entire sockaddr to the buffer to be written to user space,
according to its length, not just the part that fits in struct
sockaddr.

This fixes the 'bad MAC address' problem in dhclient.
 1.126  02-Sep-2007  dyoung Protect userland from ifreq_getaddr() w/ #ifdef _KERNEL.
 1.125  31-Aug-2007  dyoung Per discussion on 30 May 2007 on tech-net, add accessors for
ifreq->ifr_addr, ifreq_getaddr() and ifreq_setaddr().
 1.124  29-May-2007  christos branches: 1.124.2; 1.124.6; 1.124.8;
Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.123  04-Mar-2007  christos branches: 1.123.2; 1.123.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.122  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.121  23-Nov-2006  yamt branches: 1.121.4;
implement ipv6 TSO.
partly from Matthias Scheler. tested by him.
 1.120  13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.119  30-Aug-2006  christos branches: 1.119.2; 1.119.4;
fully initialize IF_CLONE_INITIALIZER
 1.118  25-Jun-2006  yamt add a comment on if_agrprivate.
 1.117  23-Jun-2006  drochner remove dependency on "agr" to make "struct ifnet" independent of the
kernel configuration, avoids kernel/userland mismatches, ok by christos
 1.116  18-May-2006  liamjfoy branches: 1.116.4;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.115  16-Mar-2006  christos branches: 1.115.2;
Remove duplicate and slightly different declaration of ether_sprintf, which
really should be in if_ether.h like all the other ether_ functions.
 1.114  11-Dec-2005  thorpej branches: 1.114.4; 1.114.6; 1.114.8; 1.114.10;
ANSI function decls and application of static.
 1.113  11-Dec-2005  christos merge ktrace-lwp.
 1.112  06-Dec-2005  christos make the ALTQ macros statement-like, by wrapping them in do {} while (0)
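
The reason for the do {} while (0) wrapper shows up as soon as a multi-statement macro is used under an unbraced `if`. The macros below are hypothetical examples, not the ALTQ macros themselves.

```c
#include <assert.h>
#include <stddef.h>

struct toy_q { int len; int total; };

/* Two statements without a wrapper: only the first is governed by an `if`. */
#define TOY_COUNT_BAD(q)  (q)->len++; (q)->total++

/* Wrapped version: behaves like a single statement. */
#define TOY_COUNT_GOOD(q) do { (q)->len++; (q)->total++; } while (0)

void toy_bad(struct toy_q *q, int cond)
{
    assert(q != NULL);
    if (cond)
        TOY_COUNT_BAD(q);   /* expands so that total++ always runs */
}

void toy_good(struct toy_q *q, int cond)
{
    assert(q != NULL);
    if (cond)
        TOY_COUNT_GOOD(q);  /* both increments are skipped when cond is 0 */
}
```

With `cond == 0`, `toy_bad()` still increments `total`, while `toy_good()` leaves the queue untouched. The wrapped form also composes correctly with a following `else`.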
 1.111  27-Jul-2005  dyoung Add members ifr_buf, ifr_buflen to ifreq for specifying the location
and size of a userland buffer. The kernel shall not copyout more
than ifr_buflen bytes to ifr_buf. For future ioctls that use
ifr_buf and ifr_buflen instead of ifr_data, the kernel can return
a larger struct in the future than when the ioctl is introduced,
without breaking ABI compatibility, provided that the size, order,
and semantics of the fields at the front of the struct do not
change.
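
The compatibility argument above rests on a bounded copy: the kernel may grow the reply struct at the end, and an older caller simply receives the prefix it knows about. A sketch under stated assumptions: `toy_copyout` is a user-space stand-in for the kernel's copy-to-user step, and the two struct versions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reply struct: version 2 only appends a field. */
struct toy_stats_v1 { unsigned packets; };
struct toy_stats_v2 { unsigned packets; unsigned drops; };

/* Copy at most buflen bytes, as the text requires; return the bytes copied. */
size_t toy_copyout(const void *src, size_t srclen, void *buf, size_t buflen)
{
    size_t n = srclen < buflen ? srclen : buflen;

    assert(src != NULL && buf != NULL);
    memcpy(buf, src, n);
    return n;
}
```

An old program passing a `toy_stats_v1` buffer to a newer kernel that produces `toy_stats_v2` still gets a correct `packets` value and no overrun.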
 1.110  22-Jun-2005  dyoung branches: 1.110.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.109  19-Jun-2005  peter Use 'pattr' consistently in the IFQ_* macros.
 1.108  02-May-2005  yamt split IFCAP_CSUM_xxx to IFCAP_CSUM_xxx_Rx and IFCAP_CSUM_xxx_Tx.
 1.107  31-Mar-2005  christos factor out the interface queueing code into two functions. One used by
the non point-to-point interfaces that have one queue, and one used by
the point to point interfaces that have two queues. No functional changes.
XXX: The ALTQ stuff makes the code ugly.
XXX: More cleanup to come
 1.106  20-Mar-2005  agc Fix the spelling of Bill Studenmund's name - noticed from the licences
on the Sony PSP as found in:

http://www.scei.co.jp/psp-license/pspnet.txt
 1.105  20-Mar-2005  thorpej Define IFFBITS and IFCAPBITS here in <net/if.h>. Taken from ifconfig.
 1.104  18-Mar-2005  yamt add agr(4), a pseudo network device driver for link aggregation.
 1.103  06-Mar-2005  matt Add beginning of TCP Segment Offload support.
 1.102  28-Feb-2005  jonathan Increase default value for IFQ_MAXLEN from 50 to 256.

The value of 50 dates back to 4.3BSD and 10Mbit interfaces.
Gigabit interfaces are 100x faster, and by observation, when heavy
interrupt mitigation is enabled, gigabit interfaces can enqueue 40 packets
or more in a single hardware interrupt. So IFQ_MAXLEN of 256 is adequate
for at least four gigabit interfaces.

Increasing IFQ_MAXLEN discussed and approved, in principle, circa Apr 2004.
The value is sysctl'able, so the default is no longer so critical,
but (imho) best to tune for high-performance systems by default.
 1.101  26-Feb-2005  perry nuke trailing whitespace
 1.100  24-Jan-2005  matt branches: 1.100.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.99  08-Jan-2005  yamt branches: 1.99.2;
constify broadcastaddr.
 1.98  04-Dec-2004  peter Change ifc_destroy to return an int instead of void, so that it
can pass back errors to ifconfig.
 1.97  04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.96  21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.95  10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check whether an interface exists, because (1) if_index has a different
meaning and (2) ifindex2ifnet[x] can become NULL when an interface is
destroyed, now that we have dynamically-created interfaces. from kame
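
The existence check described above amounts to a bounds test against the table size followed by a NULL test on the slot. A minimal sketch with hypothetical names (`TOY_INDEXLIM`, `toy_ifindex2ifnet`, `toy_if_byindex`) and a fixed-size table; the real table is sized dynamically.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ifnet { unsigned index; };

#define TOY_INDEXLIM 8      /* number of slots in the table */
static struct toy_ifnet *toy_ifindex2ifnet[TOY_INDEXLIM];

void toy_attach(struct toy_ifnet *ifp)
{
    assert(ifp->index < TOY_INDEXLIM);
    toy_ifindex2ifnet[ifp->index] = ifp;
}

/* Destroying an interface leaves a NULL hole in the table. */
void toy_detach(struct toy_ifnet *ifp)
{
    assert(ifp->index < TOY_INDEXLIM);
    toy_ifindex2ifnet[ifp->index] = NULL;
}

/* An interface exists only if the index is in range and the slot is set. */
struct toy_ifnet *toy_if_byindex(unsigned idx)
{
    if (idx >= TOY_INDEXLIM)
        return NULL;
    return toy_ifindex2ifnet[idx];
}
```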
 1.94  28-Nov-2003  keihan s/netbsd.org/NetBSD.org/g
 1.93  10-Nov-2003  jonathan Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
 1.92  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.91  03-Jul-2003  ragge Make IFQ_MAXLEN possible to set as an config-file option.
 1.90  29-Jun-2003  fvdl branches: 1.90.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.89  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.88  30-Apr-2003  bjh21 Expose IF_NAMESIZE for POSIX and X/Open applications.
 1.87  28-Apr-2003  bjh21 Add a new feature-test macro, _NETBSD_SOURCE. If this is defined
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defines _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.

This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.

I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.

Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
 1.86  05-Mar-2003  christos Fix the fallout from potr malloc changes
 1.85  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.84  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.83  02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.82  26-Aug-2002  thorpej Fix signed/unsigned comparison warnings from GCC 3.3.
 1.81  09-Aug-2002  soren <net/if.h> needs <sys/socket.h> for struct sockaddr.
PR kern/3377 from der Mouse.
 1.80  23-Jun-2002  itojun g/c last bit of old ipv6 prefix management.
 1.79  11-Jun-2002  pooka s/splimp/splnet/ in comment
 1.78  27-May-2002  itojun re-scan all ifnet after domaininit() for if_afdata initialization.
 1.77  27-May-2002  itojun framework to add af-dependent data structure to struct ifnet.
as discussed at bsd-api-discuss. sync w/kame
 1.76  23-May-2002  matt Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing with all-numeric ifnames by treating them as an ifindex
which is used to look up the interface.
 1.75  17-Mar-2002  simonb branches: 1.75.4; 1.75.6;
Make the 'ifnet' variable an extern and declare it in if.c.
 1.74  17-Sep-2001  thorpej Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.73  14-Jun-2001  itojun branches: 1.73.2; 1.73.4;
fix comment on ifi_lastchange, for 1.4 if_data
 1.72  14-Jun-2001  itojun update comment on if_lastchange
 1.71  11-Jun-2001  wiz Fix various misspellings of compatible/compatibility.
 1.70  02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
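The delayed-checksum idea described in revision 1.70 can be illustrated with a minimal sketch. This is not the NetBSD code: the helper names (sum_words, fold16, pseudo_hdr_sum, finish_cksum) are hypothetical, addresses are assumed to be IPv4 values in host order, and UDP's special case of sending 0xffff in place of a computed zero is omitted. The pseudo-header sum is computed once and cached; the sum over the segment is added later, in software, only when the hardware cannot do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of buf to a running 32-bit sum. */
static uint32_t sum_words(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)		/* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the carries of a 32-bit sum back into 16 bits (one's complement). */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Step 1: pre-compute the pseudo-header sum (source, destination,
 * protocol, segment length). The result is NOT complemented, so it
 * can be cached and used later as the starting value of the real sum.
 */
static uint16_t pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto,
    uint16_t seglen)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;
	sum += seglen;
	return fold16(sum);
}

/*
 * Step 2 (delayed): add the segment, whose checksum field must still
 * be zero, to the cached value and complement the result.
 */
static uint16_t finish_cksum(uint16_t cached, const uint8_t *seg, size_t len)
{
	return (uint16_t)~fold16(sum_words(cached, seg, len));
}
```

A receiver can verify a segment the same way: summing the cached pseudo-header value and the segment with the checksum filled in folds to 0xffff.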
 1.69  30-May-2001  mrg use _KERNEL_OPT
 1.68  10-Apr-2001  enami fix possible typo in comment.
 1.67  10-Apr-2001  thorpej Add a PFIL_HOOKS filtering point to every network interface.
 1.66  07-Apr-2001  thorpej ether_*() functions belong in if_ether.h, not if.h.
 1.65  17-Jan-2001  itojun branches: 1.65.2;
move forward decl of rt_addrinfo upwards.
 1.64  17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding the 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.63  17-Jan-2001  thorpej Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.62  23-Dec-2000  thorpej Fix a silly bug in the ALTQ version of IFQ_DEQUEUE().
 1.61  18-Dec-2000  thorpej Add an "ifr_dlt" alias for the union in struct ifreq.
 1.60  18-Dec-2000  thorpej Always pull in DLT_* constants.
 1.59  18-Dec-2000  thorpej Add a if_dlt member, used so that userland can query the DLT_* of an
interface without having to first attach it to a bpfdesc.
 1.58  18-Dec-2000  thorpej Commit to the ALTQ glue.
 1.57  14-Dec-2000  thorpej Fix braino in IF_PURGE().
 1.56  14-Dec-2000  thorpej Oops, forgot IFQ_POLL() in the ALTQ case.
 1.55  13-Dec-2000  thorpej First step at integrating ALTQ -- IFQ_*() glue macros that select
old-style queueing or ALTQ based on a compile time option.
 1.54  11-Oct-2000  thorpej Change the if_reset vector to if_init, and add an if_stop. if_stop
also takes an argument indicating whether or not the interface should
also be disabled (i.e. power removed, resources freed, etc.)
 1.53  20-Jul-2000  thorpej Add a SIOCGIFCLONERS ioctl, which fetches a list of network
interface cloners from the kernel.
 1.52  04-Jul-2000  thorpej Don't allow IFF_PROMISC to be changed directly by userspace. It
interferes with the reference counting done by ifpromisc(), and is
essentially impossible to get the semantics correct if we allow this
flag to be directly toggled.

No programs should really be affected by this; IFF_PROMISC is basically
useless without bpf, anyway, and bpf still provides a way to set
promiscuous mode on an interface (which uses ifpromisc()).
 1.51  02-Jul-2000  thorpej Add the notion of "cloning" of network pseudo-interface (e.g. `gif').
This allows them to be created and destroyed on the fly via ifconfig(8),
rather than specifying the count in the kernel configuration file.
 1.50  15-May-2000  itojun branches: 1.50.4;
backout previous (packed attribute to struct ifreq)
 1.49  15-May-2000  itojun add packed attribute to struct ifreq. this should avoid unaligned access
while parsing SIOCGIFCONF, on alignment-picky archs.
 1.48  29-Mar-2000  simonb Extern the declarations of ifindex2ifnet and if_index.
 1.47  22-Mar-2000  itojun remove if_withname, which was merged in by mistake during KAME merge.
 1.46  06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.45  06-Mar-2000  kleink Make pre-1.5 compatibility structures being defined conditional on _KERNEL
as well.
 1.44  01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.43  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.42  19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.41  26-Oct-1999  wrstuden Up the size of the ifa_flags and ifa_refcnt from shorts to ints. Now will
deal correctly with more than 32767 routes out an interface.

Should close PR 7148 regarding problems when ifa_refcnt overflows.

Bump kernel version from 1.4L to 1.4M.
 1.40  29-Sep-1999  thorpej branches: 1.40.2; 1.40.4; 1.40.6;
const poison ifunit().
 1.39  21-Sep-1999  matt Add a ifru_value (unsigned int) as a generic value.
 1.38  03-Jul-1999  kleink Add namespace protection, using XNS5.2 D2.0 as a reference (which effectively
boils down to not making anything but the if_nameindex(3) interfaces available
to _XOPEN_SOURCE).
 1.37  01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.36  18-May-1999  thorpej Rework layer 2 protocol input routines. Instead of calling e.g. ether_input()
directly, call the function pointer (*if_input)(ifp, m). The input routine
expects the packet header to be at the head of the packet, and will adjust
as necessary. Privatize the layer 2 input and output routines, allowing
*_ifattach() to set them up as appropriate.
 1.35  27-Mar-1999  aidan branches: 1.35.2; 1.35.4; 1.35.6;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.
 1.34  10-Mar-1999  thorpej Const poison ether_ifattach().
 1.33  10-Mar-1999  thorpej Const poison ether_sprintf().
 1.32  22-May-1998  matt branches: 1.32.6;
Add an if_drain to the ifnet structure (call when the system is low
on mbufs). Add code to m_reclaim to call if_drain in each ifnet
that has one set. Remove register from declarations.
 1.31  14-May-1998  kml Driver for Essential Communications' RoadRunner HIPPI (800 Mb/sec network)
card. With some modification, this could probably also work for their
Gigabit Ethernet card based on the same chipset...
 1.30  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.29  02-Oct-1997  is Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.28  08-Apr-1997  chuck branches: 1.28.4;
prevent multiple inclusions
 1.27  17-Mar-1997  thorpej BSD/OS-style network interface media selection, implemented by
Jonathan Stone and myself. Many thanks to Matt Thomas for providing
the information necessary to implement this interface, and for helping
to shake out the bugs.
 1.26  15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.25  15-Jan-1997  gwr branches: 1.25.2;
fix alignment again for m68k
 1.24  13-Jun-1996  cgd branches: 1.24.2;
add an ifru_mtu member to the union in 'struct ifreq', and add a
#define so that ifr_mtu accesses that. MTU shouldn't be overloaded
with ifr_metric, if only for clarity. Adding an MTU field to the
union hurts nothing (in fact, does not actually _change_ generated
code), and does improve clarity.
 1.23  07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.22  26-Feb-1996  mrg two more local addr changes, all done differently now (idea from charles)
 1.21  17-Feb-1996  pk struct ifaliasreq: adapt nomenclature to protocol specific counterparts, ie.
swap `ifra_broadaddr' and `ifra_dstaddr'.
 1.20  13-Feb-1996  christos Net prototypes
 1.19  19-Jun-1995  cgd oops; export that head definition to non-kernel code.
 1.18  19-Jun-1995  cgd define a type for the ifnet queue's head.
 1.17  12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.16  07-Apr-1995  mycroft if_start and if_watchdog should return void.
 1.15  26-Mar-1995  jtc KERNEL -> _KERNEL
 1.14  08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.13  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.12  19-Oct-1994  cgd fix pr 528; don't define struct if_data inside another structure.
 1.11  26-Jul-1994  cgd kill vax code, at ragge's request.
 1.10  29-Jun-1994  cgd branches: 1.10.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8  16-Feb-1994  mycroft IFF_ALLMULTI is not externally settable.
 1.7  10-Feb-1994  mycroft if_init and if_done are not actually used; no point in having them at all.
 1.6  10-Dec-1993  cgd slight fix to last
 1.5  10-Dec-1993  cgd the IFF_MULTICAST constant should always be defined. also,
move IFF_LLC* -> IFF_LINK*; they were misnamed.
 1.4  06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3  20-May-1993  cgd branches: 1.3.4;
add rcs ids to everything, and clean up headers
 1.2  19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.4  10-Dec-1993  cgd LLC -> LINK
 1.3.4.3  14-Nov-1993  mycroft Canonicalize all #includes.
 1.3.4.2  03-Nov-1993  mycroft if_init and if_done aren't actually used anywhere; nuke them. if_start and
if_watchdog return void.
 1.3.4.1  29-Oct-1993  mycroft Make if_reset #ifdef vax. (Note: this shifts struct ifnet; rebuild your
kernels from scratch.)
 1.10.2.1  14-Aug-1994  mycroft update from trunk (to remove ancient vax stuff)
 1.24.2.1  18-Jan-1997  thorpej Update from trunk.
 1.25.2.1  07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.28.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.32.6.1  11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.35.6.2  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.35.6.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.35.4.3  02-Aug-1999  thorpej Update from trunk.
 1.35.4.2  01-Jul-1999  thorpej Sync w/ -current.
 1.35.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.35.2.1  11-May-2000  he Pull up revision 1.46 (partial, via patch, requested by jhawk):
Add a driver for ``wi'', Lucent "Orinoco"/Wavelan.
 1.40.6.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.40.4.1  15-Nov-1999  fvdl Sync with -current
 1.40.2.5  21-Apr-2001  bouyer Sync with HEAD
 1.40.2.4  11-Feb-2001  bouyer Sync with HEAD.
 1.40.2.3  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.40.2.2  05-Jan-2001  bouyer Sync with HEAD
 1.40.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.50.4.1  31-Dec-2000  jhawk Pull up revision 1.51, 1.53 (requested by bouyer):
Support cloning of network pseudo-interfaces.
 1.65.2.9  11-Nov-2002  nathanw Catch up to -current
 1.65.2.8  27-Aug-2002  nathanw Catch up to -current.
 1.65.2.7  13-Aug-2002  nathanw Catch up to -current.
 1.65.2.6  01-Aug-2002  nathanw Catch up to -current.
 1.65.2.5  20-Jun-2002  nathanw Catch up to -current.
 1.65.2.4  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.65.2.3  21-Sep-2001  nathanw Catch up to -current.
 1.65.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.65.2.1  09-Apr-2001  nathanw Catch up with -current.
 1.73.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.73.2.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.73.2.2  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.73.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.75.6.1  01-Nov-2002  tron Pull up revision 1.76 (requested by martin in ticket #32):
Add SIOCGIFDATA and SIOCZIFDATA ioctl's to get interface data. (the Z
variant also zeroes the counters after copying them). In ifunit, add
support for dealing with all-numeric ifnames by treating them as an ifindex
which is used to look up the interface.
 1.75.4.4  29-Aug-2002  gehenna catch up with -current.
 1.75.4.3  15-Jul-2002  gehenna catch up with -current.
 1.75.4.2  20-Jun-2002  gehenna catch up with -current.
 1.75.4.1  30-May-2002  gehenna Catch up with -current.
 1.90.2.12  11-Dec-2005  christos Sync with head.
 1.90.2.11  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.90.2.10  01-Apr-2005  skrll Sync with HEAD.
 1.90.2.9  08-Mar-2005  skrll Sync with HEAD.
 1.90.2.8  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.90.2.7  04-Feb-2005  skrll Sync with HEAD.
 1.90.2.6  17-Jan-2005  skrll Sync with HEAD.
 1.90.2.5  18-Dec-2004  skrll Sync with HEAD.
 1.90.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.90.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.90.2.2  03-Aug-2004  skrll Sync with HEAD
 1.90.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.99.2.1  29-Apr-2005  kent sync with -current
 1.100.2.2  26-Mar-2005  yamt sync with head.
 1.100.2.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.110.2.9  11-Feb-2008  yamt sync with head.
 1.110.2.8  04-Feb-2008  yamt sync with head.
 1.110.2.7  21-Jan-2008  yamt sync with head
 1.110.2.6  07-Dec-2007  yamt sync with head
 1.110.2.5  27-Oct-2007  yamt sync with head.
 1.110.2.4  03-Sep-2007  yamt sync with head.
 1.110.2.3  26-Feb-2007  yamt sync with head.
 1.110.2.2  30-Dec-2006  yamt sync with head.
 1.110.2.1  21-Jun-2006  yamt sync with head.
 1.114.10.1  19-Apr-2006  elad sync with head.
 1.114.8.4  03-Sep-2006  yamt sync with head.
 1.114.8.3  26-Jun-2006  yamt sync with head.
 1.114.8.2  24-May-2006  yamt sync with head.
 1.114.8.1  01-Apr-2006  yamt sync with head.
 1.114.6.2  01-Jun-2006  kardel Sync with head.
 1.114.6.1  22-Apr-2006  simonb Sync with head.
 1.114.4.1  09-Sep-2006  rpaulo sync with head
 1.115.2.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.116.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.119.4.1  10-Dec-2006  yamt sync with head.
 1.119.2.2  12-Jan-2007  ad Sync with head.
 1.119.2.1  18-Nov-2006  ad Sync with head.
 1.121.4.2  12-Mar-2007  rmind Sync with HEAD.
 1.121.4.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.123.4.1  11-Jul-2007  mjf Sync with head.
 1.123.2.2  09-Oct-2007  ad Sync with head.
 1.123.2.1  09-Jun-2007  ad Sync with head.
 1.124.8.3  23-Mar-2008  matt sync with HEAD
 1.124.8.2  09-Jan-2008  matt sync with HEAD
 1.124.8.1  06-Nov-2007  matt sync with HEAD
 1.124.6.3  09-Dec-2007  jmcneill Sync with HEAD.
 1.124.6.2  02-Oct-2007  joerg Sync with HEAD.
 1.124.6.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.124.2.1  03-Sep-2007  skrll Sync with HEAD.
 1.127.8.2  26-Dec-2007  ad Sync with head.
 1.127.8.1  08-Dec-2007  ad Sync with head.
 1.127.6.3  18-Feb-2008  mjf Sync with HEAD.
 1.127.6.2  27-Dec-2007  mjf Sync with HEAD.
 1.127.6.1  08-Dec-2007  mjf Sync with HEAD.
 1.130.4.2  23-Jan-2008  bouyer Sync with HEAD.
 1.130.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.134.12.5  11-Mar-2010  yamt sync with head
 1.134.12.4  16-Sep-2009  yamt sync with head
 1.134.12.3  19-Aug-2009  yamt sync with head.
 1.134.12.2  04-May-2009  yamt sync with head.
 1.134.12.1  16-May-2008  yamt sync with head.
 1.134.10.3  17-Jun-2008  yamt sync with head.
 1.134.10.2  18-May-2008  yamt sync with head.
 1.134.10.1  19-Apr-2008  yamt Peter Postma's work-in-progress pf import from OpenBSD 4.2.
updated to -current by me.
 1.134.8.3  09-Nov-2008  christos merge with head.
 1.134.8.2  01-Nov-2008  christos Sync with head.
 1.134.8.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.134.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.134.6.2  29-Jun-2008  mjf Sync with HEAD.
 1.134.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.135.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.137.2.1  18-Jun-2008  simonb Sync with head.
 1.139.2.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.140.8.3  24-Dec-2011  matt Make this compile if COMPAT_14 is defined.
 1.140.8.2  13-May-2010  matt Add a spare int field to ifa_msghdr so its length is a multiple of 8.
 1.140.8.1  11-May-2010  matt A few changes that make the route interface and related sysctls 32/64 bit
independent so the netbsd32 userland can use them.
 1.140.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.146.4.1  05-Mar-2011  rmind sync with head
 1.146.2.1  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.148.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.149.2.1  08-Feb-2011  bouyer Sync with HEAD
 1.154.12.3  03-Dec-2017  jdolecek update from HEAD
 1.154.12.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.154.12.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.154.8.1  16-Apr-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #1289):
sys/net/if.h: revision 1.186
Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
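The overflow that revision 1.186 guards against can be reproduced with a minimal sketch. The two functions below are hypothetical stand-ins, not the actual IF_Gbps() macro: the first keeps every operand 32 bits wide, so 10 Gbit/s (10^10) is reduced modulo 2^32; the second widens the arithmetic with a 1000ULL multiplier, which is the fix the commit message names.

```c
#include <assert.h>
#include <stdint.h>

/* The C standard already guarantees this; stated here to document the assumption. */
static_assert(sizeof(unsigned long long) >= 8, "need a 64-bit type");

/*
 * Hypothetical 32-bit variant: all operands are 32 bits wide, so the
 * product wraps modulo 2^32 once it exceeds 4294967295.
 */
static uint32_t gbps_32bit(uint32_t n)
{
	return (uint32_t)(n * UINT32_C(1000) * UINT32_C(1000) * UINT32_C(1000));
}

/*
 * Hypothetical widened variant: the 1000ULL constant converts n to
 * unsigned long long before the first multiplication, so the whole
 * product is computed in at least 64 bits.
 */
static uint64_t gbps_64bit(uint32_t n)
{
	return n * 1000ULL * 1000 * 1000;
}
```

With these definitions, gbps_32bit(10) yields 1410065408 rather than 10000000000, while gbps_64bit(10) yields the intended value.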
 1.154.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.154.2.1  30-Oct-2012  yamt sync with head
 1.155.2.3  18-May-2014  rmind sync with head
 1.155.2.2  28-Aug-2013  rmind sync with head
 1.155.2.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.161.2.1  10-Aug-2014  tls Rebase.
 1.174.2.1  16-Apr-2015  snj Pull up following revision(s) (requested by msaitoh in ticket #693):
sys/net/if.h: revision 1.186
Use 1000ULL to prevent integer overflow (for IF_Gbps(10)). Same as OpenBSD.
 1.181.2.12  28-Aug-2017  skrll Sync with HEAD
 1.181.2.11  05-Feb-2017  skrll Sync with HEAD
 1.181.2.10  05-Dec-2016  skrll Sync with HEAD
 1.181.2.9  05-Oct-2016  skrll Sync with HEAD
 1.181.2.8  09-Jul-2016  skrll Sync with HEAD
 1.181.2.7  29-May-2016  skrll Sync with HEAD
 1.181.2.6  22-Apr-2016  skrll Sync with HEAD
 1.181.2.5  19-Mar-2016  skrll Sync with HEAD
 1.181.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.181.2.3  22-Sep-2015  skrll Sync with HEAD
 1.181.2.2  06-Jun-2015  skrll Sync with HEAD
 1.181.2.1  06-Apr-2015  skrll Sync with HEAD
 1.221.2.6  26-Apr-2017  pgoyette Sync with HEAD
 1.221.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.221.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.221.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.221.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.221.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.233.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.239.2.8  24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) supports input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.239.2.7  13-Jul-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #911):

sys/kern/init_main.c: revision 1.498
sys/rump/net/lib/libnet/net_component.c: revision 1.10
sys/net/if.h: revision 1.264
sys/net/if.c: revision 1.429

Fix net.inet6.ip6.ifq node doesn't exist

The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.
sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.

Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.

Pointed out by hikaru@
 1.239.2.6  13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires some lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if NET_MPSAFE is not defined, it is acquired in
ipintr(); otherwise (NET_MPSAFE defined) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.239.2.5  14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of an rtentry.
 1.239.2.4  11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.239.2.3  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
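The macro-wrapping idea described above can be sketched in userland roughly as follows. This is only an illustration under stated assumptions: the kernel lock is replaced by a hypothetical nesting counter (fake_kernel_lock_depth), and the FAKE_* macro names are stand-ins invented for this sketch, not the definitions in if.h.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel lock: a simple nesting counter. */
static int fake_kernel_lock_depth = 0;

static void
fake_kernel_lock(void)
{
	fake_kernel_lock_depth++;
}

static void
fake_kernel_unlock(void)
{
	assert(fake_kernel_lock_depth > 0);
	fake_kernel_lock_depth--;
}

/*
 * The NET_MPSAFE switch is tested in exactly one place. Callers use the
 * macros and never repeat the #ifndef themselves.
 */
#ifdef NET_MPSAFE
#define FAKE_KERNEL_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define FAKE_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define FAKE_KERNEL_LOCK_UNLESS_NET_MPSAFE()	fake_kernel_lock()
#define FAKE_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	fake_kernel_unlock()
#endif

/* Returns the lock depth observed inside the protected region. */
static int
protected_operation(void)
{
	int depth;

	FAKE_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = fake_kernel_lock_depth;
	FAKE_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Built without NET_MPSAFE defined, protected_operation() observes a depth of 1; built with it, the macros expand to nothing and the depth stays 0. Searching for the macro names then finds every lock site that is still conditional on the switch.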
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.239.2.2  10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() failed when resource allocation failed
(e.g. allocating a softint). Without this change, the kernel panics. That is bad
because resource shortage really occurs when a lot of pseudo interfaces are
created. To avoid this problem, don't panic and change the return value of
if_initialize() and if_attach() to int. Caller functions can then recover from
the error cleanly by checking the return value.
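The caller-side pattern this change enables might look roughly like the userland sketch below. Everything in it is hypothetical: struct fake_softc, fake_if_initialize(), fake_attach() and fake_detach() are invented for illustration and are not NetBSD interfaces. It only shows an attach routine checking an int return value and releasing what it had already allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical driver state; not a real NetBSD structure. */
struct fake_softc {
	void *sc_buf;		/* resource allocated before initialization */
	bool  sc_attached;
};

/* Stand-in for an int-returning initializer: fails on request. */
static int
fake_if_initialize(bool simulate_shortage)
{
	return simulate_shortage ? ENOMEM : 0;
}

/* On failure, free what was allocated so far and report the error. */
static int
fake_attach(struct fake_softc *sc, bool simulate_shortage)
{
	int error;

	assert(sc != NULL);
	sc->sc_attached = false;
	sc->sc_buf = malloc(64);
	if (sc->sc_buf == NULL)
		return ENOMEM;

	error = fake_if_initialize(simulate_shortage);
	if (error != 0) {
		free(sc->sc_buf);
		sc->sc_buf = NULL;
		return error;
	}

	sc->sc_attached = true;
	return 0;
}

/* Release the resources of a successfully attached instance. */
static void
fake_detach(struct fake_softc *sc)
{
	assert(sc != NULL);
	free(sc->sc_buf);
	sc->sc_buf = NULL;
	sc->sc_attached = false;
}
```

Calling fake_attach(&sc, true) returns ENOMEM and leaves sc.sc_buf set to NULL, so the caller can bail out without leaking the buffer and without a panic.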
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is always not NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
 1.239.2.1  01-Jul-2017  snj Pull up following revision(s) (requested by roy in ticket #77):
sys/net/if.h: revision 1.240
sys/netinet/if_arp.c: revision 1.253
sys/net/if.c: revision 1.395
Introduce if_get_bylla to find an interface with the active
local link address.
--
Use if_get_bylla() instead of just looking at the lla of the interface
the address belongs to.
This allows any ARP message we received from another interface to
be correctly dropped.
While here, move the protocol length check higher up the food chain.
 1.258.2.15  20-Oct-2018  pgoyette Sync with head
 1.258.2.14  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.258.2.13  28-Jul-2018  pgoyette Sync with HEAD
 1.258.2.12  25-Jun-2018  pgoyette Sync with HEAD
 1.258.2.11  02-May-2018  pgoyette Synch with HEAD
 1.258.2.10  22-Apr-2018  pgoyette Sync with HEAD
 1.258.2.9  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.258.2.8  08-Mar-2018  pgoyette Handle ifconf() compat vectors
 1.258.2.7  06-Mar-2018  pgoyette Declare it correctly
 1.258.2.6  06-Mar-2018  pgoyette Declare the compat_ifconf vector, not the stub.
 1.258.2.5  06-Mar-2018  pgoyette And we need the oifreq definition here, too
 1.258.2.4  06-Mar-2018  pgoyette Better to add these required headers closer to where they're needed
 1.258.2.3  06-Mar-2018  pgoyette And another required header
 1.258.2.2  06-Mar-2018  pgoyette Include necessary header
 1.258.2.1  06-Mar-2018  pgoyette Move indirect function call vectors to if.h where they can be
found by the code that manipulates them.
 1.263.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.263.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.263.2.1  10-Jun-2019  christos Sync with HEAD
 1.274.2.1  24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
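The indirection described above can be illustrated with a userland analogy. The sketch below does not use percpu(9); struct fake_storage, struct fake_cache and the fake_* functions are hypothetical names made up for this example. The storage area is grown by allocating a new area, copying, and destroying the old one, while each slot holds only a pointer to a separately allocated object, so a pointer to that object obtained earlier remains usable after the area is replaced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached object; stands in for a route cache. */
struct fake_cache {
	int fc_value;
};

/* Hypothetical growable storage that holds only pointers to objects. */
struct fake_storage {
	struct fake_cache **fs_slots;
	size_t fs_nslots;
};

static int
fake_storage_init(struct fake_storage *fs, size_t nslots)
{
	fs->fs_slots = calloc(nslots, sizeof(*fs->fs_slots));
	if (fs->fs_slots == NULL)
		return -1;
	fs->fs_nslots = nslots;
	return 0;
}

/* Grow: allocate a larger area, copy the slots, destroy the old area. */
static int
fake_storage_grow(struct fake_storage *fs, size_t nslots)
{
	struct fake_cache **n;

	assert(nslots >= fs->fs_nslots);
	n = calloc(nslots, sizeof(*n));
	if (n == NULL)
		return -1;
	memcpy(n, fs->fs_slots, fs->fs_nslots * sizeof(*n));
	free(fs->fs_slots);
	fs->fs_slots = n;
	fs->fs_nslots = nslots;
	return 0;
}

/* Allocate the object separately and store only its pointer in the slot. */
static struct fake_cache *
fake_cache_attach(struct fake_storage *fs, size_t slot, int value)
{
	struct fake_cache *fc;

	assert(slot < fs->fs_nslots);
	fc = malloc(sizeof(*fc));
	if (fc == NULL)
		return NULL;
	fc->fc_value = value;
	fs->fs_slots[slot] = fc;
	return fc;
}

/* Free every attached object, then the slot area itself. */
static void
fake_storage_destroy(struct fake_storage *fs)
{
	for (size_t i = 0; i < fs->fs_nslots; i++)
		free(fs->fs_slots[i]);
	free(fs->fs_slots);
	fs->fs_slots = NULL;
	fs->fs_nslots = 0;
}
```

Had the slot area held struct fake_cache objects by value, a pointer taken before fake_storage_grow() would refer to freed memory afterwards; with the extra level of indirection only the slot array moves.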

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.277.2.1  29-Feb-2020  ad Sync with head.
 1.289.8.1  31-May-2021  cjep sync with head
 1.289.6.2  01-Aug-2021  thorpej Sync with HEAD.
 1.289.6.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.305.2.1  11-Nov-2023  thorpej branches: 1.305.2.1.2;
Mostly de-tangle ifnet::if_snd from ifaltq, in a way that's minimally-
invasive to the ALTQ code itself.

The point of this is to lay the groundwork for future changes to ifqueue,
which among other benefits, will also hide the ALTQ ABI from drivers.
 1.305.2.1.2.5  16-Nov-2023  thorpej if_transmit_lock() and if_enqueue() are equivalent. if_enqueue() is
a better name, so collapse everything down to that and garbage-collect
if_transmit_lock().
 1.305.2.1.2.4  16-Nov-2023  thorpej IFQ_CLASSIFY() -> ifq_classify_packet().
 1.305.2.1.2.3  15-Nov-2023  thorpej Protect the ALTQ state that's exposed to the ifqueue with the ifq->ifq_lock.
This requires exposing some implementation details to ALTQ, which is guarded
by an __IFQ_PRIVATE define.
 1.305.2.1.2.2  15-Nov-2023  thorpej Rename ifq_enqueue() -> if_enqueue(), ifq_enqueue2() -> if_enqueue2().
 1.305.2.1.2.1  14-Nov-2023  thorpej New network interface output queue API.
