History log of /src/sys/netinet/ip_flow.c
Revision  Date  Author  Comments
 1.86  29-Jun-2024  riastradh netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.85  19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
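A minimal sketch of the alignment test described in 1.85, assuming C11 alignof in place of the compiler's __alignof. The SKETCH_* macros and struct three_words are hypothetical and are not NetBSD's definitions. The size-based variant is included only to show why sizeof makes a bad mask for a 12-byte struct whose ABI alignment is 4.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Hypothetical re-creation of the 1.85 idea: test a pointer against the
 * ABI alignment of type t (alignof) rather than against its size.
 */
#define SKETCH_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* Size-based variant, shown only to illustrate why it was replaced. */
#define SKETCH_SIZEOF_ALIGNED(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/*
 * ACCESSIBLE_POINTER-style wrapper: on CPUs that tolerate unaligned
 * access (assumed here to be signalled by __NO_STRICT_ALIGNMENT),
 * every pointer counts as accessible.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define SKETCH_ACCESSIBLE_POINTER(p, t) 1
#else
#define SKETCH_ACCESSIBLE_POINTER(p, t) SKETCH_ALIGNED_POINTER(p, t)
#endif

/* sizeof is 12 (not a power of two); alignof is 4 on common ABIs. */
struct three_words {
	uint32_t a, b, c;
};
```

With a 16-byte-aligned buffer, buf + 8 passes the alignof test but fails the sizeof test, because 12 - 1 = 0xb is not a valid power-of-two mask.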
 1.84  15-Feb-2021  knakahara Fix build failure for options GATEWAY.
 1.83  14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in half of the macros.
- fix an issue in the ipv6 l2tp code, where it was aligning for ipv4 and pulling
for ipv6.
 1.82  11-Apr-2018  maxv branches: 1.82.14;
Remove whitespaces/tabs, and one non-ASCII character.
 1.81  17-Nov-2017  ozaki-r branches: 1.81.2;
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code, and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock uses that are held even if NET_MPSAFE is enabled.

No functional change
 1.80  07-Feb-2017  ozaki-r branches: 1.80.6;
Add missing NULL checks for m_get_rcvif
 1.79  11-Jan-2017  ozaki-r branches: 1.79.2;
Get rid of unnecessary header inclusions
 1.78  08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of the changes for an MP-safe routing table. It is separated
out to avoid one big change that would be difficult to debug by bisecting.
 1.77  18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.76  01-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds IPFLOW_MAX.

In the fast-forward case, when the number of flows exceeds IPFLOW_MAX, the
performance degraded to about 50% of that with fewer than IPFLOW_MAX
flows. With this modification, performance only drops to about 65%. Furthermore,
the modified kernel performs about the same as the original kernel
when the number of flows is less than IPFLOW_MAX.

The original patch is implemented by ryo@n.o. Thanks.
 1.75  27-Jul-2016  knakahara remove extra ifdefs. no functional changes.

ip_flow.c becomes build target only if GATEWAY kernel option is on.
So, "#ifdef GATEWAY" in ip_flow.c is not needed.
 1.74  26-Jul-2016  ozaki-r Simplify by using atomic_swap instead of mutex

Suggested by kefren@
 1.73  11-Jul-2016  ozaki-r branches: 1.73.2;
Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref to such data, we cannot do the free/destroy
process there, because psz/psref synchronization cannot be used in
softint. So run timer callbacks as workqueue work (normal LWP context).

Enqueuing the same work twice (i.e., calling workqueue_enqueue before
the previous work has been scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from the workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes, because the callout is called periodically
regardless of whether the work has completed. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work has finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
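The double-enqueue guard described above (later simplified with an atomic swap in 1.74) can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the ip_flow.c code: the workqueue is replaced by a counter, and timer_tick() and work_run() are hypothetical stand-ins for the callout handler and the work function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool work_pending = false;
static int enqueued;	/* counts enqueues; stands in for a real workqueue */
static int runs;	/* counts executions of the work function */

/* Called from a periodic timer (the callout, in kernel terms). */
static void
timer_tick(void)
{
	/*
	 * atomic_exchange returns the previous value, so only the caller
	 * that flips the flag from false to true enqueues the work.
	 */
	if (atomic_exchange(&work_pending, true))
		return;		/* work already queued or still running */
	enqueued++;		/* hypothetical enqueue_work() */
}

/* Body of the queued work. */
static void
work_run(void)
{
	runs++;
	/* ... periodic flow processing would go here ... */
	atomic_store(&work_pending, false);	/* permit the next enqueue */
}
```

A tick that arrives while the work is pending is simply dropped, which matches the "flag is false" condition above; the next tick after the work finishes enqueues it again.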
 1.72  20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.71  13-Jun-2016  knakahara eliminate unnecessary splnet
 1.70  13-Jun-2016  knakahara MP-ify fastforward to support GATEWAY kernel option.

I add "ipflow_lock" mutex in ip_flow.c and "ip6flow_lock" mutex in ip6_flow.c
to protect all data in each file. Of course, this is not MP-scalable. However,
it is sufficient as a tentative workaround. We should make it scalable somehow
in the future.

ok by ozaki-r@n.o.
 1.69  13-Jun-2016  knakahara make ipflow_reap() static function.
 1.68  13-Jun-2016  knakahara remove unnecessary splnet before pool_{get,put}
 1.67  10-Jun-2016  ozaki-r Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some psref overhead to performance-sensitive paths;
however, the overhead is not serious: a 2% drop at worst.

Proposed on tech-kern and tech-net.
 1.66  23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp, so
the route no longer appears in arp(8) output; it showed as incomplete anyway,
so this is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.65  18-Oct-2014  snj branches: 1.65.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.64  22-May-2014  rmind branches: 1.64.2;
- Add in_init() and move some functions, variables and sysctls into in.c,
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.63  01-Apr-2014  pooka branches: 1.63.2;
Wrap ipflow_create() & ip6flow_create() in kernel lock. Prevents the
interrupt side on another core from seeing the situation while the ipflow
is being modified.
 1.62  19-Mar-2014  liamjfoy Move ipflow into ip_var.h and fix conflict
 1.61  19-Mar-2014  liamjfoy Remove ipflow_prune and replace with ipflow_reap. ok rmind@
 1.60  19-Jan-2012  liamjfoy branches: 1.60.6; 1.60.10;
Remove ipf_start from ipf struct
 1.59  01-Apr-2010  tls branches: 1.59.8; 1.59.12;
After discussion with ad@: it appears that KERNEL_LOCK also protects
the driver output path (that is, ifp->if_output()). In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK. However, there are a few other ways to cause output
which require protection:

1) direct calls to tcp_output() in tcp_input()
2) fast-forwarding code (ip_flow) -- protected elsewise
against itself by the softnet lock.
3) *Possibly* the ARP code. I have currently persuaded
myself that it is safe because of how it's called.
4) Possibly the ICMP code.

This change addresses #1 and #2.
 1.58  15-Mar-2009  cegger branches: 1.58.2; 1.58.4;
ansify function definitions
 1.57  01-Feb-2009  pooka branches: 1.57.2;
Init ipflow pool dynamically instead of using a linkset.
 1.56  28-Apr-2008  martin branches: 1.56.8;
Remove clause 3 and 4 from TNF licenses
 1.55  24-Apr-2008  ad branches: 1.55.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.54  12-Apr-2008  thorpej branches: 1.54.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
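Per-CPU counters that are summed only when read, as described for 1.54 (using the flat array-of-uint64_t layout from 1.52), might look roughly like this. NCPU_SKETCH, the STAT_* indices and both functions are hypothetical; a real kernel must also keep each increment on the current CPU (e.g. by disabling preemption), which this single-threaded sketch ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPU_SKETCH 4		/* hypothetical number of CPUs */

/* Hypothetical counter indices into a flat uint64_t array. */
enum { STAT_TOTAL, STAT_FASTFORWARD, STAT_NSTATS };

/* One row per CPU; each CPU increments only its own row. */
static uint64_t percpu_stats[NCPU_SKETCH][STAT_NSTATS];

static void
stat_inc(unsigned int cpu, int which)
{
	percpu_stats[cpu][which]++;
}

/* Collate all rows, as a sysctl read handler might do on request. */
static void
stat_collate(uint64_t out[STAT_NSTATS])
{
	unsigned int c;
	int i;

	memset(out, 0, STAT_NSTATS * sizeof(out[0]));
	for (c = 0; c < NCPU_SKETCH; c++)
		for (i = 0; i < STAT_NSTATS; i++)
			out[i] += percpu_stats[c][i];
}
```

The write path touches only CPU-local memory, so the cost of summing is paid by the (rare) reader rather than by every forwarded packet.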
 1.53  09-Apr-2008  thorpej - ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
 1.52  07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.51  04-Jan-2008  dyoung branches: 1.51.6;
Constify a bit.
 1.50  04-Jan-2008  dyoung Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.
 1.49  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.48  20-Aug-2007  dyoung branches: 1.48.2; 1.48.8; 1.48.10; 1.48.14;
Don't call rtcache_check() from the fast-forward code, which runs
at IPL_NET, because rtcache_check() may read the forwarding table.
Elsewhere, the kernel only blocks interrupts at priority IPL_SOFTNET
and below while it modifies the forwarding table, so rtcache_check()
could be reading the table in an inconsistent state. Use
rtcache_done(), instead.

XXX netinet/ip_flow.c and netinet6/ip6_flow.c are virtually identical.
XXX They should share code.
 1.47  02-May-2007  dyoung branches: 1.47.2; 1.47.6;
Remove obsolete files netinet/in_route.[ch].
 1.46  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.45  05-Apr-2007  liamjfoy use size_t for indexes

just pass a *ip to ipflow_hash instead of members

ok christos@
 1.44  26-Mar-2007  liamjfoy Add a small note regarding further commented code in netinet6/ip6_flow.c
 1.43  25-Mar-2007  liamjfoy Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.
 1.42  12-Mar-2007  ad branches: 1.42.2; 1.42.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.41  04-Mar-2007  christos branches: 1.41.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.40  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.39  26-Jan-2007  dyoung branches: 1.39.2;
bzero -> memset
 1.38  15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.37  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.36  06-Oct-2006  mrg add a missing semicolon from the previous commit.
 1.35  05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.34  02-Sep-2006  liamjfoy branches: 1.34.2; 1.34.4;
increment ips_total too.

ok matt thomas
 1.33  07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.32  24-Dec-2005  perry branches: 1.32.4; 1.32.6; 1.32.8; 1.32.14;
change comment from __const__ to const
 1.31  11-Dec-2005  christos merge ktrace-lwp.
 1.30  17-Oct-2005  christos small list macro cleanup:
- remove duplicate LIST_FIRST (Liam Foy)
- change to use LIST_FOREACH or for () instead of while () for consistency
 1.29  03-Feb-2005  perry branches: 1.29.6;
KNF + slightly ANSIfy
 1.28  25-Apr-2004  simonb branches: 1.28.4; 1.28.6;
Initialise (most) pools from a link set instead of explicit calls
to pool_init. The untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.27  12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.26  02-Nov-2002  perry branches: 1.26.6;
/*CONTCOND*/ while (0)'ed macros
 1.25  30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.24  09-Jun-2002  itojun whitespace
 1.23  08-Mar-2002  thorpej branches: 1.23.6;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.22  13-Nov-2001  lukem add RCSIDs
 1.21  29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.20  17-Sep-2001  thorpej branches: 1.20.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.19  12-Jun-2001  wiz branches: 1.19.2; 1.19.4;
receive, not recieve
 1.18  02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.17  13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.16  30-Jun-2000  thorpej branches: 1.16.2;
Pass the correct destination address for the route-to-gateway case.
From Zdenek Salvet, kern/10483.
 1.15  28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.14  17-Oct-1999  sommerfeld branches: 1.14.2; 1.14.10;
If a packet came in as link-level broadcast or link-level multicast, don't
attempt to fast-forward it out.
 1.13  26-Mar-1999  proff branches: 1.13.2; 1.13.8;
security: test for ip_len < ip_hl <<2 and drop packet accordingly
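A sketch of the length sanity check from 1.13, operating on a raw header buffer rather than a struct ip. The function name and the extra checks (minimum header length, ip_len not exceeding the received bytes) are illustrative assumptions, not the ip_flow.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the IPv4 header at pkt passes basic length checks,
 * 0 otherwise. caplen is the number of bytes actually received.
 */
static int
ipv4_lengths_ok(const uint8_t *pkt, size_t caplen)
{
	size_t hlen, ip_len;

	if (caplen < 20)
		return 0;			/* not even a minimal header */
	hlen = (size_t)(pkt[0] & 0x0f) << 2;	/* ip_hl << 2 */
	ip_len = ((size_t)pkt[2] << 8) | pkt[3];	/* total length */
	if (hlen < 20)
		return 0;			/* header length field too small */
	if (ip_len < hlen)
		return 0;			/* the check added in 1.13 */
	if (ip_len > caplen)
		return 0;			/* claims more data than received */
	return 1;
}
```

A packet that passes but arrived with extra bytes beyond ip_len would then be trimmed to ip_len before forwarding, in the spirit of the 1.6 entry.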
 1.12  28-Jan-1999  itohy ~htons(...) is always negative.
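The point of 1.12 is C integer promotion: the POSIX prototype of htons() returns uint16_t, and a uint16_t operand is promoted to int before ~ is applied, so the complement is a negative int (assuming a 32-bit int). Any comparison of a 16-bit value against it is then always true. The functions below are hypothetical illustrations, not the code changed in 1.12.

```c
#include <assert.h>
#include <stdint.h>

/* v is promoted to int, so ~v is a negative int when int is 32 bits. */
static int
complement_promoted(uint16_t v)
{
	return ~v;
}

/* Buggy: sum (0..65535) is always >= a negative int. */
static int
ge_buggy(uint16_t sum, uint16_t v)
{
	return sum >= ~v;
}

/* Fixed: cast the complement back to 16 bits before comparing. */
static int
ge_fixed(uint16_t sum, uint16_t v)
{
	return sum >= (uint16_t)~v;
}
```

Compilers commonly warn that the comparison in ge_buggy is always true, which is a useful hint for spotting this class of bug.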
 1.11  25-Jan-1999  mycroft One more tweak to the checksum hack, and I promise I'm done. B-)
 1.10  25-Jan-1999  mycroft Absolutely minor tweak to generate better code.
 1.9  24-Jan-1999  mycroft Update the comment about the checksum hack. It was way out of date.
 1.8  24-Jan-1999  mycroft Modify the checksum slightly so that the htons()s can all be combined.
 1.7  08-Oct-1998  thorpej Use the pool allocator for ipflow entries.
 1.6  10-Jun-1998  sommerfe Truncate mbufs to the correct length before forwarding; fixes pr5560
 1.5  02-Jun-1998  thorpej In addition to the IP flow hash table, put the flows on a list. The table
is used for fast lookup, the list for traversal of all flows. Also, use
PRT timers.
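The table-plus-list arrangement from 1.5 can be sketched as follows: each flow is linked into one hash bucket for lookup and into a single list for traversal, so a periodic timer can visit every flow without scanning empty buckets. The struct layout, NBUCKETS and flow_hash() are hypothetical and do not reproduce ipflow_hash().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64		/* hypothetical table size */

struct flow {
	uint32_t src, dst;		/* lookup key */
	unsigned long uses;		/* per-flow counter */
	struct flow *hash_next;		/* next flow in the same bucket */
	struct flow *list_next;		/* next flow in the global list */
};

struct flowtab {
	struct flow *bucket[NBUCKETS];	/* for fast lookup */
	struct flow *all;		/* for traversal of all flows */
	size_t count;
};

static size_t
flow_hash(uint32_t src, uint32_t dst)
{
	return (size_t)((src ^ dst ^ (dst >> 16)) % NBUCKETS);
}

static struct flow *
flow_lookup(const struct flowtab *t, uint32_t src, uint32_t dst)
{
	struct flow *f;

	for (f = t->bucket[flow_hash(src, dst)]; f != NULL; f = f->hash_next)
		if (f->src == src && f->dst == dst)
			return f;
	return NULL;
}

static struct flow *
flow_insert(struct flowtab *t, uint32_t src, uint32_t dst)
{
	struct flow *f = calloc(1, sizeof(*f));
	size_t h;

	if (f == NULL)
		return NULL;
	f->src = src;
	f->dst = dst;
	h = flow_hash(src, dst);
	f->hash_next = t->bucket[h];	/* push onto the bucket chain */
	t->bucket[h] = f;
	f->list_next = t->all;		/* push onto the global list */
	t->all = f;
	t->count++;
	return f;
}

/* A periodic timer walks the list, never the (mostly empty) buckets. */
static unsigned long
flow_total_uses(const struct flowtab *t)
{
	unsigned long total = 0;
	const struct flow *f;

	for (f = t->all; f != NULL; f = f->list_next)
		total += f->uses;
	return total;
}

static void
flow_destroy_all(struct flowtab *t)
{
	struct flow *f, *next;
	size_t i;

	for (f = t->all; f != NULL; f = next) {
		next = f->list_next;
		free(f);
	}
	for (i = 0; i < NBUCKETS; i++)
		t->bucket[i] = NULL;
	t->all = NULL;
	t->count = 0;
}
```

Keeping a count alongside the list also makes the 1.2 optimisation cheap: when no flows are in use, the timer has nothing to walk.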
 1.4  18-May-1998  matt Fix two bugs.
 1.3  04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.2  04-May-1998  thorpej - kern/5380 (Dennis Ferguson): fix incremental IP header checksum.
- kern/5381 (Dennis Ferguson): check IP header checksum in fast forward
code.
- In ipflow_slowtimo(), if no IP flows are in use, don't bother checking
all of the hash buckets.
 1.1  29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
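The incremental checksum adjustment mentioned in 1.1 can be illustrated with the RFC 1624 update rule HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the 16-bit word holding TTL and protocol. This sketch works on header words held as host-order integers and is not the exact expression used in ip_flow.c (which the 1.8 through 1.12 entries tuned further).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of n 16-bit words, folded to 16 bits. */
static uint16_t
ones_sum(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum of a 20-byte header, treating word 5 (ip_sum) as 0. */
static uint16_t
hdr_cksum(const uint16_t hdr[10])
{
	uint16_t tmp[10];
	int i;

	for (i = 0; i < 10; i++)
		tmp[i] = hdr[i];
	tmp[5] = 0;
	return (uint16_t)~ones_sum(tmp, 10);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') */
static uint16_t
cksum_update(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m_old;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement TTL (high byte of word 4) and patch ip_sum incrementally. */
static void
decrement_ttl(uint16_t hdr[10])
{
	uint16_t old = hdr[4];

	assert((old >> 8) > 0);		/* caller must not forward TTL 0 */
	hdr[4] = (uint16_t)(old - 0x0100);
	hdr[5] = cksum_update(hdr[5], old, hdr[4]);
}
```

With the sample header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7, decrementing the TTL from 0x40 to 0x3f changes the checksum from 0xb861 to 0xb961, which matches a full recomputation.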
 1.13.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.2.2  01-Jul-2000  he Pull up revision 1.16 (requested by thorpej):
Pass the correct destination address for the route-to-gateway
case. Fixes PR#10483.
 1.13.2.1  18-Oct-1999  cgd pull up rev 1.14 from trunk (requested by sommerfeld):
Multicast storm prevention: don't attempt to forward link-level
multicast packets which contain ip unicast packets; these packets
would only be generated from misconfigured/buggy systems.
 1.14.10.1  30-Jun-2000  thorpej Pull up rev. 1.16:
Pass the correct destination address for the route-to-gateway case.
From Zdenek Salvet, kern/10483.
 1.14.2.2  21-Apr-2001  bouyer Sync with HEAD
 1.14.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.7  11-Nov-2002  nathanw Catch up to -current
 1.16.2.6  01-Aug-2002  nathanw Catch up to -current.
 1.16.2.5  20-Jun-2002  nathanw Catch up to -current.
 1.16.2.4  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.16.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.16.2.2  21-Sep-2001  nathanw Catch up to -current.
 1.16.2.1  21-Jun-2001  nathanw Catch up to -current.
 1.19.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.19.2.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.19.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.19.2.2  16-Mar-2002  jdolecek Catch up with -current.
 1.19.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.23.6.2  15-Jul-2002  gehenna catch up with -current.
 1.23.6.1  20-Jun-2002  gehenna catch up with -current.
 1.26.6.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.6.4  04-Feb-2005  skrll Sync with HEAD.
 1.26.6.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.6.2  18-Sep-2004  skrll Sync with HEAD.
 1.26.6.1  03-Aug-2004  skrll Sync with HEAD
 1.28.6.1  12-Feb-2005  yamt sync with head.
 1.28.4.1  29-Apr-2005  kent sync with -current
 1.29.6.5  21-Jan-2008  yamt sync with head
 1.29.6.4  03-Sep-2007  yamt sync with head.
 1.29.6.3  26-Feb-2007  yamt sync with head.
 1.29.6.2  30-Dec-2006  yamt sync with head.
 1.29.6.1  21-Jun-2006  yamt sync with head.
 1.32.14.1  19-Jun-2006  chap Sync with head.
 1.32.8.2  03-Sep-2006  yamt sync with head.
 1.32.8.1  26-Jun-2006  yamt sync with head.
 1.32.6.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.32.4.1  09-Sep-2006  rpaulo sync with head
 1.34.4.3  18-Dec-2006  yamt sync with head.
 1.34.4.2  10-Dec-2006  yamt sync with head.
 1.34.4.1  22-Oct-2006  yamt sync with head
 1.34.2.3  01-Feb-2007  ad Sync with head.
 1.34.2.2  12-Jan-2007  ad Sync with head.
 1.34.2.1  18-Nov-2006  ad Sync with head.
 1.39.2.5  07-May-2007  yamt sync with head.
 1.39.2.4  15-Apr-2007  yamt sync with head.
 1.39.2.3  24-Mar-2007  yamt sync with head.
 1.39.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.39.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.41.2.4  09-Oct-2007  ad Sync with head.
 1.41.2.3  08-Jun-2007  ad Sync with head.
 1.41.2.2  10-Apr-2007  ad Sync with head.
 1.41.2.1  13-Mar-2007  ad Sync with head.
 1.42.4.1  29-Mar-2007  reinoud Pullup to -current
 1.42.2.1  11-Jul-2007  mjf Sync with head.
 1.47.6.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.47.2.1  03-Sep-2007  skrll Sync with HEAD.
 1.48.14.2  08-Jan-2008  bouyer Sync with HEAD
 1.48.14.1  02-Jan-2008  bouyer Sync with HEAD
 1.48.10.1  26-Dec-2007  ad Sync with head.
 1.48.8.1  18-Feb-2008  mjf Sync with HEAD.
 1.48.2.1  09-Jan-2008  matt sync with HEAD
 1.51.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.54.2.1  18-May-2008  yamt sync with head.
 1.55.2.3  11-Aug-2010  yamt sync with head.
 1.55.2.2  04-May-2009  yamt sync with head.
 1.55.2.1  16-May-2008  yamt sync with head.
 1.56.8.2  28-Apr-2009  skrll Sync with HEAD.
 1.56.8.1  03-Mar-2009  skrll Sync with HEAD.
 1.57.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.58.4.1  30-May-2010  rmind sync with head
 1.58.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.59.12.1  18-Feb-2012  mrg merge to -current.
 1.59.8.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.59.8.1  17-Apr-2012  yamt sync with head
 1.60.10.2  18-May-2014  rmind sync with head
 1.60.10.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.60.6.2  03-Dec-2017  jdolecek update from HEAD
 1.60.6.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.63.2.1  10-Aug-2014  tls Rebase.
 1.64.2.1  12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).
Enqueuing a work twice (i.e., calling workqueue_enqueue before the
previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
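The guard-flag scheme described in this entry can be sketched as a small userspace C program. This is a hedged illustration, not the NetBSD code: the names (`work_enqueued`, `slowtimo_callout`, `slowtimo_work`, `fake_workqueue_enqueue`) are hypothetical, workqueue_enqueue is replaced by a counter, and because the log does not say how the flag is protected, the sketch assumes a C11 `atomic_exchange` for the test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the names used in ip_flow.c. */
static atomic_bool work_enqueued = false; /* true from enqueue until the work finishes */
static int enqueue_count = 0;             /* times the "workqueue" received the work */
static int work_runs = 0;                 /* times the work handler has run */

/* Stand-in for workqueue_enqueue(): only records the request. */
static void fake_workqueue_enqueue(void)
{
    enqueue_count++;
}

/* Periodic callout: fires on every tick, whether or not the last work finished. */
static void slowtimo_callout(void)
{
    /* Atomically set the flag; if it was already set, a work is still pending. */
    if (atomic_exchange(&work_enqueued, true))
        return;
    fake_workqueue_enqueue();
}

/* Work handler, run later in thread (LWP) context. */
static void slowtimo_work(void)
{
    /* ... expire flow entries, free memory, etc. ... */
    work_runs++;

    /* Clear the flag only after the work is done, allowing the next enqueue. */
    bool was_set = atomic_exchange(&work_enqueued, false);
    assert(was_set);
    (void)was_set; /* silence unused-variable warnings when NDEBUG is defined */
}
```

Three callout ticks before the work runs produce a single enqueue; once slowtimo_work clears the flag, the next tick enqueues again.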
 1.65.2.6  28-Aug-2017  skrll Sync with HEAD
 1.65.2.5  05-Feb-2017  skrll Sync with HEAD
 1.65.2.4  05-Dec-2016  skrll Sync with HEAD
 1.65.2.3  05-Oct-2016  skrll Sync with HEAD
 1.65.2.2  09-Jul-2016  skrll Sync with HEAD
 1.65.2.1  06-Apr-2015  skrll Sync with HEAD
 1.73.2.4  20-Mar-2017  pgoyette Sync with HEAD
 1.73.2.3  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.73.2.2  04-Nov-2016  pgoyette Sync with HEAD
 1.73.2.1  06-Aug-2016  pgoyette Sync with HEAD
 1.79.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.80.6.1  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the same
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires adding KERNEL_LOCK to functions called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction, as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful, though; I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in that case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no one else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu, which
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used for another lock, if_ioctl_lock (which might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.81.2.1  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.82.14.1  03-Apr-2021  thorpej Sync with HEAD.