Home | History | Annotate | Download | only in netinet6
History log of /src/sys/netinet6/ip6_forward.c
RevisionDateAuthorComments
 1.103  29-Jun-2024  riastradh netinet6: Use _NET_STAT* API instead of direct array access.

XXX Exception: ip6flow_addstats_rt _assigns_ one of the `statistics'
to the current count of ip6 flows in use, and we don't have anything
in the _NET_STAT* API for that. So for now I abuse the abstraction,
until we sort out this one exceptional case properly.

PR kern/58380
 1.102  28-Aug-2020  ozaki-r inet6: reduce silent packet discards
 1.101  28-Aug-2020  ozaki-r inet6: pass rcvif to ip6_forward to avoid extra psref_acquire
 1.100  28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.99  12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.98  01-Nov-2019  knakahara Fix ipsecif(4) IPV6_MINMTU does not work correctly.
 1.97  19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.96  13-May-2019  ozaki-r branches: 1.96.2;
Count packets dropped by pfil
 1.95  01-May-2018  maxv branches: 1.95.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.94  26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.93  18-Apr-2018  maxv Remove unused netipsec/xform.h includes.
 1.92  29-Jan-2018  maxv branches: 1.92.2;
style
 1.91  29-Jan-2018  maxv Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.90  09-Jan-2018  ozaki-r Fix use-after-free of mbuf by ip6flow_create (one more)

XXX need pullup-[678]
 1.89  09-Jan-2018  ozaki-r Fix use-after-free of mbuf by ip6flow_create

This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.

XXX need pullup-[678]
 1.88  02-Aug-2017  ozaki-r Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
 1.87  09-May-2017  ozaki-r branches: 1.87.2;
Add missing KEY_FREESP to ip6_forward
 1.86  14-Feb-2017  ozaki-r branches: 1.86.4;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.85  16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.84  16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.83  11-Jan-2017  ozaki-r branches: 1.83.2;
Get rid of unnecessary header inclusions
 1.82  08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
 1.81  31-Aug-2016  ozaki-r Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@
 1.80  28-Jun-2016  ozaki-r branches: 1.80.2;
Add missing NULL checks for m_get_rcvif_psref
 1.79  10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.78  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.77  07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
netowrk. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.76  10-Dec-2014  christos call vsnprintf instead of snprintf; provide more detail
 1.75  08-Dec-2014  christos Merge some common code in the failed forwarding case, while providing better
diagnostics, and fixing leaks.
 1.74  14-Nov-2014  maxv branches: 1.74.2;
Do not uselessly include <sys/malloc.h>.
 1.73  30-May-2014  christos branches: 1.73.2;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.72  29-Jun-2013  rmind branches: 1.72.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.71  05-Jun-2013  christos branches: 1.71.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.70  22-Mar-2012  drochner branches: 1.70.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.69  19-Dec-2011  drochner branches: 1.69.2; 1.69.6; 1.69.8;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.68  04-Feb-2010  joerg branches: 1.68.12; 1.68.16;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.67  11-Nov-2009  joerg Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output. Patch suggested by bad@
 1.66  18-Mar-2009  cegger bzero -> memset
 1.65  23-Apr-2008  thorpej branches: 1.65.2; 1.65.10; 1.65.12; 1.65.16; 1.65.18; 1.65.20;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.64  15-Apr-2008  thorpej branches: 1.64.2;
Make ip6 and icmp6 stats per-cpu.
 1.63  08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.62  14-Jan-2008  dyoung branches: 1.62.2; 1.62.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.61  12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.60  10-Jan-2008  dyoung Save some rtcache_getrt() calls.
 1.59  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.58  23-May-2007  christos branches: 1.58.8; 1.58.14; 1.58.16; 1.58.20;
Ansify + add a few comments, from Karl Sjödahl
 1.57  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.56  07-Mar-2007  liamjfoy branches: 1.56.2; 1.56.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.55  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.54  10-Feb-2007  degroote branches: 1.54.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.53  26-Jan-2007  dyoung bzero -> memset
 1.52  15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.51  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.50  02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.49  29-Jun-2006  liamjfoy branches: 1.49.4; 1.49.6; 1.49.8; 1.49.10;
Fix a minor printf found while reading the code
 1.48  07-Jun-2006  kardel branches: 1.48.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.47  21-Jan-2006  rpaulo branches: 1.47.2; 1.47.4; 1.47.6; 1.47.12;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.46  11-Dec-2005  christos branches: 1.46.2;
merge ktrace-lwp.
 1.45  29-May-2005  christos branches: 1.45.2;
- avoid shadowed variables
- sprinkle const.
 1.44  26-Feb-2005  perry nuke trailing whitespace
 1.43  16-Jul-2004  itojun branches: 1.43.4; 1.43.6;
prevent mbuf leak on IPsec tunnel mode. from iij seil team
 1.42  24-Jun-2004  itojun error could be left uninitialized when we jump into "senderr"
 1.41  16-Jan-2004  itojun when ipsec tunnel mode is applied, we are originating packet (instead of
forwarding). go to ip6_output() path for fragmentation and other processing.
from kame
 1.40  29-Oct-2003  mycroft Do a jump optimization that eliminates some uninitialized variable warnings.
 1.39  03-Oct-2003  itojun shouldn't check scope match when encapsulating packet into tunnel mode.
iij seil team
 1.38  02-Oct-2003  itojun do not deref state.ro if it is NULL
 1.37  02-Oct-2003  itojun correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
iij seil team
 1.36  07-Aug-2003  itojun make net.inet6.ip6.redirect actually work. from Tomoyuki Sahara via kame
 1.35  03-Jul-2003  itojun minor KNF
 1.34  30-Jun-2003  itojun branches: 1.34.2;
KNF
 1.33  24-Jun-2003  itojun use time.tv_sec directly
 1.32  11-Sep-2002  itojun avoid from applying IPsec transport mode to the packets when the kernel
forwards the packets.
sync w/kame
 1.31  08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two iocts used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.30  07-Jun-2002  itojun typo
 1.29  07-Jun-2002  itojun 'fall through' is not a valid LINT keyword.
 1.28  29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.27  18-Dec-2001  itojun branches: 1.27.8; 1.27.10;
reduce white space/cosmetic diffs w/kame.
 1.26  13-Nov-2001  lukem add RCSIDs
 1.25  24-Oct-2001  itojun more whitespace sync with kame
 1.24  17-Oct-2001  itojun branches: 1.24.2;
unifdef OLDIP6OUTPUT
 1.23  18-Jul-2001  itojun sync with draft-ietf-ipngwg-p2p-pingpong-00.txt. apply special behavior
only if ip6_dst is "neighbor" within p2p prefix. sync with kame
 1.22  22-Jun-2001  itojun branches: 1.22.2;
do not forward packet back to point-to-point interface, if the packet
matches the ipv6 prefix assigned to the p2p interface (= redirect case).
this leads to pingpong, chews bandwidth. bad thing is that bad guy from
remote can chew bandwidth. (follows upcoming internet draft)
 1.21  12-Jun-2001  matt senderr needs only be declared when PFIL_HOOKS is defined
 1.20  12-Jun-2001  itojun run pfil_hooks for IPv6 forwarding path (note: ip6_forward() does not
call ip6_output()).
 1.19  30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.18  10-Feb-2001  itojun branches: 1.18.2;
to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.17  22-Sep-2000  itojun on ipsec policy lookup, do not try to lookup port numbers for forwarded packet.
sync with kame.
 1.16  27-Jul-2000  itojun do not forward packet with :: in the source.
this is not in the spec - we had rough consensus on it in ipngwg,
spec will get updated to include this behavior.
 1.15  16-Jul-2000  itojun s/IPSEC_IPV6FWD/IPSEC/. this should correct strange behavior on ipv6
forwarding (even if policy asks for tunnel mode encryption, packets
go out in clear). sync with kame.
 1.14  06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.13  30-Jun-2000  itojun suppress too noisy warning on forward-over-loopback case. from kame
 1.12  03-Jun-2000  itojun branches: 1.12.2;
sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.11  19-May-2000  itojun branches: 1.11.2;
correct manipulation of link-local scoped address on loopback.
now "telnet fe80::1%lo0" should work again.
(we have another bug near here - will attack it soon)
 1.10  19-May-2000  itojun do not mistakingly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.9  26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.8  06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.7  31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.6  06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.5  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.4  30-Jul-1999  itojun branches: 1.4.2; 1.4.8;
remove reference to in6_systm.h (file itself will be removed afterwords)
 1.3  03-Jul-1999  thorpej RCS ID police.
 1.2  01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1  28-Jun-1999  itojun branches: 1.1.2;
file ip6_forward.c was initially added on branch kame.
 1.1.2.2  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.2.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1  01-Jul-1999  thorpej file ip6_forward.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.4.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.2.3  21-Apr-2001  bouyer Sync with HEAD
 1.4.2.2  11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.12.2.13  07-Apr-2004  jmc Pullup rev 1.39 (requested by itojun in ticket #99)

Shouldn't check scope match when encapsulating packet into tunnel mode.
 1.12.2.12  07-Apr-2004  jmc Pullup rev 1.38 (requested by itojun in ticket #96)

Do not deref state.ro if it is NULL.
 1.12.2.11  07-Apr-2004  jmc Pullup patch (requested by itojun in ticket #94)

Correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
 1.12.2.10  11-Feb-2003  msaitoh Pull up revision 1.22 (requested by itojun):
do not forward packet back to point-to-point interface, if the packet
matches the ipv6 prefix assigned to the p2p interface (= redirect case).
this leads to pingpong, chews bandwidth. bad thing is that bad guy from
remote can chew bandwidth. (follows upcoming internet draft)
 1.12.2.9  14-Nov-2002  itojun sys/netinet6/ip6_forward.c 1.20 via patch

Need opt_pfil_hooks.h for PFIL_HOOK.

(masanobu)
 1.12.2.8  26-Feb-2002  he Apply patch (requested by martti):
Fix it so that IPFilter handles IPv6 traffic.
 1.12.2.7  09-Feb-2002  he Pull up revision 1.12.2.5 (requested by martti):
Updated IPFilter to 3.4.23.
(This one re-adds filtering support for forwarded IPv6 packets.)
 1.12.2.6  25-Oct-2001  jhawk Revert Darren Reed <darrenr@netbsd.org>'s unauthorized commit to the
netbsd-1-5 branch, rev 1.12.2.5 of 2001/10/15 13:19:15. This may
re-appear on the branch pending suitable review.
 1.12.2.5  15-Oct-2001  darrenr add ipv6 filtering hooks with pfil_hook for forwarded packet case
 1.12.2.4  29-Sep-2000  itojun pullup (approved by releng-1-5)

do not try to look up port number on forwarding case.
sys/netinet6/ip6_forward.c 1.16 -> 1.17
 1.12.2.3  28-Jul-2000  itojun pullup 1.15 -> 1.16 (aproved by releng-1-5)

> do not forward packet with :: in the source.
> this is not in the spec - we had rough consensus on it in ipngwg,
> spec will get updated to include this behavior.
 1.12.2.2  17-Jul-2000  itojun pullup 1.13 -> 1.15 (approved by releng-1-5)

1.14 -> 1.15
s/IPSEC_IPV6FWD/IPSEC/. this should correct strange behavior on ipv6
forwarding (even if policy asks for tunnel mode encryption, packets
go out in clear). sync with kame.

1.13 -> 1.14
date: 2000/07/06 12:51:41; author: itojun; state: Exp; lines: +2 -3
remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.12.2.1  01-Jul-2000  itojun mrege 1.12 -> 1.13: (approved by: releng-1-5)
suppress too noisy warning on forward-over-loopback case. from kame
 1.18.2.8  17-Sep-2002  nathanw Catch up to -current.
 1.18.2.7  20-Jun-2002  nathanw Catch up to -current.
 1.18.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.18.2.5  14-Nov-2001  nathanw Catch up to -current.
 1.18.2.4  22-Oct-2001  nathanw Catch up to -current.
 1.18.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.18.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.18.2.1  09-Apr-2001  nathanw Catch up with -current.
 1.22.2.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.22.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.22.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.22.2.1  03-Aug-2001  lukem update to -current
 1.24.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.10.3  04-Oct-2003  tron Pull up revision 1.39 (requested by itojun in ticket #1504):
shouldn't check scope match when encapsulating packet into tunnel mode.
iij seil team
 1.27.10.2  02-Oct-2003  tron Pull up revision 1.38 (requested by itojun in ticket #1501):
do not deref state.ro if it is NULL
 1.27.10.1  02-Oct-2003  tron Pull up revision 1.37 via patch (requested by itojun in ticket #1500):
correctly look at outer IPv6 header when forwarding packet into ipsec tunnel.
iij seil team
 1.27.8.2  20-Jun-2002  gehenna catch up with -current.
 1.27.8.1  30-May-2002  gehenna Catch up with -current.
 1.34.2.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.4  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.34.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1  03-Aug-2004  skrll Sync with HEAD
 1.43.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.43.4.1  29-Apr-2005  kent sync with -current
 1.45.2.5  21-Jan-2008  yamt sync with head
 1.45.2.4  03-Sep-2007  yamt sync with head.
 1.45.2.3  26-Feb-2007  yamt sync with head.
 1.45.2.2  30-Dec-2006  yamt sync with head.
 1.45.2.1  21-Jun-2006  yamt sync with head.
 1.46.2.1  01-Feb-2006  yamt sync with head.
 1.47.12.1  19-Jun-2006  chap Sync with head.
 1.47.6.2  11-Aug-2006  yamt sync with head
 1.47.6.1  26-Jun-2006  yamt sync with head.
 1.47.4.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.47.2.1  09-Sep-2006  rpaulo sync with head
 1.48.2.1  13-Jul-2006  gdamore Merge from HEAD.
 1.49.10.1  04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.49.8.2  14-Nov-2009  sborrill Pull up the following revisions(s) (requested by joerg in ticket #1366):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.49.8.1  24-May-2007  pavel branches: 1.49.8.1.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.49.8.1.4.1  14-Nov-2009  sborrill Pull up the following revisions(s) (requested by joerg in ticket #1366):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.49.6.2  18-Dec-2006  yamt sync with head.
 1.49.6.1  10-Dec-2006  yamt sync with head.
 1.49.4.2  01-Feb-2007  ad Sync with head.
 1.49.4.1  12-Jan-2007  ad Sync with head.
 1.54.2.3  07-May-2007  yamt sync with head.
 1.54.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.54.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.56.4.1  11-Jul-2007  mjf Sync with head.
 1.56.2.1  08-Jun-2007  ad Sync with head.
 1.58.20.3  19-Jan-2008  bouyer Sync with HEAD
 1.58.20.2  10-Jan-2008  bouyer Sync with HEAD
 1.58.20.1  02-Jan-2008  bouyer Sync with HEAD
 1.58.16.1  26-Dec-2007  ad Sync with head.
 1.58.14.1  18-Feb-2008  mjf Sync with HEAD.
 1.58.8.2  23-Mar-2008  matt sync with HEAD
 1.58.8.1  09-Jan-2008  matt sync with HEAD
 1.62.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.62.2.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.64.2.1  18-May-2008  yamt sync with head.
 1.65.20.1  21-Apr-2010  matt sync to netbsd-5
 1.65.18.1  14-Nov-2009  sborrill Pull up the following revisions(s) (requested by joerg in ticket #1139):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.65.16.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.65.12.1  14-Nov-2009  sborrill Pull up the following revisions(s) (requested by joerg in ticket #1139):
sys/netinet6/ip6_forward.c: revision 1.67

Clear cksum flags before any further processing like ip_forward does.
Many drivers set the UDP/TCP v4 flags even for v6 traffic and if the
packet is encapsulated with gif, the IPv6 header would get corrupted by
ip_output.
 1.65.10.1  28-Apr-2009  skrll Sync with HEAD.
 1.65.2.2  11-Mar-2010  yamt sync with head
 1.65.2.1  04-May-2009  yamt sync with head.
 1.68.16.2  05-Apr-2012  mrg sync to latest -current.
 1.68.16.1  18-Feb-2012  mrg merge to -current.
 1.68.12.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.68.12.1  17-Apr-2012  yamt sync with head
 1.69.8.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.8.1  13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.69.6.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.6.1  13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.69.2.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1540):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.69.2.1  13-Mar-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1518):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.70.2.3  03-Dec-2017  jdolecek update from HEAD
 1.70.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.2.1  23-Jun-2013  tls resync from head
 1.71.2.1  28-Aug-2013  rmind sync with head
 1.72.4.1  10-Aug-2014  tls Rebase.
 1.73.2.3  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.2  12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.73.2.1  17-Jan-2015  martin branches: 1.73.2.1.2; 1.73.2.1.6;
Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.73.2.1.6.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.1.6.1  12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.73.2.1.2.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1590):

sys/netinet6/ip6_forward.c: revision 1.91 (via patch)

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.73.2.1.2.1  12-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #1551):
sys/netinet6/ip6_forward.c: 1.89-1.90 via patch
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.74.2.6  28-Aug-2017  skrll Sync with HEAD
 1.74.2.5  05-Feb-2017  skrll Sync with HEAD
 1.74.2.4  05-Oct-2016  skrll Sync with HEAD
 1.74.2.3  09-Jul-2016  skrll Sync with HEAD
 1.74.2.2  22-Sep-2015  skrll Sync with HEAD
 1.74.2.1  06-Apr-2015  skrll Sync with HEAD
 1.80.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.80.2.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.83.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.86.4.1  11-May-2017  pgoyette Sync with HEAD
 1.87.2.4  24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.87.2.3  30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #671):

sys/netinet6/ip6_forward.c: revision 1.91

Fix two pretty bad mistakes. If ipsec6_check_policy fails m is not freed,
and a 'goto out' is missing after ipsec6_process_packet.
 1.87.2.2  09-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #484):
sys/netinet6/ip6_forward.c: 1.89-1.90
Fix use-after-free of mbuf by ip6flow_create
This fixes recent failures of some ATF tests such as t_ipsec_tunnel_odd.
--
Fix use-after-free of mbuf by ip6flow_create (one more)
 1.87.2.1  21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.92.2.2  02-May-2018  pgoyette Synch with HEAD
 1.92.2.1  22-Apr-2018  pgoyette Sync with HEAD
 1.95.2.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.95.2.1  10-Jun-2019  christos Sync with HEAD
 1.96.2.1  24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involves sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) uses normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

RSS XML Feed