Home | History | Annotate | Download | only in netinet6
History log of /src/sys/netinet6/in6_proto.c
RevisionDateAuthorComments
 1.131  09-Feb-2024  andvar fix spelling mistakes, mainly in comments and log messages.
 1.130  24-Oct-2022  knakahara Fix PR kern/57037

Be able to change the behavior sending parameter changing routing messages.
When set net.inet6.ip6.param_rt_msg=0, don't send parameter changing
routing messages.
When set net.inet6.ip6.param_rt_msg=1(default), send parameter changing
routing messages by RTM_NEWADDR.
 1.129  03-Sep-2022  thorpej Garbage-collect everything related to struct domain::dom_ifqueues
(except dom_ifqueues itself, until the next kernel version bump).
It's no longer used now that nothing uses the legacy netisr mechanism.
 1.128  12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.127  24-Apr-2020  jakllsch Fill in .pr_usrreqs for SOCK_SEQPACKET and SOCK_STREAM variants of SCTP too.

This should allow these socket types of SCTP to operate on IPv6 family
sockets, as .pr_usrreqs must not be NULL for socreate() to succeed.
 1.126  14-Aug-2018  maxv branches: 1.126.10;
Retire EtherIP, we have L2TP instead.
 1.125  11-May-2018  roy branches: 1.125.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.124  03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.123  03-May-2018  maxv Remove net_osdep.h completely.
 1.122  15-Mar-2018  maxv Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.121  07-Feb-2018  maxv branches: 1.121.2;
Style, and localify IPV6FORWARDING. No functional change.
 1.120  07-Feb-2018  maxv Change ip6_hdrnestlimit to be 15 instead of 50. I couldn't find any
reference in RFCs about what a correct limit should be, but FreeBSD already
uses 15.

If an IPv6 packet has 50 options, there is clearly something wrong with it.
 1.119  27-Sep-2017  ozaki-r Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.118  21-Sep-2017  ozaki-r Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
 1.117  14-Apr-2017  ozaki-r branches: 1.117.4;
Rumpify netipsec

Note that we should modularize netipsec and reduce reverse symbol references
(referencing symbols of netipsec from net, netinet and netinet6) though,
the task needs lots of code changes. Prior to doing so, rumpifying it and
having ATF tests should be useful.
 1.116  16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.115  13-Feb-2017  ozaki-r Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.
 1.114  13-Dec-2016  ozaki-r branches: 1.114.2;
Remove unnecessary inclusions of nd6.h
 1.113  06-Jul-2016  ozaki-r branches: 1.113.2;
Move in6_ifaddr_list to a more proper place (from ip6_input.c to in6.c)

It's a similar place as the IPv4 address list, i.e., in.c.

More varibles will join together.
 1.112  26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.111  11-Apr-2016  ozaki-r Sweep unncessary radix.h inclusions
 1.110  21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.109  21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.108  20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.107  13-Oct-2015  rjs Add core networking support for SCTP.
 1.106  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.105  22-Apr-2015  roy Move INET6 specific in6_if_{up,down}() and in6_if_link_{up,down}()
into agnostic domain functions.
 1.104  10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.103  05-Jun-2014  rmind branches: 1.103.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.102  22-May-2014  rmind Move udp6_input(), udp6_sendup(), udp6_realinput() and udp6_input_checksum()
from udp_usrreq.c to udp6_usrreq.c where they belong. No functional change.
 1.101  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.100  02-Jan-2014  pooka branches: 1.100.2;
Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.99  05-Jun-2013  christos branches: 1.99.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.98  01-Mar-2013  joerg Retire OSI network stack. OK core@
 1.97  23-Jun-2012  christos branches: 1.97.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.96  22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.95  31-Dec-2011  christos branches: 1.95.2; 1.95.6; 1.95.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.94  19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.93  24-Sep-2011  christos branches: 1.93.2; 1.93.6;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.92  24-May-2011  spz RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.91  03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
 1.90  31-Mar-2011  dyoung Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.89  24-Aug-2010  jakllsch branches: 1.89.2;
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.88  04-Feb-2010  joerg branches: 1.88.2; 1.88.4;
Explicitly include opt_gateway.h when depending on GATEWAY.
 1.87  11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.86  11-Sep-2009  dyoung Nothing uses sockaddr_in6_cmp() right now, and the generic
sockaddr_cmp() is probably as fast or faster than calling
sockaddr_in6_cmp() through a function pointer, so let's stop
compiling it.
 1.85  21-Aug-2009  tsutsui Fix error on kernels with options IPSEC without options IPSEC_ESP.
Found on building evbppc/conf/PMPPC.
 1.84  23-Mar-2009  liamjfoy Init ip6flow pool dynamically instead of using a linkset.
 1.83  25-Nov-2008  pooka branches: 1.83.4;
Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.82  24-Apr-2008  ad branches: 1.82.2; 1.82.8; 1.82.10; 1.82.12;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.81  23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.80  15-Apr-2008  thorpej branches: 1.80.2;
Make pim6 stats per-cpu.
 1.79  19-Sep-2007  dyoung branches: 1.79.16; 1.79.20;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff);

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
 1.78  30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.77  06-May-2007  dyoung branches: 1.77.2; 1.77.6; 1.77.8;
In AppleTalk, IPv4, and IPv6 routing domains, help sockaddr_cmp()
avoid an indirect function call by comparing the family, length,
and bytes [dom->dom_sa_cmpofs, dom->dom_sa_cmpofs + dom->dom_sa_cmplen),
corresponding to the the sockaddrs' "address" members.

For ISO, actually use sockaddr_iso_cmp, for a change. Thanks to
yamt@ for pointing out my error.
 1.76  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.75  07-Mar-2007  liamjfoy branches: 1.75.2; 1.75.4;
Add IPv6 Fast Forward - the IPv4 counterpart:

If ip6_forward successfully forwards a packet, a cache, in this case a
ip6flow struct entry, will be created. ether_input and friends will
then be able to call ip6flow_fastforward with the packet which will then
be passed to if_output (unless an issue is found - in that case the packet
is passed back to ip6_input).

ok matt@ christos@ dyoung@ and joerg@
 1.74  06-Mar-2007  liamjfoy Fix some style issues - no functional change
 1.73  27-Feb-2007  degroote Initialize fast_ipsec entry in the protocol switch with structure
initializers as other entries.
 1.72  19-Feb-2007  dyoung Initialize protocol switch with structure initializers.
 1.71  17-Feb-2007  dyoung 0 -> NULL
 1.70  10-Feb-2007  degroote branches: 1.70.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.69  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.68  23-Nov-2006  rpaulo branches: 1.68.2; 1.68.4;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is more cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.67  10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.66  30-Aug-2006  christos branches: 1.66.2; 1.66.4;
add missing initializers
 1.65  28-Aug-2006  christos remove extra members
 1.64  25-Aug-2006  matt One step closer to loadable domains. Store pointers to a domain's soft
interrupt queues so if_detach can remove packets to removed interfaces from
them. This eliminates a lot of conditional ugly code in if.c
 1.63  18-May-2006  liamjfoy Integrate Common Address Redundancy Procotol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.62  05-Mar-2006  rpaulo branches: 1.62.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on a interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.61  11-Dec-2005  christos branches: 1.61.4; 1.61.6; 1.61.8;
merge ktrace-lwp.
 1.60  19-Jul-2005  gdt Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.59  29-May-2005  christos branches: 1.59.2;
- avoid shadowed variables
- sprinkle const.
 1.58  23-Jan-2005  matt branches: 1.58.6;
Change initialzie of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to interate through the domain.
 1.57  22-Apr-2004  matt branches: 1.57.4;
Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols) are loaded.
 1.56  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.55  03-Nov-2003  briggs Revert the change in default value of ipv6_v6only. Further discussion
on this topic is required. It should be reintroduced and pursued in
the IETF.
 1.54  28-Oct-2003  briggs Toggle the default value of ip6_v6only. Also provide a sample sysctl to
retain the existing behavior.
 1.53  06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.52  05-Sep-2003  itojun call tcp_drain() if IPv4-less kernel
 1.51  04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.50  14-Aug-2003  itojun enforce ipsec policy on raw wildcard.
 1.49  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48  07-Aug-2003  itojun make net.inet6.ip6.redirect actually work. from Tomoyuki Sahara via kame
 1.47  17-Apr-2003  thorpej branches: 1.47.2;
Protect the definition of offsetof().
 1.46  11-Nov-2002  itojun pmtu_probe is not used anywhere (it is used in KAME TCP6-only code).
From: Krister Walfridsson <cato@df.lth.se>
 1.45  20-Aug-2002  itojun sync up use_deprecated handling with latest kame.
- bind(deprecated) is allowed, trusting userland app is doing the right thing
- use_deprecated default to 1
 1.44  17-Aug-2002  itojun set default value for use_deprecated to 0, to avoid consequences with ftpd.
 1.43  09-Jun-2002  itojun whitespace cleanup
 1.42  08-Jun-2002  itojun whitespace cleanup
 1.41  29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.40  28-May-2002  itojun limit number of IPv6 fragments (not the fragment queue size) to
fight against lots-of-frags DoS attacks. sync w/kame
 1.39  15-Mar-2002  itojun branches: 1.39.4; 1.39.6;
have tcp6_drain
 1.38  21-Dec-2001  itojun call encap6_ctlinput on icmp6 against tunnelled packet. sync w/kame
 1.37  21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
snyc with kame
 1.36  21-Dec-2001  itojun move in6_gif_hlim decl to in6_gif.c. sync with kame
 1.35  21-Dec-2001  itojun move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.34  13-Nov-2001  lukem add RCSIDs
 1.33  24-Oct-2001  itojun no tcp_fasttimo any more. PR 14333
 1.32  24-Oct-2001  itojun more whitespace sync with kame
 1.31  16-Oct-2001  itojun branches: 1.31.2;
remove unused #define. sync whitespace/comment with kame.
 1.30  15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still work just like before.
 1.29  21-Mar-2001  thorpej branches: 1.29.2;
Add a protosw flag, PR_ABRTACPTDIS (Abort on Accept of Disconnected
Socket), and add it to the protocols that use that behavior (all
PR_LISTEN protocols except for PF_LOCAL stream sockets).
 1.28  01-Mar-2001  itojun branches: 1.28.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.27  21-Feb-2001  itojun need PR_ADDR|PR_ATOMIC for IPPROTO_EON. fix typo. from chopps, sync with kame
 1.26  20-Feb-2001  itojun ISO over IPv4/v6 by EON encapsulation. from chopps, sync with kame.
 1.25  11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.24  11-Feb-2001  itojun whitespace sync with kame
 1.23  19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.22  18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.21  10-Oct-2000  itojun sync with kame ($KAME$)
 1.20  10-Oct-2000  enami Don't initialize TCP twice on v4/v6 dual stack kernel.
 1.19  28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.18  06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TOOD: should implement ppsratecheck(9).
 1.17  19-Apr-2000  itojun branches: 1.17.4;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.16  26-Feb-2000  itojun implement rip6_ctlinput, to cope with routing changes correctly.
(IMHO we need rip_ctlinput as well)
 1.15  26-Feb-2000  itojun make it possible to throw IPv6 packet with proto=4/41.
(in normal case we don't do it, but this is how IPv4 in_proto is written)
 1.14  14-Feb-2000  thorpej Use ratecheck() for ICMP6 rate limiting.
 1.13  06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.12  06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11  06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.10  02-Jan-2000  itojun add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)
 1.9  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.8  31-Jul-1999  itojun branches: 1.8.2; 1.8.8;
sync with recent KAME.
- loosen ipsec restriction on packet diredtion.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7  30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwords)
 1.6  27-Jul-1999  explorer Fix a problem where tcp_slowtimo was called twice, once for ipv4 tcp and
once for ipv6. This patch makes the ipv6 case pass NULLs in for fast
and slow timeouts iff defined(INET) and passes in the right function
if !defined(INET).

Reveiwed by itojun@iijlab.net.
 1.5  22-Jul-1999  itojun change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.
 1.4  09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.3  03-Jul-1999  thorpej RCS ID police.
 1.2  01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1  28-Jun-1999  itojun branches: 1.1.2;
file in6_proto.c was initially added on branch kame.
 1.1.2.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.2.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1  01-Jul-1999  thorpej file in6_proto.c was added on branch chs-ubc2 on 1999-07-01 23:48:28 +0000
 1.8.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.2.4  27-Mar-2001  bouyer Sync with HEAD.
 1.8.2.3  12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.2  11-Feb-2001  bouyer Sync with HEAD.
 1.8.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.4  09-Sep-2003  msaitoh Pull up rev. 1.50 (requested by itojun in ticket #50):
enforce ipsec policy on raw wildcard.
 1.17.4.3  11-Mar-2001  he Pull up revision 1.28 (via patch, requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.17.4.2  17-Oct-2000  tv Pullup 1.20 [enami]:
Don't initialize TCP twice on v4/v6 dual stack kernel.
 1.17.4.1  16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some try-and-error in main trunc)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.28.2.7  11-Dec-2002  thorpej Sync with HEAD.
 1.28.2.6  20-Jun-2002  nathanw Catch up to -current.
 1.28.2.5  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.28.2.4  08-Jan-2002  nathanw Catch up to -current.
 1.28.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.28.2.2  22-Oct-2001  nathanw Catch up to -current.
 1.28.2.1  09-Apr-2001  nathanw Catch up with -current.
 1.29.2.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.29.2.2  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.29.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.31.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.39.6.3  04-Oct-2003  tron Pull up revision 1.50 (requested by itojun in ticket #1409):
enforce ipsec policy on raw wildcard.
 1.39.6.2  21-Nov-2002  he Pull up revision 1.45 (requested by itojun in ticket #708):
Allow bind() of deprecated addresses, trusting userland
application knows what it's doing.
 1.39.6.1  18-Aug-2002  lukem Pull up revision 1.44 (requested by itojun in ticket #697):
set default value for use_deprecated to 0, to avoid consequences with ftpd.
 1.39.4.3  29-Aug-2002  gehenna catch up with -current.
 1.39.4.2  20-Jun-2002  gehenna catch up with -current.
 1.39.4.1  30-May-2002  gehenna Catch up with -current.
 1.47.2.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.47.2.4  24-Jan-2005  skrll Sync with HEAD.
 1.47.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.47.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.47.2.1  03-Aug-2004  skrll Sync with HEAD
 1.57.4.1  29-Apr-2005  kent sync with -current
 1.58.6.1  15-Aug-2005  tron Pull up revision 1.60 (requested by gdt in ticket #661):
Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.59.2.5  27-Oct-2007  yamt sync with head.
 1.59.2.4  03-Sep-2007  yamt sync with head.
 1.59.2.3  26-Feb-2007  yamt sync with head.
 1.59.2.2  30-Dec-2006  yamt sync with head.
 1.59.2.1  21-Jun-2006  yamt sync with head.
 1.61.8.3  03-Sep-2006  yamt sync with head.
 1.61.8.2  24-May-2006  yamt sync with head.
 1.61.8.1  13-Mar-2006  yamt sync with head.
 1.61.6.2  01-Jun-2006  kardel Sync with head.
 1.61.6.1  22-Apr-2006  simonb Sync with head.
 1.61.4.2  09-Sep-2006  rpaulo sync with head
 1.61.4.1  07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.62.4.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.66.4.2  10-Dec-2006  yamt sync with head.
 1.66.4.1  22-Oct-2006  yamt sync with head
 1.66.2.2  12-Jan-2007  ad Sync with head.
 1.66.2.1  18-Nov-2006  ad Sync with head.
 1.68.4.1  04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.68.2.1  24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.70.2.3  07-May-2007  yamt sync with head.
 1.70.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.70.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.75.4.1  11-Jul-2007  mjf Sync with head.
 1.75.2.2  09-Oct-2007  ad Sync with head.
 1.75.2.1  08-Jun-2007  ad Sync with head.
 1.77.8.1  06-Nov-2007  matt sync with HEAD
 1.77.6.2  02-Oct-2007  joerg Sync with HEAD.
 1.77.6.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.77.2.1  03-Sep-2007  skrll Sync with HEAD.
 1.79.20.2  17-Jan-2009  mjf Sync with HEAD.
 1.79.20.1  02-Jun-2008  mjf Sync with HEAD.
 1.79.16.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.80.2.1  18-May-2008  yamt sync with head.
 1.82.12.1  21-Nov-2010  riz branches: 1.82.12.1.2;
Pull up following revision(s) (requested by jakllsch in ticket #1445):
sys/netinet6/ip6_etherip.h: revision 1.2
sys/netinet6/in6_proto.c: revision 1.89
sys/netinet6/ip6_etherip.c: revision 1.14
Make the EtherIP in IPv6 input path work.
XXX: Figure out if we really need a separate protosw for IPv6.
 1.82.12.1.2.1  07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksumed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.82.10.2  28-Apr-2009  skrll Sync with HEAD.
 1.82.10.1  19-Jan-2009  skrll Sync with HEAD.
 1.82.8.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.82.2.4  09-Oct-2010  yamt sync with head
 1.82.2.3  11-Mar-2010  yamt sync with head
 1.82.2.2  16-Sep-2009  yamt sync with head
 1.82.2.1  04-May-2009  yamt sync with head.
 1.83.4.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.88.4.3  31-May-2011  rmind sync with head
 1.88.4.2  21-Apr-2011  rmind sync with head
 1.88.4.1  05-Mar-2011  rmind sync with head
 1.88.2.1  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.89.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.93.6.2  05-Apr-2012  mrg sync to latest -current.
 1.93.6.1  18-Feb-2012  mrg merge to -current.
 1.93.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.93.2.2  30-Oct-2012  yamt sync with head
 1.93.2.1  17-Apr-2012  yamt sync with head
 1.95.8.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.95.6.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.95.2.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.97.2.3  03-Dec-2017  jdolecek update from HEAD
 1.97.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.97.2.1  23-Jun-2013  tls resync from head
 1.99.2.2  18-May-2014  rmind sync with head
 1.99.2.1  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.100.2.1  10-Aug-2014  tls Rebase.
 1.103.4.10  28-Aug-2017  skrll Sync with HEAD
 1.103.4.9  05-Feb-2017  skrll Sync with HEAD
 1.103.4.8  09-Jul-2016  skrll Sync with HEAD
 1.103.4.7  29-May-2016  skrll Sync with HEAD
 1.103.4.6  22-Apr-2016  skrll Sync with HEAD
 1.103.4.5  19-Mar-2016  skrll Sync with HEAD
 1.103.4.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.103.4.3  22-Sep-2015  skrll Sync with HEAD
 1.103.4.2  06-Jun-2015  skrll Sync with HEAD
 1.103.4.1  06-Apr-2015  skrll Sync with HEAD
 1.113.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.113.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.113.2.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.114.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.117.4.5  12-May-2018  martin Pull up following revision(s) (requested by roy in ticket #821):

sys/netinet6/in6_proto.c: revision 1.125
sys/net/raw_cb.h: revision 1.29
sys/kern/uipc_usrreq.c: revision 1.186

Increase the default size of some receive buffers from 8k to 16k.

This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.
 1.117.4.4  31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #676):

sys/netinet/in_proto.c: revision 1.127
sys/netinet6/in6_proto.c: revision 1.122

Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.117.4.3  30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #672):
sys/netinet6/in6_proto.c: revision 1.120
Change ip6_hdrnestlimit to be 15 instead of 50. I couldn't find any
reference in RFCs about what a correct limit should be, but FreeBSD already
uses 15.
If an IPv6 packet has 50 options, there is clearly something wrong with it.
 1.117.4.2  24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guranteed to be not accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
 1.117.4.1  21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panicks of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that a process issuing
SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require newly-added udpate command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver implements such offload features for long
years and seems unlikely to implement them soon. (Note that neither
FreeBSD nor Linux doesn't have such drivers.) Let's remove related
(unused) codes and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotton by
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav lists are sorted
as we need.

Added key_validate_savlist validates that each sav list is surely sorted
(run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporal storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM never be set to a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segmemt size
caclulation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there is no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a dead lock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never return because key_sendup_mbuf
that has stuck on key_so_mtx never release a reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy to be queued hundreds of mbufs on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It's never be changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical ussage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never complete until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.121.2.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.121.2.2  21-May-2018  pgoyette Sync with HEAD
 1.121.2.1  15-Mar-2018  pgoyette Synch with HEAD
 1.125.2.1  10-Jun-2019  christos Sync with HEAD
 1.126.10.1  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)

RSS XML Feed