Home | History | Annotate | Download | only in netinet6
History log of /src/sys/netinet6/in6_pcb.c
RevisionDateAuthorComments
 1.177  04-Nov-2022  ozaki-r inpcb: get rid of parentheses for return value
 1.176  04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.175  04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.174  04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.173  28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.172  28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.171  14-Oct-2022  ryo Avoid error of "-Wreturn-local-addr", and simplify the logic.

However, -Wreturn-local-addr is still disabled by default by GCC_NO_RETURN_LOCAL_ADDR
in bsd.own.mk because it causes errors in other parts.
 1.170  29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.
 1.169  29-Jul-2022  knakahara Remove obsoleted comments.

These comments are added with IFNET_LOCK by in_pcb.c:r1.180 and
in6_pcb.c:r1.162. And then, IFNET_LOCK codes are removed in
in_pcb.c:r1.183 and in6_pcb.c:r1.166, however the comments have
remained.
 1.168  09-Jun-2022  knakahara refactor: use TAILQ_FOREACH instead of TAILQ_FOREACH_SAFE about inpt_queue.

They don't use "ninph" pointer and don't remove elements.
 1.167  08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.166  15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunatetly it caused a deadlock instead. A known scenario causing a
deadlock is to occur the following two operations concurrently: (a) a removal of
an IP adddres assigned to an interface and (b) a manipulation of multicast
groups to the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.165  27-Feb-2018  maxv branches: 1.165.4;
Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.164  08-Feb-2018  dholland Typos.
 1.163  22-Dec-2017  ozaki-r Add missing curlwp_bindx
 1.162  15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.161  25-Apr-2017  ozaki-r branches: 1.161.4;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.160  20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol faimiliy
as its own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.159  02-Mar-2017  ozaki-r Make sure im6o_memberships is protected by in6p's lock (solock)
 1.158  02-Mar-2017  ozaki-r Use LIST_* macros

No functional change.
 1.157  13-Feb-2017  ozaki-r Replace splnet with splsoftnet
 1.156  23-Jan-2017  ozaki-r Get rid of splnet for pool(9)

We don't need it anymore.
 1.155  13-Dec-2016  ozaki-r branches: 1.155.2;
Remove unnecessary inclusions of nd6.h
 1.154  12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.153  08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
 1.152  31-Oct-2016  christos restore previous logic.
 1.151  31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wan't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.150  29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.149  26-Aug-2016  roy Allow explicit binding to detached addresss.
Fixes PR kern/51435.
 1.148  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.147  15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.146  15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.145  21-Jun-2016  ozaki-r branches: 1.145.2;
Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the fuctions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.144  21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
that's life time is different from ifnet one and so an ifnet object can be
disappeared anytime we get it via them. Thus we need to look up an ifnet
object by if_index every time for safe.
 1.143  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.142  24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.141  19-May-2015  ozaki-r Use NULL instead of 0 for pointers
 1.140  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.139  27-Apr-2015  ozaki-r Add missing error checks on rtcache_setdst

It can fail with ENOMEM.
 1.138  27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.137  26-Apr-2015  rtr return EINVAL if sin{,6}_len != sizeof(sockaddr_in{,6}) respectively in
in{,6}_pcbconnect().

checking just m->m_len isn't enough because there are various places that
assume sa_len has been properly populated.
 1.136  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.135  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.134  25-Nov-2014  seanb branches: 1.134.2;
Really make SO_REUSEPORT and SO_REUSEADDR equivalent for multicast
sockets. From FreeBSD.
 1.133  25-Nov-2014  seanb Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction. The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
 1.132  14-Nov-2014  maxv Do not uselessly include <sys/malloc.h>.
 1.131  11-Oct-2014  christos Succeed binding to multicast address for now: Open questions:
Open questions:

http://mail-index.netbsd.org/tech-net/2014/07/23/msg004714.html
 1.130  11-Oct-2014  christos Make IPV4 mapped addresses able to do IPV4 multicast. Fixes needed:

- allow binding to mapped v4 multicast addresses
- define v4moptions, allow setting it via ioctl, pass it to ip_output,
free it when killing the pcb.

Ideally we would allow the IPV6 multicast setsockopts work on mapped addresses
too, but this is a lot more work and linux does not do it either.
 1.129  07-Sep-2014  rmind in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.

This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.128  05-Aug-2014  rtr branches: 1.128.2;
revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.127  03-Aug-2014  rmind in6_pcbdetach: now that IGMP and multicast groups are MP-safe, we can move
the ip6_freemoptions() call outside the softnet_lock. Should fix PR/49065.
 1.126  24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.125  30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.124  23-Nov-2013  christos branches: 1.124.2;
convert from CIRCLEQ to TAILQ.
 1.123  05-Jun-2013  christos branches: 1.123.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.122  12-Apr-2013  christos PR/47738: connect(2) to 239.x.y.z should return error but does not.
 1.121  24-Aug-2012  dholland branches: 1.121.2;
Remove stray #undef, probably someone's debugging leftover from long ago.
 1.120  25-Jun-2012  christos rename rfc6056 -> portalgo, requested by yamt
 1.119  22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.118  31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.117  19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.116  24-Sep-2011  christos branches: 1.116.2; 1.116.6;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.115  31-Aug-2011  plunky NULL does not need a cast
 1.114  04-May-2011  dyoung Invalidate the vestigital PCB at the top of in6_pcblookup_connect() to
fix the bug where incoming TCPv6 connections were reset.
 1.113  03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.112  20-Aug-2010  joerg branches: 1.112.2;
Remove stray {
 1.111  20-Aug-2010  joerg Consider a mapped IPv4 address of 0.0.0.0 as unspecified. This allows
using mapped IPv4 address with connect without preceding bind.
 1.110  26-May-2009  pooka branches: 1.110.2; 1.110.4;
POOL_INIT -> pool_init
 1.109  12-May-2009  elad Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.108  02-May-2009  elad Replace wrong __UNCONST() use with a local variable.

Similar to issues pointed out by bouyer@ and forgotten by me when I did
the last commit.

Should fix issues reported on current-users@ in:

http://mail-index.netbsd.org/current-users/2009/05/02/msg009273.html
 1.107  30-Apr-2009  elad - Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.106  22-Apr-2009  elad Only check if the port is used if it was specified.

Should fix problem reported in

http://mail-index.netbsd.org/current-users/2009/04/22/msg009130.html
 1.105  20-Apr-2009  elad Replace KAUTH_GENERIC_ISSUSER with a better alternative.
 1.104  20-Apr-2009  elad Extract in6_pcbbind()'s guts into two new routines: in6_pcbbind_addr() and
in6_pcbbind_port(), used for binding to an address and a port respectively.

While here, fix a possible "leak" of an in6pcb when binding to an address
succeeded but binding to an auto-assigned port failed.

Proposed and received no objections on tech-net@:

http://mail-index.netbsd.org/tech-net/2009/04/15/msg001223.html
 1.103  18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.102  14-Apr-2009  elad Don't set sin->sin_port and sin6->sin6_port to 0 before calling
ifa_ifwithaddr(), as we no longer do a byte compare on the entire struct.

Reviewed by and okay from dyoung@.
 1.101  18-Mar-2009  cegger bcopy -> memcpy
 1.100  18-Mar-2009  cegger bzero -> memset
 1.99  20-Aug-2008  matt branches: 1.99.2; 1.99.8;
Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.98  04-Aug-2008  matt Free the socket only after disposing of the PCB.
 1.97  24-Apr-2008  ad branches: 1.97.2; 1.97.4; 1.97.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.96  20-Mar-2008  dyoung branches: 1.96.2;
Use ip6_clearpktopts() to destroy the IPv6 PCB's in6p_outputopts,
so that there's no chance of either leaking memory, or leaving
dangling pointers to a route cache.
 1.95  19-Mar-2008  dyoung No code ever sets struct ip6_pktopts member ip6po_m, so get rid of
it.
 1.94  14-Jan-2008  dyoung branches: 1.94.2; 1.94.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in6_losing().
 1.93  12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.92  10-Jan-2008  dyoung Save some rtcache_getrt() calls.
 1.91  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.90  21-Nov-2007  drochner branches: 1.90.2; 1.90.6;
Fix in6_pcbrtentry() for the case of IPv6-mapped IPv4 addresses:
don't assume that the cached route is a sockaddr_in6, and do the
right comparisions so that no out-of-bounds memory is accessed.

btw, the use of "#ifdef INET" throughout the source doesn't look clean
to me: There are 2 cases -- whether AF_INET is usable by userland
programs, and whether IPv4 is supported as on-wire protocol.
 1.89  10-Nov-2007  dyoung Use sockaddr_in6_init().
 1.88  19-Jul-2007  dyoung branches: 1.88.4; 1.88.6; 1.88.10; 1.88.12; 1.88.14;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.87  23-May-2007  christos branches: 1.87.2;
Ansify + add a few comments, from Karl Sjödahl
 1.86  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.85  12-Mar-2007  ad branches: 1.85.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.84  04-Mar-2007  christos branches: 1.84.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.83  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.82  26-Jan-2007  dyoung branches: 1.82.2;
Change a couple of bzeros to memsets.
 1.81  04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.80  15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.79  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.78  08-Dec-2006  joerg Remove now superflous {.
 1.77  08-Dec-2006  joerg When a dynamic route is deleted in in_losing and in6_losing, rtrequest
is called, but the current reference via the PCB is not removed. This
is effectively a leaked reference. Call rtfree unconditional.
 1.76  02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.75  16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.74  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.73  05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.72  23-Jul-2006  ad branches: 1.72.4; 1.72.6;
Use the LWP cached credentials where sane.
 1.71  14-May-2006  elad integrate kauth.
 1.70  05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.69  21-Jan-2006  rpaulo branches: 1.69.2; 1.69.4; 1.69.6; 1.69.8; 1.69.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.68  15-Nov-2005  dsl branches: 1.68.2;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.67  29-May-2005  christos branches: 1.67.2; 1.67.8;
- avoid shadowed variables
- sprinkle const.
 1.66  04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.65  24-Jun-2004  drochner abstain from typecasting the LHS of an assignment;
gcc-3.4.x doesn't like it
 1.64  26-Apr-2004  jonathan Fix per-PCB IPsec policy cache for FAST_IPSEC:

The sys/netipsec policy-cache (added by Jason Thorpe as a rewrite of
the KAME per-PCB policy cache) assumes that policy-cacheable PCBs
always has a non-NULL inph_sp in the common PCB header. So we must
do all the per-PCB policy cache calls when either (KAME) IPSEC, or
FAST_IPSEC is defined. ``Make it so''.

We can now support non-IPsec'ed IPv6 traffic, when both
``options FAST_IPSEC'' and ``options INET6'' are configured.
 1.63  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.62  29-Mar-2004  atatat Make these compile without INET. tcp_input probably needs a lot more
work...
 1.61  13-Jan-2004  itojun branches: 1.61.2;
avoid deref-after-free.
http://sources.zabbadoz.net/freebsd/patchset/106-ipsec-pcb-discon.diff
 1.60  05-Nov-2003  itojun use hash table for in6_pcbbind(). similar to in_pcb 1.89 -> 1.90
 1.59  30-Sep-2003  christos Fix off-by-one in PRC_NCMDS check. From FreeBSD via OpenBSD
 1.58  06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.57  06-Sep-2003  itojun clarify flowlabel handling
 1.56  04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.55  13-Aug-2003  itojun in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4 mapped
address. PR kern/22431 from Andreas Gustafsson
 1.54  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.53  05-Nov-2002  perry branches: 1.53.6;
include opt_inet.h -- found by David Laight
 1.52  11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.51  26-Aug-2002  itojun pass proc * to in6_pcbsetport. PR 18073
 1.50  20-Aug-2002  itojun sync up use_deprecated handling with latest kame.
- bind(deprecated) is allowed, trusting userland app is doing the right thing
- use_deprecated default to 1
 1.49  11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.48  28-May-2002  itojun correct in*_pcbrtentry. check cached value correctly.
 1.47  28-May-2002  itojun in in*_pcbrtentry(), check if route is still valid (RTF_UP),
and address family is still valid.
 1.46  21-Mar-2002  itojun branches: 1.46.4; 1.46.6;
protect in6pcb queue operation by splnet, as pcb queue will be touched
by in6_pcbpurgeif() under splnet.
 1.45  21-Dec-2001  itojun whitespace/costmetic sync w/kame
 1.44  13-Nov-2001  lukem add RCSIDs
 1.43  24-Oct-2001  itojun more whitespace sync with kame
 1.42  16-Oct-2001  itojun branches: 1.42.2;
remove unused #define. sync whitespace/comment with kame.
 1.41  15-Oct-2001  itojun implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still work just like before.
 1.40  06-Aug-2001  itojun cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.39  25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone accesses to pcb (do not reveal an inconsistent ones).
sync with kame
 1.38  02-Jul-2001  itojun branches: 1.38.2;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.37  27-Jun-2001  itojun netbsd; on interface removal, force pcbs to leave from multicast groups
pointing toward the interface about to be removed. sync with kame
XXX still need more discussions on semantics. the behavior should be safer
 1.36  11-May-2001  itojun there's no need to #if NFAITH here. IN6P_FAITH can be set even on
NFAITH == 0 kernel, it is safer to always check the condition.
sync with kame.
 1.35  11-Feb-2001  itojun branches: 1.35.2;
pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.34  10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.33  21-Dec-2000  itojun make sure we notify of routing changes, even if we have net route pointed
to by inpcb.
 1.32  19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.31  02-Oct-2000  itojun fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.30  26-Aug-2000  itojun implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.29  07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.28  06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.27  02-Jul-2000  itojun repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.26  08-Jun-2000  itojun branches: 1.26.2;
make sure not to overwrite sockaddr on PRU_SEND/PRU_CONNECT to
link-local address. From: frank
 1.25  05-Jun-2000  itojun backout change to in6_pcbnotify(). the change seems premature
(may cause trouble with advanced API in certain situation).
 1.24  05-Jun-2000  itojun pass struct proc * down to udp6_output and in6_pcbbind.
 1.23  03-Jun-2000  itojun sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.22  29-May-2000  itojun disallow bind(2) with IPv4 mapped address for now. port number check is
insufficient at this moment and we can bind(2) two sockets listen on same
port number.

for real fix, we need to check inpcb table with in6pcb. we can't
find inpcb chain from particular in6pcb chain (like finding tcbtable from tcb6)
luckily RFC2553 does not talk about bind(2) behavior for IPv4 mapped.
IPv4 mapped brings in too much complexities...
 1.21  02-Mar-2000  itojun branches: 1.21.2;
bump kame revision id
 1.20  02-Mar-2000  itojun properly handle notifies from icmp6, so that we can properly reflect
redirects/unreach to transport layer. (sync with latest kame)
 1.19  06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18  03-Feb-2000  itojun use u_int16_t, not u_short, for port #.
 1.17  03-Feb-2000  itojun remove #if 0'ed code
 1.16  02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.15  01-Feb-2000  thorpej Improve the readability of one small piece of code.
 1.14  31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.13  26-Jan-2000  itojun make setsockopt(IPV6_PORTRANGE) work. obeys IPNOPRIVPORTS.
 1.12  06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.11  06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.10  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.9  31-Jul-1999  itojun branches: 1.9.2; 1.9.8;
sync with recent KAME.
- loosen ipsec restriction on packet diredtion.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8  17-Jul-1999  itojun fix faith interface support. need testing.
(i understand this is a dirty hack, of course)
 1.7  09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.6  04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.5  03-Jul-1999  thorpej RCS ID police.
 1.4  02-Jul-1999  itojun try to get a non-conflicting port # when bind(2) to port number 0
is called.
 1.3  02-Jul-1999  itojun expand insque/remque (quick hack). fundamental fix should be done
while clarifying relationship between inpcb and in6pcb.

PR: 7891
 1.2  01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1  28-Jun-1999  itojun branches: 1.1.2;
file in6_pcb.c was initially added on branch kame.
 1.1.2.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.2.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1  01-Jul-1999  thorpej file in6_pcb.c was added on branch chs-ubc2 on 1999-07-01 23:48:27 +0000
 1.9.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.9.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.9.2.2  05-Jan-2001  bouyer Sync with HEAD
 1.9.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.26.2.3  09-Sep-2003  msaitoh Pull up rev. 1.55 via patch (requested by itojun in ticket #66):
in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4
mapped address. Fixes PR 22431 from Andreas Gustafsson
 1.26.2.2  27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.26.2.1  03-Jul-2000  itojun pullup from main trunc (approved by releng-1-5)
repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.35.2.10  11-Nov-2002  nathanw Catch up to -current
 1.35.2.9  17-Sep-2002  nathanw Catch up to -current.
 1.35.2.8  27-Aug-2002  nathanw Catch up to -current.
 1.35.2.7  20-Jun-2002  nathanw Catch up to -current.
 1.35.2.6  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.35.2.5  08-Jan-2002  nathanw Catch up to -current.
 1.35.2.4  14-Nov-2001  nathanw Catch up to -current.
 1.35.2.3  22-Oct-2001  nathanw Catch up to -current.
 1.35.2.2  24-Aug-2001  nathanw Catch up with -current.
 1.35.2.1  21-Jun-2001  nathanw Catch up to -current.
 1.38.2.6  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.38.2.5  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.38.2.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.38.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.38.2.2  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.38.2.1  03-Aug-2001  lukem update to -current
 1.42.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.46.6.4  10-Sep-2003  tron Pull up revision 1.55 via patch (requested by itojun in ticket #1405):
in6_pcbrtentry() now returns IPv4 rtentry if in6pcb is connected to IPv4 mapped
address. PR kern/22431 from Andreas Gustafsson
 1.46.6.3  15-Jun-2003  tron Pull up revision 1.53 (requested by itojun in ticket #1241):
include opt_inet.h -- found by David Laight
 1.46.6.2  21-Nov-2002  he Pull up revision 1.50 (requested by itojun in ticket #708):
Allow bind() of deprecated addresses, trusting userland
application knows what it's doing.
 1.46.6.1  27-Aug-2002  lukem Pull up revision 1.51 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.46.4.3  29-Aug-2002  gehenna catch up with -current.
 1.46.4.2  20-Jun-2002  gehenna catch up with -current.
 1.46.4.1  30-May-2002  gehenna Catch up with -current.
 1.53.6.6  11-Dec-2005  christos Sync with head.
 1.53.6.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.53.6.4  18-Dec-2004  skrll Sync with HEAD.
 1.53.6.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.53.6.2  18-Sep-2004  skrll Sync with HEAD.
 1.53.6.1  03-Aug-2004  skrll Sync with HEAD
 1.61.2.1  28-Apr-2004  jmc Pullup rev 1.64 (requested by jonathan in ticket #201)

Fix per-PCB IPsec policy cache for FAST_IPSEC.
The sys/netipsec policy-cache assumes that policy-cacheable PCBs
always has a non-NULL inph_sp in the common PCB header. So we must
do all the per-PCB policy cache calls when either (KAME) IPSEC, or
FAST_IPSEC is defined.
 1.67.8.1  22-Nov-2005  yamt sync with head.
 1.67.2.8  24-Mar-2008  yamt sync with head.
 1.67.2.7  21-Jan-2008  yamt sync with head
 1.67.2.6  07-Dec-2007  yamt sync with head
 1.67.2.5  15-Nov-2007  yamt sync with head.
 1.67.2.4  03-Sep-2007  yamt sync with head.
 1.67.2.3  26-Feb-2007  yamt sync with head.
 1.67.2.2  30-Dec-2006  yamt sync with head.
 1.67.2.1  21-Jun-2006  yamt sync with head.
 1.68.2.1  01-Feb-2006  yamt sync with head.
 1.69.10.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.69.8.4  11-May-2006  elad sync with head
 1.69.8.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.69.8.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.69.8.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.69.6.2  11-Aug-2006  yamt sync with head
 1.69.6.1  24-May-2006  yamt sync with head.
 1.69.4.1  01-Jun-2006  kardel Sync with head.
 1.69.2.2  09-Sep-2006  rpaulo sync with head
 1.69.2.1  07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.72.6.3  18-Dec-2006  yamt sync with head.
 1.72.6.2  10-Dec-2006  yamt sync with head.
 1.72.6.1  22-Oct-2006  yamt sync with head
 1.72.4.3  01-Feb-2007  ad Sync with head.
 1.72.4.2  12-Jan-2007  ad Sync with head.
 1.72.4.1  18-Nov-2006  ad Sync with head.
 1.82.2.4  07-May-2007  yamt sync with head.
 1.82.2.3  24-Mar-2007  yamt sync with head.
 1.82.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.82.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.2.3  20-Aug-2007  ad Sync with HEAD.
 1.84.2.2  08-Jun-2007  ad Sync with head.
 1.84.2.1  13-Mar-2007  ad Sync with head.
 1.85.2.1  11-Jul-2007  mjf Sync with head.
 1.87.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.88.14.2  19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.88.14.1  19-Jul-2007  dyoung file in6_pcb.c was added on branch matt-mips64 on 2007-07-19 20:48:57 +0000
 1.88.12.4  18-Feb-2008  mjf Sync with HEAD.
 1.88.12.3  27-Dec-2007  mjf Sync with HEAD.
 1.88.12.2  08-Dec-2007  mjf Sync with HEAD.
 1.88.12.1  19-Nov-2007  mjf Sync with HEAD.
 1.88.10.2  22-Nov-2007  bouyer Sync with HEAD
 1.88.10.1  13-Nov-2007  bouyer Sync with HEAD
 1.88.6.2  23-Mar-2008  matt sync with HEAD
 1.88.6.1  09-Jan-2008  matt sync with HEAD
 1.88.4.2  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.88.4.1  11-Nov-2007  joerg Sync with HEAD.
 1.90.6.3  19-Jan-2008  bouyer Sync with HEAD
 1.90.6.2  10-Jan-2008  bouyer Sync with HEAD
 1.90.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.90.2.1  26-Dec-2007  ad Sync with head.
 1.94.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.94.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.94.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.94.2.1  24-Mar-2008  keiichi sync with head.
 1.96.2.1  18-May-2008  yamt sync with head.
 1.97.8.1  19-Oct-2008  haad Sync with HEAD.
 1.97.4.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.97.2.4  09-Oct-2010  yamt sync with head
 1.97.2.3  20-Jun-2009  yamt sync with head
 1.97.2.2  16-May-2009  yamt sync with head
 1.97.2.1  04-May-2009  yamt sync with head.
 1.99.8.2  23-Jul-2009  jym Sync with HEAD.
 1.99.8.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.99.2.1  28-Apr-2009  skrll Sync with HEAD.
 1.110.4.2  31-May-2011  rmind sync with head
 1.110.4.1  05-Mar-2011  rmind sync with head
 1.110.2.1  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.112.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.116.6.2  05-Apr-2012  mrg sync to latest -current.
 1.116.6.1  18-Feb-2012  mrg merge to -current.
 1.116.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.116.2.2  30-Oct-2012  yamt sync with head
 1.116.2.1  17-Apr-2012  yamt sync with head
 1.121.2.3  03-Dec-2017  jdolecek update from HEAD
 1.121.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.121.2.1  23-Jun-2013  tls resync from head
 1.123.2.3  18-May-2014  rmind sync with head
 1.123.2.2  23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.123.2.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.124.2.1  10-Aug-2014  tls Rebase.
 1.128.2.3  28-Sep-2016  bouyer Pull up following revision(s) (requested by roy in ticket #1243):
sys/netinet6/raw_ip6.c: revision 1.150 via patch
sys/netinet6/in6_pcb.c: revision 1.149 via patch
Allow explicit binding to detached addresss.
Fixes PR kern/51435.
 1.128.2.2  17-Jan-2015  martin branches: 1.128.2.2.4;
Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.128.2.1  08-Sep-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #80):
sys/netinet6/in6_pcb.c: revision 1.129
sys/netinet/in_pcb.c: revision 1.152
in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.
This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.128.2.2.4.1  18-Jan-2017  skrll Sync with netbsd-5
 1.134.2.8  28-Aug-2017  skrll Sync with HEAD
 1.134.2.7  05-Feb-2017  skrll Sync with HEAD
 1.134.2.6  05-Dec-2016  skrll Sync with HEAD
 1.134.2.5  05-Oct-2016  skrll Sync with HEAD
 1.134.2.4  09-Jul-2016  skrll Sync with HEAD
 1.134.2.3  22-Sep-2015  skrll Sync with HEAD
 1.134.2.2  06-Jun-2015  skrll Sync with HEAD
 1.134.2.1  06-Apr-2015  skrll Sync with HEAD
 1.145.2.6  26-Apr-2017  pgoyette Sync with HEAD
 1.145.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.145.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.145.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.145.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.145.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.155.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.161.4.2  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #463):
sys/netinet/in.c: revision 1.212
sys/netinet/ip_output.c: revision 1.288
sys/netinet6/in6.c: revision 1.256
sys/netinet6/in6_pcb.c: revision 1.163
sys/sys/lwp.h: revision 1.176
Add missing curlwp_bindx
--
Add missing curlwp_bindx
--
Check LP_BOUND is surely set in curlwp_bindx
This may find an extra call of curlwp_bindx.
--
Fix usage of curlwp_bind in ip_output
curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);
ip_outout did so if NET_MPSAFE. Fix it.
--
Fix wrong usage of psref_held
We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
 1.161.4.1  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface have both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
opeartions take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequence functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE solely gains a few benefit on performance because
the network stack is still serialized by the big kernel locks by default.
 1.165.4.1  10-Jun-2019  christos Sync with HEAD

RSS XML Feed