History log of /src/sys/net/rtsock.c
Revision  Date  Author  Comments
 1.256  27-Aug-2022  skrll Add a little const. NFC.
 1.255  09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.
 1.254  03-Feb-2020  roy rtsock: favour ifatoia and ifatoia6 over direct struct casts
 1.253  29-Jan-2020  thorpej Do not reference ifp->if_data directly; use if_export_if_data().
 1.252  01-Sep-2019  roy branches: 1.252.2;
inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETE)
or has failed to be resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
 1.251  22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.250  27-May-2019  ozaki-r branches: 1.250.2;
Don't take softnet_lock in sysctl_rtable

Taking softnet_lock there can cause a locking error with nfs sosend, so we don't.
Having only KERNEL_LOCK is enough because now the routing table is protected by
KERNEL_LOCK that was introduced by the fix for PR 53043.

PR kern/54227 from Paul Ripke
 1.249  29-Apr-2019  pgoyette For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.
 1.248  01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.247  27-Feb-2019  ozaki-r Protect sysctl_rtable with KERNEL_LOCK and softnet_lock

In the function the routing table could be accessed without any locks, which was
unsafe. Actually, on netbsd-7, a kernel panic happened(*). The situation of
locking hasn't changed since netbsd-7 so we still need to hold the big locks on
-current (and netbsd-8) too.

Note that if NET_MPSAFE is enabled, the routing table is protected by its own
lock and we don't need the locks.

Reported and tested on netbsd-7 by sborrill@

(*) http://mail-index.netbsd.org/tech-net/2018/11/08/msg007153.html
 1.246  29-Jan-2019  pgoyette Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
 1.245  27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.244  13-Nov-2018  maxv Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.

[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
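
The class of bug fixed in 1.244 can be illustrated outside the kernel. The sketch below is a minimal, hypothetical example: struct demo_msghdr is an invented stand-in (not the real struct if_msghdr) and demo_fill() and demo_stale_bytes() are assumed helper names. It shows why zeroing the whole structure before filling its fields keeps compiler-inserted padding from carrying stale memory out to the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a message header; NOT the real struct if_msghdr. */
struct demo_msghdr {
	uint16_t len;
	uint8_t  version;
	uint8_t  type;
	uint16_t index;
	/* On typical ABIs the compiler inserts 2 bytes of padding here. */
	uint32_t flags;
};

/* Fill a header in caller-provided storage, clearing padding first. */
static void
demo_fill(struct demo_msghdr *h, uint16_t index, uint32_t flags)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));	/* padding bytes are cleared as well */
	h->len = (uint16_t)sizeof(*h);
	h->version = 1;
	h->type = 0;
	h->index = index;
	h->flags = flags;
}

/* Count bytes of buf that still hold the stale fill pattern. */
static size_t
demo_stale_bytes(const unsigned char *buf, size_t n, unsigned char pattern)
{
	size_t i, stale = 0;

	for (i = 0; i < n; i++)
		if (buf[i] == pattern)
			stale++;
	return stale;
}
```

Without the memset() call, the padding bytes would keep whatever the storage held before, which is the kind of residue the kleak report above points at.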
 1.243  07-Sep-2018  maxv Set unused pr_input field to NULL, discussed on tech-net@.
 1.242  31-Aug-2018  maxv Fix buffer overflow, detected by kASan.

ifconfig gif0 create
ifconfig gif0 up

[ 50.682919] kASan: Unauthorized Access In 0xffffffff80f22655: Addr 0xffffffff81b997a0 [8 bytes, read]
[ 50.682919] #0 0xffffffff8021ce6a in kasan_memcpy <netbsd>
[ 50.692999] #1 0xffffffff80f22655 in m_copyback_internal <netbsd>
[ 50.692999] #2 0xffffffff80f22e81 in m_copyback <netbsd>
[ 50.692999] #3 0xffffffff8103109a in rt_msg1 <netbsd>
[ 50.692999] #4 0xffffffff8159109a in compat_70_rt_newaddrmsg1 <netbsd>
[ 50.692999] #5 0xffffffff81031b0f in rt_newaddrmsg <netbsd>
[ 50.692999] #6 0xffffffff8102c35e in rt_ifa_addlocal <netbsd>
[ 50.692999] #7 0xffffffff80a5287c in in6_update_ifa1 <netbsd>
[ 50.692999] #8 0xffffffff80a54149 in in6_update_ifa <netbsd>
[ 50.692999] #9 0xffffffff80a59176 in in6_ifattach <netbsd>
[ 50.692999] #10 0xffffffff80a56dd4 in in6_if_up <netbsd>
[ 50.692999] #11 0xffffffff80fc5cb8 in if_up_locked <netbsd>
[ 50.703622] #12 0xffffffff80fcc4c1 in ifioctl_common <netbsd>
[ 50.703622] #13 0xffffffff80fde694 in gif_ioctl <netbsd>
[ 50.703622] #14 0xffffffff80fcdb1f in doifioctl <netbsd>
 1.241  25-Apr-2018  ozaki-r branches: 1.241.2;
Fix a deadlock (rt_free vs. route_intr on rt_so_mtx)

It occurs only if NET_MPSAFE is enabled.
 1.240  12-Apr-2018  ozaki-r Resolve tangled lock dependencies in route.c

This change removes the remaining lock decisions that depended on whether a
lock was already held, by moving the utility functions for rtentry updates
from rtsock.c and ensuring that rt_lock is held. It also improves the
atomicity of an update of an rtentry.
 1.239  19-Mar-2018  roy rtsock: log dropped messages that we cannot report to userland
 1.238  25-Jan-2018  ozaki-r branches: 1.238.2;
Fix another deadlock

When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for all references
to the target route (except the updater's own reference) to be released.
 1.237  19-Jan-2018  ozaki-r Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr

The deadlock happened only if NET_MPSAFE is on.
 1.236  18-Dec-2017  ozaki-r Fix compile error (may be used uninitialized)

Hmm, __noinline had hidden this error.
 1.235  18-Dec-2017  ozaki-r Revert "Sprinkle __noinline to some non-performance-sensitive functions for debugging"

We should do these kinds of tweaks for debugging just locally and personally.

Requested by christos@
 1.234  14-Dec-2017  ozaki-r Fix a bug that tries to psref_acquire ifa with a psref used before

This fixes ATF tests that started to fail by a recent change to psref.
 1.233  14-Dec-2017  ozaki-r Protect ifp returned from route_output_get_ifa surely

An ifp returned from route_output_get_ifa was supposed to be protected
by a returned ifa; if the ifa belongs to ifp, holding the ifa prevents
the ifp from being freed. However route_output_get_ifa can return an ifp
to which a returned ifa doesn't belong. So we need to take a reference
to a returning ifp separately.
 1.232  14-Dec-2017  ozaki-r Sprinkle __noinline to some non-performance-sensitive functions for debugging
 1.231  19-Nov-2017  christos Avoid using a zero family mask.
 1.230  17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
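
The pattern described in 1.230 can be sketched as follows. This is only an illustration under stated assumptions: the macro names, DEMO_MPSAFE, struct demo_lock and demo_table_op() are all hypothetical and are not the identifiers used in the kernel. The lock merely records a hold count so the effect is observable.

```c
#include <assert.h>

/* Hypothetical lock that only records its hold count, for illustration. */
struct demo_lock {
	int held;
};

static struct demo_lock demo_big_lock;

static void
demo_lock_enter(struct demo_lock *l)
{
	l->held++;
}

static void
demo_lock_exit(struct demo_lock *l)
{
	assert(l->held > 0);
	l->held--;
}

/* Hypothetical macro names; the compile-time switch is decided once, here. */
#ifdef DEMO_MPSAFE
#define DEMO_BIGLOCK_UNLESS_MPSAFE()	do { } while (0)
#define DEMO_BIGUNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
#define DEMO_BIGLOCK_UNLESS_MPSAFE()	demo_lock_enter(&demo_big_lock)
#define DEMO_BIGUNLOCK_UNLESS_MPSAFE()	demo_lock_exit(&demo_big_lock)
#endif

/* Caller code no longer carries its own #ifndef block. */
static int
demo_table_op(void)
{
	int held_inside;

	DEMO_BIGLOCK_UNLESS_MPSAFE();
	held_inside = demo_big_lock.held;	/* protected work goes here */
	DEMO_BIGUNLOCK_UNLESS_MPSAFE();
	return held_inside;
}
```

Because every call site uses the same pair of macros, searching for the macro name finds all places where the big lock is still taken.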
 1.229  25-Sep-2017  ozaki-r Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
 1.228  25-Sep-2017  ozaki-r Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself with its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's own
and softnet_lock for rtsock). That provides no real protection.

Fix the situation by having a discrete rawcb list for each.
 1.227  01-Jul-2017  christos put the code that returns the sizeof the socket by family in one place.
 1.226  30-Jun-2017  christos Avoid DIAGNOSTIC warning with previous fix and simplify it (don't require
memory alloc/free).
 1.225  30-Jun-2017  ozaki-r Restore the original length of a sockaddr for netmask

route(8) passes a sockaddr for netmask that is truncated to its
prefixlen. However the kernel basically doesn't expect such a format
and may read beyond the data. So restore the original length of the
data on entry to the kernel for the remaining components.

Failures of ATF tests such as route_flags_blackhole6 should
be fixed.
 1.224  28-Jun-2017  ozaki-r Restore ARP/NDP entries to route show and netstat -r

Requested by dyoung@ some time ago
 1.223  26-Jun-2017  ozaki-r Drop RTF_UP from a routing message of a deleted ARP/NDP entry
 1.222  26-Jun-2017  ozaki-r Fix ifdef; care about a case w/ INET6 and w/o INET
 1.221  26-Jun-2017  ozaki-r Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry

A message originally included only DST and GATEWAY. Restore it.
 1.220  26-Jun-2017  ozaki-r Fix usage of routing messages on arp -d and ndp -d

It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
 1.219  23-Jun-2017  ozaki-r Tweak lltable_sysctl_dumparp

- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
 1.218  23-Jun-2017  ozaki-r Fix build of kernels without both INET and INET6
 1.217  22-Jun-2017  ozaki-r Purge L2 caches on changing an interface of a route

The change addresses situations similar to PR 51179.
 1.216  16-Jun-2017  ozaki-r Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries

ARP/NDP entries aren't connected routes.

Reported by ryo@
 1.215  16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.214  15-Jun-2017  ozaki-r Simplify

We can assume that rt_ifp is always non-NULL.
 1.213  01-Jun-2017  chs branches: 1.213.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.212  11-Apr-2017  roy Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.
 1.211  24-Mar-2017  ozaki-r Forbid installing a route whose gateway is unreachable

This change needs a tweak in route_output_change to unbreak route
change commands (e.g., route change -inet6 default -reject).

PR kern/52077 (s-yamaguchi@IIJ and ozaki-r@)
 1.210  22-Mar-2017  ozaki-r Tweak and KNF some functions
 1.209  17-Mar-2017  ozaki-r Add missing NULL check

Fix PR kern/52083
 1.208  14-Mar-2017  ozaki-r Add missing pserialize_read_exit

Pointed out by riastradh@
 1.207  14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.206  14-Mar-2017  ozaki-r Fix use of curlwp_bind

There was an error path that returned without curlwp_bindx.
 1.205  14-Mar-2017  ozaki-r Fix race condition in sysctl_iflist

We need to use psref for the ifa iteration because iflist_addr can sleep.
 1.204  14-Mar-2017  ozaki-r Replace DIAGNOSTIC + panic with KASSERT
 1.203  14-Mar-2017  ozaki-r Avoid debug printf just if DIAGNOSTIC
 1.202  21-Feb-2017  ozaki-r Use kmem instead of malloc
 1.201  17-Feb-2017  ozaki-r Fill rmx_locks too

Otherwise userland sees garbage in it.

This should fix t_mtudisc6 failing on babylon5.
 1.200  19-Jan-2017  ozaki-r Disable rt_update mechanism by default

This is a workaround for PR kern/51877. Enable again once the issue
is fixed.
 1.199  12-Dec-2016  ozaki-r branches: 1.199.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.198  26-Oct-2016  ozaki-r Pull RTM_CHANGE code out of route_output to make further changes easy

No functional change.
 1.197  03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

A Layer 2 routine and a Layer 3 routine share data, namely an ifqueue,
and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.196  21-Sep-2016  roy Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.
 1.195  01-Sep-2016  roy Split out sysctl_iflist into sysctl_iflist_if and sysctl_iflist_addr.
Setup a command and function pointer in one case statement
instead of having a secondary case statement within a loop.
This makes the code much easier to follow, and possibly to add more compat
in the future.

Don't panic when running an old binary without compat support.
 1.194  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.193  28-Jul-2016  martin PR kern/51371: avoid shifting negative values
 1.192  21-Jul-2016  ozaki-r Make complex RTM_CHANGE code understandable

Tests for route change added recently would reduce the possibility of
regressions.

Reviewed by ryo@
 1.191  07-Jul-2016  ozaki-r branches: 1.191.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.190  16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.189  10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
The functions are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.188  17-May-2016  ozaki-r Fix RT_IN_PRINT
 1.187  17-May-2016  ozaki-r Tidy up route_output

Avoid jumping into the middle of a switch statement, use a function instead.
 1.186  12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.185  25-Apr-2016  roy Set rtm_pid = curproc->p_pid for a few more messages.
 1.184  25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.183  25-Apr-2016  ozaki-r Fix errno on rt_setgate error

I bet it's not EDQUOT (Disc quota exceeded).
 1.182  08-Apr-2016  christos - remove printf
- fix indent
 1.181  07-Apr-2016  christos Use sockaddr_dl_init
 1.180  06-Apr-2016  christos Don't interpret routing requests by interface index as arp entry additions!
 1.179  05-Apr-2016  ozaki-r Unbreak build of kernels without INET
 1.178  04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can learn the details of the behavior changes from the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.177  21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.176  21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.175  20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.174  13-Oct-2015  rjs Add core networking support for SCTP.
 1.173  07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, for example through settimeofday(2), according to
time_second(9). We should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.172  17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
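
The conventions listed in 1.172 can be sketched in a single-threaded model. Everything here is hypothetical: struct demo_rt, struct demo_table, demo_lookup() and demo_rtfree() are invented for illustration and do not reproduce the kernel's struct rtentry or rtfree(). As the commit message notes, a count like this keeps an entry from being freed but does not protect it against concurrent updates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified route entry; NOT the kernel's struct rtentry. */
struct demo_rt {
	int refcnt;
	int dst;
};

/* One-slot "table" standing in for the routing table. */
struct demo_table {
	struct demo_rt *slot;
};

/* Create an entry whose single initial reference belongs to the table. */
static struct demo_rt *
demo_rt_create(int dst)
{
	struct demo_rt *rt = calloc(1, sizeof(*rt));

	if (rt != NULL) {
		rt->dst = dst;
		rt->refcnt = 1;
	}
	return rt;
}

/* Lookup returns the entry with its count already incremented. */
static struct demo_rt *
demo_lookup(struct demo_table *t, int dst)
{
	struct demo_rt *rt = t->slot;

	if (rt == NULL || rt->dst != dst)
		return NULL;
	rt->refcnt++;
	return rt;
}

/* Drop one reference; frees the entry and returns 1 when it was the last. */
static int
demo_rtfree(struct demo_rt *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt > 0)
		return 0;
	free(rt);
	return 1;
}
```

Callers pair every successful demo_lookup() with exactly one demo_rtfree() and never touch the counter directly.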
 1.171  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.170  26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.169  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.168  06-Apr-2015  ozaki-r Add hint comments for big ifdef
 1.167  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.166  02-Dec-2014  christos fix debugging printf.
 1.165  02-Dec-2014  christos use the new printing code.
 1.164  05-Sep-2014  matt branches: 1.164.2;
Don't use C++ new keyword
 1.163  09-Aug-2014  rtr branches: 1.163.2; 1.163.4; 1.163.8;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.162  08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.161  05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.160  05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.159  31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.158  30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.157  24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.156  23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.155  09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.154  09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.153  07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.152  07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.151  07-Jul-2014  rtr return EOPNOTSUPP for pr_stat instead of returning success since we
don't fill in the struct stat passed to us.
 1.150  06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.149  01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.148  22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.147  21-May-2014  rmind raw_detach: rawpcb may be embedded, free using the real size (saved in rcb).
 1.146  20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.145  19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.144  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.143  25-Feb-2014  pooka branches: 1.143.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.142  24-Jul-2013  kefren report about route tag in sysctl route walker
 1.141  01-Mar-2013  joerg branches: 1.141.6;
Retire OSI network stack. OK core@
 1.140  30-Jan-2012  christos branches: 1.140.6;
- don't copy past the end of sockaddr if we are rounding, zero it out instead,
from mlelstv@
- put a comment explaining the 6 nuls.
 1.139  31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.138  12-Dec-2011  roy When adding or scrubbing a prefix, always notify userland even if the
prefix does not have IFA_ROUTE.
Don't scrub the interface in SIOCAIFADDR if the new address doesn't
have IFA_ROUTE. If more functions are added to in_ifscrub then this logic
might need to be revisited.

Fixes PR/26450.
 1.137  31-Oct-2011  yamt branches: 1.137.2; 1.137.6;
remove an unnecessary cast
 1.136  17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.135  31-Mar-2011  dyoung Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.134  10-Feb-2011  kefren Allow changing route flags. Should fix PR/40455
OK'ed: dyoung@
 1.133  01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.132  25-Dec-2010  christos branches: 1.132.2; 1.132.4;
merge the length getting code from rt_msg1 and rt_msg2 and make it fail
when the compatibility ifinfo is missing instead of returning junk.
 1.131  12-Nov-2010  roy Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.
 1.130  28-Jun-2010  kefren we need to set rt_ifp even if ifa is the same. Fixes the case when one
changes route to a different ifp but wants to keep the same ifa
 1.129  26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.128  02-May-2010  kefren Permit the existence of a route with unlinked ifp and ifa,
enabling this way the possibility to send a packet on an interface with
source address from another interface.
 1.127  16-Sep-2009  pooka branches: 1.127.2; 1.127.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.126  12-Sep-2009  tsutsui Make this compile with options RTSOCK_DEBUG.
Noticed by PR kern/41842, but fixed differently.
 1.125  02-Apr-2009  christos Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0
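
The rounding issue behind the proxyarp fix can be sketched as follows. This is an illustration only: the names SKETCH_ALIGN, SKETCH_ROUNDUP and sketch_advance are hypothetical, and the 8-byte alignment unit is an assumption, not a copy of the macros this commit centralized. The point is that every reader of a routing message must pad each address length the same way, or it will step to the wrong offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment unit; assumed here to be 8 bytes. */
#define SKETCH_ALIGN sizeof(uint64_t)

/*
 * Round an address length up to the next multiple of SKETCH_ALIGN.
 * A zero length is assumed to still occupy one full alignment unit.
 */
#define SKETCH_ROUNDUP(len) \
	((len) > 0 ? (1 + (((size_t)(len) - 1) | (SKETCH_ALIGN - 1))) \
	    : SKETCH_ALIGN)

/* Step a cursor past one padded address whose unpadded length is len. */
static const char *
sketch_advance(const char *cp, size_t len)
{
	return cp + SKETCH_ROUNDUP(len);
}
```

A program that keeps a private copy of such a macro with a different alignment unit computes different offsets for the second and later addresses, which is why sharing one definition from a header helps.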
 1.124  11-Mar-2009  roy Revert r1.119 as the implementation is broken.
 1.123  20-Feb-2009  yamt remove inline from some functions which are not small or critical.
 1.122  14-Feb-2009  christos mention when this will really break, not 2038 but 2145.
 1.121  11-Jan-2009  christos branches: 1.121.2;
we need route_enqueue not to be static
 1.120  11-Jan-2009  christos merge christos-time_t
 1.119  21-Dec-2008  roy When removing routes automatically added, remove the flag from the associated
address.
When changing routes automatically added, move the flag to the new associated
address.
 1.118  17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.117  12-Dec-2008  christos RTAX_GENMASK and RTAX_AUTHOR could cause kernel memory corruption because
info struct members could be pointing to free'd memory. Fix from dyoung.
XXX: Pullup to 5.0
 1.116  07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}
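
A filled-in version of that four-way switch might look like the sketch below. The flag constants, the enum, and the action chosen for each case are all assumptions made for illustration (SK_IFF_UP and SK_IFF_RUNNING are hypothetical stand-ins for IFF_UP and IFF_RUNNING); the commit message does not say what each driver does in each case.

```c
/* Hypothetical stand-ins for the real interface flags. */
#define SK_IFF_UP	0x0001
#define SK_IFF_RUNNING	0x0040

/* Hypothetical actions a driver might take on a flags change. */
enum sketch_action {
	SK_ACT_NONE,		/* down and stopped: nothing to do */
	SK_ACT_STOP,		/* marked down but still running */
	SK_ACT_INIT,		/* marked up but not yet running */
	SK_ACT_RECONFIGURE	/* up and running: apply new settings */
};

static enum sketch_action
sketch_flags_action(unsigned flags)
{
	/* Mask first so unrelated flag bits cannot select a case. */
	switch (flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_ACT_NONE;
	case SK_IFF_RUNNING:
		return SK_ACT_STOP;
	case SK_IFF_UP:
		return SK_ACT_INIT;
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_ACT_RECONFIGURE;
	}
	return SK_ACT_NONE;	/* not reached: all four cases are covered */
}
```

Compared with nested if-else clauses, the switch makes it visible at a glance that all four combinations are handled exactly once.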

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
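
The check described above can be sketched as a small predicate. The function name and the length constant are hypothetical; the only facts relied on are that an Ethernet group (multicast or broadcast) address has the low bit of its first octet set, and that the all-zero address is refused.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_ADDR_LEN 6	/* hypothetical stand-in constant */

/* Return true if addr is usable as a unicast link-layer address. */
static bool
sketch_lladdr_acceptable(const uint8_t addr[SK_ETHER_ADDR_LEN])
{
	uint8_t acc = 0;

	/* Group bit set: multicast, which also covers broadcast. */
	if (addr[0] & 0x01)
		return false;

	/* OR all octets together; the result is 0 only for all zeros. */
	for (size_t i = 0; i < SK_ETHER_ADDR_LEN; i++)
		acc |= addr[i];
	return acc != 0;
}
```

Under this sketch the address 02:de:ad:be:ef:02 from the summary is accepted, while ff:ff:ff:ff:ff:ff and 00:00:00:00:00:00 are refused.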

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.115  28-Oct-2008  christos branches: 1.115.2;
Fold long lines created by the previous commit. No functional change.
 1.114  28-Oct-2008  dyoung Stop the "Sleazy use of local variables throughout file", replace
'dst' with 'info.rti_info[RTAX_DST]', et cetera.
 1.113  25-Oct-2008  christos branches: 1.113.2;
Fix handling of RTAX_GENMASK. Since this has been removed, userland programs
that set it, ended up causing the kernel to reference random garbage. Ignore
it for compatibility, but add a DIAGNOSTIC message so that userland programs
that set it can be fixed. The only one so far is pppd. Hi dyoung!
 1.112  24-Oct-2008  dyoung Do not gratuitously cast to void *. Remove excess parenthesization.
Do not "test truth" of pointers, but compare with NULL.

No functional change intended.
 1.111  28-Aug-2008  christos - more void * removal
- bcopy -> memcpy
- memmove -> memcpy
- explicitly initialize size to 0 on memory allocation failure.
 1.110  28-Aug-2008  dyoung Do not cast to void * unnecessarily.
 1.109  15-Jun-2008  cube branches: 1.109.2;
Fix previous: a well hidden assignment was lost.
 1.108  15-Jun-2008  christos - add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.107  01-Jun-2008  christos branches: 1.107.2;
Don't obliterate the whole message, preserve the data we have just written
and only zero out the rest.
 1.106  29-May-2008  christos PR/38791: J.T. Conklin: routing socket event header not cleared
 1.105  25-May-2008  dholland fix typo
 1.104  24-May-2008  christos Coverity CID 5013: Add diagnostic test for bad cmd parameter.
 1.103  13-May-2008  dyoung Replace a call to rtrequest() with single dst, mask, gateway
arguments, with a call to rtrequest1() with the rt_addrinfo those
single arguments come from. No functional change intended.
 1.102  11-May-2008  dyoung Use memset, memmove, and memcmp instead of Bzero, Bcopy, and Bcmp,
respectively.
 1.101  24-Apr-2008  ad branches: 1.101.2; 1.101.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.100  29-Mar-2008  yamt branches: 1.100.2; 1.100.4;
route_intr: fill a correct member of sockproto. (sp_family -> sp_protocol)
 1.99  26-Mar-2008  ad Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.
 1.98  20-Feb-2008  matt branches: 1.98.2; 1.98.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.97  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.96  05-Dec-2007  dyoung branches: 1.96.4;
Use IFADDR_FIRST(), IFADDR_NEXT().
 1.95  19-Jul-2007  dyoung branches: 1.95.4; 1.95.6; 1.95.12; 1.95.14; 1.95.16;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
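
Applying a netmask to a destination before the lookup amounts to a byte-wise AND. The sketch below works on raw address bytes rather than on sockaddrs; the function name is hypothetical, and treating bytes beyond the end of a short mask as zero is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write (addr & mask) into dst, addrlen bytes in total.
 * Bytes past masklen are treated as masked out and become zero.
 */
static void
sketch_mask_bytes(uint8_t *dst, const uint8_t *addr, size_t addrlen,
    const uint8_t *mask, size_t masklen)
{
	for (size_t i = 0; i < addrlen; i++)
		dst[i] = (i < masklen) ? (uint8_t)(addr[i] & mask[i]) : 0;
}
```

With this, a request for 192.168.5.77 with mask 255.255.255.0 is looked up as 192.168.5.0, so the search key is the network itself rather than a host inside it.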

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.94  09-Jun-2007  dyoung branches: 1.94.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.93  04-Mar-2007  christos branches: 1.93.2; 1.93.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.92  18-Feb-2007  matt Initialize routeswitch with structure initializers.
 1.91  13-Nov-2006  dyoung branches: 1.91.4;
make the routing socket report the right source address in RTM_GET
responses when a source-address selection policy is in use.
 1.90  13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.
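
The address predicates in item 3 can be sketched as mask-and-compare tests. The function names below are hypothetical, the addresses are assumed to be in host byte order, and taking "link-local multicast" to mean 224.0.0.0/24 is an assumption of this sketch; the kernel macros themselves are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* 169.254.0.0/16, link-local unicast. */
static bool
sketch_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. */
static bool
sketch_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/* Link-local unicast, or link-local multicast (assumed 224.0.0.0/24). */
static bool
sketch_in_any_local(uint32_t a)
{
	return sketch_in_linklocal(a) || (a & 0xffffff00u) == 0xe0000000u;
}
```

A source-address selection policy could use predicates of this kind to prefer, say, a routable address over a link-local one when both are configured on an interface.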

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.89  19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.88  08-Sep-2006  elad branches: 1.88.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.87  03-Sep-2006  christos branches: 1.87.2;
use c99 initializers
 1.86  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.85  27-May-2006  elad add sysctl for routing stats
 1.84  14-May-2006  elad branches: 1.84.2;
integrate kauth.
 1.83  15-Apr-2006  christos Coverity CID 854: Add KASSERT before deref.
 1.82  15-Apr-2006  christos Coverity CID 853: Prevent NULL deref.
 1.81  21-Feb-2006  rpaulo branches: 1.81.2; 1.81.4; 1.81.6;
In sysctl_iflist() don't assume TAILQ_FIRST() will never be NULL.
Prevents crash found by Uwe and fix confirmed working by Jeff Ito (all
on tech-net).
 1.80  24-Dec-2005  perry branches: 1.80.2; 1.80.4; 1.80.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.79  11-Dec-2005  christos merge ktrace-lwp.
 1.78  22-Jun-2005  dyoung branches: 1.78.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.
 1.77  09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.76  29-May-2005  christos - sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.
 1.75  26-Feb-2005  perry nuke trailing whitespace
 1.74  24-Jan-2005  matt branches: 1.74.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.73  23-Jan-2005  matt Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domain.
 1.72  23-Oct-2004  christos branches: 1.72.4;
PR/27286: Tom Ivar Helbekkmo: Allow RTM_GET to work with RTA_IFA|RTA_IFP set.

Quoting Tom: The problem is the special case of an RTM_GET message
that wants interface information included in the response, and
therefore include the RTA_IFA or RTA_IFP (or both) flags in the
bitmask that says what addresses are supplied in the message. For
the RTM_GET message, it doesn't make sense to supply addresses
other than the one you're asking about, so those two other bits
are, in that specific case, overloaded with this meaning.

There is code in sys/net/rtsock.c to handle the case, but at some
time, extra sanity checking of the received message was added, that
failed to take this possibility into account.

The patch is needed for the Asterisk software PBX to work properly
when it has multiple interfaces active: it needs to ask the kernel
for the IP address of the interface that will be used to communicate
with a given host.
 1.71  25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.70  22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.69  21-Apr-2004  matt ANSI-fy and some additional de-__P and constification.
 1.68  21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.67  24-Mar-2004  atatat branches: 1.67.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.66  28-Dec-2003  atatat Sysctl functions called for "generic" nodes should forward "query"
requests (where possible), rather than returning errors.
 1.65  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.64  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.63  29-Jun-2003  fvdl branches: 1.63.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.62  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.61  24-Jun-2003  itojun recover code that requires exact match on rtm_change/lock (lost in 1.16).
without it "route change X" would change less-specific route by mistake.
reported by jinmei@kame
 1.60  16-May-2003  itojun use strlcpy
 1.59  02-May-2003  itojun KNF
 1.58  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.57  24-Nov-2002  scw Quell an uninitialised variable warning.
 1.56  02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.55  22-Feb-2002  christos branches: 1.55.10;
PR/15703: Sean Boudreau: Case in route_output() where struct rtentry *rt
dereferenced after free.
 1.54  12-Nov-2001  lukem add RCSIDs
 1.53  05-Nov-2001  matt Switch to using queue access macros instead of referring to the member
fields explicitly.
 1.52  29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.51  16-Sep-2001  wiz branches: 1.51.2;
Spell 'occurred' with two 'r's.
 1.50  21-Jul-2001  itojun branches: 1.50.2;
repair validation on RTAX_GENMASK insertion. has been broken since 44bsd.
(freebsd3 has a fix since 1999, but has insufficient validation on sa_len)
 1.49  19-Jul-2001  enami No need to clear part of struct rt_addrinfo in rt_xaddrs() since the only
caller clears the whole struct.
 1.48  18-Jul-2001  thorpej bzero -> memset
 1.47  04-Jun-2001  itojun branches: 1.47.2;
simplify previous change (mbuf length adjustment for rtsock response).
 1.46  04-Jun-2001  itojun adjust routing socket response mbufs to the correct length. sync with kame.
 1.45  17-Jan-2001  itojun branches: 1.45.2;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.44  10-Nov-2000  enami Don't require the size of sockaddr to be rounded up if it was the last one
and was netmask.
 1.43  19-Oct-2000  itojun prevent stack overwrite due to bzero() arg mistake. from msaitoh.
 1.42  28-Sep-2000  erh When grabbing address structures out of a character array make sure that the number of addresses and length of each match up with the size of the data we're handed. Fixes arp on the alpha.
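
The kind of check that commit describes can be sketched as below. This is a simplified illustration, not the kernel routine: the names are hypothetical, each record is assumed to start with a one-byte total length, alignment padding is ignored, and requiring the records to fill the buffer exactly is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MAXADDRS 8	/* hypothetical number of address slots */

/*
 * Split buf into the address records named by the bits of mask.
 * out[i] points at record i, or is NULL if bit i is clear.
 * Returns 0 if count and lengths match buflen, -1 if they do not.
 */
static int
sketch_split_addrs(unsigned mask, const uint8_t *buf, size_t buflen,
    const uint8_t *out[SK_MAXADDRS])
{
	size_t off = 0;

	for (int i = 0; i < SK_MAXADDRS; i++)
		out[i] = NULL;

	for (int i = 0; i < SK_MAXADDRS; i++) {
		if ((mask & (1u << i)) == 0)
			continue;
		if (off >= buflen)
			return -1;	/* mask promises more than we got */
		size_t len = buf[off];	/* first byte is the record length */
		if (len == 0 || len > buflen - off)
			return -1;	/* record would run past the buffer */
		out[i] = buf + off;
		off += len;
	}
	return off == buflen ? 0 : -1;	/* reject trailing bytes */
}
```

The essential point is that a length taken from the data is never trusted until it has been compared against the space that actually remains.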
 1.41  28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.40  15-Apr-2000  simonb branches: 1.40.4;
Remove some routing specific sysctl function declarations from
<sys/sysctl.h> and make them static in net/rtsock.c.
 1.39  30-Mar-2000  augustss Kill some more register declarations.
 1.38  12-Mar-2000  itojun initialize rn with 0, just to be sure
 1.37  10-Mar-2000  itojun do not touch radix_node with RNF_ROOT on route_output(). this can
cause kernel panic (by non-root invocation of route(8)) on certain
routing table setup.
KAME PR: 217
 1.36  06-Mar-2000  thorpej - Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.
 1.35  17-Feb-2000  itojun backout incomplete hack from KAME codebase (originally from bbn).

the hack tries to respect ifa or ifp passed to RTM_ADD. However, the change
broke certain link-layers. They include:
- midway ethernet card (en*), which uses sockaddr_dl in gateway portion
to pass PVC information. with the patch, the gateway portion will be
overwritten by empty sockaddr_dl and PVC initialization will fail.
- IPv6, which can't set static ND table with the patch (ndp -s), for the
similar reason as above.

There may be an improved hack coming soon; hopefully the new one does not break others.
 1.34  11-Feb-2000  itojun make assumption in rt_msg1 (len <= MHLEN + MLEN) explicit.
panic if not satisfied.
 1.33  01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studnemund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.32  19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.31  09-Jul-1999  thorpej branches: 1.31.2; 1.31.8;
defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.30  01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.29  02-Apr-1999  chopps deal with failure of malloc NOWAIT by restarting after mallocing with WAIT.
don't write beyond the users given buffer size (this happened if there was
enough space for the initial malloc to succeed).
 1.28  12-Dec-1998  christos branches: 1.28.2;
fix thinko in previous change.
 1.27  10-Dec-1998  christos IPX counters and centralize statistics routine.
 1.26  01-Mar-1998  fvdl branches: 1.26.6;
Merge with Lite2 + local changes
 1.25  10-Dec-1997  christos PR/2733: Bill Sommerfeld: route change command can crash system. Actually
the case mentioned in the PR was fixed as part of PR/2582. There was a similar
case though that was not handled as part of my initial fix, which was fixed
in FreeBSD. I applied the remaining part from FreeBSD and the code now matches
the respective FreeBSD version. [this probably should be pulled up for 1.3]
 1.24  27-Mar-1997  thorpej branches: 1.24.8;
m_copyback() is now in uipc_mbuf.c
 1.23  22-Feb-1997  thorpej Allow non-superuser to open, listen to, and send safe commands on the
routing socket. Superuser privilege is required for all commands
but RTM_GET.
 1.22  11-Dec-1996  mycroft branches: 1.22.4;
Undo silly part of previous change.
 1.21  01-Jul-1996  christos - Fix PR/2582: default route change without specifying gateway kills system.

While I was there:
- Fix KNF style problem.
- Remove bogus casts to 0, and (caddr_t).
 1.20  23-May-1996  mycroft We must indirect through the higher-level protocol for
PRU_{BIND,CONNECT} so that it can check the sockaddr.
 1.19  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.18  29-Mar-1996  cgd branches: 1.18.4;
make this version of ROUNDUP() consistent with the others in this directory.
(only makes a diff on the alpha.)
 1.17  13-Feb-1996  christos Net prototypes
 1.16  19-Aug-1995  cgd Update to latest code from CSRG.
 1.15  17-Aug-1995  mycroft so_pcb should be a void *.
 1.14  12-Aug-1995  mycroft splnet --> splsoftnet
 1.13  12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.12  08-Mar-1995  cgd fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.
 1.11  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9  11-May-1994  mycroft Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.
 1.8  07-May-1994  cgd kill kinfo stuff, for now
 1.7  10-Feb-1994  mycroft Deprecate af.h.
 1.6  16-Jan-1994  cgd include <machine/cpu.h> not <machine/mtpr.h>
 1.5  18-Dec-1993  mycroft Canonicalize all #includes.
 1.4  04-Sep-1993  jtc branches: 1.4.2;
include systm.h to get prototypes (and possibly inlines) of *max functions.
 1.3  22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2  18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.4  01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.3  08-Nov-1993  mycroft Remove references to af.h.
 1.4.2.2  16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.4.2.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.18.4.2  11-Dec-1996  mycroft From trunk:
Fix null pointer dereference when attempting to change the default route
without specifying a gateway.
 1.18.4.1  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.22.4.1  12-Mar-1997  is Merge in changes from The Trunk
 1.24.8.1  15-Dec-1997  mellon Pull rev 1.25 up from trunk (christos)
 1.26.6.1  11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.28.2.1  02-Apr-1999  chopps branches: 1.28.2.1.2; 1.28.2.1.4;
pull-up revision 1.29
 1.28.2.1.4.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.28.2.1.4.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.28.2.1.4.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.28.2.1.2.3  02-Aug-1999  thorpej Update from trunk.
 1.28.2.1.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.28.2.1.2.1  21-Jun-1999  thorpej Sync w/ -current.
 1.31.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.31.2.3  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.31.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.31.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.40.4.2  25-Jun-2003  msaitoh Pullup revision 1.60 (requested by itojun in ticket #48):
recover code that requires exact match on rtm_change/lock (lost in 1.16).
without it "route change X" would change less-specific route by mistake.
reported by jinmei@kame
 1.40.4.1  19-Oct-2000  he Pull up revision 1.43 (requested by itojun):
Prevent stack overwrite due to bzero() argument mistake.
 1.45.2.11  11-Dec-2002  thorpej Sync with HEAD.
 1.45.2.10  11-Nov-2002  nathanw Catch up to -current
 1.45.2.9  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.45.2.8  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.45.2.7  28-Feb-2002  nathanw Catch up to -current.
 1.45.2.6  14-Nov-2001  nathanw Catch up to -current.
 1.45.2.5  21-Sep-2001  nathanw Catch up to -current.
 1.45.2.4  24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.45.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.45.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.45.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.47.2.3  16-Mar-2002  jdolecek Catch up with -current.
 1.47.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.47.2.1  03-Aug-2001  lukem update to -current
 1.50.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.51.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.10.1  24-Jun-2003  grant Pull up revision 1.61 (requested by itojun in ticket #1336):

recover code that requires exact match on rtm_change/lock (lost in
1.16). without it "route change X" would change less-specific route by
mistake. reported by jinmei@kame
 1.63.2.9  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.63.2.8  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.63.2.7  04-Feb-2005  skrll Sync with HEAD.
 1.63.2.6  24-Jan-2005  skrll Sync with HEAD.
 1.63.2.5  02-Nov-2004  skrll Sync with HEAD.
 1.63.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.63.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.63.2.2  03-Aug-2004  skrll Sync with HEAD
 1.63.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.67.2.1  28-May-2004  tron branches: 1.67.2.1.2;
Pull up revision 1.71 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.67.2.1.2.1  18-May-2005  riz Pull up revision 1.72 via patch (requested by christos in ticket #961):
PR/27286: Tom Ivar Helbekkmo: Allow RTM_GET to work with RTA_IFA|RTA_IFP set.
Quoting Tom: The problem is the special case of an RTM_GET message
that wants interface information included in the response, and
therefore includes the RTA_IFA or RTA_IFP (or both) flags in the
bitmask that says what addresses are supplied in the message. For
the RTM_GET message, it doesn't make sense to supply addresses
other than the one you're asking about, so those two other bits
are, in that specific case, overloaded with this meaning.
There is code in sys/net/rtsock.c to handle the case, but at some
time, extra sanity checking of the received message was added, that
failed to take this possibility into account.
The patch is needed for the Asterisk software PBX to work properly
when it has multiple interfaces active: it needs to ask the kernel
for the IP address of the interface that will be used to communicate
with a given host.
 1.72.4.1  29-Apr-2005  kent sync with -current
 1.74.2.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.78.2.7  27-Feb-2008  yamt sync with head.
 1.78.2.6  21-Jan-2008  yamt sync with head
 1.78.2.5  07-Dec-2007  yamt sync with head
 1.78.2.4  03-Sep-2007  yamt sync with head.
 1.78.2.3  26-Feb-2007  yamt sync with head.
 1.78.2.2  30-Dec-2006  yamt sync with head.
 1.78.2.1  21-Jun-2006  yamt sync with head.
 1.80.6.2  01-Jun-2006  kardel Sync with head.
 1.80.6.1  22-Apr-2006  simonb Sync with head.
 1.80.4.1  09-Sep-2006  rpaulo sync with head
 1.80.2.1  01-Mar-2006  yamt sync with head.
 1.81.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.81.4.4  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.81.4.3  19-Apr-2006  elad sync with head.
 1.81.4.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.81.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.81.2.5  14-Sep-2006  yamt sync with head.
 1.81.2.4  03-Sep-2006  yamt sync with head.
 1.81.2.3  11-Aug-2006  yamt sync with head
 1.81.2.2  26-Jun-2006  yamt sync with head.
 1.81.2.1  24-May-2006  yamt sync with head.
 1.84.2.1  19-Jun-2006  chap Sync with head.
 1.87.2.1  18-Nov-2006  ad Sync with head.
 1.88.2.2  10-Dec-2006  yamt sync with head.
 1.88.2.1  22-Oct-2006  yamt sync with head
 1.91.4.2  12-Mar-2007  rmind Sync with HEAD.
 1.91.4.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.93.4.1  11-Jul-2007  mjf Sync with head.
 1.93.2.2  20-Aug-2007  ad Sync with HEAD.
 1.93.2.1  15-Jul-2007  ad Sync with head.
 1.94.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.95.16.2  19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.95.16.1  19-Jul-2007  dyoung file rtsock.c was added on branch matt-mips64 on 2007-07-19 20:48:54 +0000
 1.95.14.2  26-Dec-2007  ad Sync with head.
 1.95.14.1  08-Dec-2007  ad Sync with head.
 1.95.12.2  27-Dec-2007  mjf Sync with HEAD.
 1.95.12.1  08-Dec-2007  mjf Sync with HEAD.
 1.95.6.2  23-Mar-2008  matt sync with HEAD
 1.95.6.1  09-Jan-2008  matt sync with HEAD
 1.95.4.1  09-Dec-2007  jmcneill Sync with HEAD.
 1.96.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.98.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.98.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.98.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.98.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.98.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.98.2.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.100.4.3  17-Jun-2008  yamt sync with head.
 1.100.4.2  04-Jun-2008  yamt sync with head
 1.100.4.1  18-May-2008  yamt sync with head.
 1.100.2.7  29-Dec-2008  christos protect with _KERNEL_OPT the compat netbsd option.
 1.100.2.6  28-Dec-2008  christos ort_metrics -> rt_metrics
rt_metrics -> nrt_metrics
for userland compatibility
 1.100.2.5  27-Dec-2008  christos merge with head.
 1.100.2.4  09-Nov-2008  christos merge with head.
 1.100.2.3  01-Nov-2008  christos Sync with head.
 1.100.2.2  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.100.2.1  29-Mar-2008  christos file rtsock.c was added on branch christos-time_t on 2008-03-29 20:47:02 +0000
 1.101.4.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.101.4.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.101.2.5  11-Aug-2010  yamt sync with head.
 1.101.2.4  11-Mar-2010  yamt sync with head
 1.101.2.3  16-Sep-2009  yamt sync with head
 1.101.2.2  04-May-2009  yamt sync with head.
 1.101.2.1  16-May-2008  yamt sync with head.
 1.107.2.1  18-Jun-2008  simonb Sync with head.
 1.109.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.109.2.1  19-Oct-2008  haad Sync with HEAD.
 1.113.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.115.2.4  03-Apr-2009  snj branches: 1.115.2.4.4;
Pull up following revision(s) (requested by christos in ticket #650):
sys/net/route.c: revision 1.117
sys/net/route.h: revision 1.73
sys/net/rtsock.c: revision 1.125
usr.sbin/arp/arp.c: revision 1.48
usr.sbin/pppd/pppd/sys-bsd.c: revision 1.59
Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
 1.115.2.3  15-Mar-2009  snj Pull up following revision(s) (requested by roy in ticket #560):
sys/net/rtsock.c: revision 1.124
Revert r1.119 as the implementation is broken.
 1.115.2.2  09-Jan-2009  snj Pull up following revision(s) (requested by roy in ticket #239):
sys/net/rtsock.c: revision 1.119
When removing routes automatically added, remove the flag from the
associated address.
When changing routes automatically added, move the flag to the new
associated address.
 1.115.2.1  23-Dec-2008  snj Pull up following revision(s) (requested by christos in ticket #202):
sys/net/rtsock.c: revision 1.117
RTAX_GENMASK and RTAX_AUTHOR could cause kernel memory corruption because
info struct members could be pointing to free'd memory. Fix from dyoung.
XXX: Pullup to 5.0
 1.115.2.4.4.2  13-May-2010  matt Make sure all structure lengths are rounded via RT_ROUNDUP in routing messages.
This simplifies the protocol since all items will now start on a RT_ROUNDUP
aligned address independent of the structure.
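
The rounding rule described in this entry can be sketched roughly as follows. This is an illustration only, not the kernel's RT_ROUNDUP macro: the helper names sketch_roundup() and sketch_msg_len() are hypothetical, and the 8-byte (sizeof(uint64_t)) boundary and the treatment of zero-length items are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel macro): round len up to the next
 * multiple of align, where align must be a power of two.
 */
static size_t
sketch_roundup(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	if (len == 0)
		return align;	/* assumption: an empty item still takes one slot */
	return (len + align - 1) & ~(align - 1);
}

/*
 * Total length of a message whose header and items are each padded to
 * an 8-byte boundary, so every item starts on an aligned offset.
 */
static size_t
sketch_msg_len(size_t hdrlen, const size_t *items, size_t nitems)
{
	size_t total = sketch_roundup(hdrlen, sizeof(uint64_t));

	for (size_t i = 0; i < nitems; i++)
		total += sketch_roundup(items[i], sizeof(uint64_t));
	return total;
}
```

Because every length is padded the same way, a reader of the message can step from one item to the next with the same rounding rule, whatever structure each item holds.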
 1.115.2.4.4.1  27-Apr-2010  matt Make sure each rt_msg has an aligned length.
 1.121.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.127.4.4  21-Apr-2011  rmind sync with head
 1.127.4.3  05-Mar-2011  rmind sync with head
 1.127.4.2  03-Jul-2010  rmind sync with head
 1.127.4.1  30-May-2010  rmind sync with head
 1.127.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.132.4.2  17-Feb-2011  bouyer Sync with HEAD
 1.132.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.132.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.137.6.1  18-Feb-2012  mrg merge to -current.
 1.137.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.137.2.1  17-Apr-2012  yamt sync with head
 1.140.6.3  03-Dec-2017  jdolecek update from HEAD
 1.140.6.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.140.6.1  23-Jun-2013  tls resync from head
 1.141.6.3  18-May-2014  rmind sync with head
 1.141.6.2  28-Aug-2013  rmind sync with head
 1.141.6.1  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.143.2.1  10-Aug-2014  tls Rebase.
 1.163.8.2  23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.8.1  28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.163.4.2  23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.4.1  28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.163.2.2  23-Feb-2019  martin Apply patch, requested by sborrill in ticket #1680:

sys/net/rtsock.c (apply patch)

Fix locking for sysctl_rtable (fix in HEAD will be different).
 1.163.2.1  28-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1657):

sys/net/rtsock.c: revision 1.244 (adapted)

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
 1.164.2.12  28-Aug-2017  skrll Sync with HEAD
 1.164.2.11  05-Feb-2017  skrll Sync with HEAD
 1.164.2.10  05-Dec-2016  skrll Sync with HEAD
 1.164.2.9  05-Oct-2016  skrll Sync with HEAD
 1.164.2.8  09-Jul-2016  skrll Sync with HEAD
 1.164.2.7  29-May-2016  skrll Sync with HEAD
 1.164.2.6  22-Apr-2016  skrll Sync with HEAD
 1.164.2.5  19-Mar-2016  skrll Sync with HEAD
 1.164.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.164.2.3  22-Sep-2015  skrll Sync with HEAD
 1.164.2.2  06-Jun-2015  skrll Sync with HEAD
 1.164.2.1  06-Apr-2015  skrll Sync with HEAD
 1.191.2.6  26-Apr-2017  pgoyette Sync with HEAD
 1.191.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.191.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.191.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.191.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.191.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.199.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.213.2.13  29-May-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1276):

sys/net/rtsock.c: revision 1.250

Don't take softnet_lock in sysctl_rtable

Taking softnet_lock there can cause a deadlock with nfs sosend, so we don't.
Having only KERNEL_LOCK is enough because now the routing table is protected by
KERNEL_LOCK that was introduced by the fix for PR 53043.

PR kern/54227 from Paul Ripke
 1.213.2.12  07-Mar-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1203):

sys/net/rtsock.c: revision 1.247

Protect sysctl_rtable with KERNEL_LOCK and softnet_lock

In the function the routing table could be accessed without any locks, which was
unsafe. Actually, on netbsd-7, a kernel panic happened(*). The situation of
locking hasn't changed since netbsd-7 so we still need to hold the big locks on
-current (and netbsd-8) too.

Note that if NET_MPSAFE is enabled, the routing table is protected by its own
lock and we don't need the locks.

Reported and tested on netbsd-7 by sborrill@
(*) http://mail-index.netbsd.org/tech-net/2018/11/08/msg007153.html
 1.213.2.11  21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1101):

sys/net/rtsock.c: revision 1.244

Fix kernel info leak. There are 2 bytes of padding in struct if_msghdr.
[ 944.607323] kleak: Possible leak in copyout: [len=176, leaked=2]
[ 944.617335] #0 0xffffffff80b7c44a in kleak_note <netbsd>
[ 944.627332] #1 0xffffffff80b7c4ca in kleak_copyout <netbsd>
[ 944.627332] #2 0xffffffff80c91698 in sysctl_iflist_if <netbsd>
[ 944.637336] #3 0xffffffff80c91d3c in sysctl_iflist <netbsd>
[ 944.647343] #4 0xffffffff80c93855 in sysctl_rtable <netbsd>
[ 944.647343] #5 0xffffffff80b5b328 in sysctl_dispatch <netbsd>
[ 944.657346] #6 0xffffffff80b5b62e in sys___sysctl <netbsd>
[ 944.667354] #7 0xffffffff8025ab3c in sy_call <netbsd>
[ 944.667354] #8 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 944.677365] #9 0xffffffff8025adf4 in syscall <netbsd>
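
The class of leak reported above (compiler-inserted padding copied out to userland uninitialized) is commonly avoided by zeroing the whole structure before filling in its fields. A minimal userland sketch, assuming a made-up header layout: struct sketch_hdr and sketch_fill() are hypothetical and do not reflect the real struct if_msghdr or the actual fix in revision 1.244.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical header for illustration only; this is NOT the layout of
 * the real struct if_msghdr.
 */
struct sketch_hdr {
	uint16_t len;		/* total message length */
	uint8_t  version;
	uint8_t  type;
	uint16_t index;		/* interface index */
	/* compilers typically insert 2 padding bytes here to align flags */
	uint32_t flags;
};

/*
 * Zero the whole object first so that, in practice, the padding bytes
 * do not carry stale memory contents when the struct is copied out.
 */
static void
sketch_fill(struct sketch_hdr *h, uint16_t index, uint32_t flags)
{
	memset(h, 0, sizeof(*h));
	h->len = (uint16_t)sizeof(*h);
	h->version = 1;
	h->type = 0;
	h->index = index;
	h->flags = flags;
}
```

Assigning the members one by one without the memset() would leave the padding bytes holding whatever was previously in that memory, which is what a tool like kleak flags.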
 1.213.2.10  05-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #788):

sys/net/rtsock.c: revision 1.241

Fix a deadlock (rt_free vs. route_intr on rt_so_mtx)
It occurs only if NET_MPSAFE is enabled.
 1.213.2.9  14-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #749):

sys/net/if.h: revision 1.259
sys/net/route.c: revision 1.209
sys/net/route.h: revision 1.118
sys/net/rtsock.c: revision 1.240

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by
moving utility functions of rtentry updates from rtsock.c and ensuring
holding the rt_lock.
It also improves the atomicity of an update of a rtentry.
 1.213.2.8  09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.213.2.7  28-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #595):
sys/net/if.c: revision 1.398
sys/net/rtsock.c: revision 1.231
remove useless cast, initialize family.
Avoid using a zero family mask.
 1.213.2.6  03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for a reference by the updater itself) to be released.
 1.213.2.5  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #457):
sys/net/rtsock.c: revision 1.233-1.234, 1.236
Protect ifp returned from route_output_get_ifa reliably
An ifp returned from route_output_get_ifa was supposed to be protected
by a returned ifa; if the ifa belongs to ifp, holding the ifa prevents
the ifp from being freed. However route_output_get_ifa can return an ifp
to which a returned ifa doesn't belong. So we need to take a reference
to a returning ifp separately.
--
Fix a bug that tries to psref_acquire ifa with a psref used before
This fixes ATF tests that started to fail by a recent change to psref.
--
Fix compile error (may be used uninitialized)
Hmm, __noinline had hidden this error.
 1.213.2.4  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
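As an illustration only, the idea of hiding the NET_MPSAFE switch behind a macro pair can be sketched in user space as below. The names DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE, demo_kernel_lock and demo_lock_depth are hypothetical stand-ins chosen for this sketch; they are not the kernel's definitions.

```c
/* Hypothetical stand-in for the big kernel lock: counts acquisitions. */
static int demo_lock_depth;

static void demo_kernel_lock(void)   { demo_lock_depth++; }
static void demo_kernel_unlock(void) { demo_lock_depth--; }

/*
 * One macro pair replaces an "#ifndef NET_MPSAFE ... #endif" block at
 * every call site. Building with -DNET_MPSAFE compiles the locking away.
 */
#ifdef NET_MPSAFE
#define DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE()	demo_kernel_lock()
#define DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	demo_kernel_unlock()
#endif

/* Returns the lock depth observed inside the protected region. */
static int demo_protected_region(void)
{
	int depth;

	DEMO_KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = demo_lock_depth;
	DEMO_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

With such macros, the remaining lock sites can be found by searching for a single identifier rather than for scattered preprocessor blocks.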
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires adding KERNEL_LOCK to subsequent functions called from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone brings little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.213.2.3  21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used with the add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list basically never has sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is indeed
sorted (run only if DEBUG because it's not cheap).
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav; however, an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
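
The arithmetic behind this observation can be sketched as follows. This is an illustration only: struct demo_comb and both functions are hypothetical, reduced stand-ins, and the assumption that the hard limit is assigned first is made for the sketch, not taken from key.c.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the lifetime fields of a comb. */
struct demo_comb {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

/* Pattern described as wrong: scales a soft limit that is still zero. */
static void demo_setlifetime_old(struct demo_comb *comb, uint64_t hard)
{
	comb->hard_addtime = hard;
	/* comb was zero-cleared, so this computes 0 * 80 / 100 == 0. */
	comb->soft_addtime = comb->soft_addtime * 80 / 100;
}

/* Pattern described by the fix: derive the soft limit from the hard one. */
static void demo_setlifetime_new(struct demo_comb *comb, uint64_t hard)
{
	comb->hard_addtime = hard;
	comb->soft_addtime = comb->hard_addtime * 80 / 100;
}
```

Starting from a zero-cleared structure, the first variant leaves the soft limit at zero for any hard limit, while the second yields 80% of the hard limit.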
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

With this change KEY_NEWSP is no longer called from softint
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
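
A capped queue of this kind can be sketched as below. This is an illustration only: struct demo_queue, DEMO_QUEUE_MAX and the drop-on-overflow policy are assumptions made for the sketch; the actual limit and overflow behaviour are not stated in this log.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 4	/* assumed cap, chosen for the example */

/* Hypothetical bounded queue of items awaiting deferred delivery. */
struct demo_queue {
	const void *items[DEMO_QUEUE_MAX];
	size_t count;		/* items currently queued */
	size_t dropped;		/* items refused because the cap was hit */
};

/* Queues an item; refuses it and counts a drop once the cap is reached. */
static bool demo_enqueue(struct demo_queue *q, const void *item)
{
	if (q->count >= DEMO_QUEUE_MAX) {
		q->dropped++;
		return false;
	}
	q->items[q->count++] = item;
	return true;
}
```

The cap bounds the memory held by deferred messages regardless of how fast packets arrive between timer runs.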
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which of course is useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
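
The shape of the fix (drop the mutex around the barrier and serialize callers with a condition variable) can be sketched in user space with POSIX threads. This is an illustration only: demo_barrier stands in for pserialize_perform, and every demo_* name is hypothetical rather than taken from the kernel.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cv = PTHREAD_COND_INITIALIZER;
static bool demo_sync_busy;	/* true while one thread runs the barrier */
static int demo_sync_calls;	/* counts barrier executions */
static int demo_items = 3;	/* stand-in for the list length */

/* Hypothetical stand-in for the barrier; runs without demo_mtx held. */
static void demo_barrier(void)
{
	demo_sync_calls++;
}

/* Removes one item and returns the number of items left. */
static int demo_remove_item(void)
{
	int left;

	pthread_mutex_lock(&demo_mtx);
	left = --demo_items;		/* unlink the item under the mutex */

	/* Wait for our turn instead of holding the mutex across the barrier. */
	while (demo_sync_busy)
		pthread_cond_wait(&demo_cv, &demo_mtx);
	demo_sync_busy = true;
	pthread_mutex_unlock(&demo_mtx);

	demo_barrier();			/* the mutex is not held here */

	pthread_mutex_lock(&demo_mtx);
	demo_sync_busy = false;
	pthread_cond_broadcast(&demo_cv);
	/* A real caller would drain references at this point. */
	pthread_mutex_unlock(&demo_mtx);
	return left;
}
```

Because the barrier runs with the mutex released, a callback that needs the mutex can make progress, while the busy flag still keeps two barriers from running at once.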
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.213.2.2  25-Jul-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #140):
sys/kern/uipc_domain.c: 1.97-1.99
sys/net/rtsock.c: 1.225-1.227
sys/sys/socket.h: 1.123
Restore the original length of a sockaddr for netmask
route(8) passes a sockaddr for netmask that is truncated with its
prefixlen. However the kernel basically doesn't expect such format
and may read beyond the data. So restore the original length of the
data at the entry to the kernel for the remaining components.
Failures of ATF tests such as route_flags_blackhole6 should
be fixed.
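To illustrate the idea only (this is not the kernel's code), restoring a truncated netmask to full length can be sketched as below. struct demo_sockaddr_in is a hypothetical BSD-style layout with a leading length byte, and demo_restore_netmask is a name invented for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical BSD-style IPv4 sockaddr with a leading length byte. */
struct demo_sockaddr_in {
	uint8_t  sin_len;
	uint8_t  sin_family;
	uint16_t sin_port;
	uint8_t  sin_addr[4];
	uint8_t  sin_zero[8];
};

/*
 * Copies a possibly truncated netmask into a zero-filled full-size
 * structure and restores the length field. Returns 0 on success, or -1
 * if the input claims more bytes than the full structure holds.
 */
static int demo_restore_netmask(const void *src, struct demo_sockaddr_in *dst)
{
	uint8_t len = *(const uint8_t *)src;	/* first byte is the length */

	if (len > sizeof(*dst))
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	dst->sin_len = sizeof(*dst);
	return 0;
}
```

After the copy, later code can read the whole structure without running past the bytes that userland actually supplied.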
--
Avoid DIAGNOSTIC warning with previous fix and simplify it (don't require
memory alloc/free).
--
put the code that returns the sizeof the socket by family in one place.
--
don't warn about AF_LINK sockets with sa_len less than the size of the sockaddr
--
don't print diagnostic for AF_LINK
 1.213.2.1  07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.238.2.25  22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
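
The trade-off described above can be sketched in plain C. This is only a
minimal illustration: DEMO_HOOK_CALL, demo_hook, and the helper functions are
hypothetical names invented for this sketch, not the actual NetBSD
MODULE_HOOK_CALL definition or its argument list. It shows why a macro that
expands to a statement cannot appear inside an if condition and must hand
its result back through a caller-supplied variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: a function pointer that stays NULL until "loaded". */
typedef int (*demo_hook_fn)(int);
static demo_hook_fn demo_hook = NULL;

/*
 * Hypothetical statement-style macro (an assumption for illustration only).
 * It expands to a do/while statement, so it yields no value; the result
 * is stored in the caller's variable 'ret'. If the hook is unset, the
 * default value 'dflt' is stored instead.
 */
#define DEMO_HOOK_CALL(hook, arg, dflt, ret)	\
do {						\
	if ((hook) != NULL)			\
		(ret) = (hook)(arg);		\
	else					\
		(ret) = (dflt);			\
} while (0)

/* Example hook implementation. */
static int
demo_double(int x)
{
	return 2 * x;
}

/* Install or clear the hook (clearing stands in for unloading a module). */
static void
demo_set_hook(demo_hook_fn fn)
{
	demo_hook = fn;
}

/* Call the hook in the "int ret; CALL(..., ret); if (ret == 0)" style. */
static int
demo_call(int arg)
{
	int ret;

	DEMO_HOOK_CALL(demo_hook, arg, -1, ret);
	return ret;
}

/* Small self-check showing both the unset and the set case. */
static void
demo_selfcheck(void)
{
	demo_set_hook(NULL);
	assert(demo_call(21) == -1);	/* hook unset: default is returned */
	demo_set_hook(demo_double);
	assert(demo_call(21) == 42);	/* hook set: the hook's result */
}
```

Writing if (DEMO_HOOK_CALL(...) == 0) would not compile with this shape of
macro, which is the loss the message mentions; the gain is that no
intermediate wrapper function has to be declared and defined per hook.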
 1.238.2.24  21-Jan-2019  pgoyette No need to declare the hook_call() function for void hooks. So
remove and simplify.
 1.238.2.23  18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.238.2.22  15-Jan-2019  pgoyette Remove a couple of unneeded #include-s

XXX There's probably a lot more clean-up that could happen here!
 1.238.2.21  15-Jan-2019  pgoyette Add vectors for sctp_{add,delete}_ipaddr() so we can check them
in rtsock.c rather than depending on the SCTP kernel compile
option. This is similar to what was done previously with NTP.
 1.238.2.20  15-Jan-2019  pgoyette Split sys/net/rtsock.c into two pieces, one of which is applicable only
to -current and one which is shared between -current and COMPAT_50.
 1.238.2.19  14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.238.2.18  13-Jan-2019  pgoyette Add the required hooks for rtsock_50 and modify the COMPATCALL() macro
to use the hooks. While the rtsock_50 situation is still sub-optimal
(it includes the main rtsock.c with a whole bunch of function and
variable redefinitions via macros), this at least makes it possible to
load the rtsock_50 code separately from more recent code, rather than
the previous requirement that rtsock_50 be built-in.
 1.238.2.17  13-Jan-2019  pgoyette Remove the HOOK2 versions of the MODULE_HOOK macros. There were
only a few uses, and using them led to some lack of clarity in the
code. Instead, we now use two separate hooks, with names that
make it clear(er) what we're doing.

This also positions us to start unraveling some of the rtsock_50
mess, which will need (at least) five hooks.
 1.238.2.16  13-Jan-2019  pgoyette Rearrange a bit, put all the sysctl-related stuff at the end of the
file, and enclose it in a single ``#ifdef COMPAT_RTSOCK ... #endif''
block.

XXX Arguably, this code might better belong in its own source file,
but I'll leave that for a future project.
 1.238.2.15  11-Jan-2019  pgoyette Don't accept OIFLIST operation unless the rtsock_70_hook is loaded,
even though the results are otherwise identical to those on current.
 1.238.2.14  11-Jan-2019  pgoyette Rework the various sysctl-related routines to call the correct code
for each version. While here, extract the 5.0 specific code instead
of including in the main rtsock.c code.

Also, clean up all the sysctl-related routines to prevent building
more than one copy, no matter how many places rtsock.c gets #include'd
into!
 1.238.2.13  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.238.2.12  15-Oct-2018  pgoyette Convert a couple more hooks to the MP-safe mechanism.

While here, clean up some headers and remove any that are now empty.
 1.238.2.11  30-Sep-2018  pgoyette Sync with HEAD
 1.238.2.10  29-Sep-2018  pgoyette In MODULE_HOOK_CALL_DECL we don't need to provide the actual argument
list for calling the hook function, nor do we need to provide the
default value (for when the hook has not been set).
 1.238.2.9  18-Sep-2018  pgoyette The COMPAT_HOOK macros were renamed to MODULE_HOOK, adjust all callers
 1.238.2.8  18-Sep-2018  pgoyette Split the COMPAT_CALL_HOOK to separate the declaration from the
implementation. Some hooks are called from multiple source files,
and the old method resulted in duplicate implementations.

Implement MP-safe hooks for the usb_subr_30 code. Pass the helper
functions as arguments to the compat code so it does not have to
determine if the kernel contains usb code.
 1.238.2.7  17-Sep-2018  pgoyette Adapt (most of) the indirect function pointers to the new MP-safe
mechanism. Still remaining are the compat_netbsd32 stuff, and
some usb subroutines.
 1.238.2.6  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.238.2.5  02-May-2018  pgoyette Synch with HEAD
 1.238.2.4  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.238.2.3  30-Mar-2018  pgoyette Extract compat_14 stuff into its own module
 1.238.2.2  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.238.2.1  15-Mar-2018  pgoyette Create a separate module for COMPAT_70 code only, and untangle the
70 compat code from the current.
 1.241.2.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.241.2.1  10-Jun-2019  christos Sync with HEAD
 1.250.2.2  05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #168):

sys/net/rtsock.c: revision 1.252
sys/netinet6/nd6_nbr.c: revision 1.168 - 1.172
sys/netinet6/nd6.c: revision 1.262

inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in fewer messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted
(RTM_DELETE) or has failed to be resolved (RTM_MISS).

The latter case can be interpreted as unreachable.

inet6: change rt_announce and llchange to bool in nd6_na_input()
more bool
 1.250.2.1  26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.252.2.1  29-Feb-2020  ad Sync with head.
