Home | History | Annotate | Download | only in netinet
History log of /src/sys/netinet/in_pcb.c
RevisionDateAuthorComments
 1.202  04-Nov-2022  ozaki-r ipcb: add/update the description of functions

From rmind-smpnet patches
 1.201  04-Nov-2022  ozaki-r inpcb: replace leading white spaces with tabs
 1.200  04-Nov-2022  ozaki-r inpcb: get rid of parentheses for return value
 1.199  04-Nov-2022  ozaki-r inpcb: use NULL
 1.198  04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.197  04-Nov-2022  ozaki-r inpcb: use pool_cache instead of pool
 1.196  04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.195  04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.194  29-Oct-2022  ozaki-r inpcb: fix for kernels without INET6
 1.193  28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.192  28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.191  14-Oct-2022  ryo Avoid error of "-Wreturn-local-addr", and simplify the logic.

However, -Wreturn-local-addr is still disabled by default by GCC_NO_RETURN_LOCAL_ADDR
in bsd.own.mk because it causes errors in other parts.
 1.190  29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.
 1.189  29-Jul-2022  knakahara Remove obsoleted comments.

These comments are added with IFNET_LOCK by in_pcb.c:r1.180 and
in6_pcb.c:r1.162. And then, IFNET_LOCK codes are removed in
in_pcb.c:r1.183 and in6_pcb.c:r1.166, however the comments have
remained.
 1.188  10-Jun-2022  knakahara Use LIST_FOREACH macro.
 1.187  09-Jun-2022  knakahara refactor: use TAILQ_FOREACH instead of TAILQ_FOREACH_SAFE about inpt_queue.

They don't use "ninph" pointer and don't remove elements.
 1.186  19-Oct-2021  roy netinet: Allow binding the unspecified address when no addresses exist

You should always be able to bind to the unspecified address even if
no addresses have been configured on any interface.

For example, a DHCP client could be started before the loopback interface
has been fully configured.
 1.185  08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.184  20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.183  15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunatetly it caused a deadlock instead. A known scenario causing a
deadlock is to occur the following two operations concurrently: (a) a removal of
an IP adddres assigned to an interface and (b) a manipulation of multicast
groups to the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.182  27-Feb-2018  maxv branches: 1.182.4;
Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.181  01-Jan-2018  christos 1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
 1.180  15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.179  10-Aug-2017  ryo Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.178  25-Apr-2017  ozaki-r branches: 1.178.4;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.177  20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol faimiliy
as its own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.176  02-Mar-2017  ozaki-r Make sure imo_membership is protected by inp's lock (solock)
 1.175  13-Feb-2017  ozaki-r Replace splnet with splsoftnet
 1.174  23-Jan-2017  ozaki-r Get rid of splnet for pool(9)

We don't need it anymore.
 1.173  11-Jan-2017  ozaki-r branches: 1.173.2;
Get rid of unnecessary header inclusions
 1.172  12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.171  08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
 1.170  29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.169  26-Aug-2016  roy Allow bind to detached INET addresses.
 1.168  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.167  20-Jul-2016  ozaki-r Reduce scopes of variables
 1.166  08-Jul-2016  ozaki-r branches: 1.166.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple interations.
 1.165  06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.164  21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
that's life time is different from ifnet one and so an ifnet object can be
disappeared anytime we get it via them. Thus we need to look up an ifnet
object by if_index every time for safe.
 1.163  15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
 1.162  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.161  24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.160  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.159  02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.158  26-Apr-2015  rtr return EINVAL if sin{,6}_len != sizeof(sockaddr_in{,6}) respectively in
in{,6}_pcbconnect().

checking just m->m_len isn't enough because there are various places that
assume sa_len has been properly populated.
 1.157  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.156  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.155  25-Nov-2014  seanb branches: 1.155.2;
Really make SO_REUSEPORT and SO_REUSEADDR equivalent for multicast
sockets. From FreeBSD.
 1.154  25-Nov-2014  seanb Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction. The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
 1.153  10-Nov-2014  maxv Do not uselessly include <sys/malloc.h>.
 1.152  07-Sep-2014  rmind in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.

This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.151  05-Aug-2014  rtr branches: 1.151.2;
revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.150  03-Aug-2014  rmind in_pcbdetach: not that IGMP and multicast groups are MP-safe, we can move
the ip_freemoptions() call outside the softnet_lock. Should fix PR/49065.
 1.149  24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.148  30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.147  22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.146  23-Nov-2013  christos branches: 1.146.2;
convert from CIRCLEQ to TAILQ.
 1.145  05-Jun-2013  christos branches: 1.145.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.144  12-Apr-2013  christos PR/47738: connect(2) to 239.x.y.z should return error but does not.
 1.143  25-Jun-2012  christos branches: 1.143.2;
rename rfc6056 -> portalgo, requested by yamt
 1.142  21-Jun-2012  yamt constify, comments.
no functional changes.
 1.141  22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.140  19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.139  24-Sep-2011  christos branches: 1.139.2; 1.139.6;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.138  03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.137  12-May-2009  elad branches: 1.137.4; 1.137.6;
Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.136  09-May-2009  elad Add check for IN_MULTICAST() that was taken only to in_pcbbind_port() --
it's necessary in in_pcbbind_addr() as well.

Pointed out by Mihai Chelaru on tech-net@, thanks!
 1.135  30-Apr-2009  elad Commit changes to netinet6/in6_src.c, forgot in previous commit:

http://mail-index.netbsd.org/source-changes/2009/04/30/msg220547.html

Make in_pcbsetport() set the port number selected before passing "sin" to
kauth(9).
 1.134  30-Apr-2009  elad - Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.133  23-Apr-2009  elad - Make kauth(9) call logic match the one in netinet6/in6_pcb.c

- Indent a comment
 1.132  23-Apr-2009  elad Some changes to in_pcbbind():

- Extract guts to in_pcbbind_{addr,port}()

- Put the port auto-assignment logic in in_pcbsetport(), which looks very
similar to in6_pcbsetport()

- Fix a bug where "sin" was passed to kauth(9) without being set to
anything

No objections on tech-net@.
 1.131  14-Apr-2009  elad Don't set sin->sin_port and sin6->sin6_port to 0 before calling
ifa_ifwithaddr(), as we no longer do a byte compare on the entire struct.

Reviewed by and okay from dyoung@.
 1.130  18-Mar-2009  cegger bzero -> memset
 1.129  11-Oct-2008  pooka branches: 1.129.2; 1.129.4; 1.129.8; 1.129.10;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.128  03-Oct-2008  pooka Hallo, pool_init(). Auf wiedersehen & byebye, link set POOL_INIT().
 1.127  04-Aug-2008  spz typo fix in comment (drops the ' in drop's :)
 1.126  04-Aug-2008  matt Free the socket only after disposing of the PCB.
 1.125  05-May-2008  ad branches: 1.125.2; 1.125.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.124  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.123  24-Apr-2008  ad branches: 1.123.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.122  14-Jan-2008  dyoung branches: 1.122.6; 1.122.8;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.121  20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.120  16-Dec-2007  elad Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
 1.119  21-Aug-2007  dyoung branches: 1.119.2; 1.119.8; 1.119.10; 1.119.14;
Use sockaddr_in_init().
 1.118  19-Jul-2007  dyoung branches: 1.118.4; 1.118.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.117  02-May-2007  dyoung branches: 1.117.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.116  12-Mar-2007  ad branches: 1.116.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.115  04-Mar-2007  christos branches: 1.115.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.114  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.113  26-Jan-2007  dyoung branches: 1.113.2;
KNF: bzero -> memset, change (struct in_ifaddr *)0 to NULL.
 1.112  15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.111  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.110  08-Dec-2006  joerg When a dynamic route is deleted in in_losing and in6_losing, rtrequest
is called, but the current reference via the PCB is not removed. This
is effectively a leaked reference. Call rtfree unconditional.
 1.109  16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.108  13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.107  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.106  05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.105  19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.104  08-Sep-2006  elad branches: 1.104.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stablizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.103  23-Jul-2006  ad branches: 1.103.4;
Use the LWP cached credentials where sane.
 1.102  14-May-2006  elad integrate kauth.
 1.101  15-Nov-2005  dsl branches: 1.101.4; 1.101.6; 1.101.8; 1.101.10; 1.101.12;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.100  29-May-2005  christos branches: 1.100.2; 1.100.8;
- add const
- remove bogus casts
- avoid nested variables
 1.99  07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.98  03-Feb-2005  perry ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.97  02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.96  29-Sep-2004  christos branches: 1.96.4; 1.96.6;
PR/27082: Sean Boudreau: redundant assignment or NULL dereference in
in_pcbconnect()
 1.95  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.94  02-Mar-2004  thorpej Call ipsec_pcbconn() and ipsec_pcbdisconn() for FAST_IPSEC, too.
 1.93  13-Jan-2004  itojun avoid deref-after-free.
http://sources.zabbadoz.net/freebsd/patchset/106-ipsec-pcb-discon.diff
 1.92  02-Jan-2004  itojun whitespace
 1.91  11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.90  28-Oct-2003  provos use a hash table to bind to local ports; suggested by markus friedl
approved: fvdl@
 1.89  23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.88  04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.87  15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.86  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.85  22-Jul-2003  itojun avoid code dup when check broadcast addr in bind(2)
 1.84  21-Jul-2003  itojun permit bind(2) to broadcast address, as it was permitted before.
(for instance, "ntpd -b" was broken since revision 1.82)
found report on http://pc.2ch.net/unix
 1.83  26-Jun-2003  itojun branches: 1.83.2;
check if INADDR_TO_IA gets us valid in_ifaddr or not. hopefully fix PR21964
 1.82  15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.81  16-Mar-2003  lukem Enable check in in_pcbbind() to enforce sin_family == AF_INET.
If there are any "old programs which incorrectly set this" left,
they will now fail with EAFNOSUPPORT.
This make in_pcbbind() consistent with in_pcbconnect() and the other
protocol families.

As per my PR [kern/4441], which has the comment:
Steven's "TCP/IP Illustrated, Volume 2", page 730, notes that
in_pcbbind() has the check which determines if sin_family == AF_INET
commented out, but the same check in in_pcbconnect() is still active.
 1.80  22-Oct-2002  simonb "error" in in_pcbbind() was only ever set but not used, remove it.
 1.79  11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.78  09-Jun-2002  itojun whitespace
 1.77  28-May-2002  itojun correct in*_pcbrtentry. check cached value correctly.
 1.76  28-May-2002  itojun in in*_pcbrtentry(), check if route is still valid (RTF_UP),
and address family is still valid.
 1.75  08-Mar-2002  thorpej branches: 1.75.6;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.74  22-Jan-2002  itojun make sure to check address family on route cache. with IPv4 mapped
address we can see both AF_INET/INET6.
 1.73  13-Nov-2001  lukem add RCSIDs
 1.72  04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.71  06-Aug-2001  itojun branches: 1.71.4;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.70  25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone accesses to pcb (do not reveal an inconsistent ones).
sync with kame
 1.69  02-Jul-2001  itojun branches: 1.69.2;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.68  08-Nov-2000  ad branches: 1.68.2;
Update for hashinit() change.
 1.67  25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.66  06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.65  03-Apr-2000  enami branches: 1.65.4;
- Unselect the multicast outgoing interface if it is being detached.
- Drop the multicast membership if we are joining through the interface
being detached.
 1.64  30-Mar-2000  augustss Remove register declarations.
 1.63  02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.62  01-Feb-2000  thorpej Small amount of cosmetic cleanup.
 1.61  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.60  09-Jul-1999  thorpej branches: 1.60.2; 1.60.8;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.59  01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.58  23-Mar-1999  lukem branches: 1.58.4; 1.58.6;
Ensure that you can only bind a more specific address when it is done by the
same uid or by root.

This code is from FreeBSD. (Whilst it was originally obtained from OpenBSD,
FreeBSD fixed it to work with multicast. To quote the commit message:
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has
SO_REUSEPORT set.
)
 1.57  19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.56  16-Nov-1998  lukem branches: 1.56.2;
if INADDR_ANY is given in in_pcbconnect(), choose the ia_addr of the first
interface, not the ia_broadaddr. should fix [standards/5645] and [kern/6425]
 1.55  13-Nov-1998  lukem simplify test in in_pcbbind() for setting wild=1; no need to check if
((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0 ||
(so->so_options & SO_ACCEPTCONN) == 0)
since the latter is always true, so the former test in unnecessary.
from `TCP/IP Illustrated, Volume 2', W. Richard Stevens, p 730.
 1.54  05-Oct-1998  lukem * in_pcblookup_port(): deprecate INPLOOKUP_WILDCARD and flags in favour
of a lookup_wildcard arg; simplifies the logic a bit.
* when assigning ephemeral ports in in_pcbbind(), always call
in_pcblookup_port() with lookup_wildcard=1, so that ephemeral port
allocation on sockets with SO_REUSEADDR set won't potentially bind to a
port in use by something else (principle of least surprise).
 1.53  30-Sep-1998  tls Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.52  02-Aug-1998  thorpej Use the pool allocator for inpcbs.
 1.51  23-Jul-1998  pk in_pcballoc(): we can't afford to wait for memory.
 1.50  15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.49  13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.48  07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.47  08-Jan-1998  lukem * start from the top of the given ephemeral range and work down;
results in reserved ephemeral ports starting at the top (as per
current practice), and shouldn't have a negative effect on normal
ephemeral ports...
* initialise inpt_lastlow in in_pcbinit
 1.46  08-Jan-1998  lukem add missing ; ...
 1.45  07-Jan-1998  lukem add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
are allocated. it takes the following settings:
IP_PORTRANGE_DEFAULT use anonportmin (49152) -> anonportmax (65535)
IP_PORTRANGE_HIGH as IP_PORTRANGE_DEFAULT (retained for FreeBSD
compat reasons, where these are separate)
IP_PORTRANGE_LOW use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephmerally
 1.44  05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.43  05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
 1.42  30-Dec-1997  lukem as per the IANA assigned ports numbers document, use ports
49152..65535 for ephemeral ports (instead of 1024..5000).
closes my [kern/4440], but with correct code :)
 1.41  27-Nov-1997  mrg fix compile error when "options IPNOPROVPORTS"
 1.40  20-Nov-1997  thorpej Deal with a problem where ephemeral port shortage would case a PCB's
local address to be set, causing all further attemps to bind that PCB
to fail. From Koji Imada, PR #3857.
 1.39  14-Oct-1997  matt branches: 1.39.2;
Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.38  22-Sep-1997  thorpej Implement in_pcbrtentry() - return the route associated with a PCB. If
one does not exist, attempt to allocate one. This is mostly pulled from
tcp_input.c.
 1.37  23-Jul-1997  thorpej branches: 1.37.2;
Pull SYN_cache_branch down into the main line.
 1.36  10-Dec-1996  mycroft branches: 1.36.8;
Return EAGAIN if binding with no specified port and the pool is empty.
 1.35  13-Oct-1996  christos backout previous kprintf changes
 1.34  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.33  15-Sep-1996  mycroft Hash unconnected PCBs.
 1.32  09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.31  05-Sep-1996  perry Commit PR 2671, which adds an "IPNOPRIVPORTS" config option that turns
off the code that normally only allows root to bind low TCP
ports. Useful on firewalls and such.
 1.30  14-Aug-1996  thorpej Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.
 1.29  10-Jul-1996  cgd print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)
 1.28  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.27  26-Feb-1996  mrg branches: 1.27.4;
two more local addr changes, all done differently now (idea from charles)
 1.26  26-Feb-1996  mrg if we are connecting *to* an address of any local interface, default the
local address of the socket to the same address.
 1.25  13-Feb-1996  christos netinet prototypes
 1.24  31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.23  17-Aug-1995  mycroft branches: 1.23.2;
so_pcb should be a void *.
 1.22  12-Aug-1995  mycroft splnet --> splsoftnet
 1.21  18-Jun-1995  cgd convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimiced. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.20  12-Jun-1995  mycroft in_pcbnotify*() don't return anything.
 1.19  12-Jun-1995  mycroft Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array on ints, not u_chars.
 1.18  12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.17  04-Jun-1995  mycroft Remove one more bogus cast.
 1.16  04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.15  04-Jun-1995  mycroft Clean up many more casts.
 1.14  01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.13  13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.12  29-Sep-1994  deraadt failure to bind to a reserved port should return EACCES not EPERM.
 1.11  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'
 1.10  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9  02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.8  08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.7  18-Dec-1993  mycroft Canonicalize all #includes.
 1.6  06-Dec-1993  hpeyerl multicast support.
>From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.5  11-Jun-1993  deraadt The latest patch was hosed. There is some program that I used which
left extra crud at the end of the file. I blame ftpd for not doing an
ftruncate().
 1.4  10-Jun-1993  deraadt patch from Yuval Yarom, sent to me by <andrew@werple.apana.org.au>
they say: When doing an implicit bind in_pcbbind will assign used ports
if the port is bound on specific interface, and not on INADDR_ANY.
Effects of the bug range from connection drops to machine hangs.
 1.3  22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2  18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2  05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.23.2.1  02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.27.4.2  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.27.4.1  10-Dec-1996  mycroft From trunk:
Return EAGAIN if binding with no specified port and the pool is empty.
 1.36.8.1  14-May-1997  mellon in_pcbnotify() returns a result indicating whether or not it actually matched a pcb.
 1.37.2.2  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.37.2.1  29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.39.2.4  18-Jan-1999  cgd pull up rev 1.56 from trunk (PR#5645 and PR#6425). (lukem)
 1.39.2.3  01-Oct-1998  cgd pull up revisions 1.49-1.50, 1.53 (via patch) from trunk. (tls)
 1.39.2.2  28-Nov-1997  mellon Pull rev 1.41 up from trunk (mrg)
 1.39.2.1  20-Nov-1997  thorpej Fix PCB binding problem caused by ephemeral port shortage.
 1.56.2.1  11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.58.6.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.58.6.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.58.6.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.58.4.2  02-Aug-1999  thorpej Update from trunk.
 1.58.4.1  01-Jul-1999  thorpej Sync w/ -current.
 1.60.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.60.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.60.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.65.4.1  26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.68.2.6  11-Nov-2002  nathanw Catch up to -current
 1.68.2.5  20-Jun-2002  nathanw Catch up to -current.
 1.68.2.4  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.68.2.3  28-Feb-2002  nathanw Catch up to -current.
 1.68.2.2  14-Nov-2001  nathanw Catch up to -current.
 1.68.2.1  24-Aug-2001  nathanw Catch up with -current.
 1.69.2.6  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.69.2.5  16-Mar-2002  jdolecek Catch up with -current.
 1.69.2.4  11-Feb-2002  jdolecek Sync w/ -current.
 1.69.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.69.2.2  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.69.2.1  03-Aug-2001  lukem update to -current
 1.71.4.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.75.6.2  20-Jun-2002  gehenna catch up with -current.
 1.75.6.1  30-May-2002  gehenna Catch up with -current.
 1.83.2.7  11-Dec-2005  christos Sync with head.
 1.83.2.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.83.2.5  04-Feb-2005  skrll Sync with HEAD.
 1.83.2.4  19-Oct-2004  skrll Sync with HEAD
 1.83.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.83.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.83.2.1  03-Aug-2004  skrll Sync with HEAD
 1.96.6.1  12-Feb-2005  yamt sync with head.
 1.96.4.1  29-Apr-2005  kent sync with -current
 1.100.8.1  22-Nov-2005  yamt sync with head.
 1.100.2.5  21-Jan-2008  yamt sync with head
 1.100.2.4  03-Sep-2007  yamt sync with head.
 1.100.2.3  26-Feb-2007  yamt sync with head.
 1.100.2.2  30-Dec-2006  yamt sync with head.
 1.100.2.1  21-Jun-2006  yamt sync with head.
 1.101.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.101.10.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.101.10.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.101.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.101.8.3  14-Sep-2006  yamt sync with head.
 1.101.8.2  11-Aug-2006  yamt sync with head
 1.101.8.1  24-May-2006  yamt sync with head.
 1.101.6.1  01-Jun-2006  kardel Sync with head.
 1.101.4.4  09-Sep-2006  rpaulo sync with head
 1.101.4.3  14-Mar-2006  rpaulo Remove last reference to in6pcb.
 1.101.4.2  14-Feb-2006  rpaulo Remove INPCBHASH_PORT, INPCBHASH_BIND, INPCBHASH_CONNECT (moved to
in_pcb.h).
If INET6, detect whether the pcb is v4 or v6 based on the socket
family (from FreeBSD).
 1.101.4.1  02-Feb-2006  rpaulo Remove #include netinet6/in6_pcb.h.
 1.103.4.3  01-Feb-2007  ad Sync with head.
 1.103.4.2  12-Jan-2007  ad Sync with head.
 1.103.4.1  18-Nov-2006  ad Sync with head.
 1.104.2.3  18-Dec-2006  yamt sync with head.
 1.104.2.2  10-Dec-2006  yamt sync with head.
 1.104.2.1  22-Oct-2006  yamt sync with head
 1.113.2.4  07-May-2007  yamt sync with head.
 1.113.2.3  24-Mar-2007  yamt sync with head.
 1.113.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.113.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.115.2.4  09-Oct-2007  ad Sync with head.
 1.115.2.3  20-Aug-2007  ad Sync with HEAD.
 1.115.2.2  08-Jun-2007  ad Sync with head.
 1.115.2.1  13-Mar-2007  ad Sync with head.
 1.116.2.1  11-Jul-2007  mjf Sync with head.
 1.117.2.2  03-Sep-2007  skrll Sync with HEAD.
 1.117.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.118.6.2  19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.118.6.1  19-Jul-2007  dyoung file in_pcb.c was added on branch matt-mips64 on 2007-07-19 20:48:55 +0000
 1.118.4.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.119.14.2  19-Jan-2008  bouyer Sync with HEAD
 1.119.14.1  02-Jan-2008  bouyer Sync with HEAD
 1.119.10.1  26-Dec-2007  ad Sync with head.
 1.119.8.1  18-Feb-2008  mjf Sync with HEAD.
 1.119.2.2  23-Mar-2008  matt sync with HEAD
 1.119.2.1  09-Jan-2008  matt sync with HEAD
 1.122.8.1  18-May-2008  yamt sync with head.
 1.122.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.122.6.3  05-Oct-2008  mjf Sync with HEAD.
 1.122.6.2  28-Sep-2008  mjf Sync with HEAD.
 1.122.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.123.2.3  16-May-2009  yamt sync with head
 1.123.2.2  04-May-2009  yamt sync with head.
 1.123.2.1  16-May-2008  yamt sync with head.
 1.125.6.1  19-Oct-2008  haad Sync with HEAD.
 1.125.2.2  10-Oct-2008  skrll Sync with HEAD.
 1.125.2.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.129.10.1  10-May-2009  snj branches: 1.129.10.1.2;
Apply patch (requested by sborrill in ticket #745):
Fix compilation with IPNOPRIVPORTS option.
 1.129.10.1.2.1  21-Apr-2010  matt sync to netbsd-5
 1.129.8.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.129.4.1  10-May-2009  snj Apply patch (requested by sborrill in ticket #745):
Fix compilation with IPNOPRIVPORTS option.
 1.129.2.1  28-Apr-2009  skrll Sync with HEAD.
 1.137.6.1  06-Jun-2011  jruoho Sync with HEAD.
 1.137.4.1  31-May-2011  rmind sync with head
 1.139.6.2  05-Apr-2012  mrg sync to latest -current.
 1.139.6.1  18-Feb-2012  mrg merge to -current.
 1.139.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.2  30-Oct-2012  yamt sync with head
 1.139.2.1  17-Apr-2012  yamt sync with head
 1.143.2.3  03-Dec-2017  jdolecek update from HEAD
 1.143.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.143.2.1  23-Jun-2013  tls resync from head
 1.145.2.4  18-May-2014  rmind sync with head
 1.145.2.3  23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.145.2.2  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.145.2.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.146.2.1  10-Aug-2014  tls Rebase.
 1.151.2.2  17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.151.2.1  08-Sep-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #80):
sys/netinet6/in6_pcb.c: revision 1.129
sys/netinet/in_pcb.c: revision 1.152
in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.
This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.155.2.8  28-Aug-2017  skrll Sync with HEAD
 1.155.2.7  05-Feb-2017  skrll Sync with HEAD
 1.155.2.6  05-Oct-2016  skrll Sync with HEAD
 1.155.2.5  09-Jul-2016  skrll Sync with HEAD
 1.155.2.4  19-Mar-2016  skrll Sync with HEAD
 1.155.2.3  22-Sep-2015  skrll Sync with HEAD
 1.155.2.2  06-Jun-2015  skrll Sync with HEAD
 1.155.2.1  06-Apr-2015  skrll Sync with HEAD
 1.166.2.6  26-Apr-2017  pgoyette Sync with HEAD
 1.166.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.166.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.166.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.166.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.166.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.173.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.178.4.3  18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
 1.178.4.2  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface have both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
opeartions take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequence functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE solely gains a few benefit on performance because
the network stack is still serialized by the big kernel locks by default.
 1.178.4.1  21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add support IP_PKTINFO for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.182.4.1  10-Jun-2019  christos Sync with HEAD

RSS XML Feed