Home | History | Annotate | Download | only in netinet6
History log of /src/sys/netinet6/raw_ip6.c
RevisionDateAuthorComments
 1.185  05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.184  24-Feb-2024  mlelstv branches: 1.184.2;
Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.183  22-Mar-2023  ozaki-r in6: make sure a user-specified checksum field is within a packet

From OpenBSD
 1.182  04-Nov-2022  ozaki-r branches: 1.182.2;
inpcb: rename functions to in6pcb_*
 1.181  04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.180  28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.179  28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.178  28-May-2022  andvar fix various typos, mainly in comments.
 1.177  23-Feb-2022  andvar fix various typos in comments, mainly immediatly/immediately/,
as well shared and recently fixed typos in OpenBSD code by Jonathan Grey.
 1.176  21-Sep-2021  christos don't opencode kauth_cred_get()
 1.175  25-Feb-2019  maxv branches: 1.175.4;
RIP6, CAN, SCTP and SCTP6 lack a length check in their _send() functions.
Fix RIP6 and CAN, add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+0b9692ae0f49f93b7dc7@syzkaller.appspotmail.com
 1.174  24-Feb-2019  maxv RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect()
functions. Fix the first three, and add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com
 1.173  28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expted to free both passed
mbuf chains.
 1.172  11-May-2018  maxv branches: 1.172.2;
Dedup: introduce rip6_sbappendaddr. Same as IPv4.
 1.171  29-Apr-2018  maxv Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.
 1.170  28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.169  26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.168  12-Apr-2018  maxv Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.
 1.167  12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.166  21-Mar-2018  roy Sprinkle more soroverflow().
 1.165  28-Feb-2018  maxv branches: 1.165.2;
Remove unused ipsec_private.h includes.
 1.164  26-Feb-2018  maxv Remove redundant condition (harmless). PR/53030.
 1.163  26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.162  08-Feb-2018  maxv Remove the IN6_IS_ADDR_V4MAPPED checks in the protocol functions. They
are useless, because the IPv6 entry point (ip6_input) already performs
them.

The checks were first added in the protocol functions:

Wed Dec 22 04:03:02 1999 UTC (18 years, 1 month ago) by itojun

"drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)"

Shortly afterwards they were also added in the IPv6 entry point, but
where not removed from the protocol functions:

Mon Jan 31 10:33:22 2000 UTC (18 years ago) by itojun

"be proactive about malicious packet on the wire. we fear that v4 mapped
address to be used as a tool to hose security filters (like bypassing
"local host only" filter by using ::ffff:127.0.0.1)."

OpenBSD did the same a few months ago. FreeBSD has never had these checks.
 1.161  01-Feb-2018  maxv Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.160  30-Jan-2018  maxv Fix a buffer overflow in ip6_get_prevhdr. Doing

mtod(m, char *) + len

is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.

The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.

But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.

However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.

As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.

Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.

Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.

This place is still fragile.
 1.159  23-Jan-2018  maxv Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.
 1.158  05-Nov-2017  ozaki-r Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659
 1.157  01-Jun-2017  chs branches: 1.157.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.156  03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.155  24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.154  13-Dec-2016  ozaki-r branches: 1.154.2;
Remove unnecessary inclusions of nd6.h
 1.153  18-Nov-2016  knakahara fix: "ifconfig destory" can stalls when "ifconfig" is done parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() try to acquire softnet_lock.
These can cause dead lock.
 1.152  31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wan't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
 1.151  29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.150  26-Aug-2016  roy Allow explicit binding to detached addresss.
Fixes PR kern/51435.
 1.149  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.148  15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.147  15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.146  21-Jun-2016  ozaki-r branches: 1.146.2;
Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the fuctions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.145  16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.144  10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.143  12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.142  26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.141  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.140  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.139  26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.138  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.137  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.136  09-Aug-2014  rtr branches: 1.136.2; 1.136.4; 1.136.6; 1.136.8;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect() into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.135  08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.134  05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.133  05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.132  31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.131  31-Jul-2014  ozaki-r Define IFNET_EMPTY() and replace !IFNET_FIRST() with it

No functional change.
 1.130  30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into seaparate functions

xxx_listen(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existin {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.129  24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.128  23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.127  09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.126  09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.125  07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.124  07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifys a return of
success.

additional review will be done to determine of the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.123  07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.122  06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.121  01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.120  23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.119  22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.118  30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.117  20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.116  19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.115  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.114  18-May-2014  rmind Use IFNET_FIRST() rather than open coding ifnet access.
 1.113  25-Feb-2014  pooka branches: 1.113.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.112  23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.111  05-Jun-2013  christos branches: 1.111.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.110  22-Mar-2012  drochner branches: 1.110.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.109  19-Dec-2011  drochner branches: 1.109.2; 1.109.6; 1.109.8;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.108  03-May-2011  dyoung branches: 1.108.4; 1.108.8;
Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.107  08-Jul-2010  dyoung branches: 1.107.2;
Remove unnecessary casts from struct route * to struct route *.
 1.106  08-Jul-2010  dyoung Sprinkle const to prevent rip6_output() from re-assigning all but one of
its arguments.
 1.105  16-Sep-2009  pooka branches: 1.105.2; 1.105.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.104  06-May-2009  elad Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.103  15-Mar-2009  cegger ansify function definitions
 1.102  03-Jan-2009  yamt branches: 1.102.2;
remove extra semicolons.
 1.101  17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.100  06-Aug-2008  plunky branches: 1.100.2;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.99  04-May-2008  thorpej branches: 1.99.2; 1.99.6;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.98  24-Apr-2008  ad branches: 1.98.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.97  23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.96  15-Apr-2008  thorpej branches: 1.96.2;
Explicitly include <sys/percpu.h>.
 1.95  15-Apr-2008  thorpej Make raw6 stats per-cpu.
 1.94  15-Apr-2008  thorpej Make ip6 and icmp6 stats per-cpu.
 1.93  08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.92  08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.91  27-Nov-2007  christos branches: 1.91.10; 1.91.14;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
 1.90  06-Nov-2007  dyoung Delete dead code that I accidentally introduced before. Thanks
Arnaud Lacombe for pointing out to me Coverity CID 4562.
 1.89  06-Nov-2007  dyoung Use sockaddr_in6_init().
 1.88  01-Nov-2007  dyoung branches: 1.88.2;
De-__P().
 1.87  19-Sep-2007  dyoung branches: 1.87.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff);

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
 1.86  19-Jul-2007  dyoung branches: 1.86.4; 1.86.6; 1.86.8;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.85  23-May-2007  christos branches: 1.85.2;
Ansify + add a few comments, from Karl Sjödahl
 1.84  04-Mar-2007  christos branches: 1.84.2; 1.84.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.83  22-Feb-2007  dyoung Cosmetic: remove extraneous () on return statements, break a line
in two, join lines, compare pointers with NULL instead of testing
their "truth."
 1.82  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.81  10-Feb-2007  degroote branches: 1.81.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic
 1.80  04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.79  02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.78  23-Jul-2006  ad branches: 1.78.4; 1.78.6; 1.78.8; 1.78.10;
Use the LWP cached credentials where sane.
 1.77  14-May-2006  elad integrate kauth.
 1.76  05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.75  21-Jan-2006  rpaulo branches: 1.75.2; 1.75.4; 1.75.6; 1.75.8; 1.75.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.74  11-Dec-2005  christos branches: 1.74.2;
merge ktrace-lwp.
 1.73  28-Aug-2005  rpaulo Implement net.inet6.raw6.stats sysctl.

Reviewed by Elad Efrat.
 1.72  29-May-2005  christos branches: 1.72.2;
- avoid shadowed variables
- sprinkle const.
 1.71  11-Mar-2005  atatat Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.70  10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.69  09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.68  06-Sep-2004  yamt branches: 1.68.4; 1.68.6;
rip6_output: redo raw_ip6.c 1.67-1.67, using m_copyback_cow.
 1.67  23-Jul-2004  yamt rip6_output: redo the previous (raw_ip6.c 1.66)
with less assumptions about alignment.
 1.66  22-Jul-2004  yamt rip6_output: make sure that the mbuf is writable
before write a checksum into it.
otherwise "ping6 -s50000" causes a panic.

ok'ed by itojun.
 1.65  11-Jun-2004  itojun implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.64  22-Apr-2004  itojun correct parameter to in6_cksum. keiichi@kame
 1.63  30-Oct-2003  simonb branches: 1.63.2;
Remove some assigned-to but otherwise unused variables.
 1.62  25-Oct-2003  christos fix uninitialized variables
 1.61  06-Sep-2003  itojun clarify flowlabel handling
 1.60  05-Sep-2003  itojun u_short -> u_int16_t. sync w/ kame.
don't set ip6_plen where unneeded (i.e. before calling ip6_output)
 1.59  04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.58  22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.57  22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.56  22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.55  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.54  29-Jun-2003  fvdl branches: 1.54.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
 1.53  28-Jun-2003  darrenr Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.52  27-May-2003  itojun can't use M_WAIT here, i believe.
 1.51  11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.50  11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.49  20-Jul-2002  itojun remove unneeded extern decl (commented out). sync w/kame
 1.48  09-Jun-2002  itojun whitespace cleanup
 1.47  07-Jun-2002  itojun some KNF
 1.46  07-Jun-2002  itojun some KNF
 1.45  07-Jun-2002  itojun no need for offsetof()
 1.44  07-Jun-2002  itojun typo
 1.43  07-Jun-2002  itojun sync IPV6_CHECKSUM handling with kame.
 1.42  19-Mar-2002  itojun branches: 1.42.4; 1.42.6;
check sa_len and sa_family strictly. (NOTE: rtsol/rtsold older than Nov2001
will stop working, upgrade them first)
 1.41  20-Dec-2001  itojun centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame
 1.40  18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.39  13-Nov-2001  lukem add RCSIDs
 1.38  24-Oct-2001  itojun more whitespace sync with kame
 1.37  18-Oct-2001  itojun branches: 1.37.2;
gather stats on raw ip6 socket. sync with kame
 1.36  18-Oct-2001  itojun reduce diffs with kame (mostly cosmetic).
move IPV6_CHECKSUM processing to sys/netinet6/raw_ip6.c.
constify a couple of places.
 1.35  25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone accesses to pcb (do not reveal an inconsistent ones).
sync with kame
 1.34  23-Jul-2001  itojun repair scoped address handling in PRU_BIND. sync with kame.
 1.33  03-Jul-2001  itojun branches: 1.33.2;
call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.32  08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.31  04-Mar-2001  itojun branches: 1.31.2;
avoid possible alignment issue. sync with kame
 1.30  26-Feb-2001  itojun make sure to validate packet against ipsec policy.
 1.29  11-Feb-2001  itojun pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.28  10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.27  08-Feb-2001  itojun sync with kame better. cosmetic/stat changes only.
 1.26  24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.25  19-Oct-2000  itojun memcpy -> bcopy, for sync with kame tree
 1.24  07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.23  29-May-2000  itojun branches: 1.23.2;
disallow bind(2) with IPv4 mapped address for now. port number check is
insufficient at this moment and we can bind(2) two sockets listen on same
port number.

for real fix, we need to check inpcb table with in6pcb. we can't
find inpcb chain from particular in6pcb chain (like finding tcbtable from tcb6)
luckily RFC2553 does not talk about bind(2) behavior for IPv4 mapped.
IPv4 mapped brings in too much complexities...
 1.22  01-Mar-2000  itojun branches: 1.22.2;
introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.21  28-Feb-2000  itojun make ICMPv6 redirect actually flush route cache in udp6/raw6 socket.
 1.20  26-Feb-2000  itojun implement rip6_ctlinput, to cope with routing changes correctly.
(IMHO we need rip_ctlinput as well)
 1.19  06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.18  02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.17  01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studnemund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.16  31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.15  06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.14  22-Dec-1999  itojun drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
 1.13  15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.12  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.11  13-Sep-1999  itojun branches: 1.11.2; 1.11.8;
- Call in{,6}_pcbdetach if ipsec initialization is failed during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.10  05-Aug-1999  itojun import recent kAME fixes.
- initialize hoplimit for raw6 socket properly.
- respect SO_TIMESTAMP on udp6.
- more sanity checks.
 1.9  31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet diredtion.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8  30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwords)
 1.7  19-Jul-1999  itojun fix IPV6_CHECKSUM socket option (length computation was wrong).
 1.6  09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.5  06-Jul-1999  itojun checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour
 1.4  04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.3  03-Jul-1999  thorpej RCS ID police.
 1.2  01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1  28-Jun-1999  itojun branches: 1.1.2;
file raw_ip6.c was initially added on branch kame.
 1.1.2.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.2.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1  01-Jul-1999  thorpej file raw_ip6.c was added on branch chs-ubc2 on 1999-07-01 23:48:30 +0000
 1.11.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.2.3  12-Mar-2001  bouyer Sync with HEAD.
 1.11.2.2  11-Feb-2001  bouyer Sync with HEAD.
 1.11.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.23.2.3  09-May-2001  he Pull up revision 1.32 (via patch, requested by itojun):
Correct faith prefix determintaion.
 1.23.2.2  06-Apr-2001  he Pull up revision 1.26 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.23.2.1  26-Feb-2001  he Pull up revision 1.30 (requested by itojun):
Make sure to validate packet against ipsec policy.
 1.31.2.13  17-Sep-2002  nathanw Catch up to -current.
 1.31.2.12  01-Aug-2002  nathanw Catch up to -current.
 1.31.2.11  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.31.2.10  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.31.2.9  20-Jun-2002  nathanw Catch up to -current.
 1.31.2.8  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.31.2.7  08-Jan-2002  nathanw Catch up to -current.
 1.31.2.6  14-Nov-2001  nathanw Catch up to -current.
 1.31.2.5  22-Oct-2001  nathanw Catch up to -current.
 1.31.2.4  24-Aug-2001  nathanw Catch up with -current.
 1.31.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.31.2.2  13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.31.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.33.2.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.33.2.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.33.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.33.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.33.2.1  03-Aug-2001  lukem update to -current
 1.37.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.42.6.1  14-Jun-2004  jmc Pullup rev 1.65 (requested by itojun in ticket #1709)

Implement IPV6_USE_MIN_MTU sockopt.
 1.42.4.2  29-Aug-2002  gehenna catch up with -current.
 1.42.4.1  20-Jun-2002  gehenna catch up with -current.
 1.54.2.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.54.2.5  01-Apr-2005  skrll Sync with HEAD.
 1.54.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.54.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.54.2.2  03-Aug-2004  skrll Sync with HEAD
 1.54.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.63.2.2  11-Sep-2004  he Pull up revisions 1.66-1.68 (requested by yamt in ticket #836):
Ensure that the mbuf is writable before writing a checksum
into it.
 1.63.2.1  14-Jun-2004  tron Pull up revision 1.65 (requested by itojun in ticket #468):
implement IPV6_USE_MIN_MTU sockopt. needed by bind9 + EDNS0 + big receive buffer.
 1.68.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.68.4.1  29-Apr-2005  kent sync with -current
 1.72.2.7  07-Dec-2007  yamt sync with head
 1.72.2.6  15-Nov-2007  yamt sync with head.
 1.72.2.5  27-Oct-2007  yamt sync with head.
 1.72.2.4  03-Sep-2007  yamt sync with head.
 1.72.2.3  26-Feb-2007  yamt sync with head.
 1.72.2.2  30-Dec-2006  yamt sync with head.
 1.72.2.1  21-Jun-2006  yamt sync with head.
 1.74.2.1  01-Feb-2006  yamt sync with head.
 1.75.10.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.75.8.3  11-May-2006  elad sync with head
 1.75.8.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.75.8.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.75.6.2  11-Aug-2006  yamt sync with head
 1.75.6.1  24-May-2006  yamt sync with head.
 1.75.4.1  01-Jun-2006  kardel Sync with head.
 1.75.2.4  09-Sep-2006  rpaulo sync with head
 1.75.2.3  23-Feb-2006  rpaulo Remove in6pcb references from rip6_output() and rip6_usrreq().
 1.75.2.2  14-Feb-2006  rpaulo Replace in6pcb with inpcb.
 1.75.2.1  07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.78.10.1  04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.78.8.1  24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the good default policy, depending of the adress family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us to choose the right default policy (based on the adress family
requested).
While here, fix an error message

Use dynamic array instead of an static array to decompress. It lets us to
decompress any data, whatever is the radio decompressed data / compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.78.6.1  10-Dec-2006  yamt sync with head.
 1.78.4.1  12-Jan-2007  ad Sync with head.
 1.81.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.81.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.4.1  11-Jul-2007  mjf Sync with head.
 1.84.2.3  09-Oct-2007  ad Sync with head.
 1.84.2.2  20-Aug-2007  ad Sync with HEAD.
 1.84.2.1  08-Jun-2007  ad Sync with head.
 1.85.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.86.8.2  19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.86.8.1  19-Jul-2007  dyoung file raw_ip6.c was added on branch matt-mips64 on 2007-07-19 20:48:59 +0000
 1.86.6.3  09-Jan-2008  matt sync with HEAD
 1.86.6.2  08-Nov-2007  matt sync with -HEAD
 1.86.6.1  06-Nov-2007  matt sync with HEAD
 1.86.4.4  03-Dec-2007  joerg Sync with HEAD.
 1.86.4.3  11-Nov-2007  joerg Sync with HEAD.
 1.86.4.2  04-Nov-2007  jmcneill Sync with HEAD.
 1.86.4.1  02-Oct-2007  joerg Sync with HEAD.
 1.87.4.1  13-Nov-2007  bouyer Sync with HEAD
 1.88.2.2  08-Dec-2007  mjf Sync with HEAD.
 1.88.2.1  19-Nov-2007  mjf Sync with HEAD.
 1.91.14.3  17-Jan-2009  mjf Sync with HEAD.
 1.91.14.2  28-Sep-2008  mjf Sync with HEAD.
 1.91.14.1  02-Jun-2008  mjf Sync with HEAD.
 1.91.10.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.96.2.1  18-May-2008  yamt sync with head.
 1.98.2.5  11-Aug-2010  yamt sync with head.
 1.98.2.4  11-Mar-2010  yamt sync with head
 1.98.2.3  16-May-2009  yamt sync with head
 1.98.2.2  04-May-2009  yamt sync with head.
 1.98.2.1  16-May-2008  yamt sync with head.
 1.99.6.1  19-Oct-2008  haad Sync with HEAD.
 1.99.2.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.100.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.100.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.102.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.105.4.2  31-May-2011  rmind sync with head
 1.105.4.1  05-Mar-2011  rmind sync with head
 1.105.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.107.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.108.8.2  05-Apr-2012  mrg sync to latest -current.
 1.108.8.1  18-Feb-2012  mrg merge to -current.
 1.108.4.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.108.4.1  17-Apr-2012  yamt sync with head
 1.109.8.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.8.1  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.109.6.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.6.1  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.109.2.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1541):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.109.2.1  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1523):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
sys/netinet6/ah_input.c: adjust other callers (patch)
sys/netinet6/esp_input.c: adjust other callers (patch)
sys/netinet6/ipcomp_input.c: adjust other callers (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.110.2.3  03-Dec-2017  jdolecek update from HEAD
 1.110.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.110.2.1  23-Jun-2013  tls resync from head
 1.111.2.3  18-May-2014  rmind sync with head
 1.111.2.2  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for old the pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.111.2.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.113.2.1  10-Aug-2014  tls Rebase.
 1.136.8.1  18-Jan-2017  skrll Sync with netbsd-5
 1.136.6.3  29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expted to free both passed
mbuf chains.
 1.136.6.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.6.1  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.136.4.9  28-Aug-2017  skrll Sync with HEAD
 1.136.4.8  05-Feb-2017  skrll Sync with HEAD
 1.136.4.7  05-Dec-2016  skrll Sync with HEAD
 1.136.4.6  05-Oct-2016  skrll Sync with HEAD
 1.136.4.5  09-Jul-2016  skrll Sync with HEAD
 1.136.4.4  29-May-2016  skrll Sync with HEAD
 1.136.4.3  22-Sep-2015  skrll Sync with HEAD
 1.136.4.2  06-Jun-2015  skrll Sync with HEAD
 1.136.4.1  06-Apr-2015  skrll Sync with HEAD
 1.136.2.4  29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expted to free both passed
mbuf chains.
 1.136.2.3  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.2.2  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.136.2.1  28-Sep-2016  bouyer branches: 1.136.2.1.2;
Pull up following revision(s) (requested by roy in ticket #1243):
sys/netinet6/raw_ip6.c: revision 1.150 via patch
sys/netinet6/in6_pcb.c: revision 1.149 via patch
Allow explicit binding to detached addresss.
Fixes PR kern/51435.
 1.136.2.1.2.3  29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expted to free both passed
mbuf chains.
 1.136.2.1.2.2  01-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1591):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.136.2.1.2.1  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #1560):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160 (patch)
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.146.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.146.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.146.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.146.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.146.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.154.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.157.2.6  23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1808):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.157.2.5  29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expted to free both passed
mbuf chains.
 1.157.2.4  09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.157.2.3  30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #666):

sys/netinet6/raw_ip6.c: revision 1.161

Fix use-after-free, the first m_copyback_cow may have freed the mbuf, so
it is wrong to read ip6->ip6_nxt.
 1.157.2.2  30-Jan-2018  martin Pull up following revision(s) (requested by maxv in ticket #527):
sys/netinet6/frag6.c: revision 1.65
sys/netinet6/ip6_input.c: revision 1.187
sys/netinet6/ip6_var.h: revision 1.78
sys/netinet6/raw_ip6.c: revision 1.160
Fix a buffer overflow in ip6_get_prevhdr. Doing
mtod(m, char *) + len
is wrong, an option is allowed to be located in another mbuf of the chain.
If the offset of an option within the chain is bigger than the length of
the first mbuf in that chain, we are reading/writing one byte of packet-
controlled data beyond the end of the first mbuf.
The length of this first mbuf depends on the layout the network driver
chose. In the most difficult case, it will allocate a 2KB cluster, which
is bigger than the Ethernet MTU.
But there is at least one way of exploiting this case: by sending a
special combination of nested IPv6 fragments, the packet can control a
good bunch of 'len'. By luck, the memory pool containing clusters does not
embed the pool header in front of the items, so it is not straightforward
to predict what is located at 'mtod(m, char *) + len'.
However, by sending offending fragments in a loop, it is possible to
crash the kernel - at some point we will hit important data structures.
As far as I can tell, PF protects against this difficult case, because
it kicks nested fragments. NPF does not protect against this. IPF I don't
know.
Then there are the more easy cases, if the MTU is bigger than a cluster,
or if the network driver did not allocate a cluster, or perhaps if the
fragments are received via a tunnel; I haven't investigated these cases.
Change ip6_get_prevhdr so that it returns an offset in the chain, and
always use IP6_EXTHDR_GET to get a writable pointer. IP6_EXTHDR_GET
leaves M_PKTHDR untouched.
This place is still fragile.
 1.157.2.1  08-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #350):
sys/netinet6/icmp6.c: revision 1.214
sys/netinet6/raw_ip6.c: revision 1.158
Fix usages of ipsec_used
If IPsec isn't used, we must go back to the normal path.
PR kern/52659
 1.165.2.4  21-May-2018  pgoyette Sync with HEAD
 1.165.2.3  02-May-2018  pgoyette Synch with HEAD
 1.165.2.2  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.165.2.1  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.172.2.1  10-Jun-2019  christos Sync with HEAD
 1.175.4.2  10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1809):

sys/netinet6/raw_ip6.c: revision 1.184 (patch)
sys/netinet6/icmp6.c: revision 1.256 (patch)

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.175.4.1  23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1615):

sys/netinet6/raw_ip6.c: revision 1.183 (via patch)
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.182.2.2  10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #615):

sys/netinet6/raw_ip6.c: revision 1.184
sys/netinet6/icmp6.c: revision 1.256

Deliver timestamps also to raw sockets.
Fixes PR 57955
 1.182.2.1  23-Mar-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #125):

sys/netinet6/raw_ip6.c: revision 1.183
sys/netinet6/ip6_output.c: revision 1.233

in6: reject setting negative values but -1 via setsockopt(IPV6_CHECKSUM)
Same as OpenBSD.

in6: make sure a user-specified checksum field is within a packet
From OpenBSD
 1.184.2.1  02-Aug-2025  perseant Sync with HEAD

RSS XML Feed