History log of /src/sys/kern/uipc_socket.c
Revision  Date  Author  Comments
 1.314  16-Jul-2025  kre Kernel part of O_CLOFORK implementation (plus kernel revbump)

This is Ricardo Branco's implementation of O_CLOFORK (and
associated fcntl, etc) for NetBSD (with a few minor changes
by me).

For now, the header file symbols that should be exposed to
userland are hidden inside temporary #ifdef _KERNEL blocks,
just to avoid random userland apps, or config scripts, from
seeing any of this before it is better tested.

Userland parts of this will follow soon.

This also bumps the kernel version to 10.99.15 (changes to
data structs, and the signature of fd_dup()).
 1.313  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.312  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Nix trailing whitespace.

No functional change intended.
 1.311  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sort includes.

No functional change intended.
 1.310  05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument since at least 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.309  11-Feb-2024  jdolecek make kqfilter() behave the same for PIPE_SOCKETPAIR pipe as it does
for standard one - refuse EVFILT_WRITE if the reader is already disconnected

fixes test failure for kernel/kqueue/write/t_pipe.c on PIPE_SOCKETPAIR kernel

PR kern/55690
 1.308  03-Feb-2024  jdolecek fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete

use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()

this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
 1.307  02-Nov-2023  martin Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.
 1.306  13-Oct-2023  ad Use cv_fdrestart() to implement fo_restart.
 1.305  04-Oct-2023  ad kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.
 1.304  07-Sep-2023  ad Fix a ~16 year old perf regression: when creating a socket, add a reference
to the caller's credentials rather than copying them. On an 80486DX2/66 this
seems to ~halve the time taken to create a socket.
 1.303  05-Aug-2023  andvar s/acccept/accept/ in comment.
 1.302  09-Apr-2022  riastradh branches: 1.302.4;
unix(4): Convert membar_exit to membar_release.

Use atomic_load_consume or atomic_load_relaxed where necessary.

Comment on why unlocked nonatomic access is valid where it is done.
 1.301  12-Mar-2022  riastradh kern: m_copym(M_DONTWAIT) can fail; handle that case gracefully.

Not sure if this should truncate the result or just fail with nonzero
error code (ENOBUFS?). Feel free to change this the other way if you
know better!

Reported-by: syzbot+54c34f25d1e4124eb85d@syzkaller.appspotmail.com
 1.300  23-Oct-2021  thorpej Add support for the EVFILT_EMPTY filter, which is activated when the
write buffer associated with the file descriptor is empty. This is
currently implemented only for sockets, and is intended primarily to
provide visibility to applications that all previously written data
has been acknowledged by the TCP layer on the receiver. Compatible
with the same filter in FreeBSD.
 1.299  11-Oct-2021  thorpej Setting EV_EOF requires modifying kn->kn_flags. However, that relies on
holding the kq_lock of that note's kq. Rather than exposing this directly,
add new knote_set_eof() and knote_clear_eof() functions that handle the
necessary locking and don't leak as many implementation details to modules.

NetBSD 9.99.91
 1.298  29-Sep-2021  thorpej The kq filterops that interact with sockets are MPSAFE.
 1.297  29-Sep-2021  thorpej - Change selremove_knote() from returning void to bool, and return
true if the last knote was removed and there are no more knotes
on the selinfo.
- Use this new return value in filt_sordetach(), filt_sowdetach(),
filt_fifordetach(), and filt_fifowdetach() to know when to clear
SB_KNOTE without having to know select/kqueue implementation details.
 1.296  26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.295  03-Aug-2021  chs in sbsavetimestamp(), initialize struct timeval to 0 with memset() so that
the implicit padding is initialized. this avoids later copying uninitialized
memory out to user space. detected by KMSAN.
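A minimal userland sketch of the pattern this entry describes. The helper name fill_timestamp() is hypothetical and is not the kernel's sbsavetimestamp(); it only illustrates clearing the whole struct, padding included, before the fields are assigned and the bytes are copied elsewhere.

```c
#include <string.h>
#include <sys/time.h>

/*
 * Hypothetical helper, for illustration only.
 * memset() zeroes every byte of the struct, including any implicit
 * padding between or after the members, before the fields are set.
 */
static void
fill_timestamp(struct timeval *tv, long sec, long usec)
{
	memset(tv, 0, sizeof(*tv));
	tv->tv_sec = sec;
	tv->tv_usec = usec;
}
```

Assigning the members one by one without the memset() would leave any padding bytes holding whatever was previously in that memory.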
 1.294  11-Dec-2020  thorpej Use sel{record,remove}_knote().
 1.293  23-Nov-2020  chs Restore correct functioning of SIOCATMARK by removing the previous
change that was done to fix poll(POLLPRI | POLLRDBAND) and instead
add a separate flag to track when poll() should indicate that a
MSG_OOB byte is available. Re-fixes PR 54435 properly.
 1.292  17-Oct-2020  mlelstv branches: 1.292.2;
Setting a socket buffer size stops autoscaling. Add a sysctl to
prevent this behaviour. The default is not changed.
 1.291  26-Aug-2020  christos add socket info for user and group for unix sockets in fstat.
 1.290  07-Jun-2020  maxv Fix bohr bug triggered only once by syzkaller 2.5 months ago.

In sockopt_alloc(), 'sopt' may already have been initialized with
'sopt->sopt_data = sopt->sopt_buf'. If the allocation fails, we
end up with 'sopt->sopt_data = NULL', and later try to free this
NULL pointer in sockopt_destroy().

Fix that by not modifying 'sopt_data' if the allocation failed.

Difficult to reproduce in normal times, but fault(4) makes it
easy.

Reported-by: syzbot+380cb5d518742f063ad2@syzkaller.appspotmail.com
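The fix described above can be sketched in userland roughly as follows. This is an assumption-laden illustration, not the kernel code: struct sockopt_sketch, sketch_init(), sketch_alloc() and sketch_destroy() are hypothetical names, the inline buffer size is arbitrary, and the 'fail' parameter exists only to simulate an allocation failure. Only the field names sopt_data and sopt_buf come from the entry.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's struct sockopt. */
struct sockopt_sketch {
	void		*sopt_data;	/* points at sopt_buf or heap memory */
	size_t		 sopt_size;
	unsigned char	 sopt_buf[16];	/* small inline buffer (size assumed) */
};

static void
sketch_init(struct sockopt_sketch *sopt)
{
	sopt->sopt_data = sopt->sopt_buf;
	sopt->sopt_size = 0;
}

/*
 * Returns 0 on success or ENOMEM on failure.  A nonzero 'fail'
 * simulates an allocation failure.  On failure sopt_data is left
 * untouched, so it never becomes NULL.
 */
static int
sketch_alloc(struct sockopt_sketch *sopt, size_t len, int fail)
{
	void *data;

	if (len > sizeof(sopt->sopt_buf)) {
		data = fail ? NULL : calloc(1, len);
		if (data == NULL)
			return ENOMEM;
		sopt->sopt_data = data;
	} else {
		sopt->sopt_data = sopt->sopt_buf;
	}
	sopt->sopt_size = len;
	return 0;
}

/* Frees heap storage only; the inline buffer is never passed to free(). */
static void
sketch_destroy(struct sockopt_sketch *sopt)
{
	if (sopt->sopt_data != sopt->sopt_buf)
		free(sopt->sopt_data);
	sketch_init(sopt);
}
```

Using a local 'data' variable and assigning it to sopt_data only after the NULL check is what keeps the destroy path from ever seeing a NULL pointer.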
 1.289  26-Apr-2020  jakllsch Implement SCTP bug fixes found by maxv@.

Adding these seems to improve the SCTP situation.
 1.288  22-Feb-2020  maxv Zero out 'tv', to prevent uninitialized bytes in its padding from leaking
to userland. Found by kMSan.

Reported-by: syzbot+8134380511a82c8f5fd7@syzkaller.appspotmail.com
 1.287  21-Feb-2020  joerg Explicitly cast pointers to uintptr_t before casting to enums. They are
not necessarily the same size. Don't cast pointers to bool, check for
NULL instead.
 1.286  18-Feb-2020  christos PR/54435: Valery Ushakov: Clear urgent status after reading urgent data, so
that poll(2) works.
 1.285  14-Oct-2019  maxv branches: 1.285.2;
Add a check before the memcpy. memcpy is defined to never take NULL as
second argument, and the compiler is free to perform optimizations knowing
that this argument is never NULL.

In this particular case, it was harmless. But still good to fix.

Reported-by: syzbot+6f504255accb795eb6b7@syzkaller.appspotmail.com
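A small sketch of the kind of guard this entry describes. The helper copy_if_present() is a hypothetical name and not the function changed in this revision; it only shows skipping memcpy() when the source pointer is NULL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * memcpy() must not be given a NULL source, even with a length of
 * zero, so the call is skipped entirely in that case.
 */
static void
copy_if_present(void *dst, const void *src, size_t len)
{
	if (src != NULL && len != 0)
		memcpy(dst, src, len);
}
```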
 1.284  27-Sep-2019  pgoyette Actually return the updated pointer-to-mbuf-pointer to the caller
rather than discarding-after-assignment. Introduced from the
[pgoyette-compat] branch work.

Welcome to 9.99.14 !!! (Module hook routine prototype changed.)

Found by the lgtm bot, reported via private Email from maxv@
 1.283  14-Sep-2019  mlelstv Fix build.
 1.282  14-Sep-2019  christos PR/54527: Anthony Mallet: Don't clear socket errors for MSG_PEEK.
 1.281  16-Jul-2019  pgoyette branches: 1.281.2;
Move the assignment of SCTP-specific function hooks/pointers.

Without this, a rumpkernel (appropriately modified) built with SCTP
enabled will try to assign the function pointers, but the targets
are only available in rumpnet. We cannot link the rumpkernel against
rumpnet because rumpnet is already linked against rumpkernel and we
would end up with a circular dependency.

As reported in private Email by rjs@
 1.280  01-Jun-2019  maxv Add XXXs for SCTP bugs.
 1.279  08-May-2019  christos PR/54176: Anthony Mallet:
getsockopt(2) does not silently truncate returned optval
 1.278  15-Apr-2019  pgoyette Clean up this mess and simplify, so that all the socket options get
handled correctly whether or not the compat_50 module is loaded.
 1.277  15-Apr-2019  pgoyette If the compat code successfully handled an option, don't return an error.
 1.276  15-Apr-2019  pgoyette Actually update the timeout value for the compatibility sockops
 1.275  15-Apr-2019  pgoyette Split the COMPAT_50 socket-timeout stuff out of kern/uipc_socket.c
and into its own source file, which is now included in the compat_50
module.

(Not sure how this got missed during the original [pgoyette-compat] work)
 1.274  14-Apr-2019  maxv Add more checks, if the values are negative we hit a KASSERT later in the
timeout.

Reported-by: syzbot+662dbeb526303f458255@syzkaller.appspotmail.com
 1.273  08-Apr-2019  maxv Reset so_cred to NULL after freeing it, because close() may leave the PCB
in pcblist, and we don't want a future lookup (via eg netstat) to read
freed data.

Detected by KASAN, reported by Alexander Nasonov.
 1.272  31-Mar-2019  maxv Also check for MT_CONTROL, and end the receive operation if we see one. It
is possible to get an MT_CONTROL if we sleep in MSG_WAITALL. The other BSDs
do the same.

Reported-by: syzbot+e8aa26ad551c649227b4@syzkaller.appspotmail.com
 1.271  07-Mar-2019  maxv Remove getsombuf(), unused.
 1.270  07-Mar-2019  maxv style
 1.269  04-Feb-2019  mrg add a couple of fallthru comments.
 1.268  22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.267  07-Nov-2018  hannken Update getsockopt(SO_ERROR) to behave like soreceive() and
return and clear so->so_rerror if so->so_error is zero.

Ok: christos@
 1.266  04-Nov-2018  christos - Introduce a new SO_RERROR socket option to explicitly turn on
receive overflow errors re-instating the default behavior to
silently ignore them as before 2018-03-19.
- Introduce a new kern.sooptions sysctl to control the default
behavior of socket options. Setting this to 0x4000 (SO_RERROR),
turns on receive overflow error reporting for all sockets.
- Change dhcpcd to turn on SO_RERROR on all its sockets.

As discussed in tech-net.
 1.265  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
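The truncation concern in this entry can be illustrated with a short sketch. All names below (uimin_sketch, MIN_SKETCH, min_via_uint, min_via_macro) are hypothetical stand-ins rather than libkern's definitions, and the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in with an unsigned int signature, as the entry describes. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form quoted in the entry: no conversion, but arguments may be
 * evaluated more than once. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values through the unsigned int function narrows
 * them first; the explicit casts spell out what an implicit
 * conversion would do silently.
 */
static uint64_t
min_via_uint(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* The macro compares the full 64-bit values. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With a = 2^32 + 5 and b = 10, the macro version yields 10, while the unsigned int version narrows a to 5 first and yields 5 under the 32-bit assumption.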
 1.264  06-Jun-2018  roy branches: 1.264.2;
Separate receive socket errors from general socket errors.
 1.263  26-Apr-2018  maxv Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.
 1.262  26-Apr-2018  maxv Remove unused mbuf argument from sbsavetimestamp.
 1.261  19-Mar-2018  roy socket: remove now incorrect comment that so_error is only udp

As it can be affected by route(4) sockets which are raw.
 1.260  19-Mar-2018  roy socket: clear error even when peeking

The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
 1.259  04-Jan-2018  christos branches: 1.259.2;
Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)
 1.258  01-Jan-2018  christos make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)
 1.257  25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
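For readers unfamiliar with the C99 form the patch above converts to, here is a self-contained sketch. The struct filterops_sketch and the sketch_* functions are hypothetical: the field names follow the patch, but the member types and callback signatures are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical struct: field names follow the patch, types are assumed. */
struct filterops_sketch {
	int	  f_isfd;
	int	(*f_attach)(void *);
	void	(*f_detach)(void *);
	int	(*f_event)(void *, long);
};

static void
sketch_detach(void *kn)
{
	(void)kn;
}

static int
sketch_event(void *kn, long hint)
{
	(void)kn;
	(void)hint;
	return 0;
}

/* Each member is named, so the initializer no longer depends on field order. */
static const struct filterops_sketch sketch_filtops = {
	.f_isfd = 1,
	.f_attach = NULL,
	.f_detach = sketch_detach,
	.f_event = sketch_event,
};
```

Naming the members also means a later rename or reordering of fields produces a compile error or no change at all, rather than a silently misassigned positional value.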
 1.256  06-Jul-2017  christos move the timestamp stuff to uipc_socket.c because it already has the compat
includes.
 1.255  27-May-2017  bouyer branches: 1.255.2;
merge the bouyer-socketcan branch to HEAD.

CAN stands for Controller Area Network, a broadcast network used
in automation and automotive fields. For example, the NMEA2000 standard
developed for marine devices uses a CAN network as the link layer.

This is an implementation of the linux socketcan API:
https://www.kernel.org/doc/Documentation/networking/can.txt
you can also see can(4).

This adds a new socket family (AF_CAN) and protocol (PF_CAN),
as well as the canconfig(8) utility, used to set timing parameters of
CAN hardware. Also included is a driver for the CAN controller
found in the allwinner A20 SoC (I tested it with an Olimex lime2 board,
connected with PIC18-based CAN devices).

There is also the canloop(4) pseudo-device, which allows using
the socketcan API without CAN hardware.

At this time the CANFD part of the linux socketcan API is not implemented.
Error frames are not implemented either. But I could get the cansend and
canreceive utilities from the canutils package to build and run with minimal
changes. tcpdump(8) can also be used to record frames, which can be
decoded with ethereal.
 1.254  25-May-2017  christos switch to a switch
 1.253  01-May-2017  ryo whitespace police
 1.252  13-Oct-2016  uwe branches: 1.252.2; 1.252.6;
Revert to revision 1.249 to undo changes from PR 49636.

Marking up some zeroes with a type suffix, while not marking others in
the very same function, does nothing but place a cognitive burden on the
reader.

Spelling "clear bits" as "&~" is actually not uncommon (and some say
is more readable).
 1.251  10-Oct-2016  dholland foo & ~bar, not foo &~ bar. From Henning Petersen in PR 49636.
 1.250  10-Oct-2016  dholland PR 49636 Henning Petersen: use "0L" to return 0 from a function returning
long, and test its returned value against "0L" instead of "0".

This is not especially necessary, but it's also harmless.
 1.249  02-Oct-2016  christos more MFREE -> m_free
 1.248  10-Jun-2016  ozaki-r branches: 1.248.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.247  13-Oct-2015  rjs Add core networking support for SCTP.
 1.246  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.245  09-May-2015  rtr change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16
 1.244  03-May-2015  rtr flip (NULL == addr) to (addr == NULL) use in conditional from previous
commit.
 1.243  02-May-2015  rtr compare mbuf * pointer to NULL instead of 0
 1.242  02-May-2015  rtr remove unnecessary check that nam != NULL before deref in soconnect()
(added in previous commit).

sockargs copyin() makes sure we don't get NULL here
 1.241  02-May-2015  rtr make soconnect() fail with EAFNOSUPPORT if the domain of the socket does
not match family received in the sockaddr.

* connect() now fails as documented in connect(2).
* atf test t_connect:connect_foreign_family now passes.
 1.240  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.239  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.238  05-Apr-2015  rtr change return from EINVAL to EAFNOSUPPORT when the domain of the socket
does not match the family of the address to be bound.

fixes atf test lib/libc/sys/t_bind bind_foreign_family
 1.237  05-Apr-2015  rtr make bind() fail with EINVAL if the address family of the provided
socket does not match the address family of the sockaddr received.
 1.236  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.235  05-Sep-2014  matt branches: 1.235.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.234  09-Aug-2014  rtr split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.233  08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.232  05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.231  05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.230  31-Jul-2014  mrg call ->pr_abort(so) now instead of generic PRU_ABORT.
fixes kern/49056, and appears to remove the only missed PRU_ABORT call.
 1.229  31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.228  30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.227  24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.226  23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.225  09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.224  19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.223  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.222  17-May-2014  rmind - fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.
 1.221  25-Feb-2014  pooka branches: 1.221.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.220  02-Nov-2013  christos PR/48098: Brian Marcotte: panic: kernel diagnostic assertion "cred != NULL":
Fix from Michael van Elst, tcpdrop crashes kernel on embryonic connections.
 1.219  17-Oct-2013  christos initialize a variable, hi gcc again!
 1.218  08-Oct-2013  seanb POSIX says getsockopt(s, SOL_SOCKET, SO_ACCEPTCONN,,) needs to work.
 1.217  29-Aug-2013  rmind Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.
 1.216  02-Aug-2013  spz Fix an inversion in checking for authorization to drop TCP connections
found (and the obvious fix suggested) by Sander Bos.
 1.215  08-Apr-2013  skrll branches: 1.215.4;
Remove some set but unused variables
 1.214  14-Mar-2013  gdt Add comment questioning lock asymmetry.
 1.213  14-Feb-2013  christos PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.212  08-Oct-2012  pooka put all kern socket sysctls in the same place
 1.211  09-Jul-2012  chs branches: 1.211.2;
in soreceive(), handle uios larger than 31 bits.
fixes the remaining problem in PR 43240.
 1.210  16-Mar-2012  matt Fix PR/49150.
Make listen(2) match the opengroup specification for what errno to
return if the socket is connected when a listen(2) is attempted.
 1.209  01-Feb-2012  matt branches: 1.209.2;
When using socket loaning, make sure the KVA used for the loan has the same
color as the UVA being loaned.
 1.208  27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.207  25-Jan-2012  christos As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
 1.206  20-Dec-2011  christos - Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).
 1.205  02-Jul-2011  bouyer branches: 1.205.2; 1.205.6;
Fix kern/45093 as discussed on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html

The cause of the problem is that the so_pendfree is processed with
the softnet_lock held at one point, and processing the list
calls sodoloanfree() which may kpause(). As the thread sleeps with
softnet_lock held, it ultimately causes a deadlock (see the PR or tech-kern
thread for details).
Although it should be possible to call sodopendfree() after releasing
the socket lock, it's not so easy to know where the socket lock is held and
where it's not, so we may hit the issue again later.
Add a kernel thread to handle the so_pendfree list, and wake up this
thread when adding mbufs to this list. Get rid of the various sodopendfree()
calls, hopefully fixing definitively the problem.
 1.204  26-Jun-2011  christos * Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
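As a userland illustration of the SOCK_CLOEXEC flag listed above, the sketch below creates a socket with close-on-exec set at creation time and then checks the descriptor flag. The helper names socket_cloexec() and fd_has_cloexec() are hypothetical, and AF_UNIX is chosen only to keep the example self-contained.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a new AF_UNIX stream socket with close-on-exec set, or -1. */
static int
socket_cloexec(void)
{
	return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int
fd_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```

Setting the flag in the socket(2) call itself avoids the window between socket() and a separate fcntl(F_SETFD) call during which another thread could fork and exec with the descriptor still inheritable.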
 1.203  01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.202  17-Jan-2011  uebayasi branches: 1.202.2;
Include internal definitions (uvm/uvm.h) only where necessary.
 1.201  14-Oct-2010  oki branches: 1.201.2;
Waiting for an mbuf cluster to be freed in sosend() causes the network stack to freeze.
Don't wait for it.
problem was found by iij seil team.
it is similar to OpenBSD uipc_socket.c rev.1.72.
 1.200  30-Dec-2009  elad branches: 1.200.2; 1.200.4;
Don't bother caching egid. It'll be removed soon.
 1.199  30-Dec-2009  elad Use credentials from the socket.
 1.198  29-Dec-2009  elad Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!
 1.197  29-Dec-2009  elad Remove commented-out code that should not have gone in.
 1.196  20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.195  09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.194  07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.193  03-Oct-2009  elad Move KAUTH_NETWORK_BIND::KAUTH_REQ_NETWORK_BIND_PORT policy back to the
subsystem (or close to it).

Note: Revisit KAUTH_REQ_NETWORK_BIND_PRIVPORT.
 1.192  03-Oct-2009  elad Finish moving socket policy to the subsystem.
 1.191  02-Oct-2009  elad Move some of the socket policy back to the subsystem.

Remove include we don't need in the secmodel code.
 1.190  11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.189  30-Apr-2009  ad PR kern/41311: Mutex error: mutex_vector_enter: locking against myself
 1.188  04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.187  15-Mar-2009  cegger ansify function definitions
 1.186  23-Jan-2009  pooka branches: 1.186.2;
solock() in compat code error branch to avoid panic
 1.185  21-Jan-2009  yamt restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.184  19-Jan-2009  christos Provide compatibility to the old timeval SCM_TIMESTAMP messages.
 1.183  15-Jan-2009  christos check for error in the COMPAT_50 case.
 1.182  15-Jan-2009  christos reverse the polarity of the use of the error variable. Using a different
variable would be cleaner but it would require more ifdefs.
 1.181  14-Jan-2009  christos correct previous, fix reversed test, remove memset.
 1.180  14-Jan-2009  cegger make this compile: fix gcc warning about uninitialized use of tv.sec and tv.usec.
 1.179  14-Jan-2009  christos version get/set send/recv timeout setsockopt.
 1.178  07-Dec-2008  pooka Move some sysctl node creations away from linksets and into the
constructors for subsystems.

XXX: CTLFLAG_PERMANENT is non-sensible.
 1.177  14-Oct-2008  ad branches: 1.177.2; 1.177.4;
Accept filters:

- Remove remaining #ifdef INET.
- Avoid holding locks so we don't need to do KM_NOSLEEP allocations.
- Use a rwlock to protect the accept filter list.
- Make it safe to unload accept filter modules.
- Minor KNF.
 1.176  12-Oct-2008  plunky fix problem pointed out by ad where sockopt may end up sleeping
inappropriately with the socket lock held.

sockopt_init() may sleep

sockopt_set() will not sleep

sockopt_getmbuf() for legacy code will not sleep
 1.175  11-Oct-2008  tls Address problems with accept filters noted by ad in his source-changes
mail: http://mail-index.netbsd.org/source-changes/2008/10/10/msg211109.html

* Scary-looking socket locking stubs (changed to KASSERT of locked)

* depends on INET inappropriately (though now you must add new
accept filter names to the uipc_accf.c line in conf/files if
you aren't using dataready or httpready)

* New code uses MALLOC/FREE -- changed to kmem_alloc/kmem_free;
could be pool_cache, these are all fixed-size allocations.

We need to verify that this works as expected with protocols with per-socket
locking, like PF_LOCAL. I'm a little concerned about the case where the
lock on the listen socket isn't the same lock as on the eventual connected
socket.
 1.174  11-Oct-2008  pooka Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.173  10-Oct-2008  plunky use kmem_alloc/kmem_free rather than malloc() for sockopt
 1.172  10-Oct-2008  ad Redo 1.169 correctly (dsl's fix doesn't do what I intended, either!).
 1.171  06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.170  04-Aug-2008  tls Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.

OK core@.
 1.169  25-Jul-2008  dsl Remove all the pending connections in soclose().
'continue' in 'do .. while (0)' doesn't do what ad@ intended.
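
The pitfall named in 1.169 can be reproduced outside the kernel. The sketch below is illustrative only and is not the soclose() code; drain_once() and drain_all() are hypothetical names. It shows that `continue` inside `do { } while (0)` jumps to the loop test, which is false, so the body runs once.

```c
#include <assert.h>

/*
 * Illustrative only: counts how many pending items are handled when
 * `continue` is used inside do { } while (0).
 */
int
drain_once(int pending)
{
	int visited = 0;

	do {
		if (pending > 0) {
			pending--;
			visited++;
			continue;	/* jumps to the `while (0)` test: loop ends */
		}
	} while (0);
	return visited;
}

/* A real loop keeps going until nothing is pending. */
int
drain_all(int pending)
{
	int visited = 0;

	while (pending > 0) {
		pending--;
		visited++;
	}
	return visited;
}
```

With three pending items, drain_once() handles one and drain_all() handles all three.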
 1.168  18-Jun-2008  yamt branches: 1.168.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.167  28-May-2008  ad branches: 1.167.2;
Disable zero copy if MULTIPROCESSOR, until it is fixed:

- The TLB coherency overhead on MP systems is really expensive.
- It triggers a race in the VM system (grep kpause uvm/*).
 1.166  26-May-2008  ad Use pool_cache for sockets.
 1.165  24-May-2008  christos Coverity CID 5015: Remove unnecessary test; if l was null we would have
crashed before when p = l->l_proc.
 1.164  01-May-2008  drochner branches: 1.164.2;
fix soabort(): sofree() wants to be called with the lock held
approved by ad
 1.163  29-Apr-2008  ad solisten: don't leak lock if the socket is busy.
 1.162  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.161  27-Apr-2008  ad Fix a use-after-free in soabort(). It would be better to kill SS_NOFDREF
and maintain a per-socket reference count, but SS_NOFDREF is slightly
more than a simple reference count and I don't want to break anything.
 1.160  24-Apr-2008  ad branches: 1.160.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.159  14-Apr-2008  ad branches: 1.159.2;
soreceive: dom_externalize/dom_dispose can block. If new messages are
appended while the receiver is blocked, the sockbuf will be corrupted.
Dequeue control messages from the sockbuf and sync its state in one
pass. Only then process the control messages. From FreeBSD.
 1.158  28-Mar-2008  ad Prevent listen() on a socket that is already connected - we already prevent
connect() on a listening socket.
 1.157  27-Mar-2008  ad Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.156  24-Mar-2008  yamt merge yamt-lazymbuf branch.
 1.155  21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.154  20-Mar-2008  ad - Extract the guts of soo_poll() into sopoll(), which takes a struct socket *.
This is for netsmb which wants to poll sockets directly.
- When polling a socket, first check for pending I/O without acquiring any
locks. If no I/O seems to be pending, acquire locks/spl and check again
doing selrecord() if necessary.
 1.153  01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.152  27-Feb-2008  matt Convert stragglers to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.151  06-Feb-2008  ad branches: 1.151.2; 1.151.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.
 1.150  16-Dec-2007  elad Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.
 1.149  05-Dec-2007  pooka branches: 1.149.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
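
Why a bare "return 1" shows up as EPERM: the return value is interpreted as an errno, and errno 1 is EPERM. A minimal sketch follows, with hypothetical function names and a `supported` flag standing in for whatever check the real kqfilter routine performs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical filter-attach routine that signals failure with 1. */
int
attach_filter_buggy(int supported)
{
	return supported ? 0 : 1;	/* caller reads 1 as errno 1, i.e. EPERM */
}

/* Same routine returning a named errno that matches the failure. */
int
attach_filter_fixed(int supported)
{
	return supported ? 0 : EINVAL;
}
```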
 1.148  05-Dec-2007  ad Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.
 1.147  24-Nov-2007  dyoung branches: 1.147.2;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().
 1.146  24-Nov-2007  dyoung Eliminate common subexpressions, creating variables 'atomic' and
'dom'.
 1.145  07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.144  05-Oct-2007  dyoung branches: 1.144.2; 1.144.4;
Cosmetic: KNF. Shorten a staircase. Indent a complicated
if-condition. This code remains hard to read.
 1.143  25-Sep-2007  ad Use selinit() / seldestroy().
 1.142  19-Sep-2007  dyoung branches: 1.142.2;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
 1.141  06-Aug-2007  yamt branches: 1.141.2; 1.141.4;
sosetopt: clear SB_AUTOSIZE when setting buffer size explicitly.
 1.140  02-May-2007  dyoung branches: 1.140.2; 1.140.6;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.139  15-Apr-2007  yamt - soabort: don't leak a socket on error.
- add an assertion.
 1.138  03-Apr-2007  rmind socreate: l cannot be NULL.
CID: 4314
 1.137  15-Mar-2007  ad sodopendfreel: Getting a bit over ambitious.. Go to splvm() before calling
pool_cache_put().
 1.136  12-Mar-2007  ad branches: 1.136.2;
Use mutexes/condvars.
 1.135  12-Mar-2007  ad branches: 1.135.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.134  04-Mar-2007  christos branches: 1.134.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.133  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.132  17-Jan-2007  elad branches: 1.132.2;
Use KAUTH_ARG().
 1.131  08-Dec-2006  christos - make so_linger unsigned short to double the range
- return 1 or 0 for the flag being set instead of the flag value
- check for range properly
 1.130  06-Dec-2006  christos simplify linger code.
 1.129  01-Nov-2006  yamt branches: 1.129.2;
remove some __unused from function parameters.
 1.128  30-Oct-2006  elad Use integers, not pointers to integers, for KAUTH_REQ_NETWORK_SOCKET_OPEN.

Reminded by yamt@, thanks!
 1.127  25-Oct-2006  elad Introduce KAUTH_REQ_NETWORK_SOCKET_OPEN, to check if opening a socket is
allowed. It takes three int * arguments indicating domain, type, and
protocol. Replace previous KAUTH_REQ_NETWORK_SOCKET_RAWSOCK with it (but
keep it still).

Places that used to explicitly check for privileged context now don't
need it anymore, so I replaced these with an XXX comment indicating it for
future reference.

Documented and updated examples as well.
 1.126  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.125  03-Oct-2006  elad Back out socket credentials for now, until we figure a better way of
handling the reference counting from interrupt context.
 1.124  02-Oct-2006  elad Move the kauth_cred_free() call above the "is connected" check to not
leak credentials.

Pointed out by yamt@, thanks!
 1.123  02-Oct-2006  elad Add credentials to sockets, 'so_cred'.

Brought up on tech-kern@ some ~2 months ago, didn't seem to be an
objection; brought up again recently and no objection either... this is
not too intrusive and I've been running with this for a while.
 1.122  23-Jul-2006  ad branches: 1.122.4; 1.122.6;
Use the LWP cached credentials where sane.
 1.121  21-Jun-2006  yamt bump default so_snd.sb_lowat to increase chance to use loaning.

the idea to tweak the watermark from Jonathan Stone.
reviewed by Bill Studenmund.
 1.120  13-Jun-2006  ginsbach branches: 1.120.2;
Add EAFNOSUPPORT as a possible error if the address family is not
supported. This adds further differentiation between which argument to
socket(2) caused the error. No longer are invalid domain (address family)
errors classified as EPROTONOSUPPORT errors. This should make socket(2)
conform to current POSIX and X/Open standards. Fixes PR/33676.
 1.119  25-May-2006  yamt move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.118  14-May-2006  elad branches: 1.118.2;
integrate kauth.
 1.117  11-Apr-2006  yamt sodopendfree/sodopendfreel: remove unused "so" argument.
 1.116  01-Mar-2006  yamt branches: 1.116.2; 1.116.4; 1.116.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.115  27-Dec-2005  yamt branches: 1.115.2; 1.115.4; 1.115.6;
socreate: fix a null dereference on nfs reconnect, introduced by ktrace-lwp.
 1.114  11-Dec-2005  christos merge ktrace-lwp.
 1.113  08-Dec-2005  thorpej Sprinkle static.
 1.112  21-Oct-2005  nathanw Check the argument to SO_LINGER.
 1.111  08-May-2005  christos branches: 1.111.2; 1.111.4;
Panic strings should not end with \n.
 1.110  07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.109  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.108  26-Feb-2005  perry branches: 1.108.2;
nuke trailing whitespace
 1.107  03-Sep-2004  darrenr branches: 1.107.4; 1.107.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.
 1.106  25-Aug-2004  itojun bug reported by millert@openbsd:
> Call dom_dispose() for any SCM_RIGHTS message that went through the
> read path rather than recv. Previously, if an fd was passed via
> sendmsg() but was consumed by the receiver via read() the ref count
> was incremented and never decremented and so the ref count would
> never reach zero even when there were no longer any processes holding
> the file open (this was especially bad for locked fds).
 1.105  19-Aug-2004  christos PR/26210: Matthew Mondor: Since revision 1.14 when net-2 was merged,
the code to do receive packet accounting has been disabled for no apparent
reason. Re-enable it.
 1.104  01-Jul-2004  yamt bump sb_timeo from short to int to allow longer timeouts.
especially when hz is high.

while i'm here, bump sb_flags to int, as suggested by
Jason Thorpe and Bill Studenmund.

ride on 2.0G.
 1.103  25-May-2004  atatat Remaining sysctl descriptions under kern subtree
 1.102  22-May-2004  jonathan Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.
 1.101  01-May-2004  matt Use EVCNT_ATTACH_STATIC
 1.100  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.99  22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.98  17-Apr-2004  christos PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.
 1.97  24-Mar-2004  atatat branches: 1.97.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.96  21-Mar-2004  mycroft Remove part of a very old change that caused NFS to not enforce socket buffer
limits. No idea why it was done in the first place.

Don't remember who reported this, but I think it was yamt.
 1.95  17-Mar-2004  yamt sokvaalloc: unreserve kva if uvm_km_valloc_wait failed.
 1.94  17-Mar-2004  yamt - move kern.somaxkva sysctl stuff from init_sysctl.c to uipc_socket.c.
- when changing its value, wakeup sokva waiters.
 1.93  17-Mar-2004  yamt - fix locking of sosend kva allocation.
- some comments.
 1.92  17-Mar-2004  yamt remove per-socket pendfree list.
 1.91  21-Oct-2003  thorpej Cache the "adjusted" value of sb_max when sb_max is changed, in order
to avoid doing quad math in sbreserve().

Change suggested by Simon Burge, and code inspired by a similar change
in FreeBSD.
 1.90  22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.89  15-Sep-2003  christos include <sys/poll.h>
 1.88  14-Sep-2003  christos provide some more ksiginfo info.
 1.87  06-Sep-2003  christos SA_SIGINFO changes.
 1.86  04-Sep-2003  wrstuden Adjust struct sockbuf and sorflush() so that we don't zero out
every field; some need to stay around.

Fixes a bug whereby calling shutdown() on a socket with knotes
will cause the kernel to panic when the kernel closes the socket.
Other accesses, such as calling kevent(), may also trigger the panic.

Debugged with help from Jason and Allen. Patch reviewed by same plus
Itojun and Matt Thomas.

This problem seems to be the same one that FreeBSD saw in their PR
number 54331.

Kernel version _not_ bumped as we will piggyback the bump earlier today.
 1.85  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.84  02-Jul-2003  ragge Make somaxkva modifiable via sysctl (and compile-time) instead of
hardcoding its size.
 1.83  29-Jun-2003  fvdl branches: 1.83.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.82  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.81  23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.80  03-May-2003  yamt export some of sosend loan routines for nfsd.
 1.79  09-Apr-2003  thorpej * Use a pool_cache constructor to record the physical address of mbufs
in the mbuf header.
* Use the new cached paddr feature of the pool_cache API to record
the physical address of mbuf clusters. (We cannot use a ctor for
clusters, since clusters have no constructed form; they are merely
buffers).

Bus_dma back-ends may use the cached physical addresses to save having to
extract the physical address from virtual.

* Provide space in m_ext recording the vm_page *'s for an SOSEND_LOAN_CHUNK-
sized non-cluster external buffer. Use this in the sosend_loan code to
save having to extract the physical address from virtual and then look
up the vm_page *'s.

* Provide an indication that an external buffer is mapped read-only at
the MMU. Set this flag for the external buffer in the sosend_loan
case, since loaned pages are always mapped read-only. Bus_dma back-ends
may use this information to save cache flushing, since a cache flush of
a read-only mapping is redundant on some architectures (the cache would
have already been flushed when making the mapping read-only).

Part 2 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
 1.78  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.77  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.76  31-Jan-2003  thorpej Change ext_size to a size_t, and update the signature of ext_free.
 1.75  27-Nov-2002  itojun "tv->tv_sec * hz" could overflow a long. millert@openbsd
 1.74  27-Nov-2002  itojun small SO_RCVTIMEO values are mistakenly taken to be zero. FreeBSD PR kern/32827.
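
The two timeout fixes above (1.74 and 1.75) concern converting a struct timeval into clock ticks. The sketch below is an assumption about the general shape of such a conversion, not the kernel's code: timeout_to_ticks() is a hypothetical helper, and the choice of EDOM for the overflow case is illustrative. It rejects values where sec * hz would overflow a long, and rounds a small nonzero timeout up to one tick so that it is not mistaken for "no timeout".

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: convert (sec, usec) to ticks at `hz' ticks/second. */
int
timeout_to_ticks(long sec, long usec, long hz, long *ticks)
{
	long t, tick_us;

	if (hz <= 0 || hz > 1000000)
		return EINVAL;
	if (sec < 0 || usec < 0 || usec >= 1000000)
		return EINVAL;

	tick_us = 1000000 / hz;		/* microseconds per tick */

	/* Reject values where sec * hz plus the usec part would overflow. */
	if (sec > (LONG_MAX - 1000000) / hz)
		return EDOM;

	t = sec * hz + usec / tick_us;

	/* A nonzero timeout shorter than one tick must not become zero. */
	if (t == 0 && usec != 0)
		t = 1;

	*ticks = t;
	return 0;
}
```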
 1.73  26-Nov-2002  christos si_ -> sel_ to avoid conflicts with siginfo.
 1.72  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.71  21-Aug-2002  thorpej Make use of page loaning for large socket writes the default. The
SOSEND_NO_LOAN option can be used to go back to the old behavior.
 1.70  03-Jul-2002  thorpej Rename SB_UPDATE_TAIL() to SB_EMPTY_FIXUP(), per suggestion from
Jonathan Stone.
 1.69  03-Jul-2002  thorpej Make insertion of data into socket buffers O(C):
* Keep pointers to the first and last mbufs of the last record in the
socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
guarantee that there will never be more than one record in the
socket buffer. This function uses the sb_mbtail pointer to perform
the data insertion. Make TCP use sbappend_stream().

On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.

Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
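
The idea in 1.69 of keeping a pointer to the last record, so that an append does not walk the chain, can be sketched with a plain linked list. The types and function names below are simplified, hypothetical stand-ins and not the kernel's struct sockbuf or sbappend*() code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a record in a socket buffer. */
struct rec {
	struct rec *next;
	int len;
};

/* Simplified stand-in for a socket buffer. */
struct sbuf {
	struct rec *head;	/* first record */
	struct rec *lastrecord;	/* last record, so append need not walk */
	int cc;			/* total bytes queued */
};

/* Append in constant time by linking after the cached last record. */
void
sb_append(struct sbuf *sb, struct rec *r)
{
	r->next = NULL;
	if (sb->lastrecord != NULL)
		sb->lastrecord->next = r;
	else
		sb->head = r;
	sb->lastrecord = r;
	sb->cc += r->len;
}

/* Remove the first record; reset the tail pointer when the list empties. */
struct rec *
sb_dequeue(struct sbuf *sb)
{
	struct rec *r = sb->head;

	if (r == NULL)
		return NULL;
	sb->head = r->next;
	if (sb->head == NULL)
		sb->lastrecord = NULL;
	sb->cc -= r->len;
	return r;
}
```

The cost of the cached pointer is that every path that empties the buffer has to clear it, otherwise the next append links onto a stale record.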
 1.68  11-Jun-2002  matt Fix 2 bugs with MSG_WAITALL. The first is to not block forever if one is
trying to MSG_PEEK for more than the socket can hold. The second is that
before sleeping waiting for more data, upcall the protocol telling it you
have just received data so it can kick itself to re-fill the just drained
socket buffer.
 1.67  10-Jun-2002  he In soreceive(), if any part of a received record has been freed,
and an error occurs, make sure the socket doesn't retain a partial
copy by dropping the rest of the record.

This would otherwise trigger a panic("receive 1a") under DIAGNOSTIC.

Fixes PR#16990, suggested fix adapted.

Reviewed by Matt Thomas.
 1.66  07-May-2002  enami branches: 1.66.2; 1.66.4;
In soreceive(), don't call sopendfree() if MSG_DONTWAIT is set
since it may sleep. nfsrv_rcv() tries to do its jobs in softintr
handler as far as possible.
 1.65  03-May-2002  thorpej Let the sosend_loan() path be selected at run-time; patch the variable
use_sosend_loan to enable/disable it. The SOSEND_LOAN kernel option
now causes it to default to 1.
 1.64  02-May-2002  thorpej Add some experimental page-loaning for writes on sockets. It is disabled
by default, and can be enabled by adding the SOSEND_LOAN option to your
kernel config. The SOSEND_COUNTERS option can be used to provide some
instrumentation.

Use of this option, combined with an application that does large enough
writes, gets us zero-copy on the TCP and UDP transmit path.
 1.63  06-Apr-2002  matt Don't use the tqh_ field names, instead use the corresponding TAILQ_* macro.
 1.62  08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.61  03-Jan-2002  mrg fix previous: actually remove the COMPAT_SUNOS code, not just #if 0 it.
 1.60  03-Jan-2002  mrg move the COMPAT_SUNOS SO_BROADCAST hack out of uipc_socket.c into the
compat/sunos code. besides being cleaner this allows the sunos LKM
to properly work without any special kernel hacks.
 1.59  12-Nov-2001  lukem add RCSIDs
 1.58  29-Sep-2001  jdolecek branches: 1.58.2;
Use lmin() instead of min(), and long for mlen & clen, to avoid integer
overflow on LP64 architectures. This fixes kern/10070 by Juergen Weiss.

Fix tested on NetBSD/alpha by Bernd Ernesti, on NetBSD/sparc64
by David Brownlee and Eduardo Horvath.
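One way such a problem can arise is when a 64-bit long is passed through a min() whose parameters are only 32 bits wide. The sketch below models that with fixed-width types so it behaves the same on any host. All names in it (min32_sketch, lmin64_sketch, chunk_len_narrow, chunk_len_wide) are hypothetical and are not taken from the kernel sources; the assumption that the narrow min() takes 32-bit unsigned arguments is stated here, not in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Models a min() whose parameters are 32-bit unsigned integers (assumption). */
uint32_t
min32_sketch(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Models lmin() where long is 64 bits wide, as on LP64 (assumption). */
int64_t
lmin64_sketch(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Chunk length via the narrow min(): the high bits of resid are lost. */
int64_t
chunk_len_narrow(int64_t resid, int64_t space)
{
	assert(resid >= 0 && space >= 0);
	return (int64_t)min32_sketch((uint32_t)resid, (uint32_t)space);
}

/* Chunk length via the wide lmin(): no truncation takes place. */
int64_t
chunk_len_wide(int64_t resid, int64_t space)
{
	assert(resid >= 0 && space >= 0);
	return lmin64_sketch(resid, space);
}
```

With a residual count just above 2^32 the narrow variant picks a length derived from the truncated value, while the wide variant returns the buffer space as intended. For small counts both agree.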
 1.57  17-Sep-2001  jdolecek soreceive(): do not ignore uiomove() error
Problem reported and fix provided by Aaro Koskinen in kern/11692.
 1.56  13-Apr-2001  thorpej branches: 1.56.2; 1.56.4;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.55  21-Mar-2001  thorpej Add a protosw flag, PR_ABRTACPTDIS (Abort on Accept of Disconnected
Socket), and add it to the protocols that use that behavior (all
PR_LISTEN protocols except for PF_LOCAL stream sockets).
 1.54  27-Feb-2001  lukem branches: 1.54.2;
convert to ANSI KNF
 1.53  07-Feb-2001  itojun return ECONNABORTED, if the socket (tcp connection for example)
is disconnected by RST right before accept(2). fixes PR 10698/12027.
checked with SUSv2, XNET 5.2, and Stevens (unix network programming
vol 1 2nd ed) section 5.11.
 1.52  22-Jan-2001  itojun when the peer is disconnected before accept(2) is issued,
do not return junk data in mbuf (= sockaddr on accept(2)'s 2nd arg).
set the length zero.

behavior checked with bsdi and freebsd.
partial solution to PR 12027 and 10698 (need more investigation).
 1.51  10-Dec-2000  fvdl Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).
 1.50  30-Mar-2000  augustss branches: 1.50.4;
Get rid of register declarations.
 1.49  07-Feb-2000  jonathan Make kernel SOMAXCONN patchable. Will add sysctl once we
decide on namespace.
 1.48  08-Jun-1999  thorpej branches: 1.48.2;
In sosend(), if so_error is set, clear it before returning the error to
the process (i.e. pre-Reno behavior). The 4.4BSD behavior (introduced
in Reno) caused transient errors to stick incorrectly.

This is from PR #7640 (Havard Eidnes), cross-checked w/ FreeBSD, where
Bill Fenner committed the same fix (as described in a comment in the
Vat sources, by Van Jacobson).
 1.47  15-May-1999  sommerfeld Delete test code.
 1.46  15-May-1999  sommerfeld Revise previous fix:
1) protect socket flags under splsoftnet()
2) avoid leaking memory on an error
 1.45  15-May-1999  tv Wow, that was much easier than I originally thought. Fix PR kern/7583:
serious race condition in sosend(). Upon closer inspection, the appropriate
flags are checked within splsoftnet() for soreceive(), so no change needed
there. Also a little KNFing in sosend().
 1.44  23-Mar-1999  lukem branches: 1.44.2; 1.44.4; 1.44.6;
Ensure that you can only bind a more specific address when it is done by the
same uid or by root.

This code is from FreeBSD. (Whilst it was originally obtained from OpenBSD,
FreeBSD fixed it to work with multicast. To quote the commit message:
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has
SO_REUSEPORT set.
)
 1.43  21-Jan-1999  mycroft Do remove sockets on so_q0, since select(2) and accept(2) do not (currently?)
return them.
 1.42  20-Jan-1999  mycroft Oops; previous was slightly broken.
 1.41  20-Jan-1999  mycroft Do not remove sockets from the accept(2) queue on close.
 1.40  16-Dec-1998  thorpej In the sosend() loop, if the residual count is > 0 before calling PRU_SEND,
set SS_MORETOCOME as a hint to the lower layer that more data is coming
on the next iteration of the loop. Clear the flag after the PRU_SEND
call.

Suggested by Justin Walker <justin@apple.com> on the freebsd-net
mailing list.
 1.39  25-Sep-1998  matt Fix spl problem in socreate (which led to the corruption of the
socket pool).
 1.38  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
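The mapping above swaps the first two arguments for the copy routines: the b*() calls take the source first, the mem*() calls take the destination first. A minimal userland sketch of the converted form follows; the helper name copy_record_sketch is hypothetical, and the legacy calls appear only in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a record into dst, zero a scratch buffer,
 * and report whether dst now matches src. The buffers must not overlap,
 * because memcpy() is used rather than memmove().
 */
int
copy_record_sketch(void *dst, const void *src, void *scratch, size_t len)
{
	assert(dst != NULL && src != NULL && scratch != NULL);

	memcpy(dst, src, len);		/* was: bcopy(src, dst, len) */
	memset(scratch, 0, len);	/* was: bzero(scratch, len) */

	/* was: bcmp(dst, src, len); both yield 0 when the bytes match */
	return memcmp(dst, src, len);
}
```

Overlapping regions would need memmove(dst, src, len), the counterpart of ovbcopy(src, dst, len).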
 1.37  02-Aug-1998  thorpej Use the pool allocator for sockets.
 1.36  31-Jul-1998  perry fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.
 1.35  25-Jun-1998  thorpej branches: 1.35.2;
defopt COMPAT_SUNOS
 1.34  27-Apr-1998  kleink In soshutdown(), decouple the evaluation of the `how' argument from FREAD
and FWRITE; use SHUT_{RD,WR,RDWR} instead.
Also, return EINVAL if `how' is invalid.
 1.33  25-Apr-1998  matt Hook for 0-copy (or other optimized) sends and receives
 1.32  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.31  07-Jan-1998  thorpej Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).
 1.30  05-Jan-1998  thorpej From 4.4BSD-Lite2 (noted by Frank van der Linden):
so_linger is used as an argument to tsleep(), so was stuffed with
clockticks for the TCP linger time. However, so_linger is set directly from
l_linger if the linger time is specified, and l_linger is seconds (although
this is not currently documented anywhere). Fix this to set the TCP
linger time in seconds, and multiply so_linger by hz when tsleep() is
called to actually perform the linger.
 1.29  27-Aug-1997  mycroft branches: 1.29.4;
Fix a mbuf leak in sosend() when we have a negative residual count.
 1.28  24-Jun-1997  thorpej branches: 1.28.4;
In sosetopt():
- Disallow < 1 values for SO_SNDBUF, SO_RCVBUF, SO_SNDLOWAT, and
SO_RCVLOWAT; return EINVAL if the user attempts to set <= 0.
Inspired by PR #3770, from Havard Eidnes <he@vader.runit.sintef.no>.
- For SO_SNDLOWAT and SO_RCVLOWAT, don't let the low-water mark get
set above the high-water mark. Behavior is now consistent with
BSD/OS: If such an attempt is made, silently truncate to the high-water
value.
 1.27  11-Jun-1997  kleink Calculate returned timeval correctly when using SO_SNDTIMEO/SO_RCVTIMEO;
from Koji Imada <koji@math.human.nagoya-u.ac.jp> in PR/3682.
 1.26  11-Jan-1997  thorpej Implement SO_TIMESTAMP socket option: receive a timeval timestamp
as a control message with a datagram.
 1.25  14-Aug-1996  explorer This fixes a nasty little bug where traceroute (and other raw-ip sending
programs which attach their own header) can crash the machine. The problem
in this case was:
a variable "space" was set to the total data to copy,
len was used to remember how much to copy in this chunk (mbuf),
in one case, len = min(MCLBYTES - max_hdr, resid) but
size -= MCLBYTES;
instead of
size -= len;

Note that userland programs can still crash the machine by providing
bogus data in the ip->ip_len field I suspect. I haven't verified this,
but will soon be doing so and applying a fix of some sort. Probably
clamping the ip->ip_len value to the true packet size will be ok.
 1.24  22-May-1996  mycroft And PRU_SEND.
 1.23  22-May-1996  mycroft PRU_CONNECT also needs a proc pointer.
 1.22  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.21  04-Feb-1996  christos branches: 1.21.4;
First pass at prototyping
 1.20  12-Aug-1995  mycroft splnet --> splsoftnet
 1.19  23-May-1995  cgd properly determine if send/rcv timeout values are out of range.
 1.18  22-Apr-1995  christos - new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.
 1.17  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.16  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.15  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.14  04-May-1994  mycroft More return types...
 1.13  25-Apr-1994  mycroft Remove another bit of that.
 1.12  25-Apr-1994  mycroft Remove a piece of the previous patch.
 1.11  25-Apr-1994  mycroft Minor cleanup.
 1.10  23-Jan-1994  deraadt more COMPAT_SUNOS changes.
 1.9  18-Dec-1993  mycroft Canonicalize all #includes.
 1.8  05-Nov-1993  cgd fix from david greenman, davidg@freefall.cdrom.com:
sosend was attempting to reserve space in an mbuf cluster for a datagram
header and because of bugs in the sosend's mbuf allocation algorithm,
sosend was calling uiomove twice as many times as was necessary. It turns
out that PREPEND does the right thing when a cluster is associated with
an mbuf header, so the datagram header allocation can be deferred. This
also ends up additionally consuming one less mbuf for the TCP protocol
because TCP always allocates another header mbuf regardless if space is
available to prepend the protocol header. The net result of this fix is
that unix domain and pipe throughput is increased by a measured 10%.
 1.7  26-Oct-1993  cgd BSDI official patch #14:
SUMMARY:
Here is a patch for a kernel hang that can be provoked with a write
or send of a negative amount. The talk program is capable of exercising
this bug. This patch also includes a fix for a bug that caused data
to be delivered to TCP in smaller chunks than desired, and which caused
TCP to send a short packet when starting up. Finally, there is a bug
fix for MSG_PEEK with an oobmark pending.
 1.6  08-Sep-1993  mycroft branches: 1.6.2;
Patch from David Greenman to reduce CPU usage during network transmit.
 1.5  03-Aug-1993  mycroft Nuke an extra `||' Chris inserted.
 1.4  03-Aug-1993  cgd fix from Garrett Wollman <wollman@emba.uvm.edu> to return EPROTONOSUPPORT
if user tries to get a socket for a protocol with no usrreq function
 1.3  27-Jun-1993  andrew * ansifications
* Yuval Yarom's socket recv(2) fixes, to prevent incorrect blocking and
lack thereof with recv(2) and MSG_WAITALL. Fixes a sbdrop() panic during
some MSG_WAITALL recv(2) sleeps. Access rights fix (also in
uipc_syscalls.c) too. A test program which shows these problems is
available.
 1.2  18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.2.3  10-Dec-1993  pk Always set BROADCAST flag in DGRAM sockets in SunOS emulation procs.
 1.6.2.2  06-Nov-1993  mycroft Merge changes from trunk.
 1.6.2.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.21.4.2  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
 1.21.4.1  06-Dec-1996  rat Pullup request 1.24 -> 1.25 from Michael Graff <explorer@flame.org>

Fixes bug that caused kernel to crash when ``traceroute host 7000'' was
executed.
 1.28.4.1  28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.29.4.2  25-Jan-1999  cgd Patch to fix select(2)/accept(2) race condition which permits DoS. (mycroft)
 1.29.4.1  30-Jan-1998  mellon Pull up 1.30 and 1.31 (thorpej)
 1.35.2.1  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.44.6.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.44.6.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.44.6.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.44.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.44.2.2  21-Jun-1999  perry pullup 1.44->1.47 (tv). We are now in total sync with version 1.48
 1.44.2.1  18-Jun-1999  perry pullup 1.47->1.48 (thorpej)
 1.48.2.6  21-Apr-2001  bouyer Sync with HEAD
 1.48.2.5  27-Mar-2001  bouyer Sync with HEAD.
 1.48.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.48.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.48.2.2  13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.48.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.50.4.5  18-Sep-2002  itojun sys/kern/uipc_socket.c 1.53

Return ECONNABORTED, if the socket (tcp connection for example)
is disconnected by RST right before accept(2). Fixes PR#10698/12027.
Checked with SUSv2, XNET 5.2, and Stevens (Unix Network Programming
vol 1 2nd ed) section 5.11.
 1.50.4.4  12-Jun-2002  he Pull up revision 1.67 (via patch, requested by he):
Make sure to preserve atomicity of receive operation when instructed
to do so. Make sure to drop rest of partial record if an error
occurs. Fixes PR#16990.
 1.50.4.3  08-Oct-2001  he Pull up revision 1.57 (requested by jdolecek):
In soreceive(): do not ignore uiomove() error. Fixes PR#11692.
 1.50.4.2  03-Feb-2001  he Pull up revision 1.52 (requested by itojun):
Prevent bogus data from being returned from the kernel on accept(2)
in case the peer has already disconnected. Partial fix for
PR#12027 and PR#10698.
 1.50.4.1  15-Dec-2000  he Pull up revision 1.51 (requested by fvdl):
Fix NFS+tcp client hangs on server or network outage. Again,
please note that this introduces yet another kernel interface
change: sobind() gains an argument.
 1.54.2.18  11-Dec-2002  thorpej Sync with HEAD.
 1.54.2.17  11-Nov-2002  nathanw Catch up to -current
 1.54.2.16  27-Aug-2002  nathanw Catch up to -current.
 1.54.2.15  01-Aug-2002  nathanw Catch up to -current.
 1.54.2.14  15-Jul-2002  nathanw Whitespace.
 1.54.2.13  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.54.2.12  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.54.2.11  20-Jun-2002  nathanw Catch up to -current.
 1.54.2.10  17-Apr-2002  nathanw Catch up to -current.
 1.54.2.9  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.54.2.8  28-Feb-2002  nathanw Catch up to -current.
 1.54.2.7  11-Jan-2002  nathanw More catchup.
 1.54.2.6  14-Nov-2001  nathanw Catch up to -current.
 1.54.2.5  08-Oct-2001  nathanw Catch up to -current.
 1.54.2.4  21-Sep-2001  nathanw Catch up to -current.
 1.54.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.54.2.2  09-Apr-2001  nathanw Catch up with -current.
 1.54.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.56.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.56.2.7  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.56.2.6  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.56.2.5  16-Mar-2002  jdolecek Catch up with -current.
 1.56.2.4  15-Mar-2002  jdolecek no need to protect the kqueue SLIST manipulation with splnet - the
structures are never changed nor accessed from interrupt context
filt_solisten(): g/c the comment about what FreeBSD does, leave only
comment what the code does (it was checked to be correct)
 1.56.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.56.2.2  07-Sep-2001  thorpej More const.
 1.56.2.1  10-Jul-2001  lukem add kqueue methods for filt_so*
 1.58.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.66.4.2  08-Nov-2002  tron Pull up revision 1.68 (requested by thorpej in ticket #429):
Fix 2 bugs with MSG_WAITALL. The first is to not block forever if one is
trying to MSG_PEEK for more than the socket can hold. The second is that
before sleeping waiting for more data, upcall the protocol telling it you
have just received data so it can kick itself to re-fill the just drained
socket buffer.
 1.66.4.1  11-Jun-2002  lukem Pull up revision 1.67 (requested by he in ticket #238):
In soreceive(), if any part of a received record has been freed,
and an error occurs, make sure the socket doesn't retain a partial
copy by dropping the rest of the record.
This would otherwise trigger a panic("receive 1a") under DIAGNOSTIC.
Fixes PR#16990, suggested fix adapted.
Reviewed by Matt Thomas.
 1.66.2.3  29-Aug-2002  gehenna catch up with -current.
 1.66.2.2  15-Jul-2002  gehenna catch up with -current.
 1.66.2.1  20-Jun-2002  gehenna catch up with -current.
 1.83.2.11  11-Dec-2005  christos Sync with head.
 1.83.2.10  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.83.2.9  01-Apr-2005  skrll Sync with HEAD.
 1.83.2.8  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.83.2.7  24-Jan-2005  skrll Don't deref a NULL struct lwp *
 1.83.2.6  21-Sep-2004  skrll Fix the sync with head I botched.
 1.83.2.5  18-Sep-2004  skrll Sync with HEAD.
 1.83.2.4  03-Sep-2004  skrll Sync with HEAD
 1.83.2.3  25-Aug-2004  skrll Sync with HEAD.
 1.83.2.2  03-Aug-2004  skrll Sync with HEAD
 1.83.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.97.2.2  31-Oct-2005  tron Pull up following revision(s) (requested by nathanw in ticket #5939):
sys/kern/uipc_socket.c: revision 1.112
Check the argument to SO_LINGER.
 1.97.2.1  26-May-2004  he branches: 1.97.2.1.2;
Pull up revision 1.103 (requested by atatat in ticket #388):
Add remaining sysctl descriptions under kern subtree.
 1.97.2.1.2.3  28-Oct-2005  jmc "Pullup revs 1.011-1.112 (requested by nathanw in ticket #5939)
Check the argument to SO_LINGER."
 1.97.2.1.2.2  13-Sep-2005  riz branches: 1.97.2.1.2.2.2;
Back out ticket #5598, it causes netboot problems for (at least) sparc.
 1.97.2.1.2.1  22-Aug-2005  riz Pull up following revision(s) (requested by christos in ticket #5598):
sys/kern/uipc_socket.c: revision 1.105 via patch
PR/26210: Matthew Mondor: Since revision 1.14 when net-2 was merged,
the code to do receive packet accounting has been disabled for no apparent
reason. Re-enable it.
 1.97.2.1.2.2.2.1  31-Oct-2005  tron Pull up following revision(s) (requested by nathanw in ticket #5939):
sys/kern/uipc_socket.c: revision 1.112
Check the argument to SO_LINGER.
 1.107.6.2  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.107.6.1  25-Jan-2005  yamt convert to new apis.
 1.107.4.1  29-Apr-2005  kent sync with -current
 1.108.2.3  25-Oct-2006  ghen Back out ticket #1568, it breaks source and binary compatibility.
Noted by pavel.
 1.108.2.2  25-Oct-2006  ghen Pull up following revision(s) (requested by ginsbach in ticket #1568):
sys/kern/uipc_socket.c: revision 1.120
lib/libc/sys/socket.2: revision 1.32
lib/libc/sys/socket.2: revision 1.33
Sort ERRORS. Bump date.
Add EAFNOSUPPORT as a possible error if the address family is not
supported. This adds further differentiation between which argument to
socket(2) caused the error. No longer are invalid domain (address family)
errors classified as EPROTONOSUPPORT errors. This should make socket(2)
conform to current POSIX and X/Open standards. Fixes PR/33676.
 1.108.2.1  22-Oct-2005  riz Pull up following revision(s) (requested by nathanw in ticket #911):
sys/kern/uipc_socket.c: revision 1.112
Check the argument to SO_LINGER.
 1.111.4.1  26-Oct-2005  yamt sync with head
 1.111.2.15  24-Mar-2008  yamt sync with head.
 1.111.2.14  17-Mar-2008  yamt sync with head.
 1.111.2.13  27-Feb-2008  yamt drop lazy mapping of mbuf external storage for now, because:
- it's currently broken wrt asm code. (cpu_in_cksum)
- there are other approaches worth considering, e.g. sf_buf
 1.111.2.12  11-Feb-2008  yamt sync with head.
 1.111.2.11  21-Jan-2008  yamt sync with head
 1.111.2.10  07-Dec-2007  yamt sync with head
 1.111.2.9  15-Nov-2007  yamt sync with head.
 1.111.2.8  27-Oct-2007  yamt sync with head.
 1.111.2.7  03-Sep-2007  yamt sync with head.
 1.111.2.6  26-Feb-2007  yamt sync with head.
 1.111.2.5  30-Dec-2006  yamt sync with head.
 1.111.2.4  07-Jul-2006  yamt - fix typos and compilation problems in uipc_mbuf.c rev.1.100.2.8.
- m_ext_free: fix the recursive call case.
- change return value of mcl_dec_and_test_reference.
- tweak assertions.
 1.111.2.3  21-Jun-2006  yamt sync with head.
 1.111.2.2  07-Jul-2005  yamt defer mapping only when defined(__HAVE_LAZY_MBUF).
 1.111.2.1  07-Jul-2005  yamt sosend_loan: defer mapping of mbuf external data pages.
mtod: map mbuf external data pages if needed.
 1.115.6.2  01-Jun-2006  kardel Sync with head.
 1.115.6.1  22-Apr-2006  simonb Sync with head.
 1.115.4.1  09-Sep-2006  rpaulo sync with head
 1.115.2.4  15-Jan-2006  yamt rename VMSPACE_IS_KERNEL to VMSPACE_IS_KERNEL_P. ("predicate")
suggested by Matt Thomas.
 1.115.2.3  31-Dec-2005  yamt redo the previous correctly.
 1.115.2.2  31-Dec-2005  yamt use VMSPACE_IS_KERNEL.
 1.115.2.1  31-Dec-2005  yamt uio_segflg/uio_lwp -> uio_vmspace.
 1.116.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.116.4.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.116.4.2  19-Apr-2006  elad sync with head.
 1.116.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.116.2.4  11-Aug-2006  yamt sync with head
 1.116.2.3  26-Jun-2006  yamt sync with head.
 1.116.2.2  24-May-2006  yamt sync with head.
 1.116.2.1  11-Apr-2006  yamt sync with head
 1.118.2.1  19-Jun-2006  chap Sync with head.
 1.120.2.1  13-Jul-2006  gdamore Merge from HEAD.
 1.122.6.2  10-Dec-2006  yamt sync with head.
 1.122.6.1  22-Oct-2006  yamt sync with head
 1.122.4.3  01-Feb-2007  ad Sync with head.
 1.122.4.2  12-Jan-2007  ad Sync with head.
 1.122.4.1  18-Nov-2006  ad Sync with head.
 1.129.2.1  13-May-2007  pavel Pull up following revision(s) (requested by yamt in ticket #621):
sys/kern/uipc_syscalls.c: revision 1.108-1.109 via patch
sys/kern/uipc_socket.c: revision 1.139 via patch
- soabort: don't leak a socket on error.
- add an assertion.

sys_accept: don't leak a socket on error.

sys_accept: fix usecount botch and double soclose in rev.1.108.
 1.132.2.5  07-May-2007  yamt sync with head.
 1.132.2.4  15-Apr-2007  yamt sync with head.
 1.132.2.3  24-Mar-2007  yamt sync with head.
 1.132.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.132.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.134.2.6  09-Oct-2007  ad Sync with head.
 1.134.2.5  01-Sep-2007  ad Update for pool_cache API changes.
 1.134.2.4  20-Aug-2007  ad Sync with HEAD.
 1.134.2.3  08-Jun-2007  ad Sync with head.
 1.134.2.2  10-Apr-2007  ad Sync with head.
 1.134.2.1  13-Mar-2007  ad Sync with head.
 1.135.2.1  11-Jul-2007  mjf Sync with head.
 1.136.2.1  18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.140.6.6  09-Dec-2007  jmcneill Sync with HEAD.
 1.140.6.5  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.140.6.4  11-Nov-2007  joerg Sync with HEAD.
 1.140.6.3  07-Oct-2007  joerg Sync with HEAD.
 1.140.6.2  02-Oct-2007  joerg Sync with HEAD.
 1.140.6.1  09-Aug-2007  jmcneill Sync with HEAD.
 1.140.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.141.4.2  06-Aug-2007  yamt sosetopt: clear SB_AUTOSIZE when setting buffer size explicitly.
 1.141.4.1  06-Aug-2007  yamt file uipc_socket.c was added on branch matt-mips64 on 2007-08-06 11:41:53 +0000
 1.141.2.4  23-Mar-2008  matt sync with HEAD
 1.141.2.3  09-Jan-2008  matt sync with HEAD
 1.141.2.2  08-Nov-2007  matt sync with -HEAD
 1.141.2.1  06-Nov-2007  matt sync with HEAD
 1.142.2.1  06-Oct-2007  yamt sync with head.
 1.144.4.4  18-Feb-2008  mjf Sync with HEAD.
 1.144.4.3  27-Dec-2007  mjf Sync with HEAD.
 1.144.4.2  08-Dec-2007  mjf Sync with HEAD.
 1.144.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.144.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.147.2.2  26-Dec-2007  ad Sync with head.
 1.147.2.1  08-Dec-2007  ad Sync with head.
 1.149.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.151.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.151.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.151.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.151.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.151.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.151.2.1  24-Mar-2008  keiichi sync with head.
 1.159.2.3  06-Jun-2008  christos add so_egid and so_cpid for pf.
 1.159.2.2  04-Jun-2008  yamt sync with head
 1.159.2.1  18-May-2008  yamt sync with head.
 1.160.2.4  11-Mar-2010  yamt sync with head
 1.160.2.3  16-Sep-2009  yamt sync with head
 1.160.2.2  04-May-2009  yamt sync with head.
 1.160.2.1  16-May-2008  yamt sync with head.
 1.164.2.3  10-Oct-2008  skrll Sync with HEAD.
 1.164.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.164.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.167.2.2  28-Jul-2008  simonb Sync with head.
 1.167.2.1  18-Jun-2008  simonb Sync with head.
 1.168.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.168.2.1  19-Oct-2008  haad Sync with HEAD.
 1.177.4.4  08-Aug-2011  riz Pull up following revision(s) (requested by bouyer in ticket #1644):
sys/sys/socketvar.h: revision 1.126
sys/kern/init_main.c: revision 1.433
sys/kern/uipc_socket.c: revision 1.205
Fix kern/45093 as discussed on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html
The cause of the problem is that the so_pendfree is processed with
the softnet_lock held at one point, and processing the list
calls sodoloanfree() which may kpause(). As the thread sleeps with
softnet_lock held, it ultimately causes a deadlock (see the PR or tech-kern
thread for details).
Although it should be possible to call sodopendfree() after releasing
the socket lock, it's not so easy to know where the socket lock is held and
where it's not, so we may hit the issue again later.
Add a kernel thread to handle the so_pendfree list, and wake up this
thread when adding mbufs to this list. Get rid of the various sodopendfree()
calls, hopefully fixing the problem definitively.
 1.177.4.3  03-May-2009  bouyer branches: 1.177.4.3.2;
Pull up following revision(s) (requested by ad in ticket #731):
sys/kern/uipc_socket.c: revision 1.189
PR kern/41311: Mutex error: mutex_vector_enter: locking against myself
 1.177.4.2  04-Apr-2009  snj branches: 1.177.4.2.2;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.177.4.1  02-Feb-2009  snj Pull up following revision(s) (requested by yamt in ticket #393):
sys/kern/uipc_socket.c: revision 1.185
sys/kern/uipc_socket2.c: revision 1.101
sys/kern/uipc_syscalls.c: revision 1.135
sys/miscfs/portal/portal_vnops.c: revision 1.81
sys/netsmb/smb_trantcp.c: revision 1.40
sys/nfs/nfs_socket.c: revision 1.177
sys/sys/socketvar.h: revision 1.118
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.177.4.3.2.1  08-Aug-2011  riz Pull up following revision(s) (requested by bouyer in ticket #1644):
sys/sys/socketvar.h: revision 1.126
sys/kern/init_main.c: revision 1.433
sys/kern/uipc_socket.c: revision 1.205
Fix kern/45093 as discussed on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html
The cause of the problem is that the so_pendfree is processed with
the softnet_lock held at one point, and processing the list
calls sodoloanfree() which may kpause(). As the thread sleeps with
softnet_lock held, it ultimately causes a deadlock (see the PR or tech-kern
thread for details).
Although it should be possible to call sodopendfree() after releasing
the socket lock, it's not so easy to know where the socket lock is held and
where it's not, so we may hit the issue again later.
Add a kernel thread to handle the so_pendfree list, and wake up this
thread when adding mbufs to this list. Get rid of the various sodopendfree()
calls, hopefully fixing the problem definitively.
 1.177.4.2.2.1  03-May-2009  bouyer branches: 1.177.4.2.2.1.2;
Pull up following revision(s) (requested by ad in ticket #731):
sys/kern/uipc_socket.c: revision 1.189
PR kern/41311: Mutex error: mutex_vector_enter: locking against myself
 1.177.4.2.2.1.2.2  25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.177.4.2.2.1.2.1  21-Apr-2010  matt sync to netbsd-5
 1.177.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.177.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.177.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.186.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.200.4.1  05-Mar-2011  rmind sync with head
 1.200.2.1  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.201.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.202.2.1  08-Feb-2011  bouyer Sync with HEAD
 1.205.6.2  05-Apr-2012  mrg sync to latest -current.
 1.205.6.1  18-Feb-2012  mrg merge to -current.
 1.205.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.205.2.2  30-Oct-2012  yamt sync with head
 1.205.2.1  17-Apr-2012  yamt sync with head
 1.209.2.4  25-Nov-2013  bouyer Pull up following revision(s) (requested by spz in ticket #988):
sys/kern/uipc_socket.c: revision 1.220
PR/48098: Brian Marcotte: panic: kernel diagnostic assertion "cred != NULL":
Fix from Michael van Elst, tcpdrop crashes kernel on embryonic connections.
 1.209.2.3  02-Aug-2013  martin Pullup ticket #927:

sys/kern/uipc_socket.c 1.216

Fix an inversion in checking for authorization to drop TCP connections
found (and the obvious fix suggested) by Sander Bos.

Requested by spz.
 1.209.2.2  14-Feb-2013  jdc branches: 1.209.2.2.2;
Pull up revisions:
src/sys/kern/uipc_socket.c revision 1.213
src/sys/kern/uipc_syscalls.c revision 1.160
(requested by christos in ticket #822).

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.209.2.1  12-Jul-2012  riz branches: 1.209.2.1.4;
Pull up following revision(s) (requested by chs in ticket #408):
sys/kern/uipc_socket.c: revision 1.211
in soreceive(), handle uios larger than 31 bits.
fixes the remaining problem in PR 43240.
 1.209.2.2.2.2  25-Nov-2013  bouyer Pull up following revision(s) (requested by spz in ticket #988):
sys/kern/uipc_socket.c: revision 1.220
PR/48098: Brian Marcotte: panic: kernel diagnostic assertion "cred != NULL":
Fix from Michael van Elst, tcpdrop crashes kernel on embryonic connections.
 1.209.2.2.2.1  02-Aug-2013  martin Pullup ticket #927:

sys/kern/uipc_socket.c 1.216

Fix an inversion in checking for authorization to drop TCP connections
found (and the obvious fix suggested) by Sander Bos.

Requested by spz.
 1.209.2.1.4.3  21-Jul-2017  snj Pull up following revision(s) (requested by riastradh in ticket #1453):
sys/kern/uipc_socket.c: revision 1.213
sys/kern/uipc_syscalls.c: revision 1.160
PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.209.2.1.4.2  25-Nov-2013  bouyer Pull up following revision(s) (requested by spz in ticket #988):
sys/kern/uipc_socket.c: revision 1.220
PR/48098: Brian Marcotte: panic: kernel diagnostic assertion "cred != NULL":
Fix from Michael van Elst, tcpdrop crashes kernel on embryonic connections.
 1.209.2.1.4.1  02-Aug-2013  martin Pullup ticket #927:

sys/kern/uipc_socket.c 1.216

Fix an inversion in checking for authorization to drop TCP connections
found (and the obvious fix suggested) by Sander Bos.

Requested by spz.
 1.211.2.5  03-Dec-2017  jdolecek update from HEAD
 1.211.2.4  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.211.2.3  23-Jun-2013  tls resync from head
 1.211.2.2  25-Feb-2013  tls resync with head
 1.211.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.215.4.4  18-May-2014  rmind sync with head
 1.215.4.3  18-Oct-2013  rmind Add soref() and sounref().
 1.215.4.2  28-Aug-2013  rmind sync with head
 1.215.4.1  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted along the way.
 1.221.2.1  10-Aug-2014  tls Rebase.
 1.235.2.7  28-Aug-2017  skrll Sync with HEAD
 1.235.2.6  05-Oct-2016  skrll Sync with HEAD
 1.235.2.5  09-Jul-2016  skrll Sync with HEAD
 1.235.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.235.2.3  22-Sep-2015  skrll Sync with HEAD
 1.235.2.2  06-Jun-2015  skrll Sync with HEAD
 1.235.2.1  06-Apr-2015  skrll Sync with HEAD
 1.248.2.1  04-Nov-2016  pgoyette Sync with HEAD
 1.252.6.1  02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.252.2.1  25-May-2017  bouyer Allow any user to bind to CAN sockets.
Maybe a better security model is needed.
 1.255.2.5  25-Feb-2020  martin Pull up following revision(s) (requested by maxv in ticket #1509):

sys/kern/uipc_socket.c: revision 1.288

Zero out 'tv', to prevent uninitialized bytes in its padding from leaking
to userland. Found by kMSan.
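
The padding leak described in this entry can be illustrated with a small userland sketch. This is not the kernel code: `struct tv_like` and `fill_tv()` are hypothetical names chosen for illustration, and the layout assumes an ABI that pads between a 32-bit and a 64-bit member. The point is only that clearing the whole object before assigning its members leaves no indeterminate bytes to copy out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: most ABIs insert padding after 'sec'. */
struct tv_like {
	int32_t sec;
	int64_t usec;
};

/*
 * Fill *out so that every byte, padding included, has a defined value.
 * Without the memset(), the padding would keep whatever was in memory
 * before, and copying the whole struct elsewhere would expose it.
 */
static void
fill_tv(struct tv_like *out, int32_t sec, int64_t usec)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	out->sec = sec;
	out->usec = usec;
}
```

Assigning the members one by one after the memset() (rather than using a struct assignment) is deliberate in this sketch, since a whole-struct copy is allowed to carry padding along.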
 1.255.2.4  12-Nov-2018  martin Pull up following revision(s) (requested by hannken in ticket #1089):

external/bsd/nsd/include/config.h: revision 1.5
sys/kern/uipc_syscalls.c: revision 1.198
sys/kern/uipc_syscalls.c: revision 1.199
sys/kern/uipc_socket.c: revision 1.267

Update getsockopt(SO_ERROR) to behave like soreceive() and
return and clear so->so_rerror if so->so_error is zero.

Ok: christos@

-

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@

-

sys_recvmmsg: don't defer an error that already gets returned.

-

Re-enable {send,recv}mmsg now they are working.
 1.255.2.3  09-Jun-2018  martin Pull up following revision(s) (requested by roy in ticket #868):

sys/sys/socketvar.h: revision 1.156
sys/kern/uipc_socket2.c: revision 1.130
sys/kern/uipc_socket.c: revision 1.264

Separate receive socket errors from general socket errors.
 1.255.2.2  09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.255.2.1  18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
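
The size-based dispatch described in point 4 above can be sketched roughly as follows. This is an illustration only, not the code from ip_output.c: `struct pktinfo_like`, `enum pktinfo_action` and `classify_pktinfo()` are hypothetical names, and the handling of any other size is an assumption, since the log text does not say what the set path does in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct in_pktinfo (see <netinet/in.h>). */
struct pktinfo_like {
	uint32_t addr;
	unsigned int ifindex;
};

/* The dispatch below only works if the two sizes differ. */
static_assert(sizeof(int) != sizeof(struct pktinfo_like),
    "option sizes must be distinguishable");

enum pktinfo_action {
	PKTINFO_RECV_ON,		/* Linux style, non-zero int */
	PKTINFO_RECV_OFF,		/* Linux style, zero int */
	PKTINFO_SET_DEFAULT_SRC,	/* Solaris style, full struct */
	PKTINFO_OTHER_SIZE		/* assumption: not covered by the log */
};

/* Decide what a set of IP_PKTINFO means from the option length alone. */
static enum pktinfo_action
classify_pktinfo(size_t optlen, int int_value)
{
	if (optlen == sizeof(int))
		return int_value != 0 ? PKTINFO_RECV_ON : PKTINFO_RECV_OFF;
	if (optlen == sizeof(struct pktinfo_like))
		return PKTINFO_SET_DEFAULT_SRC;
	return PKTINFO_OTHER_SIZE;
}
```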
 1.259.2.6  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.259.2.5  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.259.2.4  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.259.2.3  25-Jun-2018  pgoyette Sync with HEAD
 1.259.2.2  02-May-2018  pgoyette Synch with HEAD
 1.259.2.1  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.264.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.264.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.264.2.1  10-Jun-2019  christos Sync with HEAD
 1.281.2.3  22-Sep-2020  martin Pull up following revision(s) (requested by christos in ticket #1091):

sys/kern/uipc_socket.c: revision 1.291
sys/kern/uipc_usrreq.c: revision 1.199
sys/kern/uipc_socket2.c: revision 1.138

add socket info for user and group for unix sockets in fstat.
 1.281.2.2  25-Feb-2020  martin Pull up following revision(s) (requested by maxv in ticket #720):

sys/kern/uipc_socket.c: revision 1.288

Zero out 'tv', to prevent uninitialized bytes in its padding from leaking
to userland. Found by kMSan.
 1.281.2.1  21-Oct-2019  martin Pull up following revision(s) (requested by pgoyette in ticket #339):

sys/compat/common/kern_uipc_socket_50.c: revision 1.3
sys/sys/compat_stub.h: revision 1.19
sys/kern/uipc_socket.c: revision 1.284

Actually return the updated pointer-to-mbuf-pointer to the caller
rather than discarding-after-assignment. Introduced from the
[pgoyette-compat] branch work.
 1.285.2.1  29-Feb-2020  ad Sync with head.
 1.292.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.302.4.3  11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #830):

sys/kern/uipc_socket.c: revision 1.304
sys/kern/uipc_syscalls.c: revision 1.207

Fix a ~16 year old perf regression: when creating a socket, add a reference
to the caller's credentials rather than copying them. On an 80486DX2/66 this
seems to ~halve the time taken to create a socket.

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.
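
The difference between copying credentials and taking a reference can be sketched in userland as below. This is a simplified illustration and not the kernel's credential code: `struct cred_like` and the `cred_*()` helpers are hypothetical, and the counter is a plain integer where real kernel code would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted credential object. */
struct cred_like {
	unsigned int refcnt;
	unsigned int uid;
	unsigned int gid;
};

static struct cred_like *
cred_new(unsigned int uid, unsigned int gid)
{
	struct cred_like *c = malloc(sizeof(*c));

	if (c != NULL) {
		c->refcnt = 1;
		c->uid = uid;
		c->gid = gid;
	}
	return c;
}

/* Expensive path: allocate a new object and duplicate the contents. */
static struct cred_like *
cred_copy(const struct cred_like *src)
{
	return cred_new(src->uid, src->gid);
}

/* Cheap path: share the existing object and bump its counter. */
static struct cred_like *
cred_hold(struct cred_like *c)
{
	assert(c != NULL);
	c->refcnt++;
	return c;
}

/* Drop one reference; the object is freed when the last one goes. */
static void
cred_free(struct cred_like *c)
{
	if (--c->refcnt == 0)
		free(c);
}
```

Sharing is only safe while nobody modifies the shared object in place, which is the usual trade-off of this pattern.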
 1.302.4.2  20-Jul-2024  martin Pull up following revision(s) (requested by jdolecek in ticket #741):

sys/kern/uipc_socket.c: revision 1.309

make kqfilter() behave the same for PIPE_SOCKETPAIR pipe as it does
for standard one - refuse EVFILT_WRITE if the reader is already disconnected
fixes test failure for kernel/kqueue/write/t_pipe.c on PIPE_SOCKETPAIR kernel
PR kern/55690
 1.302.4.1  04-Feb-2024  martin Pull up following revision(s) (requested by jdolecek in ticket #583):

sys/kern/uipc_socket.c: revision 1.308
sys/kern/uipc_syscalls.c: revision 1.211
sys/sys/socketvar.h: revision 1.168
sys/net/if_gre.c: revision 1.185

fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete
use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()
this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
