History log of /src/sys/kern/uipc_syscalls.c
Revision  Date  Author  Comments
 1.215  16-Jul-2025  kre Kernel part of O_CLOFORK implementation (plus kernel revbump)

This is Ricardo Branco's implementation of O_CLOFORK (and
associated fcntl, etc) for NetBSD (with a few minor changes
by me).

For now, the header file symbols that should be exposed to
userland are hidden inside temporary #ifdef _KERNEL blocks,
just to prevent random userland apps, or config scripts, from
seeing any of this before it is better tested.

Userland parts of this will follow soon.

This also bumps the kernel version to 10.99.15 (changes to
data structs, and the signature of fd_dup()).
 1.214  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.213  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sort includes.

No functional change intended.
 1.212  05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) has safely accepted a NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.211  03-Feb-2024  jdolecek branches: 1.211.2;
fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete

use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()

this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
 1.210  02-Nov-2023  martin Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.
 1.209  13-Oct-2023  ad Use cv_fdrestart() to implement fo_restart.
 1.208  04-Oct-2023  ad kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.
 1.207  09-Sep-2023  ad Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.
 1.206  01-Jul-2022  riastradh branches: 1.206.4;
sendto(2), recvfrom(2): Scrub internal struct msghdr on stack.

Otherwise this is kernel stack disclosure via ktrace.

Reported-by: syzbot+1d40303b310063778194@syzkaller.appspotmail.com
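
The scrubbing described in this entry can be sketched in userland terms. This is only an illustration under stated assumptions, not the kernel code: build_msghdr() is a hypothetical helper, and the point is simply that zeroing the whole structure before filling it in leaves no stale stack bytes in the fields that are never assigned.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not from the NetBSD sources): prepare a msghdr
 * for a single-buffer send.  The memset runs first, so every field
 * that is not assigned below is zero rather than leftover stack data.
 */
static void
build_msghdr(struct msghdr *msg, struct iovec *iov, void *buf, size_t len)
{
	assert(msg != NULL && iov != NULL);

	memset(msg, 0, sizeof(*msg));
	iov->iov_base = buf;
	iov->iov_len = len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
}
```

If such a structure is later copied out verbatim, for example into a trace record, the unassigned fields then carry zeros instead of whatever was previously on the stack.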
 1.205  29-Jun-2022  riastradh recvmmsg(2): More timespec validation.

Reported-by: syzbot+004ed2f264534bd27312@syzkaller.appspotmail.com
Reported-by: syzbot+6f9014c842c4e78df7bc@syzkaller.appspotmail.com
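
As a rough sketch of what timespec validation for a timeout typically involves (an assumption for illustration, not the check this revision actually added), the hypothetical helper timeout_is_valid() below rejects negative seconds and out-of-range nanoseconds:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 if *ts is usable as a non-negative
 * timeout, 0 otherwise.  tv_nsec must lie in [0, 1000000000).
 */
static int
timeout_is_valid(const struct timespec *ts)
{
	assert(ts != NULL);

	if (ts->tv_sec < 0)
		return 0;
	if (ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
		return 0;
	return 1;
}
```

A caller would run this on the user-supplied timeout before doing any arithmetic with it and fail with EINVAL when it returns 0.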
 1.204  28-Jun-2022  riastradh recvmmsg(2): Avoid arithmetic overflow in timeout calculations.

XXX This is not right -- it doesn't actually do anything to time
out...

Reported-by: syzbot+784209d76a94fcc6417b@syzkaller.appspotmail.com
 1.203  27-Jun-2022  riastradh sendmsg(2): Avoid buffer overrun in ktrace of invalid cmsghdr.

Reported-by: syzbot+efded148140b23425f5c@syzkaller.appspotmail.com
 1.202  02-Oct-2021  thorpej ...and correct my terrible spelling.
 1.201  02-Oct-2021  thorpej - Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.
 1.200  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.199  12-Nov-2018  hannken branches: 1.199.4;
sys_recvmmsg: don't defer an error that already gets returned.
 1.198  07-Nov-2018  hannken Don't defer errors from sendmmsg(). This matches the Linux manpage.

Defer errors from recvmmsg() through so_rerror, and test for
and return a deferred error on entry.

Ok: christos@
 1.197  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
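
The truncation this entry worries about can be shown with a small standalone sketch. The uimin() below is a local stand-in with the same shape the commit describes (defined on unsigned int), truncated_min() is a hypothetical wrapper added for the demonstration, and the example assumes unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: a minimum defined on unsigned int only. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: compares in the operands' own type, so nothing is
 * narrowed, but an argument may be evaluated twice.
 */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical wrapper: both 64-bit arguments are silently narrowed. */
static uint64_t
truncated_min(uint64_t a, uint64_t b)
{
	assert(sizeof(unsigned int) == 4);	/* assumption of this demo */
	return uimin(a, b);
}
```

With a = 2^32 + 5 and b = 100, truncated_min() yields 5 because the high bits of a are dropped before the comparison, whereas MIN() on the same 64-bit operands yields 100.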
 1.196  01-Aug-2018  rjs Add ioctl(2) handler for kernel part of sctp_peeloff().
 1.195  31-Jul-2018  rjs Add getsockopt2() syscall.
 1.194  04-May-2018  christos branches: 1.194.2;
define MBUFTYPES here.
 1.193  03-May-2018  christos Fix COMPAT_NETBSD32 cmsg handling:

1. alignment was wrong for > 1 message
2. macros were doing incorrect pointer comparisons, fortunately ending
the iteration early after the first cmsg instead of crashing.
3. don't output 32 bit ktrace records for cmsg. 32 bit programs running
under emulation on 64 bit systems should produce 64 bit ktrace records
so that the native ktrace can handle the records; remove extra arguments
that are now not needed (the 32 bit msghdr).
4. output the correct type for cmsg trace records.
5. output all the cmsg records in traces instead of just the first one.

Welcome to 8.99.15 because of the argument removal.

XXX: Really all the code should be changed to use the CMSG_{FIRST,NXT}HDR
macros...
 1.192  16-Mar-2018  christos PR/53103: Timo Buhrmester: linux emulation of sendto(2) broken

The sockargs refactoring broke it, because sockargs only works with a user
address. Added an argument to sockargs to indicate where the address is
coming from. Welcome to 8.99.14.
 1.191  12-Feb-2018  maxv branches: 1.191.2;
Add a KASSERT; we expect *from to be a single mbuf (not chained).
 1.190  04-Jan-2018  christos Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).

(Tom Ivar Helbekkmo)
 1.189  31-Dec-2017  christos pass valsize for getsockopt like we do for setsockopt
 1.188  26-Dec-2017  kamil Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate on
retval[], but on an array int[2]. A caller sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>
 1.187  20-Jun-2017  christos Change len type to be unsigned int for consistency with the input type.
Don't check for negative; it does not matter since we clamp anyway. This
broke the compat32 getsockname() where an uninitialized socklen_t ended
up randomly negative, causing it to fail.
 1.186  03-Feb-2017  christos branches: 1.186.6;
expose sendmsg_so and recvmsg_so.
 1.185  02-Feb-2017  christos expose copyout_sockname_sb
 1.184  03-Dec-2016  christos branches: 1.184.2;
Add missing ktrkuser
 1.183  13-Sep-2016  martin Make the ktrace record written by do_sys_sendmsg/do_sys_recvmsg overridable
by the caller. Use this in compat_netbsd32 to log the 32bit version, so
the 32bit userland kdump is happy.
 1.182  07-Jul-2016  msaitoh branches: 1.182.2;
KNF. Remove extra spaces. No functional change.
 1.181  01-Nov-2015  christos Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.
 1.180  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.179  22-Jul-2015  maxv Memory leak. Triggerable from an unprivileged user via COMPAT_43.
 1.178  09-May-2015  rtr change sosend() to accept sockaddr * instead of mbuf * for nam.

bump to 7.99.16
 1.177  02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.176  24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.175  03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.174  06-Mar-2015  rtr Return EINVAL if namelen isn't large enough to encompass the expected
members of sockaddr structures. i.e. sa_len and sa_family.

Discussed with and patch by christos@
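
A minimal sketch of the length check described in this entry, assuming that "large enough to encompass sa_len and sa_family" means covering everything before sa_data. check_namelen() is a hypothetical name, not the function in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: reject a sockaddr length that cannot even hold
 * the fixed header members that precede sa_data.
 */
static int
check_namelen(socklen_t namelen)
{
	if (namelen < offsetof(struct sockaddr, sa_data))
		return EINVAL;
	return 0;
}
```

Using offsetof() rather than a literal keeps the sketch valid on systems whose struct sockaddr has no sa_len member.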
 1.173  05-Sep-2014  matt branches: 1.173.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.172  09-Aug-2014  rtr branches: 1.172.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.171  09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.170  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.169  17-May-2014  rmind - fsocreate: set SS_NBIO before the file descriptor is affixed as there is
a theoretical race condition (hard to trigger, though); remove the LWP
parameter and clean up the code a little.
- Sprinkle few comments.
- Remove M_SOOPTS while here.
 1.168  17-May-2014  rmind makesocket: set SS_NBIO slightly earlier.
 1.167  17-May-2014  rmind Remove trailing whitespaces, wrap long lines, minor KNF; no functional changes.
 1.166  07-Apr-2014  seanb Fix a case where an erroneous EAGAIN was returned out of recvmmsg.
This occurred when some, but not all of the mmsg array members
were filled with data from a non-blocking socket.
PR kern/48725
 1.165  09-Oct-2013  christos branches: 1.165.2;
delete extra m_len initialization.
 1.164  09-Oct-2013  christos PR/48292: Justin Cormack: paccept creates sockets that cannot be made blocking
Reset the socket flags not just the file flags for non-blocking I/O.
XXX: pullup 6
 1.163  08-Oct-2013  christos PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.
 1.162  29-Aug-2013  rmind Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.
 1.161  03-Jun-2013  christos branches: 1.161.2;
use the proper name for kdump pretty-printing.
 1.160  14-Feb-2013  christos PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.159  14-Feb-2013  riastradh Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
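
The gap between CMSG_LEN and CMSG_SPACE mentioned in this entry is the alignment padding after a control message's payload. A rough userland sketch of scrubbing it follows; scrub_cmsg_padding() is a hypothetical helper, not the code in unp_externalize.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: zero the alignment padding that follows a
 * control message carrying `datalen' bytes of payload.  `cmsg' must
 * point at a buffer of at least CMSG_SPACE(datalen) bytes.
 * Returns the number of padding bytes zeroed.
 */
static size_t
scrub_cmsg_padding(void *cmsg, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t total = CMSG_SPACE(datalen);	/* used, rounded up */

	assert(cmsg != NULL && used <= total);
	memset((unsigned char *)cmsg + used, 0, total - used);
	return total - used;
}
```

Without the memset, whatever bytes happened to be in the padding would be copied out to the receiver along with the message.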
 1.158  29-Dec-2012  mlelstv The sanity check prevented messages that carry only ancillary data.
 1.157  29-Dec-2012  mlelstv If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.

Undo the last commit which introduced this error path.

Avoid the mentioned kmem_alloc assertion by adding a sanity check analog
to similar code in sys_generic.c for I/O on file handles instead of
sockets.

This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).
 1.156  17-Jul-2012  njoly branches: 1.156.2;
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.
 1.155  22-Jun-2012  christos Add {send,recv}mmsg from Linux
 1.154  25-Jan-2012  christos branches: 1.154.2;
revert atomics for so_options since it is a short.
 1.153  25-Jan-2012  christos need <sys/atomic.h>
 1.152  25-Jan-2012  christos Add locking, requested by yamt. Note that locking is not used everywhere
for these.
 1.151  25-Jan-2012  christos As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
 1.150  21-Dec-2011  christos simplify expression
 1.149  20-Dec-2011  christos - Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).
 1.148  04-Nov-2011  christos branches: 1.148.4;
Fix error I introduced in previous commit that caused asymmetric connects
when SOCK_NONBLOCK or SOCK_CLOEXEC was specified. Factor out common code
and simplify error return.
 1.147  21-Sep-2011  christos branches: 1.147.2;
Put the mbuf type in the ktrace record so that we know how to decode it
in userland.
 1.146  27-Jul-2011  uebayasi These don't need uvm/uvm_extern.h.
 1.145  15-Jul-2011  christos fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)
 1.144  26-Jun-2011  christos * Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
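
From userland, the SOCK_CLOEXEC and SOCK_NONBLOCK type flags listed above can be exercised as in the sketch below. socketpair_flags_applied() is a hypothetical helper written for this illustration; it creates a socket pair with both flags and checks through fcntl(2) that each descriptor really has close-on-exec and non-blocking mode set.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both flags took effect on both
 * descriptors, 0 if not, and -1 if the socket pair could not be made.
 */
static int
socketpair_flags_applied(void)
{
	int sv[2];
	int ok = 1;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    0, sv) == -1)
		return -1;

	for (int i = 0; i < 2; i++) {
		int fdfl = fcntl(sv[i], F_GETFD);	/* descriptor flags */
		int stfl = fcntl(sv[i], F_GETFL);	/* file status flags */

		if (fdfl == -1 || stfl == -1 ||
		    (fdfl & FD_CLOEXEC) == 0 || (stfl & O_NONBLOCK) == 0)
			ok = 0;
		close(sv[i]);
	}
	return ok;
}
```

Setting the flags at creation time avoids the window between socket(2) and a later fcntl(2) in which another thread could fork and exec with the descriptor still inheritable.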
 1.143  24-Apr-2011  rmind - Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.
 1.142  10-Apr-2011  christos - Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)
 1.141  23-Apr-2010  rmind branches: 1.141.2;
Replace M_IOV and some malloc(9)s with kmem(9), and while there:
- Fix invalid free (M_TEMP vs M_IOV) in do_sys_recvmsg(), spotted by jakllsch@.
Also, same fix in osf1_sys_sendmsg_xopen().
- Fix attempt to free non-allocated memory in error path in netbsd32___getfh30().
- Plug a memory leak in compat_43_netbsd32_orecvmsg().
 1.140  21-Jan-2010  pgoyette branches: 1.140.2; 1.140.4;
Remove unnecessary call to kauth_cred_free().

This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@
 1.139  29-Dec-2009  elad Add credentials to sockets.

We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!
 1.138  20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.137  09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.136  04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.135  21-Jan-2009  yamt branches: 1.135.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.134  06-Aug-2008  plunky branches: 1.134.2; 1.134.4;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.133  24-Jun-2008  ad branches: 1.133.2;
Nothing uses getsock/getvnode any more.
 1.132  30-May-2008  rmind branches: 1.132.2;
do_sys_accept: release the reference to sock in few error paths.
Should fix PR/38790, report and test-case by Nicolas Joly.
 1.131  28-Apr-2008  martin branches: 1.131.2;
Remove clause 3 and 4 from TNF licenses
 1.130  24-Apr-2008  ad branches: 1.130.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.129  24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.128  21-Mar-2008  ad branches: 1.128.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.127  06-Feb-2008  ad branches: 1.127.6;
Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.
 1.126  26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.125  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
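
The difference between the two entry-point shapes can be sketched with a made-up syscall. Everything named "demo" below is hypothetical, and demo_register_t is a stand-in for the kernel's register_t so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

struct lwp;			/* opaque in this sketch */
typedef long demo_register_t;	/* stand-in for register_t */

/* Hypothetical argument block for a made-up syscall "demo". */
struct sys_demo_args {
	int fd;
	int flags;
};

/* Old shape: untyped pointer that every handler has to convert. */
static int
old_sys_demo(struct lwp *l, void *v, demo_register_t *retval)
{
	struct sys_demo_args *uap = v;

	(void)l;
	*retval = uap->fd + uap->flags;
	return 0;
}

/* New shape: typed and const, so writes through uap fail to compile. */
static int
sys_demo(struct lwp *l, const struct sys_demo_args *uap,
    demo_register_t *retval)
{
	(void)l;
	assert(uap != NULL && retval != NULL);
	*retval = uap->fd + uap->flags;
	return 0;
}
```

The const qualifier is what forces compat code to stop writing into 'uap', as the entry notes.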
 1.124  16-Dec-2007  elad Make solisten() take an lwp pointer like the rest, so it can be passed down
to pr_usrreq.
 1.123  24-Nov-2007  dyoung branches: 1.123.2; 1.123.6;
Pass the mbuf type (e.g., MT_SONAME, MT_SOOPTS) as the second
argument to getsockmbuf().
 1.122  05-Oct-2007  dyoung branches: 1.122.4;
Use getsombuf().
 1.121  19-Sep-2007  christos branches: 1.121.2;
minor nits; no code change.
 1.120  19-Sep-2007  dyoung 1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.119  06-Sep-2007  rmind do_sys_sendmsg: Plug a possible leak.
From CID: 4535
 1.118  01-Sep-2007  dsl Don't error calls to copy socket addresses to userspace when the application
has provided a non-null buffer pointer and a zero length.
 1.117  27-Aug-2007  dsl ktrace socket control structures (ie msghdr, address etc) using ktrkuser().
 1.116  15-Aug-2007  ad branches: 1.116.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.
 1.115  15-Jul-2007  dsl branches: 1.115.2; 1.115.6;
Remove non-user flags (especially MSG_IOVUSRSPACE) from mp->msg_flags
before passing to so_receive.
This may (or may not) have any effect...
 1.114  01-Jul-2007  dsl Check for SOL_SOCKET when checking for SCM_RIGHTS.
 1.113  24-Jun-2007  dsl Split sys_getpeername() and sys_getsockname() so they can be called when the
'name' is wanted in kernel code.
Similarly split sys_accept() and change the split in recvmsg() so that it
is useful to the compat functions, recvit() is removed and replaced by
do_sys_recvmsg().
Factor out the code that writes socket names to userspace (from mbuf) to
avoid replicated code.
Extract the code that writes socket 'control' (CMSG) data out to userspace,
being more careful about the 'fd' that may exist inside SCM_RIGHTS msgs.
(they still get lost if some of the latter copyout calls fail).
Since these are new functions, old LKMs will fail to load.
 1.112  02-Jun-2007  enami - Fix obvious typos so that sendto(2) works.
- Wrap lines again.
 1.111  01-Jun-2007  dsl Split sys_bind() and sys_connect() so that compat code can use common code
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.
 1.110  13-May-2007  dsl Fallout from caddr_t deletion - remove a load of redundant (void *) casts.
 1.109  18-Apr-2007  yamt sys_accept: fix usecount botch and double soclose in rev.1.108.
 1.108  15-Apr-2007  yamt sys_accept: don't leak a socket on error.
 1.107  04-Mar-2007  christos branches: 1.107.2; 1.107.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.106  09-Feb-2007  ad branches: 1.106.2;
Merge newlock2 to head.
 1.105  01-Nov-2006  yamt branches: 1.105.2;
remove some __unused from function parameters.
 1.104  23-Oct-2006  elad PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic

Patch applied, thanks!
 1.103  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.102  22-Aug-2006  seanb branches: 1.102.2; 1.102.4;
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.101  23-Jul-2006  ad branches: 1.101.2;
Use the LWP cached credentials where sane.
 1.100  26-Jun-2006  mrg version the socket(2) syscall. for compat30 socket, we use
EPROTONOSUPPORT instead of EAFNOSUPPORT.

from pavel@ with a little bit of clean up from myself.

XXX: netbsd32 (and perhaps other emulations) should be able
XXX: to call the standard socket calls for this i think, but
XXX: revisit this at another time.
 1.99  16-May-2006  christos branches: 1.99.4;
Don't set mature an fd that has been ffree'd
 1.98  11-May-2006  christos Add MSG_NOSIGNAL (from FreeBSD)
 1.97  01-Mar-2006  yamt branches: 1.97.2; 1.97.4; 1.97.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.96  26-Dec-2005  perry branches: 1.96.2; 1.96.4; 1.96.6;
u_intN_t -> uintN_t
 1.95  11-Dec-2005  christos merge ktrace-lwp.
 1.94  03-Sep-2005  martin In adjust_rights() Use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.

Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.

This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
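
[Editor's illustration, not part of the commit.] The entry above concerns
deriving the descriptor count from the length of an SCM_RIGHTS control
message. The sketch below shows the common user-space idiom for the same
calculation, using CMSG_LEN(0) as the header size. It is not the kernel's
adjust_rights() code; pack_rights() and count_rights() are hypothetical
helper names, and the caller is assumed to supply a suitably aligned buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build one SCM_RIGHTS control message carrying
 * nfds descriptors in buf. Returns the aligned space consumed.
 * buf must be aligned for struct cmsghdr.
 */
size_t
pack_rights(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	struct cmsghdr *cmsg = buf;
	size_t space = CMSG_SPACE(nfds * sizeof(int));

	assert(space <= buflen);
	memset(buf, 0, space);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	/* cmsg_len covers the header plus payload, without trailing padding. */
	cmsg->cmsg_len = CMSG_LEN(nfds * sizeof(int));
	memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));
	return space;
}

/*
 * Hypothetical helper: recover the descriptor count from cmsg_len by
 * subtracting the (aligned) header size and dividing by sizeof(int).
 */
size_t
count_rights(const struct cmsghdr *cmsg)
{
	return (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

Dividing the payload length rather than the padded total keeps alignment
padding from being counted as extra descriptors, which is the class of
error the entry describes.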
 1.93  03-Sep-2005  martin minor knf tweak
 1.92  30-May-2005  martin branches: 1.92.2;
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.
 1.91  29-May-2005  christos - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.90  26-Feb-2005  perry branches: 1.90.2;
nuke trailing whitespace
 1.89  30-Nov-2004  christos branches: 1.89.4; 1.89.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.88  22-May-2004  jonathan Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.
 1.87  18-May-2004  ragge Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.

Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.

As a side effect, the new connect() behaviour conforms to POSIX.
 1.86  29-Nov-2003  matt branches: 1.86.2;
Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.
 1.85  29-Nov-2003  perry Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.
 1.84  13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
 1.83  04-Sep-2003  matt Adapt to the new calling conventions of unp_connect2
 1.82  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.81  29-Jun-2003  fvdl branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.80  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.79  05-Apr-2003  christos PR/21030: Naoto Shimazaki: fcntl to accepted socket does not work properly
 1.78  26-Feb-2003  matt Remove leftover MBUFTRACE asserts.
 1.77  26-Feb-2003  drochner deactivate MBUFTRACE related KASSERT()s in the !MBUFTRACE case
 1.76  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.75  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.74  26-Nov-2002  christos si_ -> sel_ to avoid conflicts with siginfo.
 1.73  25-Nov-2002  itojun no need for error check after MEXTMALLOC - jdolecek
 1.72  25-Nov-2002  itojun MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().
 1.71  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.70  04-Sep-2002  matt Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
 1.69  31-May-2002  itojun support setsockopt() with larger data (up to MCLBYTES).
From: Hitoshi Asaeda <Hitoshi.Asaeda@sophia.inria.fr>
 1.68  11-Feb-2002  jdolecek branches: 1.68.8;
Switch default for pipes to the faster John S. Dyson's implementation.
Old, socketpair-based ones are available with option PIPE_SOCKETPAIR.
 1.67  12-Nov-2001  lukem add RCSIDs
 1.66  16-Sep-2001  wiz branches: 1.66.2;
Spell 'occurred' with two 'r's.
 1.65  17-Jul-2001  jdolecek branches: 1.65.2;
Expel MSG_COMPAT/COMPAT_OLDSOCK, make the COMPAT_43 wrappers
arrange things as needed. Unfortunately, the check in sockargs()
has to stay, since 4.3BSD bind(2), connect(2) and sendto(2) were
not versioned at the time :(

This code was tested to pass regression tests.
 1.64  01-Jul-2001  matt branches: 1.64.2;
Use consistent types for len. Limit sockarg length to reasonable values.
 1.63  25-Jun-2001  jdolecek Back off the sendit()/recvit() change, some have problems with it
 1.62  25-Jun-2001  jdolecek sys_connect(): fix the call to FILE_UNUSE() so that it's done on return, rather
than immediately after getsock() call
 1.61  25-Jun-2001  jdolecek Add 'kernsa' parameter for sendit()/recvit(); if nonzero, msg->msg_name
is supposed to point directly to struct mbuf or struct sockaddr in kernel
space as appropriate, rather than being a pointer to memory in userland.

This is to be used by compat/* when emulation needs to wrap
send{to|msg}(2)/recv{from|msg}(2) and modify the passed struct
sockaddr.
 1.60  16-Jun-2001  jdolecek Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.
 1.59  14-Jun-2001  thorpej Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
 1.58  06-May-2001  manu implement the recently introduced EMUL_BSD_ASYNCIO_PIPE emulation package
flag.

EMUL_BSD_ASYNCIO_PIPE notes that the emulated binaries expect the original
BSD pipe behavior for asynchronous I/O, which is to fire SIGIO on read() and
write(). OSes without this flag do not expect any SIGIO to be fired on
read() and write() for pipes, even when async I/O was requested. As far as
we know, the OSes that need EMUL_BSD_ASYNCIO_PIPE are NetBSD, OSF/1 and
Darwin.
 1.57  27-Feb-2001  lukem branches: 1.57.2;
convert to ANSI KNF
 1.56  10-Dec-2000  fvdl Make sobind() take a struct proc *. It already took curproc and
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).
 1.55  24-Nov-2000  jdolecek define COMPAT_OLDSOCK unconditionally - the code is used virtually for all
emulations besides NetBSD, and this way it's LKM-safe
 1.54  02-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable sized allocations.
 1.53  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.52  27-May-2000  sommerfeld branches: 1.52.4;
Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* functions need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()
 1.51  30-Mar-2000  augustss Get rid of register declarations.
 1.50  23-Mar-2000  thorpej Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.
 1.49  05-Nov-1999  mycroft branches: 1.49.2;
Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.
 1.48  30-Oct-1999  enami back out unnecessary stylistic changes in recent changes, to keep coding
style closer to KNF.
 1.47  27-Oct-1999  jdolecek minor cleanup of previous - avoid goto and code duplication
 1.46  27-Oct-1999  darrenr patch from Greg A. Woods to fix panic problems with code that attempts to
recover from failures to accept a socket successfully. Problem suggested
by this:
> It would appear (from two "panic: closef: count < 0" failures in less
> than 12 hours) that Darren's fix to accept(2) for lost file descriptors
> isn't quite correct. His fix inserts a call to closef() to handle one
> of several possible error conditions. However everywhere else in the
> socket code in the same file where falloc() cleanup is necessary the
> function used is ffree().
 1.45  01-Jul-1999  itojun branches: 1.45.2; 1.45.4; 1.45.6;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.44  01-Jul-1999  darrenr fix sys_accept() to return EOPNOTSUPP for protocols which don't support
listen/accept (PR_LISTEN flag in protosw) and detect obvious faults in
parameters passed. It is still possible for the address used for copying
the socket information to become invalid between that check and the copyout
so close the connection's allocated fd if the copyout fails so that we can
return EFAULT without allocating an fd and the application not knowing about
it. Ideally we'd be able to queue the connection back up so a later accept
could retrieve it but unfortunately that's not possible.
 1.43  05-May-1999  thorpej Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.
 1.42  30-Apr-1999  cgd add checks for COMPAT_OSF1 in the appropriate places
 1.41  10-Feb-1999  kleink branches: 1.41.2; 1.41.4; 1.41.6;
* Due to addition and use of socklen_t, make the socket option and address
arguments passed to accept(), bind(), connect(), getpeername(), getsockname(),
getsockopt(), recvfrom(), sendto() and sendmsg() unsigned, which also eliminates
a few casts.
* Reflect the (now) signedness of msg_iovlen, which necessitates the addition
of a few casts.
 1.40  18-Dec-1998  drochner solve the COMPAT_OLDSOCK/MSG_COMPAT problem differently:
The source files which need MSG_COMPAT define COMPAT_OLDSOCK.
 1.39  26-Nov-1998  mycroft Revert the functional change in rev 1.38; permit a msg_iovlen of 0.
There are two reasons for this:
* We should be able to pass file descriptors without sending any data.
* We could send zero-length iovecs anyway (but we shouldn't have to do this).
Also, msg_iovlen is already a u_int, so delete a bunch of casts.
 1.38  04-Aug-1998  kleink Per XNS Issue 5, calling recvmsg(2) or sendmsg(2) with an msg.msg_iovlen less
than or equal to 0 shall fail with EMSGSIZE; the latter condition was not being
checked for. Also, document the msg.msg_iovlen > {IOV_MAX} case.
 1.37  04-Aug-1998  kleink UIO_MAXIOV -> IOV_MAX
 1.36  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
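
[Editor's illustration, not part of the commit.] The mapping above swaps
the first two arguments for the copy routines, which is the easiest part
to get wrong when converting by hand. A minimal sketch of each
replacement follows; the helper names are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <string.h>

/* Plain copy: bcopy(src, dst, len) becomes memcpy(dst, src, len). */
void
copy_name(char *dst, const char *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);		/* was: bcopy(src, dst, len) */
}

/* Overlapping copy: ovbcopy(src, dst, len) becomes memmove(dst, src, len). */
void
shift_left(char *buf, size_t len, size_t n)
{
	assert(n <= len);
	memmove(buf, buf + n, len - n);	/* was: ovbcopy(buf + n, buf, len - n) */
}

/* Clear and compare: bzero() becomes memset(), bcmp() becomes memcmp(). */
int
is_all_zero_after_clear(char *buf, size_t len)
{
	static const char zeros[64];	/* zero-initialized reference block */

	assert(len <= sizeof(zeros));
	memset(buf, 0, len);			/* was: bzero(buf, len) */
	return memcmp(buf, zeros, len) == 0;	/* was: bcmp(buf, zeros, len) == 0 */
}
```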
 1.35  03-Aug-1998  kleink Fix two off-by-one bugs, both present in each recvmsg(2) and sendmsg(2):
* the first one would cause an unnecessary malloc() of iovec storage for
a msg_iovlen of UIO_SMALLIOV although the required amount of memory has
been allocated on the stack.
* the second one would cause a recvmsg() or sendmsg() with a msg_iovlen of
UIO_MAXIOV to fail with EMSGSIZE, which is also a violation of XNS5.
 1.34  31-Jul-1998  perry fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.
 1.33  29-Jul-1998  thorpej branches: 1.33.2;
When checking for overflow in the residual count, test against SSIZE_MAX.
The read/write system calls return ssize_t because -1 is used to indicate
error, therefore the transfer size MUST be limited to SSIZE_MAX, otherwise
garbage can be returned to the user.

There is NO change from existing behavior here, only a more precise
definition of what the semantics are, except in the Alpha case, where
the full SSIZE_MAX transfer size can now be realized (ssize_t is 64-bit
on the Alpha).
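
[Editor's illustration, not part of the commit.] The reasoning above is
that a transfer count returned as ssize_t cannot represent more than
SSIZE_MAX bytes. A minimal user-space sketch of such a check over an
iovec array follows. iov_total() is a hypothetical helper, and returning
EINVAL is an assumption made for the example rather than something the
entry states.

```c
#define _POSIX_C_SOURCE 200809L	/* for SSIZE_MAX in <limits.h> */

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: sum the iovec lengths into *totalp.
 * Returns 0 on success, or EINVAL if the running total would exceed
 * SSIZE_MAX and so could not be reported back through a ssize_t.
 */
int
iov_total(const struct iovec *iov, int iovcnt, size_t *totalp)
{
	size_t total = 0;

	assert(totalp != NULL);
	for (int i = 0; i < iovcnt; i++) {
		/* total never exceeds SSIZE_MAX, so the subtraction is safe. */
		if (iov[i].iov_len > (size_t)SSIZE_MAX - total)
			return EINVAL;
		total += iov[i].iov_len;
	}
	*totalp = total;
	return 0;
}
```

Comparing each length against the remaining headroom, instead of adding
first and checking afterwards, avoids wrapping the unsigned total.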
 1.32  18-Jul-1998  lukem use AF_LOCAL instead of AF_UNIX
 1.31  25-Jun-1998  thorpej defopt KTRACE
 1.30  25-Apr-1998  matt Hook for 0-copy (or other optimized) sends and receives
 1.29  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.28  06-Feb-1998  thorpej When copying out multiple control messages, ensure that the next control
message is aligned. From David Borman <dab@bsdi.com>.
 1.27  07-Jan-1998  thorpej Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).
 1.26  07-Jan-1998  thorpej Fix bug in recvit() that would cause recvmsg() to only receive one
control message, even if there were multiple control messages on
the queue. From Jean-Luc Richier <Jean-Luc.Richier@imag.fr>, in
bug report kern/4700.
 1.25  26-Jun-1997  thorpej branches: 1.25.8;
Use UCHAR_MAX rather than "255" when sanity-checking the length of a
sockaddr in sockargs().
 1.24  26-Jun-1997  thorpej In sockargs():
- Add a comment describing my feelings about this interface, in general.
- Remove the COMPAT_OLDSOCK length hack. Instead, if the socket argument
is too long to fit in an mbuf, allocate enough external storage to
hold it.
- If the socket argument is a sockaddr, don't allow the length to be
greater than 255, as that would overflow sa_len.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for his sanity checking.
 1.23  22-Dec-1996  cgd * catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
* Change sockargs()'s second argument to be a const void *, to help
with dealing with the syscall argument type fixups/const poisoning.
 1.22  14-Jun-1996  cgd avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.
 1.21  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.20  17-May-1996  pk branches: 1.20.4;
Don't touch retval[] in socketpair(); manual page says this system call
returns 0 on success (PR#2428).
 1.19  09-Feb-1996  christos More proto fixes
 1.18  04-Feb-1996  christos First pass at prototyping
 1.17  10-Oct-1995  mycroft Add hooks for COMPAT_FREEBSD, from Noriyuki Soda.
 1.16  07-Oct-1995  mycroft Prefix names of system call implementation functions with `sys_'.
 1.15  19-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.14  12-Aug-1995  mycroft splnet --> splsoftnet
 1.13  24-Jun-1995  christos Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).
 1.12  10-May-1995  christos tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed
 1.11  05-Mar-1995  fvdl Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.
 1.10  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.9  20-Oct-1994  cgd update for new syscall args description mechanism
 1.8  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7  04-May-1994  mycroft Add return types where missing. Simplify some of the compat conditionals.
Include compat code if COMPAT_SUNOS with or without COMPAT_43.
 1.6  18-Dec-1993  mycroft Canonicalize all #includes.
 1.5  17-Jul-1993  mycroft branches: 1.5.4;
Finish moving struct definitions outside of function declarations.
 1.4  27-Jun-1993  andrew * ansifications
* Yuval Yarom's socket recv(2) fixes - access rights problems (see also
uipc_socket.c).
 1.3  22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2  18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.5.4.3  14-Nov-1993  mycroft Canonicalize all #includes.
 1.5.4.2  10-Nov-1993  mycroft AF_UNIX --> AF_LOCAL
 1.5.4.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.20.4.1  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
 1.25.8.3  07-Feb-1998  mellon Pull up 1.28 (thorpej)
 1.25.8.2  29-Jan-1998  mellon Pull up 1.27 (thorpej)
 1.25.8.1  29-Jan-1998  mellon Pull up 1.26 (thorpej)
 1.33.2.1  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.41.6.2  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.41.6.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.41.4.2  01-Jul-1999  thorpej Sync w/ -current.
 1.41.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.41.2.3  29-Jul-2001  he Apply patch (requested by he):
Add required include files to make this compile again.
 1.41.2.2  19-Jul-2001  perry fix overflow in sendmsg() -- requested by David Maxwell
 1.41.2.1  21-Jun-1999  cgd pull up rev(s) 1.42 from trunk. (cgd)
 1.45.6.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.45.4.1  15-Nov-1999  fvdl Sync with -current
 1.45.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.45.2.3  13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.45.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.45.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.49.2.2  05-Nov-1999  mycroft Fix recent bug in sys_accept(): we must remove the file descriptor from the
file descriptor table before freeing the file description.
 1.49.2.1  05-Nov-1999  mycroft file uipc_syscalls.c was added on branch comdex-fall-1999 on 1999-11-05 11:48:58 +0000
 1.52.4.4  15-Dec-2002  he Revert previous pullup (requested by itojun):
Apparently, there is no need to check return from MEXTMALLOC().
 1.52.4.3  15-Dec-2002  he Pull up revision 1.72 (requested by itojun):
MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for
malloc().
 1.52.4.2  02-Jul-2001  jhawk Pull up revision 1.64 via patch (requested by jdolecek):
Use consistent types for len. Limit sockarg length to reasonable values.
 1.52.4.1  15-Dec-2000  he Pull up revision 1.56 (requested by fvdl):
Fix NFS+tcp client hangs on server or network outage. Again,
please note that this introduces yet another kernel interface
change: sobind() gains an argument.
 1.57.2.13  11-Dec-2002  thorpej Sync with HEAD.
 1.57.2.12  11-Nov-2002  nathanw Catch up to -current
 1.57.2.11  17-Sep-2002  nathanw Catch up to -current.
 1.57.2.10  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.57.2.9  20-Jun-2002  nathanw Rename the local variable "l" to "len" so as not to shadow the LWP
argument to the syscall sys_setsockopt().
 1.57.2.8  20-Jun-2002  nathanw Catch up to -current.
 1.57.2.7  29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.57.2.6  28-Feb-2002  nathanw Catch up to -current.
 1.57.2.5  14-Nov-2001  nathanw Catch up to -current.
 1.57.2.4  21-Sep-2001  nathanw Catch up to -current.
 1.57.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.57.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.57.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.64.2.6  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.64.2.5  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.64.2.4  16-Mar-2002  jdolecek Catch up with -current.
 1.64.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.64.2.2  03-Aug-2001  lukem update to -current
 1.64.2.1  10-Jul-2001  lukem add calls to KNOTE(9) as appropriate
 1.65.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.66.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.68.8.1  20-Jun-2002  gehenna catch up with -current.
 1.81.2.8  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.81.2.7  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.81.2.6  18-Dec-2004  skrll Sync with HEAD.
 1.81.2.5  12-Nov-2004  skrll Adapt to branch.
 1.81.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.81.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.81.2.2  03-Aug-2004  skrll Sync with HEAD
 1.81.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.86.2.3  29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10740):
sys/kern/uipc_syscalls.c: revision 1.104
PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic
Patch applied, thanks!
 1.86.2.2  28-Aug-2006  tron Pull up following revision(s) (requested by seanb in ticket #10675):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.86.2.1  20-May-2004  tron branches: 1.86.2.1.2; 1.86.2.1.4;
Pull up revision 1.87 (requested by ragge in ticket #354):
Fix connect() "bug": If connect() is interrupted by a signal, the connection
attempt is terminated, so if a process needs frequent timer interrupts
it can't ever connect() to a machine far away.
Bug found by Erik Lundgren, bugfix (for the same problem) is similar to
the way FreeBSD solved the same problem.
As a side effect, the new connect() behaviour conforms to POSIX.
 1.86.2.1.4.2  29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10740):
sys/kern/uipc_syscalls.c: revision 1.104
PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic
Patch applied, thanks!
 1.86.2.1.4.1  28-Aug-2006  tron Pull up following revision(s) (requested by seanb in ticket #10675):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.86.2.1.2.2  29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10740):
sys/kern/uipc_syscalls.c: revision 1.104
PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic
Patch applied, thanks!
 1.86.2.1.2.1  28-Aug-2006  tron Pull up following revision(s) (requested by seanb in ticket #10675):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.89.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.89.4.1  29-Apr-2005  kent sync with -current
 1.90.2.6  24-Oct-2006  ghen Pull up following revision(s) (requested by elad in ticket #1566):
sys/kern/uipc_syscalls.c: revision 1.104
PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic
Patch applied, thanks!
 1.90.2.5  25-Aug-2006  ghen Pull up following revision(s) (requested by seanb in ticket #1472):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.90.2.4  09-Sep-2005  tron branches: 1.90.2.4.2;
Pull up following revision(s) (requested by martin in ticket #746):
sys/kern/uipc_syscalls.c: revision 1.94
In adjust_rights() use CMSG_SPACE() to calculate the number of
file descriptors passed in this message - the counterpart in
unp_externalize does this as well.
Note that CMSG_SPACE(0) does not make sense, since it does not invoke
the alignment magic - so use CMSG_SPACE(sizeof(int)) and adjust the
calculated total later.
This fixes the postfix connection cache for 64bit platforms. Previously
the number of passed file descriptors (nfds) would have been
calculated too high, causing the fdrelease() of uninitialized junk.
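
The counting problem described above can be illustrated from userland. The following is a minimal sketch, not the kernel's adjust_rights() code: fds_in_cmsg() and its self-check are hypothetical names, and the sketch uses the portable CMSG_LEN(0) header size rather than the CMSG_SPACE(sizeof(int)) adjustment the commit describes. The point it shows is the same: the aligned header must be subtracted before dividing by sizeof(int), otherwise the descriptor count comes out too high.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Userland illustration only; fds_in_cmsg() is a hypothetical helper.
 *
 * cmsg_len covers the aligned cmsghdr plus the payload. CMSG_LEN(0) is
 * the size of that aligned header alone, so subtracting it leaves the
 * payload bytes, which are then divided into int-sized descriptors.
 */
static size_t
fds_in_cmsg(size_t cmsg_len)
{
	if (cmsg_len < CMSG_LEN(0))
		return 0;	/* too short to carry any payload */
	return (cmsg_len - CMSG_LEN(0)) / sizeof(int);
}

/* Hypothetical self-check: a message built for two descriptors. */
static void
fds_in_cmsg_selfcheck(void)
{
	assert(fds_in_cmsg(CMSG_LEN(2 * sizeof(int))) == 2);
}
```

A receiver would typically pass cmsg->cmsg_len of an SCM_RIGHTS message to such a helper before walking the descriptor array at CMSG_DATA(cmsg).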
 1.90.2.3  09-Sep-2005  tron Pull up following revision(s) (requested by martin in ticket #746):
sys/kern/uipc_syscalls.c: revision 1.93
minor knf tweak
 1.90.2.2  09-Sep-2005  tron Pull up following revision(s) (requested by martin in ticket #746):
sys/kern/uipc_syscalls.c: revision 1.92
Close additional file descriptors if we set MSG_CTRUNC in a SCM_RIGHTS
message. From der Mouse in PR kern/30370.
 1.90.2.1  09-Sep-2005  tron Pull up following revision(s) (requested by martin in ticket #746):
sys/kern/uipc_syscalls.c: revision 1.91
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.90.2.4.2.2  24-Oct-2006  ghen Pull up following revision(s) (requested by elad in ticket #1566):
sys/kern/uipc_syscalls.c: revision 1.104
PR/34873: Ryo Shimizu: sendmsg() can cause kernel panic
Patch applied, thanks!
 1.90.2.4.2.1  25-Aug-2006  ghen Pull up following revision(s) (requested by seanb in ticket #1472):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.92.2.9  24-Mar-2008  yamt sync with head.
 1.92.2.8  11-Feb-2008  yamt sync with head.
 1.92.2.7  21-Jan-2008  yamt sync with head
 1.92.2.6  07-Dec-2007  yamt sync with head
 1.92.2.5  27-Oct-2007  yamt sync with head.
 1.92.2.4  03-Sep-2007  yamt sync with head.
 1.92.2.3  26-Feb-2007  yamt sync with head.
 1.92.2.2  30-Dec-2006  yamt sync with head.
 1.92.2.1  21-Jun-2006  yamt sync with head.
 1.96.6.2  01-Jun-2006  kardel Sync with head.
 1.96.6.1  22-Apr-2006  simonb Sync with head.
 1.96.4.1  09-Sep-2006  rpaulo sync with head
 1.96.2.1  31-Dec-2005  yamt uio_segflg/uio_lwp -> uio_vmspace.
 1.97.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.97.4.1  11-May-2006  elad sync with head
 1.97.2.3  03-Sep-2006  yamt sync with head.
 1.97.2.2  11-Aug-2006  yamt sync with head
 1.97.2.1  24-May-2006  yamt sync with head.
 1.99.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.101.2.1  24-Aug-2006  tron Pull up following revision(s) (requested by seanb in ticket #47):
sys/kern/uipc_syscalls.c: revision 1.102
Don't leave a dangling socket (no associated struct file) if
user supplied a bad name or anamelen parameter to accept(2).
If bad parameters were supplied and a copyout() failed, the
struct file was cleaned up but not the associated socket. This
could leave sockets in CLOSE_WAIT that could never be closed.
 1.102.4.2  10-Dec-2006  yamt sync with head.
 1.102.4.1  22-Oct-2006  yamt sync with head
 1.102.2.4  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.102.2.3  16-Jan-2007  ad Fix locking botches.
 1.102.2.2  18-Nov-2006  ad Sync with head.
 1.102.2.1  20-Oct-2006  ad Acquire proclist_lock / proclist_mutex when sending signals.
 1.105.2.1  13-May-2007  pavel Pull up following revision(s) (requested by yamt in ticket #621):
sys/kern/uipc_syscalls.c: revision 1.108-1.109 via patch
sys/kern/uipc_socket.c: revision 1.139 via patch
- soabort: don't leak a socket on error.
- add an assertion.

sys_accept: don't leak a socket on error.

sys_accept: fix usecount botch and double soclose in rev.1.108.
 1.106.2.4  17-May-2007  yamt sync with head.
 1.106.2.3  07-May-2007  yamt sync with head.
 1.106.2.2  15-Apr-2007  yamt sync with head.
 1.106.2.1  12-Mar-2007  rmind Sync with HEAD.
 1.107.4.1  11-Jul-2007  mjf Sync with head.
 1.107.2.5  09-Oct-2007  ad Sync with head.
 1.107.2.4  20-Aug-2007  ad Sync with HEAD.
 1.107.2.3  15-Jul-2007  ad Sync with head.
 1.107.2.2  09-Jun-2007  ad Sync with head.
 1.107.2.1  08-Jun-2007  ad Sync with head.
 1.115.6.5  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.115.6.4  07-Oct-2007  joerg Sync with HEAD.
 1.115.6.3  02-Oct-2007  joerg Sync with HEAD.
 1.115.6.2  03-Sep-2007  jmcneill Sync with HEAD.
 1.115.6.1  16-Aug-2007  jmcneill Sync with HEAD.
 1.115.2.2  10-Sep-2007  skrll Sync with HEAD.
 1.115.2.1  03-Sep-2007  skrll Sync with HEAD.
 1.116.2.3  23-Mar-2008  matt sync with HEAD
 1.116.2.2  09-Jan-2008  matt sync with HEAD
 1.116.2.1  06-Nov-2007  matt sync with HEAD
 1.121.2.1  06-Oct-2007  yamt sync with head.
 1.122.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.122.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.122.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.123.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.123.2.2  26-Dec-2007  ad Sync with head.
 1.123.2.1  04-Dec-2007  ad Make periphery of sendmsg/recvmsg/sendto/recvfrom MP safe.
 1.127.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.127.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.127.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.127.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.128.2.2  04-Jun-2008  yamt sync with head
 1.128.2.1  18-May-2008  yamt sync with head.
 1.130.2.4  11-Aug-2010  yamt sync with head.
 1.130.2.3  11-Mar-2010  yamt sync with head
 1.130.2.2  04-May-2009  yamt sync with head.
 1.130.2.1  16-May-2008  yamt sync with head.
 1.131.2.4  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.131.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.131.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
most closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.131.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.132.2.1  27-Jun-2008  simonb Sync with head.
 1.133.2.1  19-Oct-2008  haad Sync with HEAD.
 1.134.4.4  13-Dec-2013  sborrill Pull up the following revision(s) (requested by spz in ticket #1891):
sys/kern/uipc_syscalls.c: revision 1.163

If the unix socket is closed before accept, the mbuf returned by
m_get() will have an uninitialized length and contain junk from a
previous call. Initialize m_len to be 0 to handle this case.
Fixes PR/47591
 1.134.4.3  28-Mar-2010  snj Apply patch (requested by jakllsch in ticket #1352):
In do_sys_recvmsg(), call free(9) with the same type malloc(9) used.
 1.134.4.2  04-Apr-2009  snj branches: 1.134.4.2.2; 1.134.4.2.4;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.134.4.1  02-Feb-2009  snj Pull up following revision(s) (requested by yamt in ticket #393):
sys/kern/uipc_socket.c: revision 1.185
sys/kern/uipc_socket2.c: revision 1.101
sys/kern/uipc_syscalls.c: revision 1.135
sys/miscfs/portal/portal_vnops.c: revision 1.81
sys/netsmb/smb_trantcp.c: revision 1.40
sys/nfs/nfs_socket.c: revision 1.177
sys/sys/socketvar.h: revision 1.118
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.134.4.2.4.1  21-Apr-2010  matt sync to netbsd-5
 1.134.4.2.2.1  28-Mar-2010  snj Apply patch (requested by jakllsch in ticket #1352):
In do_sys_recvmsg(), call free(9) with the same type malloc(9) used.
 1.134.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.134.2.1  03-Mar-2009  skrll Sync with HEAD.
 1.135.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.140.4.3  31-May-2011  rmind sync with head
 1.140.4.2  21-Apr-2011  rmind sync with head
 1.140.4.1  30-May-2010  rmind sync with head
 1.140.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.141.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.147.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.147.2.4  23-Jan-2013  yamt sync with head
 1.147.2.3  30-Oct-2012  yamt sync with head
 1.147.2.2  17-Apr-2012  yamt sync with head
 1.147.2.1  10-Nov-2011  yamt sync with head
 1.148.4.1  18-Feb-2012  mrg merge to -current.
 1.154.2.5  14-Dec-2013  bouyer Pull up following revision(s) (requested by spz in ticket #996):
sys/kern/uipc_syscalls.c: revision 1.163
PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.
 1.154.2.4  18-Feb-2013  riz branches: 1.154.2.4.2;
Pull up following revision(s) (requested by riastradh in ticket #831):
sys/kern/uipc_usrreq.c: revision 1.141
sys/kern/uipc_syscalls.c: revision 1.159
Fix some screw cases in cmsg file descriptor passing.
- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.
- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.
- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.
ok christos
 1.154.2.3  14-Feb-2013  jdc Pull up revisions:
src/sys/kern/uipc_socket.c revision 1.213
src/sys/kern/uipc_syscalls.c revision 1.160
(requested by christos in ticket #822).

PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.154.2.2  07-Jan-2013  riz Pull up following revision(s) (requested by mlelstv in ticket #778):
sys/kern/uipc_syscalls.c: revision 1.157
sys/kern/uipc_syscalls.c: revision 1.158
If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.
Undo the last commit which introduced this error path.
Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.
This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).
The sanity check prevented messages that carry only ancillary data.
 1.154.2.1  20-Jul-2012  riz branches: 1.154.2.1.4;
Pull up following revision(s) (requested by njoly in ticket #423):
sys/kern/uipc_syscalls.c: revision 1.156
Avoid kmem_alloc KASSERT for 0 byte allocation, when tracing processes
that use empty messages with sendmsg/recvmsg.
 1.154.2.4.2.1  14-Dec-2013  bouyer Pull up following revision(s) (requested by spz in ticket #996):
sys/kern/uipc_syscalls.c: revision 1.163
PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.
 1.154.2.1.4.3  21-Jul-2017  snj Pull up following revision(s) (requested by riastradh in ticket #1453):
sys/kern/uipc_socket.c: revision 1.213
sys/kern/uipc_syscalls.c: revision 1.160
PR/47569: Valery Ushakov: SOCK_NONBLOCK does not work because it does not
set SS_NBIO.
XXX: there are too many flags that mean the same thing in too many places,
and too many flags that mean the same thing and are different.
 1.154.2.1.4.2  14-Dec-2013  bouyer Pull up following revision(s) (requested by spz in ticket #996):
sys/kern/uipc_syscalls.c: revision 1.163
PR/47591: Michael Plass: If the unix socket is closed before accept,
unp->unp_conn will be NULL in PRU_ACCEPT, as called from
sys_accept->so_accept. This will cause the usrreq to return with
no error, leaving the mbuf gotten from m_get() with an uninitialized
length, containing junk from a previous call. Initialize m_len to
be 0 to handle this case. This is yet another reason why Beverly's
idea of setting m_len = 0 in m_get() makes a lot of sense. Arguably
this could be an error, since the data we return now has 0 family
and length.
 1.154.2.1.4.1  07-Jan-2013  riz Pull up following revision(s) (requested by mlelstv in ticket #778):
sys/kern/uipc_syscalls.c: revision 1.157
sys/kern/uipc_syscalls.c: revision 1.158
If an untraced process sleeps in recvmsg/sendmsg, the syscall does not
allocate an iov structure for ktrace. When tracing is then enabled
and the process wakes up, it crashes the kernel.
Undo the last commit which introduced this error path.
Avoid the mentioned kmem_alloc assertion by adding a sanity check analogous
to similar code in sys_generic.c for I/O on file handles instead of
sockets.
This also causes the syscall to return EMSGSIZE if the msg_iovlen member
of the msg structure is less than or equal to 0, as defined in
recvmsg(2)/sendmsg(2).
The sanity check prevented messages that carry only ancillary data.
 1.156.2.4  03-Dec-2017  jdolecek update from HEAD
 1.156.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.156.2.2  23-Jun-2013  tls resync from head
 1.156.2.1  25-Feb-2013  tls resync with head
 1.161.2.2  18-May-2014  rmind sync with head
 1.161.2.1  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.165.2.1  10-Aug-2014  tls Rebase.
 1.172.2.2  08-Nov-2015  riz Pull up following revision(s) (requested by christos in ticket #1018):
sys/kern/uipc_syscalls.c: revision 1.181
Don't overwrite the user iov pointer in sendmmsg. Make the send and receive
code look the same.
 1.172.2.1  08-Aug-2015  martin Pull up following revision(s) (requested by maxv in ticket #942):
sys/kern/uipc_syscalls.c: revision 1.179
Memory leak. Triggerable from an unprivileged user via COMPAT_43.
 1.173.2.8  28-Aug-2017  skrll Sync with HEAD
 1.173.2.7  05-Feb-2017  skrll Sync with HEAD
 1.173.2.6  05-Dec-2016  skrll Sync with HEAD
 1.173.2.5  05-Oct-2016  skrll Sync with HEAD
 1.173.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.173.2.3  22-Sep-2015  skrll Sync with HEAD
 1.173.2.2  06-Jun-2015  skrll Sync with HEAD
 1.173.2.1  06-Apr-2015  skrll Sync with HEAD
 1.182.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.182.2.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.184.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.186.6.2  12-Nov-2018  martin Pull up following revision(s) (requested by hannken in ticket #1089):

external/bsd/nsd/include/config.h: revision 1.5
sys/kern/uipc_syscalls.c: revision 1.198
sys/kern/uipc_syscalls.c: revision 1.199
sys/kern/uipc_socket.c: revision 1.267

Update getsockopt(SO_ERROR) to behave like soreceive() and
return and clear so->so_rerror if so->so_error is zero.

Ok: christos@
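
From userland, the getsockopt(SO_ERROR) behaviour in the note above is seen as "fetch the pending error, which also clears it". A minimal sketch of such a caller follows; take_socket_error() is a hypothetical helper name, and the sketch says nothing about how the kernel stores the error internally.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Fetch the pending error on a socket (hypothetical helper).
 * Returns the pending errno value (0 if there is none), or -1 if
 * getsockopt() itself failed, for example on a bad descriptor.
 */
static int
take_socket_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
		return -1;
	assert(len == sizeof(err));	/* SO_ERROR is an int-sized option */
	return err;
}
```

Because the read clears the stored value, a caller that needs the error more than once has to keep its own copy of the result.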

-

Don't defer errors from sendmmsg(). This matches the linux manpage.

Defer errors from recvmmsg() through so_rerror and
tests and return a deferred error on entry.

Ok: christos@

-

sys_recvmmsg: don't defer an error that already gets returned.

-

Re-enable {send,recv}mmsg now they are working.
 1.186.6.1  18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
 1.191.2.4  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.191.2.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.191.2.2  21-May-2018  pgoyette Sync with HEAD
 1.191.2.1  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.194.2.1  10-Jun-2019  christos Sync with HEAD
 1.199.4.1  04-Oct-2021  martin Pull up following revision(s) (requested by thorpej in ticket #1351):

sys/miscfs/fifofs/fifo_vnops.c: revision 1.88
sys/kern/uipc_syscalls.c: revision 1.201
tests/lib/libc/sys/t_poll.c: revision 1.6
tests/lib/libc/sys/t_poll.c: revision 1.7
tests/lib/libc/sys/t_poll.c: revision 1.8

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.

In the fifo_hup1 test, also ensure that POLLHUP is de-asserted when a
new writer appears.

Add a fifo_inout test case that validates the expected POLLIN / POLLOUT
behavior for FIFOs:
- A FIFO is readable so long as at least 1 byte is available.
- A FIFO is writable so long as at least PIPE_BUF (obtained with _PC_PIPE_BUF)
space is available.
This will be cloned for a forthcoming kevent test case.
 1.206.4.2  11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #830):

sys/kern/uipc_socket.c: revision 1.304
sys/kern/uipc_syscalls.c: revision 1.207

Fix a ~16 year old perf regression: when creating a socket, add a reference
to the caller's credentials rather than copying them. On an 80486DX2/66 this
seems to ~halve the time taken to create a socket.

Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.
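
The difference between copying credentials and adding a reference to them can be sketched as follows. This is a single-threaded illustration under stated assumptions: struct cred, cred_copy(), cred_hold() and cred_rele() are hypothetical stand-ins, not the kernel's types or functions, and a real implementation would need atomic reference counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a credential object; not the kernel's type. */
struct cred {
	unsigned refcnt;
	unsigned uid;
	unsigned gid;
};

/* Copying: allocate a new object and duplicate the contents. */
static struct cred *
cred_copy(const struct cred *src)
{
	struct cred *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	memcpy(c, src, sizeof(*c));
	c->refcnt = 1;		/* the copy starts with its own single owner */
	return c;
}

/* Referencing: share the existing object and bump its count. */
static struct cred *
cred_hold(struct cred *c)
{
	c->refcnt++;
	return c;
}

/* Drop one reference; free the object when the last one goes away. */
static void
cred_rele(struct cred *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0)
		free(c);
}
```

Taking a reference avoids the allocation and the copy on every socket creation or accept, which is the kind of saving the commit message attributes to the change.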
 1.206.4.1  04-Feb-2024  martin Pull up following revision(s) (requested by jdolecek in ticket #583):

sys/kern/uipc_socket.c: revision 1.308
sys/kern/uipc_syscalls.c: revision 1.211
sys/sys/socketvar.h: revision 1.168
sys/net/if_gre.c: revision 1.185

fix PIPE_SOCKETPAIR variant of pipe1() to apply correctly the 'flags'
passed when called via pipe2(2), fixing repeatable process hang during
compilation with 'gcc -pipe'

refactor fsocreate() to return the new socket and file pointers,
expect the caller to call fd_affix() once initialization is fully complete
use the new fsocreate() to replace the duplicate open-coded 'flags' handling
in makesocket() used for socketpair(2), and in the PIPE_SOCKETPAIR pipe1()
this also fixes lib/libc/sys/t_pipe2 pipe2_cloexec test to succeed
on PIPE_SOCKETPAIR kernel

fixes PR kern/55690
 1.211.2.1  02-Aug-2025  perseant Sync with HEAD
