History log of /src/sys/kern/uipc_socket2.c
Revision  Date  Author  Comments
 1.148  14-Sep-2025  andvar Fix various typos in comments and log message.
 1.147  07-Dec-2024  riastradh sys/kern/uipc_*.c: Fix leading whitespace issues.

Nix stray spaces before tab indentation.
 1.146  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.145  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Nix trailing whitespace.

No functional change intended.
 1.144  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sort includes.

No functional change intended.
 1.143  03-Jan-2024  andvar branches: 1.143.2;
s/addreseses/addresses/ in comments (and one missing whitespace).
 1.142  26-Oct-2022  riastradh ddb/db_active.h: New home for extern db_active.

This can be included unconditionally, and db_active can then be
queried unconditionally; if DDB is not in the kernel, then db_active
is a constant zero. Reduces need for #include opt_ddb.h, #ifdef DDB.
 1.141  09-Apr-2022  riastradh unix(4): Convert membar_exit to membar_release.

Use atomic_load_consume or atomic_load_relaxed where necessary.

Comment on why unlocked nonatomic access is valid where it is done.
 1.140  02-Oct-2021  thorpej - fifo_poll(): If the last writer has disappeared, detect this and return
POLLHUP, per POSIX.
- fifo_close(): Use the new fifo_socantrcvmore(), which is like the
garden-variety socantrcvmore(), except it specifies POLL_HUP rather
than POLL_IN (so the correct code for SIGIO is sent).
- sowakeup(): Allow POLL_HUP as a code (notifies poll'ers with POLLHUP).
- Add test cases for correct POLLHUP behavior with FIFOs.

Fixes PR kern/56429.
 1.139  04-Mar-2021  msaitoh Add missing opt_inet.h.
 1.138  26-Aug-2020  christos branches: 1.138.2;
add socket info for user and group for unix sockets in fstat.
 1.137  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.136  01-Feb-2020  riastradh Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
 1.135  01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
 1.134  11-Jul-2019  maxv branches: 1.134.2; 1.134.4;
Fix info leaks: the alignment of the structures causes uninitialized heap
memory to be copied to userland in sys_recvmsg().
 1.133  04-Nov-2018  christos - Introduce a new SO_RERROR socket option to explicitly turn on
receive overflow errors re-instating the default behavior to
silently ignore them as before 2018-03-19.
- Introduce a new kern.sooptions sysctl to control the default
behavior of socket options. Setting this to 0x4000 (SO_RERROR),
turns on receive overflow error reporting for all sockets.
- Change dhcpcd to turn on SO_RERROR on all its sockets.

As discussed in tech-net.
 1.132  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
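The truncation hazard described in 1.132 can be sketched as follows. This is an illustration only, not the libkern code: `old_min`, `MIN_SKETCH`, `min_via_uint` and `min_via_macro` are hypothetical names, and the sketch assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
old_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN; evaluates each argument twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Passing 64-bit values silently converts them to unsigned int. */
static uint64_t
min_via_uint(uint64_t a, uint64_t b)
{
	assert(sizeof(unsigned int) == 4);	/* assumption of this sketch */
	return old_min(a, b);			/* high 32 bits are dropped here */
}

/* The macro compares the full 64-bit values, so nothing is truncated. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With these definitions, `min_via_uint(0x100000005ULL, 7)` yields 5 because the first argument is truncated before the comparison, while `min_via_macro` yields 7.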
 1.131  20-Jul-2018  msaitoh Add "show socket" command written by Hiroki SUENAGA. It prints usage of
system's socket buffers.
 1.130  06-Jun-2018  roy branches: 1.130.2;
Separate receive socket errors from general socket errors.
 1.129  29-Apr-2018  maxv Remove references to m_copy in comments.
 1.128  19-Mar-2018  roy socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
 1.127  18-Mar-2018  christos - Convert sb_lowat to unsigned for consistency. There are no negative value
uses
- Check for overflow as mentioned in the comment
- Sprinkle const
 1.126  06-Jul-2017  christos branches: 1.126.4;
move the timestamp stuff to uipc_socket.c because it already has the compat
includes.
 1.125  06-Jul-2017  christos Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
 1.124  02-Oct-2016  christos branches: 1.124.8;
more MFREE -> m_free
 1.123  23-May-2016  tls branches: 1.123.2;
Fix a longstanding problem with accept filters noticed by Timo Buhrmester:
sockets sitting in the accept filter can consume the entire listen queue,
such that the application is never able to handle any connections. Handle
this by simply passing through the oldest queued cxn when the queue is full.

This is fair because the longer a cxn lingers in the queue (stays connected
but does not meet the requirements of the filter for passage) the more likely
it is to be passed through, at which point the application can dispose of it.

Works because none of our accept filters actually allocate private state
per-cxn. If they did, we'd have to fix the API bug that there is presently
no way to tell an accf to finish/deallocate for a single cxn (accf_destroy
kills off the entire filter instance for a given listen socket).
 1.122  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.121  05-Sep-2014  matt branches: 1.121.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.
 1.120  31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.119  19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.118  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.117  17-May-2014  rmind sonewconn: insert the socket into the queue *after* the protocol attach.
This potentially avoids unnecessary race conditions when handling partial
connections.
 1.116  17-May-2014  rmind - sonewconn: improve the initialisation order and add some asserts.
- Add various comments describing primitive routines operating on sockets,
clarify connection life-cycle and improve the description of socket queues.
- Sprinkle more asserts.
 1.115  08-Oct-2013  christos branches: 1.115.2;
0 -> NULL
MGET -> m_get
No functional change.
 1.114  15-Sep-2013  martin Avoid unused variable warnings
 1.113  29-Aug-2013  rmind Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.
 1.112  28-Jun-2013  matt branches: 1.112.2;
Make sbdrop panic more verbose
 1.111  27-Jun-2013  christos Introduce a more general method of sbcreatecontrol, sbcreatecontrol1 that
can take flags (M_WAITOK), and allocate large messages if needed. It also
returns the allocated pointer instead of copying the data to the passed
pointer. Implement sbcreatecontrol() using that.
 1.110  20-Dec-2011  christos branches: 1.110.6;
- Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).
 1.109  31-Aug-2011  plunky branches: 1.109.2; 1.109.6;
NULL does not need a cast
 1.108  24-Apr-2011  rmind - Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.
 1.107  09-Apr-2011  christos Preserve SB_ASYNC on the accepted socket. From: Dmitry Matveev
http://mail-index.netbsd.org/tech-net/2011/02/17/msg002457.html
 1.106  30-Dec-2009  elad branches: 1.106.4; 1.106.6;
Don't bother caching egid. It'll be removed soon.
 1.105  30-Dec-2009  elad Always use resource limits from the process, as proposed in

http://mail-index.netbsd.org/tech-kern/2009/12/30/msg006756.html

okay christos@.
 1.104  02-Sep-2009  tls Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write. From
Ritesh Agrawal at Coyote Point.
 1.103  24-Jul-2009  christos check return code from soreserve() (Sean Boudreau)
 1.102  09-Apr-2009  yamt sonewconn: add an assertion.
 1.101  21-Jan-2009  yamt branches: 1.101.2;
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.100  24-Oct-2008  dyoung branches: 1.100.2; 1.100.4;
Change 'return (expr);' to 'return expr;'. Change (type *)0 to
NULL. No functional change intended.
 1.99  14-Oct-2008  ad Accept filters:

- Remove remaining #ifdef INET.
- Avoid holding locks so we don't need to do KM_NOSLEEP allocations.
- Use a rwlock to protect the accept filter list.
- Make it safe to unload accept filter modules.
- Minor KNF.
 1.98  11-Oct-2008  pooka Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.97  04-Aug-2008  tls Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.

OK core@.
 1.96  18-Jun-2008  yamt branches: 1.96.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.95  10-Jun-2008  ad There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.
 1.94  26-May-2008  ad branches: 1.94.2;
Use pool_cache for sockets.
 1.93  24-May-2008  christos Coverity CID 5025: sbreserve is never called with a null socket.
 1.92  28-Apr-2008  martin branches: 1.92.2;
Remove clause 3 and 4 from TNF licenses
 1.91  24-Apr-2008  ad branches: 1.91.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.90  01-Mar-2008  rmind branches: 1.90.2;
Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.89  07-Feb-2008  ad branches: 1.89.2; 1.89.6;
sonewconn: inherit FNONBLOCK from the parent.
 1.88  29-Jan-2008  yamt sbreserve: curlwp can't be NULL these days.
XXX this code seems to need an overhaul.
 1.87  29-Jan-2008  yamt sbrelease: unwrap a short line.
 1.86  25-Sep-2007  ad branches: 1.86.4;
Use selinit() / seldestroy().
 1.85  02-Aug-2007  rmind branches: 1.85.2; 1.85.4; 1.85.6; 1.85.8;
TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.84  04-Jul-2007  tls branches: 1.84.2;
After looking at FreeBSD 6 again -- we were also failing to copy the
send and receive timeouts. Fix this.
 1.83  04-Jul-2007  tls Copy SNDLOWAT and RCVLOWAT socket options to accepted socket, so applications
can rely on all socket options being propagated from the listen socket as
the manual page says (and as everything but Linux has always done). FreeBSD 6
fixes this the same way, but this bug appears elsewhere and is...Old.
 1.82  04-Mar-2007  christos branches: 1.82.2; 1.82.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.81  01-Nov-2006  yamt branches: 1.81.4;
remove some __unused from function parameters.
 1.80  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.79  03-Oct-2006  elad Back out socket credentials for now, until we figure a better way of
handling the reference counting from interrupt context.
 1.78  02-Oct-2006  elad Add credentials to sockets, 'so_cred'.

Brought up on tech-kern@ some ~2 months ago, didn't seem to be an
objection; brought up again recently and no objection either... this is
not too intrusive and I've been running with this for a while.
 1.77  16-Aug-2006  plunky branches: 1.77.4;
Fix broken comments - there is no SO_ISCONNECTED or SO_ISCONFIRMING

this fixes kern/32058
 1.76  16-Aug-2006  plunky Remove macro call sonewconn() => sonewconn1() as it is no longer necessary.
There are no such calls and the compiler would catch mistakes like this
in any case.
 1.75  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.74  03-Jul-2006  christos Make sure we have at least PIPE_BUF bytes available in the socket send buffer.
Review and comment by yamt.
 1.73  01-Jul-2006  christos Revert previous change to bump the socket low watermark to sock_loan_thresh.
With sock_loan_thresh=4096, sb_lowat==sb_hiwat, and sowritable will never
be true (even if only a single byte is pending). Some programs (like screen)
expect select() to return that a socket is writable on a socket when there
is space to write to it. XXX: What is the right thing to do here?
 1.72  21-Jun-2006  yamt bump default so_snd.sb_lowat to increase chance to use loaning.

the idea to tweak the watermark from Jonathan Stone.
reviewed by Bill Studenmund.
 1.71  14-May-2006  elad branches: 1.71.4;
integrate kauth.
 1.70  24-Dec-2005  perry branches: 1.70.4; 1.70.6; 1.70.8; 1.70.10; 1.70.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.69  11-Dec-2005  christos merge ktrace-lwp.
 1.68  29-May-2005  christos branches: 1.68.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.67  07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.66  26-Feb-2005  perry nuke trailing whitespace
 1.65  24-Jun-2004  jonathan branches: 1.65.4; 1.65.6;
Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:

sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.64  11-Jun-2004  jonathan Fix potential memory leak in sbappendaddrchain():

We do an MGETHDR() for each mbuf "packet" of the input chain, to hold
the socket address prepended to that "packet". If those MGETHDR()s
ever failed, we would leak all the successfully-allocated mbuf
headers. Leak noted by Yamamoto-san (yamt@NetBSD.org); thanks for catching it!

Add socketbuf invariant-checking macros to sbappendaddrchain(), and
replace a stray bcopy() with memcpy(), also as suggested by Yamamoto-san.
 1.63  27-May-2004  jonathan Rework to make FAST_IPSEC PF_KEY dumps unicast and reliable:

Introduce new socket-layer function sbappendaddrchain() to
sys/kern/uipc_socket2.c: like sbappendaddr(), only takes a chain of
records and appends the entire chain in one pass. sbappendaddrchain()
also takes an `sbprio' argument, which indicates the caller requires
special `reliable' handling of the socket-buffer. `sbprio' is
described in sys/sys/socketvar.h, although (for now) the different
levels are not yet implemented.

Rework sys/netipsec/key.c PF_KEY DUMP responses to build a chain of
mbuf records, one record per dump response. Unicast the entire chain
to the requestor, with all-or-none semantics.

Changed files:
sys/socketvar.h kern/uipc_socket2.c netipsec/key.c
Reviewed by:
Jason Thorpe, Thor Lancelot Simon, post to tech-kern.

Todo: request pullup to 2.0 branch. Post-2.0, rework sysctl() API for
dumps to use new record-chain constructors. Actually implement
the distinct service levels in sbappendaddrchain() so we can use them
to make PF_KEY ACQUIRE messages more reliable.
 1.62  19-Apr-2004  christos Charge root for socket buffers without a socket pointer.
 1.61  18-Apr-2004  matt Constify the addr parameter to sbappendaddr.
 1.60  18-Apr-2004  matt sbreserve can be called with a NULL socket, deal with it.
 1.59  17-Apr-2004  christos PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.
 1.58  21-Oct-2003  thorpej branches: 1.58.2;
Cache the "adjusted" value of sb_max when sb_max is changed, in order
to avoid doing quad math in sbreserve().

Change suggested by Simon Burge, and code inspired by a similar change
in FreeBSD.
 1.57  22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.56  21-Sep-2003  jdolecek cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.55  06-Sep-2003  christos SA_SIGINFO changes.
 1.54  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.53  29-Jun-2003  fvdl branches: 1.53.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.52  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.51  23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.50  17-Apr-2003  fvdl A bit of an ugly workaround to avoid a warning for a larger MSIZE.
Shouldn't make a difference in the generated code.
 1.49  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.48  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.47  27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.46  22-Aug-2002  thorpej In sbcompress(), if we toss an empty mbuf, make sure to update
sb_lastrecord if necessary.

From Daniel Hartmeier <daniel@benzedrine.cx>.
 1.45  03-Jul-2002  thorpej Rename SB_UPDATE_TAIL() to SB_EMPTY_FIXUP(), per suggestion from
Jonathan Stone.
 1.44  03-Jul-2002  thorpej Rename sbappend_stream() to sbappendstream(), per suggestion from
Jonathan Stone.
 1.43  03-Jul-2002  thorpej Make insertion of data into socket buffers O(C):
* Keep pointers to the first and last mbufs of the last record in the
socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
guarantee that there will never be more than one record in the
socket buffer. This function uses the sb_mbtail pointer to perform
the data insertion. Make TCP use sbappend_stream().

On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.

Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
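The idea behind 1.43, keeping a tail pointer so that an append never walks the chain, can be sketched with a generic singly linked queue. This is not the sockbuf code: `struct rec`, `struct recq` and `recq_append` are hypothetical names, with `tail` playing the role the log assigns to `sb_lastrecord` / `sb_mbtail`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; stands in for one record in a socket buffer. */
struct rec {
	struct rec *next;
	int id;
};

/* Queue that caches its last element. */
struct recq {
	struct rec *head;
	struct rec *tail;
};

/* Append in constant time: no traversal from head to find the end. */
static void
recq_append(struct recq *q, struct rec *r)
{
	assert(q != NULL && r != NULL);
	r->next = NULL;
	if (q->tail == NULL)
		q->head = r;		/* queue was empty */
	else
		q->tail->next = r;	/* link after the cached last element */
	q->tail = r;
}
```

The cost is one extra pointer per queue plus the obligation to update it on every removal that empties the queue, which is presumably what a fixup such as the SB_EMPTY_FIXUP() mentioned in 1.45 is for.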
 1.42  12-Nov-2001  lukem branches: 1.42.8;
add RCSIDs
 1.41  05-Aug-2001  enami branches: 1.41.4;
Give different names for different wait channels.

# and minor knf fix while I'm here.
 1.40  27-Jul-2001  thorpej Now that M_TRAILINGSPACE() checks buffer writeability properly,
we can greatly simplify sbcompress(). Slightly modified from
a similar change in FreeBSD.
 1.39  16-Jun-2001  manu branches: 1.39.2;
Use SB_ASYNC in struct sockbuf sb_flags field instead of SS_ASYNC in
struct socket so_state field to decide if we need to send asynchronous
notifications. This makes possible to request notification on write but
not on read, and vice versa.

This is used in Linux emulation code, because when async I/O is requested,
Linux does not send SIGIO to write end of sockets, and it never sends any
SIGIO to any end of pipes. In Linux emulation code, we then set SB_ASYNC
only on the read end of sockets, and on no end for pipes.
 1.38  30-Apr-2001  kml Large values of sb_max would cause an overflow in sbreserve(); cast to
u_quad_t to avoid this. (from FreeBSD uipc_socket2.c v1.19)
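The overflow fixed in 1.38 can be illustrated with a scaled-limit computation of the form `sb_max * MCLBYTES / (MSIZE + MCLBYTES)`. The exact expression in sbreserve() is not quoted in this log, so treat the formula, the constants (256 and 2048) and the names `limit_narrow` / `limit_wide` as assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; real values are platform-dependent. */
#define MSIZE_SKETCH	256u
#define MCLBYTES_SKETCH	2048u

/* 32-bit arithmetic: the product wraps for large sb_max. */
static uint32_t
limit_narrow(uint32_t sb_max)
{
	uint32_t product = (uint32_t)(sb_max * MCLBYTES_SKETCH);

	return product / (MSIZE_SKETCH + MCLBYTES_SKETCH);
}

/* Widen to 64 bits before multiplying, as the cast to u_quad_t does. */
static uint64_t
limit_wide(uint32_t sb_max)
{
	uint64_t limit = (uint64_t)sb_max * MCLBYTES_SKETCH /
	    (MSIZE_SKETCH + MCLBYTES_SKETCH);

	assert(limit <= sb_max);	/* the scale factor is below 1 */
	return limit;
}
```

For `sb_max` of 4 MB (1 << 22) the 32-bit product is exactly 2^33 and wraps to zero, so `limit_narrow` returns 0 while `limit_wide` returns a sensible nonzero limit.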
 1.37  27-Feb-2001  lukem branches: 1.37.2;
convert to ANSI KNF
 1.36  30-Mar-2000  augustss Get rid of register declarations.
 1.35  29-Feb-2000  itojun more fix to ancillary data alignment. we need padding after
last cmsg_data item (see the figure on RFC2292 page 18).
 1.34  18-Feb-2000  itojun fix alignment problem in ancillary messages (alpha).

the change constitutes a binary compatibility issue when sizeof(long) != 4.
there's no way to be backward compatible, and only guys affected
are IPv6 userland tools.

From: =?iso-8859-1?Q?G=F6ran_Bengtson?= <goeran@cdg.chalmers.se>
 1.33  04-Aug-1999  mycroft branches: 1.33.2;
The old compaction test had an off-by-one error that caused it to not compact
in some cases where it could have. Fix this, and the new version as well.
 1.32  04-Aug-1999  matt Don't compress mbuf clusters which are referenced by multiple
mbufs since you might overwrite valuable data. (think of
m_copy'ed data from a TCP re-transmission queue. Since those
might be in clusters and referenced in two sockets).
 1.31  04-Aug-1999  mycroft It's now possible for sbcompress() to compact mbuf clusters, so do it.
This helps prevent mbuf cluster exhaustion when receiving lots of small
packets.
 1.30  01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.29  22-Apr-1999  simonb Move inclusion of "opt_sb_max.h" from sys/socketvar.h to
conf/param.c, and move the initialisation of the sb_max
variable from kern/uipc_socket2.c to conf/param.c. Now
everything that includes sys/socketvar.h doesn't get
recompiled when SB_MAX's value changes.
 1.28  23-Mar-1999  lukem branches: 1.28.2; 1.28.4; 1.28.6;
Ensure that you can only bind a more specific address when it is done by the
same uid or by root.

This code is from FreeBSD. (Whilst it was originally obtained from OpenBSD,
FreeBSD fixed it to work with multicast. To quote the commit message:
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has
SO_REUSEPORT set.
)
 1.27  20-Jan-1999  mycroft Do not remove sockets from the accept(2) queue on close.
 1.26  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
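The mapping in 1.26 is easy to get wrong because bcopy() and ovbcopy() take the source first while memcpy() and memmove() take the destination first. A minimal sketch of the conversions, using hypothetical wrapper names (`bcopy_sketch`, `ovbcopy_sketch`, `bzero_sketch`) so as not to clash with any system declarations:

```c
#include <assert.h>
#include <string.h>

/* Old order: (src, dst, len). New order: memcpy(dst, src, len). */
static void
bcopy_sketch(const void *src, void *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, len);		/* regions must not overlap */
}

/* Overlap-safe variant maps to memmove(), again with swapped arguments. */
static void
ovbcopy_sketch(const void *src, void *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	memmove(dst, src, len);
}

/* bzero(x, y) becomes memset(x, 0, y): the fill byte goes in the middle. */
static void
bzero_sketch(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);
}
```

Only the copy functions swap their first two arguments; bcmp(x, y, z) maps to memcmp(x, y, z) unchanged.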
 1.25  02-Aug-1998  thorpej Use the pool allocator for sockets.
 1.24  25-Apr-1998  matt Hook for 0-copy (or other optimized) sends and receives
 1.23  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.22  07-Jan-1998  thorpej Make insertion and removal of sockets from the partial and incoming
connections queues O(C) rather than O(N).
 1.21  09-Oct-1997  mycroft branches: 1.21.2;
Make various standard wmesg strings const.
 1.20  26-Jun-1997  thorpej branches: 1.20.4;
In sbappendaddr(), if the sockaddr length is larger than will fit in
an mbuf, allocate enough external storage to hold the sockaddr. Thanks
to enami tsugutomo <enami@cv.sony.co.jp> for providing sanity-checks.
 1.19  11-Jan-1997  thorpej branches: 1.19.8;
Implement sbcreatecontrol(), a generic function to create a "control"
mbuf for presentation on a socket buffer.
 1.18  09-Dec-1996  thorpej In sbreserve(), don't allow a count of 0. Fixes PR #2794, from
Erik Berls <cyber@dis.org>.
 1.17  26-Nov-1996  thorpej Back out previous soqinsque() and soqremque() changes. This will
stop the panics until the socket queues get converted to <sys/queue.h>.
 1.16  10-Nov-1996  thorpej Optimization of soqinsque() and soqremque():

Keep queue of pending sockets in a double linked list. Previously,
a singly linked list was used, giving O(N) insertion/deletion times,
and was a major time consumer for sockets with large pending queues.
The double linked list gives O(C) insertion/deletion times with only
a small cost in complexity.

Since a socket can be on, at most, one queue at a time, both so_q and
so_q0 can safely be used as (forward and backward, respectively) queue
pointers.

Submitted by Matt Thomas <matt@3am-software.com>, a long time ago.
(Geez, I've been running with this patch for _months_, and had completely
forgotten about it!)
 1.15  13-Oct-1996  christos backout previous kprintf change
 1.14  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.13  07-Sep-1996  mycroft Implement poll(2).
 1.12  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.11  04-Feb-1996  christos branches: 1.11.4;
First pass at prototyping
 1.10  16-Aug-1995  mycroft Access rights are now stored in MT_CONTROL mbufs. Document this.
 1.9  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7  04-May-1994  mycroft More return types...
 1.6  25-Apr-1994  mycroft Remove sbselqueue().
 1.5  18-Dec-1993  mycroft Canonicalize all #includes.
 1.4  27-Jun-1993  andrew branches: 1.4.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.3  18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.2  21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.4.2  14-Nov-1993  mycroft Canonicalize all #includes.
 1.4.4.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.11.4.2  11-Dec-1996  mycroft From trunk:
Don't allow SO_{SND,RCV}BUF with a buffer size of 0.
 1.11.4.1  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
 1.19.8.3  28-Jun-1997  thorpej Remove handling of SS_FORCE.
 1.19.8.2  26-Jun-1997  thorpej Update from trunk.
 1.19.8.1  14-May-1997  mellon Support SS_FORCE bit in sonewconn1() to override accept queue limit for SYN cache
 1.20.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.21.2.2  25-Jan-1999  cgd Patch to fix select(2)/accept(2) race condition which permits DoS. (mycroft)
 1.21.2.1  29-Jan-1998  mellon Pull up 1.22 (thorpej)
 1.28.6.2  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.28.6.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.28.4.2  01-Jul-1999  thorpej Sync w/ -current.
 1.28.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.28.2.1  22-Sep-1999  cgd pull up revs 1.31-1.33 from trunk (requested by cgd):
Compact mbuf clusters, to help prevent mbuf cluster exhaustion when
receiving lots of small packets. This costs some performance (the
compaction copies data), but adds a lot of stability to many systems.
 1.33.2.2  12-Mar-2001  bouyer Sync with HEAD.
 1.33.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.37.2.7  11-Nov-2002  nathanw Catch up to -current
 1.37.2.6  18-Oct-2002  nathanw Catch up to -current.
 1.37.2.5  27-Aug-2002  nathanw Catch up to -current.
 1.37.2.4  01-Aug-2002  nathanw Catch up to -current.
 1.37.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.37.2.2  24-Aug-2001  nathanw Catch up with -current.
 1.37.2.1  21-Jun-2001  nathanw Catch up to -current.
 1.39.2.7  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.39.2.6  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.39.2.5  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.39.2.4  08-Sep-2001  thorpej Add a selnotify(), which does a selwakeup() + KNOTE(), rather than
requiring all callers to do both.

This may be a transitional step only, or it may stick. I haven't
decided yet.
 1.39.2.3  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.39.2.2  03-Aug-2001  lukem update to -current
 1.39.2.1  10-Jul-2001  lukem add calls to KNOTE(9) as appropriate
 1.41.4.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.42.8.2  29-Aug-2002  gehenna catch up with -current.
 1.42.8.1  15-Jul-2002  gehenna catch up with -current.
 1.53.2.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.53.2.5  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.53.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.53.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.53.2.2  03-Aug-2004  skrll Sync with HEAD
 1.53.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.58.2.3  14-Jul-2004  tron Pull up revision 1.65 (requested by jonathan in ticket #648):
Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:
sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.58.2.2  16-Jun-2004  tron Pull up revision 1.64 (requested by jonathan in ticket #503):
Fix potential memory leak in sbappendaddrchain():
We do an MGETHDR() for each mbuf "packet" of the input chain, to hold
the socket address prepended to that "packet". If those MGETHDR()s
ever failed, we would leak all the successfully-allocated mbuf
headers. Leak noted by Yamamoto-san (yamt@NetBSD.org); thanks for catching it!
Add socketbuf invariant-checking macros to sbappendaddrchain(), and
replace a stray bcopy() with memcpy(), also as suggested by Yamamoto-san.
 1.58.2.1  30-May-2004  tron Pull up revision 1.63 (requested by jonathan in ticket #405):
Rework to make FAST_IPSEC PF_KEY dumps unicast and reliable:
Introduce new socket-layer function sbappendaddrchain() to
sys/kern/uipc_socket2.c: like sbappendaddr(), only takes a chain of
records and appends the entire chain in one pass. sbappendaddrchain()
also takes an `sbprio' argument, which indicates the caller requires
special `reliable' handling of the socket-buffer. `sbprio' is
described in sys/sys/socketvar.h, although (for now) the different
levels are not yet implemented.
Rework sys/netipsec/key.c PF_KEY DUMP responses to build a chain of
mbuf records, one record per dump response. Unicast the entire chain
to the requestor, with all-or-none semantics.
Changed files:
sys/socketvar.h kern/uipc_socket2.c netipsec/key.c
Reviewed by:
Jason Thorpe, Thor Lancelot Simon, post to tech-kern.
Todo: request pullup to 2.0 branch. Post-2.0, rework sysctl() API for
dumps to use new record-chain constructors. Actually implement
the distinct service levels in sbappendaddrchain() so we can use them
to make PF_KEY ACQUIRE messages more reliable.
 1.65.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.65.4.1  29-Apr-2005  kent sync with -current
 1.68.2.7  17-Mar-2008  yamt sync with head.
 1.68.2.6  11-Feb-2008  yamt sync with head.
 1.68.2.5  04-Feb-2008  yamt sync with head.
 1.68.2.4  27-Oct-2007  yamt sync with head.
 1.68.2.3  03-Sep-2007  yamt sync with head.
 1.68.2.2  30-Dec-2006  yamt sync with head.
 1.68.2.1  21-Jun-2006  yamt sync with head.
 1.70.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.70.10.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.70.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.70.8.4  03-Sep-2006  yamt sync with head.
 1.70.8.3  11-Aug-2006  yamt sync with head
 1.70.8.2  26-Jun-2006  yamt sync with head.
 1.70.8.1  24-May-2006  yamt sync with head.
 1.70.6.1  01-Jun-2006  kardel Sync with head.
 1.70.4.1  09-Sep-2006  rpaulo sync with head
 1.71.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.77.4.2  10-Dec-2006  yamt sync with head.
 1.77.4.1  22-Oct-2006  yamt sync with head
 1.81.4.1  12-Mar-2007  rmind Sync with HEAD.
 1.82.4.1  11-Jul-2007  mjf Sync with head.
 1.82.2.3  09-Oct-2007  ad Sync with head.
 1.82.2.2  20-Aug-2007  ad Sync with HEAD.
 1.82.2.1  15-Jul-2007  ad Sync with head.
 1.84.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.85.8.2  02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.85.8.1  02-Aug-2007  rmind file uipc_socket2.c was added on branch matt-mips64 on 2007-08-02 02:42:41 +0000
 1.85.6.1  06-Oct-2007  yamt sync with head.
 1.85.4.2  23-Mar-2008  matt sync with HEAD
 1.85.4.1  06-Nov-2007  matt sync with HEAD
 1.85.2.1  02-Oct-2007  joerg Sync with HEAD.
 1.86.4.1  18-Feb-2008  mjf Sync with HEAD.
 1.89.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.89.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.89.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.89.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.89.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.89.2.1  24-Mar-2008  keiichi sync with head.
 1.90.2.4  17-Jun-2008  yamt sync with head.
 1.90.2.3  06-Jun-2008  christos add so_egid and so_cpid for pf.
 1.90.2.2  04-Jun-2008  yamt sync with head
 1.90.2.1  18-May-2008  yamt sync with head.
 1.91.2.5  11-Mar-2010  yamt sync with head
 1.91.2.4  16-Sep-2009  yamt sync with head
 1.91.2.3  19-Aug-2009  yamt sync with head.
 1.91.2.2  04-May-2009  yamt sync with head.
 1.91.2.1  16-May-2008  yamt sync with head.
 1.92.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.92.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.94.2.1  18-Jun-2008  simonb Sync with head.
 1.96.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.96.2.1  19-Oct-2008  haad Sync with HEAD.
 1.100.4.1  02-Feb-2009  snj Pull up following revision(s) (requested by yamt in ticket #393):
sys/kern/uipc_socket.c: revision 1.185
sys/kern/uipc_socket2.c: revision 1.101
sys/kern/uipc_syscalls.c: revision 1.135
sys/miscfs/portal/portal_vnops.c: revision 1.81
sys/netsmb/smb_trantcp.c: revision 1.40
sys/nfs/nfs_socket.c: revision 1.177
sys/sys/socketvar.h: revision 1.118
restore the pre socket locking patch signal behaviour.
this fixes a busy-loop in nfs_connect.
 1.100.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.100.2.1  03-Mar-2009  skrll Sync with HEAD.
 1.101.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.106.6.1  06-Jun-2011  jruoho Sync with HEAD.
 1.106.4.2  31-May-2011  rmind sync with head
 1.106.4.1  21-Apr-2011  rmind sync with head
 1.109.6.1  18-Feb-2012  mrg merge to -current.
 1.109.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.109.2.1  17-Apr-2012  yamt sync with head
 1.110.6.2  03-Dec-2017  jdolecek update from HEAD
 1.110.6.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.112.2.4  18-May-2014  rmind sync with head
 1.112.2.3  17-Oct-2013  rmind Eliminate some of the splsoftnet() calls, misc clean up.
 1.112.2.2  23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.112.2.1  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.115.2.1  10-Aug-2014  tls Rebase.
 1.121.2.3  05-Oct-2016  skrll Sync with HEAD
 1.121.2.2  29-May-2016  skrll Sync with HEAD
 1.121.2.1  22-Sep-2015  skrll Sync with HEAD
 1.123.2.1  04-Nov-2016  pgoyette Sync with HEAD
 1.124.8.4  07-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1330):

sys/kern/uipc_socket2.c: revision 1.134

Fix info leaks: the alignment of the structures causes uninitialized heap
memory to be copied to userland in sys_recvmsg().
 1.124.8.3  31-Jul-2018  martin Pull up following revision(s) (requested by msaitoh in ticket #954):

sys/sys/socketvar.h: revision 1.157
share/man/man4/ddb.4: revision 1.180
share/man/man4/ddb.4: revision 1.181
sys/kern/uipc_socket2.c: revision 1.131
sys/ddb/db_command.c: revision 1.154

Add "show socket" command written by Hiroki SUENAGA. It prints usage of
system's socket buffers.
Improve wording.
 1.124.8.2  09-Jun-2018  martin Pull up following revision(s) (requested by roy in ticket #868):

sys/sys/socketvar.h: revision 1.156
sys/kern/uipc_socket2.c: revision 1.130
sys/kern/uipc_socket.c: revision 1.264

Separate receive socket errors from general socket errors.
 1.124.8.1  09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.126.4.6  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.126.4.5  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.126.4.4  28-Jul-2018  pgoyette Sync with HEAD
 1.126.4.3  25-Jun-2018  pgoyette Sync with HEAD
 1.126.4.2  02-May-2018  pgoyette Synch with HEAD
 1.126.4.1  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.130.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.130.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.130.2.1  10-Jun-2019  christos Sync with HEAD
 1.134.4.1  29-Feb-2020  ad Sync with head.
 1.134.2.3  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

	... usage of resources associated with object ...
	membar_release();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_acquire();
	... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).
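
The `last one out hit the lights' pattern described above can be sketched in portable C. This is only an illustration, not the kern_descrip.c code: it assumes C11 <stdatomic.h> fences as stand-ins for membar_release() and membar_acquire(), and struct obj, obj_create(), obj_hold() and obj_release() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int *resource;
};

/* Allocate an object holding one reference, or return NULL. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	o->resource = malloc(sizeof(*o->resource));
	if (o->resource == NULL) {
		free(o);
		return NULL;
	}
	*o->resource = 0;
	atomic_init(&o->refcnt, 1);
	return o;
}

/* Take an additional reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference. Returns true if this call freed the object. */
static bool
obj_release(struct obj *o)
{
	unsigned old;

	/* Assumed analogue of membar_release(): order our usage first. */
	atomic_thread_fence(memory_order_release);
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old > 0);
	if (old != 1)
		return false;	/* other references remain */

	/* Assumed analogue of membar_acquire(): see all prior usage. */
	atomic_thread_fence(memory_order_acquire);
	free(o->resource);
	free(o);
	return true;
}
```

With the release fence before the decrement and the acquire fence after the final decrement, every holder's use of the object is ordered before the free, which is the happens-before relationship the log message asks for.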

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
 1.134.2.2  02-Oct-2021  martin Pull up following revision(s) (requested by thorpej in ticket #1350):

sys/kern/uipc_socket2.c: revision 1.140
tests/lib/libc/sys/t_poll.c: revision 1.5
sys/miscfs/fifofs/fifo_vnops.c: revision 1.87

- fifo_poll(): If the last writer has disappeared, detect this and return
POLLHUP, per POSIX.
- fifo_close(): Use the new fifo_socantrcvmore(), which is like the
garden-variety socantrcvmore(), except it specifies POLL_HUP rather
than POLL_IN (so the correct code for SIGIO is sent).
- sowakeup(): Allow POLL_HUP as a code (notifies poll'ers with POLLHUP).
- Add test cases for correct POLLHUP behavior with FIFOs.

Fixes PR kern/56429.
 1.134.2.1  22-Sep-2020  martin Pull up following revision(s) (requested by christos in ticket #1091):

sys/kern/uipc_socket.c: revision 1.291
sys/kern/uipc_usrreq.c: revision 1.199
sys/kern/uipc_socket2.c: revision 1.138

add socket info for user and group for unix sockets in fstat.
 1.138.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.143.2.1  02-Aug-2025  perseant Sync with HEAD
