History log of /src/sys/kern/sys_pipe.c
Revision  Date  Author  Comments
 1.168  16-Jul-2025  kre Kernel part of O_CLOFORK implementation (plus kernel revbump)

This is Ricardo Branco's implementation of O_CLOFORK (and
associated fcntl, etc) for NetBSD (with a few minor changes
by me).

For now, the header file symbols that should be exposed to
userland are hidden inside temporary #ifdef _KERNEL blocks,
just to avoid random userland apps, or config scripts, from
seeing any of this before it is better tested.

Userland parts of this will follow soon.

This also bumps the kernel version to 10.99.15 (changes to
data structs, and the signature of fd_dup()).
 1.167  10-Feb-2024  andvar branches: 1.167.2;
fix various typos in comments and log messages.
 1.166  02-Nov-2023  martin Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.
 1.165  13-Oct-2023  ad Simplify/streamline pipes a little bit:

- Allocate only one struct pipe not two (no need to be bidirectional here).
- Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops.
- Never wake the other side or acquire long-term (I/O) lock unless needed.
- Whenever possible, defer wakeups until after locks have been released.
- Do some things locklessly in pipe_ioctl() and pipe_poll().

Some notable results:

- -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process.
- 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine.
- 1.5x bandwidth with 1kB messages on the same 48 CPU machine (8kB: same b/w).
 1.164  05-Oct-2023  ad Update comments to match reality
 1.163  04-Oct-2023  ad pipe1(): call getnanotime() once not twice.
 1.162  04-Oct-2023  ad pipe->pipe_waiters isn't needed on NetBSD, kernel condvars do this for free.
 1.161  04-Oct-2023  ad pipe_read(): try to skip locking the pipe if a non-blocking fd is used, as
is very often the case with BSD make (from FreeBSD/mjg@).
 1.160  22-Apr-2023  riastradh file(9): New fo_posix_fadvise operation.

XXX kernel revbump -- changes struct fileops API and ABI
 1.159  22-Apr-2023  riastradh file(9): New fo_fpathconf operation.

XXX kernel revbump -- struct fileops API and ABI change
 1.158  11-Oct-2021  thorpej Setting EV_EOF requires modifying kn->kn_flags. However, that relies on
holding the kq_lock of that note's kq. Rather than exposing this directly,
add new knote_set_eof() and knote_clear_eof() functions that handle the
necessary locking and don't leak as many implementation details to modules.

NetBSD 9.99.91
 1.157  02-Oct-2021  hannken Fix a deadlock where one thread writes to a pipe, has more data
and no space in the pipe and waits on "pipe_wcv" while the reader
is closing the pipe and waits on "pipe_draincv".

Swap the test for "PIPE_EOF" and the "cv_wait_sig()" in "pipe_write()".

PR bin/56422 "zgrep -l sometimes hangs"
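The ordering fix above can be modelled in userland. The sketch below is a simplified pthread analogue, not the kernel code: struct toy_pipe, toy_wait_for_space() and toy_reader_close() are hypothetical names, and pthread_cond_wait() stands in for the kernel's cv_wait_sig(). It shows the writer testing the EOF flag before it goes to sleep, so it cannot block on the write condvar after the reader has already marked the pipe closed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a pipe's writer-side state. */
struct toy_pipe {
	pthread_mutex_t lock;
	pthread_cond_t  wcv;    /* writers sleep here waiting for space */
	size_t          space;  /* free bytes in the buffer */
	bool            eof;    /* set by the reader when it closes */
};

/*
 * Wait until there is room to write.  The EOF test comes before
 * the sleep, so a writer never goes to sleep on wcv after the
 * reader has already marked the pipe closed.
 */
static int
toy_wait_for_space(struct toy_pipe *p)
{
	int error = 0;

	pthread_mutex_lock(&p->lock);
	while (p->space == 0) {
		if (p->eof) {
			error = EPIPE;
			break;
		}
		pthread_cond_wait(&p->wcv, &p->lock);
	}
	pthread_mutex_unlock(&p->lock);
	return error;
}

/* Reader side of close: mark EOF and wake every waiting writer. */
static void
toy_reader_close(struct toy_pipe *p)
{
	pthread_mutex_lock(&p->lock);
	p->eof = true;
	pthread_cond_broadcast(&p->wcv);
	pthread_mutex_unlock(&p->lock);
}
```

If the EOF test were placed after pthread_cond_wait() instead, a writer arriving after toy_reader_close() would sleep with nobody left to wake it, which is the shape of the hang described in the PR.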
 1.156  27-Sep-2021  thorpej Tweak filt_piperead() and filt_pipewrite() so that:
- There is only a single return from the function (and thus a single
place where the pipe lock must be released).
- kn->kn_data is referenced only inside the lock perimeter.
 1.155  26-Sep-2021  thorpej The pipe kq filter ops are MPSAFE.
 1.154  26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.153  07-Sep-2021  andvar s/aquire/acquire/ in comments, also one typo fix acqure->acquire.
 1.152  25-Jan-2021  dholland Fix a thundering herd problem in pipes.

Wake only one waiter when data becomes available, not all of them.
Waking them all is not a usual case, but turns up with make's job
token pipes. (Probably make's job signalling scheme should also be
revised, assuming rillig hasn't already done that, but that's a
separate issue.)

This change will not do us much good for the moment because we don't
distinguish cv_signal from cv_broadcast for interruptible sleeps, but
that's also a separate problem.

Seen on FreeBSD; from mjg at freebsd a couple months ago. Patch was
mine (iirc) but the real work in this sort of thing is discovering the
problem.
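A rough sketch of why waking every reader is wasteful with token pipes: assuming each job token is a single byte (the '+' byte here is arbitrary), only one reader can consume a given token, so every other reader that gets woken just finds the pipe empty again. token_release() and token_try_acquire() are hypothetical helpers, not make's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put one job token back into the pipe. */
static int
token_release(int wfd)
{
	char tok = '+';

	return write(wfd, &tok, 1) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: try to take one token from a non-blocking
 * read end.  Returns 1 if a token was taken, 0 if none was
 * available, -1 on EOF or any other error.
 */
static int
token_try_acquire(int rfd)
{
	char tok;
	ssize_t n = read(rfd, &tok, 1);

	if (n == 1)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

With one token written, the first reader succeeds and any second reader gets EAGAIN; in the blocking case that second reader would have been woken for nothing.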
 1.151  11-Dec-2020  thorpej Use sel{record,remove}_knote().
 1.150  25-Jun-2020  maxv branches: 1.150.2;
Fix NULL deref. The original code before Jaromir's cleanup had an #ifndef
block that wrongly contained the 'else' statement, causing the NULL check
to have no effect.

Reported-by: syzbot+c41bbfe5a7ff07bf0f99@syzkaller.appspotmail.com
 1.149  25-Jun-2020  jdolecek remove experimental direct pipe code (using uvm_loan()) I added in 2001 - it's
slower than the non-direct variant on MP systems, if anybody wants
to hack on this further it's available in Attic
 1.148  26-Apr-2019  mlelstv branches: 1.148.2;
Handle half-closed pipes in FIONWRITE and FIONSPACE.
 1.147  26-Apr-2019  mlelstv Clean up pipe structure before recycling it.
 1.146  10-Jun-2018  jdolecek branches: 1.146.2;
convert the (still disabled) 'direct write' for pipes to use the
experimental PMAP_DIRECT if available; the direct code paths now survive
longer than the pmap_enter() variant, but still trigger a panic during a
build.sh tools run; remove some obsolete sysctls

add some XXXs to mark places which need attention to make this more stable

Note: the loan case is now actually significantly slower than the
non-loan case on MP systems, due to synchronous IPIs triggered by
marking the page read-only by uvm_loan(); this is being discussed
in the email thread
https://mail-index.netbsd.org/tech-kern/2018/05/21/msg023441.html

that is basically the same issue due to which loaning was disabled
for sosend()
 1.145  19-May-2018  jdolecek Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces that do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.144  20-Apr-2018  jdolecek add prot parameter for uvm_emap_enter(), so that it's possible to
also enter read/write mappings
 1.143  26-Dec-2017  kamil branches: 1.143.2;
Refactor pipe1() and correct a bug in sys_pipe2() (SYS_pipe2)

sys_pipe2() returns two integers (values), the 2nd one is a copy of the 2nd
file descriptor that lands in fildes[2]. This is a side effect of reusing
the code for sys_pipe() (SYS_pipe) and not cleaning it up.

The first returned value is (on success) 0.

Introduced a small refactoring in pipe1() so that it does not operate on
retval[], but on an array int[2]. A user sets retval[] for pipe() when
desired and needed.

This refactoring touches compat code: netbsd32, linux, linux32.

Before the changes on NetBSD/amd64:

$ ktruss -i ./a.out
[...]
15131 1 a.out pipe2(0x7f7fff2e62b8, 0) = 0, 4
[...]

After the changes:

$ ktruss -i ./a.out
[...]
782 1 a.out pipe2(0x7f7fff97e850, 0) = 0
[...]

There should not be a visible change for current users.

Sponsored by <The NetBSD Foundation>
 1.142  30-Nov-2017  christos add fo_name so we can identify the fileops in a simple way.
 1.141  25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
 1.140  05-Sep-2014  matt branches: 1.140.12;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.139  05-Sep-2014  matt Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.
 1.138  25-Feb-2014  pooka branches: 1.138.4;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.137  28-Jun-2013  matt branches: 1.137.2;
Make page loaning in pipes color aware.
 1.136  16-May-2012  martin branches: 1.136.2;
Make sure we can deliver two file descriptors for pipe2() before we set
up anything special (like close on exec).
Fixes PR kern/46457.
 1.135  25-Jan-2012  christos branches: 1.135.2;
As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
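A minimal sketch of the EPIPE-without-SIGPIPE behaviour described above. It passes O_NOSIGPIPE to pipe2() when the platform defines it, as NetBSD does after this change; elsewhere it falls back to ignoring SIGPIPE for the whole process, which is a coarser substitute. write_without_reader() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write one byte to a pipe whose read end is already closed and
 * return the resulting errno (0 if the write unexpectedly worked,
 * -1 if the pipe could not be created).
 */
static int
write_without_reader(void)
{
	int fds[2];

#ifdef O_NOSIGPIPE
	/* Per-descriptor opt-out: EPIPE is returned, no SIGPIPE. */
	if (pipe2(fds, O_NOSIGPIPE) == -1)
		return -1;
#else
	/* Fallback assumption: ignore SIGPIPE process-wide. */
	signal(SIGPIPE, SIG_IGN);
	if (pipe(fds) == -1)
		return -1;
#endif
	close(fds[0]);                  /* no readers left */
	ssize_t n = write(fds[1], "x", 1);
	int saved = (n == -1) ? errno : 0;
	close(fds[1]);
	return saved;
}
```

Either way the caller sees write() fail with EPIPE instead of being killed by the default SIGPIPE action; the O_NOSIGPIPE path has the advantage of not changing signal disposition for unrelated descriptors.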
 1.134  20-Oct-2011  njoly branches: 1.134.2; 1.134.6;
Do call fd_set_exclose() on both file descriptors, to set the
close-on-exec flag.
 1.133  05-Oct-2011  apb When pipe1() calls pipe_create() and it fails, use the error
result from pipe_create(), don't assume it will always be ENOMEM.

From PR 45423 by Greg Woods.
 1.132  15-Jul-2011  christos fail with EINVAL if flags are not O_CLOEXEC|O_NONBLOCK in pipe2(2) and
dup3(2)
 1.131  26-Jun-2011  christos * Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
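The pipe2(2) flags added here can be checked from userland with fcntl(2). This sketch assumes a platform that provides pipe2() (NetBSD, or glibc with _GNU_SOURCE defined); is_cloexec() and is_nonblock() are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes pipe2() on glibc; harmless elsewhere */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd has close-on-exec set, 0 if not, -1 on error. */
static int
is_cloexec(int fd)
{
	int fl = fcntl(fd, F_GETFD);

	return fl == -1 ? -1 : (fl & FD_CLOEXEC) != 0;
}

/* Returns 1 if fd is in non-blocking mode, 0 if not, -1 on error. */
static int
is_nonblock(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	return fl == -1 ? -1 : (fl & O_NONBLOCK) != 0;
}
```

After pipe2(fds, O_CLOEXEC | O_NONBLOCK) succeeds, both helpers should return 1 for fds[0] and fds[1]. Setting the flags atomically at creation is the point of the new call: doing it afterwards with fcntl() leaves a window in which another thread can fork and exec with the descriptors still inheritable.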
 1.130  10-Apr-2011  christos - Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)
 1.129  17-Jan-2011  uebayasi Include internal definitions (uvm/uvm.h) only where necessary.
 1.128  11-Aug-2010  pgoyette branches: 1.128.2;
Keep condvar wmesg within 8-char limit
 1.127  20-Dec-2009  dsl branches: 1.127.2; 1.127.4;
If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request that the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.126  15-Dec-2009  dsl Don't ERESTART write() calls for now.
I suspect some programs don't allow for the partial transfer.
 1.125  13-Dec-2009  dsl Another, better, fix for PR/26567.
Only sleep once within each pipe_read/pipe_write call.
If there is no data/space available after we wakeup return ERESTART so
then the 'fd' number is validated again.
A simple broadcast of the cvs is then enough to evict the correct threads
when close() is called from an active thread.
 1.124  13-Dec-2009  dsl Revert most of the previous change.
Only one fd needs clobbering, not all fds that reference the pipe.
This may be what ad@ realised when he tried to add the same code to
sockets. Unfixes part of PR/26567.
 1.123  12-Dec-2009  dsl Add support for unblocking read/write when close called.
Fixes PR/26567 for pipes.
(NB ad backed out the fix for sockets)
 1.122  10-Dec-2009  dsl Avoid leaking a mutex_obj when pipe_create() fails for the read pipe.
Remove the unused argument from pipeclose().
 1.121  09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.120  06-Dec-2009  dsl Correct comment, pipelock() no longer releases the mutex.
 1.119  31-Aug-2009  rmind Turn off pipe's direct I/O again, it corrupts the data (although build and
various activity survived while testing this). Corruptions also happen on
sparc64 where emap is not in effect, therefore bugs are in direct I/O code.
 1.118  29-Aug-2009  rmind - Re-enable direct I/O with emap for pipe.
- While not used, #ifdef KVA allocation in emap (so it won't burn the space).
 1.117  15-Jul-2009  rmind Revert previous: disable direct I/O on pipe, it caught a problem with emap.
 1.116  13-Jul-2009  rmind Re-enable direct I/O for pipe:
- Larger writes (2 or more pages) will use emap.
- Might help to catch rare hang (some very old bug).
 1.115  28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.114  28-Jun-2009  rmind Amend previous.
 1.113  28-Jun-2009  rmind - Convert some #ifdefs to KASSERT()s.
- KNF, style, no parameters in function declarations.
- No functional changes.
 1.112  11-Apr-2009  christos Fix locking as Andy explained. Also fill in uid and gid like sys_pipe did.
 1.111  11-Apr-2009  christos rename ctime to btime for consistency.
 1.110  11-Apr-2009  christos - maintain timespec internally.
- set birthtime too.
 1.109  04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.108  15-Feb-2009  enami The knote objects attached by peer will still be linked in our list
if we are closed before the peer. So, remove them. It didn't matter
when pipe objects are directly returned to pool, but nowadays they
are cached.
 1.107  06-Feb-2009  enami branches: 1.107.2;
Instead of adding the missing NULL check in pipe_create, let pipe_ctor wait
on buffer allocation. The other allocation is simply an optimization,
so leave it as is.
 1.106  01-Feb-2009  ad Apply pipe patch posted to tech-kern, slightly updated:

- Cache kva.
- Convert to use mutex_obj_alloc().
- Make better use of pool_cache.

Also:

Disable direct transfers for the moment. I believe there may be a bug that
can cause transfers to stall when switching between direct/buffered access.
I think this has most recently been run into on 'denver' but I have seen it
as far back as 3.1.

(As an aside, direct is not a clear win on modern systems with large cache
and high TLB invalidation overhead. Particularly so on MP systems, although
micro benchmarks may report otherwise because they typically do not tax the
system. Anyone want to write a decent benchmark?)
 1.105  20-Jan-2009  yamt fix inverted POLL_ directions.
 1.104  20-Jan-2009  yamt pipeselwakeup: now POLL_HUP != POLL_ERR. remove unnecessary #if.
 1.103  17-Sep-2008  pooka branches: 1.103.2; 1.103.4;
remove M_PIPE (hi rmind!)
 1.102  06-Sep-2008  rmind Replace malloc with kmem.
 1.101  28-Apr-2008  martin branches: 1.101.2; 1.101.6;
Remove clause 3 and 4 from TNF licenses
 1.100  27-Mar-2008  ad branches: 1.100.2; 1.100.4;
Replace use of CACHE_LINE_SIZE in some obvious places.
 1.99  21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.98  01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.97  29-Feb-2008  yamt fix a livelock with multiple readers by separating condvar.
 1.96  23-Feb-2008  chris Add missing pmap_update(pmap_kernel()); calls after pmap_kenter_pa and
pmap_remove.
 1.95  28-Jan-2008  ad branches: 1.95.2; 1.95.6;
- Update global counters using atomics before allocating. When freeing,
update the counters afterwards.
- Cosmetic / code generation changes.
 1.94  04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.93  02-Jan-2008  yamt remove PIPE_WANTW, PIPE_WANTR and PIPE_WANTCLOSE. cv_waiters is enough.
this fixes a deadlock between pipe_direct_write and pipeclose.

XXX this code should be simplified.
it's mostly pointless to have two struct pipes linked together,
esp. when we don't support bi-directional pipes.
 1.92  28-Dec-2007  ad Pull up 1.87.2.8.
 1.91  27-Dec-2007  ad pipe_direct_write: kill a mutex_exit() that escaped.
 1.90  26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.89  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.88  05-Dec-2007  pooka branches: 1.88.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.87  07-Nov-2007  ad branches: 1.87.2;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.86  25-Sep-2007  ad branches: 1.86.2; 1.86.4;
Use selinit() / seldestroy().
 1.85  09-Jul-2007  ad branches: 1.85.6; 1.85.8; 1.85.10;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.84  26-Mar-2007  hubertf Remove duplicate #include's
From: Slava Semushin <php-coder@altlinux.ru>
 1.83  23-Mar-2007  ad Fix a deadlock w/kqueue that was introduced with the last set of changes.
Spotted by yamt@.
 1.82  12-Mar-2007  ad branches: 1.82.2;
Put a lock around pipe->pipe_peer.
 1.81  12-Mar-2007  ad branches: 1.81.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.80  12-Mar-2007  ad Use mutexes & condvars.
 1.79  04-Mar-2007  christos branches: 1.79.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.78  09-Feb-2007  ad branches: 1.78.2;
Merge newlock2 to head.
 1.77  01-Nov-2006  yamt remove some __unused from function parameters.
 1.76  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.75  23-Sep-2006  xtraeme Remove duplicated includes, from Jeff Ito -> PR kern/26113. Thanks.
 1.74  23-Jul-2006  ad branches: 1.74.4; 1.74.6;
Use the LWP cached credentials where sane.
 1.73  07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.72  14-May-2006  elad branches: 1.72.2;
integrate kauth.
 1.71  01-Mar-2006  yamt branches: 1.71.2; 1.71.4; 1.71.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.70  24-Dec-2005  perry branches: 1.70.2; 1.70.4; 1.70.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.69  11-Dec-2005  christos merge ktrace-lwp.
 1.68  07-Dec-2005  thorpej Use ANSI function decls.
 1.67  29-Oct-2005  yamt just use ltsleep rather than lockmgr + PCATCH with horrible timeout dance.
 1.66  11-Sep-2005  christos branches: 1.66.2;
PR/27185: Christian Biere: kqueue: EOF on pipe gains no EVFILT_READ event
Set the PIPE_EOF flag before we wakeup() our peer. While here GC unused
argument from pipeselwakeup() and call it even when fp == NULL.
 1.65  01-Apr-2005  yamt branches: 1.65.2;
merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.64  12-Mar-2005  christos branches: 1.64.2;
PR/29679: Gunnar.Ritter: fstat() blksize on the write side of the pipe returns
0. Fix it by returning the peer's block size.
XXX: This is the minimal fix. Probably the buffer size should be initialized
somewhere else, but probably this would need some more code changes.
 1.63  26-Feb-2005  perry nuke trailing whitespace
 1.62  30-Nov-2004  christos branches: 1.62.4; 1.62.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.61  21-Nov-2004  yamt pipe_direct_write: fallback to non-loan write in the case of
any errors from uvm_loan(), rather than only for ENOMEM, which is
never returned by uvm_loan().
 1.60  14-Nov-2004  atatat Wrap TIMEVAL_TO_TIMESPEC and TIMESPEC_TO_TIMEVAL macros in

do { ... } while(/*CONSTCOND*/0)

so that they can be used unadorned in if/else blocks, etc. This means
that you now *have* to put a ; at the end of the "call" to these
macros.
 1.59  06-Nov-2004  wrstuden Add support for FIONWRITE and FIONSPACE ioctls. FIONWRITE reports
the number of bytes in the send queue, and FIONSPACE reports the
number of free bytes in the send queue. These ioctls permit applications
to monitor file descriptor transmission dynamics.

In examining prior art, FIONWRITE exists with the semantics given
here. FIONSPACE is provided so that programs may easily determine how
much space is left in the send queue; they do not need to know the
send queue size.

The fact that a write may block even if there is enough space in the
send queue for it is noted in the documentation.

FIONWRITE functionality may be used to implement TIOCOUTQ for Linux
emulation - Linux extended this ioctl to sockets, even though they are
not ttys.
 1.58  17-Jul-2004  mycroft PRIBIO -> PSOCK. This emulates the pre-sys_pipe behavior, and avoids including
processes blocked on pipe I/O in the load average.
 1.57  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.56  08-Apr-2004  atatat Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.
 1.55  24-Mar-2004  pooka branches: 1.55.2;
* replace incorrect M_WAITOK flag from pool_get() by proper PR_WAITOK
and remove redundant check for NULL return value
* switch pool page allocator to nointr allocator

jdolecek sayeth ok
 1.54  24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.53  03-Mar-2004  dsl No need to initialise [rw]pipe twice.
Initialise locks before trying to allocate the pipe buffer, so that when allocation
fails we won't explode trying to acquire the locks when tidying up.
 1.52  03-Mar-2004  christos initialize rpipe and wpipe to NULL, so that they are initialized in the
error path.
 1.51  26-Feb-2004  jdolecek pipelock() must release the pipe simplelock during tsleep()
fixes PR kern/24551 by Havard Eidnes
 1.50  24-Feb-2004  christos remove error(1) comment.
 1.49  24-Feb-2004  wiz occured -> occurred. From Peter Postma.
 1.48  22-Feb-2004  jdolecek use the new NOTE_SUBMIT to flag if the locking is necessary
for EVFILT_READ/EVFILT_WRITE knotes

fixes PR kern/23915 by Martin Husemann (pipes), and similar locking problem
in tty code
 1.47  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.46  13-Nov-2003  yamt plug memory leak on error.
 1.45  25-Oct-2003  christos fix uninitialized variable
 1.44  22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.43  21-Sep-2003  jdolecek cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of a descriptor, according to the appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.42  14-Sep-2003  christos ksiginfo_t support.
 1.41  11-Aug-2003  pk Workaround to prevent a lockup in pipelock() in the case that signals are
pending while we must wait for the lock.
 1.40  29-Jun-2003  fvdl branches: 1.40.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.39  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.38  21-Mar-2003  dsl Change 'data' argument to fo_ioctl and fo_fcntl from 'caddr_t' to 'void *'.
Avoids a lot of casting and removes the need for some line breaks.
Removed a load of (caddr_t) casts from calls to copyin/copyout as well.
(approved by christos - he has a plan to remove caddr_t...)
 1.37  12-Mar-2003  dsl Validate that the pgid argument to TIOCSPGRP is part of the current session.
Treat +ve numbers as process group ids and -ve as pids - see tcsetpgrp(3).
(approved by christos)
 1.36  14-Feb-2003  pk On pipe reads, check for EOF before FNONBLOCK to avoid spurious EAGAIN errors.
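The effect of checking EOF before FNONBLOCK is visible from userland: a non-blocking read from an empty pipe whose write end has been closed should report end of file (0) rather than failing with EAGAIN. A small sketch; read_after_writer_closed() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Non-blocking read from an empty pipe after its write end has been
 * closed.  Returns read()'s result (-1 also covers pipe() failure).
 */
static ssize_t
read_after_writer_closed(void)
{
	int fds[2];
	char buf[16];
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	fcntl(fds[0], F_SETFL, O_NONBLOCK);
	close(fds[1]);                  /* writer goes away */
	n = read(fds[0], buf, sizeof(buf));
	close(fds[0]);
	return n;                       /* 0 = EOF, not -1/EAGAIN */
}
```

Testing the non-blocking flag first would turn this case into a spurious EAGAIN, and a caller polling in a loop would never learn that the writer is gone.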
 1.35  12-Feb-2003  pk Make the pipe code mostly MP-safe. There are a few unaddressed race
conditions at points where it's necessary to access both the up-stream
and down-stream parts of the bi-directional pipe data structure. These
are marked `XXXSMP' in the code.

Also, since the changes are pretty invasive, there is little point in keeping
all the "#ifdef FreeBSD" code around; so all of that has been stripped out.
 1.34  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.33  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.32  05-Dec-2002  jdolecek pipe_stat(): add S_IRUSR and S_IWUSR to mode; this is what Linux does,
and seems like generally sensible (more sensible than not doing so), so done
in generic code rather than compat glue only

Change proposed in PR kern/18767 by Emmanuel Dreyfus.
 1.31  26-Nov-2002  christos si_ -> sel_ to avoid conflicts with siginfo.
 1.30  02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.29  01-Nov-2002  kristerw ISO C requires a statement after a label.
 1.28  01-Nov-2002  jdolecek pipe_read(): initialize ocnt before pipelock() call; it might have been
used uninitialized when the pipelock() call would fail
bug found by Krister Walfridsson
 1.27  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework;
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.26  25-Aug-2002  thorpej Fix signed/unsigned comparison warnings from GCC 3.3.
 1.25  17-Mar-2002  atatat branches: 1.25.4;
Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
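The passthrough convention can be sketched in plain C. Everything below is hypothetical and simplified for illustration (the SKETCH_ prefixes, the command numbers and both function names); only the -3 and -4 values come from the entry above. A per-object handler returns EPASSTHROUGH for commands it does not own, and only the generic layer turns that into ENOTTY, so a handler's ERESTART can no longer be mistaken for "unhandled".

```c
#include <errno.h>

/* Values from the 1.25 entry; prefixed to avoid clashing with <errno.h>. */
#define SKETCH_ERESTART		(-3)	/* restart the system call */
#define SKETCH_EPASSTHROUGH	(-4)	/* command not handled here */

/* Hypothetical command numbers, for illustration only. */
enum { CMD_KNOWN = 1, CMD_UNKNOWN = 2 };

/* Per-object handler: handles what it knows, passes the rest through. */
static int
sketch_pipe_ioctl(unsigned long cmd)
{
	switch (cmd) {
	case CMD_KNOWN:
		return 0;
	default:
		return SKETCH_EPASSTHROUGH;
	}
}

/*
 * Generic layer: call the object's handler first.  Only when it
 * passes the command through (and no common command matches, which
 * this sketch does not model) is ENOTTY reported to the caller.
 */
int
sketch_ioctl(unsigned long cmd)
{
	int error = sketch_pipe_ioctl(cmd);

	if (error == SKETCH_EPASSTHROUGH)
		error = ENOTTY;
	return error;
}
```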
 1.24  13-Mar-2002  jdolecek Merge the update to FreeBSD rev 1.95.
Changes:
* MP locking changes (mostly FreeBSD specific)
XXXSMP the MP locking macros are noops on NetBSD for now
* kevent fix (FreeBSD rev. 1.87): when the last reader/writer
disconnects, ensure that anybody who is waiting for the kevent
on the other end of the pipe gets EV_EOF
* kill __P
 1.23  08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.22  28-Feb-2002  thorpej Don't assign NULL to non-pointer variables.
 1.21  18-Dec-2001  chs unmap loaned pages before dropping the loan. some pmaps aren't
expecting pmap_kenter_pa() to be used to replace an existing mapping,
plus it just seems like a bad idea to keep around mappings of pages
that may be freed and reused.
 1.20  11-Dec-2001  jdolecek fix typo in #ifdef __FreeBSD__
Pointed out by Chris Jepeway in private e-mail, thanks!
 1.19  12-Nov-2001  lukem add RCSIDs
 1.18  06-Nov-2001  chs use pmap_kenter_pa() instead of pmap_enter(), this is required for
pages loaned to the kernel. this implies that we also need to
call pmap_kremove() before uvm_km_free().

other general cleanup: remove argument names from prototypes,
rename some variables, etc.
 1.17  28-Oct-2001  jdolecek Avoid using microtime(9) for atime/mtime, we don't need to have it
THAT accurate and microtime(9) is painfully slow on i386 currently.
This speeds up small transfers considerably. The gain for large transfers
is less significant, but notable too.
Bottleneck was found by Andreas Persson (Re: kern/14246).

Performance improvement on a 661 MHz PIII according to hbench (with
PIPE_MINDIRECT=8192):

buffer size  before  after
512              17     49
1024             33    110
2048             52    143
4096             77    163
8192            142    190
64K             577    662
128K            372    392
 1.16  08-Oct-2001  mycroft branches: 1.16.2;
When a pipe was grown to BIG_PIPE_SIZE, we could get in a select()/write() loop
because pipe_poll() and pipe_write() did not agree on when it was okay to write
more data. Fix pipe_write(), since it seems to be the broken one.
 1.15  29-Sep-2001  jdolecek Update the uio resid counts appropriately when any error occurs
(not just EPIPE), so that the higher-level code would note partial
write has happened and DTRT if the write was interrupted due to
e.g. delivery of signal.

This fixes kern/14087 by Frank van der Linden.
Much thanks to Frank for extensive help with debugging this, and review
of the fix.

Note: EPIPE/SIGPIPE delivery behaviour was retained - they're delivered
even if the write was partially successful.
 1.14  25-Sep-2001  jdolecek Take care to transfer whole buffer passed via write(2); write(2) should
not do short writes except when using non-blocking I/O.
This fixes kern/13744 by Geoff C. Wing.

Note this partially undoes rev. 1.5 change. Upon closer examination,
it became apparent that hbench-OS's expectations were not actually justified.
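Callers that can still see short writes (for example on non-blocking descriptors, or with the older 1.5 behaviour) usually wrap write(2) in a loop. A sketch of such a loop, assuming POSIX write(2) semantics; write_all is a hypothetical helper, not a NetBSD API.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, resuming after short writes and
 * after EINTR.  Returns 0 on success, -1 with errno set otherwise.
 * Meant for blocking descriptors: on a non-blocking one EAGAIN is
 * handed back to the caller, who should poll(2) and retry.
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved */
			return -1;
		}
		p += n;			/* skip what was accepted */
		len -= (size_t)n;
	}
	return 0;
}
```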
 1.13  22-Sep-2001  jdolecek add new UVM_LOAN_WIRED flag - the memory pages loaned in TOPAGE case
are only wired if this flag is present (i.e. they are not wired by default now)
loaned pages are unloaned via the new uvm_unloan(); uvm_unloananon() and
uvm_unloanpage() are no longer exported
adjust uvm_unloanpage() to unwire the pages if UVM_LOAN_WIRED is specified
mark uvm_loanuobj() and uvm_loanzero() static also in function implementation

kern/sys_pipe.c: uvm_unloanpage() --> uvm_unloan()
 1.12  20-Sep-2001  jdolecek call pmap_update() after pmap_enter()s
ALWAYS call uvm_unloanpage() in cleanup - it's necessary even
in pipe_loan_free() case, since uvm_km_free() doesn't seem
to implicitly unloan the loaned pages
 1.11  26-Jul-2001  jdolecek branches: 1.11.2;
pipe_create(): explicitly zero whole memory returned from pool_get(), instead
of some selective pieces. This fixes problem with NEW_PIPE in kernels
with DEBUG option, reported via e-mail by Chuck Silvers.

sys_pipe(): g/c fdp, provide it at the chunk of FreeBSD code where it's used
 1.10  18-Jul-2001  thorpej bcopy -> memcpy
 1.9  18-Jul-2001  thorpej bzero -> memset
 1.8  17-Jul-2001  jdolecek comment police
 1.7  17-Jul-2001  jdolecek fix bogus uio->uio_offset check introduced in rev. 1.5, which effectively
disabled loans for writes (a.k.a "direct write"), oops; use uio->uio_resid
for the check instead

don't bother updating uio->uio_offset in pipe_direct_write(), it's not used
by upper layers anyway
 1.6  17-Jul-2001  jdolecek only allocate buffer kva for the end which needs it
 1.5  02-Jul-2001  jdolecek branches: 1.5.2;
Don't try to be too smart about chunking - if the data size is bigger
than PIPE_CHUNK_SIZE, just transfer first PIPE_CHUNK_SIZE and return short
write, expecting the caller to call us again later (if they need). Previous
behaviour (besides being wrong for O_NONBLOCK reads) hung hbench under some
circumstances and other applications may have similar expectations as hbench.
This might also fix port-vax/13333 by Manuel Bouyer.

Other changes to pipe_direct_write() include:
* return short write (and success) on EOF if any data were already read;
we return EPIPE on next write(2) call
* simplify error handling, actually handle uvm_loan() failure correctly,
call pipe_loan_free() on error explicitly and only call uvm_unloan()
if the address space was _not_ already freed by pipe_loan_free()
Thanks Chuck Silvers for uvm_unloan() hints :)

Fall through to the common write path in pipe_write() if pipe_direct_write()
returns ENOMEM, otherwise always break out immediately.
Use uvm_km_valloc_wait() instead of uvm_km_valloc() in pipe_loan_alloc().
 1.4  21-Jun-2001  jdolecek branches: 1.4.2;
Don't include opt_new_pipe.h, it's not needed here
 1.3  21-Jun-2001  jdolecek Oops, fell into rpipe/wpipe trap:
The end we want to do selwakeup() on is not necessarily same as the one
we send SIGIO to. Make pipeselwakeup() accept two parameters and update
callers accordingly. This change fixes behaviour for code, which does
select(2)s on the write end waiting for reader (watched on gv, the problem
manifested itself as an overly long delay before the document was displayed).

Clearly separate the resource free code for FreeBSD
and NetBSD case in pipeclose(), so that it's a bit clearer what's going on.
Also LK_DRAIN the lock before the memory is returned to pipe_pool.

Add missing wakeup() in pipe_write() for PIPE_WANTCLOSE case.
 1.2  16-Jun-2001  jdolecek Add port of high performance pipe implementation written by John S. Dyson
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.

Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only; the code will be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva

All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.

This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.
 1.1  16-Jun-2001  jdolecek branches: 1.1.1;
Initial revision
 1.1.1.2  13-Mar-2002  jdolecek Import FreeBSD sys_pipe.c rev. 1.95.

Changes:
* MP work (FreeBSD specific)
* kqueue fix (rev. 1.87): Make kevents on pipes work as described in
the manpage - when the last reader/writer disconnects, ensure that anybody
who is waiting for the kevent on the other end of the pipe gets EV_EOF.
* kill __P
 1.1.1.1  16-Jun-2001  jdolecek Import FreeBSD sys_pipe.c rev 1.82 for reference (this was used as a base
for the NetBSD port).
 1.4.2.15  11-Dec-2002  thorpej Sync with HEAD.
 1.4.2.14  11-Nov-2002  nathanw Catch up to -current
 1.4.2.13  27-Aug-2002  nathanw Catch up to -current.
 1.4.2.12  29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.4.2.11  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.4.2.10  28-Feb-2002  nathanw LWPify.
 1.4.2.9  08-Jan-2002  nathanw Catch up to -current.
 1.4.2.8  14-Nov-2001  nathanw Catch up to -current.
 1.4.2.7  22-Oct-2001  nathanw Catch up to -current.
 1.4.2.6  08-Oct-2001  nathanw Catch up to -current.
 1.4.2.5  26-Sep-2001  nathanw Catch up to -current.
Again.
 1.4.2.4  21-Sep-2001  nathanw Catch up to -current.
 1.4.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.4.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.4.2.1  21-Jun-2001  nathanw file sys_pipe.c was added on branch nathanw_sa on 2001-06-21 20:07:02 +0000
 1.5.2.11  29-Sep-2002  jdolecek drop (caddr_t) and (void *) casts for kn_hook
 1.5.2.10  22-Sep-2002  jdolecek improve previous - don't try to do anything if the 'read' end of pipe
is already closed, pipe_peer is NULL in that case
 1.5.2.9  21-Sep-2002  jdolecek filt_pipedetach(): for EVFILT_WRITE, need to detach the knote from the
peer's pipe_sel.si_note
this fixes kernel panic for EVFILT_WRITE when the write pipe descriptor
is closed before read or kqueue one
 1.5.2.8  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.5.2.7  07-Aug-2002  jdolecek pullup fix for problem mentioned in FreeBSD-SA-02:37 - local users could
panic the system using the kqueue mechanism
 1.5.2.6  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.5  16-Mar-2002  jdolecek Catch up with -current.
 1.5.2.4  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.5.2.3  08-Sep-2001  thorpej Add a selnotify(), which does a selwakeup() + KNOTE(), rather than
requiring all callers to do both.

This may be a transitional step only, or it may stick. I haven't
decided yet.
 1.5.2.2  07-Sep-2001  thorpej Adapt the kqueue support to NetBSD (not yet tested).
 1.5.2.1  03-Aug-2001  lukem update to -current
 1.11.2.2  11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.11.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.16.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.25.4.1  29-Aug-2002  gehenna catch up with -current.
 1.40.2.11  11-Dec-2005  christos Sync with head.
 1.40.2.10  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.40.2.9  01-Apr-2005  skrll Sync with HEAD.
 1.40.2.8  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.40.2.7  18-Dec-2004  skrll Sync with HEAD.
 1.40.2.6  29-Nov-2004  skrll Sync with HEAD.
 1.40.2.5  14-Nov-2004  skrll Sync with HEAD.
 1.40.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.40.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.40.2.2  03-Aug-2004  skrll Sync with HEAD
 1.40.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review. I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.55.2.2  23-Jul-2004  tron branches: 1.55.2.2.2;
Pull up revision 1.58 (requested by mycroft in ticket #689):
PRIBIO -> PSOCK. This emulates the pre-sys_pipe behavior, and avoids including
processes blocked on pipe I/O in the load average.
 1.55.2.1  21-Apr-2004  jmc Pullup rev 1.56 (requested by atatat in ticket #93)

Lots of sysctl descriptions mostly copied from sysctl(3).
 1.55.2.2.2.1  13-Sep-2005  riz Pull up following revision(s) (requested by christos in ticket #5845):
sys/kern/sys_pipe.c: revision 1.66
PR/27185: Christian Biere: kqueue: EOF on pipe gains no EVFILT_READ event
Set the PIPE_EOF flag before we wakeup() our peer. While here GC unused
argument from pipeselwakeup() and call it even when fp == NULL.
 1.62.6.2  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.62.6.1  25-Jan-2005  yamt convert to new apis.
 1.62.4.1  29-Apr-2005  kent sync with -current
 1.64.2.1  14-Sep-2005  tron Pull up following revision(s) (requested by christos in ticket #773):
sys/kern/sys_pipe.c: revision 1.66
PR/27185: Christian Biere: kqueue: EOF on pipe gains no EVFILT_READ event
Set the PIPE_EOF flag before we wakeup() our peer. While here GC unused
argument from pipeselwakeup() and call it even when fp == NULL.
 1.65.2.12  24-Mar-2008  yamt sync with head.
 1.65.2.11  17-Mar-2008  yamt sync with head.
 1.65.2.10  27-Feb-2008  yamt sync with head.
 1.65.2.9  04-Feb-2008  yamt sync with head.
 1.65.2.8  21-Jan-2008  yamt sync with head
 1.65.2.7  07-Dec-2007  yamt sync with head
 1.65.2.6  15-Nov-2007  yamt sync with head.
 1.65.2.5  27-Oct-2007  yamt sync with head.
 1.65.2.4  03-Sep-2007  yamt sync with head.
 1.65.2.3  26-Feb-2007  yamt sync with head.
 1.65.2.2  30-Dec-2006  yamt sync with head.
 1.65.2.1  21-Jun-2006  yamt sync with head.
 1.66.2.1  02-Nov-2005  yamt sync with head.
 1.70.6.3  01-Jun-2006  kardel Sync with head.
 1.70.6.2  22-Apr-2006  simonb Sync with head.
 1.70.6.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.70.4.1  09-Sep-2006  rpaulo sync with head
 1.70.2.1  31-Dec-2005  yamt uio_segflg/uio_lwp -> uio_vmspace.
 1.71.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.71.4.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.71.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.71.2.3  11-Aug-2006  yamt sync with head
 1.71.2.2  26-Jun-2006  yamt sync with head.
 1.71.2.1  24-May-2006  yamt sync with head.
 1.72.2.1  19-Jun-2006  chap Sync with head.
 1.74.6.2  10-Dec-2006  yamt sync with head.
 1.74.6.1  22-Oct-2006  yamt sync with head
 1.74.4.2  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.74.4.1  18-Nov-2006  ad Sync with head.
 1.78.2.3  15-Apr-2007  yamt sync with head.
 1.78.2.2  24-Mar-2007  yamt sync with head.
 1.78.2.1  12-Mar-2007  rmind Sync with HEAD.
 1.79.2.6  01-Sep-2007  ad Use pool_cache for allocating a few more types of objects.
 1.79.2.5  30-Aug-2007  ad Add selinit() and seldestroy(), and use with pipes. Occasionally an LWP
can still be waiting on an object after it has been destroyed.
 1.79.2.4  28-Apr-2007  ad Fix locking botch.
 1.79.2.3  10-Apr-2007  ad Sync with head.
 1.79.2.2  10-Apr-2007  ad Fix a deadlock against kernel_lock.
 1.79.2.1  13-Mar-2007  ad Sync with head.
 1.81.2.1  11-Jul-2007  mjf Sync with head.
 1.82.2.1  29-Mar-2007  reinoud Pullup to -current
 1.85.10.1  06-Oct-2007  yamt sync with head.
 1.85.8.4  23-Mar-2008  matt sync with HEAD
 1.85.8.3  09-Jan-2008  matt sync with HEAD
 1.85.8.2  08-Nov-2007  matt sync with -HEAD
 1.85.8.1  06-Nov-2007  matt sync with HEAD
 1.85.6.3  09-Dec-2007  jmcneill Sync with HEAD.
 1.85.6.2  11-Nov-2007  joerg Sync with HEAD.
 1.85.6.1  02-Oct-2007  joerg Sync with HEAD.
 1.86.4.4  18-Feb-2008  mjf Sync with HEAD.
 1.86.4.3  27-Dec-2007  mjf Sync with HEAD.
 1.86.4.2  08-Dec-2007  mjf Sync with HEAD.
 1.86.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.86.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.87.2.8  28-Dec-2007  ad pipe_kqfilter, filt_pipedetach: fix a NULL deref.
 1.87.2.7  27-Dec-2007  ad Pull up 1.91.
 1.87.2.6  26-Dec-2007  ad Sync with head.
 1.87.2.5  21-Dec-2007  ad A couple of missing calls to mutex_enter().
 1.87.2.4  18-Dec-2007  ad Fix problems with the locking and simplify a bit.
 1.87.2.3  15-Dec-2007  ad Use atomic ops to maintain global counters.
 1.87.2.2  15-Dec-2007  ad Share a single mutex between both ends of the pipe and remove all the
crappy code that deals with locking in the wrong direction.
 1.87.2.1  08-Dec-2007  ad Sync with head.
 1.88.4.2  08-Jan-2008  bouyer Sync with HEAD
 1.88.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.95.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.95.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.95.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.95.2.1  24-Mar-2008  keiichi sync with head.
 1.100.4.6  09-Oct-2010  yamt sync with head
 1.100.4.5  11-Mar-2010  yamt sync with head
 1.100.4.4  16-Sep-2009  yamt sync with head
 1.100.4.3  18-Jul-2009  yamt sync with head.
 1.100.4.2  04-May-2009  yamt sync with head.
 1.100.4.1  16-May-2008  yamt sync with head.
 1.100.2.1  18-May-2008  yamt sync with head.
 1.101.6.1  19-Oct-2008  haad Sync with HEAD.
 1.101.2.4  24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.101.2.3  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.101.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.101.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.103.4.5  04-Apr-2009  snj branches: 1.103.4.5.4;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.103.4.4  24-Feb-2009  snj Pull up following revision(s) (requested by enami/joerg in ticket #468):
sys/kern/sys_pipe.c: revision 1.108
The knote objects attached by peer will still be linked in our list
if we are closed before the peer. So, remove them. It didn't matter
when pipe objects are directly returned to pool, but nowadays they
are cached.
 1.103.4.3  24-Feb-2009  snj Pull up following revision(s) (requested by enami/joerg in ticket #468):
sys/kern/sys_pipe.c: revision 1.107
Instead of adding the missing NULL check in pipe_create, let pipe_ctor wait
on buffer allocation. The other allocation is simply an optimization,
so leave it as is.
 1.103.4.2  24-Feb-2009  snj Pull up following revision(s) (requested by enami/joerg in ticket #468):
sys/kern/sys_pipe.c: revision 1.106
sys/sys/pipe.h: revision 1.25
Apply pipe patch posted to tech-kern, slightly updated:
- Cache kva.
- Convert to use mutex_obj_alloc().
- Make better use of pool_cache.
Also:
Disable direct transfers for the moment. I believe there may be a bug that
can cause transfers to stall when switching between direct/buffered access.
I think this has most recently been run into on 'denver' but I have seen it
as far back as 3.1.
(As an aside, direct is not a clear win on modern systems with large cache
and high TLB invalidation overhead. Particularly so on MP systems, although
micro benchmarks may report otherwise because they typically do not tax the
system. Anyone want to write a decent benchmark?)
 1.103.4.1  02-Feb-2009  snj Pull up following revision(s) (requested by yamt in ticket #394):
sys/kern/sys_pipe.c: revision 1.105
fix inverted POLL_ directions.
 1.103.4.5.4.1  25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.103.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.103.2.1  03-Mar-2009  skrll Sync with HEAD.
 1.107.2.2  23-Jul-2009  jym Sync with HEAD.
 1.107.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.127.4.2  21-Apr-2011  rmind sync with head
 1.127.4.1  05-Mar-2011  rmind sync with head
 1.127.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.128.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.134.6.2  02-Jun-2012  mrg sync to latest -current.
 1.134.6.1  18-Feb-2012  mrg merge to -current.
 1.134.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.134.2.2  23-May-2012  yamt sync with head.
 1.134.2.1  17-Apr-2012  yamt sync with head
 1.135.2.1  19-May-2012  riz Pull up following revision(s) (requested by martin in ticket #270):
sys/kern/sys_pipe.c: revision 1.136
tests/lib/libc/sys/t_pipe2.c: revision 1.4
tests/lib/libc/sys/t_pipe2.c: revision 1.5
tests/lib/libc/sys/t_pipe2.c: revision 1.6
tests/lib/libc/sys/t_pipe2.c: revision 1.7
Make sure we can deliver two file descriptors for pipe2() before we set
up anything special (like close on exec).
Fixes PR kern/46457.
Add a case for PR kern/46457. This is skipped for the time being, as it
reproduces the panic described in the PR.
Enable the test for PR kern/46457 now that it does not crash the
kernel any more.
Fix typo in comment.
Simplify the test for PR kern/4645 and make it independent of resource
settings.
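From userland, the contract this fix protects can be checked with a short sketch. It assumes a system that provides pipe2(2) (glibc needs _GNU_SOURCE to declare it), and pipe2_cloexec_both is a hypothetical name. When pipe2() succeeds with O_CLOEXEC, both returned descriptors should carry FD_CLOEXEC. The sketch does not reproduce the descriptor-exhaustion case from PR kern/46457.

```c
#define _GNU_SOURCE		/* glibc declares pipe2() only with this */
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if both descriptors from pipe2(O_CLOEXEC) have FD_CLOEXEC
 * set, 0 if either lacks it, and -1 if any call fails.
 */
int
pipe2_cloexec_both(void)
{
	int fds[2], f0, f1;

	if (pipe2(fds, O_CLOEXEC) == -1)
		return -1;
	f0 = fcntl(fds[0], F_GETFD);
	f1 = fcntl(fds[1], F_GETFD);
	close(fds[0]);
	close(fds[1]);
	if (f0 == -1 || f1 == -1)
		return -1;
	return (f0 & FD_CLOEXEC) && (f1 & FD_CLOEXEC);
}
```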
 1.136.2.2  03-Dec-2017  jdolecek update from HEAD
 1.136.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.137.2.1  18-May-2014  rmind sync with head
 1.138.4.1  01-May-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1692):

sys/kern/sys_pipe.c: revision 1.147
sys/kern/sys_pipe.c: revision 1.148

Clean up pipe structure before recycling it.

Handle half-closed pipes in FIONWRITE and FIONSPACE.
 1.140.12.2  08-Oct-2021  martin Pull up following revision(s) (requested by hannken in ticket #1698):

sys/kern/sys_pipe.c: revision 1.157

Fix a deadlock where one thread writes to a pipe, has more data
and no space in the pipe and waits on "pipe_wcv" while the reader
is closing the pipe and waits on "pipe_draincv".

Swap the test for "PIPE_EOF" and the "cv_wait_sig()" in "pipe_write()".

PR bin/56422 "zgrep -l sometimes hangs"
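The ordering fix can be illustrated with a userland analogue using POSIX threads. This is a simplification under stated assumptions, not the kernel code: struct toy_pipe, toy_pipe_write() and toy_pipe_close_reader() are hypothetical, capacity is counted in abstract units, and the reader's own wait on pipe_draincv is not modelled. The point is that the writer tests for EOF before each wait, under the same lock the closer holds while setting it, so a wakeup sent before the writer went to sleep cannot be lost.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pipe: one lock, one writer condvar. */
struct toy_pipe {
	pthread_mutex_t	lock;
	pthread_cond_t	wcv;	/* writers sleep here waiting for space */
	size_t		used;	/* units currently buffered */
	size_t		cap;	/* buffer capacity in units */
	bool		eof;	/* set once the reader has closed */
};

/*
 * Write len units.  EOF is tested before every wait, under the lock,
 * so a close that happened before this writer went to sleep is seen
 * and the writer returns EPIPE instead of sleeping forever.
 */
int
toy_pipe_write(struct toy_pipe *p, size_t len)
{
	int error = 0;

	pthread_mutex_lock(&p->lock);
	while (len > 0) {
		if (p->eof) {
			error = EPIPE;
			break;
		}
		if (p->used == p->cap) {
			pthread_cond_wait(&p->wcv, &p->lock);
			continue;	/* loop back and re-test EOF first */
		}
		p->used++;	/* "copy" one unit into the buffer */
		len--;
	}
	pthread_mutex_unlock(&p->lock);
	return error;
}

/* Reader-side close: mark EOF and wake every sleeping writer. */
void
toy_pipe_close_reader(struct toy_pipe *p)
{
	pthread_mutex_lock(&p->lock);
	p->eof = true;
	pthread_cond_broadcast(&p->wcv);
	pthread_mutex_unlock(&p->lock);
}
```

With the opposite order (wait first, then test EOF), a writer facing a full buffer after the reader had already closed would block on wcv with nobody left to wake it.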
 1.140.12.1  01-May-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1253):

sys/kern/sys_pipe.c: revision 1.147
sys/kern/sys_pipe.c: revision 1.148

Clean up pipe structure before recycling it.

Handle half-closed pipes in FIONWRITE and FIONSPACE.
 1.143.2.3  25-Jun-2018  pgoyette Sync with HEAD
 1.143.2.2  21-May-2018  pgoyette Sync with HEAD
 1.143.2.1  22-Apr-2018  pgoyette Sync with HEAD
 1.146.2.1  10-Jun-2019  christos Sync with HEAD
 1.148.2.1  08-Oct-2021  martin Pull up following revision(s) (requested by hannken in ticket #1357):

sys/kern/sys_pipe.c: revision 1.157

Fix a deadlock where one thread writes to a pipe, has more data
and no space in the pipe and waits on "pipe_wcv" while the reader
is closing the pipe and waits on "pipe_draincv".

Swap the test for "PIPE_EOF" and the "cv_wait_sig()" in "pipe_write()".

PR bin/56422 "zgrep -l sometimes hangs"
 1.150.2.2  03-Apr-2021  thorpej Sync with HEAD.
 1.150.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.167.2.1  02-Aug-2025  perseant Sync with HEAD
