History log of /src/sys/miscfs/fdesc/fdesc_vnops.c
Revision  Date  Author  Comments
 1.140  27-Mar-2022  christos dedup the eofs link/symlink methods
 1.139  15-Jan-2022  riastradh sys/fs/fdesc: Delete silly vnop #define aliases.
 1.138  29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
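
The calling convention described above can be modelled outside the kernel roughly as follows. This is only a sketch: `sketch_open`, `struct sketch_vnode`, the `wants_fd`/`wants_move` inputs, and the descriptor number 3 are hypothetical stand-ins, not the real vn_open() interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode. */
struct sketch_vnode {
	int id;
};

/*
 * Model of an open routine that yields either a vnode or a file
 * descriptor.  wants_fd stands in for the underlying code requesting a
 * descriptor; wants_move selects between the "move" and "dup" flavours.
 * Callers that cannot cope with descriptors pass NULL for ret_domove
 * and ret_fd and get EOPNOTSUPP if a descriptor would be produced.
 */
static int
sketch_open(bool wants_fd, bool wants_move, struct sketch_vnode **ret_vp,
    bool *ret_domove, int *ret_fd)
{
	static struct sketch_vnode vn = { 1 };

	assert(ret_vp != NULL);

	if (!wants_fd) {
		/* Ordinary case: hand back a vnode. */
		*ret_vp = &vn;
		return 0;
	}
	if (ret_domove == NULL || ret_fd == NULL)
		return EOPNOTSUPP;

	/* Descriptor case: no vnode, a descriptor plus the move/dup flag. */
	*ret_vp = NULL;
	*ret_domove = wants_move;
	*ret_fd = 3;	/* arbitrary descriptor number for illustration */
	return 0;
}
```

In this model a caller that only understands vnodes never sees a magic errno; it either gets a vnode or a plain EOPNOTSUPP.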
 1.137  29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
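
As an illustration of the default behaviour described above (consume everything up to the first '/'), a string-only helper might look like the sketch below. The name `sketch_parsepath` and its signature are assumptions made for this example; the real genfs_parsepath interface is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the default parsepath behaviour: return how
 * many bytes of `path` make up the first component, i.e. the offset of
 * the first '/' or the string length if there is none.  Only the string
 * is needed; nothing else from the lookup state is consulted.
 */
static size_t
sketch_parsepath(const char *path)
{
	size_t len = 0;

	assert(path != NULL);
	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

A file system wanting different component boundaries would supply its own function with the same shape and consume more or less of the path.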
 1.136  28-Jun-2021  chs VOP_BMAP() may be called via ioctl(FIOGETBMAP) on any vnode that applications
can open. Change various pseudo-fs *_bmap methods to return an error instead of
panicking.

Reported-by: syzbot+8289a3eaf2ba60958c87@syzkaller.appspotmail.com
 1.135  01-May-2021  hannken Make sure fdesc_lookup() never returns VNON vnodes.

Should fix PR kern/56130 (fdescfs create nodes with wrong major number)
 1.134  27-Jun-2020  christos branches: 1.134.6;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.133  16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.132  01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
 1.131  02-Jan-2020  thorpej branches: 1.131.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.130  03-Sep-2018  riastradh branches: 1.130.4;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
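
The truncation that motivates the rename can be shown with a small user-space sketch. The `uimin` below is modelled on the description above (a function taking unsigned int), not copied from libkern; the two `clamp_*` helpers are hypothetical, and the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on the description above: both arguments are unsigned int. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: operands keep their own type, but may be evaluated twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical caller working with 64-bit lengths. */
static uint64_t
clamp_with_uimin(uint64_t len, uint64_t limit)
{
	/* Both arguments are implicitly converted to unsigned int here. */
	return uimin(len, limit);
}

static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	/* No conversion: the comparison is done on the full 64-bit values. */
	return MIN(len, limit);
}
```

With `len = 0x100000001` and `limit = 0x200000000`, the function-based version compares the low 32 bits (1 and 0) and returns 0, while the macro version returns `0x100000001`.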
 1.129  26-May-2017  riastradh branches: 1.129.8; 1.129.10;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.128  11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.127  20-Aug-2016  hannken branches: 1.127.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.126  20-Apr-2015  riastradh branches: 1.126.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.125  05-Sep-2014  christos branches: 1.125.2;
The comment about toxicity was correct, restore VNON setting code and
then set the proper type in lookup.
 1.124  05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.123  04-Sep-2014  christos remove debugging.
 1.122  04-Sep-2014  christos Well, nasty things happen if you set /dev/tty to VNON too. Disable for now.
 1.121  25-Jul-2014  dholland branches: 1.121.2;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.120  13-Jul-2014  hannken Change fdesc from hashlist to vcache.
 1.119  20-Mar-2014  christos branches: 1.119.2;
kill sprintf
 1.118  27-Feb-2014  hannken The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.117  07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.116  23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.115  17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.114  16-Oct-2011  hannken branches: 1.114.2; 1.114.12; 1.114.16;
VOP_GETATTR() needs a shared lock at least.
 1.113  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.112  21-Jul-2010  hannken branches: 1.112.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.111  16-Jul-2010  hannken Use a kmutex to protect the hash chains and always take this mutex
before removing a node from the hash chain.

Release the hash list lock before calling getnewvnode() and check the
hash list again like other file systems do.

Take v_interlock before calling vget().
 1.110  24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.109  08-Jan-2010  pooka branches: 1.109.2; 1.109.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.108  31-Jul-2009  pooka Do a name-based search for the ctty major instead of requiring an
external symbol.
 1.107  24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
 1.106  15-Mar-2009  cegger ansify function definitions
 1.105  14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.104  17-Dec-2008  cegger branches: 1.104.2;
kill MALLOC and FREE macros.
 1.103  05-May-2008  ad branches: 1.103.8;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.102  21-Mar-2008  ad branches: 1.102.2; 1.102.4;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.101  08-Dec-2007  pooka branches: 1.101.12;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.100  26-Nov-2007  pooka branches: 1.100.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.99  08-Oct-2007  ad branches: 1.99.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.98  27-Jul-2007  pooka branches: 1.98.4; 1.98.6; 1.98.8; 1.98.10;
whoops, forgot to commit this a while back: initialize new vnode size
 1.97  09-Jul-2007  ad branches: 1.97.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.96  09-Feb-2007  ad branches: 1.96.6; 1.96.8;
Merge newlock2 to head.
 1.95  09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.94  16-Nov-2006  christos branches: 1.94.2;
__unused removal on arguments; approved by core.
 1.93  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.92  14-May-2006  elad branches: 1.92.8; 1.92.10;
integrate kauth.
 1.91  04-Apr-2006  christos Coverity CID 1140: NULL dereference cannot happen, but protect against it.
 1.90  01-Mar-2006  yamt branches: 1.90.2; 1.90.4; 1.90.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.89  11-Dec-2005  christos branches: 1.89.2; 1.89.4; 1.89.6;
merge ktrace-lwp.
 1.88  02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.87  14-Sep-2005  christos branches: 1.87.2;
When readdir() is called from vfs_getcwd, uio->uio_procp is NULL. Deal with
that. Fixes 'cd /dev/fd && pwd'
 1.86  30-Aug-2005  xtraeme Remove __P()
 1.85  19-Aug-2005  christos 64 bit inode changes.
 1.84  29-May-2005  christos branches: 1.84.2;
- sprinkle const
- avoid shadowed variables.
 1.83  26-Feb-2005  perry nuke trailing whitespace
 1.82  30-Nov-2004  christos branches: 1.82.4; 1.82.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.81  27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.80  21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.79  13-Sep-2003  jdolecek move dupfd from struct proc to struct lwp - it's per-LWP, not per-process; we
use curlwp where the lwp is not directly available, i.e. in device open
routines

briefly discussed on tech-kern
 1.78  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.77  29-Jun-2003  fvdl branches: 1.77.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.76  29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.75  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.74  10-Apr-2003  jdolecek use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.73  23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.72  23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.71  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.70  06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- Backward compatibility for loading block/character device
switches via the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.

- The restriction on assigning device majors by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switches.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.69  02-Apr-2002  jdolecek branches: 1.69.2;
Changes to make it less likely to need to be revisited later again:
* fdesc_attr(): don't panic for 'unknown' descriptor types, rather use
(*fp->f_ops->fo_stat)() hook, as for DTYPE_SOCKET and DTYPE_PIPE
XXX perhaps use different vnode type than VBAD for these?
* fdesc_setattr(): just return 0 regardless of type, rather than panicking
for 'unknown' descriptor types
 1.68  02-Apr-2002  jmc Treat pipes like sockets and don't do setattr on them
 1.67  06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.66  15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.65  10-Nov-2001  lukem add RCSIDs
 1.64  16-Jun-2001  jdolecek branches: 1.64.2; 1.64.4; 1.64.6;
Add DTYPE_PIPE (to be used by new pipe implementation) and handle
it accordingly.
 1.63  14-Jun-2001  thorpej Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
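
The "larval" scheme described above can be sketched in a few lines of user-space C. Everything here is hypothetical (`sketch_table`, `sketch_fdalloc`, `sketch_getfile`, `sketch_mature`, the fixed table size), and the sketch deliberately omits the locking a real descriptor table needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NFILES 4

/* Hypothetical descriptor slot: filled or not, and still under construction or not. */
struct sketch_file {
	bool in_use;
	bool larval;
};

static struct sketch_file sketch_table[SKETCH_NFILES];

/* Allocation sees larval slots as filled, so it never hands one out twice. */
static int
sketch_fdalloc(void)
{
	for (int fd = 0; fd < SKETCH_NFILES; fd++) {
		if (!sketch_table[fd].in_use) {
			sketch_table[fd].in_use = true;
			sketch_table[fd].larval = true;
			return fd;
		}
	}
	return -1;
}

/* Every other consumer goes through here and treats larval slots as absent. */
static struct sketch_file *
sketch_getfile(int fd)
{
	if (fd < 0 || fd >= SKETCH_NFILES)
		return NULL;
	if (!sketch_table[fd].in_use || sketch_table[fd].larval)
		return NULL;
	return &sketch_table[fd];
}

/* Marking a descriptor "mature" just clears the larval flag. */
static void
sketch_mature(int fd)
{
	assert(fd >= 0 && fd < SKETCH_NFILES && sketch_table[fd].in_use);
	sketch_table[fd].larval = false;
}
```

A concurrent close() that looks the descriptor up through `sketch_getfile` before construction finishes simply sees no descriptor, which is the race the commit message describes closing.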
 1.62  09-Apr-2001  jdolecek Change the first arg to fileops fo_stat routine to struct file *, adjust
callers and appropriate routines to cope. This makes fo_stat more
consistent with rest of fileops routines and also makes the fo_stat
match FreeBSD as an added bonus.
Discussed with Luke Mewburn on tech-kern@.
 1.61  09-Apr-2001  jdolecek Call file descriptor stat function via (*fp->f_ops->fo_stat) instead
of a switch statement and explicit call.
Sprinkle some FILE_USE()/FILE_UNUSE() as appropriate.
 1.60  07-Apr-2001  jdolecek Adapt to struct fileops, soo_stat() changes.
Pointed out by Bernd Ernesti in private mail.
 1.59  06-Mar-2001  jmc XXX: Temporary work around to fdesc truncating files when it shouldn't. Treat
setattr calls on underlying vnodes the same as sockets and just return 0.

This whole thing needs to be gutted and replaced with either fall throughs
to specfs (the attr forwarding is just bizarre and leads to weird crap like
the above truncation problems), or better yet a real cloning device node.
 1.58  22-Jan-2001  jdolecek branches: 1.58.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.57  08-Nov-2000  ad Update for hashinit() change.
 1.56  03-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable sized allocations.
 1.55  27-May-2000  thorpej sleep() -> tsleep()
 1.54  16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.53  25-Aug-1999  sommerfeld branches: 1.53.2; 1.53.8;
Change variable used for directory offset from "int" to "off_t".
Overkill, but avoids a host of truncation problems.
 1.52  24-Aug-1999  sommerfeld Fix PR8270:

Problem turned out to be due to improper handling of reads beyond EOF:
they should just return without error with the uio unchanged, and the
caller will recognize this as a zero-byte return (EOF).

The previous fix to protect directory reads against bogus uio_offset
values returned EINVAL, which broke mount -o union, which only
union'ed in the lower directory if the upper directory cleanly
returned EOF.

While we're here, protect kernfs as well.
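
The EOF rule described above (return success and leave the uio unchanged when the offset is at or past the end) can be sketched as follows. `struct sketch_uio` and `sketch_read` are simplified, hypothetical stand-ins for the kernel's uio machinery, not the actual fdesc readdir code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a uio: one buffer and an offset. */
struct sketch_uio {
	char   *buf;	/* where the next byte goes */
	size_t  resid;	/* bytes the caller still wants */
	size_t  offset;	/* current position in the source data */
};

/*
 * Copy from data[offset..] into the uio.  A read starting at or beyond
 * the end returns 0 with the uio untouched, so the caller sees a
 * zero-byte transfer and recognizes EOF, instead of getting an error.
 */
static int
sketch_read(const char *data, size_t datalen, struct sketch_uio *uio)
{
	size_t n;

	if (uio->offset >= datalen)
		return 0;

	n = datalen - uio->offset;
	if (n > uio->resid)
		n = uio->resid;
	memcpy(uio->buf, data + uio->offset, n);
	uio->buf += n;
	uio->resid -= n;
	uio->offset += n;
	return 0;
}
```

A caller can detect EOF by comparing `resid` before and after the call: if it did not change and the return value is 0, nothing was left to read.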
 1.51  14-Aug-1999  christos protect against large uio_offsets
 1.50  03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.49  19-Jul-1999  thorpej From Bill Studenmund: unlock the fdescfs "/dev/tty" vnode before calling
cttyread()/cttywrite(), and lock it again when it returns.

Squashes the somewhat bizarre lossage I was observing w/ more(1), sudo(1),
etc.
 1.48  08-Jul-1999  wrstuden Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.47  13-Aug-1998  kleink branches: 1.47.6; 1.47.8;
Per POSIX, fail with EINVAL if advisory locking is attempted on a file type
that doesn't support it, rather than using a homegrown EBADF or EOPNOTSUPP.
 1.46  09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.45  03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.44  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.43  07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.42  10-Oct-1997  fvdl Bump last argument to VOP_READDIR to off_t (from u_long).
 1.41  05-May-1997  mycroft branches: 1.41.4;
Eliminate bogus uses of V{READ,WRITE,EXEC}. Use S_I[RWX]{USR,GRP,OTH} where
appropriate.
 1.40  16-Apr-1997  fvdl fdesc_seek -> genfs_seek, not genfs_badop
 1.39  11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.38  25-Oct-1996  cgd define path name string variables that we should not (and, thankfully, do
not) modify as 'const char *' rather than 'char *'.
 1.37  13-Oct-1996  christos backout previous kprintf changes
 1.36  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.35  07-Sep-1996  mycroft Implement poll(2).
 1.34  01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.33  14-Jun-1996  mrg use VATTR_NULL macro.
 1.32  11-Apr-1996  mrg fix long-time bug in fdesc -- /dev/tty was a named pipe rather than a
mirror image of the real /dev/tty, a char dev. make it a char dev.
 1.31  13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.30  09-Feb-1996  christos miscfs prototype changes
 1.29  09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.28  01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.27  09-Oct-1995  mycroft /dev/std* are of type DT_LNK.
 1.26  09-Oct-1995  mycroft Use the index number as the cookie, rather than multiplying by UIO_MX.
 1.25  09-Oct-1995  mycroft Add support for cookies, mostly from Greg Hudson.
 1.24  14-Dec-1994  mycroft Remove a_fp.
 1.23  14-Dec-1994  mycroft Revert dup handling.
 1.22  13-Dec-1994  mycroft Sync with CSRG.
 1.21  04-Dec-1994  mycroft Use fddupopen(), just like fdopen() does.
 1.20  14-Nov-1994  christos fixed struct comment
 1.19  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.18  20-Oct-1994  cgd update for new syscall args description mechanism
 1.17  19-Aug-1994  mycroft Convert hash tables.
 1.16  14-Jul-1994  mycroft Fix a fencepost error.
 1.15  29-Jun-1994  cgd branches: 1.15.2;
New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.14  08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.13  04-May-1994  cgd Rename a lot of process flags.
 1.12  25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.11  09-Jan-1994  ws Note that NFS mounting of fdesc doesn't make sense
 1.10  05-Jan-1994  cgd don't try to reclaim 'known' root vnode
 1.9  05-Jan-1994  cgd fix UFS vs 'real' fs type mixups
 1.8  05-Jan-1994  cgd update with latest fdesc file system from jsp@sequent.com
 1.7  23-Dec-1993  cgd fix fdesc_print return type (again)
 1.6  07-Sep-1993  ws branches: 1.6.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.5  02-Aug-1993  mycroft Make fdesc_print have a return type of void.
 1.4  07-Jun-1993  cgd give various filesystems their own vnode types
 1.3  30-Mar-1993  cgd added . and ..
 1.2  25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1  23-Mar-1993  cgd branches: 1.1.1;
files which implement the fdesc filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.2.3  06-Jan-1994  pk Re-instate EOPNOTSUPP.
 1.6.2.2  28-Dec-1993  pk Use ENODEV rather than EOPNOTSUPP for unsupported operations on non-socket devices
 1.6.2.1  14-Nov-1993  mycroft Canonicalize all #includes.
 1.15.2.2  19-Aug-1994  mycroft update from trunk
 1.15.2.1  15-Jul-1994  cgd fix fencepost error. from trunk.
 1.41.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.47.8.1  02-Aug-1999  thorpej Update from trunk.
 1.47.6.1  28-Aug-1999  he Pull up revisions 1.51-1.53:
Protect {fdesc,kernfs,procfs}_readdir against directory seeks
with bogus offsets. (sommerfeld)
 1.53.8.1  21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.53.2.5  21-Apr-2001  bouyer Sync with HEAD
 1.53.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.53.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.53.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.58.2.7  11-Nov-2002  nathanw Catch up to -current
 1.58.2.6  17-Sep-2002  nathanw Catch up to -current.
 1.58.2.5  17-Apr-2002  nathanw Catch up to -current.
 1.58.2.4  08-Jan-2002  nathanw Catch up to -current.
 1.58.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.58.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.58.2.1  09-Apr-2001  nathanw Catch up with -current.
 1.64.6.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.64.4.2  18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.64.4.1  07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.64.2.6  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.64.2.5  28-Sep-2002  jdolecek fdesc_kqfilter(): for Fdesc, invoke kqfilter of the underlying descriptor, and
fallback to genfs_kqfilter() for other files
 1.64.2.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.64.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.64.2.2  08-Sep-2001  thorpej Add kqueue support to "/dev/tty".
 1.64.2.1  10-Jul-2001  lukem support DTYPE_KQUEUE
 1.69.2.1  16-May-2002  gehenna Call the device interfaces via the device switch.
Replace the direct-access to devsw table with calling devsw APIs.
 1.77.2.8  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.77.2.7  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.77.2.6  18-Dec-2004  skrll Sync with HEAD.
 1.77.2.5  21-Sep-2004  skrll Fix the sync with head I botched.
 1.77.2.4  18-Sep-2004  skrll Sync with HEAD.
 1.77.2.3  24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.77.2.2  03-Aug-2004  skrll Sync with HEAD
 1.77.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.82.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.82.4.1  29-Apr-2005  kent sync with -current
 1.84.2.8  24-Mar-2008  yamt sync with head.
 1.84.2.7  21-Jan-2008  yamt sync with head
 1.84.2.6  07-Dec-2007  yamt sync with head
 1.84.2.5  27-Oct-2007  yamt sync with head.
 1.84.2.4  03-Sep-2007  yamt sync with head.
 1.84.2.3  26-Feb-2007  yamt sync with head.
 1.84.2.2  30-Dec-2006  yamt sync with head.
 1.84.2.1  21-Jun-2006  yamt sync with head.
 1.87.2.1  20-Oct-2005  yamt adapt fdesc.
 1.89.6.2  01-Jun-2006  kardel Sync with head.
 1.89.6.1  22-Apr-2006  simonb Sync with head.
 1.89.4.1  09-Sep-2006  rpaulo sync with head
 1.89.2.2  18-Feb-2006  yamt fix proc/lwp mismatch in the previous.
 1.89.2.1  18-Feb-2006  yamt adapt the rest of MI code.
 1.90.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.4.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.4.2  19-Apr-2006  elad sync with head.
 1.90.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.90.2.2  24-May-2006  yamt sync with head.
 1.90.2.1  11-Apr-2006  yamt sync with head
 1.92.10.2  10-Dec-2006  yamt sync with head.
 1.92.10.1  22-Oct-2006  yamt sync with head
 1.92.8.3  12-Jan-2007  ad Sync with head.
 1.92.8.2  18-Nov-2006  ad Sync with head.
 1.92.8.1  17-Nov-2006  ad Checkpoint work in progress.
 1.94.2.1  17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.96.8.1  11-Jul-2007  mjf Sync with head.
 1.96.6.3  20-Aug-2007  ad Sync with HEAD.
 1.96.6.2  13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
 1.96.6.1  21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.97.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.98.10.2  27-Jul-2007  pooka whoops, forgot to commit this a while back: initialize new vnode size
 1.98.10.1  27-Jul-2007  pooka file fdesc_vnops.c was added on branch matt-mips64 on 2007-07-27 08:38:40 +0000
 1.98.8.1  14-Oct-2007  yamt sync with head.
 1.98.6.2  09-Jan-2008  matt sync with HEAD
 1.98.6.1  06-Nov-2007  matt sync with HEAD
 1.98.4.3  09-Dec-2007  jmcneill Sync with HEAD.
 1.98.4.2  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.98.4.1  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.99.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.99.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.100.2.1  26-Dec-2007  ad Sync with head.
 1.101.12.3  17-Jan-2009  mjf Sync with HEAD.
 1.101.12.2  02-Jun-2008  mjf Sync with HEAD.
 1.101.12.1  03-Apr-2008  mjf Sync with HEAD.
 1.102.4.6  11-Aug-2010  yamt sync with head.
 1.102.4.5  11-Mar-2010  yamt sync with head
 1.102.4.4  19-Aug-2009  yamt sync with head.
 1.102.4.3  20-Jun-2009  yamt sync with head
 1.102.4.2  04-May-2009  yamt sync with head.
 1.102.4.1  16-May-2008  yamt sync with head.
 1.102.2.1  18-May-2008  yamt sync with head.
 1.103.8.2  28-Apr-2009  skrll Sync with HEAD.
 1.103.8.1  19-Jan-2009  skrll Sync with HEAD.
 1.104.2.2  23-Jul-2009  jym Sync with HEAD.
 1.104.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.109.4.3  19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.109.4.2  05-Mar-2011  rmind sync with head
 1.109.4.1  03-Jul-2010  rmind sync with head
 1.109.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.112.6.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.114.16.1  18-May-2014  rmind sync with head
 1.114.12.2  03-Dec-2017  jdolecek update from HEAD
 1.114.12.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.114.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.119.2.1  10-Aug-2014  tls Rebase.
 1.121.2.1  13-May-2015  snj Pull up following revision(s) (requested by riz in ticket #737):
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.125 via patch
The comment about toxicity was correct, restore VNON setting code and
then set the proper type in lookup.
 1.125.2.3  28-Aug-2017  skrll Sync with HEAD
 1.125.2.2  05-Oct-2016  skrll Sync with HEAD
 1.125.2.1  06-Jun-2015  skrll Sync with HEAD
 1.126.2.1  26-Apr-2017  pgoyette Sync with HEAD
 1.127.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.129.10.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.129.10.1  10-Jun-2019  christos Sync with HEAD
 1.129.8.1  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.130.4.2  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

	... usage of resources associated with object ...
	membar_release();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_acquire();
	... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
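
The `last one out hit the lights' pattern described in this log entry can
be sketched in portable C11. This is only an illustration under stated
assumptions, not NetBSD kernel code: struct obj, obj_create, obj_hold and
obj_release are hypothetical names, and atomic_thread_fence() stands in for
membar_release()/membar_acquire() on the assumption that the C11 fences
provide comparable ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a kernel structure. */
struct obj {
	atomic_uint refcnt;
	int *payload;		/* resource freed by the last releaser */
};

/* Allocate an object holding one reference. Returns NULL on failure. */
static struct obj *
obj_create(int value)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	o->payload = malloc(sizeof(*o->payload));
	if (o->payload == NULL) {
		free(o);
		return NULL;
	}
	*o->payload = value;
	atomic_init(&o->refcnt, 1);
	return o;
}

/* Take an extra reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	assert(o != NULL);
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference. Returns true if this call freed the object.
 * The release fence orders all earlier use of the object before the
 * decrement; the acquire fence orders the freeing after every other
 * thread's decrement has been observed.
 */
static bool
obj_release(struct obj *o)
{
	assert(o != NULL);
	atomic_thread_fence(memory_order_release);
	/* fetch_sub returns the OLD value: old != 1 means new != 0. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;
	atomic_thread_fence(memory_order_acquire);
	free(o->payload);
	free(o);
	return true;
}
```

Only the caller whose decrement takes the count to zero runs the freeing
path, so every other holder's use of the object is ordered before the free.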
 1.130.4.1  03-May-2021  martin Pull up following revision(s) (requested by hannken in ticket #1267):

sys/miscfs/fdesc/fdesc_vnops.c: revision 1.135

Make sure fdesc_lookup() never returns VNON vnodes.
Should fix PR kern/56130 (fdescfs create nodes with wrong major number)
 1.131.2.1  29-Feb-2020  ad Sync with head.
 1.134.6.2  01-Aug-2021  thorpej Sync with HEAD.
 1.134.6.1  13-May-2021  thorpej Sync with HEAD.
