History log of /src/sys/miscfs
Revision Date Author Comments
 1.10 06-May-2015  hannken Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@
 1.9 05-Dec-2009  pooka branches: 1.9.22; 1.9.40;
Remove the portalfs kernel file system driver. Replace mount_portal(8)
with a version based on puffs. User functionality remains the same.
 1.8 11-Dec-2005  christos branches: 1.8.74;
merge ktrace-lwp.
 1.7 11-Nov-2004  jdolecek ptyfs moved from sys/miscfs/ to sys/fs/
 1.6 11-Nov-2004  christos Add ptyfs. This is experimental.
 1.5 16-Mar-2003  jdolecek branches: 1.5.2;
move union filesystem code from sys/miscfs/union to sys/fs/union
 1.4 26-Nov-2002  lukem Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.3 09-Sep-2001  assar install miscfs/syncfs/syncfs.h
 1.2 20-Jan-2000  wrstuden branches: 1.2.6; 1.2.8; 1.2.10;
Add overlay, a layered file system which overlays itself on
the underlying fs, rather than exporting it to another part of the
directory name space.
 1.1 12-Jun-1998  cgd branches: 1.1.14;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.10.1 01-Oct-2001  fvdl Catch up with -current.
 1.2.8.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.2.6.2 11-Dec-2002  thorpej Sync with HEAD.
 1.2.6.1 21-Sep-2001  nathanw Catch up to -current.
 1.5.2.1 14-Nov-2004  skrll Sync with HEAD.
 1.8.74.1 11-Mar-2010  yamt sync with head
 1.9.40.1 06-Jun-2015  skrll Sync with HEAD
 1.9.22.1 03-Dec-2017  jdolecek update from HEAD
 1.13 26-Oct-2022  riastradh miscfs/deadfs/deadfs.h: New home for deadfs-related externs.

XXX regen sys/kern/vnode_if.c and the others
 1.12 08-Jul-2022  hannken Make dead vfs ops "vfs_statvfs" and "vfs_vptofh" return EOPNOTSUPP.
Both operations may originate from (possible dead) vnodes.

Reported-by: syzbot+eceb203d44457742be3b@syzkaller.appspotmail.com
 1.11 19-Mar-2022  hannken Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.10 19-Mar-2022  hannken Switch spec_vnodeop vector to real vnode locking, VV_LOCKSWORK now.
 1.9 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.8 21-Aug-2017  hannken branches: 1.8.2; 1.8.4;
No need to cache anonymous device vnodes, they will never be looked up.

Set key to (dead_rootmount, 0, NULL) and add assertions.
 1.7 01-Jul-2015  hannken branches: 1.7.10;
Unfortunately MFS uses v_data of its anonymous device vnode so
it cannot be used as vcache key. Use v_interlock as key ...
 1.6 30-Jun-2015  hannken Redo previous again, v_specnode is invariant but not unique.

Set "vp->v_data = vp" and use v_data as key.
 1.5 29-Jun-2015  hannken Use the address of vp->v_specnode as vcache key. It is invariant
over the lifetime of the vnode.

The previous worked by luck, it took the first sizeof(void *) bytes
of struct vnode as key.

Resolves CID 1308957: wrong sizeof()
 1.4 23-Jun-2015  hannken Add a vfs_newvnode() method to deadfs and use it to create
anonymous device vnodes with bdevvp() and cdevvp().

Implement spec_inactive() and spec_reclaim() to handle these nodes.
 1.3 23-Jun-2015  hannken Use VFS_PROTOS() for deadfs. Rename dead_mount to dead_rootmount.
 1.2 23-Mar-2014  hannken branches: 1.2.4; 1.2.6; 1.2.10; 1.2.12;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.1 27-Feb-2014  hannken Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.
 1.2.12.2 28-Aug-2017  skrll Sync with HEAD
 1.2.12.1 22-Sep-2015  skrll Sync with HEAD
 1.2.10.3 03-Dec-2017  jdolecek update from HEAD
 1.2.10.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.10.1 23-Mar-2014  tls file dead_vfsops.c was added on branch tls-maxphys on 2014-08-20 00:04:30 +0000
 1.2.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.2.6.1 23-Mar-2014  yamt file dead_vfsops.c was added on branch yamt-pagecache on 2014-05-22 11:41:05 +0000
 1.2.4.2 18-May-2014  rmind sync with head
 1.2.4.1 23-Mar-2014  rmind file dead_vfsops.c was added on branch rmind-smpnet on 2014-05-18 17:46:09 +0000
 1.7.10.1 25-Aug-2017  snj Pull up following revision(s) (requested by hannken in ticket #227):
sys/sys/vnode_impl.h: revision 1.16
sys/kern/vfs_vnode.c: revision 1.97
sys/kern/vfs_vnode.c: revision 1.98
sys/kern/vfs_mount.c: revision 1.67
sys/miscfs/deadfs/dead_vfsops.c: revision 1.8
No need to cache anonymous device vnodes, they will never be looked up.
Set key to (dead_rootmount, 0, NULL) and add assertions.
--
Change forced unmount to revert open device vnodes to anonymous devices.
 1.8.4.1 10-Jun-2019  christos Sync with HEAD
 1.8.2.1 18-Jan-2019  pgoyette Synch with HEAD
 1.67 26-Oct-2022  riastradh miscfs/deadfs/deadfs.h: New home for deadfs-related externs.

XXX regen sys/kern/vnode_if.c and the others
 1.66 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.65 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.64 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.63 23-Feb-2020  ad branches: 1.63.10;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.62 20-Feb-2020  riastradh Use vn_bwrite, not genfs_nullop, for VOP_BWRITE.

VOP_BWRITE is responsible for calling biodone; can't just leave it
hanging.

XXX pullup
 1.61 26-Apr-2017  riastradh branches: 1.61.12; 1.61.16; 1.61.18;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.60 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.59 20-Apr-2015  riastradh branches: 1.59.2; 1.59.4;
Uncomment the argument struct declarations in deadfs.

We don't actually use them, but this is the only way the vop
versioning mechanism flags code that needs changing.
 1.58 20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.57 25-Jul-2014  dholland branches: 1.57.4;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.56 27-Feb-2014  hannken branches: 1.56.2;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.55 27-Feb-2014  hannken Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.
 1.54 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.53 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.52 07-Nov-2013  hannken Add missing operations that unlock or dereference their arguments.

Stop checking for a vnode state change -- dead vnodes never change state.
 1.51 12-Jun-2011  rmind branches: 1.51.2; 1.51.12; 1.51.16;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.50 17-Dec-2010  yamt branches: 1.50.6;
do minimal locking to make assertions like KASSERT(VOP_ISLOCKED(vp)) happy.
 1.49 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK().
 1.48 14-Mar-2009  dsl branches: 1.48.2; 1.48.4;
ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.47 25-Jan-2008  ad branches: 1.47.10; 1.47.18; 1.47.24;
Remove VOP_LEASE. Discussed on tech-kern.
 1.46 02-Jan-2008  ad Merge vmlocking2 to head.
 1.45 10-Oct-2007  ad branches: 1.45.4; 1.45.6; 1.45.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.44 29-Jul-2007  ad branches: 1.44.4; 1.44.6; 1.44.8; 1.44.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.43 10-Dec-2006  pooka branches: 1.43.6; 1.43.14;
Teach deadfs about vm object locking for getpages. This avoids
errors resulting from situations where we take a page fault for a
vnode which has been converted to a deadfs vnode.

wrstuden ok
 1.42 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.41 16-Nov-2006  christos branches: 1.41.2;
__unused removal on arguments; approved by core.
 1.40 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.39 14-May-2006  elad branches: 1.39.8; 1.39.10;
integrate kauth.
 1.38 11-Dec-2005  christos branches: 1.38.4; 1.38.6; 1.38.8; 1.38.10; 1.38.12;
merge ktrace-lwp.
 1.37 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.36 30-Aug-2005  xtraeme branches: 1.36.2;
Remove __P()
 1.35 27-Apr-2004  jrf branches: 1.35.12;
First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.34 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 06-Dec-2001  chs branches: 1.32.16;
add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.31 10-Nov-2001  lukem add RCSIDs
 1.30 22-Jan-2001  jdolecek branches: 1.30.2; 1.30.4; 1.30.8;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.29 27-May-2000  thorpej sleep() -> tsleep()
 1.28 30-Mar-2000  augustss Register, begone!
 1.27 12-Dec-1999  sommerfeld Actually nullop is a better idea here
 1.26 08-Dec-1999  sommerfeld one more tweak: placebo for VOP_LEASE
 1.25 08-Dec-1999  sommerfeld Add appropriate VOP_FCNTL handlers to deadfs and specfs ops vectors.
 1.24 18-May-1998  fvdl branches: 1.24.14; 1.24.20;
Since the interlock has been unlocked, also clear LK_INTERLOCK from
a_flags in order to not confuse the layer that is called through
the following VCALL.
 1.23 18-May-1998  pk dead_lock() must unlock `v_interlock'.
 1.22 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.21 13-Oct-1996  christos backout previous kprintf changes
 1.20 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.19 07-Sep-1996  mycroft Implement poll(2).
 1.18 05-Sep-1996  thorpej Remove some unused variables.
 1.17 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.16 13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.15 13-Feb-1996  mycroft GC dead_nullop().
 1.14 09-Feb-1996  christos miscfs prototype changes
 1.13 10-Apr-1995  mycroft Return EOF for old vnodes of tty devices, rather than EIO.
 1.12 16-Dec-1994  mycroft Remove a_fp.
 1.11 14-Nov-1994  christos fixed struct comment
 1.10 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.7 22-Dec-1993  cgd fix return type of dead_print
 1.6 18-Dec-1993  mycroft Canonicalize all #includes.
 1.5 07-Sep-1993  ws branches: 1.5.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.4 01-Aug-1993  mycroft Add RCS identifiers (this time on the correct side of the branch), and
incorporate recent changes in netbsd-0-9 branch.
 1.3 27-Jun-1993  andrew branches: 1.3.2;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.2 20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.2.1 31-Jul-1993  cgd give names, err, wmesg's, to my "pain" -- i.e. convert sleep() to tsleep()
 1.5.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.24.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.24.14.2 11-Feb-2001  bouyer Sync with HEAD.
 1.24.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.30.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.30.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.30.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.32.16.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.32.16.4 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.32.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.16.1 03-Aug-2004  skrll Sync with HEAD
 1.35.12.6 04-Feb-2008  yamt sync with head.
 1.35.12.5 21-Jan-2008  yamt sync with head
 1.35.12.4 27-Oct-2007  yamt sync with head.
 1.35.12.3 03-Sep-2007  yamt sync with head.
 1.35.12.2 30-Dec-2006  yamt sync with head.
 1.35.12.1 21-Jun-2006  yamt sync with head.
 1.36.2.1 20-Oct-2005  yamt adapt deadfs.
 1.38.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.38.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.38.8.1 24-May-2006  yamt sync with head.
 1.38.6.1 01-Jun-2006  kardel Sync with head.
 1.38.4.1 09-Sep-2006  rpaulo sync with head
 1.39.10.3 18-Dec-2006  yamt sync with head.
 1.39.10.2 10-Dec-2006  yamt sync with head.
 1.39.10.1 22-Oct-2006  yamt sync with head
 1.39.8.2 12-Jan-2007  ad Sync with head.
 1.39.8.1 18-Nov-2006  ad Sync with head.
 1.41.2.2 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.41.2.1 12-Dec-2006  tron Pull up following revision(s) (requested by pooka in ticket #269):
sys/miscfs/deadfs/dead_vnops.c: revision 1.43
Teach deadfs about vm object locking for getpages. This avoids
errors resulting from situations where we take a page fault for a
vnode which has been converted to a deadfs vnode.
wrstuden ok
 1.43.14.1 15-Aug-2007  skrll Sync with HEAD.
 1.43.6.6 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.43.6.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.43.6.4 10-Jun-2007  ad Acquire vp->v_interlock before calling vwait().
 1.43.6.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.43.6.2 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.43.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.44.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.44.10.1 29-Jul-2007  ad file dead_vnops.c was added on branch matt-mips64 on 2007-07-29 13:31:12 +0000
 1.44.8.1 14-Oct-2007  yamt sync with head.
 1.44.6.3 23-Mar-2008  matt sync with HEAD
 1.44.6.2 09-Jan-2008  matt sync with HEAD
 1.44.6.1 06-Nov-2007  matt sync with HEAD
 1.44.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.45.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.45.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.45.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.47.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.47.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.47.10.2 11-Aug-2010  yamt sync with head.
 1.47.10.1 04-May-2009  yamt sync with head.
 1.48.4.3 05-Mar-2011  rmind sync with head
 1.48.4.2 03-Jul-2010  rmind sync with head
 1.48.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.48.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.50.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.51.16.1 18-May-2014  rmind sync with head
 1.51.12.2 03-Dec-2017  jdolecek update from HEAD
 1.51.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.51.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.2.1 10-Aug-2014  tls Rebase.
 1.57.4.2 28-Aug-2017  skrll Sync with HEAD
 1.57.4.1 06-Jun-2015  skrll Sync with HEAD
 1.59.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.59.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.61.18.1 29-Feb-2020  ad Sync with head.
 1.61.16.1 21-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #789):

sys/miscfs/deadfs/dead_vnops.c: revision 1.62

Use vn_bwrite, not genfs_nullop, for VOP_BWRITE.
VOP_BWRITE is responsible for calling biodone; can't just leave it
hanging.

XXX pullup
 1.61.12.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.63.10.1 01-Aug-2021  thorpej Sync with HEAD.
 1.1 26-Oct-2022  riastradh miscfs/deadfs/deadfs.h: New home for deadfs-related externs.

XXX regen sys/kern/vnode_if.c and the others
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.23 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.22 13-Jul-2014  hannken branches: 1.22.28; 1.22.34;
Change fdesc from hashlist to vcache.
 1.21 27-Sep-2011  christos branches: 1.21.12; 1.21.26;
define FDESC_MAXNAMLEN and use it.
 1.20 31-Jul-2009  pooka Get rid of dependency on M_UFSMNT. Since we need storage only for
one pointer, simply hang that off of mnt_data instead of allocating
storage.
 1.19 28-Jun-2008  rumble Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.18 11-Dec-2005  christos branches: 1.18.70; 1.18.74; 1.18.76; 1.18.78;
merge ktrace-lwp.
 1.17 30-Aug-2005  xtraeme Remove __P()
 1.16 20-May-2004  atatat branches: 1.16.12;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.15 07-Aug-2003  agc branches: 1.15.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.14 29-Jun-2003  fvdl branches: 1.14.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.13 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.12 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.11 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.10 01-Mar-1998  fvdl branches: 1.10.14;
Merge with Lite2 + local changes
 1.9 09-Feb-1996  christos miscfs prototype changes
 1.8 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.7 13-Dec-1994  mycroft Sync with CSRG.
 1.6 19-Aug-1994  mycroft Convert hash tables.
 1.5 29-Jun-1994  cgd branches: 1.5.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.3 05-Jan-1994  cgd update with latest fdesc file system from jsp@sequent.com
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the fdesc filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.5.2.1 19-Aug-1994  mycroft update from trunk
 1.10.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.14.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.14.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.14.2.2 03-Aug-2004  skrll Sync with HEAD
 1.14.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.15.2.1 23-May-2004  tron Pull up revision 1.16 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.16.12.1 21-Jun-2006  yamt sync with head.
 1.18.78.1 03-Jul-2008  simonb Sync with head.
 1.18.76.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.18.74.2 19-Aug-2009  yamt sync with head.
 1.18.74.1 04-May-2009  yamt sync with head.
 1.18.70.1 29-Jun-2008  mjf Sync with HEAD.
 1.21.26.1 10-Aug-2014  tls Rebase.
 1.21.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.22.34.1 17-Jan-2020  ad Sync with head.
 1.22.28.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.96 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.95 21-Mar-2020  pgoyette branches: 1.95.2;
Finish the transition to SYSCTL_SETUP by removing local sysctllog
in favor of the one provided by the module infrastructure.
 1.94 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.93 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.92 17-Feb-2017  hannken branches: 1.92.14; 1.92.20;
Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.91 09-Nov-2014  maxv branches: 1.91.2; 1.91.4; 1.91.6;
Do not uselessly include <sys/malloc.h>.
 1.90 04-Sep-2014  christos Fix type of /dev/tty
 1.89 13-Jul-2014  hannken branches: 1.89.2;
Change fdesc from hashlist to vcache.
 1.88 23-Mar-2014  hannken branches: 1.88.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.87 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.86 27-Sep-2011  christos branches: 1.86.2; 1.86.12; 1.86.16;
define FDESC_MAXNAMLEN and use it.
 1.85 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.84 08-Jan-2010  pooka branches: 1.84.2; 1.84.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.83 30-Nov-2009  pooka Introduce genfs_statvfs() as pretty much a no-info statvfs and
convert several pseudo file systems to use it.
 1.82 31-Jul-2009  pooka Get rid of dependency on M_UFSMNT. Since we need storage only for
one pointer, simply hang that off of mnt_data instead of allocating
storage.
 1.81 31-Jul-2009  pooka Instead of reporting some random "files used/free" figures for the
process doing statvfs(!), just report 0. The code had some kernel
panicking bug after the descriptor code update, the functionality
is more like a bunny rabbit hat than anything useful, and I can't
bother to figure out what the invariants in the new descriptor code
are.

fixes PR kern/41534 and kern/41786
 1.80 24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
 1.79 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.78 17-Dec-2008  cegger branches: 1.78.2;
kill MALLOC and FREE macros.
 1.77 28-Jun-2008  rumble branches: 1.77.4;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.76 13-May-2008  simonb branches: 1.76.2;
mnt_data is a pointer, set it to NULL not 0 when we're finished with it.
 1.75 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.74 29-Apr-2008  ad branches: 1.74.2;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.73 28-Jan-2008  dholland branches: 1.73.6; 1.73.8; 1.73.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.72 02-Jan-2008  ad Merge vmlocking2 to head.
 1.71 26-Nov-2007  pooka branches: 1.71.2; 1.71.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.70 10-Oct-2007  ad branches: 1.70.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.69 31-Jul-2007  pooka branches: 1.69.2; 1.69.4; 1.69.6; 1.69.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.68 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.67 17-Jul-2007  pooka branches: 1.67.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.66 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.65 08-Jul-2007  pooka * allow unmount even if rootvp has a usecount > 1 provided that
MNT_FORCE is given
* decrease cargo cult index by getting rid of commented sections
with mntflushbuf() in them - AFAICT the call was removed from our
kernel over 13 years ago with the 4.4BSDlite import
 1.64 19-Jan-2007  hannken branches: 1.64.6; 1.64.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.63 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.62 16-Nov-2006  christos branches: 1.62.2;
__unused removal on arguments; approved by core.
 1.61 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.60 30-Aug-2006  christos branches: 1.60.2; 1.60.4;
fix missing initializers
 1.59 14-May-2006  elad integrate kauth.
 1.58 11-Dec-2005  christos branches: 1.58.4; 1.58.6; 1.58.8; 1.58.10; 1.58.12;
merge ktrace-lwp.
 1.57 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.56 30-Aug-2005  xtraeme Remove __P()
 1.55 29-May-2005  christos branches: 1.55.2;
- sprinkle const
- avoid shadowed variables.
 1.54 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.53 02-Jan-2005  thorpej branches: 1.53.2;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.52 13-Sep-2004  jdolecek set mp->mnt_stat.f_namemax on filesystem mount, for use by statvfs
 1.51 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.50 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.49 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.48 21-Apr-2004  christos add sys/dirent.h
 1.47 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.46 24-Mar-2004  atatat branches: 1.46.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.45 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.44 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.43 29-Jun-2003  fvdl branches: 1.43.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.42 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.41 29-Jun-2003  thorpej Adjust for ktrace/lwp changes.
 1.40 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.39 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.38 21-Sep-2002  christos MNT_GETARGS support
 1.37 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.36 15-Nov-2001  lukem branches: 1.36.8;
don't need <sys/types.h> when including <sys/param.h>
 1.35 10-Nov-2001  lukem add RCSIDs
 1.34 15-Sep-2001  chs branches: 1.34.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.33 30-May-2001  mrg branches: 1.33.2; 1.33.4;
use _KERNEL_OPT
 1.32 22-Jan-2001  jdolecek branches: 1.32.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.31 10-Jun-2000  assar make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.30 30-Mar-2000  simonb branches: 1.30.2;
Delete redundant decl of fdesc_root, it's in fdesc.h.
 1.29 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.28 08-Jul-1999  wrstuden branches: 1.28.2; 1.28.8;
Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.27 26-Feb-1999  wrstuden branches: 1.27.4;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.26 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.25 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.24 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.23 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.22 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.21 09-Feb-1996  christos miscfs prototype changes
 1.20 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.19 09-Mar-1995  mycroft copy*str() should use size_t.
 1.18 08-Mar-1995  cgd use u_long for copyin*
 1.17 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a (tiny) bit.
 1.16 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.15 13-Dec-1994  mycroft Sync with CSRG.
 1.14 15-Sep-1994  mycroft stat the file system at mount time, for `df -n', et al.
 1.13 29-Jun-1994  cgd branches: 1.13.2;
New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.12 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.11 23-Apr-1994  cgd make fs types consistent over new kernels. also, some proto foo.
 1.10 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.9 14-Apr-1994  cgd fs types are names now.
 1.8 09-Jan-1994  ws Note that NFS mounting of fdesc doesn't make sense
 1.7 05-Jan-1994  cgd fix UFS vs 'real' fs type mixups
 1.6 05-Jan-1994  cgd update with latest fdesc file system from jsp@sequent.com
 1.5 23-Aug-1993  mycroft branches: 1.5.2;
RLIMIT_OFILE --> RLIMIT_NOFILE
 1.4 07-Jun-1993  cgd give various filesystems their own vnode types
 1.3 07-Jun-1993  cgd give miscfs filesystems their own mount structure malloc type.
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the fdesc filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.5.2.3 06-Jan-1994  pk Re-instate EOPNOTSUPP.
 1.5.2.2 28-Dec-1993  pk Use ENODEV rather than EOPNOTSUPP for unsupported operations on non-socket devices
 1.5.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.13.2.1 16-Sep-1994  cgd from trunk, per mycroft
 1.27.4.1 02-Aug-1999  thorpej Update from trunk.
 1.28.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.32.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.32.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.32.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.32.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.32.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.32.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.33.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.33.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.33.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.33.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.34.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.36.8.1 29-Aug-2002  gehenna catch up with -current.
 1.43.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.43.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.43.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.43.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.43.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.43.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.43.2.3 03-Aug-2004  skrll Sync with HEAD
 1.43.2.2 02-Jul-2003  wrstuden Check in lwp-ification changes needed to get the evbarm/IQ80321 kernel
to compile.

only question I have is over the:
l->l_proc->p_stats->p_ru.ru_msgsnd++;
command at line 245 of dev/kttcp.c. Should we be doing per-lwp or
per-proc accounting?
 1.43.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.46.2.1 29-May-2004  tron Pull up revision 1.50 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.53.2.1 29-Apr-2005  kent sync with -current
 1.55.2.8 04-Feb-2008  yamt sync with head.
 1.55.2.7 21-Jan-2008  yamt sync with head
 1.55.2.6 07-Dec-2007  yamt sync with head
 1.55.2.5 27-Oct-2007  yamt sync with head.
 1.55.2.4 03-Sep-2007  yamt sync with head.
 1.55.2.3 26-Feb-2007  yamt sync with head.
 1.55.2.2 30-Dec-2006  yamt sync with head.
 1.55.2.1 21-Jun-2006  yamt sync with head.
 1.58.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.58.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.58.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.58.8.2 03-Sep-2006  yamt sync with head.
 1.58.8.1 24-May-2006  yamt sync with head.
 1.58.6.1 01-Jun-2006  kardel Sync with head.
 1.58.4.1 09-Sep-2006  rpaulo sync with head
 1.60.4.2 10-Dec-2006  yamt sync with head.
 1.60.4.1 22-Oct-2006  yamt sync with head
 1.60.2.3 01-Feb-2007  ad Sync with head.
 1.60.2.2 12-Jan-2007  ad Sync with head.
 1.60.2.1 18-Nov-2006  ad Sync with head.
 1.62.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.64.8.1 11-Jul-2007  mjf Sync with head.
 1.64.6.4 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.64.6.3 20-Aug-2007  ad Sync with HEAD.
 1.64.6.2 15-Jul-2007  ad Sync with head.
 1.64.6.1 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.67.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.69.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.69.8.1 31-Jul-2007  pooka file fdesc_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:16 +0000
 1.69.6.1 14-Oct-2007  yamt sync with head.
 1.69.4.3 23-Mar-2008  matt sync with HEAD
 1.69.4.2 09-Jan-2008  matt sync with HEAD
 1.69.4.1 06-Nov-2007  matt sync with HEAD
 1.69.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.69.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.70.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.70.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.71.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.71.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.73.10.6 11-Aug-2010  yamt sync with head.
 1.73.10.5 11-Mar-2010  yamt sync with head
 1.73.10.4 19-Aug-2009  yamt sync with head.
 1.73.10.3 20-Jun-2009  yamt sync with head
 1.73.10.2 04-May-2009  yamt sync with head.
 1.73.10.1 16-May-2008  yamt sync with head.
 1.73.8.1 18-May-2008  yamt sync with head.
 1.73.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.73.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.73.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.74.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.74.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.76.2.1 03-Jul-2008  simonb Sync with head.
 1.77.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.77.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.78.2.2 23-Jul-2009  jym Sync with HEAD.
 1.78.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.84.4.1 03-Jul-2010  rmind sync with head
 1.84.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.86.16.1 18-May-2014  rmind sync with head
 1.86.12.2 03-Dec-2017  jdolecek update from HEAD
 1.86.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.86.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.88.2.1 10-Aug-2014  tls Rebase.
 1.89.2.2 07-May-2015  snj Pull up following revision(s) (requested by riz in ticket #737):
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.90
Fix type of /dev/tty
 1.89.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.91.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.91.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.91.2.1 28-Aug-2017  skrll Sync with HEAD
 1.92.20.1 17-Jan-2020  ad Sync with head.
 1.92.14.2 21-Apr-2020  martin Sync with HEAD
 1.92.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.95.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.140 27-Mar-2022  christos dedup the eofs link/symlink methods
 1.139 15-Jan-2022  riastradh sys/fs/fdesc: Delete silly vnop #define aliases.
 1.138 29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
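
The contract described above (a vnode or a file descriptor on success, and
EOPNOTSUPP when the caller passes NULL for the extra return values) can be
illustrated with a small userland sketch. Everything here is hypothetical:
open_sketch(), struct lower_result and struct vnode_sketch are invented for
illustration and are not the real vn_open signature or kernel types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct vnode_sketch { int id; };

/* Hypothetical outcome of the lower layer: a vnode, or a descriptor. */
struct lower_result {
	struct vnode_sketch *vp;	/* non-NULL for an ordinary open */
	int fd;				/* meaningful only when vp == NULL */
	bool move;			/* true: EMOVEFD-style, false: EDUPFD-style */
};

/*
 * Toy model of the caller-visible behaviour: return 0 with either *ret_vp
 * set or *ret_fd/*ret_domove set.  A caller that passes NULL for the extra
 * return values gets EOPNOTSUPP if a descriptor would be produced.
 */
static int
open_sketch(const struct lower_result *lr, struct vnode_sketch **ret_vp,
    bool *ret_domove, int *ret_fd)
{
	assert(lr != NULL && ret_vp != NULL);

	if (lr->vp != NULL) {
		*ret_vp = lr->vp;
		return 0;
	}
	if (ret_domove == NULL || ret_fd == NULL)
		return EOPNOTSUPP;	/* caller cannot cope with descriptors */
	*ret_vp = NULL;
	*ret_domove = lr->move;
	*ret_fd = lr->fd;
	return 0;
}
```

In this model the magic errno never leaves the function: the caller sees
either success or an ordinary positive errno.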
 1.137 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
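
As a rough illustration of the default behaviour described above (consume
the path up to the first '/'), here is a minimal userland sketch. The name
parsepath_sketch() and its signature are assumptions made for this example;
it is not the kernel's genfs_parsepath.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: given the remaining path as a plain string, return
 * how many bytes form the first component, i.e. everything before the
 * first '/' (or the whole string if it contains none).
 */
static size_t
parsepath_sketch(const char *path)
{
	size_t len = 0;

	assert(path != NULL);
	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

Only the string is passed in, which mirrors the point made above that the
rest of struct componentname does not need to be exposed.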
 1.136 28-Jun-2021  chs VOP_BMAP() may be called via ioctl(FIOGETBMAP) on any vnode that applications
can open. change various pseudo-fs *_bmap methods to return an error instead of
panicking.

Reported-by: syzbot+8289a3eaf2ba60958c87@syzkaller.appspotmail.com
 1.135 01-May-2021  hannken Make sure fdesc_lookup() never returns VNON vnodes.

Should fix PR kern/56130 (fdescfs create nodes with wrong major number)
 1.134 27-Jun-2020  christos branches: 1.134.6;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.133 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.132 01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
 1.131 02-Jan-2020  thorpej branches: 1.131.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.130 03-Sep-2018  riastradh branches: 1.130.4;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
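
The truncation hazard described above can be shown with a small userland
sketch. It assumes unsigned int is narrower than 64 bits (true on the usual
ILP32 and LP64 platforms); uimin_sketch() and MIN_SKETCH() are stand-ins
written for this example, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min() defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: compares in the operands' own type, so no truncation,
 * but an argument may be evaluated twice.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Return nonzero if pushing two 64-bit values through the unsigned int
 * function gives a different answer than the type-preserving macro.
 */
static int
truncation_changes_result(uint64_t a, uint64_t b)
{
	/* Assumption of this sketch: unsigned int is narrower than 64 bits. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));

	uint64_t via_func = uimin_sketch((unsigned int)a, (unsigned int)b);
	uint64_t via_macro = MIN_SKETCH(a, b);

	return via_func != via_macro;
}
```

With a = 0x100000001 and b = 2, the function form sees 1 and 2 after
truncation and picks 1, while the macro form picks 2.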
 1.129 26-May-2017  riastradh branches: 1.129.8; 1.129.10;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.128 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.127 20-Aug-2016  hannken branches: 1.127.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.126 20-Apr-2015  riastradh branches: 1.126.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.125 05-Sep-2014  christos branches: 1.125.2;
The comment about toxicity was correct, restore VNON setting code and
then set the proper type in lookup.
 1.124 05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.123 04-Sep-2014  christos remove debugging.
 1.122 04-Sep-2014  christos Well, nasty things happen if you set /dev/tty to VNON too. Disable for now.
 1.121 25-Jul-2014  dholland branches: 1.121.2;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.120 13-Jul-2014  hannken Change fdesc from hashlist to vcache.
 1.119 20-Mar-2014  christos branches: 1.119.2;
kill sprintf
 1.118 27-Feb-2014  hannken The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.117 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.116 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.115 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.114 16-Oct-2011  hannken branches: 1.114.2; 1.114.12; 1.114.16;
VOP_GETATTR() needs a shared lock at least.
 1.113 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.112 21-Jul-2010  hannken branches: 1.112.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.111 16-Jul-2010  hannken Use a kmutex to protect the hash chains and always take this mutex
before removing a node from the hash chain.

Release the hash list lock before calling getnewvnode() and check the
hash list again like other file systems do.

Take v_interlock before calling vget().
 1.110 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.109 08-Jan-2010  pooka branches: 1.109.2; 1.109.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.108 31-Jul-2009  pooka Do a name-based search for the ctty major instead of requiring an
external symbol.
 1.107 24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
 1.106 15-Mar-2009  cegger ansify function definitions
 1.105 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.104 17-Dec-2008  cegger branches: 1.104.2;
kill MALLOC and FREE macros.
 1.103 05-May-2008  ad branches: 1.103.8;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.102 21-Mar-2008  ad branches: 1.102.2; 1.102.4;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.101 08-Dec-2007  pooka branches: 1.101.12;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.100 26-Nov-2007  pooka branches: 1.100.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.99 08-Oct-2007  ad branches: 1.99.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.98 27-Jul-2007  pooka branches: 1.98.4; 1.98.6; 1.98.8; 1.98.10;
whoops, forgot to commit this a while back: initialize new vnode size
 1.97 09-Jul-2007  ad branches: 1.97.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.96 09-Feb-2007  ad branches: 1.96.6; 1.96.8;
Merge newlock2 to head.
 1.95 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.94 16-Nov-2006  christos branches: 1.94.2;
__unused removal on arguments; approved by core.
 1.93 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.92 14-May-2006  elad branches: 1.92.8; 1.92.10;
integrate kauth.
 1.91 04-Apr-2006  christos Coverity CID 1140: NULL dereference cannot happen, but protect against it.
 1.90 01-Mar-2006  yamt branches: 1.90.2; 1.90.4; 1.90.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.89 11-Dec-2005  christos branches: 1.89.2; 1.89.4; 1.89.6;
merge ktrace-lwp.
 1.88 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.87 14-Sep-2005  christos branches: 1.87.2;
When readdir() is called from vfs_getcwd, uio->uio_procp is NULL. Deal with
that. Fixes 'cd /dev/fd && pwd'
 1.86 30-Aug-2005  xtraeme Remove __P()
 1.85 19-Aug-2005  christos 64 bit inode changes.
 1.84 29-May-2005  christos branches: 1.84.2;
- sprinkle const
- avoid shadowed variables.
 1.83 26-Feb-2005  perry nuke trailing whitespace
 1.82 30-Nov-2004  christos branches: 1.82.4; 1.82.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.81 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.80 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.79 13-Sep-2003  jdolecek move dupfd from struct proc to struct lwp - it's per-LWP, not per-process; we
use curlwp where the lwp is not directly available, i.e. in device open
routines

briefly discussed on tech-kern
 1.78 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.77 29-Jun-2003  fvdl branches: 1.77.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.76 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.75 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.74 10-Apr-2003  jdolecek use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.73 23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.72 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.71 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.70 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- Backward compatibility for loading block/character device
switches through the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.69 02-Apr-2002  jdolecek branches: 1.69.2;
Changes to make it less likely to need to be revisited later again:
* fdesc_attr(): don't panic for 'unknown' descriptor types, rather use
(*fp->f_ops->fo_stat)() hook, as for DTYPE_SOCKET and DTYPE_PIPE
XXX perhaps use different vnode type than VBAD for these?
* fdesc_setattr(): just return 0 regardless of type, rather than panicking
for 'unknown' descriptor types
 1.68 02-Apr-2002  jmc Treat pipes like sockets and don't do setattr on them
 1.67 06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.66 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.65 10-Nov-2001  lukem add RCSIDs
 1.64 16-Jun-2001  jdolecek branches: 1.64.2; 1.64.4; 1.64.6;
Add DTYPE_PIPE (to be used by new pipe implementation) and handle
it accordingly.
 1.63 14-Jun-2001  thorpej Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
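The "larval" scheme described in revision 1.63 can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `toy_` names, the flag values, and the fixed-size table are hypothetical and not taken from the kernel source, and all locking is omitted. It shows the two properties the commit message names: the allocator sees a larval slot as filled, while the single lookup function treats it as non-existent until it is marked mature.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS      8
#define SLOT_USED   0x01   /* slot is allocated (hypothetical flag) */
#define SLOT_LARVAL 0x02   /* allocated but not yet fully constructed */

struct slot {
    int   flags;
    void *file;            /* stand-in for a struct file pointer */
};

struct fdtable {
    struct slot slots[NSLOTS];
};

/* Reserve the lowest free slot and mark it larval; return -1 if full. */
int
toy_fdalloc(struct fdtable *t)
{
    for (int fd = 0; fd < NSLOTS; fd++) {
        if ((t->slots[fd].flags & SLOT_USED) == 0) {
            t->slots[fd].flags = SLOT_USED | SLOT_LARVAL;
            t->slots[fd].file = NULL;
            return fd;
        }
    }
    return -1;
}

/* "Mature" a descriptor: store the file, then clear the larval flag. */
void
toy_fdmature(struct fdtable *t, int fd, void *file)
{
    assert(fd >= 0 && fd < NSLOTS);
    assert(t->slots[fd].flags & SLOT_LARVAL);
    t->slots[fd].file = file;
    t->slots[fd].flags &= ~SLOT_LARVAL;
}

/* Single lookup path: empty and larval slots both look non-existent. */
void *
toy_fd_getfile(struct fdtable *t, int fd)
{
    if (fd < 0 || fd >= NSLOTS)
        return NULL;
    if ((t->slots[fd].flags & SLOT_USED) == 0 ||
        (t->slots[fd].flags & SLOT_LARVAL) != 0)
        return NULL;
    return t->slots[fd].file;
}
```

In this model a second toy_fdalloc() call made while slot 0 is still larval returns 1 rather than 0, which is the double-allocation case the commit guards against.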
 1.62 09-Apr-2001  jdolecek Change the first arg to fileops fo_stat routine to struct file *, adjust
callers and appropriate routines to cope. This makes fo_stat more
consistent with rest of fileops routines and also makes the fo_stat
match FreeBSD as an added bonus.
Discussed with Luke Mewburn on tech-kern@.
 1.61 09-Apr-2001  jdolecek Call file descriptor stat function via (*fp->f_ops->fo_stat) instead
of a switch statement and explicit call.
Sprinkle some FILE_USE()/FILE_UNUSE() as appropriate.
 1.60 07-Apr-2001  jdolecek Adapt to struct fileops, soo_stat() changes.
Pointed out by Bernd Ernesti in private mail.
 1.59 06-Mar-2001  jmc XXX: Temporary work around to fdesc truncating files when it shouldn't. Treat
setattr calls on underlying vnodes the same as sockets and just return 0.

This whole thing needs to be gutted and replaced with either fall throughs
to specfs (the attr forwarding is just bizarre and leads to weird crap like
the above truncation problems), or better yet a real cloning device node.
 1.58 22-Jan-2001  jdolecek branches: 1.58.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.57 08-Nov-2000  ad Update for hashinit() change.
 1.56 03-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable sized allocations.
 1.55 27-May-2000  thorpej sleep() -> tsleep()
 1.54 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.53 25-Aug-1999  sommerfeld branches: 1.53.2; 1.53.8;
Change variable used for directory offset from "int" to "off_t".
Overkill, but avoids a host of truncation problems.
 1.52 24-Aug-1999  sommerfeld Fix PR8270:

Problem turned out to be due to improper handling of reads beyond EOF:
they should just return without error with the uio unchanged, and the
caller will recognize this as a zero-byte return (EOF).

The previous fix to protect directory reads against bogus uio_offset
values returned EINVAL, which broke mount -o union, which only
union'ed in the lower directory if the upper directory cleanly
returned EOF.

While we're here, protect kernfs as well.
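The EOF rule from revision 1.52 can be sketched as follows. This is a simplified user-space model, not the fdesc or kernfs code: `struct toy_uio` and `toy_read()` are hypothetical stand-ins for the kernel's uio machinery, and rejecting negative offsets with EINVAL is an assumption of this sketch. The point is that a read at or beyond EOF returns success and leaves the request untouched, so the caller sees a zero-byte transfer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct uio. */
struct toy_uio {
    long long offset;   /* position requested by the caller */
    size_t    resid;    /* bytes still wanted */
    char     *buf;      /* destination buffer */
};

/* Copy from a fixed object of `len` bytes into the caller's buffer. */
int
toy_read(const char *data, size_t len, struct toy_uio *uio)
{
    assert(data != NULL && uio != NULL);

    if (uio->offset < 0)
        return EINVAL;          /* assumption: only negative offsets are bogus */
    if ((unsigned long long)uio->offset >= len)
        return 0;               /* at or past EOF: success, uio unchanged */

    size_t avail = len - (size_t)uio->offset;
    size_t n = uio->resid < avail ? uio->resid : avail;

    memcpy(uio->buf, data + uio->offset, n);
    uio->buf += n;
    uio->offset += (long long)n;
    uio->resid -= n;
    return 0;
}
```

A caller detects EOF by comparing resid before and after the call; an unchanged resid with a zero return means no bytes were transferred.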
 1.51 14-Aug-1999  christos protect against large uio_offsets
 1.50 03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.49 19-Jul-1999  thorpej From Bill Studenmund: unlock the fdescfs "/dev/tty" vnode before calling
cttyread()/cttywrite(), and lock it again when it returns.

Squashes the somewhat bizarre lossage I was observing w/ more(1), sudo(1),
etc.
 1.48 08-Jul-1999  wrstuden Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.47 13-Aug-1998  kleink branches: 1.47.6; 1.47.8;
Per POSIX, fail with EINVAL if advisory locking is attempted on a file type
that doesn't support it, rather than using a homegrown EBADF or EOPNOTSUPP.
 1.46 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.45 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.44 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.43 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.42 10-Oct-1997  fvdl Bump last argument to VOP_READDIR to off_t (from u_long).
 1.41 05-May-1997  mycroft branches: 1.41.4;
Eliminate bogus uses of V{READ,WRITE,EXEC}. Use S_I[RWX]{USR,GRP,OTH} where
appropriate.
 1.40 16-Apr-1997  fvdl fdesc_seek -> genfs_seek, not genfs_badop
 1.39 11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.38 25-Oct-1996  cgd define path name string variables that we should not (and, thankfully, do
not) modify as 'const char *' rather than 'char *'.
 1.37 13-Oct-1996  christos backout previous kprintf changes
 1.36 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.35 07-Sep-1996  mycroft Implement poll(2).
 1.34 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.33 14-Jun-1996  mrg use VATTR_NULL macro.
 1.32 11-Apr-1996  mrg fix long-time bug in fdesc -- /dev/tty was a named pipe rather than a
mirror image of the real /dev/tty, a char dev. make it a char dev.
 1.31 13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.30 09-Feb-1996  christos miscfs prototype changes
 1.29 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.28 01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.27 09-Oct-1995  mycroft /dev/std* are of type DT_LNK.
 1.26 09-Oct-1995  mycroft Use the index number as the cookie, rather than multiplying by UIO_MX.
 1.25 09-Oct-1995  mycroft Add support for cookies, mostly from Greg Hudson.
 1.24 14-Dec-1994  mycroft Remove a_fp.
 1.23 14-Dec-1994  mycroft Revert dup handling.
 1.22 13-Dec-1994  mycroft Sync with CSRG.
 1.21 04-Dec-1994  mycroft Use fddupopen(), just like fdopen() does.
 1.20 14-Nov-1994  christos fixed struct comment
 1.19 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.18 20-Oct-1994  cgd update for new syscall args description mechanism
 1.17 19-Aug-1994  mycroft Convert hash tables.
 1.16 14-Jul-1994  mycroft Fix a fencepost error.
 1.15 29-Jun-1994  cgd branches: 1.15.2;
New RCS ID's, take two. They're more aesthetically pleasing, and use 'NetBSD'
 1.14 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.13 04-May-1994  cgd Rename a lot of process flags.
 1.12 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.11 09-Jan-1994  ws Note that NFS mounting of fdesc doesn't make sense
 1.10 05-Jan-1994  cgd don't try to reclaim 'known' root vnode
 1.9 05-Jan-1994  cgd fix UFS vs 'real' fs type mixups
 1.8 05-Jan-1994  cgd update with latest fdesc file system from jsp@sequent.com
 1.7 23-Dec-1993  cgd fix fdesc_print return type (again)
 1.6 07-Sep-1993  ws branches: 1.6.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.5 02-Aug-1993  mycroft Make fdesc_print have a return type of void.
 1.4 07-Jun-1993  cgd give various filesystems their own vnode types
 1.3 30-Mar-1993  cgd added . and ..
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the fdesc filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.2.3 06-Jan-1994  pk Re-instate EOPNOTSUPP.
 1.6.2.2 28-Dec-1993  pk Use ENODEV rather than EOPNOTSUPP for unsupported operations on non-socket devices
 1.6.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.15.2.2 19-Aug-1994  mycroft update from trunk
 1.15.2.1 15-Jul-1994  cgd fix fencepost error. from trunk.
 1.41.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.47.8.1 02-Aug-1999  thorpej Update from trunk.
 1.47.6.1 28-Aug-1999  he Pull up revisions 1.51-1.53:
Protect {fdesc,kernfs,procfs}_readdir against directory seeks
with bogus offsets. (sommerfeld)
 1.53.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.53.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.53.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.53.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.53.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.58.2.7 11-Nov-2002  nathanw Catch up to -current
 1.58.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.58.2.5 17-Apr-2002  nathanw Catch up to -current.
 1.58.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.58.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.58.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.58.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.64.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.64.4.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.64.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.64.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.64.2.5 28-Sep-2002  jdolecek fdesc_kqfilter(): for Fdesc, invoke kqfilter of the underlying descriptor, and
fallback to genfs_kqfilter() for other files
 1.64.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.64.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.64.2.2 08-Sep-2001  thorpej Add kqueue support to "/dev/tty".
 1.64.2.1 10-Jul-2001  lukem support DTYPE_KQUEUE
 1.69.2.1 16-May-2002  gehenna Call the device interfaces via the device switch.
Replace the direct-access to devsw table with calling devsw APIs.
 1.77.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.77.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.77.2.6 18-Dec-2004  skrll Sync with HEAD.
 1.77.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.77.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.77.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.77.2.2 03-Aug-2004  skrll Sync with HEAD
 1.77.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review. I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.82.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.82.4.1 29-Apr-2005  kent sync with -current
 1.84.2.8 24-Mar-2008  yamt sync with head.
 1.84.2.7 21-Jan-2008  yamt sync with head
 1.84.2.6 07-Dec-2007  yamt sync with head
 1.84.2.5 27-Oct-2007  yamt sync with head.
 1.84.2.4 03-Sep-2007  yamt sync with head.
 1.84.2.3 26-Feb-2007  yamt sync with head.
 1.84.2.2 30-Dec-2006  yamt sync with head.
 1.84.2.1 21-Jun-2006  yamt sync with head.
 1.87.2.1 20-Oct-2005  yamt adapt fdesc.
 1.89.6.2 01-Jun-2006  kardel Sync with head.
 1.89.6.1 22-Apr-2006  simonb Sync with head.
 1.89.4.1 09-Sep-2006  rpaulo sync with head
 1.89.2.2 18-Feb-2006  yamt fix proc/lwp mismatch in the previous.
 1.89.2.1 18-Feb-2006  yamt adapt the rest of MI code.
 1.90.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.4.2 19-Apr-2006  elad sync with head.
 1.90.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.90.2.2 24-May-2006  yamt sync with head.
 1.90.2.1 11-Apr-2006  yamt sync with head
 1.92.10.2 10-Dec-2006  yamt sync with head.
 1.92.10.1 22-Oct-2006  yamt sync with head
 1.92.8.3 12-Jan-2007  ad Sync with head.
 1.92.8.2 18-Nov-2006  ad Sync with head.
 1.92.8.1 17-Nov-2006  ad Checkpoint work in progress.
 1.94.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.96.8.1 11-Jul-2007  mjf Sync with head.
 1.96.6.3 20-Aug-2007  ad Sync with HEAD.
 1.96.6.2 13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
 1.96.6.1 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.97.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.98.10.2 27-Jul-2007  pooka whoops, forgot to commit this a while back: initialize new vnode size
 1.98.10.1 27-Jul-2007  pooka file fdesc_vnops.c was added on branch matt-mips64 on 2007-07-27 08:38:40 +0000
 1.98.8.1 14-Oct-2007  yamt sync with head.
 1.98.6.2 09-Jan-2008  matt sync with HEAD
 1.98.6.1 06-Nov-2007  matt sync with HEAD
 1.98.4.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.98.4.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.98.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.99.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.99.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.100.2.1 26-Dec-2007  ad Sync with head.
 1.101.12.3 17-Jan-2009  mjf Sync with HEAD.
 1.101.12.2 02-Jun-2008  mjf Sync with HEAD.
 1.101.12.1 03-Apr-2008  mjf Sync with HEAD.
 1.102.4.6 11-Aug-2010  yamt sync with head.
 1.102.4.5 11-Mar-2010  yamt sync with head
 1.102.4.4 19-Aug-2009  yamt sync with head.
 1.102.4.3 20-Jun-2009  yamt sync with head
 1.102.4.2 04-May-2009  yamt sync with head.
 1.102.4.1 16-May-2008  yamt sync with head.
 1.102.2.1 18-May-2008  yamt sync with head.
 1.103.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.103.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.104.2.2 23-Jul-2009  jym Sync with HEAD.
 1.104.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.109.4.3 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.109.4.2 05-Mar-2011  rmind sync with head
 1.109.4.1 03-Jul-2010  rmind sync with head
 1.109.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.112.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.114.16.1 18-May-2014  rmind sync with head
 1.114.12.2 03-Dec-2017  jdolecek update from HEAD
 1.114.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.114.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.119.2.1 10-Aug-2014  tls Rebase.
 1.121.2.1 13-May-2015  snj Pull up following revision(s) (requested by riz in ticket #737):
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.125 via patch
The comment about toxicity was correct, restore VNON setting code and
then set the proper type in lookup.
 1.125.2.3 28-Aug-2017  skrll Sync with HEAD
 1.125.2.2 05-Oct-2016  skrll Sync with HEAD
 1.125.2.1 06-Jun-2015  skrll Sync with HEAD
 1.126.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.127.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.129.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.129.10.1 10-Jun-2019  christos Sync with HEAD
 1.129.8.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.130.4.2 20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

... usage of resources associated with object ...
membar_release();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_acquire();
... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
 1.130.4.1 03-May-2021  martin Pull up following revision(s) (requested by hannken in ticket #1267):

sys/miscfs/fdesc/fdesc_vnops.c: revision 1.135

Make sure fdesc_lookup() never returns VNON vnodes.
Should fix PR kern/56130 (fdescfs create nodes with wrong major number)
 1.131.2.1 29-Feb-2020  ad Sync with head.
 1.134.6.2 01-Aug-2021  thorpej Sync with HEAD.
 1.134.6.1 13-May-2021  thorpej Sync with HEAD.
 1.4 11-Oct-2014  uebayasi Define filesystem attributes with vfs dependency.
 1.3 11-Dec-2005  christos branches: 1.3.120;
merge ktrace-lwp.
 1.2 26-Feb-2005  perry nuke trailing whitespace
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8; 1.1.14; 1.1.22; 1.1.24;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.24.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.22.1 29-Apr-2005  kent sync with -current
 1.1.14.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.fdesc was added on branch kqueue on 2002-06-23 17:50:09 +0000
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.fdesc was added on branch nathanw_sa on 2002-06-20 03:47:55 +0000
 1.3.120.1 03-Dec-2017  jdolecek update from HEAD
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.28 26-Oct-2022  riastradh miscfs/fifofs/fifo.h: New home for extern fifo_vnodeop_opv_desc.

Add include guard and fix missing includes while here too.
 1.27 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Except that vop_default_desc has to come first.)
Otherwise this wouldn't work.
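The idea of a macro that expands to the canned part of an ops table can be sketched in plain C. Everything here is hypothetical (the `toy_` names, the string-keyed table, the return values) and only models the shape of the approach, not the real GENFS_SPECOP_ENTRIES definition. It also shows why order-independence matters: entries contributed by the macro and entries written per file system are interleaved freely, with only the default entry pinned to the first slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*vop_fn)(void *);

struct op_entry {
    const char *name;
    vop_fn      fn;
};

int toy_default(void *v)    { (void)v; return -1; }
int toy_spec_read(void *v)  { (void)v; return 1; }
int toy_spec_write(void *v) { (void)v; return 2; }
int toy_fs_getattr(void *v) { (void)v; return 10; }

/* Hypothetical analogue of a "canned entries" macro shared by every fs. */
#define TOY_SPECOP_ENTRIES              \
    { "read",  toy_spec_read },         \
    { "write", toy_spec_write }

/* One file system's device ops table: default first, then any order. */
const struct op_entry toy_fs_specops[] = {
    { "default", toy_default },
    TOY_SPECOP_ENTRIES,
    { "getattr", toy_fs_getattr },
    { NULL, NULL }
};

/* Look an operation up by name, falling back to the default entry. */
vop_fn
toy_lookup(const struct op_entry *tbl, const char *name)
{
    assert(tbl != NULL && name != NULL);
    for (const struct op_entry *e = tbl + 1; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0)
            return e->fn;
    }
    return tbl[0].fn;
}
```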
 1.26 29-Mar-2010  pooka branches: 1.26.76;
Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.25 25-Jan-2008  ad branches: 1.25.10; 1.25.30; 1.25.32;
Remove VOP_LEASE. Discussed on tech-kern.
 1.24 11-Dec-2005  christos branches: 1.24.46; 1.24.52;
merge ktrace-lwp.
 1.23 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.22 30-Aug-2005  xtraeme branches: 1.22.2;
Remove __P()
 1.21 16-Jun-2004  wrstuden branches: 1.21.12;
Change fifo_{un,}lock and fifo_islocked to use the "real" lock
ops, not the nolock variants. Should have no real impact as according
to mkid, we only use fifo_vnodeop_entries, via fifo_vnodeop_p,
for selective operations on fifos. All the fifo users use the native
file system's locking routines.

Removes one use of genfs_nolock and friends.
 1.20 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 23-Oct-2002  jdolecek branches: 1.19.6;
merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in the NetBSD tree
(with the exception of Coda) and all device drivers supporting poll(2).

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.18 06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.17 22-Sep-2001  sommerfeld Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.16 13-Aug-1998  kleink branches: 1.16.24; 1.16.26; 1.16.28;
Per POSIX, fail with EINVAL if advisory locking is attempted on a file type
that doesn't support it, rather than using a homegrown EBADF or EOPNOTSUPP.
 1.15 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.14 22-Jun-1998  sommerfe defopt for options FIFO
 1.13 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.12 07-Sep-1996  mycroft Implement poll(2).
 1.11 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.10 09-Feb-1996  christos miscfs prototype changes
 1.9 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.8 29-Jun-1994  cgd New RCS ID's, take two. They're more aesthetically pleasing, and use 'NetBSD'
 1.7 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.6 05-Jan-1994  cgd fix return type for fifo_print
 1.5 07-Sep-1993  ws Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.4 27-Jun-1993  andrew ANSIfications - lots of function prototyping.
 1.3 20-May-1993  cgd add rcs ids as necessary, and also clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.16.28.1 01-Oct-2001  fvdl Catch up with -current.
 1.16.26.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.26.1 10-Jul-2001  lukem add fifo_kqfilter() and filt_fifo*()
 1.16.24.3 11-Nov-2002  nathanw Catch up to -current
 1.16.24.2 08-Jan-2002  nathanw Catch up to -current.
 1.16.24.1 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.19.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.6.1 03-Aug-2004  skrll Sync with HEAD
 1.21.12.2 04-Feb-2008  yamt sync with head.
 1.21.12.1 21-Jun-2006  yamt sync with head.
 1.22.2.1 20-Oct-2005  yamt adapt specfs and fifofs.
 1.24.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.24.46.1 23-Mar-2008  matt sync with HEAD
 1.25.32.1 30-May-2010  rmind sync with head
 1.25.30.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.25.10.1 11-Aug-2010  yamt sync with head.
 1.26.76.1 01-Aug-2021  thorpej Sync with HEAD.
 1.91 11-Oct-2021  thorpej Setting EV_EOF requires modifying kn->kn_flags. However, that relies on
holding the kq_lock of that note's kq. Rather than exposing this directly,
add new knote_set_eof() and knote_clear_eof() functions that handle the
necessary locking and don't leak as many implementation details to modules.

NetBSD 9.99.91
 1.90 02-Oct-2021  thorpej - Add a new EVFILT_WRITE test case for FIFOs that correctly validates
the writability thresholds.
- Fix a bug in fifo_kqfilter() exposed by the new test case; in the
EVFILT_WRITE case, we were attaching the wrong end of the socket
pair to the knote!
- In filt_fiforead(), use ">= so->so_rcv.sb_lowat" rather than "> 0"
for consistency with fifo_poll(). NFC.
 1.89 02-Oct-2021  thorpej ...and correct my terrible spelling.
 1.88 02-Oct-2021  thorpej - Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.
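The readability and writability rules described in the 1.88 entry can be sketched as two predicates over a socket buffer. This is a minimal illustration, not the kernel code: struct demo_sockbuf, DEMO_PIPE_BUF and the demo_* functions are hypothetical names, and the space calculation is simplified to capacity minus bytes queued (the real sbspace() also accounts for mbuf limits).

```c
#include <assert.h>

#define DEMO_PIPE_BUF	512	/* illustrative only; query _PC_PIPE_BUF for the real value */

/* Hypothetical, simplified stand-in for the kernel's struct sockbuf. */
struct demo_sockbuf {
	long	sb_cc;		/* bytes currently queued */
	long	sb_hiwat;	/* buffer capacity */
	long	sb_lowat;	/* low water mark */
};

/* Free space in the buffer (simplified: capacity minus queued bytes). */
static long
demo_sbspace(const struct demo_sockbuf *sb)
{
	assert(sb->sb_cc <= sb->sb_hiwat);
	return sb->sb_hiwat - sb->sb_cc;
}

/* Readable once at least sb_lowat bytes are queued (1 for a FIFO). */
static int
demo_readable(const struct demo_sockbuf *rcv)
{
	return rcv->sb_cc >= rcv->sb_lowat;
}

/* Writable once at least sb_lowat bytes of space are free (PIPE_BUF for a FIFO). */
static int
demo_writable(const struct demo_sockbuf *snd)
{
	return demo_sbspace(snd) >= snd->sb_lowat;
}
```

With a receive low water mark of 1, a single queued byte makes the FIFO readable. With a send low water mark of DEMO_PIPE_BUF, the FIFO becomes writable again exactly when that much space has been freed, which is the condition the strengthened test checks.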
 1.87 02-Oct-2021  thorpej - fifo_poll(): If the last writer has disappeared, detect this and return
POLLHUP, per POSIX.
- fifo_close(): Use the new fifo_socantrcvmore(), which is like the
garden-variety socantrcvmore(), except it specifies POLL_HUP rather
than POLL_IN (so the correct code for SIGIO is sent).
- sowakeup(): Allow POLL_HUP as a code (notifies poll'ers with POLLHUP).
- Add test cases for correct POLLHUP behavior with FIFOs.

Fixes PR kern/56429.
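The POLLHUP behavior described in the 1.87 entry can be observed from userland. The following is a minimal sketch, not the test case added by that commit: the helper name and the temporary path are made up for illustration, and it assumes a writable /tmp. It opens both ends of a FIFO, closes the only writer, and reports what poll(2) returns for the read end.

```c
#define _XOPEN_SOURCE 700	/* for mkdtemp(3), mkfifo(3), poll(2) */

#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a FIFO in a fresh temporary directory, open
 * a reader and a writer, close the writer, then poll the reader.
 * Stores the resulting revents in *revents and returns 0, or returns -1
 * if any setup step fails.
 */
static int
fifo_poll_after_last_writer(short *revents)
{
	char dir[] = "/tmp/fifohup.XXXXXX";
	char path[64];
	struct pollfd pfd;
	int rfd = -1, wfd = -1, n = -1;

	assert(revents != NULL);
	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/fifo", dir);
	if (mkfifo(path, 0600) == -1) {
		rmdir(dir);
		return -1;
	}

	/* O_NONBLOCK lets the reader open without waiting for a writer. */
	rfd = open(path, O_RDONLY | O_NONBLOCK);
	if (rfd != -1)
		wfd = open(path, O_WRONLY);
	if (wfd != -1) {
		close(wfd);		/* the last writer disappears */
		pfd.fd = rfd;
		pfd.events = POLLIN;
		pfd.revents = 0;
		n = poll(&pfd, 1, 0);	/* do not block; state is already final */
		if (n >= 0)
			*revents = pfd.revents;
	}

	if (rfd != -1)
		close(rfd);
	unlink(path);
	rmdir(dir);
	return (n >= 0) ? 0 : -1;
}
```

On a system that follows the POSIX behavior this entry refers to, the stored revents is expected to include POLLHUP once the last writer has closed.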
 1.86 29-Sep-2021  thorpej The kq filterops that interact with sockets are MPSAFE.
 1.85 29-Sep-2021  thorpej - Change selremove_knote() from returning void to bool, and return
true if the last knote was removed and there are no more knotes
on the selinfo.
- Use this new return value in filt_sordetach(), filt_sowdetach(),
filt_fifordetach(), and filt_fifowdetach() to know when to clear
SB_KNOTE without having to know select/kqueue implementation details.
 1.84 26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.83 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.82 19-Dec-2020  thorpej branches: 1.82.4;
Use sel{record,remove}_knote().
 1.81 27-Jun-2020  christos branches: 1.81.2;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.80 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.79 25-Oct-2017  maya branches: 1.79.8;
Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
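As an illustration of what the semantic patch above rewrites, here is a small self-contained sketch. struct demo_filterops and the demo_* functions are hypothetical stand-ins for the kernel's struct filterops and its callbacks; only the member names are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct knote;	/* opaque in this sketch */

/* Hypothetical stand-in using the member names shown in the patch. */
struct demo_filterops {
	int	f_isfd;
	int	(*f_attach)(struct knote *);
	void	(*f_detach)(struct knote *);
	int	(*f_event)(struct knote *, long);
};

static void
demo_detach(struct knote *kn)
{
	(void)kn;
}

static int
demo_event(struct knote *kn, long hint)
{
	(void)kn;
	return hint != 0;
}

/* Before: positional initializer; meaning depends on member order. */
static const struct demo_filterops demo_old = {
	1, NULL, demo_detach, demo_event
};

/* After: C99 designated initializers; each member is named explicitly. */
static const struct demo_filterops demo_new = {
	.f_isfd = 1,
	.f_attach = NULL,
	.f_detach = demo_detach,
	.f_event = demo_event,
};

/* Returns 1 if both tables hold identical values. */
static int
demo_same(void)
{
	return demo_old.f_isfd == demo_new.f_isfd &&
	    demo_old.f_attach == demo_new.f_attach &&
	    demo_old.f_detach == demo_new.f_detach &&
	    demo_old.f_event == demo_new.f_event;
}
```

Both forms produce the same table. The designated form keeps working if members are later reordered or renamed individually, which is presumably why it was preferred.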
 1.78 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.77 09-Aug-2014  rtr branches: 1.77.4; 1.77.8; 1.77.12;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.76 25-Jul-2014  dholland Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.75 17-May-2014  rmind fifo_open: assign v_fifoinfo once initialised; add an assert while here.
 1.74 07-Feb-2014  hannken branches: 1.74.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.73 08-Apr-2013  skrll branches: 1.73.4;
Remove some set but unused variables
 1.72 21-Dec-2011  christos branches: 1.72.6;
only set CANTRCVMORE if no error.
 1.71 20-Dec-2011  christos - Eliminate so_nbio and turn it into a bit SS_NBIO in so_state.
- Introduce MSG_NBIO so that we can turn non blocking i/o on a per call basis
- Use MSG_NBIO to fix the XXX: multi-threaded issues on the fifo sockets.
- Don't set SO_CANTRCVMORE, if we were interrupted (perhaps do it for all
errors?).
 1.70 31-Aug-2011  plunky branches: 1.70.2; 1.70.6;
NULL does not need a cast
 1.69 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.68 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.67 27-Mar-2010  pooka Access fifoinfo only when it's non-NULL.
 1.66 28-Apr-2008  martin branches: 1.66.20; 1.66.22;
Remove clause 3 and 4 from TNF licenses
 1.65 24-Apr-2008  ad branches: 1.65.2;
Fix locking in the fifo kqueue routines.
 1.64 24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.63 21-Mar-2008  ad branches: 1.63.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.62 11-Feb-2008  yamt branches: 1.62.6;
sprinkle KERNEL_LOCK for socket.
a little different version was tested by Matthias Drochner.
 1.61 06-Feb-2008  ad Don't lock the socket to set/clear FNONBLOCK. Just set it atomically.
 1.60 25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.59 05-Dec-2007  pooka Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
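The problem described in the 1.59 entry comes from the return value being interpreted as an errno. A minimal sketch, with hypothetical filter ids and function names that are not the kernel's:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical filter identifiers for this sketch only. */
#define DEMO_FILT_READ	0
#define DEMO_FILT_WRITE	1

/* Old pattern: "return 1" for an unsupported filter. */
static int
demo_kqfilter_old(int filter)
{
	switch (filter) {
	case DEMO_FILT_READ:
	case DEMO_FILT_WRITE:
		return 0;
	default:
		return 1;	/* reaches userland as errno 1, i.e. EPERM */
	}
}

/* Fixed pattern: return a meaningful errno value instead. */
static int
demo_kqfilter_new(int filter)
{
	switch (filter) {
	case DEMO_FILT_READ:
	case DEMO_FILT_WRITE:
		return 0;
	default:
		return EINVAL;
	}
}
```

Because EPERM has the numeric value 1 on NetBSD (and on other common Unix systems), the old pattern made an unsupported filter look like a permission problem.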
 1.58 26-Nov-2007  pooka branches: 1.58.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.57 16-Nov-2006  christos branches: 1.57.22; 1.57.24; 1.57.30;
__unused removal on arguments; approved by core.
 1.56 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.55 14-May-2006  elad branches: 1.55.8; 1.55.10;
integrate kauth.
 1.54 11-Dec-2005  christos branches: 1.54.4; 1.54.6; 1.54.8; 1.54.10; 1.54.12;
merge ktrace-lwp.
 1.53 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.52 30-Aug-2005  xtraeme branches: 1.52.2;
Remove __P()
 1.51 26-Feb-2005  perry branches: 1.51.4;
nuke trailing whitespace
 1.50 17-Jul-2004  mycroft branches: 1.50.4; 1.50.6;
Clean up reader/writer counts for the revoke case in fifo_close().
 1.49 22-May-2004  jonathan Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.
 1.48 12-May-2004  jrf caddr_t -> void * and removal of some more casts.
 1.47 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.46 06-Mar-2004  wrstuden Handle the case of fifo_close() getting called from vclean(). In that
case, we tear down the node-specific storage as if there were no more open
users. As vclean() will VT_NON the vnode before anyone else will get access
to the vnode, this is our last chance.

Fixes memory leak in revoke(2) path noticed by tedu at openbsd dot org.
 1.45 29-Nov-2003  matt Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.
 1.44 29-Nov-2003  perry Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.
 1.43 03-Sep-2003  matt Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.
 1.42 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.41 29-Jun-2003  fvdl branches: 1.41.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.40 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.39 17-Mar-2003  martin Fix a race condition where a writer could already have closed the fifo
before the reader woke up - this made the reader loop again, waiting
for another writer, even though there was input available.

Thanks to Jaromir for spotting the real cause and suggesting a solution.

This should fix PR port-sparc64/20283.
 1.38 02-Mar-2003  jdolecek use different wmesg for the reader and the writer
 1.37 26-Nov-2002  christos si_ -> sel_
 1.36 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.35 26-Aug-2002  thorpej Fix a signed/unsigned comparison warning from GCC 3.3.
 1.34 27-Jul-2002  chs we can't use the vnode's v_usecount to track how many times the vnode
has been VOP_OPEN()'d. if the fifo is being accessed via a layered fs,
v_usecount is always one (representing the hold by the layered vnode)
regardless of how many times the vnode has been opened. instead, keep a
separate counter for opens. fixes PR 17195 and probably 17724.
 1.33 06-Dec-2001  chs branches: 1.33.8; 1.33.10;
add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.32 10-Nov-2001  lukem add RCSIDs
 1.31 22-Sep-2001  sommerfeld branches: 1.31.2;
Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.30 27-Feb-2001  lukem branches: 1.30.2; 1.30.4; 1.30.6;
convert to ansi knf
 1.29 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.28 30-Mar-2000  augustss branches: 1.28.4;
Register, begone!
 1.27 31-Oct-1998  matt branches: 1.27.12;
Use the so_send and so_receive funcptrs in the socket instead of calling
sosend/soreceive directly. [I've been meaning to commit these for months.]
 1.26 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.25 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.24 09-Oct-1997  mycroft Make openstr[] const.
 1.23 18-May-1997  kleink branches: 1.23.4;
When reading from an empty FIFO that no process has opened for writing, and
O_NONBLOCK is set, return 0.
 1.22 13-Oct-1996  christos backout previous kprintf changes
 1.21 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.20 07-Sep-1996  mycroft Implement poll(2).
 1.19 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.18 16-Mar-1996  christos Fix printf format follies.
 1.17 09-Feb-1996  christos miscfs prototype changes
 1.16 14-Apr-1995  mycroft Allow opening a FIFO with O_RDWR.
 1.15 02-Apr-1995  mycroft Emulate SCO behaviour when both FREAD and FWRITE are set, but only for SCO
executables.
 1.14 14-Dec-1994  mycroft Remove a_fp.
 1.13 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.12 14-Nov-1994  christos fixed struct comment
 1.11 29-Oct-1994  cgd light clean; make sure headers are properly included, types are OK, etc.
 1.10 20-Oct-1994  cgd update for new syscall args description mechanism
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.7 05-Jan-1994  cgd fix return type for fifo_print
 1.6 18-Dec-1993  mycroft Canonicalize all #includes.
 1.5 27-Jun-1993  andrew branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.4 27-May-1993  cgd make the sleeps on socket open interruptable.
 1.3 20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.2 02-Apr-1993  cgd Jay Fenlason <hack@datacube.com>:

1: the fi_readers and fi_writers fields of the fifoinfo structure were not
being initialized to 0. This caused the driver to not sleep the first
process to open the fifo--it thought there was already another process to
talk to (most of the time.)

2: fifo_open() was calling tsleep() without unlocking the inode of the fifo
file. This caused *any* subsequent access to the file (even an ls (!)) to
hang forever. Note that this bug was usually masked by bug #1 above.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.5.4.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.5.4.1 10-Nov-1993  mycroft AF_UNIX --> AF_LOCAL
 1.23.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.27.12.3 12-Mar-2001  bouyer Sync with HEAD.
 1.27.12.2 11-Feb-2001  bouyer Sync with HEAD.
 1.27.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.4.1 08-Apr-2004  jmc Pullup patch (requested by wrstuden in ticket #125)

Handle the case of fifo_close() getting called from vclean(). In that
case, we tear down the node-specific storage as if there were no more open
users. As vclean() will VT_NON the vnode before anyone else will get access
to the vnode, this is our last chance.

Fixes memory leak in revoke(2) path.
 1.30.6.2 01-Oct-2001  fvdl Catch up with -current.
 1.30.6.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.30.4.7 29-Sep-2002  jdolecek don't need cast to (caddr_t) for kn_hook anymore
 1.30.4.6 24-Sep-2002  jdolecek shuffle filt_*() to not need forward decls, put all kq related function
and structures together
 1.30.4.5 22-Sep-2002  jdolecek need to set kn_hook in fifo_kqfilter() (looks like merge botch,
this thing is correct in FreeBSD version)
 1.30.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.30.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.30.4.2 07-Sep-2001  thorpej More const.
 1.30.4.1 10-Jul-2001  lukem add fifo_kqfilter() and filt_fifo*()
 1.30.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.30.2.6 11-Nov-2002  nathanw Catch up to -current
 1.30.2.5 27-Aug-2002  nathanw Catch up to -current.
 1.30.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.30.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.30.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.30.2.1 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.31.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.33.10.2 15-Mar-2004  jmc Pullup rev 1.46 (requested by wrstuden in ticket #1621)

Handle the case of fifo_close() getting called from vclean().
Fixes memory leak in revoke(2) path.
 1.33.10.1 29-Jul-2002  lukem Pull up revision 1.34 (requested by chuq in ticket #566):
we can't use the vnode's v_usecount to track how many times the vnode
has been VOP_OPEN()'d. if the fifo is being accessed via a layered fs,
v_usecount is always one (representing the hold by the layered vnode)
regardless of how many times the vnode has been opened. instead, keep a
separate counter for opens. fixes PR 17195 and probably 17724.
 1.33.8.1 29-Aug-2002  gehenna catch up with -current.
 1.41.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.41.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.41.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.41.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.41.2.2 03-Aug-2004  skrll Sync with HEAD
 1.41.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.50.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.50.4.1 29-Apr-2005  kent sync with -current
 1.51.4.6 24-Mar-2008  yamt sync with head.
 1.51.4.5 27-Feb-2008  yamt sync with head.
 1.51.4.4 11-Feb-2008  yamt sync with head.
 1.51.4.3 04-Feb-2008  yamt sync with head.
 1.51.4.2 07-Dec-2007  yamt sync with head
 1.51.4.1 21-Jun-2006  yamt sync with head.
 1.52.2.1 20-Oct-2005  yamt adapt specfs and fifofs.
 1.54.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.54.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.54.8.1 24-May-2006  yamt sync with head.
 1.54.6.1 01-Jun-2006  kardel Sync with head.
 1.54.4.1 09-Sep-2006  rpaulo sync with head
 1.55.10.2 10-Dec-2006  yamt sync with head.
 1.55.10.1 22-Oct-2006  yamt sync with head
 1.55.8.1 18-Nov-2006  ad Sync with head.
 1.57.30.2 18-Feb-2008  mjf Sync with HEAD.
 1.57.30.1 08-Dec-2007  mjf Sync with HEAD.
 1.57.24.2 23-Mar-2008  matt sync with HEAD
 1.57.24.1 09-Jan-2008  matt sync with HEAD
 1.57.22.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.57.22.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.58.2.1 08-Dec-2007  ad Sync with head.
 1.62.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.62.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.63.2.1 18-May-2008  yamt sync with head.
 1.65.2.2 11-Aug-2010  yamt sync with head.
 1.65.2.1 16-May-2008  yamt sync with head.
 1.66.22.2 03-Jul-2010  rmind sync with head
 1.66.22.1 30-May-2010  rmind sync with head
 1.66.20.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.66.20.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.70.6.1 18-Feb-2012  mrg merge to -current.
 1.70.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.70.2.1 17-Apr-2012  yamt sync with head
 1.72.6.3 03-Dec-2017  jdolecek update from HEAD
 1.72.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.72.6.1 23-Jun-2013  tls resync from head
 1.73.4.2 18-May-2014  rmind sync with head
 1.73.4.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.74.2.1 10-Aug-2014  tls Rebase.
 1.77.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.77.8.1 26-Apr-2017  pgoyette Sync with HEAD
 1.77.4.1 28-Aug-2017  skrll Sync with HEAD
 1.79.8.3 04-Oct-2021  martin Pull up following revision(s) (requested by thorpej in ticket #1353):

sys/miscfs/fifofs/fifo_vnops.c: revision 1.90
tests/kernel/kqueue/write/t_fifo.c: revision 1.5

- Add a new EVFILT_WRITE test case for FIFOs that correctly validates
the writability thresholds.
- Fix a bug in fifo_kqfilter() exposed by the new test case; in the
EVFILT_WRITE case, we were attaching the wrong end of the socket
pair to the knote!
- In filt_fiforead(), use ">= so->so_rcv.sb_lowat" rather than "> 0"
for consistency with fifo_poll(). NFC.
 1.79.8.2 04-Oct-2021  martin Pull up following revision(s) (requested by thorpej in ticket #1351):

sys/miscfs/fifofs/fifo_vnops.c: revision 1.88
sys/kern/uipc_syscalls.c: revision 1.201
tests/lib/libc/sys/t_poll.c: revision 1.6
tests/lib/libc/sys/t_poll.c: revision 1.7
tests/lib/libc/sys/t_poll.c: revision 1.8

- Strengthen the poll(2) fifo_inout test to ensure that once the reader
has read enough that exactly PIPE_BUF space is available that the FIFO
becomes writable again.
- When creating a FIFO, ensure that the receive low water mark is 1
(a FIFO must be readable when at least 1 byte is available); this
was already the case implicitly, but this makes it explicit.
- Similarly, set the send low water mark to PIPE_BUF to ensure that
the pipe is writable when at least PIPE_BUF bytes of space are available
in the send buffer. Without this change, the strengthened test case
above does not pass (the default send low water mark is larger than
PIPE_BUF; see soreserve()).
- Make the same low water mark changes to the PIPE_SOCKETPAIR case.

In the fifo_hup1 test, also ensure that POLLHUP is de-asserted when a
new writer appears.

Add a fifo_inout test case that validates the expected POLLIN / POLLOUT
behavior for FIFOs:
- A FIFO is readable so long as at least 1 byte is available.
- A FIFO is writable so long as at least PIPE_BUF (obtained with _PC_PIPE_BUF)
space is available.
This will be cloned for a forthcoming kevent test case.
 1.79.8.1 02-Oct-2021  martin Pull up following revision(s) (requested by thorpej in ticket #1350):

sys/kern/uipc_socket2.c: revision 1.140
tests/lib/libc/sys/t_poll.c: revision 1.5
sys/miscfs/fifofs/fifo_vnops.c: revision 1.87

- fifo_poll(): If the last writer has disappeared, detect this and return
POLLHUP, per POSIX.
- fifo_close(): Use the new fifo_socantrcvmore(), which is like the
garden-variety socantrcvmore(), except it specifies POLL_HUP rather
than POLL_IN (so the correct code for SIGIO is sent).
- sowakeup(): Allow POLL_HUP as a code (notifies poll'ers with POLLHUP).
- Add test cases for correct POLLHUP behavior with FIFOs.

Fixes PR kern/56429.
 1.81.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.82.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.2 08-Jul-1999  wrstuden branches: 1.2.14; 1.2.16; 1.2.18;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrow most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.1 12-Jun-1998  cgd branches: 1.1.10;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.10.1 02-Aug-1999  thorpej Update from trunk.
 1.2.18.1 01-Oct-2001  fvdl Catch up with -current.
 1.2.16.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.2.14.1 21-Sep-2001  nathanw Catch up to -current.
 1.39 27-Mar-2022  christos dedup the eofs link/symlink methods
 1.38 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
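
The interest tracking described in the 1.38 entry above can be pictured with a small userland sketch. Everything here is hypothetical (the toy_* names, the TOY_NOTE_* bits, the walks counter); it is not the NetBSD implementation, only an illustration of keeping a union of the watchers' masks so that events nobody asked for return before the watcher list is walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits; the kernel's NOTE_* values are not used here. */
#define TOY_NOTE_DELETE	0x1u
#define TOY_NOTE_WRITE	0x2u

struct toy_watcher {
	unsigned wanted;	/* events this watcher asked for */
	unsigned fired;		/* events delivered so far */
};

struct toy_vnode {
	struct toy_watcher *watchers;
	size_t nwatchers;
	unsigned interest;	/* union of every watcher's wanted mask */
	unsigned walks;		/* number of times the list was traversed */
};

/* Recompute the union whenever a watcher is added or removed. */
void
toy_recompute_interest(struct toy_vnode *vp)
{
	assert(vp != NULL);
	vp->interest = 0;
	for (size_t i = 0; i < vp->nwatchers; i++)
		vp->interest |= vp->watchers[i].wanted;
}

/* Deliver an event, skipping the list walk if no one wants it. */
void
toy_notify(struct toy_vnode *vp, unsigned event)
{
	assert(vp != NULL);
	if ((vp->interest & event) == 0)
		return;		/* cheap early exit: no traversal */
	vp->walks++;
	for (size_t i = 0; i < vp->nwatchers; i++)
		if (vp->watchers[i].wanted & event)
			vp->watchers[i].fired |= event;
}
```

With a single watcher that only wants TOY_NOTE_DELETE, a TOY_NOTE_WRITE notification returns without touching the list.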
 1.37 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.36 07-Aug-2020  christos branches: 1.36.6;
accmode should be accmode_t
 1.35 27-Jun-2020  christos Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.34 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.33 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.32 27-Feb-2014  hannken branches: 1.32.6; 1.32.10; 1.32.14;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
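
The two lock-operation rules in the 1.32 entry above can be sketched as a toy model. The struct and function names below are hypothetical stand-ins, not the kernel's vnode or lock API, and a boolean replaces the LK_RETRY flag; the sketch only shows the order of the checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the NetBSD structure. */
struct toy_vnode {
	bool locked;
	bool dead;	/* set once the vnode has been cleaned out */
};

/*
 * Lock operation of an ordinary file system: take the lock first,
 * then check for a dead vnode, unlocking and failing if it is dead.
 */
int
toy_fs_lock(struct toy_vnode *vp)
{
	assert(vp != NULL);
	vp->locked = true;		/* pretend the lock was acquired */
	if (vp->dead) {
		vp->locked = false;	/* unlock before reporting failure */
		return ENOENT;
	}
	return 0;
}

/*
 * Lock operation of the dead file system: only callers that asked
 * to retry (LK_RETRY in the log) are allowed to lock a dead vnode.
 */
int
toy_dead_lock(struct toy_vnode *vp, bool retry)
{
	assert(vp != NULL);
	if (!retry)
		return ENOENT;
	vp->locked = true;
	return 0;
}
```

In this model a caller never ends up holding the lock of a dead vnode unless it explicitly asked to retry.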
 1.31 02-May-2013  riastradh branches: 1.31.4;
Fix (harmless) typo in struct genfs_rename_ops::gro_lookup prototype.
 1.30 08-May-2012  riastradh branches: 1.30.2;
Implement a genfs_rename abstraction.

First major step in incrementally adapting all the file systems to a
saner rename VOP protocol.
 1.29 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.28 30-Nov-2009  pooka branches: 1.28.12; 1.28.16; 1.28.18;
Introduce genfs_statvfs() as pretty much a no-info statvfs and
convert several pseudo file systems to use it.
 1.27 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.26 07-May-2009  elad Extract the open-coded authorization logic for chtimes() from various
file-systems and put it in a single function, genfs_can_chtimes().

This also makes UDF follow the same policy as all other file-systems.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004951.html
 1.25 25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.24 22-Apr-2009  elad Per discussion on tech-kern@:

- Replace use of label/goto with returns

- Rename, change prototype of, and move functions from vfs_subr.c to
genfs_vnops.c
 1.23 28-Jan-2008  dholland branches: 1.23.10; 1.23.18; 1.23.24;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.22 25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.21 24-Apr-2007  perseant branches: 1.21.8; 1.21.14;
Split the VOP interface part of genfs_putpages() from the code. The new
function that does the work, genfs_do_putpages(), now takes as an argument
a pointer to the page that would be waited on, if PGO_BUSYWAIT were not set.
This allows a consumer, e.g. lfs_putpages(), to perform an action outside
the scope of UVM before sleeping on the page in question.
 1.20 11-Dec-2005  christos branches: 1.20.24; 1.20.26; 1.20.30; 1.20.32; 1.20.38;
merge ktrace-lwp.
 1.19 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.18 30-Aug-2005  xtraeme branches: 1.18.2;
Remove __P()
 1.17 10-Apr-2003  jdolecek branches: 1.17.2; 1.17.18;
use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.16 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.15 18-Dec-2001  chs add some compatibility routines to allow mmap() to work on non-UBCified
filesystems (in the same non-coherent fashion that they worked before).
 1.14 06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.13 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.12 28-May-2001  chs branches: 1.12.2; 1.12.4;
add a genfs_mmap() and change all of the disk-based filesystems
to implement VOP_MMAP() with the genfs version, in preparation for
actually using this VOP.
 1.11 27-Nov-2000  chs branches: 1.11.2;
Initial integration of the Unified Buffer Cache project.
 1.10 03-Aug-1999  wrstuden branches: 1.10.2;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.9 08-Jul-1999  wrstuden Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.8 13-Aug-1998  kleink branches: 1.8.8;
Add genfs_einval(), which does the obvious thing.
 1.7 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.6 25-Jun-1998  thorpej - Rename nqnfs_vop_lease_check() to genfs_lease_check(). If NFSSERVER is
not in the kernel, genfs_lease_check() is simply a no-op. This allows
LKM'd file systems to be exported (previously did not work properly
due to a compile-time decision based on -DNFSSERVER).
- defopt NFSSERVER
 1.5 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.4 05-Jan-1998  perry RCSID Police.
 1.3 11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.2 07-Sep-1996  mycroft Implement poll(2).
 1.1 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.8.8.3 02-Aug-1999  thorpej Oops, some prototypes got nuked by mistake.
 1.8.8.2 02-Aug-1999  thorpej Update from trunk.
 1.8.8.1 04-Jul-1999  chs create genfs_getpages() and genfs_putpages().
these should be able to handle most of the local-disk filesystems.
 1.10.2.1 08-Dec-2000  bouyer Sync with HEAD.
 1.11.2.4 11-Nov-2002  nathanw Catch up to -current
 1.11.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.11.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.11.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.12.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.12.2.3 25-Sep-2002  jdolecek implement genfs_kqfilter() - this is based upon ufs_kqfilter(), but uses
vp->v_size for EVFILT_READ
 1.12.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.12.2.1 10-Jul-2001  lukem prototype genfs_kqfilter()
 1.17.18.3 04-Feb-2008  yamt sync with head.
 1.17.18.2 03-Sep-2007  yamt sync with head.
 1.17.18.1 21-Jun-2006  yamt sync with head.
 1.17.2.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.18.2.1 20-Oct-2005  yamt remove genfs_fsync.
 1.20.38.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.20.32.1 11-Jul-2007  mjf Sync with head.
 1.20.30.1 08-Jun-2007  ad Sync with head.
 1.20.26.1 07-May-2007  yamt sync with head.
 1.20.24.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.21.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.21.8.1 23-Mar-2008  matt sync with HEAD
 1.23.24.2 23-Jul-2009  jym Sync with HEAD.
 1.23.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.23.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.23.10.4 11-Mar-2010  yamt sync with head
 1.23.10.3 18-Jul-2009  yamt sync with head.
 1.23.10.2 16-May-2009  yamt sync with head
 1.23.10.1 04-May-2009  yamt sync with head.
 1.28.18.2 27-Oct-2014  msaitoh Pull up following revision(s) (requested by riastradh in ticket #1135):
sys/miscfs/genfs/genfs.h: revision 1.31
Fix (harmless) typo in struct genfs_rename_ops::gro_lookup prototype.
 1.28.18.1 02-Jul-2012  jdc Pull up revisions:
src/sys/conf/files revision 1.1050
src/sys/miscfs/genfs/genfs.h revision 1.30 via patch
src/sys/miscfs/genfs/genfs_rename.c revision 1.1 via patch
src/sys/rump/librump/rumpvfs/Makefile.rumpvfs revision 1.33
(requested by riastradh in ticket #286).

Implement a genfs_rename abstraction.

First major step in incrementally adapting all the file systems to a
saner rename VOP protocol.
 1.28.16.2 02-Jun-2012  mrg sync to latest -current.
 1.28.16.1 05-Apr-2012  mrg sync to latest -current.
 1.28.12.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.28.12.2 23-May-2012  yamt sync with head.
 1.28.12.1 17-Apr-2012  yamt sync with head
 1.30.2.3 03-Dec-2017  jdolecek update from HEAD
 1.30.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.30.2.1 23-Jun-2013  tls resync from head
 1.31.4.1 18-May-2014  rmind sync with head
 1.32.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.32.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.32.6.1 28-Aug-2017  skrll Sync with HEAD
 1.36.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.104 05-Apr-2024  riastradh uvm: Expand v_size <= v_writesize assertions to help diagnostics.

PR kern/58117
 1.103 09-Apr-2023  riastradh genfs: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
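
One likely motivation for the split in 1.103 is diagnostics: a failed combined assertion reports the whole expression, while separate assertions name the condition that failed. A minimal sketch, assuming a hypothetical TOY_KASSERT macro that records the failed expression text instead of panicking (the real KASSERT behaves differently):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Text of the first failed expression, or NULL if none failed. */
static const char *toy_failed;

static void
toy_record(const char *expr)
{
	assert(expr != NULL);
	if (toy_failed == NULL)
		toy_failed = expr;
}

/* Hypothetical stand-in for KASSERT: record instead of panic. */
#define TOY_KASSERT(e)	do { if (!(e)) toy_record(#e); } while (0)

/* Combined form: the report cannot say which half failed. */
const char *
toy_check_combined(int a, int b)
{
	toy_failed = NULL;
	TOY_KASSERT(a > 0 && b > 0);
	return toy_failed;
}

/* Split form: the report names the failing condition. */
const char *
toy_check_split(int a, int b)
{
	toy_failed = NULL;
	TOY_KASSERT(a > 0);
	TOY_KASSERT(b > 0);
	return toy_failed;
}
```

Calling both with a = 1, b = 0 records "a > 0 && b > 0" for the combined form and "b > 0" for the split form.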
 1.102 14-Jan-2022  riastradh genfs(9): Prune dead branch.
 1.101 19-Aug-2020  simonb Remove trailing \n from UVMHIST_LOG() format strings.
 1.100 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.99 10-Aug-2020  rin Output offsets in hex for UVMHIST.
 1.98 14-Jun-2020  ad genfs_putpages(): when building a cluster make use of pages in the
existing uvm_page_array.
 1.97 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
 1.96 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.95 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.94 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.93 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.92 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.91 14-Mar-2020  ad Unused variable.
 1.90 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.89 14-Mar-2020  ad OR into bp->b_cflags; don't overwrite.
 1.88 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.87 24-Feb-2020  ad v_interlock -> vmobjlock
 1.86 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.85 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.84 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.83 31-Dec-2019  ad branches: 1.83.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.82 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.81 16-Dec-2019  ad genfs_do_putpages(): add a missing call to uvm_page_array_advance().

Spotted by the automated test runs and:

Reported-by: syzbot+adc1f0ce21bcece5307d@syzkaller.appspotmail.com
 1.80 16-Dec-2019  ad Correction to previous for DEBUG case.
 1.79 15-Dec-2019  ad Fix DEBUG build.
 1.78 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.77 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.76 06-Oct-2019  mlelstv Defer to synchronous I/O before the aiodone work queue exists.
 1.75 11-Jul-2019  maxv Fix (harmless) uninitialized variable: 'pg' could be 'endm', in which case
'pg->uobject' would not be initialized. Just invert the two last conditions
of the KASSERT.

ok hannken@
 1.74 10-Dec-2018  jdolecek assert that WAPBL journal write lock is actually held when called with
PGO_JOURNALLOCKED or IO_JOURNALLOCKED

suggested by mrg@, thanks
 1.73 09-Dec-2018  jdolecek support flag PGO_JOURNALLOCKED also for genfs_getpages()
 1.72 28-May-2018  chs branches: 1.72.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.71 28-Oct-2017  pgoyette branches: 1.71.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
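
The format-string convention described in the 1.71 entry above (every argument widened to uintmax_t, the 'j' length modifier, pointers cast through uintptr_t and printed with "%#jx") can be tried in userland with plain snprintf(3). The helper names below are hypothetical and this is not the kernhist(9) interface, only the same casts and format specifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a pointer as the log describes: cast to uintptr_t first
 * (same size as a pointer, so no size-mismatch warning), then widen
 * to uintmax_t and print with "%#jx" instead of "%p".
 */
int
toy_format_ptr(char *buf, size_t len, const void *p)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/* Format a numeric value: widen to uintmax_t and print with "%ju". */
int
toy_format_count(char *buf, size_t len, uint32_t n)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%ju", (uintmax_t)n);
}
```

Because every argument has the same width, a consumer of the recorded values does not need to know the original type of each argument to step through them.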
 1.70 27-Jun-2017  hannken Add missing check for dead or dying vnode to the entry of genfs_getpages().
 1.69 04-Jun-2017  hannken Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
 1.68 01-Apr-2017  dholland branches: 1.68.6;
Clarify meaning of "glocked" argument of genfs_putpages_read.
 1.67 01-Apr-2017  riastradh Simplify genfs_getpages_read async/unlock protocol.

Previously the caller unlocked for error or sync I/O, whereas
genfs_getpages_read unlocked on successful async.

Now caller unlocks in every case, and genfs_getpages_read doesn't
touch the lock.
 1.66 30-Mar-2017  hannken Change last users of FSTRANS_LAZY to FSTRANS_SHARED and change
genfs_suspendctl() to move from FSTRANS_NORMAL to FSTRANS_SUSPENDED
and vice versa.
 1.65 09-Mar-2017  hannken Protect genfs_do_putpages() against vnodes disappearing during
a forced mount update from read-write to read-only.
 1.64 01-Mar-2017  hannken Protect genfs_getpages() against vnodes disappearing during a
forced mount update from read-write to read-only.
 1.63 29-Sep-2016  christos branches: 1.63.2;
don't change the loop counts; noted by mrg@
 1.62 29-Sep-2016  christos Allow sparc kernels to build with SSP by using a constant PAGE_SIZE...
 1.61 06-May-2015  hannken branches: 1.61.2;
Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@
 1.60 12-Apr-2015  skrll Fix UVMHIST build.
 1.59 10-Apr-2015  riastradh Pull VOP_BMAP/VOP_STRATEGY loop from getpages into its own function.

No functional change.

In preparation for a gop_read like the existing gop_write.
 1.58 25-Oct-2013  martin branches: 1.58.6;
Turn a few __unused into __diagused
 1.57 19-Oct-2013  martin Mark a potentially unused variable
 1.56 19-Oct-2013  martin Mark a potentially unused (if an arch implements pmap_update as empty
macro) variable accordingly.
 1.55 22-May-2012  yamt branches: 1.55.2; 1.55.4;
don't block on pager map for read-ahead.
reduce code duplication.
 1.54 29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.53 31-Oct-2011  yamt branches: 1.53.2; 1.53.6; 1.53.8;
typo in a comment
 1.52 09-Oct-2011  uebayasi Trim unused headers.
 1.51 01-Sep-2011  matt Use the new UVM_KMF_COLORMATCH flag to get a congruent mapping of the user
buffer so we can use unmanaged mappings (pmap_kenter_pa/pmap_kremove).
 1.50 31-Aug-2011  rmind genfs_do_directio: acquire the lock of page owner for now and fix PR/45177.
Will be revisited to avoid locking dance and be more efficient, e.g. we can
use unmanaged-mapping by allocating with colouring in mind.
 1.49 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.48 21-Apr-2011  matt branches: 1.48.2;
Move some #ifdefs to prevent a code path change between DEBUG and !DEBUG.
Solves a problem with an assert firing when using NFS on MIPS.
 1.47 18-Apr-2011  rmind G/C unused speedup_syncer() mechanism and thus simplify some code.
Update some comments to reflect the reality. No actual changes to
the (used) syncer logic.

OK ad@
 1.46 06-Dec-2010  uebayasi branches: 1.46.2;
Correct an assertion; pointed out by mrg@ and pooka@, thanks.
 1.45 03-Dec-2010  hannken genfs_do_putpages(): When testing an uobject for dirty or modified
pages skip uninitialized (PG_FAKE) pages (DEBUG only).
 1.44 30-Nov-2010  hannken Always take the object lock before changing vmpage flags. Fixes a deadlock
where a thread is waiting on "genput" but the page in question is neither
BUSY nor WANTED.

No objections from tech-kern@.
 1.43 19-Nov-2010  uebayasi Whitespace.
 1.42 09-Nov-2010  hannken Genfs_getpages(): Break a deadlock where one thread runs VOP_GETPAGES(),
has busy pages and wants the wapbl lock as reader from wapbl_begin(),
another thread has the wapbl lock as reader and waits for a page from
the first thread. Now a third thread calls wapbl_flush() and wants the
wapbl lock as writer.

Move the wapbl_begin() up to a point where genfs_getpages() has no busy
pages yet.
 1.41 03-Nov-2010  uebayasi genfs_getpages: restore vm_page array correctly in PGO_LOCKED error
code path.
 1.40 01-Sep-2010  chs replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.39 19-Aug-2010  pooka print more info in the "past eof" panic
 1.38 08-Aug-2010  chs in genfs_getpages(), mark the vnode dirty (ie. add to syncer worklist
and set VI_WRMAPDIRTY) after we have busied the pages rather than
before. this prevents other threads calling genfs_do_putpages() from
marking the vnode clean again while we're in the process of creating
new writable mappings, since such threads will wait for the page(s) to
become unbusy before proceeding.
fixes the problem recently reported by hannken@ on tech-kern.
 1.37 29-Jul-2010  hannken Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.36 30-Jan-2010  uebayasi branches: 1.36.2; 1.36.4;
Reduce the diff between genfs_getpages() and genfs_do_io(). These should be
merged eventually.
 1.35 30-Jan-2010  uebayasi Slightly more descriptive local variable names.
 1.34 29-Jan-2010  uebayasi genfs_getpages: Narrow & clarify the context where I/O happens & vmobjlock is dropped.
 1.33 29-Jan-2010  uebayasi genfs_getpages: Redo previous with a better goto label.
 1.32 28-Jan-2010  uebayasi Revert the part which put variable initializations within interleaved gotos.

again: if (...) goto err;
void *ptr = alloc();
if (...) goto again;
if (...) goto err1;
...
err1: if (ptr) free(ptr);
err:
return;

This leaks memory if exited with "goto again; -> goto err;".
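The leak described above can be reproduced in a small user-space sketch. Everything here is hypothetical and for illustration only: demo_alloc(), demo_free(), the live_allocs counter and the retry_then_fail parameter do not come from genfs_io.c, and fixed() is just one possible shape for avoiding the leak, not necessarily what the revert restored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counting allocator so that a leak becomes observable. */
static int live_allocs;

static void *
demo_alloc(void)
{
	live_allocs++;
	return malloc(16);
}

static void
demo_free(void *p)
{
	if (p != NULL) {
		live_allocs--;
		free(p);
	}
}

/*
 * Same shape as the snippet in the log message: the allocation sits
 * between "again" and the early "goto err", so the path
 * "goto again; -> goto err;" returns without freeing ptr.
 */
static void
leaky(int retry_then_fail)
{
	int pass = 0;

again:
	if (pass > 0 && retry_then_fail)
		goto err;		/* bypasses err1: allocation is lost */
	void *ptr = demo_alloc();
	if (pass++ == 0 && retry_then_fail)
		goto again;
	if (!retry_then_fail)
		goto err1;
err1:
	demo_free(ptr);
err:
	return;
}

/*
 * One possible alternative (an assumption, not the committed code):
 * declare the pointer before the label and free it on a single exit.
 */
static void
fixed(int retry_then_fail)
{
	void *ptr = NULL;
	int pass = 0;

again:
	if (pass > 0 && retry_then_fail)
		goto out;
	demo_free(ptr);			/* drop the previous pass, if any */
	ptr = demo_alloc();
	if (pass++ == 0 && retry_then_fail)
		goto again;
out:
	demo_free(ptr);
}
```

Calling leaky(1) leaves one allocation outstanding, while fixed(1) releases everything it allocated.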
 1.31 28-Jan-2010  uebayasi genfs_getpages: More constification & localization.
 1.30 28-Jan-2010  uebayasi genfs_getpages: Constify 2 variables, move one. No functional changes.
 1.29 28-Jan-2010  uebayasi genfs_getpages: Constify orignpages. Don't override its meaning by the value
re-calculated from GOP_SIZE(GOP_SIZE_MEM), but assign another variable
(orignmempages).
 1.28 28-Jan-2010  uebayasi Unbreak modules build.
 1.27 28-Jan-2010  uebayasi genfs_getpages: Constify & localize more variables.
 1.26 28-Jan-2010  uebayasi genfs_getpages: Move local variable declarations that are used only for I/O
to where they're used. This helps to track what's going in this lengthy
function.
 1.25 28-Jan-2010  uebayasi genfs_getpages: Localize a few more variables.
 1.24 28-Jan-2010  uebayasi genfs_putpages: Localize a few variables. No functional changes.
 1.23 27-Jan-2010  uebayasi Use genfs_node_*lock().
 1.22 27-Jan-2010  uebayasi Constify some pointers in genfs_getpages() and genfs_do_putpages().
 1.21 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.20 18-Apr-2009  pooka Move genfs_null_putpages() from genfs_io.c to genfs_vnops.c -- it does
not really do i/o.
 1.19 23-Feb-2009  rmind genfs_getpages: rework 1.18 revision - move uvm_pagermapout() back.
It is useful to make KVA available ASAP. Per discussion with <yamt>.
 1.18 04-Feb-2009  rmind branches: 1.18.2;
genfs_getpages: move putiobuf() and uvm_pagermapout() outside the glock.
OK by <ad>.
 1.17 16-Jan-2009  yamt - g/c stale function prototypes.
- rename UVM_PAGE_HASH_PENALTY to UVM_PAGE_TREE_PENALTY.
 1.16 01-Dec-2008  joerg Check that the filesystem actually uses WAPBL before initiating a
transaction for the directio case. Fixes PR 39929 and similar issues
seen with PostgreSQL.
 1.15 16-Nov-2008  pooka more <sys/buf.h> police
 1.14 31-Oct-2008  christos - allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.13 19-Oct-2008  hannken branches: 1.13.2; 1.13.4;
Make genfs_directio() IO_JOURNALLOCKED aware. DirectIO no longer triggers
"locking against myself" panic in wapbl_begin().

Observed and tested by: Frank Kardel <kardel@netbsd.org>
 1.12 10-Oct-2008  hannken Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.11 14-Aug-2008  yamt remove always-true conditionals.
 1.10 11-Aug-2008  yamt constify
 1.9 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.8 04-Jun-2008  ad branches: 1.8.2; 1.8.4;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.7 14-May-2008  reinoud Import writing part of the UDF file system making optical media like CD's
and DVD's behave like floppy discs. Writing is supported up to and including
version 2.01; version 2.50 and 2.60 will follow.

Also extending the UDF implementation to support symbolic links and
hardlinks.

Added are the mmcformat(8) tool to format rewritable CD/DVD discs and
newfs_udf(8).

Limitations:
all operations can be performed on the file system though the
scheduling is currently optimised for archiving workloads.

mv(1)/rename(2) is currently only implemented for non-directories.
 1.6 19-Apr-2008  hannken branches: 1.6.2; 1.6.4;
Remove a race when pages are released while waiting for fstrans_start().

Fixes PR #38460
 1.5 18-Jan-2008  yamt branches: 1.5.6; 1.5.8;
genfs_do_putpages: DEBUG checks.
 1.4 18-Jan-2008  yamt genfs_do_putpages: ensure that we clean the vnode in the case of PGO_RECLAIM.
 1.3 18-Jan-2008  yamt push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.2 02-Jan-2008  ad Merge vmlocking2 to head.
 1.1 17-Oct-2007  pooka branches: 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.12; 1.1.14; 1.1.16; 1.1.20;
Split I/O-related routines (getpages, putpages, etc.) which are heavily
tied to uvm out of genfs_vnops into genfs_io.c
 1.1.20.2 19-Jan-2008  bouyer Sync with HEAD
 1.1.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.1.16.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.1.14.4 23-Mar-2008  matt sync with HEAD
 1.1.14.3 09-Jan-2008  matt sync with HEAD
 1.1.14.2 06-Nov-2007  matt sync with HEAD
 1.1.14.1 17-Oct-2007  matt file genfs_io.c was added on branch matt-armv6 on 2007-11-06 23:33:16 +0000
 1.1.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.1.10.3 21-Jan-2008  yamt sync with head
 1.1.10.2 27-Oct-2007  yamt sync with head.
 1.1.10.1 17-Oct-2007  yamt file genfs_io.c was added on branch yamt-lazymbuf on 2007-10-27 11:35:52 +0000
 1.1.8.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.1.8.1 17-Oct-2007  joerg file genfs_io.c was added on branch jmcneill-pm on 2007-10-26 15:48:52 +0000
 1.1.6.2 23-Oct-2007  ad Sync with head.
 1.1.6.1 17-Oct-2007  ad file genfs_io.c was added on branch vmlocking on 2007-10-23 20:36:43 +0000
 1.1.4.2 18-Oct-2007  yamt sync with head.
 1.1.4.1 17-Oct-2007  yamt file genfs_io.c was added on branch yamt-x86pmap on 2007-10-18 08:33:12 +0000
 1.5.8.2 17-Jun-2008  yamt sync with head.
 1.5.8.1 18-May-2008  yamt sync with head.
 1.5.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.5.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.5.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.5.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.4.3 10-Oct-2008  skrll Sync with HEAD.
 1.6.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.6.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.6.2.5 09-Oct-2010  yamt sync with head
 1.6.2.4 11-Aug-2010  yamt sync with head.
 1.6.2.3 11-Mar-2010  yamt sync with head
 1.6.2.2 04-May-2009  yamt sync with head.
 1.6.2.1 16-May-2008  yamt sync with head.
 1.8.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.8.4.1 19-Oct-2008  haad Sync with HEAD.
 1.8.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.13.4.5 21-Apr-2012  riz Back out a commit included in the ticket 1750 patch which obviously
doesn't belong there.
 1.13.4.4 21-Apr-2012  riz Pull up following revision(s) (requested by spz in ticket #1750):
crypto/dist/openssl/crypto/mem.c patch
crypto/dist/openssl/crypto/asn1/a_d2i_fp.c patch
crypto/dist/openssl/crypto/buffer/buffer.c patch
sys/miscfs/genfs/genfs_io.c patch

Address CVE-2012-2110.
[spz, ticket #1750]
 1.13.4.3 07-Sep-2010  bouyer branches: 1.13.4.3.2;
Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.13.4.2 04-Apr-2009  snj branches: 1.13.4.2.2; 1.13.4.2.4;
Pull up following revision(s) (requested by joerg in ticket #664):
sys/miscfs/genfs/genfs_io.c: revision 1.16
Check that the filesystem actually uses WAPBL before initiating a
transaction for the directio case. Fixes PR 39929 and similar issues
seen with PostgreSQL.
 1.13.4.1 02-Nov-2008  snj Pull up following revision(s) (requested by tron in ticket #9):
sys/nfs/nfs_bio.c: revision 1.180
sys/miscfs/genfs/genfs_io.c: revision 1.14
sys/uvm/uvm_extern.h: revision 1.149
- allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.13.4.3.2.2 21-Apr-2012  riz Back out a commit included in the ticket 1750 patch which obviously
doesn't belong there.
 1.13.4.3.2.1 21-Apr-2012  riz Pull up following revision(s) (requested by spz in ticket #1750):
crypto/dist/openssl/crypto/mem.c patch
crypto/dist/openssl/crypto/asn1/a_d2i_fp.c patch
crypto/dist/openssl/crypto/buffer/buffer.c patch
sys/miscfs/genfs/genfs_io.c patch

Address CVE-2012-2110.
[spz, ticket #1750]
 1.13.4.2.4.5 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.13.4.2.4.4 09-Feb-2012  matt Change to use the updated uvm_pageout_* signature.
 1.13.4.2.4.3 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.13.4.2.4.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.13.4.2.4.1 29-Apr-2011  matt Fix placement of #ifdef DEBUG / #endif
 1.13.4.2.2.3 21-Apr-2012  riz Back out a commit included in the ticket 1750 patch which obviously
doesn't belong there.
 1.13.4.2.2.2 21-Apr-2012  riz Pull up following revision(s) (requested by spz in ticket #1750):
crypto/dist/openssl/crypto/mem.c patch
crypto/dist/openssl/crypto/asn1/a_d2i_fp.c patch
crypto/dist/openssl/crypto/buffer/buffer.c patch
sys/miscfs/genfs/genfs_io.c patch

Address CVE-2012-2110.
[spz, ticket #1750]
 1.13.4.2.2.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.13.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.13.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.13.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.18.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.36.4.5 31-May-2011  rmind sync with head
 1.36.4.4 21-Apr-2011  rmind sync with head
 1.36.4.3 05-Mar-2011  rmind sync with head
 1.36.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.36.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.36.2.65 11-Feb-2011  uebayasi Clean up debug code.
 1.36.2.64 21-Nov-2010  uebayasi Clarify things a bit in XIP putpages.
 1.36.2.63 21-Nov-2010  uebayasi Put back XIP putpages, but slightly modified to be called from the
generic putpages, and to call pgo_get() rather than a specific function.
Without this, UVM went mad after unmount (vinval, etc.).
 1.36.2.62 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.36.2.61 21-Nov-2010  uebayasi Assert.
 1.36.2.60 21-Nov-2010  uebayasi Resurrect PGO_ZERO support.

When vnode pager encounters hole pages in XIP'ed vnodes, it fills
page slots with PGO_ZERO and returns them back to the caller (fault
handler). Fault handlers are responsible to check page slots and
redirect PGO_ZERO to the single "zero page" allocated by calling
uvm_page_zeropage_alloc(9).

The zero page is wired, read-only (PG_RDONLY) page. It's shared
by multiple vnodes, it has no single owner.

XIP'ed vnodes are supposed to be "stable" during I/O (unlocked).
Because XIP'ed mounts are always read-only. There's no chance to
change mappings of XIP'ed vnodes and their XIP'ed pages. Thus the
cached uobj is reused after pgo_get() for PGO_ZERO.

(Do we need a new concept of "read-only UVM object"?)
 1.36.2.59 21-Nov-2010  uebayasi Revert XIP putpages totally.

XIP'ed uobj owns no pages; uvm_object::uo_npages is always 0,
nothing happens.

Upper layer is responsible to unmap pmap-level mappings.
 1.36.2.58 21-Nov-2010  uebayasi Clean up, reduce diff from trunk.
 1.36.2.57 21-Nov-2010  uebayasi Merge getpages finalization code.

In XIP case, there's nothing to do because MMIO device pages are
"staleless", unlike page caches used as I/O buffers.
 1.36.2.56 21-Nov-2010  uebayasi Kill one more goto.
 1.36.2.55 21-Nov-2010  uebayasi Clean up gotos.
 1.36.2.54 21-Nov-2010  uebayasi Clean up ifdefs.
 1.36.2.53 21-Nov-2010  uebayasi Adjust merged XIP getpages page slot offset calculation again so
it really works. Sprinkle a few assertions and UVMHISTs.
 1.36.2.52 20-Nov-2010  uebayasi genfs_do_getpages_xip_io_done: Adjust page condition checks:
- Expect uvn_findpage_xip() returns busy pages.
- Device pages are always initialized (== !uninitialized == !fake).
 1.36.2.51 20-Nov-2010  uebayasi XIP has no "fake" (== uninitialized) pages, because pages are
already initialized when mounted. Adjust getpages loop again.
 1.36.2.50 20-Nov-2010  uebayasi Adjust again when heading page slots are skipped.
 1.36.2.49 20-Nov-2010  uebayasi Fix a bug (offset calculation) in the previous.
 1.36.2.48 20-Nov-2010  uebayasi Snapshot of getpages BMAP loop merge.
 1.36.2.47 19-Nov-2010  uebayasi genfs_*_xip_io: Adjust start offset.
 1.36.2.46 19-Nov-2010  uebayasi Comment.
 1.36.2.45 19-Nov-2010  uebayasi Remove unused code.
 1.36.2.44 19-Nov-2010  uebayasi More adjustment.

Reorder
genfs_node_unlock() -> putiobuf()
to
putiobuf() -> genfs_node_unlock()
but I don't think there's any constraint between these two.
 1.36.2.43 19-Nov-2010  uebayasi Share mode code. Care glock.
 1.36.2.42 19-Nov-2010  uebayasi Reduce code duplication.
 1.36.2.41 19-Nov-2010  uebayasi Reduce code.
 1.36.2.40 19-Nov-2010  uebayasi Reduce unnecessary code.
 1.36.2.39 19-Nov-2010  uebayasi Call XIP getpages() from within the generic one.
 1.36.2.38 19-Nov-2010  uebayasi Really remove XIP hole code.
 1.36.2.37 19-Nov-2010  uebayasi Comment out XIP hole page redirection code. Since makefs(8) doesn't
support holes, these code paths can never be tested.

(The current XIP is read-only, so hole pages are pointless in
practice.)
 1.36.2.36 19-Nov-2010  uebayasi Adjust XIP putpages to I/O XIP getpages.
 1.36.2.35 19-Nov-2010  uebayasi Make XIP genfs_getpages_xip() return pages in I/O path, preparing
merge into the generic genfs_getpages().
 1.36.2.34 18-Nov-2010  uebayasi Make XIP pager use cdev_mmap() instead of struct vm_physseg.
 1.36.2.33 18-Nov-2010  uebayasi Style change.
 1.36.2.32 16-Nov-2010  uebayasi Factor out the part which lookups physical page "identity" from
UVM object, into sys/uvm/uvm_vnode.c:uvn_findpage_xip(). Eventually
this will become a call to cdev UVM object pager.
 1.36.2.31 15-Nov-2010  uebayasi Move zero-page into a common place, in the hope that it's shared
for other purposes.

According to Chuck Silvers, zero-page mappings don't need to be
explicitly unmapped in putpages(). Follow that advice.
 1.36.2.30 06-Nov-2010  uebayasi Sync with HEAD.
 1.36.2.29 04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs everytime.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.36.2.28 04-Nov-2010  uebayasi Remove a XXX comment which is only confusing.
 1.36.2.27 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.36.2.26 27-Sep-2010  uebayasi genfs_do_getpages_xip1: Adjust locking. Although XIP never does real I/O,
it's called without PGO_LOCKED in some cases. Leave vmobjlock unlocked in
that case.
 1.36.2.25 26-Sep-2010  uebayasi Minor fixes.
 1.36.2.24 26-Sep-2010  uebayasi Implement XIP "putpages". Invalidate MMU mappings of pages at the
request of PGO_FREE. PGO_DEACTIVATE and PGO_CLEANIT do nothing, because
XIP pages are neither queued nor writable.

Allocate read-only "zero" page per vnode. Put it at offset 0 of vnode's
uvm_object. This per-vnode "zero" page is mapped to all hole pages of
the vnode. If one of its mapped pages are forced to be PGO_FREE'ed,
all the mappings are invalidated.
 1.36.2.23 26-Sep-2010  uebayasi Wrap long lines.
 1.36.2.22 25-Aug-2010  uebayasi Fix UVMHIST build.

Remove a comment about xip getpages optimization; quick profiling showed
that this routine is not expensive. It'd be better to concentrate on
reducing TLB miss.
 1.36.2.21 17-Aug-2010  uebayasi Sync with HEAD.
 1.36.2.20 12-Aug-2010  uebayasi vm_physseg::start is PFN, not mdpgno, so don't decode it using
pmap_phys_address().
 1.36.2.19 11-Aug-2010  uebayasi In XIP vnode pager, assert that filesystem blocks and pages are aligned.
 1.36.2.18 22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.36.2.17 20-Jul-2010  uebayasi genfs_do_getpages_xip: Simplify code.
 1.36.2.16 15-Jul-2010  uebayasi s/PG_DIRECT/PG_XIP/
 1.36.2.15 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.36.2.14 09-Jul-2010  uebayasi opt_direct_page.h is no more.
 1.36.2.13 09-Jul-2010  uebayasi Mark XIP pages as PG_CLEAN and/or PG_BUSY when appropriate. Protect
vnode lock when vm_page::flags is manipulated.
 1.36.2.12 07-Jul-2010  uebayasi To simplify things, revert global vm_page_md hash and allocate struct
vm_page [] for XIP physical segments.
 1.36.2.11 06-Jul-2010  uebayasi Directly allocate zero'ed vm_page for XIP unallocated blocks, instead
of abusing pool page. Move the code to XIP vnode pager in genfs_io.c.
 1.36.2.10 08-Jun-2010  uebayasi Comment.
 1.36.2.9 07-Jun-2010  uebayasi Comment.
 1.36.2.8 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.36.2.7 28-Apr-2010  uebayasi When mounting a block device as XIP, pass registered struct vm_physseg
* as a cookie from the block device to the caller (== mount code).
struct vm_physseg * will be passed to XIP vnode pager
(genfs_do_getpages_xip()), then converted back to paddr_t.

(My future plan is to pass struct vm_physseg * back to the fault handler,
and to pmap_enter() as is.)
 1.36.2.6 23-Mar-2010  uebayasi Put run-time XIP-specific per-mount data in struct specdev, not struct mount.
 1.36.2.5 17-Mar-2010  uebayasi Put comments to reflect my intent about genfs_do_getpages_xip method.
 1.36.2.4 28-Feb-2010  uebayasi Don't always enable XIP on this branch to prepare the merge. Fix build
without XIP in places.
 1.36.2.3 28-Feb-2010  uebayasi To mount block devices as XIP, pass physical address "cookie" used by
bus_space_mmap(9) / pmap_phys_addr(9) via struct mount.
 1.36.2.2 23-Feb-2010  uebayasi genfs_do_getpages_xip: Drop vmobjlock before calling VOP_BMAP, otherwise
deadlock. No idea how this worked for me before.

Directly call uvm_phys_to_vm_page_device() to make a device page cookie.
 1.36.2.1 11-Feb-2010  uebayasi genfs_getpages() for XIP.

Pages are directly mappable, and always there. What we need to do here is
to address filesystem blocks and tell those addresses back to the fault
handler by encoding the physical addresses in struct vm_page * pointers.

(I hate code duplication. What can I do?)
 1.46.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.48.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.53.8.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
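The flag translation mentioned above (FSYNC_LAZY becoming PGO_LAZY before VOP_PUTPAGES() is called) might look roughly like the sketch below. The DEMO_* constants and demo_fsync_to_pgo() are hypothetical names with made-up values, and the WAIT-to-SYNCIO mapping is an assumption added for illustration; only the LAZY mapping is stated in the log message.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define DEMO_FSYNC_WAIT		0x01
#define DEMO_FSYNC_LAZY		0x04

#define DEMO_PGO_CLEANIT	0x01
#define DEMO_PGO_SYNCIO		0x02
#define DEMO_PGO_LAZY		0x04

/*
 * Translate fsync-style flags into putpages-style flags so that the
 * I/O layer can later pick a priority based on the LAZY bit.
 */
static int
demo_fsync_to_pgo(int fsync_flags)
{
	int pgo_flags = DEMO_PGO_CLEANIT;

	if (fsync_flags & DEMO_FSYNC_WAIT)	/* assumed mapping */
		pgo_flags |= DEMO_PGO_SYNCIO;
	if (fsync_flags & DEMO_FSYNC_LAZY)	/* mapping named in the log */
		pgo_flags |= DEMO_PGO_LAZY;
	return pgo_flags;
}
```

Passing the full flag word down, rather than a single boolean, is what lets a new bit such as the lazy hint reach the I/O code without another interface change.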
 1.53.6.1 02-Jun-2012  mrg sync to latest -current.
 1.53.2.19 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.53.2.18 22-Apr-2013  yamt revert unnecessary diff
 1.53.2.17 02-Nov-2012  yamt tweak integrity_sync cases
some comments
 1.53.2.16 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.53.2.15 01-Aug-2012  yamt remove stale comments
 1.53.2.14 23-May-2012  yamt sync with head.
 1.53.2.13 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.53.2.12 05-Feb-2012  yamt genfs_gop_write_rwmap: comment
 1.53.2.11 05-Feb-2012  yamt use unsigned
comments
assertions
 1.53.2.10 25-Jan-2012  yamt comment
 1.53.2.9 24-Jan-2012  yamt - g/c #if 0'ed code
- minor optimization
- comments
 1.53.2.8 18-Jan-2012  yamt - bug fixes
- minor optimizations
- assertions
- comments
 1.53.2.7 14-Jan-2012  yamt fix overwrite case
 1.53.2.6 20-Dec-2011  yamt don't inline uvn_findpages in genfs_io.
 1.53.2.5 30-Nov-2011  yamt g/c #if 1
comment
 1.53.2.4 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.53.2.3 20-Nov-2011  yamt - simplify code
- comments
 1.53.2.2 10-Nov-2011  yamt - remove uobj->memq
- fix UVM_PAGE_TRKOWN
- comments
 1.53.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.55.4.1 18-May-2014  rmind sync with head
 1.55.2.4 03-Dec-2017  jdolecek update from HEAD
 1.55.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.55.2.2 10-Oct-2012  bouyer We know vp is not NULL, no need to check
 1.55.2.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.58.6.3 28-Aug-2017  skrll Sync with HEAD
 1.58.6.2 05-Oct-2016  skrll Sync with HEAD
 1.58.6.1 06-Jun-2015  skrll Sync with HEAD
 1.61.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.61.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.61.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.63.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.68.6.3 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
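The format-string conversion described in the list above can be illustrated in user space with snprintf(3). This is only a sketch: demo_hist_format() is a hypothetical helper, not part of kernhist(9), and it merely shows the "%p" to "%#jx" replacement, the 'j' length modifier, and the pointer cast through uintptr_t before uintmax_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a history-log call: every argument is
 * widened to uintmax_t and printed with the 'j' length modifier.
 */
static int
demo_hist_format(char *buf, size_t len, const void *page, unsigned int count)
{
	assert(buf != NULL && len > 0);

	/*
	 * A pointer is cast to uintptr_t first and then to uintmax_t,
	 * which avoids "pointer to integer of a different size"
	 * warnings; "%#jx" takes the place of "%p".
	 */
	return snprintf(buf, len, "page=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)page, (uintmax_t)count);
}
```

Because every argument has the same width, a consumer of the recorded data does not need to know the original type of each value in order to step through the argument list.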
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.68.6.2 05-Jul-2017  martin Pull up following revision(s) (requested by hannken in ticket #84):
sys/miscfs/genfs/genfs_io.c: revision 1.70
Add missing check for dead or dying vnode to the entry of genfs_getpages().
 1.68.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait, remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.71.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.71.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.72.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.72.2.1 10-Jun-2019  christos Sync with HEAD
 1.83.2.2 29-Feb-2020  ad Sync with head.
 1.83.2.1 17-Jan-2020  ad Sync with head.
 1.24 14-Mar-2020  ad Update a comment.
 1.23 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.22 28-May-2018  chs branches: 1.22.2; 1.22.8;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.21 06-Jun-2013  dholland branches: 1.21.32;
Add missing declaration of struct vnode.
 1.20 01-Sep-2010  chs branches: 1.20.8; 1.20.18;
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.19 27-Jan-2010  uebayasi branches: 1.19.2; 1.19.4;
Don't forget to tell the result of rw_tryenter().
 1.18 27-Jan-2010  uebayasi Add genfs_node_rdtrylock().
 1.17 14-May-2008  reinoud branches: 1.17.8; 1.17.14; 1.17.16;
Import writing part of the UDF file system making optical media like CDs
and DVDs behave like floppy discs. Writing is supported up to and including
version 2.01; version 2.50 and 2.60 will follow.

Also extending the UDF implementation to support symbolic links and
hardlinks.

Added are the mmcformat(8) tool to format rewritable CD/DVD discs and
newfs_udf(8).

Limitations:
all operations can be performed on the file system though the
scheduling is currently optimised for archiving workloads.

mv(1)/rename(2) is currently only implemented for non-directories.
 1.16 20-Feb-2007  ad branches: 1.16.38; 1.16.40; 1.16.42; 1.16.44;
Add genfs_node_destroy(). Fixes a lock "leak" seen when running LOCKDEBUG
kernels.
 1.15 15-Feb-2007  ad branches: 1.15.2;
Replace some uses of lockmgr() / simplelocks.
 1.14 14-Oct-2006  yamt add wrapper functions of lockmgr on g_glock.
 1.13 06-Oct-2006  dogcow fix build error in mount_sysvbfs.
 1.12 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.11 14-May-2006  elad branches: 1.11.8; 1.11.10;
integrate kauth.
 1.10 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.9 11-Dec-2005  christos branches: 1.9.4; 1.9.6; 1.9.8; 1.9.10; 1.9.12;
merge ktrace-lwp.
 1.8 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.7 17-Jul-2005  yamt ensure that vnodes with dirty pages are always on syncer's queue.

- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.

fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
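
The generation-number idea in the third item can be sketched as a small
userland model. This is only an illustration under stated assumptions: the
struct and function names are hypothetical, and the real genfs_node fields
and locking are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant genfs_node / vnode state. */
struct node_model {
	unsigned int gen;	/* bumped by every getpages call */
	bool on_worklist;	/* vnode is on the syncer's queue */
};

/* Models genfs_getpages: increment the generation number. */
static void
model_getpages(struct node_model *n)
{
	assert(n != NULL);
	n->gen++;
}

/* Models the start of genfs_putpages: remember the generation seen. */
static unsigned int
model_putpages_begin(const struct node_model *n)
{
	assert(n != NULL);
	return n->gen;
}

/*
 * Models the end of genfs_putpages: take the vnode off the queue only
 * if everything was clean and no getpages ran in the meantime.
 */
static void
model_putpages_end(struct node_model *n, unsigned int seen, bool wasclean)
{
	assert(n != NULL);
	if (wasclean && n->gen == seen)
		n->on_worklist = false;
}
```

If a getpages call slips in between the begin and end steps, the
generation no longer matches and the vnode stays on the queue.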
 1.6 28-Jun-2005  yamt branches: 1.6.2;
- constify genfs_ops.
- use member designators.
 1.5 24-Sep-2003  yamt branches: 1.5.14;
fix a bug of lfs.

genfs_getpages() can read in more blocks than it should due to faked filesize
of lfs_gop_size(). it's a security problem and it makes gcc3 "internal error"

to fix this,
- in genfs_getpages(), always calculate diskeof and memeof separately
so that filesystems (in this case, lfs) can use different strategies
for them.
- introduce GOP_SIZE_MEM flag and use it to request in-core filesize.
(it was an intention of GOP_SIZE_READ,
but after the above change _READ is not a straightforward name)

after this, no one uses GOP_SIZE_{READ,WRITE} anymore but leave them for now.
 1.4 17-Feb-2003  perseant branches: 1.4.2;
Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.3 18-Dec-2001  chs branches: 1.3.2;
add some compatibility routines to allow mmap() to work on non-UBCified
filesystems (in the same non-coherent fashion that they worked before).
 1.2 15-Sep-2001  chs branches: 1.2.2; 1.2.4;
add a forward decl for struct vm_page.
 1.1 15-Sep-2001  chs interfaces and structures used by new genfs_{get,put}pages().
 1.2.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.2.4.1 15-Sep-2001  fvdl file genfs_node.h was added on branch thorpej-devvp on 2001-10-01 12:47:18 +0000
 1.2.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.2.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.2.2.1 15-Sep-2001  nathanw file genfs_node.h was added on branch nathanw_sa on 2001-09-21 22:36:37 +0000
 1.3.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.2.1 18-Dec-2001  thorpej file genfs_node.h was added on branch kqueue on 2002-01-10 20:01:34 +0000
 1.4.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.14.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.6.2.3 26-Feb-2007  yamt sync with head.
 1.6.2.2 30-Dec-2006  yamt sync with head.
 1.6.2.1 21-Jun-2006  yamt sync with head.
 1.9.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.9.12.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.9.10.3 20-Apr-2006  christos kauth_cred_t -> struct kauth_cred;
 1.9.10.2 19-Apr-2006  elad sync with head.
 1.9.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.9.8.2 24-May-2006  yamt sync with head.
 1.9.8.1 01-Apr-2006  yamt sync with head.
 1.9.6.2 01-Jun-2006  kardel Sync with head.
 1.9.6.1 22-Apr-2006  simonb Sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.11.10.1 22-Oct-2006  yamt sync with head
 1.11.8.1 18-Nov-2006  ad Sync with head.
 1.15.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.16.44.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.16.42.3 09-Oct-2010  yamt sync with head
 1.16.42.2 11-Mar-2010  yamt sync with head
 1.16.42.1 16-May-2008  yamt sync with head.
 1.16.40.1 18-May-2008  yamt sync with head.
 1.16.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.17.16.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.17.14.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.17.8.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.19.4.1 05-Mar-2011  rmind sync with head
 1.19.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.20.18.1 23-Jun-2013  tls resync from head
 1.20.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.20.8.1 20-Nov-2011  yamt - simplify code
- comments
 1.21.32.1 25-Jun-2018  pgoyette Sync with HEAD
 1.22.8.1 17-Jan-2020  ad Sync with head.
 1.22.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.7 20-Oct-2021  thorpej Don't use genfs_rename_knote() in the "rename foo over hard-link to itself"
case, which simply results in removing the "from" name; there are assertions
in genfs_rename_knote() that are too strong for that case.

PR kern/56460
 1.6 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), uses a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.5 05-Sep-2020  riastradh genfs_rename: Fix deadlocks in cross-directory cyclic rename.

Reproducer:

A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c/d/e", "c"); }

Deadlock:

- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.

The problem with these is that genfs_rename_enter_separate in B or C
tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C,
tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order
.->c->d->e.

The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do
tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be
the same (hard links), and we can't detect that until after we've
looked them both up -- and in some file systems (I'm looking at you,
ufs), there is no mere lookup operation, only lookup-and-lock, so we
can't even hold the lock on one of tvp or fvp when we look up the
other one if there's a chance they might be the same.

Fortunately the cases
(a) tvp = fvp
(b) tvp or fvp is a directory
are mutually exclusive as long as directories cannot be hard-linked.
In case (a) we can just defer locking {tvp, fvp} until the end, because
it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In
case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or
tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory,
because it can't possibly coincide with the second one of {fvp, tvp}.

With this change, we can now prove that the locking order is consistent
with the ancestor->descendant partial ordering. Where two nodes are
incommensurate under that partial ordering, they are only ever locked
by rename and there is only ever one rename at a time.

Proof:

- For same-directory renames, genfs_rename_enter_common locks the
directory first and then the children. The order
directory->child[i] is consistent with ancestor->descendant and
child[0]/child[1] are incommensurate.

- For cross-directory renames:

. While a rename is in progress and the fs-wide rename lock is held,
directories can be created or removed but not changed, so the
outcome of gro_genealogy -- which, given fdvp and tdvp, returns
the node N relating fdvp/N/.../tdvp or null if there is none --
can only transition from finding N to not finding N, if one of
the directories is removed while any of the vnodes are unlocked.
Merely creating directories cannot change the ancestry of tdvp,
and concurrent renames are not possible.

Thus, if a gro_genealogy determined the operation to have the
form fdvp/N/.../tdvp, then it might cease to have that form, but
only because tdvp was removed which will harmlessly cause the
rename to fail later on. Similarly, if gro_genealogy determined
the operation _not_ to have the form fdvp/N/.../tdvp then it
can't begin to have that form until after the rename has
completed.

The lock order is,

=> for fdvp/.../tdvp:
1. lock fdvp
2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
3. lock fvp if a directory (consistent with fdvp->fvp)
4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)

=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
1. lock tdvp
2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
3. lock tvp if a directory (consistent with tdvp->tvp)
4. lock fdvp (either incommensurate with tdvp and/or tvp, or
consistent with tdvp(->tvp)->fdvp)
5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)

Deadlocks found by hannken@; resolution worked out with dholland@.

XXX I think we could improve concurrency somewhat -- with a likely
big win for applications like tar and rsync that create many files
with temporary names and then rename them to the permanent one in the
same directory -- by making vfs_renamelock a reader/writer lock: any
number of same-directory renames, or exactly one cross-directory
rename, at any one time.
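
The two seven-step lock orders listed above can be modelled in a few lines
of userland C. This is a sketch, not the genfs_rename code: the enum and
function names are hypothetical, the lookup steps (2 and 5) are left out,
and both fvp and tvp are assumed to exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four vnodes involved in a rename. */
enum rn_node { RN_FDVP, RN_FVP, RN_TDVP, RN_TVP };

/*
 * Fill order[] with the sequence in which the vnodes are locked for a
 * cross-directory rename and return the number of locks taken.
 *
 * fdvp_above_tdvp:    the rename has the form fdvp/.../tdvp.
 * first_child_is_dir: fvp (or tvp, in the other case) is a directory.
 * same_child:         fvp and tvp are the same node (hard links).
 */
static size_t
rename_lock_order(bool fdvp_above_tdvp, bool first_child_is_dir,
    bool same_child, enum rn_node order[4])
{
	enum rn_node dir1 = fdvp_above_tdvp ? RN_FDVP : RN_TDVP;
	enum rn_node child1 = fdvp_above_tdvp ? RN_FVP : RN_TVP;
	enum rn_node dir2 = fdvp_above_tdvp ? RN_TDVP : RN_FDVP;
	enum rn_node child2 = fdvp_above_tdvp ? RN_TVP : RN_FVP;
	size_t n = 0;

	/* Cases (a) and (b) in the message are mutually exclusive. */
	assert(!(first_child_is_dir && same_child));

	order[n++] = dir1;		/* step 1 */
	if (first_child_is_dir)
		order[n++] = child1;	/* step 3 */
	order[n++] = dir2;		/* step 4 */
	if (!first_child_is_dir)
		order[n++] = child1;	/* step 6 */
	if (!same_child)
		order[n++] = child2;	/* step 7 */
	return n;
}
```

For example, a rename of the form fdvp/.../tdvp where fvp is a directory
yields fdvp, fvp, tdvp, tvp, which follows ancestor->descendant order.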
 1.4 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.3 30-Mar-2017  hannken branches: 1.3.18;
Remove now redundant calls to fstrans_start()/fstrans_done().
 1.2 06-Feb-2014  hannken branches: 1.2.6; 1.2.10; 1.2.14;
Move fstrans_start()/fstrans_done() into genfs_insane_rename() to protect
the complete rename operation like we do for all other vnode operations.
 1.1 08-May-2012  riastradh branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.10;
Implement a genfs_rename abstraction.

First major step in incrementally adapting all the file systems to a
saner rename VOP protocol.
 1.1.10.1 18-May-2014  rmind sync with head
 1.1.8.2 03-Dec-2017  jdolecek update from HEAD
 1.1.8.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.1.6.2 02-Jul-2012  jdc Pull up revisions:
src/sys/conf/files revision 1.1050
src/sys/miscfs/genfs/genfs.h revision 1.30 via patch
src/sys/miscfs/genfs/genfs_rename.c revision 1.1 via patch
src/sys/rump/librump/rumpvfs/Makefile.rumpvfs revision 1.33
(requested by riastradh in ticket #286).

Implement a genfs_rename abstraction.

First major step in incrementally adapting all the file systems to a
saner rename VOP protocol.
 1.1.6.1 08-May-2012  jdc file genfs_rename.c was added on branch netbsd-6 on 2012-07-02 18:01:17 +0000
 1.1.4.2 02-Jun-2012  mrg sync to latest -current.
 1.1.4.1 08-May-2012  mrg file genfs_rename.c was added on branch jmcneill-usbmp on 2012-06-02 11:09:36 +0000
 1.1.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 23-May-2012  yamt sync with head.
 1.1.2.1 08-May-2012  yamt file genfs_rename.c was added on branch yamt-pagecache on 2012-05-23 10:08:14 +0000
 1.2.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.2.10.1 26-Apr-2017  pgoyette Sync with HEAD
 1.2.6.1 28-Aug-2017  skrll Sync with HEAD
 1.3.18.1 13-Sep-2020  martin Pull up following revision(s) (requested by riastradh in ticket #1083):

sys/miscfs/genfs/genfs_rename.c: revision 1.5
tests/fs/vfs/t_renamerace.c: revision 1.37
tests/fs/vfs/t_renamerace.c: revision 1.38

tests/fs/vfs/t_renamerace: Test a screw case hannken@ found.

genfs_rename: Fix deadlocks in cross-directory cyclic rename.

Reproducer:
A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c/d/e", "c"); }

Deadlock:
- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.

The problem with these is that genfs_rename_enter_separate in B or C
tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C,
tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order
.->c->d->e.

The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do
tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be
the same (hard links), and we can't detect that until after we've
looked them both up -- and in some file systems (I'm looking at you,
ufs), there is no mere lookup operation, only lookup-and-lock, so we
can't even hold the lock on one of tvp or fvp when we look up the
other one if there's a chance they might be the same.

Fortunately the cases
(a) tvp = fvp
(b) tvp or fvp is a directory
are mutually exclusive as long as directories cannot be hard-linked.

In case (a) we can just defer locking {tvp, fvp} until the end, because
it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In
case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or
tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory,
because it can't possibly coincide with the second one of {fvp, tvp}.

With this change, we can now prove that the locking order is consistent
with the ancestor->descendant partial ordering. Where two nodes are
incommensurate under that partial ordering, they are only ever locked
by rename and there is only ever one rename at a time.

Proof:
- For same-directory renames, genfs_rename_enter_common locks the
directory first and then the children. The order
directory->child[i] is consistent with ancestor->descendant and
child[0]/child[1] are incommensurate.
- For cross-directory renames:
. While a rename is in progress and the fs-wide rename lock is held,
directories can be created or removed but not changed, so the
outcome of gro_genealogy -- which, given fdvp and tdvp, returns
the node N relating fdvp/N/.../tdvp or null if there is none --
can only transition from finding N to not finding N, if one of
the directories is removed while any of the vnodes are unlocked.
Merely creating directories cannot change the ancestry of tdvp,
and concurrent renames are not possible.
Thus, if a gro_genealogy determined the operation to have the
form fdvp/N/.../tdvp, then it might cease to have that form, but
only because tdvp was removed which will harmlessly cause the
rename to fail later on. Similarly, if gro_genealogy determined
the operation _not_ to have the form fdvp/N/.../tdvp then it
can't begin to have that form until after the rename has
completed.
The lock order is,
=> for fdvp/.../tdvp:
1. lock fdvp
2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
3. lock fvp if a directory (consistent with fdvp->fvp)
4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)
=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
1. lock tdvp
2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
3. lock tvp if a directory (consistent with tdvp->tvp)
4. lock fdvp (either incommensurate with tdvp and/or tvp, or
consistent with tdvp(->tvp)->fdvp)
5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)

Deadlocks found by hannken@; resolution worked out with dholland@.

XXX I think we could improve concurrency somewhat -- with a likely
big win for applications like tar and rsync that create many files
with temporary names and then rename them to the permanent one in the
same directory -- by making vfs_renamelock a reader/writer lock: any
number of same-directory renames, or exactly one cross-directory
rename, at any one time.
 1.11 08-Jul-2022  hannken Handle IMNT_GONE on the file system we want suspended not its
lowest mount we really suspend.
 1.10 22-Dec-2019  ad Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.9 20-Feb-2019  hannken Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.
 1.8 05-Oct-2018  hannken Bring back three state file system suspension:

NORMAL -> SUSPENDING -> SUSPENDED

and add operation fstrans_start_lazy() that only blocks while SUSPENDED.

Change vndthread() support operation handle_with_rdwr() to bracket
its file system operations by fstrans_start_lazy() and fstrans_done().

PR kern/53624 (dom0 freeze on domU exit)
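
The three-state scheme can be sketched as a tiny decision function. This
is a model only: the names are hypothetical, and it assumes that an
ordinary fstrans_start() waits in both non-normal states, which the
message implies but does not state outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the NORMAL -> SUSPENDING -> SUSPENDED states. */
enum fs_state_model { FS_NORMAL, FS_SUSPENDING, FS_SUSPENDED };

/*
 * Would a new transaction have to wait?
 * lazy == true models fstrans_start_lazy(), which only blocks while
 * the file system is fully SUSPENDED.
 */
static bool
start_would_block(enum fs_state_model state, bool lazy)
{
	assert(state >= FS_NORMAL && state <= FS_SUSPENDED);
	if (state == FS_NORMAL)
		return false;
	if (lazy && state == FS_SUSPENDING)
		return false;
	return true;
}
```

Under this model a lazy caller can keep making progress while the
suspension is still being set up, and waits only once it is complete.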
 1.7 24-May-2017  hannken branches: 1.7.2; 1.7.8; 1.7.10;
With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73
 1.6 07-May-2017  hannken Return ENOENT if trying to suspend an unmounted file system.
 1.5 30-Mar-2017  hannken branches: 1.5.4;
Change last users of FSTRANS_LAZY to FSTRANS_SHARED and change
genfs_suspendctl() to move from FSTRANS_NORMAL to FSTRANS_SUSPENDED
and vice versa.
 1.4 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.3 30-Nov-2009  pooka branches: 1.3.22; 1.3.40; 1.3.44; 1.3.48;
Introduce genfs_statvfs() as pretty much a no-info statvfs and
convert several pseudo file systems to use it.
 1.2 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.1 28-Jan-2008  dholland branches: 1.1.2; 1.1.4; 1.1.10; 1.1.12; 1.1.14; 1.1.16;
Part of the rename patches *doh*
 1.1.16.2 11-Mar-2010  yamt sync with head
 1.1.16.1 16-May-2008  yamt sync with head.
 1.1.14.1 18-May-2008  yamt sync with head.
 1.1.12.2 23-Mar-2008  matt sync with HEAD
 1.1.12.1 28-Jan-2008  matt file genfs_vfsops.c was added on branch matt-armv6 on 2008-03-23 02:05:03 +0000
 1.1.10.1 02-Jun-2008  mjf Sync with HEAD.
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 28-Jan-2008  mjf file genfs_vfsops.c was added on branch mjf-devfs on 2008-02-18 21:07:00 +0000
 1.1.2.2 04-Feb-2008  yamt sync with head.
 1.1.2.1 28-Jan-2008  yamt file genfs_vfsops.c was added on branch yamt-lazymbuf on 2008-02-04 09:24:29 +0000
 1.3.48.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.44.2 26-Apr-2017  pgoyette Sync with HEAD
 1.3.44.1 20-Mar-2017  pgoyette Sync with HEAD
 1.3.40.1 28-Aug-2017  skrll Sync with HEAD
 1.3.22.1 03-Dec-2017  jdolecek update from HEAD
 1.5.4.1 11-May-2017  pgoyette Sync with HEAD
 1.7.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.7.10.1 10-Jun-2019  christos Sync with HEAD
 1.7.8.1 20-Oct-2018  pgoyette Sync with head
 1.7.2.1 09-Oct-2018  martin Pull up following revision(s) (requested by hannken in ticket #1052):

sys/kern/vfs_trans.c: revision 1.51
distrib/sets/lists/comp/mi: revision 1.2233
share/man/man9/fstrans.9: revision 1.27
share/man/man9/Makefile: revision 1.431
sys/sys/fstrans.h: revision 1.12
sys/rump/librump/rumpkern/emul.c: revision 1.187
sys/dev/vnd.c: revision 1.266
sys/miscfs/genfs/genfs_vfsops.c: revision 1.8

Bring back three state file system suspension:

NORMAL -> SUSPENDING -> SUSPENDED

and add operation fstrans_start_lazy() that only blocks while SUSPENDED.

Change vndthread() support operation handle_with_rdwr() to bracket
its file system operations by fstrans_start_lazy() and fstrans_done().

PR kern/53624 (dom0 freeze on domU exit)
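
The three-state scheme described above can be sketched as a small decision function. This is only an illustrative model, not the vfs_trans.c code: the enum and function names are hypothetical, and the assumption that a regular (non-lazy) start waits during SUSPENDING is ours. The log itself states only that the lazy variant blocks while SUSPENDED.

```c
#include <stdbool.h>

/* Hypothetical model of the three suspension states named in the log. */
enum susp_state { SUSP_NORMAL, SUSP_SUSPENDING, SUSP_SUSPENDED };

/* Kind of transaction start being attempted (illustrative names). */
enum start_kind { START_REGULAR, START_LAZY };

/* Return true if a new transaction of this kind would have to wait. */
bool
start_would_block(enum susp_state state, enum start_kind kind)
{
	switch (state) {
	case SUSP_NORMAL:
		return false;
	case SUSP_SUSPENDING:
		/* Assumption: regular starts wait here, lazy starts pass. */
		return kind == START_REGULAR;
	case SUSP_SUSPENDED:
		/* From the log: lazy starts block only in this state. */
		return true;
	}
	return true;	/* unknown state: be conservative */
}

/* Step along NORMAL -> SUSPENDING -> SUSPENDED; SUSPENDED is terminal. */
enum susp_state
next_susp_state(enum susp_state state)
{
	return state == SUSP_NORMAL ? SUSP_SUSPENDING : SUSP_SUSPENDED;
}
```

In this model a caller such as the vnd worker would behave like START_LAZY: it keeps running while the file system is merely SUSPENDING and waits only once SUSPENDED is reached.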
 1.220 03-Mar-2023  hannken Fix genfs_can_chtimes() to also handle the condition:

If the time pointer is null, then write permission
on the file is also sufficient.

From FreeBSD.

Should fix PR kern/57246 "NFS group permissions regression"
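
The rule quoted in this entry can be illustrated with a simplified predicate. This is a sketch under stated assumptions, not genfs_can_chtimes() itself: the struct and its boolean fields are hypothetical stand-ins for the credential and attribute checks, and the owner/privileged clause reflects the usual utimes(2) semantics rather than anything spelled out in the log.

```c
#include <stdbool.h>

/* Hypothetical, flattened inputs for the permission decision. */
struct chtimes_req {
	bool caller_is_owner;		/* caller owns the file */
	bool caller_is_privileged;	/* e.g. superuser (assumption) */
	bool caller_can_write;		/* write permission on the file */
	bool times_is_null;		/* caller passed a null time pointer */
};

/* Return true if the timestamp change would be permitted. */
bool
may_chtimes(const struct chtimes_req *r)
{
	/* Assumption: owner or privileged caller may always set times. */
	if (r->caller_is_owner || r->caller_is_privileged)
		return true;
	/* From the log: with a null time pointer, write permission suffices. */
	if (r->times_is_null && r->caller_can_write)
		return true;
	return false;
}
```

The second clause is the case the PR concerns: a group member with write access who asks for "set to now" is allowed, while the same caller supplying explicit times is not.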
 1.219 27-Mar-2022  christos branches: 1.219.4;
dedup the eofs link/symlink methods
 1.218 27-Mar-2022  christos Expose groupmember as kauth_cred_groupmember and use it.
 1.217 19-Mar-2022  hannken Remove now unused genfs_nolock(), genfs_nounlock() and genfs_noislocked().
 1.216 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted, for example).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.215 11-Oct-2021  thorpej Mark the EVFILT_VNODE filters MP-safe.
 1.214 11-Oct-2021  thorpej Setting EV_EOF requires modifying kn->kn_flags. However, that relies on
holding the kq_lock of that note's kq. Rather than exposing this directly,
add new knote_set_eof() and knote_clear_eof() functions that handle the
necessary locking and don't leak as many implementation details to modules.

NetBSD 9.99.91
 1.213 10-Oct-2021  thorpej Must hold kn->kn_kq->kq_lock to modify kn->kn_flags.
 1.212 26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.211 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.210 05-Sep-2020  riastradh branches: 1.210.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.209 07-Aug-2020  christos accmode should be accmode_t
 1.208 27-Jun-2020  christos Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.207 20-May-2020  christos Fix EPERM vs EACCES on chtimes (thanks @hannken)
 1.206 18-May-2020  christos remove debugging, it is just clutter.
 1.205 18-May-2020  christos Fix EPERM vs EACCES return.
 1.204 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.203 25-Apr-2020  christos Allow root to access and modify system space extended attributes.
XXX: this routine should not be using the string, but the attribute namespace.
I have fixed this in the ACL code.
 1.202 23-Feb-2020  ad Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
 1.201 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.200 01-Dec-2019  ad branches: 1.200.2;
Minor vnode locking changes:

- Stop using atomics to manipulate v_usecount. It was a mistake to begin
with. It doesn't work as intended unless the XLOCK bit is incorporated in
v_usecount and we don't have that any more. When I introduced this 10+
years ago it was to reduce pressure on v_interlock but it doesn't do that,
it just makes stuff disappear from lockstat output and introduces problems
elsewhere. We could do atomic usecounts on vnodes but there has to be a
well thought out scheme.

- Resurrect LK_UPGRADE/LK_DOWNGRADE which will be needed to work effectively
when there is increased use of shared locks on vnodes.

- Allocate the vnode lock using rw_obj_alloc() to reduce false sharing of
struct vnode.

- Put all of the LRU lists into a single cache line, and do not requeue a
vnode if it's already on the correct list and was requeued recently (less
than a second ago).

Kernel build before and after:

119.63s real 1453.16s user 2742.57s system
115.29s real 1401.52s user 2690.94s system
 1.199 25-Oct-2017  maya branches: 1.199.4;
Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
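
For readers unfamiliar with the transformation, the effect of the semantic patch above can be shown on a stand-alone example. The struct below is a hypothetical stand-in that only borrows the four member names from the patch; the member types and the demo_* functions are assumptions for illustration and do not match the kernel's struct filterops.

```c
#include <stddef.h>

/* Hypothetical stand-in using the four member names from the patch. */
struct demo_filterops {
	int f_isfd;
	int (*f_attach)(void *);
	void (*f_detach)(void *);
	int (*f_event)(void *, long);
};

int demo_attach(void *kn) { (void)kn; return 0; }
void demo_detach(void *kn) { (void)kn; }
int demo_event(void *kn, long hint) { (void)kn; return hint != 0; }

/* Before: positional initializer, meaning depends on member order. */
const struct demo_filterops demo_positional = {
	1, demo_attach, demo_detach, demo_event
};

/* After: C99 designated initializer, each member is named explicitly. */
const struct demo_filterops demo_designated = {
	.f_isfd = 1,
	.f_attach = demo_attach,
	.f_detach = demo_detach,
	.f_event = demo_event,
};
```

Both objects hold the same values; the designated form keeps working if members are later reordered or added, which is the usual motivation for this kind of mechanical conversion.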
 1.198 01-Jul-2017  christos Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
 1.197 04-Jun-2017  hannken Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.196 04-Jun-2017  hannken Now that FSTRANS is part of VOP_*LOCK() remove FSTRANS and vdead_check()
from genfs_.*lock() and assert the vnode state once the vnode is locked.
 1.195 11-Apr-2017  riastradh branches: 1.195.4;
Eliminate now-unused WILLUNLOCK vop flag.
 1.194 30-Mar-2017  hannken Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.193 11-Jan-2017  hannken branches: 1.193.2;
Move vnode member v_lock as vi_lock to vnode_impl.h.
 1.192 24-Mar-2014  hannken branches: 1.192.4; 1.192.6; 1.192.8; 1.192.10; 1.192.14;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.191 12-Mar-2014  hannken Restructure genfs_deadlock() and genfs_lock() to always lock before
testing for dead node. Use ISSET() to test flags, add assertions.

Save the mount for fstrans_done() before genfs_unlock() unlocks the node.
 1.190 27-Feb-2014  hannken The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.189 30-Mar-2012  njoly branches: 1.189.2; 1.189.4;
uid mismatch for file flags changes is expected to fail with EPERM not
EACCES.
 1.188 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.187 12-Jun-2011  rmind branches: 1.187.2; 1.187.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.186 27-Dec-2010  hannken branches: 1.186.6;
Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode. This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
 1.185 30-Nov-2010  dholland Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
 1.184 30-Nov-2010  dholland Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.183 01-Sep-2010  chs replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.182 01-Jul-2010  hannken Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.181 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.180 24-Jun-2010  hannken genfs_nolock(): LK_INTERLOCK flag no longer possible.
 1.179 24-Jun-2010  hannken Clean up vnode lock operations:

- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.

- VOP_ISLOCKED(vp): Remove the for some time unused return value
LK_EXCLOTHER. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
 1.178 06-Jun-2010  hannken Change layered file systems to always pass the locking VOP's down to the
leaf file system. Remove now unused member v_vnlock from struct vnode.
Welcome to 5.99.30

Discussed on tech-kern.
 1.177 08-Apr-2010  pooka Call VOP_ABORTOP in genfs_eopnotsupp. This prevents file system
authors from having to get down on their knees and pray they won't
get POGA'd(*) again.

This plugs componentname leaks in at least smbfs and buggy puffs
servers (buggy servers shouldn't be able to leak kernel memory).

*) principle of greatest astonishment
 1.176 27-Jan-2010  uebayasi branches: 1.176.2; 1.176.4;
Don't forget to tell the result of rw_tryenter().
 1.175 27-Jan-2010  uebayasi Add genfs_node_rdtrylock().
 1.174 20-Nov-2009  roy Allow chown if caller is in the new group.
 1.173 20-Nov-2009  pooka Disallow chown for files the caller does not own.
 1.172 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to keep element of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.171 07-May-2009  elad Extract the open-coded authorization logic for chtimes() from various
file-systems and put it in a single function, genfs_can_chtimes().

This also makes UDF follow the same policy as all other file-systems.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/27/msg004951.html
 1.170 25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.169 22-Apr-2009  elad Per discussion on tech-kern@:

- Replace use of label/goto with returns

- Rename, change prototype of, and move functions from vfs_subr.c to
genfs_vnops.c
 1.168 18-Apr-2009  pooka Move genfs_null_putpages() from genfs_io.c to genfs_vnops.c -- it does
not really do i/o.
 1.167 28-Apr-2008  martin branches: 1.167.8; 1.167.10; 1.167.14; 1.167.16; 1.167.18;
Remove clause 3 and 4 from TNF licenses
 1.166 19-Apr-2008  hannken branches: 1.166.2;
Remove stale include <sys/fstrans.h>.
 1.165 21-Mar-2008  ad branches: 1.165.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.164 05-Feb-2008  ad branches: 1.164.6;
Lock v_knlist with the vnode interlock. PR kern/37881.
 1.163 30-Jan-2008  ad Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.162 25-Jan-2008  riz Since VOP_LEASE is gone, remove genfs_lease_check() too. Now my kernel
builds again. :)
 1.161 17-Jan-2008  ad Fix v_freelisthd assertion failure during call to vdevdone(). No calling
VOPs without a vnode reference!
 1.160 02-Jan-2008  ad Merge vmlocking2 to head.
 1.159 05-Dec-2007  pooka branches: 1.159.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.158 17-Oct-2007  pooka branches: 1.158.4; 1.158.6;
Split I/O-related routines (getpages, putpages, etc.) which are heavily
tied to uvm out of genfs_vnops into genfs_io.c
 1.157 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.156 29-Jul-2007  ad branches: 1.156.4; 1.156.6; 1.156.8; 1.156.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.155 09-Jul-2007  ad branches: 1.155.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.154 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.153 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.152 13-May-2007  yamt use a cached value of v_size. no functional changes.
 1.151 24-Apr-2007  perseant Split the VOP interface part of genfs_putpages() from the code. The new
function that does the work, genfs_do_putpages(), now takes as an argument
a pointer to the page that would be waited on, if PGO_BUSYWAIT were not set.
This allows a consumer, e.g. lfs_putpages(), to perform an action outside
the scope of UVM before sleeping on the page in question.
 1.150 04-Mar-2007  christos branches: 1.150.2; 1.150.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.149 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.148 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.147 20-Feb-2007  ad Add genfs_node_destroy(). Fixes a lock "leak" seen when running LOCKDEBUG
kernels.
 1.146 15-Feb-2007  ad branches: 1.146.2;
Replace some uses of lockmgr() / simplelocks.
 1.145 09-Feb-2007  ad Merge newlock2 to head.
 1.144 29-Jan-2007  hannken Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.143 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.142 27-Dec-2006  yamt remove nqnfs.
 1.141 15-Dec-2006  yamt put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.140 30-Nov-2006  pooka branches: 1.140.2; 1.140.4;
* update comments before putpages(): the vm object is always returned
unlocked instead of locked. chuq agrees
* use slock set to &uobj->vmobjlock also for the last simple lock
operation to be consistent with the rest of the function
 1.139 25-Nov-2006  christos instead of const int, use a #define which most of the time will evaluate
in a compile-time constant.
 1.138 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.137 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.136 14-Oct-2006  yamt add wrapper functions of lockmgr on g_glock.
 1.135 14-Oct-2006  yamt genfs_getpages: use kmem_zalloc.
 1.134 14-Oct-2006  yamt genfs_do_io: iodone handler should be called at splbio.
 1.133 12-Oct-2006  yamt genfs_putpages: don't try to deactivate loaned pages.
reported and tested by Nicolas Joly on current-users@.
 1.132 12-Oct-2006  thorpej genfs_lease_check(): Consume the arguments even if NFSSERVER is not defined.
 1.131 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.130 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.129 15-Sep-2006  yamt branches: 1.129.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.128 23-Jul-2006  ad branches: 1.128.4;
Use the LWP cached credentials where sane.
 1.127 22-Jul-2006  yamt - genfs_getpages: in the case of PGO_LOCKED, check if we can acquire
g_glock as suggested by Chuck Silvers on tech-kern@.
- genfs_rel_pages: handle PGO_DONTCARE so that it can be used for the above.
 1.126 22-Jul-2006  yamt - in genfs_getpages, take g_glock earlier so that truncation
can't intervene.
it also fixes a deadlock. (g_glock vs pages locking order)
- uvm_vnp_setsize: modify v_size while holding v_interlock.

reviewed by Chuck Silvers.
 1.125 14-May-2006  elad integrate kauth.
 1.124 11-Apr-2006  yamt genfs_getpages:
- use "overwrite" variable consistently.
- remove a set-only variable.
no functional changes.
 1.123 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.122 01-Mar-2006  yamt branches: 1.122.2; 1.122.4; 1.122.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.121 16-Jan-2006  reinoud branches: 1.121.2; 1.121.4;
Add genfs support for directories and softlinks next to regular files and
block devices.

Discussed on tech-kern and ok'd by Chuck
 1.120 11-Jan-2006  yamt use nestiobuf api for genfs.
 1.119 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.118 24-Dec-2005  perry branches: 1.118.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.117 15-Dec-2005  yamt fix lock/unlock mismatch in rev.1.115.
reported by Chris Tribo on current-users@.
 1.116 11-Dec-2005  christos merge ktrace-lwp.
 1.115 03-Dec-2005  yamt genfs_compat_getpages: add minimum support of async get. ie. ignore them.
should fix a crash reported by Jukka Salmi on current-users@.
 1.114 02-Dec-2005  yamt genfs_gop_write: use devvp directly as genfs_getpages does.
 1.113 02-Dec-2005  yamt genfs_putpages: initialize marker pages only when needed.
 1.112 30-Nov-2005  yamt revert rev.1.111 as it isn't necessary or correct.
- currently no one in tree has a problem with zero b_lblkno, afaik.
- this buf is used for "devvp", so it doesn't make sense to
use lbn in the "vp".
 1.111 30-Nov-2005  reinoud Teach genfs that (struct buf *)->b_lblkno always needs to point to the
logical block number of the file instead of always being zero.
 1.110 29-Nov-2005  yamt merge yamt-readahead branch.
 1.109 12-Nov-2005  yamt branches: 1.109.2;
genfs_getpages:
- add an assertion.
- call VOP_STRATEGY of underlying vnode directly, rather than
through the filesystem vnode.
- no need to set b_dev here because VOP_STRATEGY will take care of it.
 1.108 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.107 07-Oct-2005  elad branches: 1.107.2;
Remove Veriexec bits from genfs, don't #if 0 them.
 1.106 07-Oct-2005  elad Various fixes from blymn@ and myself.

Also, put genfs changes under #if 0, and don't do per-page fingerprints
until this is properly discussed, as requested by yamt@.
 1.105 05-Oct-2005  elad Introduce per-page fingerprints in Veriexec.

This closes a hole pointed out by Thor Lancelot Simon on tech-kern ~3
years ago.

The problem was with running binaries from remote storage, where our
kernel (and Veriexec) has no control over any changes to files.

An attacker could, after the fingerprint has been verified and
program loaded to memory, inject malicious code into the backing
store on the remote storage, followed by a forced flush, causing
a page-in of the malicious data from backing store, bypassing
integrity checks.

Initial implementation by Brett Lymn.
 1.104 26-Jul-2005  yamt don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.
 1.103 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.102 17-Jul-2005  yamt genfs_putpages: don't bother to clean the vnode unless VONWORKLST.
 1.101 17-Jul-2005  yamt ensure that vnodes with dirty pages are always on syncer's queue.

- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.

fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
 1.100 17-Jul-2005  yamt - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.
 1.99 16-Jul-2005  yamt genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.
 1.98 28-Jun-2005  yamt branches: 1.98.2;
- constify genfs_ops.
- use member designators.
 1.97 29-May-2005  christos - sprinkle const
- avoid shadowed variables.
 1.96 26-Feb-2005  perry branches: 1.96.2;
nuke trailing whitespace
 1.95 16-Feb-2005  chs undo the part of rev. 1.93 that turned the past-EOF check into an assertion.
read() can't request pages past EOF, but mmap() can. apparently I had
disengaged the brain when I said that was ok.
 1.94 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.93 25-Jan-2005  drochner branches: 1.93.2;
-in the read-ahead code, avoid issuing read requests at/past EOF
-because no one should request reads past EOF, or writes past EOF which
are not explicitly marked as file-extending (PGO_PASTEOF), turn
a boundary check into a KASSERT
approved by Chuck Silvers
 1.92 22-Dec-2004  dbj branches: 1.92.2;
check for _KERNEL_OPT around opt include
 1.91 04-Oct-2004  enami Backout previous; seeing many busy page on the pageq is normal.
 1.90 03-Oct-2004  enami To avoid leaving pages busy unnecessarily, bound to the specified region
when building a cluster if we aren't the pagedaemon, and clean the entire
cluster if we are the pagedaemon.
 1.89 03-Oct-2004  enami Count obj pages freed by pagedaemon.
 1.88 17-Sep-2004  skrll There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.87 27-May-2004  yamt - remove a comment which is no longer true.
- add "XXX vn_lock" comments where we can call VOP_READ/WRITE
without vnode lock held. (genfs_compat_*)
 1.86 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.85 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.84 10-Jan-2004  yamt store a i/o priority hint in struct buf for buffer queue discipline.
 1.83 27-Nov-2003  pk genfs_revoke: use ltsleep() to release the vnode spin lock to avoid a
sleep/wakeup race.
 1.82 24-Sep-2003  yamt fix a bug of lfs.

genfs_getpages() can read in more blocks than it should due to faked filesize
of lfs_gop_size(). it's a security problem and it makes gcc3 "internal error"

to fix this,
- in genfs_getpages(), always calculate diskeof and memeof separately
so that filesystems (in this case, lfs) can use different strategies
for them.
- introduce GOP_SIZE_MEM flag and use it to request in-core filesize.
(it was an intention of GOP_SIZE_READ,
but after the above change _READ is not a straightforward name)

after this, no one uses GOP_SIZE_{READ,WRITE} anymore but leave them for now.
 1.81 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.80 29-Jun-2003  fvdl branches: 1.80.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.79 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.78 17-Jun-2003  simonb Micro-optimisation- move the "pgs == NULL" check from the previous
change to immediately after the malloc call. This can't fail in
the non-malloc case.

Reviewed by YAMAMOTO Takashi.
 1.77 15-Jun-2003  yamt genfs_getpages: if number of pages requested is >16,
use malloc/free for array of pointers to vm_page.
otherwise, use an on-stack array as before.
this change fixes assertion failure when nfsd gets a big read request
that isn't aligned with filesystem block.
discussed on tech-kern.
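
The small-array/large-array pattern described in revision 1.77 can be
sketched in userland C roughly as follows. This is an illustration only,
not the genfs_getpages() code: MAX_STACK_PAGES, struct page_stub and
touch_pages() are hypothetical names, and the only detail taken from the
log is the threshold of 16.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STACK_PAGES 16	/* threshold named in the log entry */

struct page_stub;		/* hypothetical stand-in for struct vm_page */

/*
 * Hypothetical helper: use an on-stack pointer array for small requests
 * and fall back to malloc/free for larger ones.
 */
static int
touch_pages(size_t npages)
{
	struct page_stub *stack_pgs[MAX_STACK_PAGES];
	struct page_stub **pgs = stack_pgs;
	size_t i;

	if (npages > MAX_STACK_PAGES) {
		pgs = malloc(npages * sizeof(*pgs));
		if (pgs == NULL)	/* only the malloc path can fail */
			return ENOMEM;
	}

	for (i = 0; i < npages; i++)
		pgs[i] = NULL;		/* placeholder for the real page lookup */

	if (pgs != stack_pgs)		/* free only what was malloc'ed */
		free(pgs);
	return 0;
}
```

Checking for NULL directly after the malloc call keeps the failure test
on the one path that can actually fail.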
 1.76 23-Apr-2003  tls Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.75 10-Apr-2003  jdolecek use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.74 10-Apr-2003  jdolecek improve genfs_eopnotsupp_rele() so that it's usable for vop_rename,
which uses WILLPUT for member which may be NULL
handle correctly dvp == vp case for WILLPUT members, so this works
for vop_remove, vop_rename

thanks Bill Studenmund for code&comments on this
 1.73 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.72 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.71 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.70 21-Jan-2003  christos step 3. Assign lwp properly if null, so that we can PHOLD without segfaulting.
 1.69 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.68 15-Nov-2002  yamt genfs_compat_gop_write: set uio_iovcnt correctly.
 1.67 25-Oct-2002  yamt use B_ASYNC for children of nested buffers in genfs_getpages.
ok'ed by Chuck Silvers.
 1.66 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.65 21-Oct-2002  fvdl Use B_ASYNC in the !PGO_SYNCIO case. Gets back most, if not all, NFS
read throughput performance lost since the introduction of UBC. Spotted
by YAMAMOTO Takashi, many thanks to him.
 1.64 29-May-2002  enami Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.63 18-May-2002  enami branches: 1.63.2;
Just give up doing readahead rather than waiting for busy pages.
While I'm here, added a few patchable variables so that one can
easily measure readahead behaviour.
 1.62 14-May-2002  perseant branches: 1.62.2;
Protect v_synclist with splbio(); note that LIST_REMOVE is not an idempotent
operation if more than one LIST_REMOVE happens on interrupt, so both the test
for VONWORKLIST and the LIST_REMOVE(vp, v_synclist) need to be in splbio().
 1.61 10-May-2002  enami Redo rev. 1.57 a bit different way; don't use `tpg' since it may be freed.
Subtract the number of pages behind us when calculating new offset instead.
 1.60 10-May-2002  enami Don't modify the local variable `n' in genfs_putpages(). It should contain
the number of elements in the page array at the beginning of every iteration.
 1.59 09-May-2002  enami When traversing by list, if the page next to us is a page in the cluster,
advance the pointer.
 1.58 09-May-2002  enami - In genfs_putpages(), no need to restrict the cluster within the given
region.
- In uvm_aio_aiodone(), remove assertions no longer true.
 1.57 06-May-2002  enami Since npages may include the number of pages behind us, we can't use it to
update current offset. Instead, use the last page in the run of pages
to calculate new offset.
 1.56 06-May-2002  enami Stylistic change; introduce new local variable and use it instead of
sprinkling different expressions to test if we're pagedaemon.
 1.55 26-Apr-2002  enami We don't need to re-activate page in genfs_putpages() when GOP_WRITE returns
ENOMEM (temporary memory shortage) since it is already handled in
uvm_aio_aiodone() for both async/sync case. Discussed with chuq.
 1.54 16-Apr-2002  enami genfs_{compat_}getpages(): For PGO_LOCKED request, it is safe to return
read only page if it was due to read fault. This avoids many unnecessary
read faults introduced by recent nfs_bio.c change. Reviewed by chuq.
 1.53 16-Apr-2002  enami KNF and other misc. cosmetic changes.
 1.52 22-Mar-2002  chs in genfs_compat_getpages(), clear any part of a page that
VOP_READ() doesn't fill in (eg. because it's past EOF).
 1.51 17-Mar-2002  atatat Convert ioctl code to use EPASSTHROUGH instead of -1 or ENOTTY for
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
 1.50 02-Mar-2002  chs don't yield the cpu in genfs_putpages() if we're the pagedaemon.
pointed out by enami. fixes PR 15784.
 1.49 19-Feb-2002  chs fix two problems:
- when yielding the cpu while using the vnode's page list, use a marker page
to keep our place in the list (like the other cases where we drop the lock).
- wait until no one else has the page busy before deciding if the page needs
to be cleaned. a page will be dirty while it's being initialized but will
be marked clean before PG_BUSY is cleared.
both found by enami.
 1.48 13-Feb-2002  enami Don't bother to subtract 0.
 1.47 12-Feb-2002  enami Don't leave junk in pgs[] array since it will be passed to uvn_findpages()
again.
 1.46 26-Jan-2002  chs in genfs_putpages():
- yield the cpu if we've taken too long.
- when traversing by offset, skip over any pages that we clustered.
 1.45 31-Dec-2001  chs in genfs_gop_write(), actually set the B_ASYNC flag on buffers that we're
not going to wait for. this doesn't matter for real devices since we call
VOP_STRATEGY() directly, but NFS uses this flag to decide whether or not
to hand the buffer off to an nfsiod thread.
 1.44 31-Dec-2001  chs in genfs_putpages(), we must wait for any pending write i/os to complete
if the putpages request is synchronous.
 1.43 18-Dec-2001  chs add some compatibility routines to allow mmap() to work on non-UBCified
filesystems (in the same non-coherent fashion that they worked before).
 1.42 06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.41 30-Nov-2001  christos PR/14781: Matthew Fredette: Clamp the number of read-ahead pages to 16 because
other code has this limit. Also while I am here, convert the magic 16 into
a #define constant and use it in the appropriate places. This is a temporary
fix, since all this read-ahead business is XXXUBC anyway.
 1.40 10-Nov-2001  lukem add RCSIDs
 1.39 03-Oct-2001  enami branches: 1.39.2;
s/genfs_do_putpages/genfs_gop_write/ in uvmhist.
 1.38 21-Sep-2001  chs when zeroing pages past EOF, don't zero the page containing EOF if it
already contains valid data. should fix PRs 13361 and 13436.
 1.37 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.36 17-Aug-2001  chs branches: 1.36.2;
make genfs get/put work for block devices as well:
- the "fs bshift" for block devices is always DEV_BSHIFT.
- retrieve the device vnode from VOP_BMAP() and use that to set b_dev
in page i/o buffers.
 1.35 14-Jun-2001  chs branches: 1.35.2;
be sure to allocate dirty zeroed pages to cover blocks we allocate
to resolve a write fault. fixes PR 13201.
also, be sure to allocate blocks for write faults to holes even if
the page is already in memory. fixes PR 13189.
 1.34 28-May-2001  chs add a genfs_mmap() and change all of the disk-based filesystems
to implement VOP_MMAP() with the genfs version, in preparation for
actually using this VOP.
 1.33 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.32 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK        0
VM_PAGER_BAD       <unused>
VM_PAGER_FAIL      <unused>
VM_PAGER_PEND      0 (see below)
VM_PAGER_ERROR     EIO
VM_PAGER_AGAIN     EAGAIN
VM_PAGER_UNLOCK    EBUSY
VM_PAGER_REFAULT   ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
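
The mapping listed in revision 1.32 can be written out as a small
translation function. This is a sketch for illustration, not code from
the tree: enum pager_status and pager_status_to_errno() are hypothetical
names, the numeric values of the old VM_PAGER_* codes are not given in
the log, the ERESTART fallback definition is a stand-in for systems
whose errno.h does not expose it, and treating unknown values as EIO is
an assumption.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-1)	/* assumption: stand-in where errno.h lacks it */
#endif

/* Hypothetical enumeration standing in for the old VM_PAGER_* codes. */
enum pager_status {
	PAGER_OK,
	PAGER_PEND,
	PAGER_ERROR,
	PAGER_AGAIN,
	PAGER_UNLOCK,
	PAGER_REFAULT
};

/* Translate an old-style pager status into a traditional E* code. */
static int
pager_status_to_errno(enum pager_status st)
{
	switch (st) {
	case PAGER_OK:
	case PAGER_PEND:	/* async i/o successfully started */
		return 0;
	case PAGER_ERROR:
		return EIO;
	case PAGER_AGAIN:
		return EAGAIN;
	case PAGER_UNLOCK:
		return EBUSY;
	case PAGER_REFAULT:
		return ERESTART;
	}
	return EIO;		/* assumption: unknown values become EIO */
}
```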
 1.31 28-Feb-2001  chs branches: 1.31.2;
in genfs_getpages(), don't try to optimize zeroing past EOF.
fixes PR 12297.
 1.30 27-Feb-2001  chs distinguish between a file's in-memory EOF (which marks the offset at
which we disallow creation of page cache pages) and its on-disk EOF
(which marks the offset at which there is not (yet) data on disk that
we need to read when creating pages). for requests with PGO_PASTEOF,
the in-memory EOF may be much larger than the on-disk EOF.
 1.29 18-Feb-2001  chs fix a couple more bugs:
- in genfs_getpages(), unbusy any pages that we don't free in the error path.
- in genfs_putpages(), if we get a bmap error, record that in the master buf.
 1.28 12-Feb-2001  fvdl Oops, remove unintentionally committed debug code.
 1.27 12-Feb-2001  fvdl Format arg nit.
 1.26 05-Feb-2001  chs fix several bugs:
- in the cases where we skip over the i/o loop, increment npages by ridx
so that when the cleanup code starts processing the pgs array at index 0
it'll actually process all of the pages.
- process the PG_RELEASED flag when unbusying pages.
- add some missing MP locking.
- use MIN() and MAX() instead of min() and max() since the latter are
functions which take arguments of type "int" but we call them with
values of type "off_t", so the values could be truncated.
- in the PGO_PASTEOF case, use the larger of the current file size and the
end of the requested range of pages as the file size for this request.
this fixes some problems with sparsing writes to large offsets.
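
The MIN()/min() truncation hazard mentioned in revision 1.26 can be
shown with a short sketch. It assumes a platform where int is 32 bits
and uses int64_t in place of off_t; imin() and min_via_int() are
hypothetical names, and the MIN() definition is the conventional macro
form rather than a copy of any particular header.

```c
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))	/* type-preserving macro */
#endif

/* Hypothetical stand-in for a min() function taking int arguments. */
static int
imin(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Passing 64-bit offsets to imin() converts them to int first. The
 * casts spell out the conversion that would otherwise happen silently
 * at the call site.
 */
static int64_t
min_via_int(int64_t a, int64_t b)
{
	return imin((int)a, (int)b);
}
```

With offsets just above 4 GB, MIN() returns the smaller offset
unchanged, while the int-based path cannot represent either value and
so returns something else.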
 1.25 22-Jan-2001  fvdl Cast lbn to off_t in a few places, to avoid daddr_t overflow and all sorts
of havoc. From Bill Sommerfeld.
 1.24 27-Dec-2000  chs several bugs:
- in genfs_getpages() don't start read-ahead if we get an error on the
sync read, and always start read-ahead after the range of the sync read
if we do any at all.
- off-by-one error in genfs_size().
 1.23 09-Dec-2000  chs only zero the part of the page after EOF if we're actually
initializing the page.
 1.22 27-Nov-2000  chs allow building without SOFTDEP by adding the pageiodone hook to bio_ops.
 1.21 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.20 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.19 03-Aug-2000  thorpej Convert namei pathname buffer allocation to use the pool allocator.
 1.18 29-May-2000  mycroft branches: 1.18.2;
Stylistic change.
 1.17 13-May-2000  perseant branches: 1.17.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.16 30-Mar-2000  augustss Register, begone!
 1.15 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.14 23-Oct-1999  fvdl Remove some mentioned members in the vop {un}lock args struct that we
do not actually have.
 1.13 03-Aug-1999  wrstuden branches: 1.13.2; 1.13.4; 1.13.6;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.12 08-Jul-1999  wrstuden Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.11 05-Mar-1999  mycroft branches: 1.11.4;
Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.10 13-Aug-1998  kleink Add genfs_einval(), which does the obvious thing.
 1.9 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.8 25-Jun-1998  thorpej - Rename nqnfs_vop_lease_check() to genfs_lease_check(). If NFSSERVER is
not in the kernel, genfs_lease_check() is simply a no-op. This allows
LKM'd file systems to be exported (previously did not work properly
due to a compile-time decision based on -DNFSSERVER).
- defopt NFSSERVER
 1.7 05-Jun-1998  kleink * Convert fsync vnode operator implementations and usage from the old
waitfor argument and MNT_WAIT/MNT_NOWAIT to flags and FSYNC_WAIT.
* In genfs_fsync(), honor the FSYNC_NODATA flag.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 05-Jan-1998  perry RCSID Police.
 1.4 11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.3 07-Sep-1996  mycroft Implement poll(2).
 1.2 05-Sep-1996  thorpej Remove some unused variables.
 1.1 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.11.4.7 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.11.4.6 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.11.4.5 02-Aug-1999  thorpej Update from trunk.
 1.11.4.4 31-Jul-1999  chs genfs_getpages() now handles:
- faults on offsets past the nominal EOF during extending writes.
- returning multiple pages in the !PGO_LOCKED case if multiple pages
are requested.
- using new VOP_BALLOC() interface for allocating getpages with
blocksize<pagesize.
genfs_putpages() now handles:
- writing pages which do not have full backing store allocated.
 1.11.4.3 12-Jul-1999  chs fix the PGO_OVERWRITE case, I don't know how it was working before.
tidy a few other bits.
 1.11.4.2 11-Jul-1999  chs yet another major rework of the generic getpages.
we now do the block allocations for allocating getpages operations
after reading the pages. for nested i/os, use b_resid rather than
b_bcount to track the amount left to go. return values for
getpages/putpages are now unix errnos rather than VM_PAGER_*.
readahead is gone again for the moment.
 1.11.4.1 04-Jul-1999  chs create genfs_getpages() and genfs_putpages().
these should be able to handle most of the local-disk filesystems.
 1.13.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.4.2 15-Nov-1999  fvdl Sync with -current
 1.13.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.13.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.13.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.13.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.13.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.18.2.1 14-Dec-2000  he Pull up revision 1.20 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.31.2.22 07-Jan-2003  thorpej In the SA universe, the switch-to-this-LWP decision is made at a
different level than where preempt() calls are made, which renders
the "newlwp" argument useless. Replace it with a "more work to do"
boolean argument. Returning to userspace preempt() calls pass 0.
"Voluntary" preemptions in e.g. uiomove() pass 1. This will be used
to indicate to the SA subsystem that the LWP is not yet finished in
the kernel.

Collapse the SA vs. non-SA cases of preempt() together, making the
conditional code block much smaller, and don't call sa_preempt() if
more work is to come.

NOTE: THIS IS NOT A COMPLETE FIX TO THE preempt()-in-uiomove() PROBLEM
THAT CURRENTLY EXISTS FOR SA PROCESSES.
 1.31.2.21 11-Dec-2002  thorpej Sync with HEAD.
 1.31.2.20 11-Nov-2002  nathanw Catch up to -current
 1.31.2.19 23-Oct-2002  thorpej Fix a merge botch.
 1.31.2.18 23-Oct-2002  thorpej Sync with rev. 1.65.
 1.31.2.17 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.31.2.16 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.31.2.15 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.31.2.14 20-Jun-2002  nathanw Catch up to -current.
 1.31.2.13 17-Apr-2002  nathanw Catch up to -current.
 1.31.2.12 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.31.2.11 28-Feb-2002  nathanw LWPify.
 1.31.2.10 28-Feb-2002  nathanw Catch up to -current.
 1.31.2.9 09-Jan-2002  nathanw curproc ==> curproc->l_proc
 1.31.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.31.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.31.2.6 08-Oct-2001  nathanw Catch up to -current.
 1.31.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.31.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.31.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.31.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.31.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.35.2.8 02-Oct-2002  jdolecek knote data is now 64bit, g/c obsolete comment
 1.35.2.7 29-Sep-2002  jdolecek don't need cast to (caddr_t) for kn_hook anymore
 1.35.2.6 25-Sep-2002  jdolecek implement genfs_kqfilter() - this is based upon ufs_kqfilter(), but uses
vp->v_size for EVFILT_READ
 1.35.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.35.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.35.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.35.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.35.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.36.2.5 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.36.2.4 01-Oct-2001  fvdl Catch up with -current.
 1.36.2.3 27-Sep-2001  fvdl Put back line that got misplaced somehow.
 1.36.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.36.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.39.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.62.2.1 30-May-2002  gehenna Catch up with -current.
 1.63.2.3 26-Aug-2003  tron Pull up revision 1.76 (requested by tls in ticket #1434):
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.63.2.2 23-Oct-2002  lukem Pull up revision 1.65 (requested by fvdl in ticket #935):
Use B_ASYNC in the !PGO_SYNCIO case. Gets back most, if not all, NFS
read throughput performance lost since the introduction of UBC. Spotted
by YAMAMOTO Takashi, many thanks to him.
 1.63.2.1 01-Jun-2002  tv Pull up revision 1.64 (requested by enami in ticket #114):
Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.80.2.11 11-Dec-2005  christos Sync with head.
 1.80.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.80.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.80.2.8 17-Feb-2005  skrll Sync with HEAD.
 1.80.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.80.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.80.2.5 19-Oct-2004  skrll Sync with HEAD
 1.80.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.80.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.80.2.2 03-Aug-2004  skrll Sync with HEAD
 1.80.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.92.2.1 29-Apr-2005  kent sync with -current
 1.93.2.3 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.93.2.2 12-Feb-2005  yamt sync with head.
 1.93.2.1 25-Jan-2005  yamt file genfs_vnops.c was added on branch yamt-km on 2005-02-12 18:17:53 +0000
 1.96.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.
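
The generation-number idea in the entry above can be sketched roughly as follows. This is a toy model, not the genfs code: `struct toy_node` and the `toy_*` functions are hypothetical names invented for illustration, and the exact points at which the real code bumps or compares the counter are assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-vnode genfs state (not struct genfs_node). */
struct toy_node {
	unsigned gen;      /* bumped whenever pages may have been dirtied */
	bool on_worklist;  /* true while the node is on the syncer's queue */
};

/* getpages side: bump the generation and make sure the node is queued. */
static void
toy_getpages_for_write(struct toy_node *n)
{
	n->gen++;
	n->on_worklist = true;
}

/* putpages side, step 1: remember the generation before cleaning pages. */
static unsigned
toy_putpages_begin(const struct toy_node *n)
{
	return n->gen;
}

/*
 * putpages side, step 2: take the node off the queue only if every page
 * was cleaned AND no getpages ran in between. Returns true if dequeued.
 */
static bool
toy_putpages_end(struct toy_node *n, unsigned gen_at_start, bool all_clean)
{
	/* In this model the generation never moves backwards. */
	assert(gen_at_start <= n->gen);

	if (all_clean && n->gen == gen_at_start) {
		n->on_worklist = false;
		return true;
	}
	return false;
}
```

If a `toy_getpages_for_write()` call lands between `toy_putpages_begin()` and `toy_putpages_end()`, the generations differ and the node stays queued, which is the "don't mistakenly take a vnode off the queue" behaviour the log describes.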

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.98.2.10 24-Mar-2008  yamt sync with head.
 1.98.2.9 11-Feb-2008  yamt sync with head.
 1.98.2.8 04-Feb-2008  yamt sync with head.
 1.98.2.7 21-Jan-2008  yamt sync with head
 1.98.2.6 07-Dec-2007  yamt sync with head
 1.98.2.5 27-Oct-2007  yamt sync with head.
 1.98.2.4 03-Sep-2007  yamt sync with head.
 1.98.2.3 26-Feb-2007  yamt sync with head.
 1.98.2.2 30-Dec-2006  yamt sync with head.
 1.98.2.1 21-Jun-2006  yamt sync with head.
 1.107.2.1 20-Oct-2005  yamt remove genfs_fsync.
 1.109.2.4 19-Nov-2005  yamt remove contextless read-ahead code.
 1.109.2.3 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.109.2.2 15-Nov-2005  yamt adapt ffs, lfs, nfs.
 1.109.2.1 14-Nov-2005  yamt disable genfs readahead.
 1.118.2.3 01-Feb-2006  yamt sync with head.
 1.118.2.2 15-Jan-2006  yamt sync with head.
 1.118.2.1 31-Dec-2005  yamt adapt some random parts of kernel to uio_vmspace.
 1.121.4.2 01-Jun-2006  kardel Sync with head.
 1.121.4.1 22-Apr-2006  simonb Sync with head.
 1.121.2.1 09-Sep-2006  rpaulo sync with head
 1.122.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.122.6.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.122.4.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.122.4.2 19-Apr-2006  elad sync with head.
 1.122.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.122.2.5 11-Aug-2006  yamt sync with head
 1.122.2.4 24-May-2006  yamt sync with head.
 1.122.2.3 11-Apr-2006  yamt sync with head
 1.122.2.2 01-Apr-2006  yamt sync with head.
 1.122.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.128.4.4 01-Feb-2007  ad Sync with head.
 1.128.4.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.128.4.2 12-Jan-2007  ad Sync with head.
 1.128.4.1 18-Nov-2006  ad Sync with head.
 1.129.2.3 18-Dec-2006  yamt sync with head.
 1.129.2.2 10-Dec-2006  yamt sync with head.
 1.129.2.1 22-Oct-2006  yamt sync with head
 1.140.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.140.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by perseant in ticket #703):
sys/miscfs/genfs/genfs.h 1.21
sys/miscfs/genfs/genfs_vnops.c 1.151
sys/ufs/lfs/lfs.h 1.119, 1.120
sys/ufs/lfs/lfs_bio.c 1.99-101
sys/ufs/lfs/lfs_extern.h 1.89
sys/ufs/lfs/lfs_inode.c 1.108, 1.109
sys/ufs/lfs/lfs_segment.c 1.197, 1.199, 1.200
sys/ufs/lfs/lfs_subr.c 1.69, 1.70
sys/ufs/lfs/lfs_syscalls.c 1.119
sys/ufs/lfs/lfs_vfsops.c 1.234, 1.235
sys/ufs/lfs/lfs_vnops.c 1.195, 1.196, 1.200, 1.202-206

Reduce busy waiting in lfs_putpages(), and other LFS improvements.
 1.146.2.4 17-May-2007  yamt sync with head.
 1.146.2.3 07-May-2007  yamt sync with head.
 1.146.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.146.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.150.4.1 11-Jul-2007  mjf Sync with head.
 1.150.2.15 23-Oct-2007  ad Sync with head.
 1.150.2.14 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.150.2.13 01-Sep-2007  yamt fix a race and add a comment about it.
 1.150.2.12 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.150.2.11 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.150.2.10 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.150.2.9 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.150.2.8 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.150.2.7 09-Jun-2007  ad Sync with head.
 1.150.2.6 08-Jun-2007  ad Sync with head.
 1.150.2.5 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.150.2.4 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.150.2.3 05-Apr-2007  ad Compile fixes.
 1.150.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.150.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.155.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.156.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.156.10.1 29-Jul-2007  ad file genfs_vnops.c was added on branch matt-mips64 on 2007-07-29 12:15:47 +0000
 1.156.8.2 18-Oct-2007  yamt sync with head.
 1.156.8.1 14-Oct-2007  yamt sync with head.
 1.156.6.3 23-Mar-2008  matt sync with HEAD
 1.156.6.2 09-Jan-2008  matt sync with HEAD
 1.156.6.1 06-Nov-2007  matt sync with HEAD
 1.156.4.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.156.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.158.6.2 08-Dec-2007  ad Sync with head.
 1.158.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.158.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.158.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.159.4.2 19-Jan-2008  bouyer Sync with HEAD
 1.159.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.164.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.164.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.165.2.1 18-May-2008  yamt sync with head.
 1.166.2.7 09-Oct-2010  yamt sync with head
 1.166.2.6 11-Aug-2010  yamt sync with head.
 1.166.2.5 11-Mar-2010  yamt sync with head
 1.166.2.4 18-Jul-2009  yamt sync with head.
 1.166.2.3 16-May-2009  yamt sync with head
 1.166.2.2 04-May-2009  yamt sync with head.
 1.166.2.1 16-May-2008  yamt sync with head.
 1.167.18.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.167.16.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.167.14.2 23-Jul-2009  jym Sync with HEAD.
 1.167.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.167.10.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.167.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.176.4.4 05-Mar-2011  rmind sync with head
 1.176.4.3 03-Jul-2010  rmind sync with head
 1.176.4.2 30-May-2010  rmind sync with head
 1.176.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.176.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.176.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.176.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.186.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.187.6.1 05-Apr-2012  mrg sync to latest -current.
 1.187.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.187.2.1 17-Apr-2012  yamt sync with head
 1.189.4.1 18-May-2014  rmind sync with head
 1.189.2.2 03-Dec-2017  jdolecek update from HEAD
 1.189.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.192.14.1 08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.192.10.2 26-Apr-2017  pgoyette Sync with HEAD
 1.192.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.192.8.1 08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.192.6.2 28-Aug-2017  skrll Sync with HEAD
 1.192.6.1 05-Feb-2017  skrll Sync with HEAD
 1.192.4.1 08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.193.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.195.4.2 05-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #91):
sys/kern/kern_event.c: revision 1.92
sys/miscfs/genfs/genfs_vnops.c: revision 1.198
sys/sys/event.h: revision 1.30
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.195.4.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.199.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.200.2.4 29-Feb-2020  ad Sync with head.
 1.200.2.3 24-Jan-2020  ad vnodes:

- Have own v_usecount again, don't share the uvm_object's refcount.
- Cluster the members of vnode_t and vnode_impl_t in a cache-conscious way.
- Go back to having vi_lock directly in vnode_impl_t.
- Go back to having v_usecount adjusted with atomics.
- Start adjusting v_holdcnt with atomics, too.
- Put all the namecache stuff back into vnode_impl_t.
 1.200.2.2 22-Jan-2020  ad Make sure LK_UPGRADE always comes with LK_NOWAIT; dropping the lock in here
is unclean and I wonder if it could screw over fstrans.
 1.200.2.1 18-Jan-2020  ad Allow VOP_LOCK(LK_NONE).
 1.210.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.219.4.1 05-Mar-2023  martin Pull up following revision(s) (requested by hannken in ticket #111):

sys/miscfs/genfs/genfs_vnops.c: revision 1.220

Fix genfs_can_chtimes() to also handle the condition:

If the time pointer is null, then write permission
on the file is also sufficient.

From FreeBSD.

Should fix PR kern/57246 "NFS group permissions regression"
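
The rule quoted in the entry above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct toy_caller` and `toy_can_chtimes()` are hypothetical names, the real genfs_can_chtimes() works on credentials and vnode access checks rather than booleans, and the error codes chosen here follow the usual POSIX utimes conventions rather than being taken from the NetBSD source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of the calling process. */
struct toy_caller {
	bool is_owner;          /* effective uid matches the file owner */
	bool is_superuser;      /* has appropriate privileges */
	bool has_write_access;  /* write permission on the file */
};

/*
 * Returns 0 if the caller may change the timestamps, an errno otherwise.
 * "times" models the user-supplied time pointer: NULL means "set to now".
 */
static int
toy_can_chtimes(const struct toy_caller *c, const void *times)
{
	/* The owner and the superuser may always change the times. */
	if (c->is_owner || c->is_superuser)
		return 0;

	/* Null time pointer: write permission is also sufficient. */
	if (times == NULL)
		return c->has_write_access ? 0 : EACCES;

	/* Explicit times from anyone else are refused (assumed EPERM). */
	return EPERM;
}
```

In this model a group member with write permission but no ownership passes the check only when the time pointer is null, which is the case the entry says was previously mishandled.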
 1.17 11-Apr-2017  hannken Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.16 28-May-2014  hannken branches: 1.16.4; 1.16.8; 1.16.12;
Change field "layerm_tag" to correct type "enum vtagtype".

CID 1216449: Mixing enum types
 1.15 25-May-2014  hannken Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.14 06-Jun-2010  hannken branches: 1.14.18; 1.14.32;
Change layered file systems to always pass the locking VOP's down to the
leaf file system. Remove now unused member v_vnlock from struct vnode.
Welcome to 5.99.30

Discussed on tech-kern.
 1.13 30-Jan-2008  ad branches: 1.13.10; 1.13.30; 1.13.32;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.12 10-Oct-2007  ad branches: 1.12.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.11 11-Dec-2005  christos branches: 1.11.30; 1.11.44; 1.11.46; 1.11.48;
merge ktrace-lwp.
 1.10 25-Sep-2005  jmmv Follow compat naming tradition: rename compat_export_args to export_args30.
 1.9 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.8 30-Aug-2005  xtraeme Remove __P()
 1.7 24-Jul-2005  erh Provide a sysctl (vfs.layerfs.debug) to control verbose output when
LAYERFS_DIAGNOSTIC is turned on.
 1.6 28-May-2004  wrstuden branches: 1.6.12;
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.5 07-Aug-2003  agc branches: 1.5.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.4 07-Jun-2001  wiz branches: 1.4.22;
Typos in comments (misc/13133 by Michael K. Sanders)
 1.3 30-Mar-2000  simonb branches: 1.3.6;
Delete redundant decl of layer_node_create(), it's in layer_extern.h.
 1.2 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.1 08-Jul-1999  wrstuden branches: 1.1.2; 1.1.4;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.1.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.1.2.1 08-Jul-1999  thorpej file layer.h was added on branch chs-ubc2 on 1999-08-02 22:27:34 +0000
 1.3.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.4.22.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.22.1 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 30-May-2004  tron Pull up revision 1.6 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.6.12.3 04-Feb-2008  yamt sync with head.
 1.6.12.2 27-Oct-2007  yamt sync with head.
 1.6.12.1 21-Jun-2006  yamt sync with head.
 1.11.48.1 14-Oct-2007  yamt sync with head.
 1.11.46.2 23-Mar-2008  matt sync with HEAD
 1.11.46.1 06-Nov-2007  matt sync with HEAD
 1.11.44.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.11.30.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.12.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.13.32.1 03-Jul-2010  rmind sync with head
 1.13.30.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.13.10.1 11-Aug-2010  yamt sync with head.
 1.14.32.1 10-Aug-2014  tls Rebase.
 1.14.18.2 03-Dec-2017  jdolecek update from HEAD
 1.14.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.16.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.8.1 26-Apr-2017  pgoyette Sync with HEAD
 1.16.4.1 28-Aug-2017  skrll Sync with HEAD
 1.41 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.40 04-Jun-2017  hannken branches: 1.40.6; 1.40.12;
Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.39 30-Mar-2017  hannken branches: 1.39.6;
Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.38 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.37 27-Jan-2017  hannken Handle v_writecount from layer_open(), layer_close() and layer_revoke()
so lower file system vnodes get marked as open for writing.
 1.36 25-May-2014  hannken branches: 1.36.4; 1.36.8; 1.36.12;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.35 27-Feb-2014  hannken branches: 1.35.2;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.34 01-Feb-2012  dholland branches: 1.34.6; 1.34.10;
Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.33 29-Jan-2012  dholland Remove the extra op argument to VFS_QUOTACTL() - the op is now stored
purely in the args structure.

This change requires a kernel version bump.
 1.32 29-Jan-2012  dholland Introduce struct vfs_quotactl_args. Use it.

This change uglifies vfs_quotactl some in order to make room for
moving operation-specific but FS-independent logic out of ufs_quota.c.

Note: this change requires a kernel version bump.
 1.31 29-Jan-2012  dholland Move the proplib-based quota command dispatching (that is, the code
that knows the magic string names for the allowed actions) out of
UFS-specific code and to fs-independent code.

This introduces QUOTACTL_* operation codes and changes the signature
of VFS_QUOTACTL() again for compile safety.

Note: this change requires a kernel version bump.
 1.30 29-Jan-2012  dholland Move the code for iterating over the multiple RPC calls in a quota
proplib XML packet to vfs_quotactl.c out of sys/ufs/ufs.

Add a dummy extra arg to VFS_QUOTACTL for compile safety.

Note: this change requires a kernel version bump.
 1.29 11-Jul-2011  hannken branches: 1.29.2; 1.29.6;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.28 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.27 10-Jan-2011  hannken branches: 1.27.2; 1.27.4;
Add layer_revoke() that adjusts the lower vnode use count to be at least as
high as the upper vnode count before passing down the VOP_REVOKE().

This way vclean()'s check for active (vp->v_usecount > 1) vnodes gets it right.

Should fix PR kern/43456.
 1.26 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK(). This makes
layer_*lock*() obsolete. Remove them and handle lock operations
with the generic bypass function.

Ride 5.99.34.
 1.25 02-Jul-2010  rmind Slightly clean up layerfs and nullfs: bring the big description closer to
reality (remove duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.24 28-Jan-2008  dholland branches: 1.24.10; 1.24.30; 1.24.32;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.23 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.22 13-Jul-2006  martin branches: 1.22.28; 1.22.30; 1.22.36;
Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.21 14-May-2006  elad branches: 1.21.4;
integrate kauth.
 1.20 11-Dec-2005  christos branches: 1.20.4; 1.20.6; 1.20.8; 1.20.10; 1.20.12;
merge ktrace-lwp.
 1.19 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.18 30-Aug-2005  xtraeme Remove __P()
 1.17 30-Jun-2004  hannken branches: 1.17.12;
Do LAYERFS_REMOVED for vop_rmdir.

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.16 07-Jun-2004  yamt do a LAYERFS_REMOVED hack for vop_rename as well.
 1.15 29-May-2004  wrstuden Add layerfs_snapshot() as a handler routine for VFS_SNAPSHOT() calls
through a layered file system.

Note: we don't actually support snapshots through a layered file system,
and this routine returns an error. However we: 1) have clearly documented
what needs fixing (which isn't trivial to fix) and 2) if we do fix
this, all layered file systems can take advantage of it at once.
 1.14 28-May-2004  wrstuden Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.13 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.12 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.11 25-Jan-2004  hannken branches: 1.11.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.10 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.9 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.8 29-Jun-2003  fvdl branches: 1.8.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.7 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.6 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.5 06-Dec-2001  chs add VOP_GETPAGES and VOP_PUTPAGES methods for layered filesystems.
drop the interlock on the upper layer, acquire the interlock on the
lower layer.
 1.4 07-Jun-2001  wiz branches: 1.4.2;
Typos in comments (misc/13133 by Michael K. Sanders)
 1.3 16-Mar-2000  jdolecek branches: 1.3.6;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
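
The init/done pairing described in this entry can be sketched in plain C
as follows. This is an illustrative user-space model, not the kernel
code: struct vfsops_sketch, examplefs_init(), examplefs_done() and the
attach/detach helpers are hypothetical names, and malloc()/free() stand
in for whatever global resources (such as pools) a real file system
would allocate in its init function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced operations table: only the two hooks discussed. */
struct vfsops_sketch {
	void (*vfs_init)(void);	/* allocate global resources */
	void (*vfs_done)(void);	/* release them again on detach */
};

/* Stand-in for a global resource such as a pool (assumption). */
static void *examplefs_pool;

static void
examplefs_init(void)
{
	examplefs_pool = malloc(64);
	assert(examplefs_pool != NULL);
}

static void
examplefs_done(void)
{
	free(examplefs_pool);
	examplefs_pool = NULL;
}

static const struct vfsops_sketch examplefs_vfsops = {
	.vfs_init = examplefs_init,
	.vfs_done = examplefs_done,
};

/* Attach runs the init hook. */
static void
vfs_attach_sketch(const struct vfsops_sketch *ops)
{
	ops->vfs_init();
}

/* Detach runs the done hook, if the file system supplies one. */
static void
vfs_detach_sketch(const struct vfsops_sketch *ops)
{
	if (ops->vfs_done != NULL)
		ops->vfs_done();
}
```

With the done hook in place, attaching and detaching the same file
system repeatedly leaves no stale global state behind, which is the
situation the load/unload panics mentioned above arose from.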
 1.2 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.1 08-Jul-1999  wrstuden branches: 1.1.2; 1.1.4;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.1.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.1.2.1 08-Jul-1999  thorpej file layer_extern.h was added on branch chs-ubc2 on 1999-08-02 22:27:34 +0000
 1.3.6.2 08-Jan-2002  nathanw Catch up to -current.
 1.3.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.4.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.8.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.8.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.8.2.2 03-Aug-2004  skrll Sync with HEAD
 1.8.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.11.2.3 02-Jul-2004  he Pull up revision 1.17 (requested by hannken in ticket #575):
Do LAYERFS_REMOVED for vop_rmdir.
 1.11.2.2 21-Jun-2004  tron Pull up revision 1.16 (requested by yamt in ticket #512):
do a LAYERFS_REMOVED hack for vop_rename as well.
 1.11.2.1 30-May-2004  tron Pull up revision 1.14 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.17.12.4 04-Feb-2008  yamt sync with head.
 1.17.12.3 07-Dec-2007  yamt sync with head
 1.17.12.2 30-Dec-2006  yamt sync with head.
 1.17.12.1 21-Jun-2006  yamt sync with head.
 1.20.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.20.10.2 20-Apr-2006  christos kauth_cred_t -> struct kauth_cred;
 1.20.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.20.8.2 11-Aug-2006  yamt sync with head
 1.20.8.1 24-May-2006  yamt sync with head.
 1.20.6.1 01-Jun-2006  kardel Sync with head.
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.21.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.22.36.2 18-Feb-2008  mjf Sync with HEAD.
 1.22.36.1 08-Dec-2007  mjf Sync with HEAD.
 1.22.30.2 23-Mar-2008  matt sync with HEAD
 1.22.30.1 09-Jan-2008  matt sync with HEAD
 1.22.28.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.24.32.3 21-Apr-2011  rmind sync with head
 1.24.32.2 05-Mar-2011  rmind sync with head
 1.24.32.1 03-Jul-2010  rmind sync with head
 1.24.30.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.24.10.1 11-Aug-2010  yamt sync with head.
 1.27.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
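
The layout described above (a header holding a free-list pointer and a
hash table, with entries chained by file offset) can be sketched in C
roughly as follows. This is only a model of the idea: the struct names,
field names, field widths and the fixed hash size are assumptions made
for illustration, not the on-disk format from the commit, and offset 0
is used here as the "no entry" marker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash size; the real one depends on the block size. */
#define QSKETCH_HASH_SIZE 4

/* Header at offset 0 of the quota file (field names are assumptions). */
struct qsketch_header {
	uint64_t free_head;			/* offset of first free entry, 0 = none */
	uint64_t hash[QSKETCH_HASH_SIZE];	/* chain heads for in-use entries */
};

/* One quota entry; "next" is a file offset, not a memory pointer. */
struct qsketch_entry {
	uint64_t next;	/* offset of next entry in the chain, 0 = end */
	uint32_t id;	/* uid or gid this entry describes */
	uint64_t limit;	/* illustrative limit value */
};

/*
 * Walk the hash chain for "id" inside an in-memory copy of the file.
 * Returns the entry's file offset, or 0 if there is no such entry.
 */
static uint64_t
qsketch_lookup(const unsigned char *file, uint32_t id)
{
	struct qsketch_header h;
	struct qsketch_entry e;
	uint64_t off;

	memcpy(&h, file, sizeof(h));
	for (off = h.hash[id % QSKETCH_HASH_SIZE]; off != 0; off = e.next) {
		memcpy(&e, file + off, sizeof(e));
		if (e.id == id)
			return off;
	}
	return 0;
}
```

Storing offsets rather than pointers means the same structure is valid
both on disk and in a buffer, so a lookup is a walk over blocks that are
already in the buffer cache.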

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) supports are completed (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.27.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.29.6.1 18-Feb-2012  mrg merge to -current.
 1.29.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.29.2.1 17-Apr-2012  yamt sync with head
 1.34.10.1 18-May-2014  rmind sync with head
 1.34.6.2 03-Dec-2017  jdolecek update from HEAD
 1.34.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.35.2.1 10-Aug-2014  tls Rebase.
 1.36.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.36.8.2 26-Apr-2017  pgoyette Sync with HEAD
 1.36.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.36.4.2 28-Aug-2017  skrll Sync with HEAD
 1.36.4.1 05-Feb-2017  skrll Sync with HEAD
 1.39.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.40.12.1 17-Jan-2020  ad Sync with head.
 1.40.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.39 10-Apr-2022  andvar fix various typos in comments and output/log messages.
 1.38 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
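
The accessor pattern this entry describes can be sketched with C11
atomics as follows. This is a user-space approximation under stated
assumptions: struct vnode_sketch and vrefcnt_sketch() are hypothetical
names, the int counter and return type are assumed, and C11's
atomic_load_explicit() with memory_order_relaxed stands in for the
kernel's atomic_load_relaxed().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct vnode; only the counter is modelled. */
struct vnode_sketch {
	atomic_int v_usecount;
};

/*
 * Accessor in the spirit of vrefcnt(): callers read the reference
 * count through this function instead of touching the field directly.
 */
static inline int
vrefcnt_sketch(struct vnode_sketch *vp)
{
	return atomic_load_explicit(&vp->v_usecount, memory_order_relaxed);
}
```

Routing every read through one function keeps the field's type and
memory-ordering rules in a single place, so they can change later
without editing each call site.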
 1.37 09-Nov-2014  maxv branches: 1.37.20; 1.37.30;
Do not uselessly include <sys/malloc.h>.
 1.36 25-May-2014  hannken branches: 1.36.2;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.35 10-Feb-2014  hannken branches: 1.35.2;
Change layerfs_vget(), layerfs_fhtovp() and the various layer xxx_mount()
functions to unlock/relock the node for the call to layer_node_create().

Finally remove dirty hacks (LK_NOWAIT, kpause) from layer_node_find().
 1.34 09-Feb-2014  hannken When layer_node_alloc() finds another thread already inserted the node
into the hashlist and discards the now unneeded node it will raise a
panic "dead but not clean".

Reorder the initialization and use ungetnewvnode() to discard the node.
 1.33 29-Jan-2014  hannken Allow layer_node_create() with unlocked lower node and change
layer_bypass() to enter nodes from creation operations unlocked.
 1.32 12-Jun-2011  rmind branches: 1.32.2; 1.32.12; 1.32.16;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.31 21-Jul-2010  hannken branches: 1.31.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.30 02-Jul-2010  rmind Slightly clean-up layerfs and nullfs: update the big description more to
the reality (remove duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.29 06-Jun-2010  hannken Change layered file systems to always pass the locking VOP's down to the
leaf file system. Remove now unused member v_vnlock from struct vnode.
Welcome to 5.99.30

Discussed on tech-kern.
 1.28 08-Jan-2010  pooka branches: 1.28.2; 1.28.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.27 15-Mar-2009  cegger ansify function definitions
 1.26 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.25 24-Jan-2008  ad branches: 1.25.10; 1.25.18; 1.25.24;
specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.24 23-Jan-2008  ad layer_node_find: if we find a node being cleaned out, then ignore it and
continue. A thread trying to clean out the extant layer vnode needs to
acquire the shared lock (i.e. the lower vnode's lock), which our caller
already holds. To allow the cleaning to succeed the current thread must make
progress. So, for a brief time more than one vnode in a layered file system
may refer to a single vnode in the lower file system.
 1.23 02-Jan-2008  ad Merge vmlocking2 to head.
 1.22 10-Oct-2007  ad branches: 1.22.4; 1.22.6; 1.22.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.21 09-Dec-2006  chs branches: 1.21.6; 1.21.18; 1.21.20; 1.21.22;
a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.20 25-Nov-2006  elad branches: 1.20.2;
Part of PR/33280: Christian Ehrhardt: In the error path (which probably
can't happen) lmp->layerm_hashlock is not unlocked.
 1.19 24-Nov-2006  wiz s/existance/existence/, from Zafer.
 1.18 11-Dec-2005  christos branches: 1.18.20; 1.18.22;
merge ktrace-lwp.
 1.17 30-Aug-2005  xtraeme Remove __P()
 1.16 24-Jul-2005  erh Provide a sysctl (vfs.layerfs.debug) to control verbose output when
LAYERFS_DIAGNOSTIC is turned on.
 1.15 07-Aug-2003  agc branches: 1.15.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.14 29-Jun-2003  fvdl branches: 1.14.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.13 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.12 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.11 20-Feb-2002  enami Don't use MALLOC for variable sized allocation.
 1.10 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.9 10-Nov-2001  lukem add RCSIDs
 1.8 07-Jun-2001  wiz branches: 1.8.2; 1.8.6;
Typos in comments (misc/13133 by Michael K. Sanders)
 1.7 27-Nov-2000  chs branches: 1.7.2;
Initial integration of the Unified Buffer Cache project.
 1.6 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.5 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.4 25-Oct-1999  wrstuden Since we don't put layered device nodes in the spechash hash chains,
initialize vp->v_hashchain to NULL.
 1.3 15-Jul-1999  wrstuden branches: 1.3.2; 1.3.4; 1.3.6; 1.3.8;
Define VLAYER and make layered fs's set this flag when creating their vnodes.

getnewvnode now checks this bit, and if it's set makes sure a vnode's not
locked before removing it from the free list.

Closes PR 7954 by Alan Barrett <apb@iafrica.com>.
 1.2 12-Jul-1999  wrstuden Fix tyop pointed out by Chuck Silvers <chuq@chuq.com>.
 1.1 08-Jul-1999  wrstuden Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.3.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.6.1 15-Nov-1999  fvdl Sync with -current
 1.3.4.2 08-Dec-2000  bouyer Sync with HEAD.
 1.3.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.2.2 02-Aug-1999  thorpej Update from trunk.
 1.3.2.1 15-Jul-1999  thorpej file layer_subr.c was added on branch chs-ubc2 on 1999-08-02 22:27:34 +0000
 1.7.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.7.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.7.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.7.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.8.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.8.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.8.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.14.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.14.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.14.2.2 03-Aug-2004  skrll Sync with HEAD
 1.14.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.15.16.5 04-Feb-2008  yamt sync with head.
 1.15.16.4 21-Jan-2008  yamt sync with head
 1.15.16.3 27-Oct-2007  yamt sync with head.
 1.15.16.2 30-Dec-2006  yamt sync with head.
 1.15.16.1 21-Jun-2006  yamt sync with head.
 1.18.22.1 10-Dec-2006  yamt sync with head.
 1.18.20.1 12-Jan-2007  ad Sync with head.
 1.20.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.21.22.1 14-Oct-2007  yamt sync with head.
 1.21.20.3 23-Mar-2008  matt sync with HEAD
 1.21.20.2 09-Jan-2008  matt sync with HEAD
 1.21.20.1 06-Nov-2007  matt sync with HEAD
 1.21.18.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.21.6.2 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.21.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.22.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.22.6.3 12-Dec-2007  ad layer_node_alloc: copy VV_MPSAFE from lowervp.
 1.22.6.2 06-Dec-2007  ad - layer_node_find: fix a race.
- Use kmem_alloc/free.
 1.22.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.22.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.25.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.25.10.3 11-Aug-2010  yamt sync with head.
 1.25.10.2 11-Mar-2010  yamt sync with head
 1.25.10.1 04-May-2009  yamt sync with head.
 1.28.4.5 30-May-2011  rmind - Amend getnewvnode(9) to take the lock for sharing, not a vnode.
- Update tmpfs to perform vnode and UVM object lock sharing correctly.
 1.28.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.28.4.3 05-Mar-2011  rmind sync with head
 1.28.4.2 03-Jul-2010  rmind sync with head
 1.28.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.28.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.31.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.32.16.1 18-May-2014  rmind sync with head
 1.32.12.2 03-Dec-2017  jdolecek update from HEAD
 1.32.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.32.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.2.1 10-Aug-2014  tls Rebase.
 1.36.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.37.30.1 20-Apr-2020  bouyer Sync with HEAD
 1.37.20.1 21-Apr-2020  martin Sync with HEAD
 1.56 09-Dec-2022  hannken Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Don't allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com
 1.55 18-Jul-2022  thorpej Make kqueue event status for vnodes shareable, and for stacked file systems
like nullfs, make the upper vnode share that status with the lower vnode.

And, lo, NetBSD 9.99.99.

Fixes PR kern/56713.
 1.54 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.53 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.52 07-Aug-2019  pgoyette branches: 1.52.2;
Many years ago someone created a new __link_set_sysctl_funcs to hold
the list of routines that need to be called for setting up sysctl
variables. This worked great for all code included in the kernel
itself, but didn't deal with modules that want to create their own
sysctl data. So, we ended up with a lot of #ifdef _MODULE blocks
so modules could explicitly call their setup functions when loaded
as non-built-in modules.

So today, we complete the task that was started so many years ago.

When modules are loaded, after we've called xxx_modcmd(INIT...) we
check if the module contains its own __link_set_sysctl_funcs, and
if so we call the functions listed. We add a struct sysctllog member
to the struct module so we can call sysctl_teardown() when the module
gets unloaded. (The sequence of events ensures that the sysctl stuff
doesn't get created until the rest of the module's init code does any
required memory allocation.)

So, no more need to explicitly call the sysctl setup routines when
built as a loadable module.
 1.51 04-Jun-2017  hannken branches: 1.51.6;
Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.50 01-Jun-2017  chs branches: 1.50.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.49 11-Apr-2017  hannken Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.48 30-Mar-2017  hannken Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.47 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.46 20-Apr-2015  riastradh branches: 1.46.2; 1.46.4;
Cull unused vnode v_iflags: VI_LAYER, VI_LOCKSHARE.
 1.45 09-Nov-2014  maxv branches: 1.45.2;
Do not uselessly include <sys/malloc.h>.
 1.44 25-May-2014  hannken branches: 1.44.2;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.43 25-Feb-2014  pooka branches: 1.43.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.42 10-Feb-2014  hannken Change layerfs_vget(), layerfs_fhtovp() and the various layer xxx_mount()
functions to unlock/relock the node for the call to layer_node_create().

Finally remove dirty hacks (LK_NOWAIT, kpause) from layer_node_find().
 1.41 31-May-2012  pgoyette branches: 1.41.2; 1.41.4;
Ooopppsss! sysctl nodes created during module load time cannot be
PERMANENT
 1.40 31-May-2012  pgoyette When built as module, track sysctl node creations, and destroy them on
module exit.
 1.39 01-Feb-2012  dholland Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.38 29-Jan-2012  dholland Remove the extra op argument to VFS_QUOTACTL() - the op is now stored
purely in the args structure.

This change requires a kernel version bump.
 1.37 29-Jan-2012  dholland Introduce struct vfs_quotactl_args. Use it.

This change uglifies vfs_quotactl some in order to make room for
moving operation-specific but FS-independent logic out of ufs_quota.c.

Note: this change requires a kernel version bump.
 1.36 29-Jan-2012  dholland Move the proplib-based quota command dispatching (that is, the code
that knows the magic string names for the allowed actions) out of
UFS-specific code and to fs-independent code.

This introduces QUOTACTL_* operation codes and changes the signature
of VFS_QUOTACTL() again for compile safety.

Note: this change requires a kernel version bump.
 1.35 29-Jan-2012  dholland Move the code for iterating over the multiple RPC calls in a quota
proplib XML packet to vfs_quotactl.c out of sys/ufs/ufs.

Add a dummy extra arg to VFS_QUOTACTL for compile safety.

Note: this change requires a kernel version bump.
 1.34 06-Mar-2011  bouyer branches: 1.34.4; 1.34.8;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.33 02-Jul-2010  rmind branches: 1.33.2; 1.33.4;
Slightly clean-up layerfs and nullfs: update the big description more to
the reality (remove duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.32 08-Jan-2010  pooka branches: 1.32.2; 1.32.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.31 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.30 05-Dec-2008  ad branches: 1.30.4;
PR kern/40110: null, overlay and umap modules loading -> panic (layerfs symbols not there)

Add a layerfs module.
 1.29 28-Jan-2008  dholland branches: 1.29.6; 1.29.10; 1.29.16; 1.29.18;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.28 08-Dec-2007  ad Use kmem_alloc/free.
 1.27 26-Nov-2007  pooka branches: 1.27.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.26 16-Nov-2006  christos branches: 1.26.12; 1.26.22; 1.26.24; 1.26.30;
__unused removal on arguments; approved by core.
 1.25 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.24 13-Jul-2006  martin branches: 1.24.4; 1.24.6;
Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.23 14-May-2006  elad branches: 1.23.4;
integrate kauth.
 1.22 11-Dec-2005  christos branches: 1.22.4; 1.22.6; 1.22.8; 1.22.10; 1.22.12;
merge ktrace-lwp.
 1.21 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.20 24-Jul-2005  erh Provide a sysctl (vfs.layerfs.debug) to control verbose output when
LAYERFS_DIAGNOSTIC is turned on.
 1.19 29-May-2004  wrstuden branches: 1.19.12;
Add layerfs_snapshot() as a handler routine for VFS_SNAPSHOT() calls
through a layered file system.

Note: we don't actually support snapshots through a layered file system,
and this routine returns an error. However we: 1) have clearly documented
what needs fixing (which isn't trivial to fix) and 2) if we do fix
this, all layered file systems can take advantage of it at once.
 1.18 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.17 22-May-2004  christos Unfortunately, we need to allocate space here. Pointed out by Juan RP.
 1.16 22-May-2004  christos we are copying all the information from statvfs here; we don't need an
intermediate copy on the stack.
 1.15 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.14 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.13 24-Mar-2004  atatat branches: 1.13.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.12 17-Jan-2004  atatat Rename sysctl setup function to match "reality"
 1.11 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.10 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 29-Jun-2003  fvdl branches: 1.9.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.8 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.7 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.6 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.5 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.4 10-Nov-2001  lukem add RCSIDs
 1.3 07-Jun-2001  wiz branches: 1.3.2; 1.3.6;
Typos in comments (misc/13133 by Michael K. Sanders)
 1.2 13-Mar-2000  soren branches: 1.2.6;
Fix doubled 'the's in comments.
 1.1 08-Jul-1999  wrstuden branches: 1.1.2; 1.1.4;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.1.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.1.2.1 08-Jul-1999  thorpej file layer_vfsops.c was added on branch chs-ubc2 on 1999-08-02 22:27:34 +0000
 1.2.6.3 08-Jan-2002  nathanw Catch up to -current.
 1.2.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.2.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.3.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.3.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.9.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.9.2.2 03-Aug-2004  skrll Sync with HEAD
 1.9.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.13.2.1 29-May-2004  tron Pull up revision 1.18 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.19.12.5 04-Feb-2008  yamt sync with head.
 1.19.12.4 21-Jan-2008  yamt sync with head
 1.19.12.3 07-Dec-2007  yamt sync with head
 1.19.12.2 30-Dec-2006  yamt sync with head.
 1.19.12.1 21-Jun-2006  yamt sync with head.
 1.22.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.22.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.22.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.22.8.2 11-Aug-2006  yamt sync with head
 1.22.8.1 24-May-2006  yamt sync with head.
 1.22.6.1 01-Jun-2006  kardel Sync with head.
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.23.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.24.6.2 10-Dec-2006  yamt sync with head.
 1.24.6.1 22-Oct-2006  yamt sync with head
 1.24.4.1 18-Nov-2006  ad Sync with head.
 1.26.30.3 18-Feb-2008  mjf Sync with HEAD.
 1.26.30.2 27-Dec-2007  mjf Sync with HEAD.
 1.26.30.1 08-Dec-2007  mjf Sync with HEAD.
 1.26.24.2 23-Mar-2008  matt sync with HEAD
 1.26.24.1 09-Jan-2008  matt sync with HEAD
 1.26.22.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.26.22.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.26.12.1 09-Dec-2007  reinoud Pullup to HEAD
 1.27.2.2 08-Dec-2007  ad Sync with head.
 1.27.2.1 06-Dec-2007  ad - layer_node_find: fix a race.
- Use kmem_alloc/free.
 1.29.18.2 28-Apr-2009  skrll Sync with HEAD.
 1.29.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.29.16.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.29.10.3 11-Aug-2010  yamt sync with head.
 1.29.10.2 11-Mar-2010  yamt sync with head
 1.29.10.1 04-May-2009  yamt sync with head.
 1.29.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.30.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.32.4.2 21-Apr-2011  rmind sync with head
 1.32.4.1 03-Jul-2010  rmind sync with head
 1.32.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.33.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.33.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.8.2 02-Jun-2012  mrg sync to latest -current.
 1.34.8.1 18-Feb-2012  mrg merge to -current.
 1.34.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.34.4.2 30-Oct-2012  yamt sync with head
 1.34.4.1 17-Apr-2012  yamt sync with head
 1.41.4.1 18-May-2014  rmind sync with head
 1.41.2.2 03-Dec-2017  jdolecek update from HEAD
 1.41.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.43.2.1 10-Aug-2014  tls Rebase.
 1.44.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.45.2.2 28-Aug-2017  skrll Sync with HEAD
 1.45.2.1 06-Jun-2015  skrll Sync with HEAD
 1.46.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.46.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.46.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.50.2.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Clean up the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.51.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.51.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.52.2.2 29-Feb-2020  ad Sync with head.
 1.52.2.1 17-Jan-2020  ad Sync with head.
 1.72 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (avoiding, e.g.
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.71 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.70 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.69 04-Apr-2020  ad branches: 1.69.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.68 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.67 04-Jun-2017  hannken branches: 1.67.6; 1.67.12;
Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
 1.66 26-May-2017  riastradh branches: 1.66.2;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.65 24-May-2017  hannken Protect layer_getpages against vnodes disappearing during a
forced unmount.
 1.64 07-May-2017  hannken Move v_writecount adjustment from revoke to reclaim.
 1.63 26-Apr-2017  riastradh branches: 1.63.2;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.62 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.61 30-Mar-2017  hannken Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.60 27-Jan-2017  hannken Handle v_writecount from layer_open(), layer_close() and layer_revoke()
so lower file system vnodes get marked as open for writing.
 1.59 20-Aug-2016  hannken branches: 1.59.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.58 25-May-2014  hannken branches: 1.58.4; 1.58.8;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.57 24-Mar-2014  hannken branches: 1.57.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.56 12-Mar-2014  hannken Restructure layer_lock() to always lock before testing for dead node.
Use ISSET() to test flags, add assertions.
 1.55 27-Feb-2014  hannken The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.54 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.53 29-Jan-2014  hannken Allow layer_node_create() with unlocked lower node and change
layer_bypass() to enter nodes from creation operations unlocked.
 1.52 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.51 10-Oct-2012  dholland branches: 1.51.2;
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.

In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.

(in theory both of these fixes are not required together)

Should fix PR 47040.
 1.50 11-Jul-2011  hannken branches: 1.50.2; 1.50.8; 1.50.12;
Layer_fsync(): when syncing a device node call spec_fsync() to clean the
layer node before descending to the lower file system.

Addresses PR kern/38762 panic: vwakeup: neg numoutput
 1.49 11-Jul-2011  hannken Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.48 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.47 03-Apr-2011  rmind branches: 1.47.2;
- Use offsetof() in VOPARG_OFFSETOF() instead of re-implementing it.
- Remove VDESC_NOMAP_VPP and VDESC_VPP_WILLRELE.
- Remove VRELEL_NOINACTIVE and VRELEL_ONHEAD.
 1.46 13-Jan-2011  hannken branches: 1.46.2;
Layer_revoke(): change previous to always take an extra reference on the
lower vnode before passing down the VOP_REVOKE(). This way VOP_REVOKE()
on a layered file system always inactivates and closes the lower vnode.

Should finally fix PR kern/43456.
 1.45 10-Jan-2011  hannken Add layer_revoke() that adjusts the lower vnode use count to be at least as
high as the upper vnode count before passing down the VOP_REVOKE().

This way the vclean() check for active (vp->v_usecount > 1) vnodes gets it right.

Should fix PR kern/43456.
 1.44 02-Jan-2011  hannken layer_inactive: With specnodes introduced during vmlocking2
it is safe to cache device nodes.

Tested with nullfs only as unionfs with device nodes panics.
 1.43 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK(). This makes
layer_*lock*() obsolete. Remove them and handle lock operations
with the generic bypass function.

Ride 5.99.34.
 1.42 02-Jul-2010  rmind Slightly clean up layerfs and nullfs: bring the big description closer to
reality (remove the duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add a few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.41 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.40 06-Jun-2010  hannken Change layered file systems to always pass the locking VOP's down to the
leaf file system. Remove now unused member v_vnlock from struct vnode.
Welcome to 5.99.30

Discussed on tech-kern.
 1.39 08-Jan-2010  pooka branches: 1.39.2; 1.39.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.38 14-Mar-2009  dsl ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.37 14-Feb-2009  plunky consistency checks made inside #ifdef SAFETY should really
be #ifdef DIAGNOSTIC
 1.36 03-Jan-2009  dholland branches: 1.36.2;
Clarify a comment
 1.35 30-Jan-2008  ad branches: 1.35.6; 1.35.10; 1.35.18; 1.35.20;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.34 02-Jan-2008  ad Merge vmlocking2 to head.
 1.33 22-Dec-2007  dyoung Bug fix: at the top of layer_bypass(), save a pointer to the mount
point for re-use at the bottom, instead of trying to re-read the
mount point from a potentially vrele()'d vnode.
 1.32 10-Oct-2007  ad branches: 1.32.4; 1.32.6; 1.32.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.31 16-Apr-2007  enami branches: 1.31.6; 1.31.8; 1.31.10;
Don't expand RCS id of ancestor file. The id itself is actually copied
from null_vnops.c since the log message of rev. 1.1 implies the copy.
 1.30 16-Apr-2007  chs define a pager flag PGO_RECLAIM, similar to FSYNC_RECLAIM, and use it
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
 1.29 09-Dec-2006  chs branches: 1.29.2; 1.29.6; 1.29.8;
a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.28 25-Nov-2006  elad branches: 1.28.2;
Part of PR/33280: Christian Ehrhardt: If LK_INTERLOCK is set
vp->v_interlock may be unlocked twice: Once explicitly and a second time
implicitly by lockmgr. LK_INTERLOCK is cleared from the variable flags but
not from ap->a_flags which is used with lockmgr. This is not so much of a
problem because there seems to be no call site that actually uses
LK_INTERLOCK with layer_unlock or VOP_UNLOCK.

okay martin@
 1.27 14-May-2006  elad branches: 1.27.8; 1.27.10;
integrate kauth.
 1.26 11-Dec-2005  christos branches: 1.26.4; 1.26.6; 1.26.8; 1.26.10; 1.26.12;
merge ktrace-lwp.
 1.25 30-Aug-2005  xtraeme Remove __P()
 1.24 26-Feb-2005  perry branches: 1.24.4;
nuke trailing whitespace
 1.23 30-Jun-2004  hannken branches: 1.23.4; 1.23.6;
Do LAYERFS_REMOVED for vop_rmdir.

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.22 19-Jun-2004  yamt layer_islocked: check the status of the lower vnode as well.
 1.21 16-Jun-2004  wrstuden Make sure we actually locked the parent vnode before we clear
PDIRUNLOCK. The whole reason we have the flag is to note (rare)
cases where we are supposed to have the parent directory locked
but don't. Permits error handling code to know what to do with
the parent vnode (vrele() vs vput()).
 1.20 16-Jun-2004  yamt - eliminate gratuitous differences between umap_bypass() and layer_bypass().
- fix a typo in a comment.
no functional changes are intended.
 1.19 16-Jun-2004  yamt add missing error recovery for layer_node_create failure.
 1.18 11-Jun-2004  yamt umap_lookup/layer_lookup: NULL out *ap->a_vpp after calling
the underlying file system because some callers, including lookup(),
assume that *vpp is NULL on error.
 1.17 07-Jun-2004  yamt do a LAYERFS_REMOVED hack for vop_rename as well.
 1.16 28-May-2004  wrstuden Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.15 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.14 25-Jan-2004  hannken branches: 1.14.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.13 30-Nov-2003  wiz Typo fixes in comments from jmc@openbsd.
 1.12 17-Nov-2003  wiz Various typo fixes from Jonathon Gray via jmc@openbsd.
 1.11 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.10 06-Dec-2001  chs branches: 1.10.16;
add VOP_GETPAGES and VOP_PUTPAGES methods for layered filesystems.
drop the interlock on the upper layer, acquire the interlock on the
lower layer.
 1.9 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.8 10-Nov-2001  lukem add RCSIDs
 1.7 24-Jul-2001  assar branches: 1.7.4;
change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.6 07-Jun-2001  wiz branches: 1.6.2;
Typos in comments (misc/13133 by Michael K. Sanders)
 1.5 21-Dec-2000  enami branches: 1.5.2;
Don't cache a device vnode in a layer node cache once the layer node
is inactivated. Otherwise, the device won't be closed.
 1.4 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.3 30-Mar-2000  augustss branches: 1.3.4;
Register, begone!
 1.2 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.1 08-Jul-1999  wrstuden branches: 1.1.2; 1.1.4;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrow most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.1.4.2 05-Jan-2001  bouyer Sync with HEAD
 1.1.4.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.1.2.1 08-Jul-1999  thorpej file layer_vnops.c was added on branch chs-ubc2 on 1999-08-02 22:27:34 +0000
 1.3.4.1 14-Dec-2000  he Pull up revision 1.4 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.5.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.5.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.5.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.5.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.6.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.6.2.1 03-Aug-2001  lukem update to -current
 1.7.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.10.16.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.16.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.10.16.4 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.10.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.16.1 03-Aug-2004  skrll Sync with HEAD
 1.14.2.4 02-Jul-2004  he Pull up revision 1.23 (requested by hannken in ticket #575):
Do LAYERFS_REMOVED for vop_rmdir.
 1.14.2.3 21-Jun-2004  tron Pull up revision 1.18 (requested by yamt in ticket #514):
umap_lookup/layer_lookup: NULL out *ap->a_vpp after calling
the underlying file system because some callers, including lookup(),
assume that *vpp is NULL on error.
 1.14.2.2 21-Jun-2004  tron Pull up revision 1.17 (requested by yamt in ticket #512):
do a LAYERFS_REMOVED hack for vop_rename as well.
 1.14.2.1 30-May-2004  tron Pull up revision 1.16 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.23.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.23.4.1 29-Apr-2005  kent sync with -current
 1.24.4.6 04-Feb-2008  yamt sync with head.
 1.24.4.5 21-Jan-2008  yamt sync with head
 1.24.4.4 27-Oct-2007  yamt sync with head.
 1.24.4.3 03-Sep-2007  yamt sync with head.
 1.24.4.2 30-Dec-2006  yamt sync with head.
 1.24.4.1 21-Jun-2006  yamt sync with head.
 1.26.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.26.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.26.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.26.8.1 24-May-2006  yamt sync with head.
 1.26.6.1 01-Jun-2006  kardel Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.27.10.1 10-Dec-2006  yamt sync with head.
 1.27.8.1 12-Jan-2007  ad Sync with head.
 1.28.2.3 14-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1466):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.28.2.2 16-Apr-2007  bouyer branches: 1.28.2.2.6;
Pull up following revision(s) (requested by chs in ticket #577):
sys/kern/vfs_subr.c: revision 1.287
sys/fs/union/union_vnops.c: revision 1.20
sys/miscfs/genfs/layer_vnops.c: revision 1.30
sys/uvm/uvm_pager.h: revision 1.35
define a pager flag PGO_RECLAIM, similar to FSYNC_RECLAIM, and use it
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
 1.28.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.28.2.2.6.1 14-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1466):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.29.8.1 11-Jul-2007  mjf Sync with head.
 1.29.6.4 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.29.6.3 15-Jul-2007  ad Sync with head.
 1.29.6.2 08-Jun-2007  ad Sync with head.
 1.29.6.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.29.2.1 07-May-2007  yamt sync with head.
 1.31.10.1 14-Oct-2007  yamt sync with head.
 1.31.8.3 23-Mar-2008  matt sync with HEAD
 1.31.8.2 09-Jan-2008  matt sync with HEAD
 1.31.8.1 06-Nov-2007  matt sync with HEAD
 1.31.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.32.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.32.6.5 26-Dec-2007  ad Sync with head.
 1.32.6.4 12-Dec-2007  ad Correct a comment.
 1.32.6.3 10-Dec-2007  ad - Don't drain the vnode lock in vclean(); reference counting and XLOCK
should be enough.
- LK_SETRECURSE is gone.
 1.32.6.2 06-Dec-2007  ad - layer_node_find: fix a race.
- Use kmem_alloc/free.
 1.32.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.32.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.35.20.1 06-Nov-2012  riz Pull up following revision(s) (requested by dholland in ticket #1814):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.35.18.3 28-Apr-2009  skrll Sync with HEAD.
 1.35.18.2 03-Mar-2009  skrll Sync with HEAD.
 1.35.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.35.10.3 11-Aug-2010  yamt sync with head.
 1.35.10.2 11-Mar-2010  yamt sync with head
 1.35.10.1 04-May-2009  yamt sync with head.
 1.35.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.36.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.39.4.5 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.39.4.4 21-Apr-2011  rmind sync with head
 1.39.4.3 05-Mar-2011  rmind sync with head
 1.39.4.2 03-Jul-2010  rmind sync with head
 1.39.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.39.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.46.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.47.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.50.12.3 03-Dec-2017  jdolecek update from HEAD
 1.50.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.50.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.50.8.1 18-Nov-2012  msaitoh Pull up following revision(s) (requested by dholland in ticket #664):
sys/kern/vfs_lookup.c: revision 1.195
sys/miscfs/genfs/layer_vnops.c: revision 1.51
In layer_lookup(), clear *vpp before returning EROFS, as otherwise a
stale value can be returned and this causes a diagnostic panic in
namei.
In relookup(), clear *vpp before calling VOP_LOOKUP, as is done in
lookup_once(), as an additional precautionary measure.
(in theory both of these fixes are not required together)
Should fix PR 47040.
 1.50.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.50.2.1 30-Oct-2012  yamt sync with head
 1.51.2.1 18-May-2014  rmind sync with head
 1.57.2.1 10-Aug-2014  tls Rebase.
 1.58.8.2 26-Apr-2017  pgoyette Sync with HEAD
 1.58.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.58.4.3 28-Aug-2017  skrll Sync with HEAD
 1.58.4.2 05-Feb-2017  skrll Sync with HEAD
 1.58.4.1 05-Oct-2016  skrll Sync with HEAD
 1.59.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.63.2.1 11-May-2017  pgoyette Sync with HEAD
 1.66.2.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Clean up the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.67.12.2 29-Feb-2020  ad Sync with head.
 1.67.12.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.67.6.2 21-Apr-2020  martin Sync with HEAD
 1.67.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.69.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.3 05-Jan-1994  mycroft Clean up deleted files.
 1.2 20-May-1993  cgd header cleanup
 1.1 23-Mar-1993  cgd files which implement the kern filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.7 26-May-2020  bouyer Add need-flags for kernfs.
Compile Xen kernfs support only if kernfs is compiled in the kernel.
Should fix MODULAR build.
 1.6 11-Oct-2014  uebayasi Define filesystem attributes with vfs dependency.
 1.5 20-Jul-2014  hannken Change kernfs from hashlist to vcache.
 1.4 03-Mar-2010  pooka branches: 1.4.20; 1.4.34;
You have found a scroll of genocide --More--
What class of monsters do you wish to genocide? --More--
> fs_foo.h
Wiped out all fs_foo.h
 1.3 11-Dec-2005  christos branches: 1.3.74; 1.3.96;
merge ktrace-lwp.
 1.2 08-Sep-2003  itojun add /kern/ipsecsa and /kern/ipsecsp, which can be inspected by setkey(8).
it allows easier access to ipsecsa/sp. it works around problem where
setkey -D does not work with large number of ipsec SAs due to socket buffer
size.
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8; 1.1.14;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.14.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.14.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.14.1 03-Aug-2004  skrll Sync with HEAD
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.kernfs was added on branch kqueue on 2002-06-23 17:50:10 +0000
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.kernfs was added on branch nathanw_sa on 2002-06-20 03:47:57 +0000
 1.3.96.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.3.74.1 11-Mar-2010  yamt sync with head
 1.4.34.1 10-Aug-2014  tls Rebase.
 1.4.20.2 03-Dec-2017  jdolecek update from HEAD
 1.4.20.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.45 27-Jun-2025  andvar Grammar and spelling fixes, mainly in comments. A few in documentation,
logging, test description, and SCSI ASC/ASCQ assignment descriptions.
 1.44 07-Apr-2020  jdolecek branches: 1.44.28;
switch KERNFS_ALLOCENTRY() to use kmem_zalloc() instead of malloc()
 1.43 04-Feb-2020  riastradh Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...
 1.42 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.41 02-Jan-2020  thorpej branches: 1.41.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.40 20-Jul-2014  hannken branches: 1.40.28; 1.40.32;
Change kernfs from hashlist to vcache.
 1.39 20-Jul-2014  hannken Remove another KAME IPSEC residue, "struct secasvar" and "struct secpolicy".
 1.38 17-Jul-2014  hannken Finish KAME IPSEC removal:
- Remove field kfs_value, it is always zero. Compute the hash from kt_tag.
- Remove stray definitions kernfs_revoke_sa and kernfs_revoke_sp.

While here, remove kfs_type from allocvp(), it is always kt->kt_tag.
 1.37 22-Mar-2012  drochner branches: 1.37.2; 1.37.12;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.36 27-Sep-2011  christos branches: 1.36.2; 1.36.6;
define KERNFS_MAXNAMLEN and use it.
 1.35 11-Jan-2009  christos merge christos-time_t
 1.34 01-Aug-2008  apb branches: 1.34.2;
#include <sys/tree.h> to get a definition for SPLAY_ENTRY.
Needed by third party code, such as lsof.
 1.33 28-Jun-2008  rumble branches: 1.33.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.32 28-Dec-2006  alc branches: 1.32.40; 1.32.44; 1.32.46; 1.32.48;
fix comment (forgotten in rev 1.19):
- pfsnode -> kernfs_node
- procfs -> kernfs
 1.31 23-Jun-2006  christos branches: 1.31.4;
remove useless genop
 1.30 23-Jun-2006  bouyer For internal types call kernfs_default_xread() directly, as no entry in
the splay tree has been added for these types. Fix kern/33797 by
Geoff C. Wing.
While here also fix writes the same way (probably broken for 2 years),
and properly implement KERNFS_XREAD.
The IPsec code could probably be moved out now, and use kernfs_alloctype().
 1.29 23-Jun-2006  bouyer Backout previous: of course the change
"Allow optional /kern regular files to have custom read methods..."
works, it's used by Xen.
 1.28 23-Jun-2006  christos PR/33797: Geoff C. Wing: kernfs files are not supplying information
Roll back the change:
'Allow optional /kern regular files to have custom read methods...'
which does not work.
 1.27 14-Mar-2006  bouyer branches: 1.27.6;
Allow optional /kern regular files to have custom read methods, the same
way writes are handled: Add KERNFS_XREAD and KERNFS_FILEOP_WRITE files
operations definitions to kfsfileop, a xread function pointer to
kernfs_fileop, rename kernfs_read to kernfs_default_xread and add a
kernfs_read calling kernfs_try_fileop(KERNFS_FILEOP_READ).

Proposed on tech-kern on Feb 18 2006.
 1.26 11-Dec-2005  christos branches: 1.26.4; 1.26.6; 1.26.8; 1.26.10;
merge ktrace-lwp.
 1.25 30-Aug-2005  xtraeme Remove __P()
 1.24 20-May-2005  chs branches: 1.24.2;
kernfs does not support mmap(), remove code that pretends that it does.
 1.23 20-May-2004  atatat branches: 1.23.10;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.22 07-May-2004  cl Allow additional entries (files, subdirs) in kernfs. Also allow
defining additional kfstypes and provide hooks to run arbitrary code
for any vnodeop on the additional types.
 1.21 07-May-2004  cl Make lookup and readdir return the same inode number. kernfs_readdir
now uses kernfs_allocvp to map from kernfs entry to inode number,
kernfs_allocvp is now the only place where entries are mapped to inode
numbers. Also make KERNFS_FILENO not return random results for entries
not in kern_targets.
 1.20 27-Sep-2003  darcy branches: 1.20.2;
Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.19 26-Sep-2003  atatat Make kernfs peacefully co-exist with procfs.
 1.18 08-Sep-2003  itojun add /kern/ipsecsa and /kern/ipsecsp, which can be inspected by setkey(8).
it allows easier access to ipsecsa/sp. it works around problem where
setkey -D does not work with large number of ipsec SAs due to socket buffer
size.
 1.17 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 21-Feb-2001  jdolecek branches: 1.16.24;
make some more constant arrays 'const'
 1.15 27-Jan-2001  jdolecek Back previous out, it was right the way it was. Seems like I should
attend some basic arithmetic lessons to avoid such mistakes :-/
 1.14 27-Jan-2001  jdolecek fix 'physmem' - the actual value we want is ctob(physmem)
 1.13 14-Jul-2000  thorpej Sprinkle some const.
 1.12 01-Mar-1998  fvdl branches: 1.12.14; 1.12.24;
Merge with Lite2 + local changes
 1.11 10-May-1997  pk Move `struct kern_target' definition into kernfs.h
 1.10 09-Feb-1996  christos miscfs prototype changes
 1.9 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.6 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.5 05-Jan-1994  cgd update with latest kernfs file system from jsp@sequent.com
 1.4 28-May-1993  cgd add some more functionality: a setattr which supports chmod+chown+chgrp,
and the various support elsewhere to deal with it.
 1.3 27-Mar-1993  cgd added ".." and support for "rrootdev"
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the kern filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.24.1 14-Jul-2000  thorpej Update from trunk:
Sprinkle some const.
 1.12.14.2 12-Mar-2001  bouyer Sync with HEAD.
 1.12.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.24.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.24.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.24.3 18-Sep-2004  skrll Sync with HEAD.
 1.16.24.2 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.16.24.1 03-Aug-2004  skrll Sync with HEAD
 1.20.2.4 24-May-2005  riz Pull up revision 1.24 (requested by chs in ticket #1540):
kernfs does not support mmap(), remove code that pretends that it does.
 1.20.2.3 23-May-2004  tron branches: 1.20.2.3.2;
Pull up revision 1.23 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.20.2.2 15-May-2004  tron Pull up revision 1.22 (requested by cl in ticket #336):
Allow additional entries (files, subdirs) in kernfs. Also allow
defining additional kfstypes and provide hooks to run arbitrary code
for any vnodeop on the additional types.
 1.20.2.1 14-May-2004  jdc Pull up revision 1.21 (requested by cl in ticket #322).

Make lookup and readdir return the same inode number. kernfs_readdir
now uses kernfs_allocvp to map from kernfs entry to inode number,
kernfs_allocvp is now the only place where entries are mapped to inode
numbers. Also make KERNFS_FILENO not return random results for entries
not in kern_targets.
 1.20.2.3.2.1 24-May-2005  riz Pull up revision 1.24 (requested by chs in ticket #1540):
kernfs does not support mmap(), remove code that pretends that it does.
 1.23.10.1 28-May-2005  tron Pull up revision 1.24 (requested by chs in ticket #329):
kernfs does not support mmap(), remove code that pretends that it does.
 1.24.2.2 30-Dec-2006  yamt sync with head.
 1.24.2.1 21-Jun-2006  yamt sync with head.
 1.26.10.1 19-Apr-2006  elad sync with head.
 1.26.8.2 26-Jun-2006  yamt sync with head.
 1.26.8.1 01-Apr-2006  yamt sync with head.
 1.26.6.1 22-Apr-2006  simonb Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.27.6.1 13-Jul-2006  gdamore Merge from HEAD.
 1.31.4.1 12-Jan-2007  ad Sync with head.
 1.32.48.1 03-Jul-2008  simonb Sync with head.
 1.32.46.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.32.44.1 04-May-2009  yamt sync with head.
 1.32.40.3 17-Jan-2009  mjf Sync with HEAD.
 1.32.40.2 28-Sep-2008  mjf Sync with HEAD.
 1.32.40.1 29-Jun-2008  mjf Sync with HEAD.
 1.33.2.1 19-Oct-2008  haad Sync with HEAD.
 1.34.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.36.6.1 05-Apr-2012  mrg sync to latest -current.
 1.36.2.1 17-Apr-2012  yamt sync with head
 1.37.12.1 10-Aug-2014  tls Rebase.
 1.37.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.40.32.1 12-Feb-2020  martin Pull up following revision(s) (requested by riastradh in ticket #702):

sys/miscfs/kernfs/kernfs_vfsops.c: revision 1.98
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.163
sys/miscfs/kernfs/kernfs.h: revision 1.43

Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...
 1.40.28.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.40.28.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.41.2.2 29-Feb-2020  ad Sync with head.
 1.41.2.1 17-Jan-2020  ad Sync with head.
 1.44.28.1 02-Aug-2025  perseant Sync with HEAD
 1.29 20-Jul-2014  hannken Change kernfs from hashlist to vcache.
 1.28 17-Jul-2014  hannken Finish KAME IPSEC removal:
- Remove field kfs_value, it is always zero. Compute the hash from kt_tag.
- Remove stray definitions kernfs_revoke_sa and kernfs_revoke_sp.

While here, remove kfs_type from allocvp(), it is always kt->kt_tag.
 1.27 08-Apr-2014  christos From Ilya Zykov: Unbreak kernfs which was broken by this commit

|Make the spec_node table implementation private to spec_vnops.c.
|To retrieve a spec_node, two new lookup functions (by device or by mount)
|are implemented. Both return a referenced vnode, for an opened block device
|the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
|will not fire. Otherwise any vnode matching the criteria gets returned.
|No objections on tech-kern.

The effect was that ls /kernfs appeared empty in most cases.
 1.26 27-Feb-2014  hannken branches: 1.26.2;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.25 22-Mar-2012  drochner branches: 1.25.2; 1.25.4;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.24 12-Jun-2011  rmind branches: 1.24.2; 1.24.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.23 21-Jul-2010  hannken branches: 1.23.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.22 21-Jul-2010  hannken Using vfinddev() leads to vnode races as it returns an unreferenced
vnode that may disappear before the caller has a chance to reference it.

Reference the vnode while the specfs cache is locked.

Welcome to 5.99.37.

No objections on tech-kern.
 1.21 01-Jul-2010  hannken Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.20 15-Mar-2009  cegger branches: 1.20.2; 1.20.4;
ansify function definitions
 1.19 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.18 11-Jan-2009  christos branches: 1.18.2;
merge christos-time_t
 1.17 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.16 05-May-2008  ad branches: 1.16.8;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.15 30-Jan-2008  ad branches: 1.15.6; 1.15.8; 1.15.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.14 02-Jan-2008  ad Merge vmlocking2 to head.
 1.13 10-Oct-2007  ad branches: 1.13.4; 1.13.6; 1.13.10;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.12 11-Mar-2007  ad branches: 1.12.12; 1.12.14; 1.12.16;
Remove useless cast.
 1.11 27-Feb-2007  ad branches: 1.11.2;
Destroy the hash locks on final unmount.
 1.10 15-Feb-2007  ad branches: 1.10.2;
Replace some uses of lockmgr() / simplelocks.
 1.9 11-Dec-2005  christos merge ktrace-lwp.
 1.8 30-Aug-2005  xtraeme Remove __P()
 1.7 26-Feb-2005  perry branches: 1.7.4;
nuke trailing whitespace
 1.6 07-May-2004  cl branches: 1.6.2; 1.6.6; 1.6.8;
remove code no longer needed since the type/permission information
is available in the entry's struct kern_target and every entry has a
(possibly shared) struct kern_target now.
 1.5 27-Sep-2003  darcy branches: 1.5.2;
Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.4 26-Sep-2003  atatat Make kernfs peacefully co-exist with procfs.
 1.3 10-Sep-2003  itojun fix permission of /kern/hostname to 0644
 1.2 10-Sep-2003  dan test against kt to get the right node of the given type, from enami@
 1.1 08-Sep-2003  itojun add /kern/ipsecsa and /kern/ipsecsp, which can be inspected by setkey(8).
it allows easier access to ipsecsa/sp. it works around problem where
setkey -D does not work with large number of ipsec SAs due to socket buffer
size.
 1.5.2.1 14-May-2004  jdc Pull up revision 1.6 (requested by cl in ticket #322).

remove code no longer needed since the type/permission information
is available in the entry's struct kern_target and every entry has a
(possibly shared) struct kern_target now.
 1.6.8.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.6.6.1 29-Apr-2005  kent sync with -current
 1.6.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.6.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.6.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.6.2.2 03-Aug-2004  skrll Sync with HEAD
 1.6.2.1 07-May-2004  skrll file kernfs_subr.c was added on branch ktrace-lwp on 2004-08-03 10:54:05 +0000
 1.7.4.6 04-Feb-2008  yamt sync with head.
 1.7.4.5 21-Jan-2008  yamt sync with head
 1.7.4.4 27-Oct-2007  yamt sync with head.
 1.7.4.3 03-Sep-2007  yamt sync with head.
 1.7.4.2 26-Feb-2007  yamt sync with head.
 1.7.4.1 21-Jun-2006  yamt sync with head.
 1.10.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.11.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.11.2.2 05-Apr-2007  ad Compile fixes.
 1.11.2.1 13-Mar-2007  ad Sync with head.
 1.12.16.1 14-Oct-2007  yamt sync with head.
 1.12.14.3 23-Mar-2008  matt sync with HEAD
 1.12.14.2 09-Jan-2008  matt sync with HEAD
 1.12.14.1 06-Nov-2007  matt sync with HEAD
 1.12.12.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.13.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.13.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.13.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.15.10.3 11-Aug-2010  yamt sync with head.
 1.15.10.2 04-May-2009  yamt sync with head.
 1.15.10.1 16-May-2008  yamt sync with head.
 1.15.8.1 18-May-2008  yamt sync with head.
 1.15.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.15.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.16.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.16.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.18.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.20.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.20.4.3 05-Mar-2011  rmind sync with head
 1.20.4.2 03-Jul-2010  rmind sync with head
 1.20.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.20.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.23.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.24.6.1 05-Apr-2012  mrg sync to latest -current.
 1.24.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.24.2.1 17-Apr-2012  yamt sync with head
 1.25.4.1 18-May-2014  rmind sync with head
 1.25.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.26.2.1 10-Aug-2014  tls Rebase.
 1.101 16-Feb-2025  joe remove unnecessary branches
 1.100 07-Apr-2020  jdolecek branches: 1.100.28;
switch to kmem_zalloc() instead of malloc() for struct kernfs_mount
 1.99 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.98 04-Feb-2020  riastradh Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...
 1.97 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.96 17-Feb-2017  hannken branches: 1.96.14; 1.96.18; 1.96.20;
Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.95 20-Jul-2014  hannken branches: 1.95.4; 1.95.8; 1.95.12;
Change kernfs from hashlist to vcache.
 1.94 17-Jul-2014  hannken Finish KAME IPSEC removal:
- Remove field kfs_value, it is always zero. Compute the hash from kt_tag.
- Remove stray definitions kernfs_revoke_sa and kernfs_revoke_sp.

While here, remove kfs_type from allocvp(), it is always kt->kt_tag.
 1.93 23-Mar-2014  hannken branches: 1.93.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.92 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.91 27-Sep-2011  christos branches: 1.91.2; 1.91.12; 1.91.16;
define KERNFS_MAXNAMLEN and use it.
 1.90 30-Nov-2009  pooka Introduce genfs_statvfs() as pretty much a no-info statvfs and
convert several pseudo file systems to use it.
 1.89 15-Mar-2009  cegger ansify function definitions
 1.88 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.87 17-Dec-2008  cegger branches: 1.87.2;
kill MALLOC and FREE macros.
 1.86 28-Jun-2008  rumble branches: 1.86.4;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.85 10-May-2008  rumble branches: 1.85.2;
Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.84 29-Apr-2008  ad branches: 1.84.2;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.83 28-Jan-2008  dholland branches: 1.83.6; 1.83.8; 1.83.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.82 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.81 31-Jul-2007  pooka branches: 1.81.2; 1.81.4; 1.81.10; 1.81.12;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.80 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.79 17-Jul-2007  pooka branches: 1.79.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.78 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.77 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.76 19-Jan-2007  hannken branches: 1.76.6; 1.76.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.75 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.74 16-Nov-2006  christos branches: 1.74.2;
__unused removal on arguments; approved by core.
 1.73 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.72 02-Sep-2006  christos branches: 1.72.2; 1.72.4;
add missing initializers.
 1.71 14-May-2006  elad integrate kauth.
 1.70 11-Dec-2005  christos branches: 1.70.4; 1.70.6; 1.70.8; 1.70.10; 1.70.12;
merge ktrace-lwp.
 1.69 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.68 30-Aug-2005  xtraeme Remove __P()
 1.67 29-Mar-2005  thorpej branches: 1.67.2;
- Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.66 02-Jan-2005  thorpej branches: 1.66.2;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.65 13-Sep-2004  jdolecek set mp->mnt_stat.f_namemax on filesystem mount, for use by statvfs
 1.64 29-May-2004  tron Don't leak memory in VFS_MOUNT() if set_statvfs_info() fails.
 1.63 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.62 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.61 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.60 21-Apr-2004  christos add sys/dirent.h
 1.59 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.58 24-Mar-2004  atatat branches: 1.58.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.57 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.56 27-Sep-2003  darcy Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.55 26-Sep-2003  atatat Make kernfs peacefully co-exist with procfs.
 1.54 08-Sep-2003  itojun add /kern/ipsecsa and /kern/ipsecsp, which can be inspected by setkey(8).
it allows easier access to ipsecsa/sp. it works around problem where
setkey -D does not work with large number of ipsec SAs due to socket buffer
size.
 1.53 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.52 29-Jun-2003  fvdl branches: 1.52.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.51 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.50 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.49 22-Apr-2003  christos fix lkm malloc lossage.
 1.48 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.47 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.46 21-Sep-2002  christos MNT_GETARGS support
 1.45 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the device major number list is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.44 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.43 15-Nov-2001  lukem branches: 1.43.8;
don't need <sys/types.h> when including <sys/param.h>
 1.42 10-Nov-2001  lukem add RCSIDs
 1.41 15-Sep-2001  chs branches: 1.41.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.40 30-May-2001  mrg branches: 1.40.2; 1.40.4;
use _KERNEL_OPT
 1.39 04-Feb-2001  mrg branches: 1.39.2;
clean up some KERNFS_DIAGNOSTIC calls.
 1.38 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.37 10-Jun-2000  assar make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.36 16-Mar-2000  jdolecek branches: 1.36.2;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.35 26-Feb-1999  wrstuden branches: 1.35.8; 1.35.14;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.34 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.33 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.32 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.31 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.30 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.29 22-Dec-1996  cgd branches: 1.29.10;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.28 13-Oct-1996  christos backout previous kprintf changes
 1.27 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.26 22-Apr-1996  christos remove include of <sys/cpu.h>
 1.25 09-Feb-1996  christos miscfs prototype changes
 1.24 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.23 09-Mar-1995  mycroft copy*str() should use size_t.
 1.22 08-Mar-1995  cgd use u_long for copyin*
 1.21 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a (tiny) bit.
 1.20 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.19 15-Sep-1994  mycroft stat the file system at mount time, for `df -n', et al.
 1.18 29-Jun-1994  cgd branches: 1.18.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.17 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.16 15-Jun-1994  mycroft Fix a bug in finding the raw root device.
 1.15 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.14 23-Apr-1994  cgd make fs types consistent over new kernels. also, some proto foo.
 1.13 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.12 14-Apr-1994  cgd fs types are names now.
 1.11 05-Jan-1994  cgd update with latest kernfs file system from jsp@sequent.com
 1.10 20-Dec-1993  cgd branches: 1.10.2;
pull load average and misc changes down from magnum
 1.9 18-Dec-1993  mycroft Canonicalize all #includes.
 1.8 23-Aug-1993  cgd branches: 1.8.2;
changes from 0.9-ALPHA2 to 0.9-BETA
 1.7 07-Jun-1993  cgd branches: 1.7.2;
give various filesystems their own vnode types
 1.6 07-Jun-1993  cgd give miscfs filesystems their own mount structure malloc type.
 1.5 28-May-1993  cgd add some more functionality: a setattr which supports chmod+chown+chgrp,
and the various support elsewhere to deal with it.
 1.4 27-Mar-1993  cgd added cdevvp (after taking it out of vfs_subr.c) because realized
that it was only needed here.
 1.3 27-Mar-1993  cgd added ".." and support for "rrootdev"
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the kern filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.2.1 20-Aug-1993  cgd kill old, badly placed incarnation of cdevvp
 1.8.2.3 03-Dec-1993  cgd don't try to get rootdev at vfs init time. wait until kernfs mount time
to do it. this is hackish, but it gets the job done, and is slightly
more robust than the previous way it was done...
 1.8.2.2 29-Nov-1993  mycroft Don't crash dereferencing a null pointer if the raw root device was not found.
 1.8.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.10.2.3 06-Jan-1994  pk Re-instate EOPNOTSUPP
 1.10.2.2 28-Dec-1993  pk Return ENODEV rather than EOPNOTSUPP for unsupported operations.
 1.10.2.1 20-Dec-1993  pk file kernfs_vfsops.c was added on branch magnum on 1993-12-28 16:21:43 +0000
 1.18.2.1 16-Sep-1994  cgd from trunk, per mycroft
 1.29.10.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.35.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.35.8.2 11-Feb-2001  bouyer Sync with HEAD.
 1.35.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.36.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.39.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.39.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.39.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.39.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.39.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.39.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.39.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.40.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.40.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.40.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.40.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.41.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.43.8.2 29-Aug-2002  gehenna catch up with -current.
 1.43.8.1 16-May-2002  gehenna Replace the direct-access to devsw table with calling devsw APIs.
 1.52.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.52.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.52.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.52.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.52.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.52.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.52.2.2 03-Aug-2004  skrll Sync with HEAD
 1.52.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.58.2.1 29-May-2004  tron Pull up revision 1.62 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.66.2.1 29-Apr-2005  kent sync with -current
 1.67.2.6 04-Feb-2008  yamt sync with head.
 1.67.2.5 07-Dec-2007  yamt sync with head
 1.67.2.4 03-Sep-2007  yamt sync with head.
 1.67.2.3 26-Feb-2007  yamt sync with head.
 1.67.2.2 30-Dec-2006  yamt sync with head.
 1.67.2.1 21-Jun-2006  yamt sync with head.
 1.70.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.70.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.70.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.70.8.2 03-Sep-2006  yamt sync with head.
 1.70.8.1 24-May-2006  yamt sync with head.
 1.70.6.1 01-Jun-2006  kardel Sync with head.
 1.70.4.1 09-Sep-2006  rpaulo sync with head
 1.72.4.2 10-Dec-2006  yamt sync with head.
 1.72.4.1 22-Oct-2006  yamt sync with head
 1.72.2.3 01-Feb-2007  ad Sync with head.
 1.72.2.2 12-Jan-2007  ad Sync with head.
 1.72.2.1 18-Nov-2006  ad Sync with head.
 1.74.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.76.8.1 11-Jul-2007  mjf Sync with head.
 1.76.6.2 20-Aug-2007  ad Sync with HEAD.
 1.76.6.1 15-Jul-2007  ad Sync with head.
 1.79.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.81.12.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.81.12.1 31-Jul-2007  pooka file kernfs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:17 +0000
 1.81.10.2 18-Feb-2008  mjf Sync with HEAD.
 1.81.10.1 08-Dec-2007  mjf Sync with HEAD.
 1.81.4.2 23-Mar-2008  matt sync with HEAD
 1.81.4.1 09-Jan-2008  matt sync with HEAD
 1.81.2.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.83.10.3 11-Mar-2010  yamt sync with head
 1.83.10.2 04-May-2009  yamt sync with head.
 1.83.10.1 16-May-2008  yamt sync with head.
 1.83.8.1 18-May-2008  yamt sync with head.
 1.83.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.83.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.83.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.84.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.84.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.85.2.1 03-Jul-2008  simonb Sync with head.
 1.86.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.86.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.87.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.91.16.1 18-May-2014  rmind sync with head
 1.91.12.2 03-Dec-2017  jdolecek update from HEAD
 1.91.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.91.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.93.2.1 10-Aug-2014  tls Rebase.
 1.95.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.95.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.95.4.1 28-Aug-2017  skrll Sync with HEAD
 1.96.20.2 29-Feb-2020  ad Sync with head.
 1.96.20.1 17-Jan-2020  ad Sync with head.
 1.96.18.1 12-Feb-2020  martin Pull up following revision(s) (requested by riastradh in ticket #702):

sys/miscfs/kernfs/kernfs_vfsops.c: revision 1.98
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.163
sys/miscfs/kernfs/kernfs.h: revision 1.43

Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...
 1.96.14.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.96.14.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.100.28.1 02-Aug-2025  perseant Sync with HEAD
 1.174 27-Mar-2022  christos dedup the eofs link/symlink methods
 1.173 12-Mar-2022  riastradh kernfs: Just fail with EOPNOTSUPP, don't panic, on VOP_BMAP.

Reported-by: syzbot+870d2eb4b4c8904ac734@syzkaller.appspotmail.com
 1.172 19-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.

Part 3; cvs randomly didn't commit all the files the first time, still
hunting down the files it skipped.
 1.171 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.170 06-Jul-2021  dholland Fix perms on /kern/{r,}rootdev.
 1.169 06-Jul-2021  dholland Add missing VOP_KQFILTER to kernfs.

Not sure if lack of it can be used for local DoS or not, but best to
fix.
 1.168 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.167 28-Jun-2021  chs VOP_BMAP() may be called via ioctl(FIOGETBMAP) on any vnode that applications
can open. change various pseudo-fs *_bmap methods to return an error instead of
panic.

Reported-by: syzbot+8289a3eaf2ba60958c87@syzkaller.appspotmail.com
 1.166 27-Jun-2020  christos branches: 1.166.6;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.165 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.164 24-Feb-2020  ad v_interlock -> vmobjlock
 1.163 04-Feb-2020  riastradh Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...
 1.162 02-Jan-2020  thorpej branches: 1.162.2;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.161 29-Aug-2019  hannken Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.160 03-Sep-2018  riastradh branches: 1.160.4;
Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
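
To make the truncation concern concrete, here is a small sketch. The
names uimin_sketch, MIN_SKETCH and truncating_min are hypothetical, and
uint32_t stands in for unsigned int on the assumption that unsigned int
is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for a minimum defined only on 32-bit unsigned values. */
static uint32_t
uimin_sketch(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN(); note it evaluates arguments twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * What happens when 64-bit values go through the 32-bit function: the
 * casts below spell out the conversion the compiler would otherwise do
 * implicitly, dropping the upper 32 bits before the comparison.
 */
static uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	return uimin_sketch((uint32_t)a, (uint32_t)b);
}
```

With a = 0x100000001 and b = 2, truncating_min compares 1 against 2 and
returns 1, while MIN_SKETCH on the same 64-bit operands returns 2.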
 1.159 31-Mar-2018  christos branches: 1.159.2;
factor out some repeated code and simplify the logputchar function.
 1.158 26-May-2017  riastradh branches: 1.158.2; 1.158.8;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.157 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.156 20-Aug-2016  hannken branches: 1.156.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.155 20-Apr-2015  riastradh branches: 1.155.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.154 25-Jul-2014  dholland branches: 1.154.2; 1.154.4; 1.154.6; 1.154.10;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.153 20-Jul-2014  hannken Change kernfs from hashlist to vcache.
 1.152 17-Jul-2014  hannken Finish KAME IPSEC removal:
- Remove field kfs_value, it is always zero. Compute the hash from kt_tag.
- Remove stray definitions kernfs_revoke_sa and kernfs_revoke_sp.

While here, remove kfs_type from allocvp(), it is always kt->kt_tag.
 1.151 08-Apr-2014  christos From Ilya Zykov: Unbreak kernfs which was broken by this commit

|Make the spec_node table implementation private to spec_vnops.c.
|To retrieve a spec_node, two new lookup functions (by device or by mount)
|are implemented. Both return a referenced vnode, for an opened block device
|the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
|will not fire. Otherwise any vnode matching the criteria gets returned.
|No objections on tech-kern.

The effect was that ls /kernfs appeared empty in most cases.
 1.150 07-Feb-2014  hannken branches: 1.150.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.149 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.148 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.147 18-Mar-2013  plunky branches: 1.147.6;
C99 section 6.7.2.3 (Tags) Note 3 states that:

A type specifier of the form

enum identifier

without an enumerator list shall only appear after the type it
specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
 1.146 22-Mar-2012  drochner branches: 1.146.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.145 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.144 12-Dec-2011  njoly branches: 1.144.2;
Start making fs read(2) fail with EISDIR if the implementation does
not allow read on directories (kernfs, rumpfs, ptyfs and sysvbfs).
Adjust man page accordingly, and add a small corresponding vfs
testcase.
 1.143 21-Jul-2010  hannken branches: 1.143.8; 1.143.12;
Using vfinddev() leads to vnode races as it returns an unreferenced
vnode that may disappear before the caller has a chance to reference it.

Reference the vnode while the specfs cache is locked.

Welcome to 5.99.37.

No objections on tech-kern.
 1.142 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.141 31-Mar-2010  pooka If msgbuf is not enabled, do not report the node in readdir. That
way ls -l won't report funny errors because getattr for a readdir
result fails.

XXX: lookup for msgbuf still succeeds even if not enabled
 1.140 22-Jan-2010  njoly branches: 1.140.2; 1.140.4;
Remove unneeded strlen() call in KFShostname case.
 1.139 08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.138 03-Jul-2009  elad Where possible, extract the file-system's access() routine to two internal
functions: the first checking if the operation is possible (regardless of
permissions), the second checking file-system permissions, ACLs, etc.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
 1.137 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to preserve the principle of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.136 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.135 11-Jan-2009  christos branches: 1.135.2;
merge christos-time_t
 1.134 02-Jan-2008  ad branches: 1.134.6; 1.134.8; 1.134.12; 1.134.20;
Merge vmlocking2 to head.
 1.133 26-Nov-2007  pooka branches: 1.133.2; 1.133.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.132 28-Dec-2006  elad branches: 1.132.6; 1.132.18; 1.132.20; 1.132.26;
Revert bogus NULL check introduced in revision 1.96 that generated false
Coverity "bugs".
 1.131 28-Dec-2006  alc revert previous, after inspection `kfs->kfs_kt' could really not be NULL here.

reported/requested by elad@
 1.130 26-Dec-2006  alc CID-3855: check if 'kfs->kfs_kt != NULL' before dereferencing it
 1.129 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.128 16-Nov-2006  christos branches: 1.128.2;
__unused removal on arguments; approved by core.
 1.127 04-Nov-2006  jmmv Use size_t in a couple of places as it makes more sense WRT the places
where the variables are later used. From PR kern/25277 by Jeff Ito.
 1.126 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.125 23-Jun-2006  christos branches: 1.125.4; 1.125.6;
remove useless genop
 1.124 23-Jun-2006  bouyer For internal types call kernfs_default_xread() directly, as no entry in
the splay tree has been added for these types. Fix kern/33797 by
Geoff C. Wing.
While here also fix writes the same way (probably broken for 2 years),
and properly implement KERNFS_XREAD.
The IPsec code could probably be moved out now, and use kernfs_alloctype().
 1.123 23-Jun-2006  bouyer Backout previous: of course the change
"Allow optional /kern regular files to have custom read methods..."
works, it's used by Xen.
 1.122 23-Jun-2006  christos PR/33797: Geoff C. Wing: kernfs files are not supplying information
Roll back the change:
'Allow optional /kern regular files to have custom read methods...'
which does not work.
 1.121 07-Jun-2006  kardel branches: 1.121.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.120 14-May-2006  elad branches: 1.120.2;
integrate kauth.
 1.119 04-Apr-2006  christos Coverity CID 1087: Clarify NULL test.
 1.118 14-Mar-2006  bouyer branches: 1.118.2;
Allow optional /kern regular files to have custom read methods, the same
way writes are handled: Add KERNFS_XREAD and KERNFS_FILEOP_WRITE files
operations definitions to kfsfileop, a xread function pointer to
kernfs_fileop, rename kernfs_read to kernfs_default_xread and add a
kernfs_read calling kernfs_try_fileop(KERNFS_FILEOP_READ).

Proposed on tech-kern on Feb 18 2006.
 1.117 01-Mar-2006  yamt branches: 1.117.2; 1.117.4;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.116 24-Dec-2005  perry branches: 1.116.2; 1.116.4; 1.116.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.115 11-Dec-2005  christos merge ktrace-lwp.
 1.114 23-Nov-2005  christos Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.113 02-Nov-2005  yamt branches: 1.113.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.112 01-Sep-2005  christos branches: 1.112.2;
Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().
 1.111 31-Aug-2005  christos Don't allow negative offsets when reading the message buffer, because it
can allow reading arbitrary kernel memory.
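
A minimal userland sketch of the kind of check involved follows. The
function bounded_read and its parameters are hypothetical, not the
kernfs code; returning zero bytes for an offset at or past the end is a
choice made in this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical reader: copy up to `len` bytes from `buf` (which holds
 * `buflen` bytes) starting at offset `off` into `dst`.  Returns 0 and
 * stores the byte count in *copied, or returns EINVAL for a negative
 * offset, which would otherwise index memory before the buffer.
 */
static int
bounded_read(const char *buf, size_t buflen, off_t off,
    char *dst, size_t len, size_t *copied)
{
	size_t avail, n;

	if (off < 0)
		return EINVAL;
	if ((unsigned long long)off >= buflen) {
		/* At or past the end: report zero bytes, no error. */
		*copied = 0;
		return 0;
	}
	avail = buflen - (size_t)off;
	n = len < avail ? len : avail;
	memcpy(dst, buf + off, n);
	*copied = n;
	return 0;
}
```

Without the first check, a negative off would make buf + off point
below the start of the buffer, which is the class of problem the commit
message describes.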
 1.110 30-Aug-2005  xtraeme Remove __P()
 1.109 29-May-2005  christos branches: 1.109.2;
- sprinkle const
- avoid shadowed variables.
 1.108 20-May-2005  chs kernfs does not support mmap(), remove code that pretends that it does.
 1.107 26-Feb-2005  perry branches: 1.107.2;
nuke trailing whitespace
 1.106 27-Oct-2004  skrll branches: 1.106.4; 1.106.6;
Backout previous.
 1.105 27-Oct-2004  skrll Don't pass &proc0 in the UIO_SYSSPACE case it is not needed.
 1.104 13-May-2004  cl Workaround for gcc 2.95.3 failing to initialize structures
and/or unions inside structures using nested designators.
Should be reverted when gcc >=3.3.3 is ready for vax.
 1.103 12-May-2004  jrf caddr_t -> void * and removal of some more casts.
 1.102 07-May-2004  cl Allow additional entries (files, subdirs) in kernfs. Also allow
defining additional kfstypes and provide hooks to run arbitrary code
for any vnodeop on the additional types.
 1.101 07-May-2004  cl Make lookup and readdir return the same inode number. kernfs_readdir
now uses kernfs_allocvp to map from kernfs entry to inode number,
kernfs_allocvp is now the only place where entries are mapped to inode
numbers. Also make KERNFS_FILENO not return random results for entries
not in kern_targets.
 1.100 07-May-2004  cl Find the right entry when doing lookup on dotdot in kern/ipsec subdirs.
Also remove some duplicate code.
 1.99 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.98 27-Sep-2003  darcy branches: 1.98.2;
Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.97 26-Sep-2003  atatat Make kernfs peacefully co-exist with procfs.
 1.96 10-Sep-2003  itojun check before deref kfs_kt
 1.95 10-Sep-2003  dan Make /kern/. have linkcount 2 in non-IPSEC case, 4 in IPSEC case.
Thanks to Valeriy E. Ushakov.
 1.94 10-Sep-2003  itojun check if rootdev/rrootdev actually exists.
 1.93 10-Sep-2003  simonb 8 spaces is evil, convert to tab.
 1.92 10-Sep-2003  dan Make vnode times on /kern/boottime be the boot time, not "now".

Handy because ls(1) helpfully converts the time to human-readable
format when printing, and because shell tools like "test -nt" and
"find -newer" can be used against it.

"Inspired" by a discussion about removing lockfiles older than the
last reboot, and Al Crooks' handy observation that a close
approximation can be found with /var/run/dmesg.boot

While here, notice that a lot of the kernfs structures and naming
changed suddenly, and though it seems a clear improvement, there was no
mention in commit logs.
 1.91 08-Sep-2003  itojun remove non-precise comment
 1.90 08-Sep-2003  itojun add /kern/ipsecsa and /kern/ipsecsp, which can be inspected by setkey(8).
it allows easier access to ipsecsa/sp. it works around problem where
setkey -D does not work with large number of ipsec SAs due to socket buffer
size.
 1.89 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.88 29-Jun-2003  fvdl branches: 1.88.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.87 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.86 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.85 10-Apr-2003  jdolecek use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.84 12-Oct-2002  jdolecek put back the while loop in kernfs_getattr() removed in rev 1.82; it's
necessary to get the whole file length e.g. for msgbuf

this fixes the '/kern/msgbuf & less' problem reported on port-i386
by Dave Tyson
 1.83 03-Aug-2002  simonb Just use the "time" variable in the *_getattr functions instead of a call
to (the potentially expensive) microtime().
 1.82 19-Jul-2002  jdolecek Properly detect error in kernfs_xread().
Fixes kern/10278 by IWAMOTO Toshihiro, though implemented in a different way.

While here, clean up some int vs. size_t confusion, make
kernfs_x{read|write}() static and g/c some #if 0 stuff.
 1.81 05-Jul-2002  lukem be consistent about how va_[acm]time is set to the current time
(inspired by how procfs does it)
 1.80 05-Jul-2002  lukem set vap->va_ctime to vap->va_atime (the current time),
rather than vap->va_ctime (which is a no-op).
 1.79 06-Dec-2001  chs branches: 1.79.8; 1.79.10;
add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.78 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.77 10-Nov-2001  lukem add RCSIDs
 1.76 03-Jun-2001  chs branches: 1.76.2; 1.76.6;
let kernfs_mmap() use the default error method.
 1.75 21-Feb-2001  jdolecek branches: 1.75.2;
make some more constant arrays 'const'
 1.74 04-Feb-2001  mrg clean up some KERNFS_DIAGNOSTIC calls.
 1.73 27-Jan-2001  jdolecek Back previous out, it was right the way it was. Seems like I should
attend some basic arithmetic lessons to avoid such mistakes :-/
 1.72 27-Jan-2001  jdolecek fix 'physmem' - the actual value we want is ctob(physmem)
 1.71 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.70 03-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable sized allocations.
 1.69 14-Jul-2000  thorpej Sprinkle some const.
 1.68 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.67 25-Aug-1999  sommerfeld branches: 1.67.2; 1.67.8; 1.67.12;
Change variable used for directory offset from "int" to "off_t".
Overkill, but avoids a host of truncation problems.
 1.66 24-Aug-1999  sommerfeld Fix PR8270:

Problem turned out to be due to improper handling of reads beyond EOF:
they should just return without error with the uio unchanged, and the
caller will recognize this as a zero-byte return (EOF).

The previous fix to protect directory reads against bogus uio_offset
values returned EINVAL, which broke mount -o union, which only
union'ed in the lower directory if the upper directory cleanly
returned EOF.

While we're here, protect kernfs as well.
 1.65 03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.64 08-Jul-1999  wrstuden Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.63 24-Mar-1999  mrg branches: 1.63.2; 1.63.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.62 13-Aug-1998  kleink Per POSIX, fail with EINVAL if advisory locking is attempted on a file type
that doesn't support it, rather than using a homegrown EBADF or EOPNOTSUPP.
 1.61 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.60 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.59 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.58 08-Mar-1998  mrg standardise options header includes.
 1.57 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.56 12-Feb-1998  thorpej Don't include option headers if building an LKM.
 1.55 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.54 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.53 10-Oct-1997  fvdl Bump last argument to VOP_READDIR to off_t (from u_long).
 1.52 19-Sep-1997  leo Implement the kernel part of pr-1891. This allows for a more flexible sized
msgbuf. Note that old 'dmesg' and 'syslogd' binaries will continue running,
though old 'dmesg' binaries will output a few bytes of junk at the start of
the buffer, and will miss a few bytes at the end of the buffer.
 1.51 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.50 10-May-1997  pk branches: 1.50.4;
Move `struct kern_target' definition into kernfs.h
 1.49 08-May-1997  mycroft Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.48 25-Oct-1996  cgd define path name string variables that we should not (and, thankfully, do
not) modify as 'const char *' rather than 'char *'.
 1.47 13-Oct-1996  christos backout previous kprintf changes
 1.46 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.45 07-Sep-1996  mycroft Implement poll(2).
 1.44 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.43 16-Mar-1996  christos Fix printf format follies.
 1.42 13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.41 09-Feb-1996  christos miscfs prototype changes
 1.40 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.39 09-Oct-1995  mycroft Fix the inode calculation in kernfs_getattr().
 1.38 09-Oct-1995  mycroft Use the index number as the cookie, rather than multiplying by UIO_MX.
 1.37 09-Oct-1995  mycroft Add support for cookies, mostly from Greg Hudson.
 1.36 15-Apr-1995  cgd fix timeval vs. timespec warnings
 1.35 03-Feb-1995  mycroft Return EROFS rather than ENOENT in many cases. Also some cosmetic cleanup.
 1.34 27-Dec-1994  mycroft Format police.
 1.33 24-Dec-1994  ws Implement and use a common access checking routine
 1.32 14-Dec-1994  mycroft Remove a_fp.
 1.31 01-Dec-1994  mycroft Make sure averunnable.fscale is filled before using it.
 1.30 14-Nov-1994  christos fixed struct comment
 1.29 20-Oct-1994  cgd update for new syscall args description mechanism
 1.28 21-Jul-1994  mycroft Implement /kern/msgbuf.
 1.27 29-Jun-1994  cgd branches: 1.27.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.26 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.25 15-Jun-1994  mycroft Deal with silly DIAGNOSTIC check.
 1.24 15-Jun-1994  mycroft Use vget() for devices.
 1.23 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.22 17-May-1994  mycroft Really fix the file size problem.
 1.21 17-May-1994  cgd actually set va_size!
 1.20 14-Feb-1994  ws Add .. entry to /kern
 1.19 11-Feb-1994  cgd don't give bogus return code from read()
 1.18 09-Feb-1994  cgd don't panic if user does 'cat /kern', though result is dubious.
 1.17 05-Jan-1994  cgd update with latest kernfs file system from jsp@sequent.com
 1.16 22-Dec-1993  cgd fix return type of vnode print routine
 1.15 20-Dec-1993  cgd branches: 1.15.2;
pull load average and misc changes down from magnum
 1.14 18-Dec-1993  mycroft Canonicalize all #includes.
 1.13 20-Nov-1993  cgd #ifdef out /kern/root at theo's request
 1.12 07-Sep-1993  ws branches: 1.12.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.11 02-Aug-1993  mycroft Make kernfs_print have a return type of void.
 1.10 07-Jun-1993  cgd give various filesystems their own vnode types
 1.9 28-May-1993  cgd add some more functionality: a setattr which supports chmod+chown+chgrp,
and the various support elsewhere to deal with it.
 1.8 28-May-1993  cgd add kernfs_access function, to kill kernfs security hole
 1.7 20-May-1993  cgd header cleanup
 1.6 27-Apr-1993  cgd fix several off-by-one errors in hostname setting/reading
 1.5 27-Apr-1993  mycroft Use EPERM when write permission is denied, not EBADF.
 1.4 27-Mar-1993  cgd added ".." and support for "rrootdev"
 1.3 25-Mar-1993  cgd fixed problem where you couldn't unmount after looking...
 1.2 25-Mar-1993  cgd changed copyright notice thanks to following statement:

Return-Path: jsp@compnews.co.uk
Received: from ben.uknet.ac.uk by postgres.Berkeley.EDU (5.61/1.29)
id AA25983; Thu, 25 Mar 93 05:37:37 -0800
Received: from fennel.compnews.co.uk by ben.uknet.ac.uk via UKIP with SMTP (PP)
id <g.05640-0@ben.uknet.ac.uk>; Thu, 25 Mar 1993 13:37:19 +0000
Received: from sage.compnews.co.uk by fennel.compnews.co.uk;
Thu, 25 Mar 93 13:37:08 GMT
Message-Id: <28109.9303251337@sage.compnews.co.uk>
From: jsp@compnews.co.uk (Jan-Simon Pendry)
Date: Thu, 25 Mar 1993 13:37:05 +0100
In-Reply-To: cgd@postgres.berkeley.edu's message as of Mar 25, 5:32am.
Phone-Number-1: +44 430 432450
Phone-Number-2: +44 430 432480 x20
Fax-Number: +44 430 432022
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: cgd@postgres.berkeley.edu
Subject: Re: fdesc/kernfs/etc code...

You may put this copyright message on the source code:

/*
* Copyright (c) 1990, 1992 Jan-Simon Pendry
* All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Jan-Simon Pendry.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*/
 1.1 23-Mar-1993  cgd branches: 1.1.1;
files which implement the kern filesystem. from Jan-Simon Pendry,
pendry@vangogh.cs.berkeley.edu
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.12.2.4 29-Nov-1993  mycroft Don't crash dereferencing a null pointer if the raw root device was not found.
 1.12.2.3 20-Nov-1993  cgd update from trunk
 1.12.2.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.12.2.1 24-Sep-1993  mycroft kernfs_vnops: averunnable changes.
 1.15.2.3 06-Jan-1994  pk Re-instate EOPNOTSUPP
 1.15.2.2 28-Dec-1993  pk Use ENODEV rather than EOPNOTSUPP for unsupported operations on non-socket devices
 1.15.2.1 20-Dec-1993  pk file kernfs_vnops.c was added on branch magnum on 1993-12-28 16:35:19 +0000
 1.27.2.1 22-Jul-1994  cgd from trunk.
 1.50.4.3 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.50.4.2 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.50.4.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.63.4.1 02-Aug-1999  thorpej Update from trunk.
 1.63.2.1 28-Aug-1999  he Pull up revisions 1.66-1.67:
Protect {fdesc,kernfs,procfs}_readdir against directory seeks
with bogus offsets. (sommerfeld)
 1.67.12.1 14-Jul-2000  thorpej Update from trunk:
Sprinkle some const.
 1.67.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.67.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.67.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.67.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.75.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.75.2.5 13-Aug-2002  nathanw Catch up to -current.
 1.75.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.75.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.75.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.75.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.76.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.76.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.76.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.79.10.3 29-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #5952):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.114 via patch
Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.79.10.2 14-Oct-2002  lukem Pull up revision 1.84 (requested by jdolecek in ticket #911):
put back the while loop in kernfs_getattr() removed in rev 1.82; it's
necessary to get the whole file length e.g. for msgbuf
this fixes the '/kern/msgbuf & less' problem reported on port-i386
by Dave Tyson
 1.79.10.1 21-Jul-2002  lukem Pull up revision 1.82 (requested by jdolecek in ticket #526):
Properly detect error in kernfs_xread().
Fixes kern/10278 by IWAMOTO Toshihiro, though implemented a different way.
While here, clean up some int vs. size_t confusion, make
kernfs_x{read|write}() static and g/c some #if 0 stuff.
 1.79.8.3 29-Aug-2002  gehenna catch up with -current.
 1.79.8.2 20-Jul-2002  gehenna catch up with -current.
 1.79.8.1 15-Jul-2002  gehenna catch up with -current.
 1.88.2.10 11-Dec-2005  christos Sync with head.
 1.88.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.88.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.88.2.7 31-Oct-2004  skrll Reduce diff to HEAD.
 1.88.2.6 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.88.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.88.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.88.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.88.2.2 03-Aug-2004  skrll Sync with HEAD
 1.88.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.98.2.7 29-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #10155):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.114 via patch
Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.98.2.6 01-Sep-2005  riz Pull up following revision(s) (requested by christos in ticket #5637):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.112
Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().
 1.98.2.5 31-Aug-2005  tron Pull up following revision(s) (requested by christos in ticket #5633):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.111
Don't allow negative offsets when reading the message buffer, because it
can allow reading arbitrary kernel memory.
 1.98.2.4 24-May-2005  riz Pull up revision 1.108 (requested by chs in ticket #1540):
kernfs does not support mmap(), remove code that pretends that it does.
 1.98.2.3 15-May-2004  tron branches: 1.98.2.3.2;
Pull up revision 1.104 (requested by cl in ticket #326):
Workaround for gcc 2.95.3 failing to initialize structures
and/or unions inside structures using nested designators.
Should be reverted when gcc >=3.3.3 is ready for vax.
 1.98.2.2 15-May-2004  tron Pull up revision 1.102 (requested by cl in ticket #336):
Allow additional entries (files, subdirs) in kernfs. Also allow
defining additional kfstypes and provide hooks to run arbitrary code
for any vnodeop on the additional types.
 1.98.2.1 14-May-2004  jdc Pull up revision 1.100 and 1.101 (requested by cl in ticket #322).

Find the right entry when doing lookup on dotdot in kern/ipsec subdirs.
Also remove some duplicate code.

Make lookup and readdir return the same inode number. kernfs_readdir
now uses kernfs_allocvp to map from kernfs entry to inode number,
kernfs_allocvp is now the only place where entries are mapped to inode
numbers. Also make KERNFS_FILENO not return random results for entries
not in kern_targets.
 1.98.2.3.2.4 29-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #10155):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.114 via patch
Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.98.2.3.2.3 01-Sep-2005  riz branches: 1.98.2.3.2.3.2;
Pull up following revision(s) (requested by christos in ticket #5637):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.112
Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().
 1.98.2.3.2.2 31-Aug-2005  tron Pull up following revision(s) (requested by christos in ticket #5633):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.111
Don't allow negative offsets when reading the message buffer, because it
can allow reading arbitrary kernel memory.
 1.98.2.3.2.1 24-May-2005  riz Pull up revision 1.108 (requested by chs in ticket #1540):
kernfs does not support mmap(), remove code that pretends that it does.
 1.98.2.3.2.3.2.1 29-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #10155):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.114 via patch
Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.106.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.106.4.1 29-Apr-2005  kent sync with -current
 1.107.2.4 24-Nov-2005  tron Pull up following revision(s) (requested by christos in ticket #992):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.114 via patch
Fix 64 bit truncation problem reported by http://www.securitylab.net
 1.107.2.3 01-Sep-2005  tron Pull up following revision(s) (requested by christos in ticket #728):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.112
Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().
 1.107.2.2 31-Aug-2005  tron Pull up following revision(s) (requested by christos in ticket #727):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.111
Don't allow negative offsets when reading the message buffer, because it
can allow reading arbitrary kernel memory.
 1.107.2.1 28-May-2005  tron Pull up revision 1.108 (requested by chs in ticket #329):
kernfs does not support mmap(), remove code that pretends that it does.
 1.109.2.4 21-Jan-2008  yamt sync with head
 1.109.2.3 07-Dec-2007  yamt sync with head
 1.109.2.2 30-Dec-2006  yamt sync with head.
 1.109.2.1 21-Jun-2006  yamt sync with head.
 1.112.2.1 20-Oct-2005  yamt adapt kernfs.
 1.113.2.1 29-Nov-2005  yamt sync with head.
 1.116.6.3 01-Jun-2006  kardel Sync with head.
 1.116.6.2 22-Apr-2006  simonb Sync with head.
 1.116.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.116.4.1 09-Sep-2006  rpaulo sync with head
 1.116.2.1 05-Feb-2006  yamt adapt kernfs.
 1.117.4.2 19-Apr-2006  elad sync with head.
 1.117.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.117.2.4 26-Jun-2006  yamt sync with head.
 1.117.2.3 24-May-2006  yamt sync with head.
 1.117.2.2 11-Apr-2006  yamt sync with head
 1.117.2.1 01-Apr-2006  yamt sync with head.
 1.118.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.120.2.1 19-Jun-2006  chap Sync with head.
 1.121.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.125.6.2 10-Dec-2006  yamt sync with head.
 1.125.6.1 22-Oct-2006  yamt sync with head
 1.125.4.2 12-Jan-2007  ad Sync with head.
 1.125.4.1 18-Nov-2006  ad Sync with head.
 1.128.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.132.26.2 18-Feb-2008  mjf Sync with HEAD.
 1.132.26.1 08-Dec-2007  mjf Sync with HEAD.
 1.132.20.1 09-Jan-2008  matt sync with HEAD
 1.132.18.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.132.6.1 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.133.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.133.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.134.20.2 28-Apr-2009  skrll Sync with HEAD.
 1.134.20.1 19-Jan-2009  skrll Sync with HEAD.
 1.134.12.4 11-Aug-2010  yamt sync with head.
 1.134.12.3 11-Mar-2010  yamt sync with head
 1.134.12.2 18-Jul-2009  yamt sync with head.
 1.134.12.1 04-May-2009  yamt sync with head.
 1.134.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.134.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.135.2.2 23-Jul-2009  jym Sync with HEAD.
 1.135.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.140.4.3 05-Mar-2011  rmind sync with head
 1.140.4.2 03-Jul-2010  rmind sync with head
 1.140.4.1 30-May-2010  rmind sync with head
 1.140.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.140.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.143.12.2 05-Apr-2012  mrg sync to latest -current.
 1.143.12.1 18-Feb-2012  mrg merge to -current.
 1.143.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.143.8.1 17-Apr-2012  yamt sync with head
 1.144.2.2 03-Sep-2016  bouyer Revert ticket 1367, it causes a kernel panic in test lib/libc/gen/t_getcwd
as seen in e.g.
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/netbsd-6/i386/201608291710Z_anita.txt

lib/libc/gen/t_getcwd (206/500): 2 test cases
getcwd_err: [0.006614s] Passed.
getcwd_fts: uvm_fault(0xc0e221b0, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c023ba9f cs 9 eflags 10246 cr2 1c ilevel 0
panic: trap
cpu1: Begin traceback...
panic(c04616d0,cdcfb938,cdcfb938,c023ba9f,9,10246,1c,0,1c,0) at netbsd:panic+0x18
trap() at netbsd:trap+0xb51
--- trap (number 6) ---
kernfs_readdir(cdcfbc0c,1,c11ce0b4,c0439f60,c11ce0b4,cdcfbc58,c0cc0cc0,cdcfbc7c,0,0) at netbsd:kernfs_readdir+0x98f
VOP_READDIR(c11ce0b4,cdcfbc58,c0cc0cc0,cdcfbc7c,0,0,c19287e0,1,cdcfbc58,cdcfbc74) at netbsd:VOP_READDIR+0x68
vn_readdir(c14c3000,bb512000,0,1000,cdcfbcbc,c19287e0,0,0,c14c3000,0) at netbsd:vn_readdir+0xbd
sys___getdents30(c19287e0,cdcfbd00,cdcfbd28,186,bb516000,0,cdcfbd00,c1199bf4,2,bb7a4fe7) at netbsd:sys___getdents30+0x8c
syscall(cdcfbd48,bb6b00b3,ab,bf7f001f,bb6b001f,0,bb5010d0,bf7fe764,bb7c4be0,0) at netbsd:syscall+0xaa
cpu1: End traceback...
 1.144.2.1 27-Aug-2016  bouyer Pull up following revision(s) (requested by is in ticket #1367):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.151
From Ilya Zykov: Unbreak kernfs which was broken by this commit

|Make the spec_node table implementation private to spec_vnops.c.
|To retrieve a spec_node, two new lookup functions (by device or by mount)
|are implemented. Both return a referenced vnode, for an opened block device
|the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
|will not fire. Otherwise any vnode matching the criteria gets returned.
|No objections on tech-kern.

The effect was that ls /kernfs appeared empty in most cases.
 1.146.2.3 03-Dec-2017  jdolecek update from HEAD
 1.146.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.146.2.1 23-Jun-2013  tls resync from head
 1.147.6.1 18-May-2014  rmind sync with head
 1.150.2.1 10-Aug-2014  tls Rebase.
 1.154.10.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.154.6.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.154.4.3 28-Aug-2017  skrll Sync with HEAD
 1.154.4.2 05-Oct-2016  skrll Sync with HEAD
 1.154.4.1 06-Jun-2015  skrll Sync with HEAD
 1.154.2.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.155.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.156.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.158.8.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.158.8.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.158.2.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1346):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.159.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.159.2.1 10-Jun-2019  christos Sync with HEAD
 1.160.4.3 06-Jul-2021  martin Pull up following revision(s) (requested by dholland in ticket #1318):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.169
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.170

Add missing VOP_KQFILTER to kernfs.

Not sure if lack of it can be used for local DoS or not, but best to
fix.

-

Fix perms on /kern/{r,}rootdev.
 1.160.4.2 12-Feb-2020  martin Pull up following revision(s) (requested by riastradh in ticket #702):

sys/miscfs/kernfs/kernfs_vfsops.c: revision 1.98
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.163
sys/miscfs/kernfs/kernfs.h: revision 1.43

Use specfs vnops for specnodes in kernfs.

While here, don't filter out rootdev and rrootdev merely because
they're not cached.

Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only
appeared sometimes when they felt like it, and fixes operations on
/kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP.

We didn't seem to have a single PR for these issues but the following
PRs are all relevant:

PR bin/13564
PR kern/38265
PR kern/38778
PR kern/45974

XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3,
pullup-2, pullup-1.4T...
 1.160.4.1 01-Sep-2019  martin Pull up following revision(s) (requested by hannken in ticket #132):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207
Add missing operation VOP_GETPAGES() returning EFAULT.
Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.
Observed by maxv@
 1.162.2.1 29-Feb-2020  ad Sync with head.
 1.166.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.5 12-Oct-2014  uebayasi Define layerfs as an attribute.
 1.4 11-Oct-2014  uebayasi Define filesystem attributes with vfs dependency.
 1.3 11-Dec-2005  christos branches: 1.3.120;
merge ktrace-lwp.
 1.2 26-Feb-2005  perry nuke trailing whitespace
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8; 1.1.14; 1.1.22; 1.1.24;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.24.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.22.1 29-Apr-2005  kent sync with -current
 1.1.14.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.nullfs was added on branch kqueue on 2002-06-23 17:50:11 +0000
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.nullfs was added on branch nathanw_sa on 2002-06-20 03:47:58 +0000
 1.3.120.1 03-Dec-2017  jdolecek update from HEAD
 1.20 11-Apr-2017  hannken Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.19 02-Jul-2010  rmind branches: 1.19.18; 1.19.36; 1.19.40; 1.19.44;
Slightly clean-up layerfs and nullfs: update the big description more to
the reality (remove duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add a few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.18 28-Jun-2008  rumble branches: 1.18.16; 1.18.18;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.17 11-Dec-2005  christos branches: 1.17.70; 1.17.74; 1.17.76; 1.17.78;
merge ktrace-lwp.
 1.16 30-Aug-2005  xtraeme Remove __P()
 1.15 20-May-2004  atatat branches: 1.15.12;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.14 07-Aug-2003  agc branches: 1.14.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 07-Nov-2001  enami branches: 1.13.16;
Wrap long line and remove name of argument from function prototype.
 1.12 07-Jun-2001  wiz branches: 1.12.2; 1.12.6;
Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.11 13-Mar-2000  soren branches: 1.11.6;
Fix doubled 'the's in comments.
 1.10 08-Jul-1999  wrstuden branches: 1.10.2;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.9 06-Oct-1997  thorpej branches: 1.9.12;
Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.8 10-Apr-1997  cgd branches: 1.8.4;
don't try to use __builtin_return_address() on the Alpha. (It's never
worked as far as I can tell, and apparently crashes the kernel when
invoked here.) From Ross Harvey, PR#3471.
 1.7 17-May-1996  gwr Allow the DIAGNOSTIC to compile with old versions of gcc.
 1.6 10-May-1996  jtk Add locking code to avoid deadlocks on vnode reclaim, which means the
addition of null_lookup, null_lock, null_unlock, null_islocked.
 1.5 09-Feb-1996  christos miscfs prototype changes
 1.4 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.3 19-Aug-1994  mycroft Convert hash tables.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasing, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.1 19-Aug-1994  mycroft update from trunk
 1.8.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.12.1 02-Aug-1999  thorpej Update from trunk.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.11.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.12.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.12.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.16.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.16.1 03-Aug-2004  skrll Sync with HEAD
 1.14.2.1 23-May-2004  tron Pull up revision 1.15 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.15.12.1 21-Jun-2006  yamt sync with head.
 1.17.78.1 03-Jul-2008  simonb Sync with head.
 1.17.76.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.17.74.2 11-Aug-2010  yamt sync with head.
 1.17.74.1 04-May-2009  yamt sync with head.
 1.17.70.1 29-Jun-2008  mjf Sync with HEAD.
 1.18.18.1 03-Jul-2010  rmind sync with head
 1.18.16.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.19.44.1 21-Apr-2017  bouyer Sync with HEAD
 1.19.40.1 26-Apr-2017  pgoyette Sync with HEAD
 1.19.36.1 28-Aug-2017  skrll Sync with HEAD
 1.19.18.1 03-Dec-2017  jdolecek update from HEAD
 1.15 08-Jul-1999  wrstuden Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.14 09-Apr-1999  wrstuden Make message about not locking a vnode in nullfs_create a little quieter -
now only enabled if NULLFS_DIAGNOSTIC and no longer if DEBUG or DIAGNOSTIC.
 1.13 30-Sep-1998  jonathan branches: 1.13.6;
Workaround fix for PR #5239 from <minoura@kw.netlaputa.ne.jp>:
stop null_node_create() from locking the nullfs mountpoint multiple
times. Avoids a guaranteed, repeatable "locking against myself" panic
during mount of a nullfs filesystem. nullfs filesystems are still as
buggy as ever (e.g., see PR# 4907) but this at least lets you mount them.
 1.12 11-Mar-1998  fvdl Fix flags mess-up in vget. LK_EXCLUSIVE -> 0 (even indicated in the
comment, d'oh!)
 1.11 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.10 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.9 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.8 13-Oct-1996  christos branches: 1.8.10;
backout previous kprintf changes
 1.7 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.6 10-May-1996  jtk Add locking code to avoid deadlocks on vnode reclaim, which means the
addition of null_lookup, null_lock, null_unlock, null_islocked.
 1.5 09-Feb-1996  christos miscfs prototype changes
 1.4 20-Sep-1994  cgd fix device aliasing and lost vnode problems.
 1.3 19-Aug-1994  mycroft Convert hash tables.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasing, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.2 20-Sep-1994  cgd from trunk.
 1.2.2.1 19-Aug-1994  mycroft update from trunk
 1.8.10.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.13.6.1 15-Apr-1999  wrstuden branches: 1.13.6.1.2;
Pull up rev. 1.13->1.14, approved by Curt. Change a diagnostic message
in nullfs_node_create to only be emitted ifdef NULLFS_DIAGNOSTIC and
no longer if DEBUG or DIAGNOSTIC. Now matches all other diagnostic
messages in nullfs.
 1.13.6.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.13.6.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.101 06-Feb-2023  hannken Set IMNT_MPSAFE only if the lower layer has it set.
 1.100 04-Nov-2022  hannken branches: 1.100.2;
Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
 1.99 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.98 04-Apr-2020  ad branches: 1.98.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.97 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.96 15-Dec-2019  joerg branches: 1.96.2;
Set IMNT_MPSAFE before creating the vnode for the root of the
filesystem. Otherwise, it won't be created with VV_MPSAFE and require
the kernel_lock.
 1.95 20-Feb-2019  hannken branches: 1.95.4;
Set "mnt_lower" before the first file system operation on the new file system.
 1.94 11-Apr-2017  hannken branches: 1.94.4; 1.94.12;
Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.93 30-Mar-2017  hannken Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.
 1.92 06-Mar-2017  hannken Add field "mnt_lower" to "struct mount" to track the file system
a layered file system is mounted on.

Welcome to 7.99.65
 1.91 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.90 09-Nov-2014  maxv branches: 1.90.2; 1.90.4; 1.90.6;
Do not uselessly include <sys/malloc.h>.
 1.89 25-May-2014  hannken branches: 1.89.2;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.88 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.87 23-Mar-2014  hannken branches: 1.87.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.86 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.85 10-Feb-2014  hannken Change layerfs_vget(), layerfs_fhtovp() and the various layer xxx_mount()
functions to unlock/relock the node for the call to layer_node_create().

Finally remove dirty hacks (LK_NOWAIT, kpause) from layer_node_find().
 1.84 30-Apr-2012  rmind branches: 1.84.2; 1.84.4;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.83 19-Nov-2010  dholland branches: 1.83.8; 1.83.12; 1.83.14; 1.83.18; 1.83.20;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.82 02-Jul-2010  rmind Slightly clean up layerfs and nullfs: bring the big description closer to
reality (remove the duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add a few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.81 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.80 10-Apr-2010  jld Change the nullfs module's actual name to "null", to match the name
it's installed under and the name of the filesystem.

Fixes PR kern/43110.
 1.79 14-Mar-2009  dsl branches: 1.79.2; 1.79.4;
Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
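
As an illustration of the conversion described in revision 1.79, a K&R definition and its ANSI equivalent might look like this. The function names are hypothetical; the K&R form is kept in a comment because current C standards no longer accept it.

```c
/*
 * Old K&R style, shown only as a comment because C23 compilers reject it:
 *
 *	int
 *	example_add(a, b)
 *		int a;
 *		int b;
 *	{
 *		return a + b;
 *	}
 */

/* ANSI (prototype) style: parameter types live in the declarator. */
int
example_add(int a, int b)
{
	return a + b;
}

/*
 * The "() -> (void)" case: before C23, an empty parameter list in a
 * declaration means "arguments unspecified", so (void) is the explicit
 * way to say "takes no arguments".
 */
int
example_zero(void)
{
	return 0;
}
```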
 1.78 05-Dec-2008  ad branches: 1.78.4;
PR kern/40110: null, overlay and umap modules loading -> panic (layerfs symbols not there)

Add a layerfs module.
 1.77 24-Jun-2008  ad branches: 1.77.2; 1.77.4; 1.77.6; 1.77.12; 1.77.16;
Set up the sysctl tree correctly when loaded as a file system.
 1.76 10-May-2008  rumble branches: 1.76.2;
Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.75 05-May-2008  ad branches: 1.75.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.74 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.73 28-Jan-2008  dholland branches: 1.73.6; 1.73.8; 1.73.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.72 02-Jan-2008  ad Merge vmlocking2 to head.
 1.71 08-Dec-2007  pooka branches: 1.71.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.70 26-Nov-2007  pooka branches: 1.70.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.69 10-Oct-2007  ad branches: 1.69.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.68 31-Jul-2007  pooka branches: 1.68.2; 1.68.4; 1.68.6; 1.68.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.67 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.66 17-Jul-2007  pooka branches: 1.66.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.65 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.64 08-Jul-2007  pooka * allow unmount even if rootvp has a usecount > 1 provided that
MNT_FORCE is given
* decrease cargo cult index by getting rid of commented sections
with mntflushbuf() in them - AFAICT the call was removed from our
kernel over 13 years ago with the 4.4BSDlite import
 1.63 08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.62 19-Jan-2007  hannken branches: 1.62.2; 1.62.6; 1.62.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.61 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.60 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.59 03-Sep-2006  christos branches: 1.59.2; 1.59.4;
add missing initializers
 1.58 11-Dec-2005  christos branches: 1.58.4; 1.58.8;
merge ktrace-lwp.
 1.57 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.56 30-Aug-2005  xtraeme Remove __P()
 1.55 29-Mar-2005  thorpej branches: 1.55.2;
- Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.54 26-Feb-2005  perry nuke trailing whitespace
 1.53 02-Jan-2005  thorpej branches: 1.53.2; 1.53.4;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.52 01-Jul-2004  hannken Keep a pointer to the leaf mount. Needed for write gating where a
file system gets suspended and has layered mounts above it.

Welcome to 2.0G

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.51 29-May-2004  wrstuden Add layerfs_snapshot() as a handler routine for VFS_SNAPSHOT() calls
through a layered file system.

Note: we don't actually support snapshots through a layered file system,
and this routine returns an error. However we: 1) have clearly documented
what needs fixing (which isn't trivial to fix) and 2) if we do fix
this, all layered file systems can take advantage of it at once.
 1.50 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.49 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.48 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.47 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.46 24-Mar-2004  atatat branches: 1.46.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.45 09-Mar-2004  atatat Remove pointless comment about layerfs_sysctl()
 1.44 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.43 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.42 29-Jun-2003  fvdl branches: 1.42.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.41 29-Jun-2003  thorpej Adjust for ktrace/lwp changes.
 1.40 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.39 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.38 21-Sep-2002  christos MNT_GETARGS support
 1.37 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.36 10-Nov-2001  lukem branches: 1.36.8;
add RCSIDs
 1.35 07-Nov-2001  enami Make the size of null node hash table to desiredvnodes instead of 16.
 1.34 07-Nov-2001  enami Call hashdone where appropriate.
 1.33 07-Nov-2001  enami Cosmetic changes.
 1.32 15-Sep-2001  chs branches: 1.32.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.31 02-Aug-2001  assar branches: 1.31.2;
(*fs_mount): do not get the parent vnode back from namei to just release it
 1.30 07-Jun-2001  wiz branches: 1.30.2;
Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.29 22-Jan-2001  jdolecek branches: 1.29.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.28 08-Nov-2000  ad Update for hashinit() change.
 1.27 10-Jun-2000  assar make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.26 16-Mar-2000  jdolecek branches: 1.26.2;
Adapt to last VFS changes - add appropriate vfs_done routine.
 1.25 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.24 08-Jul-1999  wrstuden branches: 1.24.2; 1.24.8;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.23 26-Feb-1999  wrstuden branches: 1.23.4;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.22 15-Jan-1999  wrstuden Oops. That extra "*" doesn't need to be there.
 1.21 13-Jan-1999  wrstuden In nullfs_mount, we need to check if error before VOP_UNLOCK(vp,0) as
vp is initialized iff error==0 in null_node_create.
 1.20 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
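
The mapping named in revision 1.20 can be sketched as below. The struct and helper names are hypothetical; the argument-order comments reflect the standard definitions of these functions.

```c
#include <string.h>

/* Hypothetical record type used only for this illustration. */
struct example_rec {
	int id;
	char name[8];
};

/* bzero(p, n) becomes memset(p, 0, n). */
void
example_clear(struct example_rec *r)
{
	memset(r, 0, sizeof(*r));
}

/*
 * bcopy(src, dst, n) becomes memcpy(dst, src, n): source and destination
 * swap places. bcopy() tolerates overlapping buffers; memcpy() does not,
 * so overlapping copies need memmove() instead.
 */
void
example_copy(struct example_rec *dst, const struct example_rec *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* bcmp(a, b, n) becomes memcmp(a, b, n); only zero vs. non-zero matters here. */
int
example_equal(const struct example_rec *a, const struct example_rec *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```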
 1.19 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.18 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.17 06-Oct-1997  thorpej Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.16 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.15 11-Mar-1997  mikel branches: 1.15.4;
this is nullfs, not lofs
 1.14 22-Dec-1996  cgd branches: 1.14.6;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.13 13-Oct-1996  christos backout previous kprintf changes
 1.12 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11 10-May-1996  jtk Add locking code to avoid deadlocks on vnode reclaim, which means the
addition of null_lookup, null_lock, null_unlock, null_islocked.
 1.10 09-Feb-1996  christos miscfs prototype changes
 1.9 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.8 09-Mar-1995  mycroft copy*str() should use size_t.
 1.7 08-Mar-1995  cgd use u_long for copyin*
 1.6 25-Jan-1995  cgd return EOPNOTSUPP from fhtovp and vptofh functions; doing otherwise
correctly is not possible.
 1.5 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a (tiny) bit.
 1.4 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.3 15-Sep-1994  mycroft stat the file system at mount time, for `df -n', et al.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.1 16-Sep-1994  cgd from trunk, per mycroft
 1.14.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.15.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.15.4.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.23.4.1 02-Aug-1999  thorpej Update from trunk.
 1.24.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.24.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.24.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.24.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.26.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.29.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.29.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.29.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.29.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.29.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.29.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.30.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.30.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.30.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.30.2.1 03-Aug-2001  lukem update to -current
 1.31.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.32.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.36.8.1 29-Aug-2002  gehenna catch up with -current.
 1.42.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.42.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.42.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.42.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.3 03-Aug-2004  skrll Sync with HEAD
 1.42.2.2 02-Jul-2003  wrstuden Check in lwp-ification changes needed to get the evbarm/IQ80321 kernel
to compile.

only question I have is over the:
l->l_proc->p_stats->p_ru.ru_msgsnd++;
command at line 245 of dev/kttcp.c. Should we be doing per-lwp or
per-proc accounting?
 1.42.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.46.2.1 29-May-2004  tron Pull up revision 1.49 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.53.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.53.2.1 29-Apr-2005  kent sync with -current
 1.55.2.8 04-Feb-2008  yamt sync with head.
 1.55.2.7 21-Jan-2008  yamt sync with head
 1.55.2.6 07-Dec-2007  yamt sync with head
 1.55.2.5 27-Oct-2007  yamt sync with head.
 1.55.2.4 03-Sep-2007  yamt sync with head.
 1.55.2.3 26-Feb-2007  yamt sync with head.
 1.55.2.2 30-Dec-2006  yamt sync with head.
 1.55.2.1 21-Jun-2006  yamt sync with head.
 1.58.8.1 14-Sep-2006  yamt sync with head.
 1.58.4.1 09-Sep-2006  rpaulo sync with head
 1.59.4.2 10-Dec-2006  yamt sync with head.
 1.59.4.1 22-Oct-2006  yamt sync with head
 1.59.2.2 01-Feb-2007  ad Sync with head.
 1.59.2.1 18-Nov-2006  ad Sync with head.
 1.62.8.1 11-Jul-2007  mjf Sync with head.
 1.62.6.6 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.62.6.5 20-Aug-2007  ad Sync with HEAD.
 1.62.6.4 15-Jul-2007  ad Sync with head.
 1.62.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.62.6.2 10-Apr-2007  ad Sync with head.
 1.62.6.1 05-Apr-2007  ad Compile fixes.
 1.62.2.1 15-Apr-2007  yamt sync with head.
 1.66.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.68.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.68.8.1 31-Jul-2007  pooka file null_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:17 +0000
 1.68.6.1 14-Oct-2007  yamt sync with head.
 1.68.4.3 23-Mar-2008  matt sync with HEAD
 1.68.4.2 09-Jan-2008  matt sync with HEAD
 1.68.4.1 06-Nov-2007  matt sync with HEAD
 1.68.2.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.68.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.68.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.69.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.69.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.69.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.70.2.3 26-Dec-2007  ad Sync with head.
 1.70.2.2 06-Dec-2007  ad Mark it MPSAFE.
 1.70.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.71.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.73.10.3 11-Aug-2010  yamt sync with head.
 1.73.10.2 04-May-2009  yamt sync with head.
 1.73.10.1 16-May-2008  yamt sync with head.
 1.73.8.1 18-May-2008  yamt sync with head.
 1.73.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.73.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.73.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.75.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.75.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.76.2.1 27-Jun-2008  simonb Sync with head.
 1.77.16.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.77.12.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.77.6.1 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.77.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.77.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.77.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.78.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.79.4.3 05-Mar-2011  rmind sync with head
 1.79.4.2 03-Jul-2010  rmind sync with head
 1.79.4.1 30-May-2010  rmind sync with head
 1.79.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.79.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.83.20.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@

An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@

Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).

This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
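
The checks described in the pull-up message above can be illustrated with a
small user-space sketch. This is not the vfs_syscalls.c code: the names
mount_copy_args(), mount_free_args(), struct mount_args and the limit
MOUNT_DATA_MAX are hypothetical, and malloc()/free() stand in for
kmem_alloc()/kmem_free(). It only shows the idea of bounding 'data_len',
rejecting a zero-sized request, and recording the allocated size separately
so that the release does not depend on a value the fs may have modified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound; the real kernel limit is not stated in the log. */
#define MOUNT_DATA_MAX 8192

/* Hypothetical holder for the kernel-side copy of the mount data. */
struct mount_args {
	void   *buf;        /* copy of the caller's data, or NULL */
	size_t  alloc_len;  /* size given to the allocator, never modified later */
};

/*
 * Copy caller-supplied mount data after validating its length.
 * Returns 0 on success or an errno value.
 */
int
mount_copy_args(const void *data, size_t data_len, struct mount_args *out)
{
	out->buf = NULL;
	out->alloc_len = 0;

	if (data == NULL)
		return 0;	/* allowed; the fs must check if it needs data */
	if (data_len == 0 || data_len > MOUNT_DATA_MAX)
		return EINVAL;	/* reject zero-sized and oversized requests */

	out->buf = malloc(data_len);
	if (out->buf == NULL)
		return ENOMEM;
	memcpy(out->buf, data, data_len);
	out->alloc_len = data_len;
	return 0;
}

/* Release the copy; a kmem_free()-style API would be passed alloc_len. */
void
mount_free_args(struct mount_args *args)
{
	free(args->buf);
	args->buf = NULL;
	args->alloc_len = 0;
}
```

Keeping the allocation size in its own field is one possible way to avoid
the size mismatch; the log does not say how the actual fix was structured.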
 1.83.18.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@

An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@

Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).

This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.83.14.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@

An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@

Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).

This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.83.12.1 02-Jun-2012  mrg sync to latest -current.
 1.83.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.83.8.1 23-May-2012  yamt sync with head.
 1.84.4.1 18-May-2014  rmind sync with head
 1.84.2.2 03-Dec-2017  jdolecek update from HEAD
 1.84.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.87.2.1 10-Aug-2014  tls Rebase.
 1.89.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.

Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc

New sentence, new line. Bump date for previous.

Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.90.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.90.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.90.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.90.2.1 28-Aug-2017  skrll Sync with HEAD
 1.94.12.3 21-Apr-2020  martin Sync with HEAD
 1.94.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.94.12.1 10-Jun-2019  christos Sync with HEAD
 1.94.4.1 24-Dec-2019  martin Pull up following revision(s) (requested by hannken in ticket #1476):

sys/miscfs/nullfs/null_vfsops.c: revision 1.96

Set IMNT_MPSAFE before creating the vnode for the root of the
filesystem. Otherwise, it won't be created with VV_MPSAFE and will require
the kernel_lock.
 1.95.4.2 06-Feb-2023  martin Pull up following revision(s) (requested by hannken in ticket #1587):

sys/fs/union/union_vfsops.c: revision 1.86
sys/miscfs/nullfs/null_vfsops.c: revision 1.101 (via patch)

Set IMNT_MPSAFE only if all lower layers have it set.
 1.95.4.1 24-Dec-2019  martin Pull up following revision(s) (requested by hannken in ticket #581):

sys/miscfs/nullfs/null_vfsops.c: revision 1.96

Set IMNT_MPSAFE before creating the vnode for the root of the
filesystem. Otherwise, it won't be created with VV_MPSAFE and will require
the kernel_lock.
 1.96.2.2 22-Jan-2020  ad Copy the IMNT_SHRLOOKUP flag from lowerrootvp's mount.
 1.96.2.1 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.98.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.100.2.1 06-Feb-2023  martin Pull up following revision(s) (requested by hannken in ticket #68):

sys/fs/union/union_vfsops.c: revision 1.86
sys/miscfs/nullfs/null_vfsops.c: revision 1.101

Set IMNT_MPSAFE only if all lower layers have it set.
 1.43 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.42 04-Jun-2017  hannken Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.41 30-Mar-2017  hannken branches: 1.41.6;
Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.40 27-Jan-2017  hannken Handle v_writecount from layer_open(), layer_close() and layer_revoke()
so lower file system vnodes get marked as open for writing.
 1.39 27-Feb-2014  hannken branches: 1.39.6; 1.39.10; 1.39.14;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
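
The "lock, then check for a dead vnode" rule from the entry above can be
sketched in user space as follows. This is an illustration only, not the
kernel implementation: struct toy_vnode, toy_vnode_lock() and
toy_vnode_unlock() are hypothetical names, and a pthread mutex stands in
for the vnode lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct toy_vnode {
	pthread_mutex_t lock;
	bool            dead;	/* set once the node has been cleaned */
};

/*
 * Take the lock, then re-check liveness while holding it.
 * If the node is dead, drop the lock again and report ENOENT,
 * so a caller never proceeds with a locked dead node.
 */
int
toy_vnode_lock(struct toy_vnode *vp)
{
	int error = pthread_mutex_lock(&vp->lock);

	if (error != 0)
		return error;
	if (vp->dead) {
		pthread_mutex_unlock(&vp->lock);
		return ENOENT;
	}
	return 0;
}

void
toy_vnode_unlock(struct toy_vnode *vp)
{
	pthread_mutex_unlock(&vp->lock);
}
```

The point of checking after the lock is acquired is that the node's state
can change while the caller is waiting for the lock.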
 1.38 11-Jul-2011  hannken branches: 1.38.2; 1.38.12; 1.38.16;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.37 10-Jan-2011  hannken Add layer_revoke() that adjusts the lower vnode use count to be at least as
high as the upper vnode count before passing down the VOP_REVOKE().

This way the vclean() check for active (vp->v_usecount > 1) vnodes gets it right.

Should fix PR kern/43456.
 1.36 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK(). This makes
layer_*lock*() obsolete. Remove them and handle lock operations
with the generic bypass function.

Ride 5.99.34.
 1.35 02-Jul-2010  rmind Slightly clean up layerfs and nullfs: bring the big description closer to
reality (remove the duplicate one in nullfs, merge some differences from
it), KNF, improve and update some comments, add a few KASSERT()s, remove
unused declarations, avoid double inclusion of headers, misc.

No functional changes.
 1.34 11-Dec-2005  christos branches: 1.34.74; 1.34.96; 1.34.98;
merge ktrace-lwp.
 1.33 30-Aug-2005  xtraeme Remove __P()
 1.32 26-Feb-2005  perry branches: 1.32.4;
nuke trailing whitespace
 1.31 30-Jun-2004  hannken branches: 1.31.4; 1.31.6;
Do LAYERFS_REMOVED for vop_rmdir.

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.30 07-Jun-2004  yamt do a LAYERFS_REMOVED hack for vop_rename as well.
 1.29 28-May-2004  wrstuden Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.28 25-Jan-2004  hannken branches: 1.28.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.27 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.26 10-Sep-2002  jtk branches: 1.26.6;
restore ancestral RCS ID from 4.4BSD-Lite2
 1.25 04-Jan-2002  chs add the entry for layer_getpages() to the VOP tables of the
layered file systems that need it.
 1.24 06-Dec-2001  chs add VOP_GETPAGES and VOP_PUTPAGES methods for layered filesystems.
drop the interlock on the upper layer, acquire the interlock on the
lower layer.
 1.23 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.22 10-Nov-2001  lukem add RCSIDs
 1.21 07-Nov-2001  enami Fix typo in comment.
 1.20 09-Jun-2001  wiz branches: 1.20.2; 1.20.6;
Some more corrections by Michael K. Sanders.
 1.19 07-Jun-2001  wiz Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.18 22-Jan-2001  jdolecek branches: 1.18.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.17 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.16 08-Jul-1999  wrstuden branches: 1.16.2;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.15 25-Mar-1999  bouyer branches: 1.15.4;
We must handle MNT_NODEV at open time, so add an open op for null and union,
and do proper checks in union_open(). Fix to nullfs from OpenBSD, extended
to umap and union by me.
 1.14 22-Mar-1999  sommerfe vinvalbuf, called from vclean, could cause a locking-against-self
deadlock in VOP_FSYNC() if the unreferenced vnode picked for
reclamation happened to be stacked on top of a vnode the process
already had locked. This could happen if the same filesystem was
accessed both through a union mount and directly; it seemed to happen
most frequently when the direct access was through NFS.

Avoid this deadlock by changing vinvalbuf to pass a new FSYNC_RECLAIM
flag bit to VOP_FSYNC() to indicate that a reclaim is in progress and
only a `shallow' fsync is necessary.

Do nothing in *_fsync() in umapfs, nullfs, and unionfs when
FSYNC_RECLAIM is set; the underlying vnodes will shortly be released
in *_reclaim and may be reclaimed (and fsync'ed) later.
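
A minimal sketch of the "shallow fsync during reclaim" idea described in
the entry above, under stated assumptions: struct toy_node,
toy_layer_fsync() and the flag value TOY_FSYNC_RECLAIM are hypothetical and
do not reproduce the kernel's VOP_FSYNC() interface or the real
FSYNC_RECLAIM constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; the real FSYNC_RECLAIM constant is not given here. */
#define TOY_FSYNC_RECLAIM 0x1

/* Hypothetical node in a stack of layered file systems. */
struct toy_node {
	struct toy_node *lower;		/* node this layer is stacked on, or NULL */
	int              synced;	/* how often this node was flushed */
};

/*
 * Layered fsync: when a reclaim is in progress, do nothing and leave the
 * lower node to be synced later. Otherwise pass the request down until
 * the bottom node is reached and flush it.
 */
int
toy_layer_fsync(struct toy_node *np, int flags)
{
	if (flags & TOY_FSYNC_RECLAIM)
		return 0;	/* shallow: do not touch the lower node */
	if (np->lower != NULL)
		return toy_layer_fsync(np->lower, flags);
	np->synced++;
	return 0;
}
```

Because the layer never descends into the lower node while the flag is
set, it cannot block on a lower vnode that the caller already has locked.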
 1.13 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.12 06-Oct-1997  thorpej Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.11 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.10 17-May-1997  pk branches: 1.10.4;
NULL => 0 (Arne Juul; PR#3629)
 1.9 13-Oct-1996  christos backout previous kprintf changes
 1.8 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.7 10-May-1996  jtk Add locking code to avoid deadlocks on vnode reclaim, which means the
addition of null_lookup, null_lock, null_unlock, null_islocked.
 1.6 13-Apr-1996  cgd fix screw-up in the prototyping changes: print pointers with %p, NOT
by casting them to (unsigned int) then printing with %x.
 1.5 09-Feb-1996  christos miscfs prototype changes
 1.4 19-Aug-1994  mycroft Convert hash tables.
 1.3 20-Jul-1994  mycroft Fix a null pointer dereference during rename(2).
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.2 19-Aug-1994  mycroft update from trunk
 1.2.2.1 20-Jul-1994  cgd update from trunk.
 1.10.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.10.4.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.15.4.1 02-Aug-1999  thorpej Update from trunk.
 1.16.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.18.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.18.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.18.2.4 11-Jan-2002  nathanw More catchup.
 1.18.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.18.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.20.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.20.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.20.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.26.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.26.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.26.6.1 03-Aug-2004  skrll Sync with HEAD
 1.28.2.3 02-Jul-2004  he Pull up revision 1.31 (requested by hannken in ticket #575):
Do LAYERFS_REMOVED for vop_rmdir.
 1.28.2.2 21-Jun-2004  tron Pull up revision 1.30 (requested by yamt in ticket #512):
do a LAYERFS_REMOVED hack for vop_rename as well.
 1.28.2.1 30-May-2004  tron Pull up revision 1.29 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.31.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.31.4.1 29-Apr-2005  kent sync with -current
 1.32.4.1 21-Jun-2006  yamt sync with head.
 1.34.98.2 05-Mar-2011  rmind sync with head
 1.34.98.1 03-Jul-2010  rmind sync with head
 1.34.96.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.34.74.1 11-Aug-2010  yamt sync with head.
 1.38.16.1 18-May-2014  rmind sync with head
 1.38.12.2 03-Dec-2017  jdolecek update from HEAD
 1.38.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.38.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.39.14.1 21-Apr-2017  bouyer Sync with HEAD
 1.39.10.2 26-Apr-2017  pgoyette Sync with HEAD
 1.39.10.1 20-Mar-2017  pgoyette Sync with HEAD
 1.39.6.2 28-Aug-2017  skrll Sync with HEAD
 1.39.6.1 05-Feb-2017  skrll Sync with HEAD
 1.41.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.1 20-Jan-2000  wrstuden branches: 1.1.6;
Add overlay, a layered file system which overlays itself on
the underlying fs, rather than exporting it to another part of the
directory name space.
 1.1.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.6.1 20-Jan-2000  bouyer file Makefile was added on branch thorpej_scsipi on 2000-11-20 18:09:47 +0000
 1.3 12-Oct-2014  uebayasi Define layerfs as an attribute.
 1.2 11-Oct-2014  uebayasi Define filesystem attributes with vfs dependency.
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8; 1.1.162;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.162.1 03-Dec-2017  jdolecek update from HEAD
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.overlay was added on branch kqueue on 2002-06-23 17:50:11 +0000
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.overlay was added on branch nathanw_sa on 2002-06-20 03:47:58 +0000
 1.9 11-Apr-2017  hannken Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.8 28-Jun-2008  rumble branches: 1.8.40; 1.8.60; 1.8.64; 1.8.68;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.7 11-Dec-2005  christos branches: 1.7.70; 1.7.74; 1.7.76; 1.7.78;
merge ktrace-lwp.
 1.6 30-Aug-2005  xtraeme Remove __P()
 1.5 20-May-2004  atatat branches: 1.5.12;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.4 07-Aug-2003  agc branches: 1.4.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.3 07-Jun-2001  wiz branches: 1.3.22;
Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.2 13-Mar-2000  soren branches: 1.2.6; 1.2.8;
Fix doubled 'the's in comments.
 1.1 20-Jan-2000  wrstuden Add overlay, a layered file system which overlays itself on
the underlying fs, rather than exporting it to another part of the
directory name space.
 1.2.8.1 21-Jun-2001  nathanw Catch up to -current.
 1.2.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.6.1 13-Mar-2000  bouyer file overlay.h was added on branch thorpej_scsipi on 2000-11-20 18:09:47 +0000
 1.3.22.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.3.22.1 03-Aug-2004  skrll Sync with HEAD
 1.4.2.1 23-May-2004  tron Pull up revision 1.5 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.5.12.1 21-Jun-2006  yamt sync with head.
 1.7.78.1 03-Jul-2008  simonb Sync with head.
 1.7.76.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.7.74.1 04-May-2009  yamt sync with head.
 1.7.70.1 29-Jun-2008  mjf Sync with HEAD.
 1.8.68.1 21-Apr-2017  bouyer Sync with HEAD
 1.8.64.1 26-Apr-2017  pgoyette Sync with HEAD
 1.8.60.1 28-Aug-2017  skrll Sync with HEAD
 1.8.40.1 03-Dec-2017  jdolecek update from HEAD
 1.74 16-Feb-2025  joe remove unnecessary branches
 1.73 04-Nov-2022  hannken branches: 1.73.8;
Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
 1.72 08-Jul-2022  hannken Don't use LK_RETRY as we need an active vnode here.
 1.71 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
 1.70 21-Mar-2020  pgoyette branches: 1.70.2;
Finish the transition to SYSCTL_SETUP by removing local sysctllog
in favor of the one provided by the module infrastructure.
 1.69 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.68 20-Feb-2019  hannken Set "mnt_lower" before the first file system operation on the new file system.
 1.67 11-Apr-2017  hannken branches: 1.67.12;
Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.66 30-Mar-2017  hannken Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.
 1.65 06-Mar-2017  hannken Add field "mnt_lower" to "struct mount" to track the file system
a layered file system is mounted on.

Welcome to 7.99.65
 1.64 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.63 10-Nov-2014  maxv branches: 1.63.2; 1.63.4; 1.63.6;
Do not uselessly include <sys/malloc.h>.
 1.62 25-May-2014  hannken branches: 1.62.2;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.61 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.60 23-Mar-2014  hannken branches: 1.60.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.59 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.58 10-Feb-2014  hannken Change layerfs_vget(), layerfs_fhtovp() and the various layer xxx_mount()
functions to unlock/relock the node for the call to layer_node_create().

Finally remove dirty hacks (LK_NOWAIT, kpause) from layer_node_find().
 1.57 30-Apr-2012  rmind branches: 1.57.2; 1.57.4;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.56 09-Jul-2010  hannken branches: 1.56.8; 1.56.12; 1.56.14; 1.56.18; 1.56.20;
Replace vget() with vref()/vn_lock(), this node already has a reference.
 1.55 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.54 05-Dec-2008  ad branches: 1.54.6; 1.54.8;
PR kern/40110: null, overlay and umap modules loading -> panic (layerfs symbols not there)

Add a layerfs module.
 1.53 28-Jun-2008  rumble branches: 1.53.2; 1.53.4; 1.53.6; 1.53.12; 1.53.16;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.52 13-May-2008  simonb branches: 1.52.2;
mnt_data is a pointer, set it to NULL not 0 when we're finished with it.
 1.51 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.50 05-May-2008  ad branches: 1.50.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.49 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.48 28-Jan-2008  dholland branches: 1.48.6; 1.48.8; 1.48.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.47 02-Jan-2008  ad Merge vmlocking2 to head.
 1.46 08-Dec-2007  ad branches: 1.46.4;
Destroy ovm_hashlock before freeing.
 1.45 26-Nov-2007  pooka branches: 1.45.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.44 10-Oct-2007  ad branches: 1.44.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.43 31-Jul-2007  pooka branches: 1.43.2; 1.43.4; 1.43.6; 1.43.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.42 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.41 17-Jul-2007  pooka branches: 1.41.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.40 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.39 08-Jul-2007  pooka * allow unmount even if rootvp has a usecount > 1 provided that
MNT_FORCE is given
* decrease cargo cult index by getting rid of commented sections
with mntflushbuf() in them - AFAICT the call was removed from our
kernel over 13 years ago with the 4.4BSDlite import
 1.38 08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.37 19-Jan-2007  hannken branches: 1.37.2; 1.37.6; 1.37.8; 1.37.10;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.36 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.35 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.34 03-Sep-2006  christos branches: 1.34.2; 1.34.4;
add missing initializers
 1.33 11-Dec-2005  christos branches: 1.33.4; 1.33.8;
merge ktrace-lwp.
 1.32 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.31 30-Aug-2005  xtraeme Remove __P()
 1.30 29-Mar-2005  thorpej branches: 1.30.2;
- Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.29 26-Feb-2005  perry nuke trailing whitespace
 1.28 02-Jan-2005  thorpej branches: 1.28.2; 1.28.4;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.27 01-Jul-2004  hannken Keep a pointer to the leaf mount. Needed for write gating where a
file system gets suspended and has layered mounts above it.

Welcome to 2.0G

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.26 29-May-2004  wrstuden Add layerfs_snapshot() as a handler routine for VFS_SNAPSHOT() calls
through a layered file system.

Note: we don't actually support snapshots through a layered file system,
and this routine returns an error. However we: 1) have clearly documented
what needs fixing (which isn't trivial to fix) and 2) if we do fix
this, all layered file systems can take advantage of it at once.
 1.25 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.24 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.23 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.22 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.21 24-Mar-2004  atatat branches: 1.21.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.20 09-Mar-2004  atatat Remove pointless comment about layerfs_sysctl()
 1.19 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.18 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 29-Jun-2003  fvdl branches: 1.17.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.16 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.15 29-Jun-2003  thorpej Adjust for ktrace/lwp changes.
 1.14 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.13 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.12 21-Sep-2002  christos MNT_GETARGS support
 1.11 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.10 15-Nov-2001  lukem branches: 1.10.8;
don't need <sys/types.h> when including <sys/param.h>
 1.9 10-Nov-2001  lukem add RCSIDs
 1.8 15-Sep-2001  chs branches: 1.8.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.7 07-Jun-2001  wiz branches: 1.7.2; 1.7.4;
Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.6 22-Jan-2001  jdolecek branches: 1.6.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.5 08-Nov-2000  ad branches: 1.5.2;
Update for hashinit() change.
 1.4 10-Jun-2000  assar make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.3 16-Mar-2000  jdolecek branches: 1.3.2;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
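
The init/done pairing that revision 1.3 introduces can be sketched as below. This is only an illustration with hypothetical names (example_init, example_done, example_table); it uses calloc/free in place of the kernel allocators and pools the commit message mentions.

```c
#include <errno.h>
#include <stdlib.h>

#define EXAMPLE_TABLE_SIZE 64	/* hypothetical size */

/* Global resource owned by the (hypothetical) file system. */
static int *example_table;

/* Called when the file system is attached: allocate global state. */
int
example_init(void)
{
	example_table = calloc(EXAMPLE_TABLE_SIZE, sizeof(*example_table));
	return example_table != NULL ? 0 : ENOMEM;
}

/* Called when the file system is detached: release what init took. */
void
example_done(void)
{
	free(example_table);
	example_table = NULL;
}

/* Helper for the illustration only. */
int
example_is_initialized(void)
{
	return example_table != NULL;
}
```

Without a matching done routine, every load and unload cycle of a module would leave the previous allocation behind, which is the situation the commit message describes.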
 1.2 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.1 20-Jan-2000  wrstuden Add overlay, a layered file system which overlays itself on
the underlying fs, rather than exporting it to another part of the
directory name space.
 1.3.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.5.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.5.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.5.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.2.1 08-Nov-2000  bouyer file overlay_vfsops.c was added on branch thorpej_scsipi on 2000-11-20 18:09:47 +0000
 1.6.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.6.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.6.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.6.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.6.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.6.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.7.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.7.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.7.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.7.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.10.8.1 29-Aug-2002  gehenna catch up with -current.
 1.17.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.17.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.17.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.17.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.17.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.17.2.3 03-Aug-2004  skrll Sync with HEAD
 1.17.2.2 03-Jul-2003  wrstuden LWP-ify. Changes all seem to be catching up with recent set_statfs_info()
changes.
 1.17.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.21.2.1 29-May-2004  tron Pull up revision 1.24 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.28.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.28.2.1 29-Apr-2005  kent sync with -current
 1.30.2.8 04-Feb-2008  yamt sync with head.
 1.30.2.7 21-Jan-2008  yamt sync with head
 1.30.2.6 07-Dec-2007  yamt sync with head
 1.30.2.5 27-Oct-2007  yamt sync with head.
 1.30.2.4 03-Sep-2007  yamt sync with head.
 1.30.2.3 26-Feb-2007  yamt sync with head.
 1.30.2.2 30-Dec-2006  yamt sync with head.
 1.30.2.1 21-Jun-2006  yamt sync with head.
 1.33.8.1 14-Sep-2006  yamt sync with head.
 1.33.4.1 09-Sep-2006  rpaulo sync with head
 1.34.4.2 10-Dec-2006  yamt sync with head.
 1.34.4.1 22-Oct-2006  yamt sync with head
 1.34.2.2 01-Feb-2007  ad Sync with head.
 1.34.2.1 18-Nov-2006  ad Sync with head.
 1.37.10.1 09-Dec-2007  reinoud Pullup to HEAD
 1.37.8.1 11-Jul-2007  mjf Sync with head.
 1.37.6.6 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.37.6.5 20-Aug-2007  ad Sync with HEAD.
 1.37.6.4 15-Jul-2007  ad Sync with head.
 1.37.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.37.6.2 10-Apr-2007  ad Sync with head.
 1.37.6.1 05-Apr-2007  ad Compile fixes.
 1.37.2.1 15-Apr-2007  yamt sync with head.
 1.41.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.43.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.43.8.1 31-Jul-2007  pooka file overlay_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:17 +0000
 1.43.6.1 14-Oct-2007  yamt sync with head.
 1.43.4.3 23-Mar-2008  matt sync with HEAD
 1.43.4.2 09-Jan-2008  matt sync with HEAD
 1.43.4.1 06-Nov-2007  matt sync with HEAD
 1.43.2.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.43.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.43.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.44.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.44.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.44.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.45.2.2 08-Dec-2007  ad Sync with head.
 1.45.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.46.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.10.3 11-Aug-2010  yamt sync with head.
 1.48.10.2 04-May-2009  yamt sync with head.
 1.48.10.1 16-May-2008  yamt sync with head.
 1.48.8.1 18-May-2008  yamt sync with head.
 1.48.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.48.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.48.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.50.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.50.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.52.2.1 03-Jul-2008  simonb Sync with head.
 1.53.16.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.53.12.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.53.6.1 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.53.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.53.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.54.8.2 05-Mar-2011  rmind sync with head
 1.54.8.1 03-Jul-2010  rmind sync with head
 1.54.6.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.56.20.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
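
The data_len problems described in this pull-up can be illustrated with a small userland sketch. It is an assumption-laden model, not the kernel code: EXAMPLE_MAX_DATA_LEN, example_do_mount and example_free are hypothetical, and calloc/free stand in for kmem_alloc(9)/kmem_free(9). The points shown are the upper bound on the length, skipping the allocation when the length is zero, and freeing with the size that was allocated rather than the value the file system may have changed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_MAX_DATA_LEN 4096	/* hypothetical upper bound */

/* Hypothetical fs callback; it is allowed to modify *data_len. */
typedef int (*example_mount_fn)(void *data, size_t *data_len);

/* Records the size passed to the last free, for illustration. */
size_t example_last_freed;

/* Stand-in for a free routine that must be given the original size. */
static void
example_free(void *p, size_t size)
{
	example_last_freed = size;
	free(p);
}

int
example_do_mount(example_mount_fn fn, size_t data_len)
{
	size_t alloc_sz;
	void *data = NULL;
	int error;

	/* Reject oversized requests instead of allocating them. */
	if (data_len > EXAMPLE_MAX_DATA_LEN)
		return EINVAL;

	/* Remember the allocated size; never allocate zero bytes. */
	alloc_sz = data_len;
	if (alloc_sz != 0) {
		data = calloc(1, alloc_sz);
		if (data == NULL)
			return ENOMEM;
	}

	error = fn(data, &data_len);	/* may change data_len */

	/* Free with the saved size, not the possibly modified one. */
	if (data != NULL)
		example_free(data, alloc_sz);
	return error;
}

/* Example callback that changes the length it was given. */
int
example_shrinking_mount(void *data, size_t *data_len)
{
	(void)data;
	*data_len = 1;
	return 0;
}
```

Because the callback can receive a NULL buffer when the length is zero, this sketch also shows why each file system has to check for NULL itself.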
 1.56.18.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.56.14.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
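
The checks described in this pull-up can be sketched in userland C as
follows. This is only an illustration under stated assumptions, not the
code from vfs_syscalls.c: MOUNT_DATA_MAX, fs_mount_fn, sized_free(),
example_fs_mount() and mount_copy_args() are hypothetical names, and
malloc()/free() stand in for kmem_alloc()/kmem_free().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on the size of mount arguments. */
#define MOUNT_DATA_MAX 1024

/* Hypothetical per-fs mount hook; it is allowed to rewrite *len. */
typedef int (*fs_mount_fn)(void *data, size_t *len);

/* Size handed to the most recent sized_free() call (for inspection). */
size_t last_freed_len;

/* Stand-in for kmem_free(): the caller must pass the allocation size. */
void
sized_free(void *p, size_t len)
{
	assert(p != NULL && len != 0);
	last_freed_len = len;
	free(p);
}

/* Example hook: rejects NULL data itself and shrinks *len. */
int
example_fs_mount(void *data, size_t *len)
{
	if (data == NULL)
		return EINVAL;
	*len = 4;
	return 0;
}

int
mount_copy_args(const void *udata, size_t data_len, fs_mount_fn fs_mount)
{
	void *buf = NULL;
	size_t alloc_len = 0;
	int error;

	if (udata != NULL) {
		/* Reject zero-sized and oversized requests up front. */
		if (data_len == 0 || data_len > MOUNT_DATA_MAX)
			return EINVAL;
		alloc_len = data_len;	/* remember the size we allocate */
		buf = malloc(alloc_len);
		if (buf == NULL)
			return ENOMEM;
		memcpy(buf, udata, alloc_len);
	}

	/* NULL data is passed through; the fs decides whether it needs it. */
	error = fs_mount(buf, &data_len);

	/* Free with the saved size, not the possibly modified data_len. */
	if (buf != NULL)
		sized_free(buf, alloc_len);
	return error;
}
```

The point of keeping alloc_len separate is that the hook may change
data_len, so only the saved value is guaranteed to match the allocation.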
 1.56.12.1 02-Jun-2012  mrg sync to latest -current.
 1.56.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.8.1 23-May-2012  yamt sync with head.
 1.57.4.1 18-May-2014  rmind sync with head
 1.57.2.2 03-Dec-2017  jdolecek update from HEAD
 1.57.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.60.2.1 10-Aug-2014  tls Rebase.
 1.62.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.63.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.63.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.63.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.63.2.1 28-Aug-2017  skrll Sync with HEAD
 1.67.12.3 21-Apr-2020  martin Sync with HEAD
 1.67.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.67.12.1 10-Jun-2019  christos Sync with HEAD
 1.70.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.73.8.1 02-Aug-2025  perseant Sync with HEAD
 1.25 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.24 04-Jun-2017  hannken Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.23 30-Mar-2017  hannken branches: 1.23.6;
Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.22 27-Jan-2017  hannken Handle v_writecount from layer_open(), layer_close() and layer_revoke()
so lower file system vnodes get marked as open for writing.
 1.21 10-Nov-2014  maxv branches: 1.21.2; 1.21.4; 1.21.6;
Do not uselessly include <sys/malloc.h>.
 1.20 27-Feb-2014  hannken branches: 1.20.4;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
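
The two lock-operation rules from revision 1.20 can be modelled with a
toy vnode as below. This is a sketch only: struct sk_vnode, SK_LK_RETRY,
sk_fs_lock() and sk_deadfs_lock() are hypothetical names, and a boolean
replaces the real vnode lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SK_LK_RETRY 0x1	/* hypothetical stand-in for LK_RETRY */

struct sk_vnode {
	bool dead;	/* vnode is being, or has been, cleaned */
	bool locked;	/* toy lock state */
};

/* Lock operation of a regular file system. */
int
sk_fs_lock(struct sk_vnode *vp, int flags)
{
	(void)flags;		/* LK_RETRY is only a hint for deadfs */
	assert(!vp->locked);
	vp->locked = true;	/* take the lock first */
	if (vp->dead) {		/* then re-check while holding it */
		vp->locked = false;
		return ENOENT;
	}
	return 0;
}

/* Lock operation of deadfs. */
int
sk_deadfs_lock(struct sk_vnode *vp, int flags)
{
	if ((flags & SK_LK_RETRY) == 0)
		return ENOENT;	/* caller did not ask to lock a dead vnode */
	assert(!vp->locked);
	vp->locked = true;
	return 0;
}
```

Whichever operations vector happens to be called, a caller that did not
pass the retry flag gets ENOENT for a dead vnode and never holds its lock.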
 1.19 11-Jul-2011  hannken branches: 1.19.2; 1.19.12; 1.19.16;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.18 10-Jan-2011  hannken Add layer_revoke() that adjusts the lower vnode use count to be at least as
high as the upper vnode count before passing down the VOP_REVOKE().

This way the vclean() check for active (vp->v_usecount > 1) vnodes gets it right.

Should fix PR kern/43456.
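
The use count adjustment described for layer_revoke() amounts to simple
arithmetic, sketched here with toy types. struct rv_vnode,
rv_raise_lower() and rv_drop_extra() are hypothetical names; whether and
when the real code drops the extra references again is an assumption of
this sketch.

```c
#include <assert.h>

struct rv_vnode {
	int usecount;
};

/*
 * Raise the lower vnode's use count so that it is at least as high as
 * the upper vnode's.  Returns the number of extra references taken.
 */
int
rv_raise_lower(const struct rv_vnode *upper, struct rv_vnode *lower)
{
	int extra = upper->usecount - lower->usecount;

	if (extra <= 0)
		return 0;	/* lower count is already high enough */
	lower->usecount += extra;
	return extra;
}

/* Give back the references taken by rv_raise_lower(). */
void
rv_drop_extra(struct rv_vnode *lower, int extra)
{
	assert(extra >= 0 && lower->usecount >= extra);
	lower->usecount -= extra;
}
```

With an upper count of 3 and a lower count of 1, two extra references
are taken, so a check of the form "usecount > 1" sees the lower vnode as
active while the revoke is passed down.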
 1.17 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK(). This makes
layer_*lock*() obsolete. Remove them and handle lock operations
with the generic bypass function.

Ride 5.99.34.
 1.16 11-Dec-2005  christos branches: 1.16.74; 1.16.96; 1.16.98;
merge ktrace-lwp.
 1.15 30-Aug-2005  xtraeme Remove __P()
 1.14 30-Jun-2004  hannken branches: 1.14.12;
Do LAYERFS_REMOVED for vop_rmdir.

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.13 07-Jun-2004  yamt do a LAYERFS_REMOVED hack for vop_rename as well.
 1.12 28-May-2004  wrstuden Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.11 25-Jan-2004  hannken branches: 1.11.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.10 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 04-Jan-2002  chs branches: 1.9.16;
add the entry for layer_getpages() to the VOP tables of the
layered file systems that need it.
 1.8 06-Dec-2001  chs add VOP_GETPAGES and VOP_PUTPAGES methods for layered filesystems.
drop the interlock on the upper layer, acquire the interlock on the
lower layer.
 1.7 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.6 10-Nov-2001  lukem add RCSIDs
 1.5 09-Jun-2001  wiz branches: 1.5.2; 1.5.6;
Some more corrections by Michael K. Sanders.
 1.4 07-Jun-2001  wiz Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.3 22-Jan-2001  jdolecek branches: 1.3.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.2 13-Mar-2000  soren branches: 1.2.6;
Fix doubled 'the's in comments.
 1.1 20-Jan-2000  wrstuden Add overlay, a layered file system which overlays itself on
the underlying fs, rather than exporting it to another part of the
directory name space.
 1.2.6.3 11-Feb-2001  bouyer Sync with HEAD.
 1.2.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.6.1 13-Mar-2000  bouyer file overlay_vnops.c was added on branch thorpej_scsipi on 2000-11-20 18:09:48 +0000
 1.3.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.3.2.4 11-Jan-2002  nathanw More catchup.
 1.3.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.3.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.3.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.5.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.5.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.16.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.16.1 03-Aug-2004  skrll Sync with HEAD
 1.11.2.3 02-Jul-2004  he Pull up revision 1.14 (requested by hannken in ticket #575):
Do LAYERFS_REMOVED for vop_rmdir.
 1.11.2.2 21-Jun-2004  tron Pull up revision 1.13 (requested by yamt in ticket #512):
do a LAYERFS_REMOVED hack for vop_rename as well.
 1.11.2.1 30-May-2004  tron Pull up revision 1.12 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.14.12.1 21-Jun-2006  yamt sync with head.
 1.16.98.2 05-Mar-2011  rmind sync with head
 1.16.98.1 03-Jul-2010  rmind sync with head
 1.16.96.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.16.74.1 11-Aug-2010  yamt sync with head.
 1.19.16.1 18-May-2014  rmind sync with head
 1.19.12.2 03-Dec-2017  jdolecek update from HEAD
 1.19.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.19.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.20.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.21.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.21.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.21.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.21.2.2 28-Aug-2017  skrll Sync with HEAD
 1.21.2.1 05-Feb-2017  skrll Sync with HEAD
 1.23.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.6 17-Apr-2003  jdolecek g/c, it's outdated and the info wouldn't belong here anyway
 1.5 12-Mar-1999  christos PR/7143: Jaromir Docelek: Add procfs/cmdline from Linux emulation
 1.4 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.3 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.2 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13 30-Mar-2019  christos add a node for the process resource limits.
 1.12 28-Aug-2017  kamil branches: 1.12.4;
Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.11 30-Mar-2017  christos branches: 1.11.6;
add an auxv node.
 1.10 02-Nov-2016  pgoyette branches: 1.10.2;
* Split sys/kern/sys_process.c into three parts:
1 - ptrace(2) syscall for native emulation
2 - common ptrace(2) syscall code (shared with compat_netbsd32)
3 - support routines that are shared with PROCFS and/or KTRACE

* Add module glue for #1 and #2. Both modules will be built-in to the
kernel if "options PTRACE" is included in the config file (this is
the default, defined in sys/conf/std).

* Mark the ptrace(2) syscall as modular in syscalls.master (generated
files will be committed shortly).

* Conditionalize all remaining portions of PTRACE code on a new kernel
option PTRACE_HOOKS.

XXX Instead of PROCFS depending on 'options PTRACE', we should probably
just add a procfs attribute to the sys/kern/sys_process.c file's
entry in files.kern, and add PROCFS to the "#if defineds" for
process_domem(). It's really confusing to have two different ways
of requiring this file.
 1.9 11-Oct-2014  uebayasi branches: 1.9.2; 1.9.4;
Define filesystem attributes with vfs dependency.
 1.8 30-Aug-2006  cube branches: 1.8.102;
Restore dependency on PTRACE for PROCFS.
Bump required config(1) version.
 1.7 30-Aug-2006  jnemeth revert previous as it breaks the build due to invalid syntax
 1.6 29-Aug-2006  matt Make PTRACE and COREDUMP optional. Make them the default (status quo) by putting
them in conf/std.
 1.5 11-Dec-2005  christos branches: 1.5.4; 1.5.8;
merge ktrace-lwp.
 1.4 26-Feb-2005  perry branches: 1.4.4;
nuke trailing whitespace
 1.3 03-Jan-2003  christos branches: 1.3.2; 1.3.10; 1.3.12;
Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.2 09-May-2002  thorpej branches: 1.2.6; 1.2.8;
Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.1 16-Apr-2002  thorpej Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.2.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.2.8.1 09-May-2002  jdolecek file files.procfs was added on branch kqueue on 2002-06-23 17:50:12 +0000
 1.2.6.3 07-Jan-2003  thorpej Sync with HEAD.
 1.2.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.2.6.1 09-May-2002  nathanw file files.procfs was added on branch nathanw_sa on 2002-06-20 03:48:00 +0000
 1.3.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.3.10.1 29-Apr-2005  kent sync with -current
 1.3.2.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.4.4.1 30-Dec-2006  yamt sync with head.
 1.5.8.1 03-Sep-2006  yamt sync with head.
 1.5.4.1 09-Sep-2006  rpaulo sync with head
 1.8.102.1 03-Dec-2017  jdolecek update from HEAD
 1.9.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.9.4.1 04-Nov-2016  pgoyette Sync with HEAD
 1.9.2.2 28-Aug-2017  skrll Sync with HEAD
 1.9.2.1 05-Dec-2016  skrll Sync with HEAD
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.6.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow reading Debug Registers unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still prompt these values to store
in an internal cache with context of registers. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.12.4.1 10-Jun-2019  christos Sync with HEAD
 1.7 05-Jan-1994  mycroft Clean up deleted files.
 1.6 07-Sep-1993  ws Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.5 26-Aug-1993  pk Implement setattr: mode for process entries; mode + uid/gid for the
PROCFS root directory.
Fixed omission in pfs_root() which came to light as a result of the above:
hold on to vnode for root dir.
 1.4 25-Aug-1993  pk Fixed improperly initialized nfsnode in pfs_lookup()
 1.3 24-Aug-1993  pk copyright update.
 1.2 24-Aug-1993  pk Rcs Id added.
 1.1 24-Aug-1993  pk Initial version of a proc filesystem.
 1.87 01-Jul-2024  christos Add linux POSIX message queue support (Ricardo Branco)
 1.86 12-May-2024  christos branches: 1.86.2;
PR/58227: Ricardo Branco: Add support for proc/sysvipc in Linux emulator
 1.85 12-May-2024  christos PR/58240: Ricardo Branco: Add support for proc/self/limits as used by Linux
 1.84 17-Jan-2024  hannken Using the exechook to revoke procfs nodes is racy and may deadlock:

one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
 1.83 17-Jan-2024  hannken Add a hashmap to access all procfs nodes by pid.
 1.82 19-Jan-2022  martin branches: 1.82.4;
Now that an inline function dereferences it, make sure struct proc
is declared by including sys/proc.h here.
 1.81 17-Jan-2022  bouyer If the calling process is running under linux emulation, make /proc/xxx/fd/
return only symlinks pointing to the original file in the filesystem,
instead of a hard link. This matches the linux behavior, and some
linux programs rely on it (they unconditionally call readlink() on
/proc/xxx/fd/yy and don't deal with it returning EINVAL).
Proposed on tech-kern@ in
http://mail-index.netbsd.org/tech-kern/2022/01/11/msg027877.html
 1.80 29-Apr-2020  riastradh Put forward declaration a little further forward to unbreak build.
 1.79 29-Apr-2020  thorpej If the procfs mount is marked as linux-compat, then allow proc lookup
by any LWP ID in the proc, not just the canonical PID.
 1.78 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.77 26-Sep-2019  christos branches: 1.77.2;
Rewrite the procfs_fileno as an inline function to make it more clear what
it does...
 1.76 25-Apr-2019  mlelstv Restore mapping of file id to pid/type/fd.
Use 64bit file id to allow for 32bit fd and 25-26bit pid.
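The mapping described in 1.76 can be pictured as bit-packing. The following is only a sketch: the field order, the exact widths, and every sk_* name are assumptions made for illustration, not the layout used by procfs_fileno in procfs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, chosen only to illustrate the idea. */
#define SK_FD_BITS	32
#define SK_PID_BITS	26
#define SK_TYPE_BITS	5

/* Pack (type, pid, fd) into one 64-bit file id: type | pid | fd. */
uint64_t
sk_fileno(uint32_t type, uint32_t pid, uint32_t fd)
{
	assert(type < (1u << SK_TYPE_BITS));
	assert(pid < (1u << SK_PID_BITS));
	return ((uint64_t)type << (SK_PID_BITS + SK_FD_BITS)) |
	    ((uint64_t)pid << SK_FD_BITS) | (uint64_t)fd;
}

/* Recover the node type from the top bits. */
uint32_t
sk_fileno_type(uint64_t id)
{
	return (uint32_t)(id >> (SK_PID_BITS + SK_FD_BITS)) &
	    ((1u << SK_TYPE_BITS) - 1);
}

/* Recover the pid from the middle bits. */
uint32_t
sk_fileno_pid(uint64_t id)
{
	return (uint32_t)(id >> SK_FD_BITS) & ((1u << SK_PID_BITS) - 1);
}

/* Recover the fd from the low 32 bits. */
uint32_t
sk_fileno_fd(uint64_t id)
{
	return (uint32_t)id;
}
```

Because each field occupies its own bit range, the id can be mapped back to pid/type/fd without a lookup table, which appears to be the property the commit restores.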
 1.75 30-Mar-2019  christos add a node for the process resource limits.
 1.74 31-Dec-2017  christos branches: 1.74.4;
rename some "cmdline" stuff now that it is used to print environment too
 1.73 31-Dec-2017  christos Add an environ node
 1.72 28-Aug-2017  kamil Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.71 30-Mar-2017  christos branches: 1.71.6;
add an auxv node.
 1.70 27-Jul-2014  hannken branches: 1.70.4; 1.70.8; 1.70.12;
Change procfs from hashlist to vcache.
- Key is (type, pid, fd)
- Remove argument "p" from procfs_allocvp(). It is only used
when "type == PFSfd". Lookup the proc with proc_find() when
procfs_loadvnode() needs it.
- Use a vfs_vnode_iterator for procfs_revoke_vnodes().
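A composite cache key such as (type, pid, fd) is sketched below. This is an illustration only: struct sk_key and the sk_* helpers are hypothetical names, and the sketch assumes the cache treats keys as opaque byte strings, which is why the structure is zeroed before its fields are set.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical key layout; not the structure used in procfs. */
struct sk_key {
	int	type;	/* node type */
	pid_t	pid;	/* owning process */
	int	fd;	/* descriptor number, or -1 if unused */
};

/* Zero the whole key first so that padding bytes compare equal. */
void
sk_key_init(struct sk_key *k, int type, pid_t pid, int fd)
{
	memset(k, 0, sizeof(*k));
	k->type = type;
	k->pid = pid;
	k->fd = fd;
}

/* Bytewise comparison, as a cache with opaque keys would do. */
int
sk_key_equal(const struct sk_key *a, const struct sk_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

With a key like this, the caller no longer needs to pass the process pointer; the loader can look the process up from the pid when it actually needs it.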
 1.69 05-Apr-2014  christos branches: 1.69.2;
On my 24 proc box I got ENOSPC, so make the routine return the size it wants
and try again.
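The "return the size it wants and try again" approach from 1.69 can be sketched as a retry loop. Everything here is hypothetical: sk_fill() stands in for whatever routine produced ENOSPC, and the data it emits is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical producer: always reports the size it needs via *need,
 * and fills buf only when len is large enough.
 */
int
sk_fill(char *buf, size_t len, size_t *need)
{
	static const char data[] = "cpu0\ncpu1\ncpu2\n";

	*need = sizeof(data);
	if (len < sizeof(data))
		return ENOSPC;
	memcpy(buf, data, sizeof(data));
	return 0;
}

/* Start with a guess, then grow the buffer to the size asked for. */
char *
sk_fill_alloc(size_t initial)
{
	size_t len = initial != 0 ? initial : 1;
	size_t need = 0;
	char *buf = NULL;

	for (;;) {
		char *nbuf = realloc(buf, len);
		if (nbuf == NULL) {
			free(buf);
			return NULL;
		}
		buf = nbuf;

		int error = sk_fill(buf, len, &need);
		if (error == 0)
			return buf;
		/* Give up on other errors or if no progress is possible. */
		if (error != ENOSPC || need <= len) {
			free(buf);
			return NULL;
		}
		len = need;
	}
}
```

The caller does not have to guess a size that fits machines with many CPUs; a too-small first guess costs one extra call.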
 1.68 28-May-2012  christos branches: 1.68.2; 1.68.4;
add a task process subdirectory for emul linux
 1.67 27-Sep-2011  christos branches: 1.67.2; 1.67.6;
define PROCFS_MAXNAMLEN and use it.
 1.66 04-Sep-2011  jmcneill PR# kern/45021: Please support /emul/linux/proc/version

Add /proc/version for procfs with -o linux. The version reported depends
on the emulation type of the calling process:

$ cat /proc/version
NetBSD version 5.99.55 (netbsd@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) NetBSD 5.99.55 (GENERIC) #39: Sun Sep 4 09:10:05 EDT 2011

$ /emul/linux/bin/cat /proc/version
Linux version 2.6.18 (linux@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010

$ /emul/linux32/bin/cat /proc/version
Linux version 2.6.18 (linux32@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010
 1.65 28-Jun-2008  rumble Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.64 24-May-2007  agc branches: 1.64.28; 1.64.32; 1.64.34; 1.64.36;
Extend the Linux emulation of /proc to include

/proc/stat
/proc/loadavg and
/proc/<pid>/statm.

These are only present when -o linux is specified as a mount option
to procfs.

Factor out some common code so that it can be used by a number of
functions.

XXX The values returned in the statm emulation need to be verified.
 1.63 09-Feb-2007  ad branches: 1.63.6; 1.63.8;
Merge newlock2 to head.
 1.62 29-Oct-2006  christos add an "emul" file node.
 1.61 25-Oct-2006  christos 1. fix procfs_validfile{,_linux} to test for NULL pointers properly.
2. make "exe" entry be a symlink to the executable, instead of pointing
directly to the vnode of the executable.
3. factor out commonly used code.
 1.60 20-Sep-2006  manu Emulate Linux's /proc/devices
 1.59 11-Dec-2005  christos branches: 1.59.20; 1.59.22;
merge ktrace-lwp.
 1.58 01-Oct-2005  atatat Add "cwd" and "root" symlinks to each process's directory. The cwd
link points to the process's current working directory, and the root
link points to the process's root directory. What else would you
expect?

For directories that are out of reach (caller is in a chroot, target
process is in a different chroot, etc), the links point to "/"
instead.
 1.57 30-Aug-2005  xtraeme Remove __P()
 1.56 20-Sep-2004  jdolecek branches: 1.56.12;
add 'mounts' file for -o linux, which lists all currently mounted
filesystems; Linux glibc statvfs() uses this to get some of mount flags,
and this file is also useful as /emul/linux/etc/mtab (via symlink)
 1.55 27-Aug-2004  skrll Do previous slightly differently - just pass a struct lwp * and derive the
struct proc *.

OK'd by Jaromir.
 1.54 21-Aug-2004  jdolecek fix process used for /proc/<pid>/stat contents - it should be process
<pid>, not the current process looking at the information
 1.53 20-May-2004  atatat Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.52 10-Dec-2003  drochner branches: 1.52.2;
a little bit more namespace sanity
 1.51 03-Oct-2003  yamt terminate snprintb 'new' format strings correctly.
(fixes overrun in mount_*)
 1.50 27-Sep-2003  mycroft Put pfsnode in the #ifdef _KERNEL too, so this actually compiles.
 1.49 27-Sep-2003  darcy Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.48 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.47 29-Jun-2003  fvdl branches: 1.47.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.46 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.45 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.44 28-May-2003  christos Add /proc/<pid>/stat for linux compat. j2sdk1.4.2 depends on it.
 1.43 18-Apr-2003  jdolecek change PROCFS_FILENO() to use 5 bits for 'type', since there are more than
16 types nowadays (i.e. Pfd is 17)
 1.42 17-Apr-2003  jdolecek use fd_getfile() in procfs_getfp(), and FILE_USE()/FILE_UNUSE() the
returned file descriptor pointer appropriately
 1.41 25-Feb-2003  jrf This addresses PR kern/19989. Thanks to hamajima@nagoya.ydc.co.jp for submitting this patch which enables /proc/uptime for linux emul. Patch reviewed by atatat@netbsd.org and tron@netbsd.org, approved by tron@netbsd.org.
 1.40 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.39 03-Jan-2003  christos Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.38 21-Sep-2002  christos MNT_GETARGS support
 1.37 09-May-2002  thorpej Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.36 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.35 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.34 29-Mar-2001  fvdl branches: 1.34.2; 1.34.4;
For -o linux mounts, add some code to emulate /proc/#/maps.
Needs NAMECACHE_ENTER_REVERSE to include filenames.
 1.33 25-Jan-2001  jdolecek branches: 1.33.2;
g/c pmnt_mp in struct procfs_args
 1.32 18-Jan-2001  jdolecek constify
 1.31 17-Jan-2001  fvdl Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.30 24-Nov-2000  chs remove dead code and other misc cleanup.
 1.29 16-Mar-2000  jdolecek branches: 1.29.4;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.28 25-Jan-2000  fvdl At mount/unmount time, add an exec hook to revoke all vnodes iff the
process is about to exec a sugid binary.

To speed up things, use hashing for vnode allocation, like other filesystems
do. This avoids walking the whole procfs node list in the revoke case too.
 1.27 02-Sep-1999  thorpej branches: 1.27.2;
Make /proc/self a symlink to /proc/curproc. I've observed Linux programs
that expect /proc/self/cmdline to exist.
 1.26 24-Mar-1999  mrg branches: 1.26.2;
completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.
 1.25 13-Mar-1999  thorpej Expose procfs_rwmem(). (This function will go away entirely when we
delete Mach VM.)
 1.24 12-Mar-1999  christos PR/7143: Jaromir Dolecek: Add procfs/cmdline from Linux emulation
 1.23 25-Jan-1999  msaitoh Add /proc/#/map. From FreeBSD.
 1.22 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.21 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.20 27-Aug-1997  thorpej Fix a reversed argument which caused procfs_checkioperm() to always return
"OK". Add a few comments to avoid further confusion.
 1.19 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.18 08-May-1997  mycroft branches: 1.18.4;
Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.17 12-Feb-1996  christos close PR/2063: procfs_rw prototyped twice with different prototypes
 1.16 09-Feb-1996  christos miscfs prototype changes
 1.15 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.14 09-Oct-1995  mycroft Add support for cookies, mostly from Greg Hudson.
 1.13 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.12 29-Oct-1994  cgd light clean; make sure headers are properly included, types are OK, etc.
 1.11 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.9 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.8 12-Apr-1994  cgd be a bit smarter about determining if files shouldn't be seen by the user.
Also, DON'T allow a lookup to succeed on a file that's not visible!
 1.7 06-Feb-1994  ws If you add files, be sure to have enough bits to encode an inode number!
 1.6 28-Jan-1994  cgd make a fpregs file.
 1.5 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.4 11-Jan-1994  ws Fix ugliness left over from my last mod
 1.3 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.2 05-Jan-1994  cgd fix UFS vs 'real' fs type mixups
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.18.4.2 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.18.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.26.2.1 01-Feb-2000  he Pull up revision 1.28 (via patch, requested by fvdl):
Close procfs security hole. Fixes SA#2000-001.
 1.27.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.27.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.27.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.27.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.27.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.4.1 30-Mar-2001  he Pull up revision 1.31 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.33.2.10 07-Jan-2003  thorpej Sync with HEAD.
 1.33.2.9 15-Oct-2002  nathanw Make _validfoo() routines go back to taking a proc.
 1.33.2.8 06-Oct-2002  thorpej Sync with HEAD.
 1.33.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.33.2.6 01-Apr-2002  nathanw procfs_domem() should take proc *, proc *; not proc *, lwp *.
 1.33.2.5 09-Jan-2002  nathanw Adapt procfs_machdep_rw() to LWPs.
 1.33.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.33.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.33.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.33.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.34.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.34.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.34.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.34.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.47.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.47.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.47.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.47.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.47.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.47.2.3 18-Aug-2004  skrll Revert to passing struct proc for {exit,exec}hook.
 1.47.2.2 03-Aug-2004  skrll Sync with HEAD
 1.47.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.52.2.1 23-May-2004  tron Pull up revision 1.53 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.56.12.4 03-Sep-2007  yamt sync with head.
 1.56.12.3 26-Feb-2007  yamt sync with head.
 1.56.12.2 30-Dec-2006  yamt sync with head.
 1.56.12.1 21-Jun-2006  yamt sync with head.
 1.59.22.2 10-Dec-2006  yamt sync with head.
 1.59.22.1 22-Oct-2006  yamt sync with head
 1.59.20.2 18-Nov-2006  ad Sync with head.
 1.59.20.1 17-Nov-2006  ad Checkpoint work in progress.
 1.63.8.1 11-Jul-2007  mjf Sync with head.
 1.63.6.1 08-Jun-2007  ad Sync with head.
 1.64.36.1 03-Jul-2008  simonb Sync with head.
 1.64.34.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.64.32.1 04-May-2009  yamt sync with head.
 1.64.28.1 29-Jun-2008  mjf Sync with HEAD.
 1.67.6.1 02-Jun-2012  mrg sync to latest -current.
 1.67.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.67.2.1 30-Oct-2012  yamt sync with head
 1.68.4.1 18-May-2014  rmind sync with head
 1.68.2.2 03-Dec-2017  jdolecek update from HEAD
 1.68.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.69.2.1 10-Aug-2014  tls Rebase.
 1.70.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.70.8.1 26-Apr-2017  pgoyette Sync with HEAD
 1.70.4.1 28-Aug-2017  skrll Sync with HEAD
 1.71.6.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow reading Debug Registers unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still prompt these values to store
in an internal cache with context of registers. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.74.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.74.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.74.4.1 10-Jun-2019  christos Sync with HEAD
 1.77.2.1 17-Jan-2020  ad Sync with head.
 1.82.4.1 18-Apr-2024  martin Pull up following revision(s) (requested by hannken in ticket #668):

sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117

Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
Protect kernel hooks exechook, exithook and forkhook with rwlock.

Lock as writer on establish/disestablish and as reader on list traverse.

For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"

Add a hashmap to access all procfs nodes by pid.

Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Remove all procfs nodes for this process on process exit.
 1.86.2.1 02-Aug-2025  perseant Sync with HEAD
 1.4 27-Sep-2019  christos Instead of casting to size_t, cast to uintmax_t to prevent truncation
(pointed out by chuq). In all these cases uio_offset can't be negative.
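The difference between the two casts in 1.3 and 1.4 can be shown with a small sketch. It assumes a platform where size_t is narrower than off_t; to make the effect visible on any host, a 32-bit typedef (sk_size32_t, a hypothetical name) stands in for size_t and int64_t stands in for off_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a 32-bit size_t so the truncation shows on any host. */
typedef uint32_t sk_size32_t;

/* Narrowing cast: the high bits of the offset are lost before comparing. */
bool
sk_past_end_narrow(int64_t offset, sk_size32_t len)
{
	return (sk_size32_t)offset >= len;
}

/* Widening cast: both sides become uintmax_t, so no bits are dropped. */
bool
sk_past_end_wide(int64_t offset, sk_size32_t len)
{
	assert(offset >= 0);	/* the commit notes the offset can't be negative */
	return (uintmax_t)offset >= (uintmax_t)len;
}
```

For an offset of 2^32 + 4 and a length of 16, the narrowing version sees the offset as 4 and reports it as inside the buffer, while the widening version correctly reports it as past the end.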
 1.3 26-Sep-2019  christos fix sign-compare issues: uio->uio_offset (off_t) is compared with (size_t):
cast the offset to size_t.
 1.2 30-Mar-2017  christos branches: 1.2.4; 1.2.6; 1.2.14; 1.2.18; 1.2.22;
remove comment.
 1.1 30-Mar-2017  christos add an auxv node.
 1.2.22.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.2.18.2 03-Dec-2017  jdolecek update from HEAD
 1.2.18.1 30-Mar-2017  jdolecek file procfs_auxv.c was added on branch tls-maxphys on 2017-12-03 11:38:48 +0000
 1.2.14.2 28-Aug-2017  skrll Sync with HEAD
 1.2.14.1 30-Mar-2017  skrll file procfs_auxv.c was added on branch nick-nhusb on 2017-08-28 17:53:09 +0000
 1.2.6.2 26-Apr-2017  pgoyette Sync with HEAD
 1.2.6.1 30-Mar-2017  pgoyette file procfs_auxv.c was added on branch pgoyette-localcount on 2017-04-26 02:53:28 +0000
 1.2.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.2.4.1 30-Mar-2017  bouyer file procfs_auxv.c was added on branch bouyer-socketcan on 2017-04-21 16:54:04 +0000
 1.33 18-May-2024  thorpej Remove unnecessary include of <sys/malloc.h>.
 1.32 27-Sep-2019  christos Instead of casting to size_t, cast to uintmax_t to prevent truncation
(pointed out by chuq). In all these cases uio_offset can't be negative.
 1.31 26-Sep-2019  christos fix sign-compare issues: uio->uio_offset (off_t) is compared with (size_t):
cast the offset to size_t.
 1.30 31-Dec-2017  christos branches: 1.30.4;
rename some "cmdline" stuff now that it is used to print environment too
 1.29 31-Dec-2017  christos Add an environ node
 1.28 04-Mar-2011  joerg Refactor ps_strings access. Based on PK_32, write either the normal
version or the 32bit compat layout in execve1. Introduce a new function
copyin_psstrings for reading it back from userland and converting it to
the native layout. Refactor procfs to share most of the code with the
kern.proc_args sysctl handler.

This material is based upon work partially supported by
The NetBSD Foundation under a contract with Joerg Sonnenberger.
 1.27 28-Apr-2008  martin branches: 1.27.22; 1.27.28; 1.27.30;
Remove clause 3 and 4 from TNF licenses
 1.26 17-Feb-2007  pavel branches: 1.26.38; 1.26.40; 1.26.42;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.25 09-Feb-2007  ad branches: 1.25.2;
Merge newlock2 to head.
 1.24 28-Dec-2006  elad PR/32877: Geoff C. Wing: mount_procfs(8) doesn't null-terminate cmdline
output

Patch applied, thanks!
 1.23 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.22 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.21 01-Mar-2006  yamt branches: 1.21.14; 1.21.16;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.20 11-Dec-2005  christos branches: 1.20.2; 1.20.4; 1.20.6;
merge ktrace-lwp.
 1.19 26-Feb-2005  perry branches: 1.19.4;
nuke trailing whitespace
 1.18 22-Apr-2004  itojun branches: 1.18.4; 1.18.6;
sprintf -> snprintf
 1.17 29-Jun-2003  fvdl branches: 1.17.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.16 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.15 07-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.14 09-May-2002  thorpej Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.13 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.12 10-Nov-2001  lukem add RCSIDs
 1.11 28-Sep-2000  eeh branches: 1.11.2; 1.11.4; 1.11.8;
Add support for variable end of user stacks needed to support COMPAT_NETBSD32:

`struct vmspace' has a new field `vm_minsaddr' which is the user TOS.

PS_STRINGS is deprecated in favor of curproc->p_pstr which is derived
from `vm_minsaddr'.

Bump the kernel version number.
 1.10 26-Sep-2000  thorpej PHOLD/PRELE around uvm_io() to user address space is unnecessary. There
is nothing in the U-area that we need.
 1.9 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.8 01-Jun-2000  simonb branches: 1.8.2;
Fix a possible kernel memory leak - if the cmdline of a process was
requested after it had started to exit but before it became a zombie
a page of kernel memory wouldn't be free'd.
 1.7 16-May-2000  simonb branches: 1.7.2;
Apply patch from Robert Elz in PR kern/10113. This fixes two problems
with procfs's cmdline - from the PR:

The cmdline implementation in procfs is bogus. It's possible that
part of the fix is a workaround of a UVM problem - that is, when
(internally) accessing the top of the process VM (the end of the
args) a request for I/O of a PAGE_SIZE'd block starting at less
than a PAGE_SIZE from the end of the mem space returns EINVAL
rather than the data that is available. Whether this is a bug
in UVM or not depends upon how it is defined to work, and I was
unable to determine that. (Simon Burge found that problem, and
provided the basis of the workaround/fix).

Then, the cmdline function is unable to read more than one
page of args, and a good thing too, as the way it is written
attempting to get more than that would reference into lala land.

And, on an attempt to read a lot of data when the above is
fixed, most of the data won't be returned, only the final block
of any read.

Tested on alpha, pmax, i386 and sparc.
 1.6 22-Jul-1999  thorpej branches: 1.6.2;
Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
 1.5 27-Apr-1999  thorpej Fix excessive memory usage, and fix handling of SZOMB processes. PR #7164,
Jaromir Dolecek.
 1.4 24-Mar-1999  mrg branches: 1.4.2;
completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.3 13-Mar-1999  thorpej malloc the arg temporary buffer, rather than declaring it as an automatic
array of ARG_MAX size. ARG_MAX is currently 256k, which causes a rather
serious stack overflow (kernel stacks are not very large, usually 8k).

Fixes memory corruption problems observed after accessing /proc/1/cmdline
during tests. Problem in my case manifested itself as massive lossage
in ffs_sync(), resulting in a crash, and sometimes, pooched file systems.

XXX This could, and probably should, be rewritten to use a much smaller
temporary buffer, and a loop around uiomove().
 1.2 13-Mar-1999  thorpej Some changes to `cmdline' to make it work properly:
- Don't error out on P_SYSTEM or SZOMB processes; instead, do what ps(1)
would do, i.e. the p_comm in parentheses.
- Use uvm_io() (or procfs_rwmem() if !UVM) to read the target process's
psstrings and argument vector. Using copyin() is problematic, because
it operates on the current process! That is, the old code would
always get the `cmdline' of the process reading the file, not that of
the target process.
 1.1 12-Mar-1999  christos PR/7143: Jaromir Dolecek: Add procfs/cmdline from Linux emulation
 1.4.2.2 01-Jun-2000  he Pull up revision 1.8 (requested by simonb):
Fix a possible kernel memory leak - if the command line of a
process was requested after it had started to exit but before it
became a zombie a page of kernel memory would not be freed.
 1.4.2.1 27-Apr-1999  perry branches: 1.4.2.1.2;
pullup 1.4->1.5 (thorpej)
 1.4.2.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.4.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.2.1 18-Oct-2000  tv Pullup by patch [eeh]:
Support userspace at multiple addresses by making PSSTRINGS variable (using
p_psstr), and fix stackgap_init() appropriately.
 1.11.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.11.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.2.4 11-Nov-2002  nathanw Catch up to -current
 1.11.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.11.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.11.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.17.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.2 03-Aug-2004  skrll Sync with HEAD
 1.17.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.18.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.18.4.1 29-Apr-2005  kent sync with -current
 1.19.4.3 26-Feb-2007  yamt sync with head.
 1.19.4.2 30-Dec-2006  yamt sync with head.
 1.19.4.1 21-Jun-2006  yamt sync with head.
 1.20.6.1 22-Apr-2006  simonb Sync with head.
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.20.2.1 15-Jan-2006  yamt convert procfs.
 1.21.16.2 10-Dec-2006  yamt sync with head.
 1.21.16.1 22-Oct-2006  yamt sync with head
 1.21.14.3 12-Jan-2007  ad Sync with head.
 1.21.14.2 18-Nov-2006  ad Sync with head.
 1.21.14.1 17-Nov-2006  ad Checkpoint work in progress.
 1.25.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.26.42.1 16-May-2008  yamt sync with head.
 1.26.40.1 18-May-2008  yamt sync with head.
 1.26.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.30.1 05-Mar-2011  bouyer Sync with HEAD
 1.27.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.27.22.1 05-Mar-2011  rmind sync with head
 1.30.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.49 28-Aug-2017  kamil Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
application use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.48 04-Apr-2016  christos branches: 1.48.10;
Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
 1.47 21-Oct-2009  rmind branches: 1.47.22; 1.47.40;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.46 14-Mar-2009  dsl ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.45 24-Apr-2008  ad branches: 1.45.2; 1.45.10; 1.45.16;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.44 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.43 23-Jan-2008  elad branches: 1.43.6; 1.43.8;
Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
 1.42 07-Nov-2007  ad branches: 1.42.6;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.41 09-Jul-2007  ad branches: 1.41.6; 1.41.8; 1.41.12; 1.41.14;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.40 09-Mar-2007  ad branches: 1.40.2; 1.40.4;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.39 09-Feb-2007  ad branches: 1.39.2;
Merge newlock2 to head.
 1.38 19-Dec-2006  elad Some changes to get rid of another KAUTH_GENERIC_ISSUSER usage:
- Make procfs_control() in procfs_ctl.c static,
- Add an argument to the above, 'pfs', for the pfsnode,
- Add another request type to KAUTH_PROCESS_CANPROCFS named
KAUTH_REQ_PROCESS_CANPROCFS_CTL (and update documentation),
- Use the above combination in a call to kauth_authorize_process().
 1.37 22-Nov-2006  elad branches: 1.37.2;
Remove redundant securelevel check; this is already done in procfs_rw()
and we can't get here (procfs_control()) without being there first.

Pointed out by yamt@.
 1.36 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.35 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.34 03-Sep-2006  christos branches: 1.34.2; 1.34.4;
add missing initializers
 1.33 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.32 14-May-2006  elad integrate kauth.
 1.31 05-Mar-2006  christos branches: 1.31.2; 1.31.4;
cleanup more SET/CLR/ISSET lossage
 1.30 11-Dec-2005  christos branches: 1.30.4; 1.30.6; 1.30.8;
merge ktrace-lwp.
 1.29 30-Aug-2005  xtraeme Remove __P()
 1.28 26-Feb-2005  perry branches: 1.28.4;
nuke trailing whitespace
 1.27 07-Aug-2003  agc branches: 1.27.8; 1.27.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.26 29-Jun-2003  fvdl branches: 1.26.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.25 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.24 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.23 25-Jul-2002  jdolecek branches: 1.23.2;
Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.

Reviewed by Christos Zoulas.
 1.22 11-Jan-2002  christos branches: 1.22.8; 1.22.10;
Apply the same P_INEXEC test to avoid the execve/trace problem using
the procfs ptrace calls.
 1.21 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.20 10-Nov-2001  lukem add RCSIDs
 1.19 18-Jan-2001  jdolecek branches: 1.19.2; 1.19.4; 1.19.8;
constify
 1.18 20-Aug-2000  thorpej Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.
 1.17 22-Jul-1999  thorpej branches: 1.17.2; 1.17.12;
Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
 1.16 28-Apr-1997  mycroft branches: 1.16.16; 1.16.18;
Reinstate P_FSTRACE, with different semantics:
* Never send a SIGCHLD to the parent if P_FSTRACE is set.
* Do not permit mixing ptrace(2) and procfs; only permit using the one that
was attached.
 1.15 28-Apr-1997  mycroft Fix several deficiencies, as compared to ptrace(2):
* Did not check for P_SUGID on ATTACH.
* Did not check for tracing of init on ATTACH.
* Did not turn off single-step mode on RUN or DETACH.
* Might have screwed up reparenting in some cases.
* Allowed anyone to detach the process.
 1.14 09-Feb-1996  christos miscfs prototype changes
 1.13 13-Aug-1995  mycroft Lock the process in core before operating on it.
 1.12 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.11 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.10 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.9 07-May-1994  cgd setrun rename
 1.8 04-May-1994  cgd Rename a lot of process flags.
 1.7 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.6 09-Jan-1994  cgd fix some of my more recent botches, and clean up slightly.
 1.5 09-Jan-1994  cgd oops. fix that last
 1.4 09-Jan-1994  cgd minor cleanup; kill a few assignments
 1.3 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.2 08-Jan-1994  cgd reorganization of ptrace/procfs code
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.16.18.1 02-Aug-1999  thorpej Update from trunk.
 1.16.16.1 14-Jan-2002  he Pull up revision 1.22 (requested by he):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.17.12.1 12-Jan-2002  he Pull up revision 1.22 (requested by christos):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.17.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.17.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.19.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.19.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.19.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.19.2.4 11-Jan-2002  nathanw More catchup.
 1.19.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.19.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.19.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.22.10.1 29-Jul-2002  lukem Pull up revision 1.23 (requested by jdolecek in ticket #557):
Make sure that the pointer to old parent process for ptraced children
gets reset properly when the old parent exits before the child. A flag
is set in old parent process when the child is reparented in ptrace(2).
If it's set when process is exiting, all running processes have their
'old parent process' pointer checked and reset if appropriate. Also
change to use 'struct proc *' pointer directly, rather than pid_t.
This fixes security/14444 by David Sainty.
Reviewed by Christos Zoulas.
 1.22.8.1 29-Aug-2002  gehenna catch up with -current.
 1.23.2.1 18-Dec-2002  gmcgarry Merge pcred and ucred, and poolify. TBD: check backward compatibility
and factor-out some higher-level functionality.
 1.26.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.26.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.26.2.2 03-Aug-2004  skrll Sync with HEAD
 1.26.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.27.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.27.8.1 29-Apr-2005  kent sync with -current
 1.28.4.6 04-Feb-2008  yamt sync with head.
 1.28.4.5 15-Nov-2007  yamt sync with head.
 1.28.4.4 03-Sep-2007  yamt sync with head.
 1.28.4.3 26-Feb-2007  yamt sync with head.
 1.28.4.2 30-Dec-2006  yamt sync with head.
 1.28.4.1 21-Jun-2006  yamt sync with head.
 1.30.8.4 03-Sep-2006  yamt sync with head.
 1.30.8.3 11-Aug-2006  yamt sync with head
 1.30.8.2 24-May-2006  yamt sync with head.
 1.30.8.1 13-Mar-2006  yamt sync with head.
 1.30.6.2 01-Jun-2006  kardel Sync with head.
 1.30.6.1 22-Apr-2006  simonb Sync with head.
 1.30.4.1 09-Sep-2006  rpaulo sync with head
 1.31.4.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.31.2.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.31.2.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.31.2.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.34.4.3 21-Dec-2006  yamt sync with head.
 1.34.4.2 10-Dec-2006  yamt sync with head.
 1.34.4.1 22-Oct-2006  yamt sync with head
 1.34.2.5 12-Jan-2007  ad Sync with head.
 1.34.2.4 29-Dec-2006  ad Checkpoint work in progress.
 1.34.2.3 18-Nov-2006  ad Sync with head.
 1.34.2.2 17-Nov-2006  ad Checkpoint work in progress.
 1.34.2.1 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.37.2.1 04-Jan-2007  bouyer Pull up following revision(s) (requested by hubert in ticket #334):
share/man/man9/kauth.9: revision 1.39
sys/miscfs/procfs/procfs_ctl.c: revision 1.38
sys/sys/kauth.h: revision 1.27
Some changes to get rid of another KAUTH_GENERIC_ISSUSER usage:
- Make procfs_control() in procfs_ctl.c static,
- Add an argument to the above, 'pfs', for the pfsnode,
- Add another request type to KAUTH_PROCESS_CANPROCFS named
KAUTH_REQ_PROCESS_CANPROCFS_CTL (and update documentation),
- Use the above combination in a call to kauth_authorize_process().
 1.39.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.40.4.1 11-Jul-2007  mjf Sync with head.
 1.40.2.3 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
 1.40.2.2 15-Jul-2007  ad Sync with head.
 1.40.2.1 05-Apr-2007  ad Compile fixes.
 1.41.14.2 18-Feb-2008  mjf Sync with HEAD.
 1.41.14.1 19-Nov-2007  mjf Sync with HEAD.
 1.41.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.41.8.2 23-Mar-2008  matt sync with HEAD
 1.41.8.1 08-Nov-2007  matt sync with -HEAD
 1.41.6.1 11-Nov-2007  joerg Sync with HEAD.
 1.42.6.1 23-Jan-2008  bouyer Sync with HEAD.
 1.43.8.1 18-May-2008  yamt sync with head.
 1.43.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.45.16.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.45.10.1 28-Apr-2009  skrll Sync with HEAD.
 1.45.2.2 11-Mar-2010  yamt sync with head
 1.45.2.1 04-May-2009  yamt sync with head.
 1.47.40.1 22-Apr-2016  skrll Sync with HEAD
 1.47.22.1 03-Dec-2017  jdolecek update from HEAD
 1.48.10.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
application use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow Debug Registers to be read unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still prompt these values to store
in an internal cache with context of registers. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.14 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.13 21-Mar-2008  ad branches: 1.13.2; 1.13.4;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.12 07-Nov-2007  ad branches: 1.12.14;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.11 09-Feb-2007  ad branches: 1.11.6; 1.11.18; 1.11.20; 1.11.24; 1.11.26;
Merge newlock2 to head.
 1.10 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.9 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.8 23-Jul-2006  ad branches: 1.8.4; 1.8.6;
Use the LWP cached credentials where sane.
 1.7 14-May-2006  elad integrate kauth.
 1.6 11-Dec-2005  christos branches: 1.6.4; 1.6.6; 1.6.8; 1.6.10; 1.6.12;
merge ktrace-lwp.
 1.5 29-Jun-2003  fvdl branches: 1.5.2; 1.5.18;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.4 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.3 08-May-2003  nakayama Add breaks which were forgotten in rev. 1.2 change.
Inspired from a report by HIRATSUKA Kouichirou in tech-pkg-ja mailing list.
 1.2 17-Apr-2003  jdolecek use fd_getfile() in procfs_getfp(), and FILE_USE()/FILE_UNUSE() the
returned file descriptor pointer appropriately
 1.1 03-Jan-2003  christos branches: 1.1.2;
Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.1.2.2 07-Jan-2003  thorpej Sync with HEAD.
 1.1.2.1 03-Jan-2003  thorpej file procfs_fd.c was added on branch nathanw_sa on 2003-01-07 21:41:13 +0000
 1.5.18.5 24-Mar-2008  yamt sync with head.
 1.5.18.4 15-Nov-2007  yamt sync with head.
 1.5.18.3 26-Feb-2007  yamt sync with head.
 1.5.18.2 30-Dec-2006  yamt sync with head.
 1.5.18.1 21-Jun-2006  yamt sync with head.
 1.5.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.6.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.6.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.6.8.2 11-Aug-2006  yamt sync with head
 1.6.8.1 24-May-2006  yamt sync with head.
 1.6.6.1 01-Jun-2006  kardel Sync with head.
 1.6.4.1 09-Sep-2006  rpaulo sync with head
 1.8.6.2 10-Dec-2006  yamt sync with head.
 1.8.6.1 22-Oct-2006  yamt sync with head
 1.8.4.3 18-Nov-2006  ad Sync with head.
 1.8.4.2 17-Nov-2006  ad Checkpoint work in progress.
 1.8.4.1 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.11.26.1 19-Nov-2007  mjf Sync with HEAD.
 1.11.24.1 13-Nov-2007  bouyer Sync with HEAD
 1.11.20.1 08-Nov-2007  matt sync with -HEAD
 1.11.18.1 11-Nov-2007  joerg Sync with HEAD.
 1.11.6.1 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
 1.12.14.2 02-Jun-2008  mjf Sync with HEAD.
 1.12.14.1 03-Apr-2008  mjf Sync with HEAD.
 1.13.4.1 16-May-2008  yamt sync with head.
 1.13.2.1 18-May-2008  yamt sync with head.
 1.17 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.16 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.15 11-Dec-2005  christos branches: 1.15.20; 1.15.22;
merge ktrace-lwp.
 1.14 07-Aug-2003  agc branches: 1.14.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 29-Jun-2003  fvdl branches: 1.13.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.12 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.11 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.10 09-May-2002  thorpej Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.9 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.8 10-Nov-2001  lukem add RCSIDs
 1.7 17-Jan-2001  fvdl branches: 1.7.2; 1.7.4; 1.7.8;
Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.6 27-Aug-1997  thorpej branches: 1.6.18; 1.6.28;
Fix a reversed argument which caused procfs_checkioperm() to always return
"OK". Add a few comments to avoid further confusion.
 1.5 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.4 13-Aug-1995  mycroft branches: 1.4.14;
Lock the process in core before operating on it.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.4.14.2 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.4.14.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.6.28.1 30-Mar-2001  he Pull up revision 1.7 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.6.18.1 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.7.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.7.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.7.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.7.2.6 15-Oct-2002  nathanw Make _validfoo() routines go back to taking a proc.
 1.7.2.5 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.7.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.7.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.7.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.7.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 03-Aug-2004  skrll Sync with HEAD
 1.13.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.14.16.2 30-Dec-2006  yamt sync with head.
 1.14.16.1 21-Jun-2006  yamt sync with head.
 1.15.22.2 10-Dec-2006  yamt sync with head.
 1.15.22.1 22-Oct-2006  yamt sync with head
 1.15.20.1 18-Nov-2006  ad Sync with head.
 1.5 12-May-2024  christos PR/58240: Ricardo Branco: Add support for proc/self/limits as used by Linux
 1.4 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.3 27-Sep-2019  christos Instead of casting to size_t, cast to uintmax_t to prevent truncation
(pointed out by chuq). In all these cases uio_offset can't be negative.
 1.2 26-Sep-2019  christos fix sign-compare issues: uio->uio_offset (off_t) is compared with (size_t):
cast the offset to size_t.
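The difference between the two casts described in revisions 1.2 and 1.3 above can be sketched as follows. This is an illustration only, not code from procfs_limit.c: `int64_t` stands in for `off_t`, `uint32_t` stands in for a 32-bit `size_t`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: int64_t stands in for off_t and uint32_t for a
 * 32-bit size_t. These helpers are hypothetical, not the procfs code.
 */

/* Narrowing cast (the rev. 1.2 approach): high bits of the offset are lost. */
static int
past_end_narrow(int64_t offset, uint32_t len)
{
	return (uint32_t)offset >= len;
}

/* Widening cast (the rev. 1.3 approach): both operands keep their value. */
static int
past_end_wide(int64_t offset, uint32_t len)
{
	return (uintmax_t)offset >= (uintmax_t)len;
}

/* An offset of 2^32 truncates to 0 when narrowed, so the check is fooled. */
void
check_examples(void)
{
	assert(past_end_narrow(INT64_C(0x100000000), 100) == 0);
	assert(past_end_wide(INT64_C(0x100000000), 100) == 1);
}
```

The widening form is only safe because, as the log message notes, the offset cannot be negative in these paths.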
 1.1 30-Mar-2019  christos branches: 1.1.4;
add a node for the process resource limits.
 1.1.4.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.1.4.2 10-Jun-2019  christos Sync with HEAD
 1.1.4.1 30-Mar-2019  christos file procfs_limit.c was added on branch phil-wifi on 2019-06-10 22:09:06 +0000
 1.90 14-Sep-2024  pgoyette Define dependencies based on build options.
 1.89 01-Jul-2024  christos Add linux POSIX message queue support (Ricardo Branco)
 1.88 12-May-2024  christos branches: 1.88.2;
PR/58227: Ricardo Branco: Add support for proc/sysvipc in Linux emulator
 1.87 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.86 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.85 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.84 31-May-2020  rin struct statvfs is too large for stack. Use malloc(9) instead.

XXX
Switch to kmem(9) for this entire file.

Frame size, e.g. for m68k, becomes:
3292 --> 12
 1.83 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.82 20-Apr-2020  martin Add missing include of <sys/atomic.h> to fix the build
 1.81 19-Apr-2020  thorpej - Only increment nprocs when we're creating a new process, not just
when allocating a PID.
- Per above, proc_free_pid() no longer decrements nprocs. It's now done
in proc_free() right after proc_free_pid().
- Ensure nprocs is accessed using atomics everywhere.
 1.80 02-Jan-2020  thorpej branches: 1.80.6;
- Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.79 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.78 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.77 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.76 07-Sep-2019  chs have procfs_do_pid_stat() pass the proc's map to get_proc_size_info(),
rather than having the latter look up the map again and not check
for an error.
 1.75 23-Aug-2019  maxv Fix info leaks.
 1.74 05-Dec-2018  christos branches: 1.74.4;
As discussed in tech-kern:

- make sysctl kern.expose_address tri-state:
0: no access
1: access to processes with open /dev/kmem
2: access to everyone
defaults:
0: KASLR kernels
1: non-KASLR kernels

- improve efficiency by calling get_expose_address() per sysctl, not per
process.

- don't expose addresses for linux procfs

- welcome to 8.99.27, changes to fill_*proc ABI
 1.73 13-Apr-2017  hannken branches: 1.73.4; 1.73.10; 1.73.12;
Switch procfs_domounts() to mountlist iterator.
 1.72 28-Mar-2016  mlelstv branches: 1.72.2; 1.72.4;
Align /proc/<pid>/statm data with /proc/<pid>/stat and
provide RSS information. There is no data about shared
pages.

Helps PR 50801.
 1.71 24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.70 10-Aug-2014  matt branches: 1.70.2; 1.70.4; 1.70.10;
#include <sys/cpu.h>
 1.69 12-Jul-2014  njoly Use kproc2 to provide sensible information for /proc/<pid>/stat.
 1.68 30-Jun-2014  njoly Use NZERO instead of hard-coded "20" value.
 1.67 05-Apr-2014  christos branches: 1.67.2;
On my 24 proc box I got ENOSPC, so make the routine return the size it wants
and try again.
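The "return the size it wants and try again" pattern from revision 1.67 above might look roughly like this. It is a minimal user-space sketch under stated assumptions: `fill_info()` and `get_info()` are hypothetical names, and the real procfs routine and its buffer handling are not shown in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical producer: if the buffer is too small it stores the size it
 * needs in *lenp and returns ENOSPC; otherwise it fills the buffer.
 */
static int
fill_info(char *buf, size_t *lenp)
{
	static const char data[] = "cpu0\ncpu1\ncpu2\n";
	size_t need = sizeof(data);

	if (*lenp < need) {
		*lenp = need;
		return ENOSPC;
	}
	memcpy(buf, data, need);
	*lenp = need;
	return 0;
}

/* Hypothetical caller: grow the buffer to the reported size and retry. */
static char *
get_info(size_t initial, size_t *outlen)
{
	size_t len = initial != 0 ? initial : 1;
	char *buf = NULL;
	int error;

	do {
		char *nbuf = realloc(buf, len);
		if (nbuf == NULL) {
			free(buf);
			return NULL;
		}
		buf = nbuf;
		error = fill_info(buf, &len);
	} while (error == ENOSPC);

	if (error != 0) {
		free(buf);
		return NULL;
	}
	*outlen = len;
	return buf;
}

/* Starting with a deliberately small buffer forces exactly one retry. */
void
check_get_info(void)
{
	size_t n = 0;
	char *p = get_info(4, &n);

	assert(p != NULL && n == 16);
	free(p);
}
```

Having the producer report the required size avoids guessing a fixed upper bound, which is what failed on the machine with many CPUs described in the log entry.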
 1.66 27-Nov-2013  christos Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.65 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.64 19-Dec-2011  christos branches: 1.64.6; 1.64.10;
don't produce different output if we are super user.
 1.63 16-Dec-2011  christos provide a root entry if one was not found.
 1.62 15-Dec-2011  christos PR/45700: use dostatvfs instead of grabbing the latest cached copy of
struct statvfs from the mount point, so that chroot is handled properly.
 1.61 04-Sep-2011  jmcneill branches: 1.61.2; 1.61.6;
PR# kern/45021: Please support /emul/linux/proc/version

Add /proc/version for procfs with -o linux. The version reported depends
on the emulation type of the calling process:

$ cat /proc/version
NetBSD version 5.99.55 (netbsd@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) NetBSD 5.99.55 (GENERIC) #39: Sun Sep 4 09:10:05 EDT 2011

$ /emul/linux/bin/cat /proc/version
Linux version 2.6.18 (linux@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010

$ /emul/linux32/bin/cat /proc/version
Linux version 2.6.18 (linux32@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010
 1.60 28-Aug-2011  jmcneill both LINUX_USRSTACK32 and USRSTACK32 need to be defined for linux32
 1.59 20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.58 19-Oct-2009  dholland branches: 1.58.4;
Avoid leaking pages. Fixes PR 42053 from SHIMIZU Ryo.
 1.57 11-Jan-2009  christos this change was somehow missed.
 1.56 11-Jan-2009  christos merge christos-time_t
 1.55 29-Dec-2008  pooka Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.
 1.54 31-May-2008  ad branches: 1.54.6; 1.54.8; 1.54.14;
Kill devsw_lock and just use specfs_lock. The two would need merging
in order to prevent unload of modules when a device that they provide
is still open.
 1.53 06-May-2008  ad branches: 1.53.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.52 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.51 29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.50 24-Apr-2008  ad branches: 1.50.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.49 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.48 30-Jan-2008  ad branches: 1.48.6; 1.48.8; 1.48.10;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.47 22-Dec-2007  yamt procfs_douptime: simply use microuptime() instead of a mysterious calculation.
 1.46 22-Dec-2007  yamt procfs_docpustat: g/c a write-only variable.
 1.45 12-Nov-2007  ad branches: 1.45.2; 1.45.6;
Revision 1.42 was lost. Pointed out by Nicolas Joly:

This was using mutex_exit where mutex_enter was required.
 1.44 11-Nov-2007  christos report the proper stack size on 32 bit emulations.
 1.43 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.42 11-Oct-2007  ad branches: 1.42.2; 1.42.4;
This was using mutex_exit where mutex_enter was required.
 1.41 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.40 08-Oct-2007  ad Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
 1.39 26-May-2007  agc branches: 1.39.6; 1.39.8; 1.39.10;
In /proc/<pid>/statm, avoid leaking buffer space if the attempt to get
vmspace information fails.

Return the nice value properly to userland via the /proc/<pid>/stat entry.

Use vm sizes from vmspace, rather than rusage structs, for the same
reasons as mentioned previously - see the comment in
kvm_proc.c::kvm_getproc2() about rusage values and zombie processes.
 1.38 25-May-2007  agc Use a bit more common code for the MULTIPROCESSOR and !MULTIPROCESSOR
cases.

Use the lwp's priority when returning the priority value, rather than
returning the nice value.
 1.37 25-May-2007  agc Various changes for better Linux emulation:

+ in /proc/<pid>/statm emulation, use the memory values from vmspace,
rather than struct rusage, since the rusage values appear to be 0 for
all processes except zombies. cf dsl's comment in
kvm_proc.c::kvm_getproc2()

+ in /proc/<pid>/stat, instead of returning the tv_sec value, return the
number of ticks we've had (roughly equivalent to the Linux jiffies).
Calculate these values from the tv_usec values.

Also:

+ enclose CPU_INFO_ITERATOR and CPU_INFO_FOREACH usage in #ifdef
MULTIPROCESSOR, at the request of Nick Hudson

Together, these changes allow htop to work on NetBSD.
 1.36 24-May-2007  dogcow use PRIu64, not llu, to unbork on 64-bit platforms.
 1.35 24-May-2007  agc Extend the Linux emulation of /proc to include

/proc/stat
/proc/loadavg and
/proc/<pid>/statm.

These are only present when -o linux is specified as a mount option
to procfs.

Factor out some common code so that it can be used by a number of
functions.

XXX The values returned in the statm emulation need to be verified.
 1.34 01-Apr-2007  christos return a page less than the actual top of stack so that linux-java works.
 1.33 09-Mar-2007  ad branches: 1.33.2; 1.33.4;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.32 09-Feb-2007  ad branches: 1.32.2;
Merge newlock2 to head.
 1.31 24-Dec-2006  elad Add two comments. No functional change.
 1.30 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.29 27-Oct-2006  christos don't allocate large buffers on the stack.
 1.28 23-Oct-2006  elad PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat

Patch applied, thanks for the report!
 1.27 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.26 20-Sep-2006  manu Emulate Linux's /proc/devices
 1.25 24-Jun-2006  christos branches: 1.25.4; 1.25.6;
PR/33815: Nicolas Joly: /emul/linux/proc/#/stat always report current
process status
 1.24 11-Dec-2005  christos branches: 1.24.4; 1.24.8; 1.24.16;
merge ktrace-lwp.
 1.23 29-May-2005  christos branches: 1.23.2;
- sprinkle const
- avoid shadowed variables.
 1.22 01-Mar-2005  christos branches: 1.22.2; 1.22.4;
Remove bogus len setting noted by J. Chapman Flack.
 1.21 27-Feb-2005  christos Give more space for cpu info and allocate it dynamically.
 1.20 26-Feb-2005  perry nuke trailing whitespace
 1.19 20-Sep-2004  jdolecek branches: 1.19.4; 1.19.6;
add 'mounts' file for -o linux, which lists all currently mounted
filesystems; Linux glibc statvfs() uses this to get some of mount flags,
and this file is also useful as /emul/linux/etc/mtab (via symlink)
 1.18 27-Aug-2004  skrll Do previous slightly differently - just pass a struct lwp * and derive the
struct proc *.

OK'd by Jaromir.
 1.17 21-Aug-2004  jdolecek fix process used for /proc/<pid>/stat contents - it should be process
<pid>, not the current process looking at the information
 1.16 22-Apr-2004  itojun sprintf -> snprintf
 1.15 30-Oct-2003  christos branches: 1.15.2;
t_pgrp can be null.
 1.14 21-Aug-2003  he Add casts of LINUX_USRSTACK and USRSTACK to handle the cases
where these are not constants.
 1.13 09-Aug-2003  christos LINUX_USRSTACK is only defined on i386. Thanks Izumi!
 1.12 09-Aug-2003  christos Only choose the linux usrstack if the netbsd usrstack was higher.
 1.11 09-Aug-2003  christos Change the way we compute the top of the stack. This makes java-1.4.2 work.
 1.10 29-Jun-2003  fvdl branches: 1.10.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.9 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.8 29-May-2003  hannken Change "%qu" to "PRIu64" to make it compile on sparc64.
 1.7 28-May-2003  christos Add /proc/<pid>/stat for linux compat. j2sdk1.4.2 depends on it.
 1.6 27-Feb-2003  hannken Change "%llu" to "PRIu64" to make it compile on sparc64.
 1.5 25-Feb-2003  jrf This addresses PR kern/19989. Thanks to hamajima@nagoya.ydc.co.jp for submitting this patch which enables /proc/uptime for linux emul. Patch reviewed by atatat@netbsd.org and tron@netbsd.org, approved by tron@netbsd.org.
 1.4 09-Dec-2001  chs replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names.
 1.3 10-Nov-2001  lukem add RCSIDs
 1.2 18-Jan-2001  tv branches: 1.2.2; 1.2.4; 1.2.6; 1.2.10;
No-op revision to force update of this file to a non-"-kk" version.
 1.1 17-Jan-2001  fvdl branches: 1.1.2;
Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.1.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.1.2.1 17-Jan-2001  bouyer file procfs_linux.c was added on branch thorpej_scsipi on 2001-01-18 09:23:48 +0000
 1.2.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.2.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.2.4.2 30-Mar-2001  he Pull up revisions 1.1-1.2 (new, via patch, requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.2.4.1 18-Jan-2001  he file procfs_linux.c was added on branch netbsd-1-5 on 2001-03-30 21:48:11 +0000
 1.2.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.2.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.10.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.10.2.5 24-Sep-2004  skrll Sync with HEAD.
 1.10.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.10.2.2 03-Aug-2004  skrll Sync with HEAD
 1.10.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.15.2.3 29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10739):
sys/miscfs/procfs/procfs_linux.c: revision 1.28
PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat
Patch applied, thanks for the report!
 1.15.2.2 30-Aug-2004  tron branches: 1.15.2.2.2; 1.15.2.2.4;
Pull up revision 1.18 via patch (requested by jdolecek in ticket #799):
Do previous slightly differently - just pass a struct lwp * and derive the
struct proc *.
OK'd by Jaromir.
 1.15.2.1 30-Aug-2004  tron Pull up revision 1.17 (requested by jdolecek in ticket #799):
fix process used for /proc/<pid>/stat contents - it should be process
<pid>, not the current process looking at the information
 1.15.2.2.4.1 29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10739):
sys/miscfs/procfs/procfs_linux.c: revision 1.28
PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat
Patch applied, thanks for the report!
 1.15.2.2.2.1 29-Oct-2006  tron Pull up following revision(s) (requested by adrianp in ticket #10739):
sys/miscfs/procfs/procfs_linux.c: revision 1.28
PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat
Patch applied, thanks for the report!
 1.19.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.19.4.1 29-Apr-2005  kent sync with -current
 1.22.4.1 24-Oct-2006  ghen Pull up following revision(s) (requested by elad in ticket #1567):
sys/miscfs/procfs/procfs_linux.c: revision 1.28
PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat
Patch applied, thanks for the report!
 1.22.2.1 24-Oct-2006  ghen Pull up following revision(s) (requested by elad in ticket #1567):
sys/miscfs/procfs/procfs_linux.c: revision 1.28
PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat
Patch applied, thanks for the report!
 1.23.2.8 04-Feb-2008  yamt sync with head.
 1.23.2.7 21-Jan-2008  yamt sync with head
 1.23.2.6 15-Nov-2007  yamt sync with head.
 1.23.2.5 27-Oct-2007  yamt sync with head.
 1.23.2.4 03-Sep-2007  yamt sync with head.
 1.23.2.3 26-Feb-2007  yamt sync with head.
 1.23.2.2 30-Dec-2006  yamt sync with head.
 1.23.2.1 21-Jun-2006  yamt sync with head.
 1.24.16.1 13-Jul-2006  gdamore Merge from HEAD.
 1.24.8.1 26-Jun-2006  yamt sync with head.
 1.24.4.1 09-Sep-2006  rpaulo sync with head
 1.25.6.2 10-Dec-2006  yamt sync with head.
 1.25.6.1 22-Oct-2006  yamt sync with head
 1.25.4.4 12-Jan-2007  ad Sync with head.
 1.25.4.3 18-Nov-2006  ad Sync with head.
 1.25.4.2 17-Nov-2006  ad Checkpoint work in progress.
 1.25.4.1 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.32.2.2 15-Apr-2007  yamt sync with head.
 1.32.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.33.4.1 11-Jul-2007  mjf Sync with head.
 1.33.2.5 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
 1.33.2.4 14-Jul-2007  ad Make it possible to track time spent by soft interrupts as is done for
normal LWPs, and provide a sysctl to switch it on/off. Not enabled by
default because microtime() is not free. XXX Not happy with this but
I want to get it out of my local tree for the time being.
 1.33.2.3 08-Jun-2007  ad Sync with head.
 1.33.2.2 10-Apr-2007  ad Sync with head.
 1.33.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.39.10.1 14-Oct-2007  yamt sync with head.
 1.39.8.4 23-Mar-2008  matt sync with HEAD
 1.39.8.3 09-Jan-2008  matt sync with HEAD
 1.39.8.2 08-Nov-2007  matt sync with -HEAD
 1.39.8.1 06-Nov-2007  matt sync with HEAD
 1.39.6.3 14-Nov-2007  joerg Sync with HEAD.
 1.39.6.2 11-Nov-2007  joerg Sync with HEAD.
 1.39.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.42.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.42.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.42.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.42.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.45.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.45.2.1 26-Dec-2007  ad Sync with head.
 1.48.10.2 04-Jun-2008  yamt sync with head
 1.48.10.1 18-May-2008  yamt sync with head.
 1.48.8.3 30-Dec-2008  christos sync with head.
 1.48.8.2 01-Nov-2008  christos Sync with head.
 1.48.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.48.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.48.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.50.2.3 11-Mar-2010  yamt sync with head
 1.50.2.2 04-May-2009  yamt sync with head.
 1.50.2.1 16-May-2008  yamt sync with head.
 1.53.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.54.14.3 29-Apr-2011  matt Use _KERNEL_OPT
 1.54.14.2 05-Feb-2011  cliff - include opt_multiprocessor.h for explicit MULTIPROCESSOR dependency
 1.54.14.1 21-Apr-2010  matt sync to netbsd-5
 1.54.8.1 27-Oct-2009  bouyer Pull up following revision(s) (requested by markd in ticket #1113):
sys/miscfs/procfs/procfs_linux.c: revision 1.58
Avoid leaking pages. Fixes PR 42053 from SHIMIZU Ryo.
 1.54.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.58.4.1 05-Mar-2011  rmind sync with head
 1.61.6.1 18-Feb-2012  mrg merge to -current.
 1.61.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.61.2.1 17-Apr-2012  yamt sync with head
 1.64.10.1 18-May-2014  rmind sync with head
 1.64.6.2 03-Dec-2017  jdolecek update from HEAD
 1.64.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.67.2.1 10-Aug-2014  tls Rebase.
 1.70.10.1 21-Jan-2020  martin Pull up the following, requested by christos in ticket #1720:

sys/compat/common/kern_sig_43.c 1.36
sys/compat/linux/arch/amd64/linux_machdep.c 1.59
sys/compat/linux/common/linux_fcntl.h 1.18
sys/compat/linux/common/linux_file64.c 1.62
sys/compat/linux/common/linux_ipc.c 1.57
sys/compat/linux/common/linux_misc.c 1.243
sys/compat/linux/common/linux_signal.c 1.81
sys/compat/linux/common/linux_socket.c 1.149 (patch)
sys/compat/linux/common/linux_socket.h 1.24
sys/compat/linux/common/linux_statfs.h 1.7
sys/compat/linux/common/linux_termios.c 1.38
sys/compat/linux/common/linux_termios.h 1.22
sys/compat/linux32/common/linux32_dirent.c 1.20
sys/compat/linux32/common/linux32_ioctl.c 1.14
sys/compat/linux32/common/linux32_misc.c 1.27
sys/compat/linux32/common/linux32_signal.c 1.20
sys/compat/linux32/common/linux32_sysinfo.c 1.8
sys/compat/linux32/common/linux32_termios.c 1.15
sys/compat/linux32/common/linux32_utsname.c 1.10
sys/compat/netbsd32/netbsd32_compat_20.c 1.39
sys/compat/netbsd32/netbsd32_compat_43.c 1.59
sys/compat/netbsd32/netbsd32_compat_50.c 1.44
sys/compat/ossaudio/ossaudio.c 1.75
sys/kern/sysv_shm.c 1.138
sys/miscfs/procfs/procfs_linux.c 1.75 (patch)
sys/sys/shm.h 1.54 (patch)

Fix various info leaks, out of bound access, usage of uninitialized
values and direct access to userland variables from kernel space
and memory leaks in system calls implemented for the compatibility
subsystems.
 1.70.4.3 28-Aug-2017  skrll Sync with HEAD
 1.70.4.2 22-Apr-2016  skrll Sync with HEAD
 1.70.4.1 22-Sep-2015  skrll Sync with HEAD
 1.70.2.1 21-Jan-2020  martin Pull up the following, requested by christos in ticket #1720:

sys/compat/common/kern_sig_43.c 1.36
sys/compat/linux/arch/amd64/linux_machdep.c 1.59
sys/compat/linux/common/linux_fcntl.h 1.18
sys/compat/linux/common/linux_file64.c 1.62
sys/compat/linux/common/linux_ipc.c 1.57
sys/compat/linux/common/linux_misc.c 1.243
sys/compat/linux/common/linux_signal.c 1.81
sys/compat/linux/common/linux_socket.c 1.149 (patch)
sys/compat/linux/common/linux_socket.h 1.24
sys/compat/linux/common/linux_statfs.h 1.7
sys/compat/linux/common/linux_termios.c 1.38
sys/compat/linux/common/linux_termios.h 1.22
sys/compat/linux32/common/linux32_dirent.c 1.20
sys/compat/linux32/common/linux32_ioctl.c 1.14
sys/compat/linux32/common/linux32_misc.c 1.27
sys/compat/linux32/common/linux32_signal.c 1.20
sys/compat/linux32/common/linux32_sysinfo.c 1.8
sys/compat/linux32/common/linux32_termios.c 1.15
sys/compat/linux32/common/linux32_utsname.c 1.10
sys/compat/netbsd32/netbsd32_compat_20.c 1.39
sys/compat/netbsd32/netbsd32_compat_43.c 1.59
sys/compat/netbsd32/netbsd32_compat_50.c 1.44
sys/compat/ossaudio/ossaudio.c 1.75
sys/kern/sysv_shm.c 1.138
sys/miscfs/procfs/procfs_linux.c 1.75 (patch)
sys/sys/shm.h 1.54 (patch)

Fix various info leaks, out of bound access, usage of uninitialized
values and direct access to userland variables from kernel space
and memory leaks in system calls implemented for the compatibility
subsystems.
 1.72.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.72.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.73.12.4 21-Apr-2020  martin Sync with HEAD
 1.73.12.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.73.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.73.12.1 10-Jun-2019  christos Sync with HEAD
 1.73.10.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.73.4.2 21-Jan-2020  martin Pull up the following, requested by christos in ticket #1487:

sys/compat/common/kern_sig_43.c 1.36
sys/compat/linux/arch/amd64/linux_machdep.c 1.59
sys/compat/linux/common/linux_fcntl.h 1.18
sys/compat/linux/common/linux_file64.c 1.62
sys/compat/linux/common/linux_ipc.c 1.57
sys/compat/linux/common/linux_misc.c 1.243
sys/compat/linux/common/linux_signal.c 1.81
sys/compat/linux/common/linux_socket.c 1.149
sys/compat/linux/common/linux_socket.h 1.24
sys/compat/linux/common/linux_statfs.h 1.7
sys/compat/linux/common/linux_termios.c 1.38
sys/compat/linux/common/linux_termios.h 1.22
sys/compat/linux32/common/linux32_dirent.c 1.20
sys/compat/linux32/common/linux32_ioctl.c 1.14
sys/compat/linux32/common/linux32_misc.c 1.27
sys/compat/linux32/common/linux32_signal.c 1.20
sys/compat/linux32/common/linux32_sysinfo.c 1.8
sys/compat/linux32/common/linux32_termios.c 1.15
sys/compat/linux32/common/linux32_utsname.c 1.10
sys/compat/netbsd32/netbsd32_compat_20.c 1.39
sys/compat/netbsd32/netbsd32_compat_43.c 1.59
sys/compat/netbsd32/netbsd32_compat_50.c 1.44
sys/compat/ossaudio/ossaudio.c 1.75
sys/kern/sysv_shm.c 1.138
sys/miscfs/procfs/procfs_linux.c 1.75 (patch)
sys/sys/shm.h 1.54

Fix various info leaks, out of bound access, usage of uninitialized
values and direct access to userland variables from kernel space
and memory leaks in system calls implemented for the compatibility
subsystems.
 1.73.4.1 10-Sep-2019  martin Pull up following revision(s) (requested by chs in ticket #1370):

sys/miscfs/procfs/procfs_linux.c: revision 1.76

have procfs_do_pid_stat() pass the proc's map to get_proc_size_info(),
rather than having the latter look up the map again and not check
for an error.
 1.74.4.2 13-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #194):

sys/compat/linux/common/linux_socket.c: revision 1.146
sys/compat/linux/common/linux_socket.c: revision 1.147
sys/compat/linux/common/linux_socket.c: revision 1.148
sys/compat/linux/common/linux_socket.c: revision 1.149
sys/compat/linux/arch/amd64/linux_machdep.c: revision 1.59
sys/compat/linux32/common/linux32_sysinfo.c: revision 1.8
sys/kern/sysv_shm.c: revision 1.138
sys/compat/linux/common/linux_file64.c: revision 1.61
sys/compat/linux/common/linux_file64.c: revision 1.62
sys/compat/netbsd32/netbsd32_compat_43.c: revision 1.58
sys/compat/linux32/common/linux32_dirent.c: revision 1.20
sys/compat/linux32/common/linux32_utsname.c: revision 1.10
sys/compat/linux/common/linux_termios.h: revision 1.22
sys/compat/linux32/common/linux32_termios.c: revision 1.15
sys/compat/linux32/common/linux32_misc.c: revision 1.27
sys/compat/linux32/common/linux32_ioctl.c: revision 1.14
sys/compat/linux/common/linux_statfs.h: revision 1.7
sys/compat/linux/common/linux_ipc.c: revision 1.57
sys/compat/linux/common/linux_fcntl.h: revision 1.18
sys/compat/linux/common/linux_socket.h: revision 1.24
sys/sys/shm.h: revision 1.54
sys/compat/ossaudio/ossaudio.c: revision 1.75
sys/compat/linux32/common/linux32_signal.c: revision 1.20
sys/miscfs/procfs/procfs_linux.c: revision 1.75
sys/compat/linux/common/linux_signal.c: revision 1.81
sys/compat/linux/common/linux_termios.c: revision 1.38
sys/compat/linux/common/linux_misc.c: revision 1.241
sys/compat/linux/common/linux_misc.c: revision 1.242
sys/compat/linux/common/linux_misc.c: revision 1.243
sys/compat/linux/common/linux_misc.c: revision 1.244

Fix info leaks.

Fix stupid bugs in linux_sys_shmctl(): the index could be out of bound
(page fault) and there was no proper locking.
Maybe we should just remove LINUX_SHM_STAT, like compat_linux32.

Remove printf.

When dealing with an unknown value, set -1, to prevent (harmless)
uninitialized accesses later.

Add a default case, don't call sys_ioctl() with an uninitialized 'com'
argument.

Fix error handling, returns an errno, not -1.

Put the printf under DEBUG_LINUX.


Hum, don't forget the 'pid' argument, otherwise we're not gonna go very
far.

Don't read data from userland directly. This simply does not work on any
recent x86 CPU (thanks to SMAP) and all architectures that forbid direct
access to userland from the kernel. But I guess no one noticed because no
one ever uses compat_linux, right?

Hum, don't pass an mbuf to realloc(). Inspired from copyin32_msg_control().

Fix memory leak.

I don't see the point in having this useless printf, but add a '\n' to it,
so that it at least displays useless stuff correctly.

Hum, remove incorrect assignment. Userland could have passed a smaller
namelen, and the uninitialized bytes from sb_data were being used later in
the network stack.
 1.74.4.1 10-Sep-2019  martin Pull up following revision(s) (requested by chs in ticket #190):

sys/miscfs/procfs/procfs_linux.c: revision 1.76

have procfs_do_pid_stat() pass the proc's map to get_proc_size_info(),
rather than having the latter look up the map again and not check
for an error.
 1.80.6.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.80.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.88.2.1 02-Aug-2025  perseant Sync with HEAD
 1.47 27-Sep-2019  christos Instead of casting to size_t, cast to uintmax_t to prevent truncation
(pointed out by chuq). In all these cases uio_offset can't be negative.
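
The truncation concern behind revisions 1.46 and 1.47 can be illustrated with a small user-space sketch. It is not the procfs code: the function names are hypothetical, int64_t stands in for off_t, and uint32_t stands in for a 32-bit size_t. The offset is assumed to be non-negative, as the log message states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when a non-negative file offset lies at or
 * past the end of a buffer of `len` bytes. Widening both sides to
 * uintmax_t avoids the sign-compare warning without losing high bits.
 */
bool
offset_past_end(int64_t offset, size_t len)
{
	return (uintmax_t)offset >= (uintmax_t)len;
}

/*
 * Narrow variant, modelling a cast to a 32-bit size_t. The upper 32 bits
 * of the offset are discarded, so a large offset can look small.
 */
bool
offset_past_end_narrow(int64_t offset, uint32_t len)
{
	return (uint32_t)offset >= len;
}
```

With an offset of exactly 2^32 and a 100-byte buffer, the wide comparison reports that the offset is past the end, while the narrow one truncates the offset to 0 and reports that it is not.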
 1.46 26-Sep-2019  christos fix sign-compare issues: uio->uio_offset (off_t) is compared with (size_t):
cast the offset to size_t.
 1.45 17-Oct-2014  christos branches: 1.45.20;
Maps don't change that frequently between reads, so don't give up and
do what linux does (support reading from an offset).
 1.44 18-Mar-2014  riastradh branches: 1.44.4; 1.44.8;
Merge riastradh-drm2 to HEAD.
 1.43 18-Jul-2013  ryo PR/48048: Add a missing vm_map_unlock_read() and uvmspace_free() to the ENOMEM error case in procfs_domap().
 1.42 06-May-2012  christos branches: 1.42.2; 1.42.4; 1.42.10;
- match format with the linux map printing
- fix PK_32 map printing for linux processes
should fix 32 bit java stack guard setting.
 1.41 16-Oct-2011  hannken branches: 1.41.2; 1.41.6; 1.41.8; 1.41.12; 1.41.14;
VOP_GETATTR() needs a shared lock at least.
 1.40 26-Jul-2011  yamt fix a botch in PRIxVADDR change (rev.1.38)
 1.39 15-Sep-2010  jym Use PRIxVADDR to print vaddr_t elements. Wrap lines.
 1.38 14-Dec-2009  uebayasi branches: 1.38.2; 1.38.4;
gimpy invented PRIxVADDR format specifier.
 1.37 11-Jan-2009  christos merge christos-time_t
 1.36 25-Jul-2008  christos branches: 1.36.2; 1.36.6; 1.36.12;
use bufsize instead of BUFFERSIZE
 1.35 25-Jul-2008  christos Handle files with a large number of mappings gracefully. Reported by Nicholas
Joly.
 1.34 15-Dec-2007  christos branches: 1.34.6; 1.34.10; 1.34.12; 1.34.14; 1.34.16;
use vnode_to_path.
 1.33 26-Nov-2007  pooka branches: 1.33.2; 1.33.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.32 21-Jul-2007  pooka branches: 1.32.4; 1.32.6; 1.32.12; 1.32.14;
nuke homegrown getcwd_common() decl
 1.31 01-Apr-2007  christos branches: 1.31.4;
Instead of reading and writing little by little, allocate memory and
write the whole map in one shot so that we don't have to deal with the
map changing under us. Fixes the linux emulated jdk-1.6 where it was
losing the last map entry and could not find the stack on startup.
 1.30 18-Feb-2007  ad branches: 1.30.4; 1.30.6;
procfs_map():

- Drop the target's vm_map lock before calling uiomove(). We could
deadlock if inspecting /proc/curproc/map.
- If the vm_map might have changed, restart the operation, but give
up after 250 retries if the map keeps changing. XXX This is not
ideal.
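
The restart-with-a-budget pattern described for revision 1.30 can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the procfs_map() code: struct toy_map, its timestamp field, and the callback standing in for the unlocked uiomove() step are all hypothetical. Only the limit of 250 retries comes from the log message.

```c
#include <stddef.h>

#define MAP_MAX_RETRIES 250	/* retry budget quoted in the log message */

/* Hypothetical stand-in for a map with a change counter. */
struct toy_map {
	unsigned timestamp;	/* bumped on every modification */
	int payload;
};

/* Example callbacks: one leaves the map alone, one always modifies it. */
void quiet_writer(struct toy_map *m) { (void)m; }
void busy_writer(struct toy_map *m) { m->timestamp++; }

/*
 * Read the payload. copy_out() models the step done with the lock
 * dropped and may modify the map, as a concurrent writer could.
 * Returns 0 on success, -1 once the retry budget is exhausted.
 */
int
toy_map_read(struct toy_map *m, int *dst, void (*copy_out)(struct toy_map *))
{
	for (int tries = 0; tries <= MAP_MAX_RETRIES; tries++) {
		unsigned seen = m->timestamp;	/* sampled while "locked" */
		int snapshot = m->payload;

		copy_out(m);			/* lock would be dropped here */

		if (m->timestamp == seen) {	/* unchanged: snapshot is valid */
			*dst = snapshot;
			return 0;
		}
	}
	return -1;
}
```

Revisions 1.31 and 1.45 later replaced this approach, first by writing the whole map in one shot and then by supporting reads from an offset.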
 1.29 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.28 09-Feb-2007  ad branches: 1.28.2;
Merge newlock2 to head.
 1.27 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.26 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.25 23-Jul-2006  ad branches: 1.25.4; 1.25.6;
Use the LWP cached credentials where sane.
 1.24 14-May-2006  elad integrate kauth.
 1.23 11-Dec-2005  christos branches: 1.23.4; 1.23.6; 1.23.8; 1.23.10; 1.23.12;
merge ktrace-lwp.
 1.22 30-Aug-2005  xtraeme Remove __P()
 1.21 26-Feb-2005  perry branches: 1.21.4;
nuke trailing whitespace
 1.20 07-Aug-2003  agc branches: 1.20.8; 1.20.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 29-Jun-2003  fvdl branches: 1.19.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.18 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.17 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.16 07-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.15 10-Nov-2001  lukem add RCSIDs
 1.14 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.13 02-Jun-2001  chs branches: 1.13.2; 1.13.6;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.12 02-Apr-2001  pk Cast `field-width' arguments to type `int'.
 1.11 29-Mar-2001  fvdl For -o linux mounts, add some code to emulate /proc/#/maps.
Needs NAMECACHE_ENTER_REVERSE to include filenames.
 1.10 17-Jan-2001  fvdl branches: 1.10.2;
Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.9 24-Nov-2000  chs remove dead code and other misc cleanup.
 1.8 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.7 27-Jun-2000  mrg remove redundant <vm/pmap.h> includes. <vm/pmap.h> -> <uvm/uvm_pmap.h>
 1.6 25-Jun-2000  mrg remove some redundant <vm/vm_xxx.h> includes
 1.5 10-Apr-1999  drochner branches: 1.5.2; 1.5.12;
remove unneeded <vm/vm_object.h>
 1.4 24-Mar-1999  mrg branches: 1.4.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.3 03-Feb-1999  msaitoh sprintf->snprintf
 1.2 28-Jan-1999  drochner make it compile with !UVM
 1.1 25-Jan-1999  msaitoh Add /proc/#/map. From FreeBSD.
 1.4.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.5.12.1 30-Mar-2001  he Pull up revision 1.10 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.5.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.5.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.5.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.5.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.2.8 11-Nov-2002  nathanw Catch up to -current
 1.10.2.7 15-Oct-2002  nathanw Make _validfoo() routines go back to taking a proc.
 1.10.2.6 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.10.2.5 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.10.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.10.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.10.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.13.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.13.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.19.2.2 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.20.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.8.1 29-Apr-2005  kent sync with -current
 1.21.4.6 21-Jan-2008  yamt sync with head
 1.21.4.5 07-Dec-2007  yamt sync with head
 1.21.4.4 03-Sep-2007  yamt sync with head.
 1.21.4.3 26-Feb-2007  yamt sync with head.
 1.21.4.2 30-Dec-2006  yamt sync with head.
 1.21.4.1 21-Jun-2006  yamt sync with head.
 1.23.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.23.8.2 11-Aug-2006  yamt sync with head
 1.23.8.1 24-May-2006  yamt sync with head.
 1.23.6.1 01-Jun-2006  kardel Sync with head.
 1.23.4.1 09-Sep-2006  rpaulo sync with head
 1.25.6.2 10-Dec-2006  yamt sync with head.
 1.25.6.1 22-Oct-2006  yamt sync with head
 1.25.4.1 17-Nov-2006  ad Checkpoint work in progress.
 1.28.2.2 15-Apr-2007  yamt sync with head.
 1.28.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.30.6.1 11-Jul-2007  mjf Sync with head.
 1.30.4.2 20-Aug-2007  ad Sync with HEAD.
 1.30.4.1 10-Apr-2007  ad Sync with head.
 1.31.4.1 15-Aug-2007  skrll Sync with HEAD.
 1.32.14.2 21-Jul-2007  pooka nuke homegrown getcwd_common() decl
 1.32.14.1 21-Jul-2007  pooka file procfs_map.c was added on branch matt-mips64 on 2007-07-21 22:47:37 +0000
 1.32.12.2 27-Dec-2007  mjf Sync with HEAD.
 1.32.12.1 08-Dec-2007  mjf Sync with HEAD.
 1.32.6.1 09-Jan-2008  matt sync with HEAD
 1.32.4.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.33.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.33.2.1 26-Dec-2007  ad Sync with head.
 1.34.16.1 19-Oct-2008  haad Sync with HEAD.
 1.34.14.1 28-Jul-2008  simonb Sync with head.
 1.34.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.34.10.3 09-Oct-2010  yamt sync with head
 1.34.10.2 11-Mar-2010  yamt sync with head
 1.34.10.1 04-May-2009  yamt sync with head.
 1.34.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.34.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.36.12.2 21-Apr-2010  matt sync to netbsd-5
 1.36.12.1 24-Aug-2009  matt Fix some vaddr_t/vaddr_t type droppings.
 1.36.6.2 09-Nov-2008  christos account for major and minor being unsigned long long
 1.36.6.1 25-Jul-2008  christos file procfs_map.c was added on branch christos-time_t on 2008-11-09 02:05:20 +0000
 1.36.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.38.4.1 05-Mar-2011  rmind sync with head
 1.38.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.41.14.1 29-Jul-2013  msaitoh Pull up following revision(s) (requested by ryo in ticket #917):
sys/miscfs/procfs/procfs_map.c: revision 1.43
PR/48048: Add a missing vm_map_unlock_read() and uvmspace_free() to the ENOMEM
error case in procfs_domap().
 1.41.12.1 29-Jul-2013  msaitoh Pull up following revision(s) (requested by ryo in ticket #917):
sys/miscfs/procfs/procfs_map.c: revision 1.43
PR/48048: Add a missing vm_map_unlock_read() and uvmspace_free() to the ENOMEM
error case in procfs_domap().
 1.41.8.2 06-Jul-2017  snj Pull up following revision(s) (requested by tsutsui in ticket #1434):
sys/miscfs/procfs/procfs_map.c: revision 1.45
Maps don't change that frequently between reads, so don't give up and
do what linux does (support reading from an offset).
 1.41.8.1 29-Jul-2013  msaitoh Pull up following revision(s) (requested by ryo in ticket #917):
sys/miscfs/procfs/procfs_map.c: revision 1.43
PR/48048: Add a missing vm_map_unlock_read() and uvmspace_free() to the ENOMEM error case in procfs_domap().
 1.41.6.1 02-Jun-2012  mrg sync to latest -current.
 1.41.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.41.2.1 23-May-2012  yamt sync with head.
 1.42.10.1 23-Jul-2013  riastradh sync with HEAD
 1.42.4.1 28-Aug-2013  rmind sync with head
 1.42.2.2 03-Dec-2017  jdolecek update from HEAD
 1.42.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.44.8.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.44.4.1 14-Feb-2017  snj Pull up following revision(s) (requested by chs in ticket #1358):
sys/miscfs/procfs/procfs_map.c: revision 1.45
Maps don't change that frequently between reads, so don't give up and
do what linux does (support reading from an offset).
 1.45.20.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.37 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.36 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.35 11-Dec-2005  christos branches: 1.35.20; 1.35.22;
merge ktrace-lwp.
 1.34 07-Aug-2003  agc branches: 1.34.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 29-Jun-2003  fvdl branches: 1.33.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.32 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.31 09-May-2002  thorpej Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.30 12-Jan-2002  christos When checking for permissions, include the P_INEXEC test and return
EAGAIN if the process is exec'ing.
 1.29 10-Nov-2001  lukem add RCSIDs
 1.28 06-Nov-2001  simonb In procfs_domem() the addr variable is only needed if PMAP_NEED_PROCWR is
defined.
 1.27 24-Nov-2000  chs branches: 1.27.2; 1.27.4; 1.27.8;
remove dead code and other misc cleanup.
 1.26 26-Sep-2000  thorpej PHOLD/PRELE around uvm_io() to user address space is unnecessary. There
is nothing in the U-area that we need.
 1.25 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.24 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.23 25-Mar-1999  sommerfe branches: 1.23.2; 1.23.8; 1.23.18;
Disallow tracing of processes unless tracer's root directory is at or
above tracee's root directory.
 1.22 24-Mar-1999  mrg completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.21 13-Mar-1999  thorpej Expose procfs_rwmem(). (This function will go away entirely when we
delete Mach VM.)
 1.20 25-Feb-1999  is Machine independent part of fix for PR 6152 (gdb doesn't work on machines
with UVM and separate I&D-Cache). Mostly by Michael Hitch, but pass struct
proc * instead of the pmap. Reason: said machine will need a method to do
the syncing operation for "curproc", too; this way more code can be shared.
 1.19 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.18 10-Feb-1998  mrg branches: 1.18.2;
- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.17 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.16 13-Sep-1997  enami Use the same indentation as the other two places, sys_ptrace() and
procfs_control().

Ok'ed by Jason R. Thorpe.
 1.15 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.14 27-Aug-1997  thorpej Fix a reversed argument which caused procfs_checkioperm() to always return
"OK". Add a few comments to avoid further confusion.
 1.13 13-Aug-1997  explorer Move procfs_checkioperm() from procfs_subr.c to procfs_mem.c, since _subr is
not included in a kernel without procfs, and it seems wrong to pull
all of procfs_subr.c in for just that one function. Perhaps this
should go into a new file instead?
 1.12 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.11 13-Oct-1996  christos branches: 1.11.10;
backout previous kprintf changes
 1.10 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.9 11-Jun-1996  mycroft Add a missing PHOLD()/PRELE() pair.
 1.8 09-Feb-1996  christos branches: 1.8.4;
miscfs prototype changes
 1.7 05-Jan-1995  chopps initialize variable as pointed out by David Jones <dej@qpoint.torfree.net>
this should fix pr #699
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.4 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.3 17-Mar-1994  briggs PG_COW -> PG_COPYONWRITE to match earlier changes in vm_page.h.
 1.2 05-Jan-1994  cgd make it compile (cleanly) for us
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.8.4.1 10-Dec-1996  mycroft From trunk:
Add a missing PHOLD()/PRELE() pair.
 1.11.10.3 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.11.10.2 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.11.10.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.18.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.23.18.1 14-Jan-2002  he Pull up revision 1.30 (requested by christos):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.23.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.23.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.23.2.1 14-Jan-2002  he Pull up revision 1.30 (requested by he):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.27.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.27.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.27.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.27.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.27.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.27.2.5 01-Apr-2002  nathanw Missed l => p conversion in previous.
 1.27.2.4 01-Apr-2002  nathanw procfs_domem() should take proc *, proc *; not proc *, lwp *.
 1.27.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.27.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.33.2.5 24-Feb-2005  skrll Reduce diff to HEAD
 1.33.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.2 03-Aug-2004  skrll Sync with HEAD
 1.33.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.34.16.2 30-Dec-2006  yamt sync with head.
 1.34.16.1 21-Jun-2006  yamt sync with head.
 1.35.22.2 10-Dec-2006  yamt sync with head.
 1.35.22.1 22-Oct-2006  yamt sync with head
 1.35.20.1 18-Nov-2006  ad Sync with head.
 1.15 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.14 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.13 11-Dec-2005  christos branches: 1.13.20; 1.13.22;
merge ktrace-lwp.
 1.12 07-Aug-2003  agc branches: 1.12.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 29-Jun-2003  fvdl branches: 1.11.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.10 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.9 10-Nov-2001  lukem add RCSIDs
 1.8 29-Jun-1994  cgd branches: 1.8.46; 1.8.48; 1.8.52;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.6 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.5 05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.4 04-May-1994  cgd kill obvious bug; glad to know this was tested!
 1.3 04-May-1994  cgd Rename a lot of process flags.
 1.2 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.8.52.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.8.48.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.46.1 14-Nov-2001  nathanw Catch up to -current.
 1.11.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.11.2.2 03-Aug-2004  skrll Sync with HEAD
 1.11.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.12.16.2 30-Dec-2006  yamt sync with head.
 1.12.16.1 21-Jun-2006  yamt sync with head.
 1.13.22.2 10-Dec-2006  yamt sync with head.
 1.13.22.1 22-Oct-2006  yamt sync with head
 1.13.20.1 18-Nov-2006  ad Sync with head.
 1.23 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.22 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.21 11-Dec-2005  christos branches: 1.21.20; 1.21.22;
merge ktrace-lwp.
 1.20 07-Aug-2003  agc branches: 1.20.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 29-Jun-2003  fvdl branches: 1.19.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.18 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.17 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.16 09-May-2002  thorpej Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.15 12-Jan-2002  christos Don't hide the real return code with EPERM.
 1.14 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.13 10-Nov-2001  lukem add RCSIDs
 1.12 17-Jan-2001  fvdl branches: 1.12.2; 1.12.4; 1.12.8;
Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.11 27-Aug-1997  thorpej branches: 1.11.12; 1.11.18; 1.11.28;
Fix a reversed argument which caused procfs_checkioperm() to always return
"OK". Add a few comments to avoid further confusion.
 1.10 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.9 13-Aug-1995  mycroft branches: 1.9.14;
Lock the process in core before operating on it.
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.6 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.5 04-May-1994  cgd Rename a lot of process flags.
 1.4 12-Apr-1994  cgd be a bit smarter about determining if files shouldn't be seen by the user.
Also, DON'T allow a lookup to succeed on a file that's not visible!
 1.3 28-Jan-1994  cgd make a fpregs file.
 1.2 08-Jan-1994  cgd reorganization of ptrace/procfs code
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.9.14.2 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.14.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.11.28.2 14-Jan-2002  he Pull up revision 1.15 (requested by christos):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.11.28.1 30-Mar-2001  he Pull up revision 1.12 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.11.18.1 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.11.12.1 14-Jan-2002  he Pull up revision 1.15 (requested by he):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.12.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.12.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.12.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.12.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.12.2.7 15-Oct-2002  nathanw Make _validfoo() routines go back to taking a proc.
 1.12.2.6 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.12.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.12.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.12.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.12.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.12.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.19.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.2 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.20.16.2 30-Dec-2006  yamt sync with head.
 1.20.16.1 21-Jun-2006  yamt sync with head.
 1.21.22.2 10-Dec-2006  yamt sync with head.
 1.21.22.1 22-Oct-2006  yamt sync with head
 1.21.20.1 18-Nov-2006  ad Sync with head.
 1.40 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.39 29-Sep-2017  kre Use %ju and (intmax_t) to unbreak i386 build.
 1.38 29-Sep-2017  christos Split the status printing routines (one for NetBSD and one for Linux) for
simplicity (Robert Swindelis)
 1.37 14-Nov-2016  kre Return the "true" parent's pid as the parent pid (ppid) via the
various sysctl/procfs interfaces that allow it to be interrogated.
(This is rather than the temporary parent's pid when a process is
being traced and has been reparented.)

XXX The ppid in elf32 core files has not been similarly adjusted,
XXX Should it be ?
 1.36 21-Oct-2009  rmind branches: 1.36.22; 1.36.40; 1.36.44;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.35 11-Jan-2009  christos merge christos-time_t
 1.34 24-Apr-2008  ad branches: 1.34.2; 1.34.10;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.33 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.32 09-Mar-2007  ad branches: 1.32.36; 1.32.38; 1.32.40;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.31 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.30 09-Feb-2007  ad branches: 1.30.2;
Merge newlock2 to head.
 1.29 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.28 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.27 14-May-2006  elad branches: 1.27.8; 1.27.10;
integrate kauth.
 1.26 11-Dec-2005  christos branches: 1.26.4; 1.26.6; 1.26.8; 1.26.10; 1.26.12;
merge ktrace-lwp.
 1.25 29-May-2005  christos branches: 1.25.2;
- sprinkle const
- avoid shadowed variables.
 1.24 26-Feb-2005  perry nuke trailing whitespace
 1.23 22-Apr-2004  itojun branches: 1.23.4; 1.23.6;
sprintf -> snprintf
 1.22 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.21 29-Jun-2003  fvdl branches: 1.21.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.20 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.19 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.18 07-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.17 10-Nov-2001  lukem add RCSIDs
 1.16 30-Dec-2000  david branches: 1.16.2; 1.16.4; 1.16.6; 1.16.8;
Increase psbuf size as in FreeBSD patch. We don't have jail(8), so the
recent bugtraq exploit doesn't apply, but it could be exploitable in
other ways.
 1.15 09-Aug-1998  perry branches: 1.15.12;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.14 14-Feb-1998  thorpej Prevent the session ID from disappearing if the session leader exits
(thus causing s_leader to become NULL) by storing the session ID separately
in the session structure. Export the session ID to userspace in the
eproc structure.

Submitted by Tom Proett <proett@nas.nasa.gov>.
 1.13 13-Oct-1996  christos backout previous kprintf changes
 1.12 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11 16-Mar-1996  christos Fix printf format follies.
 1.10 01-Jun-1995  jtc Moved egid credential from cr_groups[0] to new field cr_gid. POSIX.1
requires that sgid executables and the setuid() syscall *not* change
the supplemental group list.
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.7 15-Jun-1994  mycroft Fix a bug pointed out by JSP.
 1.6 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.5 05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.4 04-May-1994  cgd Rename a lot of process flags.
 1.3 10-Jan-1994  ws Fix sign extension bug
 1.2 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.1 05-Jan-1994  cgd branches: 1.1.1;
add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.15.12.1 05-Jan-2001  bouyer Sync with HEAD
 1.16.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.16.6.2 13-Oct-2001  fvdl Revert the t_dev -> t_devvp change in struct tty. The way that tty
structs are currently used (especially by console ttys) aren't
ready for it, and this will require quite a few changes.
 1.16.6.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.16.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.3 11-Nov-2002  nathanw Catch up to -current
 1.16.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.21.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.21.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.21.2.2 03-Aug-2004  skrll Sync with HEAD
 1.21.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.23.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.23.4.1 29-Apr-2005  kent sync with -current
 1.25.2.4 03-Sep-2007  yamt sync with head.
 1.25.2.3 26-Feb-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.26.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.26.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.26.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.26.8.1 24-May-2006  yamt sync with head.
 1.26.6.1 01-Jun-2006  kardel Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.27.10.2 10-Dec-2006  yamt sync with head.
 1.27.10.1 22-Oct-2006  yamt sync with head
 1.27.8.3 18-Nov-2006  ad Sync with head.
 1.27.8.2 17-Nov-2006  ad Checkpoint work in progress.
 1.27.8.1 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.30.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.30.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.32.40.1 18-May-2008  yamt sync with head.
 1.32.38.3 09-Nov-2008  christos account for major and minor being unsigned long long
 1.32.38.2 01-Nov-2008  christos Sync with head.
 1.32.38.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.32.36.2 17-Jan-2009  mjf Sync with HEAD.
 1.32.36.1 02-Jun-2008  mjf Sync with HEAD.
 1.34.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.34.2.2 11-Mar-2010  yamt sync with head
 1.34.2.1 04-May-2009  yamt sync with head.
 1.36.44.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.36.40.1 05-Dec-2016  skrll Sync with HEAD
 1.36.22.1 03-Dec-2017  jdolecek update from HEAD
 1.120 01-Jul-2024  christos Add linux POSIX message queue support (Ricardo Branco)
 1.119 12-May-2024  christos branches: 1.119.2;
PR/58227: Ricardo Branco: Add support for proc/sysvipc in Linux emulator
 1.118 12-May-2024  christos PR/58240: Ricardo Branco: Add support for proc/self/limits as used by Linux
 1.117 17-Jan-2024  hannken Using the exechook to revoke procfs nodes is racy and may deadlock:

one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
 1.116 23-May-2020  ad branches: 1.116.20;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.115 29-Apr-2020  thorpej If the procfs mount is marked as linux-compat, then allow proc lookup
by any LWP ID in the proc, not just the canonical PID.
 1.114 26-Sep-2019  christos fix sign-compare issues: uio->uio_offset (off_t) is compared with (size_t):
cast the offset to size_t.
 1.113 30-Mar-2019  christos add a node for the process resource limits.
 1.112 16-Apr-2018  hannken branches: 1.112.2;
Change procfs_revoke_vnodes() to use vrecycle()/vgone() instead
of VOP_REVOKE().

Gets rid of a bunch of suspensions on /proc as vrecycle() will
succeed most of the time and we suspend at most once per call.
 1.111 31-Dec-2017  christos branches: 1.111.2;
rename some "cmdline" stuff now that it is used to print environment too
 1.110 31-Dec-2017  christos Add an environ node
 1.109 28-Aug-2017  kamil Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.108 01-Apr-2017  riastradh branches: 1.108.6;
KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.107 30-Mar-2017  christos add an auxv node.
 1.106 10-Nov-2014  maxv branches: 1.106.2; 1.106.4; 1.106.6;
Do not uselessly include <sys/malloc.h>.
 1.105 27-Jul-2014  hannken branches: 1.105.2;
Change procfs from hashlist to vcache.
- Key is (type, pid, fd)
- Remove argument "p" from procfs_allocvp(). It is only used
when "type == PFSfd". Lookup the proc with proc_find() when
procfs_loadvnode() needs it.
- Use a vfs_vnode_iterator for procfs_revoke_vnodes().
 1.104 07-Feb-2014  hannken branches: 1.104.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.103 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.102 25-Nov-2012  christos branches: 1.102.2;
do something reasonable with kernel semaphores.
 1.101 28-May-2012  christos branches: 1.101.2;
add a task process subdirectory for emul linux
 1.100 04-Sep-2011  jmcneill branches: 1.100.2; 1.100.6;
PR# kern/45021: Please support /emul/linux/proc/version

Add /proc/version for procfs with -o linux. The version reported depends
on the emulation type of the calling process:

$ cat /proc/version
NetBSD version 5.99.55 (netbsd@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) NetBSD 5.99.55 (GENERIC) #39: Sun Sep 4 09:10:05 EDT 2011

$ /emul/linux/bin/cat /proc/version
Linux version 2.6.18 (linux@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010

$ /emul/linux32/bin/cat /proc/version
Linux version 2.6.18 (linux32@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010
 1.99 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.98 21-Jul-2010  hannken branches: 1.98.6;
Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.97 01-Jul-2010  hannken Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
 1.96 01-Jul-2010  rmind Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.
 1.95 15-Mar-2009  cegger branches: 1.95.2; 1.95.4;
ansify function definitions
 1.94 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.93 17-Dec-2008  cegger branches: 1.93.2;
kill MALLOC and FREE macros.
 1.92 05-Sep-2008  skrll branches: 1.92.2;
PR/39324 kernel diagnostic assertion "l->l_stat != LSZOMB" failed.

Ignore procs with zero or all LSZOMB LWPs. Get a non-LSZOMB LWP to perform
operations against as part of the deal.

procfs really needs to be updated to support multi-threading fully.
Hi Antti!
 1.91 02-Jul-2008  rmind branches: 1.91.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.
 1.90 05-May-2008  ad branches: 1.90.2; 1.90.4;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.89 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.88 24-Apr-2008  ad branches: 1.88.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.87 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.86 21-Mar-2008  ad branches: 1.86.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.85 30-Jan-2008  ad branches: 1.85.6;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.84 23-Jan-2008  elad Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
 1.83 02-Jan-2008  ad Merge vmlocking2 to head.
 1.82 07-Nov-2007  ad branches: 1.82.2; 1.82.6;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.81 10-Oct-2007  ad branches: 1.81.2; 1.81.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.80 24-May-2007  agc branches: 1.80.6; 1.80.8; 1.80.10;
Extend the Linux emulation of /proc to include

/proc/stat
/proc/loadavg and
/proc/<pid>/statm.

These are only present when -o linux is specified as a mount option
to procfs.

Factor out some common code so that it can be used by a number of
functions.

XXX The values returned in the statm emulation need to be verified.
 1.79 09-Mar-2007  ad branches: 1.79.2; 1.79.4;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.78 27-Feb-2007  ad Destroy the hash locks on final unmount.
 1.77 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.76 15-Feb-2007  ad branches: 1.76.2;
Replace some uses of lockmgr() / simplelocks.
 1.75 09-Feb-2007  ad Merge newlock2 to head.
 1.74 24-Dec-2006  christos fix permissions on /proc/<pid> node. From elad.
 1.73 28-Nov-2006  elad Move ktrace, ptrace, systrace, and procfs to use kauth(9).

First, remove process_checkioperm() calls from MD code. Similar checks
using kauth(9) routines (on the process scope, using appropriate action)
are done in the callers.

Add secmodel back-end to handle each subsystem.
 1.72 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.71 29-Oct-2006  christos add an "emul" file node.
 1.70 25-Oct-2006  christos 1. fix procfs_validfile{,_linux} to test for NULL pointers properly.
2. make "exe" entry be a symlink to the executable, instead of pointing
directly to the vnode of the executable.
3. factor out commonly used code.
 1.69 20-Sep-2006  manu Emulate Linux's /proc/devices
 1.68 01-Mar-2006  yamt branches: 1.68.14; 1.68.16;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
vmspace is the more natural way to specify an address space
(and is less likely to be abused for random purposes).
- fix a swdmover race.
 1.67 11-Dec-2005  christos branches: 1.67.2; 1.67.4; 1.67.6;
merge ktrace-lwp.
 1.66 01-Oct-2005  atatat Add "cwd" and "root" symlinks to each process's directory. The cwd
link points to the process's current working directory, and the root
link points to the process's root directory. What else would you
expect?

For directories that are out of reach (caller is in a chroot, target
process is in a different chroot, etc), the links point to "/"
instead.
 1.65 30-Aug-2005  xtraeme Remove __P()
 1.64 29-May-2005  christos branches: 1.64.2;
- sprinkle const
- avoid shadowed variables.
 1.63 26-Feb-2005  perry nuke trailing whitespace
 1.62 20-Sep-2004  jdolecek branches: 1.62.4; 1.62.6;
add 'mounts' file for -o linux, which lists all currently mounted
filesystems; Linux glibc statvfs() uses this to get some of the mount flags,
and this file is also useful as /emul/linux/etc/mtab (via symlink)
 1.61 27-Aug-2004  skrll Do previous slightly differently - just pass a struct lwp * and derive the
struct proc *.

OK'd by Jaromir.
 1.60 21-Aug-2004  jdolecek fix process used for /proc/<pid>/stat contents - it should be process
<pid>, not the current process looking at the information
 1.59 14-May-2004  christos Simplify the code by:
1. Checking for a negative uio_offset at the beginning. This really does
not affect us in most cases because we check that later too.
2. Checking for attempts to write to init sooner and in all cases.
 1.58 27-Sep-2003  darcy branches: 1.58.2; 1.58.4;
Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.57 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.56 29-Jun-2003  fvdl branches: 1.56.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.55 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.54 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.53 28-May-2003  christos Add /proc/<pid>/stat for linux compat. j2sdk1.4.2 depends on it.
 1.52 18-Apr-2003  christos Make the mode of /proc/<pid>/fd dr-x------
 1.51 18-Apr-2003  christos Make symlinks for directories that point to the actual directory.
Make symlinks to [kqueue] and [misc] for kqueue and misc fds.
 1.50 17-Apr-2003  jdolecek do not show nodes corresponding to directory descriptors for process
in fd/ subdirectory, nor allow lookup/open for the nodes
this fixes PR kern/21187 for good, and also avoids interesting directory
locking issues
 1.49 17-Apr-2003  jdolecek use fd_getfile() in procfs_getfp(), and FILE_USE()/FILE_UNUSE() the
returned file descriptor pointer appropriately
 1.48 15-Mar-2003  enami Release the hash lock on failure.
 1.47 04-Mar-2003  tron Teach procfs_allocvp() about Puptime to avoid panics if "/proc/uptime"
is opened.
 1.46 25-Feb-2003  jrf This addresses PR kern/19989. Thanks to hamajima@nagoya.ydc.co.jp for submitting this patch which enables /proc/uptime for linux emul. Patch reviewed by atatat@netbsd.org and tron@netbsd.org, approved by tron@netbsd.org.
 1.45 03-Feb-2003  jdolecek don't bother special-casing DTYPE_KQUEUE/DTYPE_MISC nor panic for unknown
descriptors; just return with EOPNOTSUPP for any unsupported descriptor type
 1.44 03-Feb-2003  jdolecek procfs_allocvp():
* do not set *vpp unless successful, otherwise we'd trigger
DIAGNOSTIC panic in lookup(9) on error return
* on error, make sure to free malloc'ed memory and ungetnewvnode() the
previously acquired vnode

this fixes panic on 'tail -f <file> &; ls -l /proc/$!/fd' reported by
Andrew Brown

fix reviewed by Christos Zoulas
 1.43 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.42 03-Jan-2003  christos Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.41 07-Nov-2002  thorpej Fix a signed/unsigned comparison warning.
 1.40 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.39 10-Nov-2001  lukem add RCSIDs
 1.38 15-Sep-2001  chs branches: 1.38.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.37 29-Mar-2001  fvdl branches: 1.37.2; 1.37.4;
For -o linux mounts, add some code to emulate /proc/#/maps.
Needs NAMECACHE_ENTER_REVERSE to include filenames.
 1.36 18-Jan-2001  jdolecek branches: 1.36.2;
constify
 1.35 17-Jan-2001  fvdl Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.34 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.33 24-Nov-2000  chs remove dead code and other misc cleanup.
 1.32 08-Nov-2000  ad Update for hashinit() change.
 1.31 16-Mar-2000  jdolecek branches: 1.31.4;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.30 25-Feb-2000  fvdl Fix pasto: some lines of the procfs hash code were copied from the
UFS code, and I forgot to rename the "ihash" variable, causing
weird effects, because 3/4th of the UFS hash table would become
unreachable after procfs was loaded as an LKM.
 1.29 25-Jan-2000  fvdl At mount/unmount time, add an exec hook to revoke all vnodes iff the
process is about to exec a sugid binary.

To speed up things, use hashing for vnode allocation, like other filesystems
do. This avoids walking the whole procfs node list in the revoke case too.
 1.28 02-Sep-1999  thorpej branches: 1.28.2;
Make /proc/self a symlink to /proc/curproc. I've observed Linux programs
that expect /proc/self/cmdline to exist.
 1.27 08-Jul-1999  wrstuden Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.26 12-Mar-1999  christos branches: 1.26.2; 1.26.4;
PR/7143: Jaromir Dolecek: Add procfs/cmdline from Linux emulation
 1.25 25-Jan-1999  msaitoh Add /proc/#/map. From FreeBSD.
 1.24 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.23 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.22 30-Oct-1997  mycroft Make the curproc link executable.
 1.21 13-Aug-1997  explorer branches: 1.21.4;
Move procfs_checkioperm() from procfs_subr.c to procfs_mem.c, since _subr is
not included in a kernel without procfs, and it seems wrong to pull
all of procfs_subr.c in for just that one function. Perhaps this
should go into a new file instead?
 1.20 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.19 25-Jun-1997  mycroft branches: 1.19.4;
Don't allow writes to init's memory or registers while in secure mode.
 1.18 05-May-1997  mycroft Need stat.h.
 1.17 05-May-1997  mycroft Eliminate bogus uses of V{READ,WRITE,EXEC}. Use S_I[RWX]{USR,GRP,OTH} where
appropriate.
 1.16 25-Oct-1996  cgd remove bogus cast of second arg to bcmp(). (nm_name is a const char*,
and was being unnecessarily cast to 'char *'; -Wcast-qual.)
 1.15 12-Feb-1996  christos close PR/2063: procfs_rw prototyped twice with different prototypes
 1.14 09-Feb-1996  christos miscfs prototype changes
 1.13 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.12 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.11 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.10 25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.9 28-Jan-1994  cgd make a fpregs file.
 1.8 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.7 10-Jan-1994  mycroft Add a missing break so my machine doesn't panic.
 1.6 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.5 05-Jan-1994  cgd add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 24-Aug-1993  pk branches: 1.3.2;
copyright update.
 1.2 24-Aug-1993  pk Rcs Id added.
 1.1 24-Aug-1993  pk branches: 1.1.1;
Initial version of a proc filesystem.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.3.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.19.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.21.4.1 30-Oct-1997  mellon Pull rev 1.22 up from trunk (mycroft)
 1.26.4.1 02-Aug-1999  thorpej Update from trunk.
 1.26.2.2 28-Feb-2000  he Pull up revision 1.30 (requested by fvdl):
Fix a critical typo in the earlier procfs security fix.
 1.26.2.1 01-Feb-2000  he Pull up revision 1.29 (via patch, requested by fvdl):
Close procfs security hole. Fixes SA#2000-001.
 1.28.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.28.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.28.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.28.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.28.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.28.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.4.1 30-Mar-2001  he Pull up revision 1.35 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.36.2.9 07-Jan-2003  thorpej Sync with HEAD.
 1.36.2.8 11-Nov-2002  nathanw Catch up to -current
 1.36.2.7 01-Apr-2002  nathanw procfs_domem() should take proc *, proc *; not proc *, lwp *.
 1.36.2.6 09-Jan-2002  nathanw Use proc_representative_lwp() instead of bailing out.
Adapt PROCFS_MACHDEP to lwps.
 1.36.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.36.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.36.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.36.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.36.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.37.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.37.4.1 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.37.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.38.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.56.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.56.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.56.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.56.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.56.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.56.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.56.2.3 18-Aug-2004  skrll Revert to passing struct proc for {exit,exec}hook.
 1.56.2.2 03-Aug-2004  skrll Sync with HEAD
 1.56.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.58.4.1 31-Aug-2005  tron Pull up following revision(s) (requested by christos in ticket #5634):
sys/miscfs/procfs/procfs_subr.c: revision 1.59
Simplify the code by:
1. Checking for a negative uio_offset at the beginning. This really does
not affect us in most cases because we check that later too.
2. Checking for attempts to write to init sooner and in all cases.
 1.58.2.1 31-Aug-2005  tron Pull up following revision(s) (requested by christos in ticket #5634):
sys/miscfs/procfs/procfs_subr.c: revision 1.59
Simplify the code by:
1. Checking for a negative uio_offset at the beginning. This really does
not affect us in most cases because we check that later too.
2. Checking for attempts to write to init sooner and in all cases.
 1.62.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.62.4.1 29-Apr-2005  kent sync with -current
 1.64.2.9 24-Mar-2008  yamt sync with head.
 1.64.2.8 04-Feb-2008  yamt sync with head.
 1.64.2.7 21-Jan-2008  yamt sync with head
 1.64.2.6 15-Nov-2007  yamt sync with head.
 1.64.2.5 27-Oct-2007  yamt sync with head.
 1.64.2.4 03-Sep-2007  yamt sync with head.
 1.64.2.3 26-Feb-2007  yamt sync with head.
 1.64.2.2 30-Dec-2006  yamt sync with head.
 1.64.2.1 21-Jun-2006  yamt sync with head.
 1.67.6.1 22-Apr-2006  simonb Sync with head.
 1.67.4.1 09-Sep-2006  rpaulo sync with head
 1.67.2.1 15-Jan-2006  yamt convert procfs.
 1.68.16.2 10-Dec-2006  yamt sync with head.
 1.68.16.1 22-Oct-2006  yamt sync with head
 1.68.14.6 12-Jan-2007  ad Sync with head.
 1.68.14.5 29-Dec-2006  ad Checkpoint work in progress.
 1.68.14.4 18-Nov-2006  ad Sync with head.
 1.68.14.3 17-Nov-2006  ad Checkpoint work in progress.
 1.68.14.2 24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.68.14.1 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.76.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.76.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.79.4.1 11-Jul-2007  mjf Sync with head.
 1.79.2.4 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
 1.79.2.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.79.2.2 08-Jun-2007  ad Sync with head.
 1.79.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.80.10.1 14-Oct-2007  yamt sync with head.
 1.80.8.4 23-Mar-2008  matt sync with HEAD
 1.80.8.3 09-Jan-2008  matt sync with HEAD
 1.80.8.2 08-Nov-2007  matt sync with -HEAD
 1.80.8.1 06-Nov-2007  matt sync with HEAD
 1.80.6.2 11-Nov-2007  joerg Sync with HEAD.
 1.80.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.81.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.81.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.81.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.82.6.2 23-Jan-2008  bouyer Sync with HEAD.
 1.82.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.82.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.85.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.85.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.85.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.85.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.86.2.1 18-May-2008  yamt sync with head.
 1.88.2.3 11-Aug-2010  yamt sync with head.
 1.88.2.2 04-May-2009  yamt sync with head.
 1.88.2.1 16-May-2008  yamt sync with head.
 1.90.4.1 03-Jul-2008  simonb Sync with head.
 1.90.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.91.2.1 19-Oct-2008  haad Sync with HEAD.
 1.92.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.92.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.93.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.95.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.95.4.3 05-Mar-2011  rmind sync with head
 1.95.4.2 03-Jul-2010  rmind sync with head
 1.95.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.95.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.98.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.100.6.1 02-Jun-2012  mrg sync to latest -current.
 1.100.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.100.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.100.2.1 30-Oct-2012  yamt sync with head
 1.101.2.3 03-Dec-2017  jdolecek update from HEAD
 1.101.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.101.2.1 25-Feb-2013  tls resync with head
 1.102.2.1 18-May-2014  rmind sync with head
 1.104.2.1 10-Aug-2014  tls Rebase.
 1.105.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.106.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.106.4.1 26-Apr-2017  pgoyette Sync with HEAD
 1.106.2.1 28-Aug-2017  skrll Sync with HEAD
 1.108.6.2 17-Apr-2018  martin Pull up following revision(s) (requested by hannken in ticket #772):

sys/miscfs/procfs/procfs_subr.c: revision 1.112

Change procfs_revoke_vnodes() to use vrecycle()/vgone() instead
of VOP_REVOKE().

Gets rid of a bunch of suspensions on /proc as vrecycle() will
succeed most of the time and we suspend at most once per call.
 1.108.6.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
application use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow reading Debug Registers unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still prompt these values to store
in an internal cache with context of registers. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.111.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.112.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.112.2.1 10-Jun-2019  christos Sync with HEAD
 1.116.20.1 18-Apr-2024  martin Pull up following revision(s) (requested by hannken in ticket #668):

sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117

Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Protect kernel hooks exechook, exithook and forkhook with rwlock.

Lock as writer on establish/disestablish and as reader on list traverse.

For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"

Add a hashmap to access all procfs nodes by pid.

Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Remove all procfs nodes for this process on process exit.
 1.119.2.1 02-Aug-2025  perseant Sync with HEAD
 1.120 14-Sep-2024  pgoyette Define dependencies based on build options.
 1.119 09-Sep-2024  pgoyette Now we have another dependency for the SYSV_* stuff.
 1.118 09-Sep-2024  pgoyette procfs grew a new dependency
 1.117 01-Jul-2024  christos Add linux POSIX message queue support (Ricardo Branco)
 1.116 12-May-2024  christos branches: 1.116.2;
PR/58227: Ricardo Branco: Add support for proc/sysvipc in Linux emulator
 1.115 12-May-2024  christos PR/58240: Ricardo Branco: Add support for proc/self/limits as used by Linux
 1.114 17-Jan-2024  hannken Remove all procfs nodes for this process on process exit.
 1.113 17-Jan-2024  hannken Using the exechook to revoke procfs nodes is racy and may deadlock:

one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
 1.112 17-Jan-2024  hannken Add a hashmap to access all procfs nodes by pid.
 1.111 17-Jan-2022  bouyer branches: 1.111.4;
If the calling process is running under linux emulation, make /proc/xxx/fd/
return only symlinks pointing to the original file in the filesystem,
instead of a hard link. This matches the linux behavior, and some
linux programs rely on it (they unconditionally call readlink() on
/proc/xxx/fd/yy and don't deal with it returning EINVAL).
Proposed on tech-kern@ in
http://mail-index.netbsd.org/tech-kern/2022/01/11/msg027877.html
 1.110 28-Dec-2020  riastradh Fix procfs environ node.
 1.109 23-May-2020  ad branches: 1.109.2;
Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.108 29-Apr-2020  thorpej If the procfs mount is marked as linux-compat, then allow proc lookup
by any LWP ID in the proc, not just the canonical PID.
 1.107 20-Apr-2020  htodd Sort include files.
 1.106 20-Apr-2020  htodd Add missing include to fix build.
 1.105 19-Apr-2020  thorpej - Only increment nprocs when we're creating a new process, not just
when allocating a PID.
- Per above, proc_free_pid() no longer decrements nprocs. It's now done
in proc_free() right after proc_free_pid().
- Ensure nprocs is accessed using atomics everywhere.
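
The counting rule described above can be sketched in portable C. This is
only an illustration: it uses C11 <stdatomic.h> as a stand-in for the
kernel's own atomic primitives, and every example_* name is hypothetical
rather than taken from the tree.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the global process counter. */
static atomic_uint example_nprocs;

/*
 * Called only when a new process is actually created, not when a PID
 * is merely allocated. Returns the updated count.
 */
unsigned
example_proc_created(void)
{
	return atomic_fetch_add(&example_nprocs, 1) + 1;
}

/*
 * Called when the process itself is freed, after its PID has been
 * released. Returns the updated count.
 */
unsigned
example_proc_freed(void)
{
	return atomic_fetch_sub(&example_nprocs, 1) - 1;
}

/* Readers also go through an atomic load rather than a plain read. */
unsigned
example_nprocs_read(void)
{
	return atomic_load(&example_nprocs);
}
```

Keeping the decrement out of the PID-release path mirrors the split
between proc_free_pid() and proc_free() that the message describes.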
 1.104 04-Apr-2020  ad branches: 1.104.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.103 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.102 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.101 30-Mar-2019  christos branches: 1.101.4; 1.101.6;
add a node for the process resource limits.
 1.100 31-Dec-2017  christos branches: 1.100.4;
rename some "cmdline" stuff now that it is used to print environment too
 1.99 31-Dec-2017  christos Add an environ node
 1.98 28-Aug-2017  kamil Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
application use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.97 30-Mar-2017  christos branches: 1.97.6;
add an auxv node.
 1.96 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.95 03-Nov-2016  pgoyette branches: 1.95.2;
Module procfs needs ptrace_common for process_do{,fp}regs
 1.94 10-Nov-2014  maxv branches: 1.94.2; 1.94.4;
Do not uselessly include <sys/malloc.h>.
 1.93 05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.92 27-Jul-2014  hannken branches: 1.92.2;
Change procfs from hashlist to vcache.
- Key is (type, pid, fd)
- Remove argument "p" from procfs_allocvp(). It is only used
when "type == PFSfd". Lookup the proc with proc_find() when
procfs_loadvnode() needs it.
- Use a vfs_vnode_iterator for procfs_revoke_vnodes().
 1.91 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
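
A minimal sketch of the kind of check this implies, assuming a mount
routine that requires arguments. The structure and function names are
hypothetical and the signature is simplified; it is not the code from
this revision.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical mount arguments; a real file system defines its own. */
struct example_mount_args {
	int flags;
};

/*
 * The caller may pass data == NULL, so a file system that needs its
 * arguments has to reject that case itself before dereferencing.
 */
int
example_fs_mount(void *data, size_t *data_len)
{
	struct example_mount_args *args = data;

	if (args == NULL)
		return EINVAL;
	if (data_len == NULL || *data_len < sizeof(*args))
		return EINVAL;

	/* args is now safe to read; the actual mount work would follow. */
	return 0;
}
```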
 1.90 23-Mar-2014  hannken branches: 1.90.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
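
For readers unfamiliar with the idiom, a designated-initializer table
looks roughly like the sketch below. The structure is a cut-down,
hypothetical operations table, not the real struct vfsops, and all
example_* names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical, cut-down operations table. */
struct example_vfsops {
	const char *vfs_name;
	int (*vfs_mount)(void *data, size_t *data_len);
	int (*vfs_unmount)(int flags);
	int (*vfs_sync)(int waitfor);
};

int
example_mount(void *data, size_t *data_len)
{
	(void)data;
	(void)data_len;
	return 0;
}

int
example_unmount(int flags)
{
	(void)flags;
	return 0;
}

/*
 * Each member is named explicitly, so the table no longer depends on
 * the order of members in the structure. Members that are not listed
 * (vfs_sync here) are zero-initialized, i.e. NULL.
 */
const struct example_vfsops example_ops = {
	.vfs_name = "example",
	.vfs_mount = example_mount,
	.vfs_unmount = example_unmount,
};
```

With positional initializers, adding or reordering a member would
silently shift every later entry; naming the members avoids that.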
 1.89 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.88 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.87 30-Apr-2012  rmind branches: 1.87.2; 1.87.4;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.86 27-Sep-2011  christos branches: 1.86.2; 1.86.6; 1.86.8; 1.86.12; 1.86.14;
define PROCFS_MAXNAMLEN and use it.
 1.85 30-Nov-2009  pooka Introduce genfs_statvfs() as pretty much a no-info statvfs and
convert several pseudo file systems to use it.
 1.84 02-Oct-2009  elad Put procfs policy back in the subsystem.
 1.83 15-Mar-2009  cegger ansify function definitions
 1.82 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.81 28-Jun-2008  rumble branches: 1.81.4; 1.81.6; 1.81.10; 1.81.16; 1.81.20;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.80 13-May-2008  simonb branches: 1.80.2;
mnt_data is a pointer, set it to NULL not 0 when we're finished with it.
 1.79 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.78 29-Apr-2008  ad branches: 1.78.2;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.77 28-Jan-2008  dholland branches: 1.77.6; 1.77.8; 1.77.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.76 26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.75 26-Nov-2007  pooka branches: 1.75.2; 1.75.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.74 31-Jul-2007  pooka branches: 1.74.2; 1.74.4; 1.74.10; 1.74.12;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.73 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.72 17-Jul-2007  pooka branches: 1.72.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.71 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.70 09-Feb-2007  ad branches: 1.70.6;
Merge newlock2 to head.
 1.69 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.68 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.67 16-Nov-2006  christos branches: 1.67.2;
__unused removal on arguments; approved by core.
 1.66 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.65 03-Sep-2006  christos branches: 1.65.2; 1.65.4;
add missing initializers
 1.64 14-May-2006  elad integrate kauth.
 1.63 11-Dec-2005  christos branches: 1.63.4; 1.63.6; 1.63.8; 1.63.10; 1.63.12;
merge ktrace-lwp.
 1.62 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.61 30-Aug-2005  xtraeme Remove __P()
 1.60 29-Mar-2005  thorpej branches: 1.60.2;
- Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.59 02-Jan-2005  thorpej branches: 1.59.2;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.58 13-Sep-2004  jdolecek set mp->mnt_stat.f_namemax on filesystem mount, for use by statvfs
 1.57 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.56 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.55 27-Apr-2004  jrf First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argument from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
 1.54 21-Apr-2004  christos add sys/dirent.h
 1.53 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.52 24-Mar-2004  atatat branches: 1.52.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.51 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.50 27-Sep-2003  darcy Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.49 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.48 29-Jun-2003  fvdl branches: 1.48.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.47 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.46 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.45 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.44 03-Jan-2003  christos Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.43 21-Sep-2002  christos MNT_GETARGS support
 1.42 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.41 10-Nov-2001  lukem branches: 1.41.8;
add RCSIDs
 1.40 15-Sep-2001  chs branches: 1.40.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.39 30-May-2001  mrg branches: 1.39.2; 1.39.4;
use _KERNEL_OPT
 1.38 25-Jan-2001  jdolecek branches: 1.38.2;
g/c pmnt_mp in struct procfs_args
 1.37 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.36 17-Jan-2001  fvdl Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.35 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.34 10-Jun-2000  assar branches: 1.34.2;
make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.33 16-Mar-2000  jdolecek branches: 1.33.2;
Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean up after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.

For each leaf filesystem, add appropriate vfs_done routine.
 1.32 25-Jan-2000  fvdl At mount/unmount time, add an exec hook to revoke all vnodes iff the
process is about to exec a sugid binary.

To speed up things, use hashing for vnode allocation, like other filesystems
do. This avoids walking the whole procfs node list in the revoke case too.
 1.31 26-Feb-1999  wrstuden branches: 1.31.2; 1.31.8; 1.31.14;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.30 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.29 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.28 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.27 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.26 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.25 09-Feb-1996  christos miscfs prototype changes
 1.24 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.23 09-Mar-1995  mycroft copy*str() should use size_t.
 1.22 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a (tiny) bit.
 1.21 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.20 15-Sep-1994  mycroft Fix typo.
 1.19 15-Sep-1994  mycroft stat the file system at mount time, for `df -n', et al.
 1.18 29-Jun-1994  cgd branches: 1.18.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.17 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.16 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.15 23-Apr-1994  cgd make fs types consistent over new kernels. also, some proto foo.
 1.14 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.13 15-Apr-1994  cgd forgot these...
 1.12 14-Apr-1994  cgd fs types are names now.
 1.11 20-Jan-1994  ws Make procfs really work for debugging.
Implement not & notepg files in procfs.
 1.10 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.9 05-Jan-1994  cgd add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.8 18-Dec-1993  mycroft Canonicalize all #includes.
 1.7 26-Aug-1993  pk branches: 1.7.2;
Implement setattr: mode for process entries; mode + uid/gid for the
PROCFS root directory.
Fixed omission in pfs_root() which came to light as a result of the above:
hold on to vnode for root dir.
 1.6 25-Aug-1993  mycroft Um, last change was wrong. Instead, add 3 to the number of inodes (forget
about the root directory, too).
 1.5 25-Aug-1993  mycroft Subtract two from the free count for `.' and `..', to maintain the fiction that
this is a real file system.
 1.4 24-Aug-1993  pk Fill inode fields in procfs_statfs(), instead of block fields
 1.3 24-Aug-1993  pk copyright update.
 1.2 24-Aug-1993  pk Rcs Id added.
 1.1 24-Aug-1993  pk branches: 1.1.1;
Initial version of a proc filesystem.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.7.2.2 28-Dec-1993  pk Return ENODEV rather than EOPNOTSUP for unsupported operations.
 1.7.2.1 14-Nov-1993  mycroft Canonicalize all #includes.
 1.18.2.1 16-Sep-1994  cgd from trunk, per mycroft
 1.31.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.31.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.31.8.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.31.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.2.1 01-Feb-2000  he Pull up revision 1.32 (via patch, requested by fvdl):
Close procfs security hole. Fixes SA#2000-001.
 1.33.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.34.2.1 30-Mar-2001  he Pull up revision 1.36 (requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.38.2.6 07-Jan-2003  thorpej Sync with HEAD.
 1.38.2.5 18-Oct-2002  nathanw Catch up to -current.
 1.38.2.4 01-Aug-2002  nathanw Catch up to -current.
 1.38.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.38.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.38.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.39.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.39.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.39.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.39.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.40.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.41.8.1 29-Aug-2002  gehenna catch up with -current.
 1.48.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.48.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.48.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.48.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.48.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.48.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48.2.2 03-Aug-2004  skrll Sync with HEAD
 1.48.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.52.2.1 29-May-2004  tron Pull up revision 1.56 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.59.2.1 29-Apr-2005  kent sync with -current
 1.60.2.7 04-Feb-2008  yamt sync with head.
 1.60.2.6 21-Jan-2008  yamt sync with head
 1.60.2.5 07-Dec-2007  yamt sync with head
 1.60.2.4 03-Sep-2007  yamt sync with head.
 1.60.2.3 26-Feb-2007  yamt sync with head.
 1.60.2.2 30-Dec-2006  yamt sync with head.
 1.60.2.1 21-Jun-2006  yamt sync with head.
 1.63.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.63.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.63.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.63.8.2 03-Sep-2006  yamt sync with head.
 1.63.8.1 24-May-2006  yamt sync with head.
 1.63.6.1 01-Jun-2006  kardel Sync with head.
 1.63.4.1 09-Sep-2006  rpaulo sync with head
 1.65.4.2 10-Dec-2006  yamt sync with head.
 1.65.4.1 22-Oct-2006  yamt sync with head
 1.65.2.4 01-Feb-2007  ad Sync with head.
 1.65.2.3 12-Jan-2007  ad Sync with head.
 1.65.2.2 18-Nov-2006  ad Sync with head.
 1.65.2.1 17-Nov-2006  ad Checkpoint work in progress.
 1.67.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.70.6.3 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
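The reference-counting scheme described in this entry can be sketched in userland with POSIX rwlocks. This is only an analogy under stated assumptions: pthread_rwlock_tryrdlock() and pthread_rwlock_wrlock() stand in for rw_tryenter(RW_READER) and rw_enter(RW_WRITER), and struct demo_proc and the demo_* functions are hypothetical names, not the kernel's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a process structure; not the kernel's struct proc. */
struct demo_proc {
	pthread_rwlock_t reflock;	/* held as reader by every reference holder */
	bool exiting;			/* set once the owner has drained readers */
};

static void
demo_proc_init(struct demo_proc *p)
{
	int error = pthread_rwlock_init(&p->reflock, NULL);

	assert(error == 0);
	p->exiting = false;
}

/* Gain a reference without blocking; fails while a writer holds the lock. */
static bool
demo_proc_hold(struct demo_proc *p)
{
	return pthread_rwlock_tryrdlock(&p->reflock) == 0;
}

/* Drop a reference gained with demo_proc_hold(). */
static void
demo_proc_rele(struct demo_proc *p)
{
	pthread_rwlock_unlock(&p->reflock);
}

/* Called by the process itself: blocks until every reader has let go. */
static void
demo_proc_drain(struct demo_proc *p)
{
	pthread_rwlock_wrlock(&p->reflock);
	p->exiting = true;
}
```

A debugger-like caller would pair demo_proc_hold() with demo_proc_rele(); once demo_proc_drain() has returned, further demo_proc_hold() attempts fail instead of blocking.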
 1.70.6.2 20-Aug-2007  ad Sync with HEAD.
 1.70.6.1 15-Jul-2007  ad Sync with head.
 1.72.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.74.12.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.74.12.1 31-Jul-2007  pooka file procfs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:17 +0000
 1.74.10.2 18-Feb-2008  mjf Sync with HEAD.
 1.74.10.1 08-Dec-2007  mjf Sync with HEAD.
 1.74.4.2 23-Mar-2008  matt sync with HEAD
 1.74.4.1 09-Jan-2008  matt sync with HEAD
 1.74.2.1 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.75.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.75.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.77.10.3 11-Mar-2010  yamt sync with head
 1.77.10.2 04-May-2009  yamt sync with head.
 1.77.10.1 16-May-2008  yamt sync with head.
 1.77.8.1 18-May-2008  yamt sync with head.
 1.77.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.77.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.78.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.78.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.80.2.1 03-Jul-2008  simonb Sync with head.
 1.81.20.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.81.16.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.81.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.81.6.1 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.81.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.86.14.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
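The checks described in this entry can be illustrated with a small userland sketch. Everything here is an assumption made for illustration: DEMO_MAX_MOUNT_DATA, demo_alloc(), demo_free(), demo_do_mount() and the two demo_fs_* hooks are hypothetical names, and the sized demo_free() only imitates an allocator that, like kmem_free, must be given the size that was allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_MOUNT_DATA 1024	/* hypothetical upper bound on data_len */

/* Header that records the allocation size so a sized free can be checked. */
union demo_hdr { size_t size; max_align_t align; };

static void *
demo_alloc(size_t size)
{
	union demo_hdr *h = malloc(sizeof(*h) + size);

	if (h == NULL)
		return NULL;
	h->size = size;
	return h + 1;
}

static void
demo_free(void *p, size_t size)
{
	union demo_hdr *h = (union demo_hdr *)p - 1;

	assert(h->size == size);	/* a mismatch is the bug described above */
	free(h);
}

/* Hypothetical file-system hook; it is allowed to modify *lenp. */
typedef int (*demo_mount_fn)(void *data, size_t *lenp);

static int demo_fs_shrink(void *data, size_t *lenp) { (void)data; *lenp = 0; return 0; }
static int demo_fs_needs_data(void *data, size_t *lenp) { (void)lenp; return data == NULL ? EINVAL : 0; }

static int
demo_do_mount(const void *udata, size_t data_len, demo_mount_fn fsmount)
{
	void *buf = NULL;
	size_t alloc_len = 0;	/* size actually allocated, never handed to the fs */
	int error;

	if (data_len > DEMO_MAX_MOUNT_DATA)
		return EINVAL;		/* reject huge requests */
	if (udata != NULL) {
		if (data_len == 0)
			return EINVAL;	/* reject zero-sized allocations */
		alloc_len = data_len;
		if ((buf = demo_alloc(alloc_len)) == NULL)
			return ENOMEM;
		memcpy(buf, udata, alloc_len);
	}
	error = fsmount(buf, &data_len);	/* fs may change data_len; NULL check is the fs's job */
	if (buf != NULL)
		demo_free(buf, alloc_len);	/* free with the saved size, not data_len */
	return error;
}
```

Keeping alloc_len separate from data_len is the point of the sketch: the hook may rewrite data_len, but the free always sees the size that was allocated.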
 1.86.12.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.86.8.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.86.6.1 02-Jun-2012  mrg sync to latest -current.
 1.86.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.86.2.1 23-May-2012  yamt sync with head.
 1.87.4.1 18-May-2014  rmind sync with head
 1.87.2.2 03-Dec-2017  jdolecek update from HEAD
 1.87.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.90.2.1 10-Aug-2014  tls Rebase.
 1.92.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.94.4.3 26-Apr-2017  pgoyette Sync with HEAD
 1.94.4.2 20-Mar-2017  pgoyette Sync with HEAD
 1.94.4.1 04-Nov-2016  pgoyette Sync with HEAD
 1.94.2.2 28-Aug-2017  skrll Sync with HEAD
 1.94.2.1 05-Dec-2016  skrll Sync with HEAD
 1.95.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.97.6.1 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about the removed
PSL_FSTRACE / P_FSTRACE, as this interface had few users
(close or equal to zero).
Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow reading Debug Registers unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still fetch these values to store
in an internal cache with context of registers. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.100.4.3 21-Apr-2020  martin Sync with HEAD
 1.100.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.100.4.1 10-Jun-2019  christos Sync with HEAD
 1.101.6.2 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.101.6.1 17-Jan-2020  ad Sync with head.
 1.101.4.1 04-Feb-2021  martin Pull up following revision(s) (requested by riastradh in ticket #1195):

sys/miscfs/procfs/procfs_vfsops.c: revision 1.110

Fix procfs environ node.
 1.104.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.109.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.111.4.3 16-Sep-2024  martin Pull up following revision(s) (requested by pgoyette in ticket #868):

sys/miscfs/procfs/procfs_vfsops.c: revision 1.120 (via patch)

Define dependencies based on build options.
 1.111.4.2 13-Sep-2024  martin Pull up following revision(s) (requested by pgoyette in ticket #857):

sys/modules/procfs/Makefile: revision 1.8
sys/miscfs/procfs/procfs_vfsops.c: revision 1.118
sys/miscfs/procfs/procfs_vfsops.c: revision 1.119

procfs grew a new dependency

Include the SYSV_* entries for modular procfs

Now we have another dependency for the SYSV_* stuff.
 1.111.4.1 18-Apr-2024  martin Pull up following revision(s) (requested by hannken in ticket #668):

sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117

Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
Protect kernel hooks exechook, exithook and forkhook with rwlock.

Lock as writer on establish/disestablish and as reader on list traverse.

For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"
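
The locking rule stated above (writer on establish/disestablish, reader on list traversal) can be sketched in userland as follows. This is an analogy only: a POSIX rwlock stands in for the kernel rwlock, and struct demo_hook and the demo_* functions are hypothetical names, not the kern_hook.c interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void (*demo_hook_fn)(void *arg);

/* One entry in a singly linked hook list (hypothetical layout). */
struct demo_hook {
	struct demo_hook *next;
	demo_hook_fn fn;
	void *arg;
};

static struct demo_hook *demo_hooks;
static pthread_rwlock_t demo_hook_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Establish: modifies the list, so take the lock as writer. */
static struct demo_hook *
demo_hook_establish(demo_hook_fn fn, void *arg)
{
	struct demo_hook *h;

	assert(fn != NULL);
	if ((h = malloc(sizeof(*h))) == NULL)
		return NULL;
	h->fn = fn;
	h->arg = arg;
	pthread_rwlock_wrlock(&demo_hook_lock);
	h->next = demo_hooks;
	demo_hooks = h;
	pthread_rwlock_unlock(&demo_hook_lock);
	return h;
}

/* Disestablish: also a list modification, so writer again. */
static void
demo_hook_disestablish(struct demo_hook *h)
{
	struct demo_hook **pp;

	pthread_rwlock_wrlock(&demo_hook_lock);
	for (pp = &demo_hooks; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			break;
		}
	}
	pthread_rwlock_unlock(&demo_hook_lock);
	free(h);
}

/* Traverse: only reads the list, so several runners may hold the lock at once. */
static void
demo_hooks_run(void)
{
	struct demo_hook *h;

	pthread_rwlock_rdlock(&demo_hook_lock);
	for (h = demo_hooks; h != NULL; h = h->next)
		(*h->fn)(h->arg);
	pthread_rwlock_unlock(&demo_hook_lock);
}

/* Example hook used for demonstration: counts how often it was called. */
static void
demo_count(void *arg)
{
	++*(int *)arg;
}
```

Note that in this sketch a hook must not call demo_hook_establish() or demo_hook_disestablish() from inside demo_hooks_run(), since that would request the write lock while the read lock is held.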

Add a hashmap to access all procfs nodes by pid.

Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Remove all procfs nodes for this process on process exit.
 1.116.2.1 02-Aug-2025  perseant Sync with HEAD
 1.233 01-Jul-2024  christos Add linux POSIX message queue support (Ricardo Branco)
 1.232 12-May-2024  christos branches: 1.232.2;
PR/58227: Ricardo Branco: Add support for proc/sysvipc in Linux emulator
 1.231 12-May-2024  christos PR/58240: Ricardo Branco: Add support for proc/self/limits as used by Linux
 1.230 17-Jan-2024  hannken Add a hashmap to access all procfs nodes by pid.
 1.229 17-Jun-2022  shm branches: 1.229.4;
Add missing permission check
 1.228 27-Mar-2022  christos dedup the eofs link/symlink methods
 1.227 17-Jan-2022  bouyer If the calling process is running under linux emulation, make /proc/xxx/fd/
return only symlinks pointing to the original file in the filesystem,
instead of a hard link. This matches the linux behavior, and some
linux programs rely on it (they unconditionally call readlink() on
/proc/xxx/fd/yy and don't deal with it returning EINVAL).
Proposed on tech-kern@ in
http://mail-index.netbsd.org/tech-kern/2022/01/11/msg027877.html
 1.226 14-Jan-2022  christos Fix emul and exe DT_ types (from RVP, as was the previous commit)
 1.225 14-Jan-2022  christos Put the appropriate DT_ constant in the dirent structure depending on the
file type.
 1.224 11-Jan-2022  christos remove redundant error initialization and break earlier. (from rvp)
 1.223 11-Jan-2022  hannken Use a single "p" variable.

Should fix PR kern/56614: kernel panic on tmux
 1.222 10-Jan-2022  christos use a single nc variable.
 1.221 10-Jan-2022  christos Fix locking in the error path (from RVP). Centralize unlock code.
 1.220 08-Dec-2021  andvar s/efficent/efficient/ in comments.
 1.219 05-Oct-2021  christos PR/53299: RVP: kernfs and procfs are broken when sysctl security.curtain
is enabled
 1.218 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.217 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.216 28-Jun-2021  chs VOP_BMAP() may be called via ioctl(FIOGETBMAP) on any vnode that applications
can open. Change various pseudo-fs *_bmap methods to return an error instead of
panicking.

Reported-by: syzbot+8289a3eaf2ba60958c87@syzkaller.appspotmail.com
 1.215 27-Jun-2020  christos branches: 1.215.6;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.214 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.213 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.212 29-Apr-2020  thorpej If the procfs mount is marked as linux-compat, then allow proc lookup
by any LWP ID in the proc, not just the canonical PID.
 1.211 21-Apr-2020  ad Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
 1.210 24-Feb-2020  ad branches: 1.210.4;
v_interlock -> vmobjlock
 1.209 23-Feb-2020  ad Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
 1.208 01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
 1.207 29-Aug-2019  hannken branches: 1.207.2;
Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.206 30-Mar-2019  christos branches: 1.206.4;
add a node for the process resource limits.
 1.205 14-Oct-2018  jdolecek remove M_CANFAIL flag for malloc(9) - it was completely ignored, so had
actually no effect
 1.204 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
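
The truncation hazard this rename guards against can be seen in a small userland sketch. The uimin() below is a stand-in written for this example, not the libkern source; u64min() and uimin_truncates() are hypothetical helpers, and the sketch assumes unsigned int is narrower than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the renamed helper: defined on unsigned int only. */
unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical 64-bit-safe variant, present only for comparison. */
uint64_t
u64min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Returns 1 when routing two 64-bit values through uimin() gives a
 * different answer than a full-width comparison, 0 otherwise.
 */
int
uimin_truncates(uint64_t a, uint64_t b)
{
	unsigned int narrow;

	/* The example only makes sense if unsigned int is narrower. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));

	/*
	 * The casts spell out the conversion that an uncast call with
	 * 64-bit arguments would perform implicitly.
	 */
	narrow = uimin((unsigned int)a, (unsigned int)b);
	return (uint64_t)narrow != u64min(a, b);
}
```

With a 32-bit unsigned int, uimin_truncates(0x100000000, 5) reports a mismatch: the first argument wraps to 0, so the narrow minimum is 0 while the full-width minimum is 5.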
 1.203 07-Apr-2018  hannken branches: 1.203.2;
Lock the target cwdi and take an additional reference to the
vnode we are interested in to prevent it from disappearing
before getcwd_common().

Should fix PR kern/53096 (netbsd-8 crash on heavy disk I/O)
 1.202 31-Dec-2017  christos branches: 1.202.2;
Add an environ node
 1.201 01-Dec-2017  christos Allow procfs_kqfilter, since we allow poll. "go" does it.
 1.200 08-Nov-2017  christos fix locking, remove error(1) comments.
 1.199 08-Nov-2017  christos use p->p_path, remove unused code.
 1.198 28-Aug-2017  kamil Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files or Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
 1.197 26-May-2017  riastradh branches: 1.197.2;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.196 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.195 30-Mar-2017  christos add an auxv node.
 1.194 20-Aug-2016  hannken branches: 1.194.2;
Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.193 20-Apr-2015  riastradh branches: 1.193.2;
Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.192 05-Sep-2014  matt branches: 1.192.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.191 27-Jul-2014  hannken branches: 1.191.2; 1.191.4; 1.191.8;
Change procfs from hashlist to vcache.
- Key is (type, pid, fd)
- Remove argument "p" from procfs_allocvp(). It is only used
when "type == PFSfd". Lookup the proc with proc_find() when
procfs_loadvnode() needs it.
- Use a vfs_vnode_iterator for procfs_revoke_vnodes().
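
A (type, pid, fd) lookup key of the kind described above might look like the following sketch. All names here (enum node_type, struct node_key, node_key_init, node_key_equal) are hypothetical and are not taken from the procfs sources. The sketch also assumes the cache compares keys byte by byte, which is why the key is zeroed before it is filled in.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical node types; the real procfs type list is much longer. */
enum node_type { NODE_ROOT, NODE_PROC, NODE_FD };

/* Hypothetical key: one cache entry per (type, pid, fd) triple. */
struct node_key {
	enum node_type nk_type;
	pid_t nk_pid;
	int nk_fd;
};

/*
 * Fill in a key. Zeroing first keeps any padding bytes identical
 * between keys, so a byte-wise comparison is well defined.
 */
void
node_key_init(struct node_key *k, enum node_type type, pid_t pid, int fd)
{
	assert(k != NULL);
	memset(k, 0, sizeof(*k));
	k->nk_type = type;
	k->nk_pid = pid;
	k->nk_fd = fd;
}

/* Byte-wise equality, as a cache keyed on raw bytes would apply it. */
int
node_key_equal(const struct node_key *a, const struct node_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Because the key carries the pid rather than a process pointer, whoever loads the node has to look the process up again at that point, which matches the proc_find() note in the list above.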
 1.190 25-Jul-2014  dholland Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.189 07-Feb-2014  hannken branches: 1.189.2;
Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.188 23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.187 17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.186 18-Mar-2013  plunky branches: 1.186.6;
C99 section 6.7.2.3 (Tags) Note 3 states that:

A type specifier of the form

enum identifier

without an enumerator list shall only appear after the type it
specifies is complete.

which means that we cannot pass an "enum vtype" argument to
kauth_access_action() without fully specifying the type first.
Unfortunately there is a complicated include file loop which
makes that difficult, so convert this minimal function into a
macro (and capitalize it).

(ok elad@)
 1.185 25-Nov-2012  christos do something reasonable with kernel semaphores.
 1.184 28-May-2012  christos branches: 1.184.2;
add a task process subdirectory for emul linux
 1.183 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.182 04-Sep-2011  jmcneill branches: 1.182.2; 1.182.6;
PR# kern/45021: Please support /emul/linux/proc/version

Add /proc/version for procfs with -o linux. The version reported depends
on the emulation type of the calling process:

$ cat /proc/version
NetBSD version 5.99.55 (netbsd@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) NetBSD 5.99.55 (GENERIC) #39: Sun Sep 4 09:10:05 EDT 2011

$ /emul/linux/bin/cat /proc/version
Linux version 2.6.18 (linux@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010

$ /emul/linux32/bin/cat /proc/version
Linux version 2.6.18 (linux32@localhost) (gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)) #0 Wed Mar 3 03:03:03 PST 2010
 1.181 23-Jun-2011  christos From Aleksey Cheusov: Don't make it easy for compromised systems to bypass
ASLR protections by providing the mapping addresses of programs to everyone.
 1.180 01-Jul-2010  rmind Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.
 1.179 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.178 08-Jun-2010  hannken Procfs_lookup() does not lookup directory descriptors in the fd/
subdirectory. There is no need for recursive vnode locking here.

Ok: Christos Zoulas <christos@netbsd.org>
 1.177 08-Jan-2010  pooka branches: 1.177.2; 1.177.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.176 03-Jul-2009  elad Where possible, extract the file-system's access() routine to two internal
functions: the first checking if the operation is possible (regardless of
permissions), the second checking file-system permissions, ACLs, etc.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
 1.175 23-Jun-2009  elad Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to follow the principle of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
 1.174 24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
 1.173 17-Dec-2008  cegger branches: 1.173.2;
kill MALLOC and FREE macros.
 1.172 05-Sep-2008  skrll branches: 1.172.2;
PR/39324 kernel diagnostic assertion "l->l_stat != LSZOMB" failed.

Ignore procs with zero or all LSZOMB LWPs. Get a non-LSZOMB LWP to perform
operations against as part of the deal.

procfs really needs to be updated to support multi-threading fully.
Hi Antti!
 1.171 05-Sep-2008  skrll ANSIfy
 1.170 02-Jul-2008  rmind branches: 1.170.2;
Remove proc_representative_lwp(), use a simple LIST_FIRST() instead.
OK by <ad>.
 1.169 28-Apr-2008  martin branches: 1.169.2; 1.169.4;
Remove clause 3 and 4 from TNF licenses
 1.168 24-Apr-2008  ad branches: 1.168.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.167 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.166 21-Mar-2008  ad branches: 1.166.2;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.165 23-Jan-2008  elad branches: 1.165.6;
Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
 1.164 02-Jan-2008  ad Merge vmlocking2 to head.
 1.163 26-Nov-2007  pooka branches: 1.163.2; 1.163.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.162 09-Nov-2007  christos make the last argument of procfs_dir size_t
 1.161 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.160 10-Oct-2007  ad branches: 1.160.2; 1.160.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.159 08-Oct-2007  ad Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.158 22-Jul-2007  pooka branches: 1.158.4; 1.158.6; 1.158.8; 1.158.10;
Don't allow getcwd() on procfs vnodes and provide "/" as the path
instead of the result from getcwd(). This works around locking
panics caused by namei calling VOP_READLINK while holding on to a
directory lock and getcwd() trying to acquire that lock. The real
fix would be to get rid of getcwd() calls within VOPs (not locking
safe), but that's not a viable option in the netbsd-4 timeframe.

Suggestion for workaround from David Holland.
 1.157 24-May-2007  agc branches: 1.157.2;
Extend the Linux emulation of /proc to include

/proc/stat
/proc/loadavg and
/proc/<pid>/statm.

These are only present when -o linux is specified as a mount option
to procfs.

Factor out some common code so that it can be used by a number of
functions.

XXX The values returned in the statm emulation need to be verified.
 1.156 04-Apr-2007  rmind Unfortunately, missed procfs_proc_unlock() in previous.
Pointed out by pooka@
 1.155 04-Apr-2007  rmind procfs_readlink: Handle a possible fail of fd_getfile(), also, we
do not need to check for error again.
CID: 4436
 1.154 09-Mar-2007  ad branches: 1.154.2; 1.154.4;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.153 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.152 03-Mar-2007  salo Don't prepend rootvnode to the path in non-NULL case for exe links.
It breaks procfs in chroot.

from <christos>, tested by me.
 1.151 19-Feb-2007  pooka When checking for file validity under pid/, do proper proc->lwp
lookup (fsvo proper) instead of fiddling directly with the lwp
list.
 1.150 18-Feb-2007  pooka Don't check for validity of p in lookup for root nodes, since it
will always be NULL. Rather, just call pt_valid with NULL directly
and let it decide if we're a linux mount or not.
 1.149 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.148 16-Feb-2007  pooka branches: 1.148.2;
In lookup, when checking for procfs process node validity, target the
process we're trying to get information about through procfs, not
the caller of lookup.

fixes 'ls -l /proc/*/file' panic, which would occur when trying to
lookup "file" for a kernel thread, which doesn't have p->p_textvp.
 1.147 15-Feb-2007  ad Need to acquire procp->p_mutex for procfs_dir().
 1.146 11-Feb-2007  ad Eliminate a couple of reference count and mutex leaks.
 1.145 09-Feb-2007  ad Merge newlock2 to head.
 1.144 25-Dec-2006  elad PR/35226: Johann Franz: Problems with permissions in
/usr/pkg/emul/linux/proc .

Okay mlelstv@
 1.143 09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.142 04-Dec-2006  christos From Nicolas Joly: restore previous behavior in procfs_validfile_linux, since
readdir passes a NULL lwp.
 1.141 03-Dec-2006  elad Move kauth(9) call to where it belongs. Noticed by Nicolas Joly, thanks!
 1.140 28-Nov-2006  elad branches: 1.140.2;
Move ktrace, ptrace, systrace, and procfs to use kauth(9).

First, remove process_checkioperm() calls from MD code. Similar checks
using kauth(9) routines (on the process scope, using appropriate action)
are done in the callers.

Add secmodel back-end to handle each subsystem.
 1.139 25-Nov-2006  skrll Expose the 'exe' symlink to the process realpath in NetBSD as well. An
example user is gdb.

OK'd by christos.
 1.138 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.137 29-Oct-2006  christos add an "emul" file node.
 1.136 25-Oct-2006  christos 1. fix procfs_validfile{,_linux} to test for NULL pointers properly.
2. make "exe" entry be a symlink to the executable, instead of pointing
directly to the vnode of the executable.
3. factor out commonly used code.
 1.135 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.134 20-Sep-2006  manu Emulate Linux's /proc/devices
 1.133 13-Jun-2006  yamt branches: 1.133.6; 1.133.8;
use KAUTH_PROCESS_CANSEE rather than CURTAIN where appropriate.
 1.132 13-Jun-2006  yamt remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.
 1.131 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.130 14-May-2006  elad branches: 1.130.2;
integrate kauth.
 1.129 02-Feb-2006  christos branches: 1.129.2; 1.129.4; 1.129.6; 1.129.8;
PR/32692: Matthew Mondor: linux compatibility in /proc/self should point
directly to the directory containing the pid instead of pointing to
/proc/curproc, because some programs rely on calling readlink on /proc/self
to get the pid.
 1.128 11-Dec-2005  christos branches: 1.128.2; 1.128.4;
merge ktrace-lwp.
 1.127 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.126 01-Oct-2005  atatat branches: 1.126.2;
Add "cwd" and "root" symlinks to each process's directory. The cwd
link points to the process's current working directory, and the root
link points to the process's root directory. What else would you
expect?

For directories that are out of reach (caller is in a chroot, target
process is in a different chroot, etc), the links point to "/"
instead.
 1.125 11-Sep-2005  elad Implement curtain for procfs.
 1.124 30-Aug-2005  xtraeme Remove __P()
 1.123 29-May-2005  christos branches: 1.123.2;
- sprinkle const
- avoid shadowed variables.
 1.122 02-Apr-2005  christos PR/29782: Martin Husemann: procfs can not unmount when some process has its
current directory in curproc. Fix from Pedro Martelletto:
We cannot call vgone() from procfs_inactive() if we are coming from
vclean(). that's what's probably causing the deadlock.
 1.121 26-Feb-2005  perry nuke trailing whitespace
 1.120 04-Oct-2004  yamt branches: 1.120.4; 1.120.6;
procfs_readdir:
- return correct cookie when buffer size is small.
- simplify logic.
 1.119 04-Oct-2004  yamt procfs_readdir: remove a redundant assignment.
 1.118 02-Oct-2004  yamt procfs_getattr: correct size of /proc/self.
 1.117 01-Oct-2004  yamt procfs_readdir:
- fix a locking problem, using proclist_foreach_call. PR/27098.
- correct snprintf size argument.
 1.116 01-Oct-2004  yamt procfs_readdir: fix an offset handling bug after addition of /proc/self.
 1.115 01-Oct-2004  yamt procfs_readdir: use a list macro.
 1.114 20-Sep-2004  jdolecek add 'mounts' file for -o linux, which lists all currently mounted
filesystems; Linux glibc statvfs() uses this to get some of mount flags,
and this file is also useful as /emul/linux/etc/mtab (via symlink)
 1.113 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.112 22-Apr-2004  itojun sprintf -> snprintf
 1.111 15-Feb-2004  jdolecek unlock the descriptor table simple lock after fd_getfile() call in
procfs_readdir()
fixes procfs locking problems reported on current-users@, problem place
found by enami tsugutomo
 1.110 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.109 27-Sep-2003  darcy Changes as discussed with itojun on tech-kern. I have modified the enums
to have KFS or PFS differentiators. Further I have wrapped the enum in
procfs in "#ifdef _KERNEL" as it is done in kernfs.

To see the discussion go to http://mail-index.NetBSD.org/tech-kern/2003/09/
and look for "Mismatched enums in include files" in the list.
 1.108 07-Sep-2003  itojun remove meaningless line (variable overwritten 2 lines below)
 1.107 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.106 29-Jun-2003  fvdl branches: 1.106.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.105 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.104 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.103 28-May-2003  christos Add /proc/<pid>/stat for linux compat. j2sdk1.4.2 depends on it.
 1.102 18-Apr-2003  christos Make symlinks for directories that point to the actual directory.
Make symlinks to [kqueue] and [misc] for kqueue and misc fds.
 1.101 17-Apr-2003  jdolecek do not show nodes corresponding to directory descriptors for process
in fd/ subdirectory, nor allow lookup/open for the nodes
this fixes PR kern/21187 for good, and also avoids interesting directory
locking issues
 1.100 17-Apr-2003  jdolecek procfs_readdir(): in Pfd case, only show descriptors of types we want
we know how to represent (vnodes, fifo, pipes); also use fd_getfile() et al

this avoids annoying EOPNOTSUPP error messages from ls -F and such
 1.99 17-Apr-2003  jdolecek procfs_lookup(): use fd_getfile() et al in Pfd case
 1.98 17-Apr-2003  jdolecek use fd_getfile() in procfs_getfp(), and FILE_USE()/FILE_UNUSE() the
returned file descriptor pointer appropriately
 1.97 17-Apr-2003  jdolecek make some local arrays/variables static + const
 1.96 10-Apr-2003  jdolecek use former genfs_eopnotsupp_rele() as genfs_eopnotsupp(), so that vnodes
are vput()/vrele()d as necessary - some filesystems did use the wrong
one for some ops, and it's just safer to not take the chance

based on suggestion by Bill Studenmund
 1.95 05-Apr-2003  dsl Remove pointless check against PID_MAX. Let pfind() do the validation.
(The new pid allocation code may decide to allocate pids above PID_MAX.)
 1.94 25-Feb-2003  jrf This addresses PR kern/19989. Thanks to hamajima@nagoya.ydc.co.jp for submitting this patch which enables /proc/uptime for linux emul. Patch reviewed by atatat@netbsd.org and tron@netbsd.org, approved by tron@netbsd.org.
 1.93 04-Jan-2003  martin Cast off_t expression to long long to match format even on 64 bit
platforms.

Shouldn't we introduce a PRIoff_t macro to create such format strings?
 1.92 03-Jan-2003  christos add LK_CANRECURSE in the locking of /proc/<pid>/fd/<n> and remove the curproc
kludge. Thanks to fvdl.
 1.91 03-Jan-2003  christos Implement /proc/<pid>/fd/<n>. This is work in progress. Questionable things:
- Is it ok to convert DTYPE_PIPE to VFIFO and DTYPE_SOCKET to VSOCK?
- XXX: Avoid locking issue in ls -Rl /proc by avoiding curproc
- Does I/O to pipes work?
- XXX: Are there security implications?
 1.90 03-Aug-2002  simonb Just use the "time" variable in the *_getattr functions instead of a call
to (the potentially expensive) microtime().
 1.89 09-May-2002  thorpej branches: 1.89.2;
Move code shared by procfs and the kernel proper out of procfs and
into the kernel proper (renaming functions from procfs_* to process_*).
 1.88 12-Jan-2002  christos Don't hide the real return code with EPERM.
 1.87 06-Dec-2001  chs add a VOP_PUTPAGES method for all the filesystems that don't have pages,
just unlock the interlock.
 1.86 05-Dec-2001  thorpej * Allow machine-dependent code to specify hooks for ptrace(2)
(__HAVE_PTRACE_MACHDEP) and procfs (__HAVE_PROCFS_MACHDEP).
These changes will allow platforms like x86 (XMM) and PowerPC
(AltiVec) to export extended register sets in a sane manner.

* Use __HAVE_PTRACE_MACHDEP to export x86 XMM registers (standard
FP + SSE/SSE2) using PT_{GET,SET}XMMREGS (in the machdep
ptrace request space).
* Use __HAVE_PROCFS_MACHDEP to export x86 XMM registers via
/proc/N/xmmregs in procfs.
 1.85 10-Nov-2001  lukem add RCSIDs
 1.84 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.83 31-Aug-2001  chs branches: 1.83.2; 1.83.4;
map files are zero-length.
 1.82 03-Jun-2001  chs branches: 1.82.2;
procfs_bmap() should never be called, make it a "bad op".
let procfs_mmap() use the default error method.
 1.81 14-Apr-2001  kleink In procfs_readdir(), give /proc/# directories DT_DIR (rather than DT_REG).
 1.80 30-Mar-2001  fvdl Bump va_blocksize for the map files some more, so that programs with
quite a few mappings have a chance of being handled correctly if
st_blksize is looked at.
 1.79 29-Mar-2001  fvdl For -o linux mounts, add some code to emulate /proc/#/maps.
Needs NAMECACHE_ENTER_REVERSE to include filenames.
 1.78 21-Feb-2001  jdolecek branches: 1.78.2;
make some more constant arrays 'const'
 1.77 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.76 17-Jan-2001  fvdl Add a few linux-style files, only enabled when -o linux is specified
for the mount. Currently these are /proc/cpuinfo and /proc/meminfo.
The former only does something on i386 right now.
 1.75 24-Nov-2000  chs remove dead code and other misc cleanup.
 1.74 09-Aug-2000  tv Only show the "exe" entry to Linux processes, suggested by christos.
Since there are actually three struct emul's for linux, use the e_name
field to determine eligibility with strcmp().
 1.73 09-Aug-2000  tv Some versions of Linux libc look for /proc/.../exe instead of /proc/../file.
Add an entry for "exe" that is the same as "file", provided only if
COMPAT_LINUX is set.
 1.72 03-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable sized allocations.
 1.71 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.70 30-Mar-2000  simonb branches: 1.70.4;
Delete duplicate declaration of atopid().
 1.69 02-Sep-1999  thorpej branches: 1.69.2; 1.69.8;
Make /proc/self a symlink to /proc/curproc. I've observed Linux programs
that expect /proc/self/cmdline to exist.
 1.68 25-Aug-1999  sommerfeld Change variable used for directory offset from "int" to "off_t".
Overkill, but avoids a host of truncation problems.
 1.67 24-Aug-1999  sommerfeld Fix PR8270:

Problem turned out to be due to improper handling of reads beyond EOF:
they should just return without error with the uio unchanged, and the
caller will recognize this as a zero-byte return (EOF).

The previous fix to protect directory reads against bogus uio_offset
values returned EINVAL, which broke mount -o union, which only
union'ed in the lower directory if the upper directory cleanly
returned EOF.

While we're here, protect kernfs as well.
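
The EOF rule described in 1.67 can be illustrated outside the kernel. This is only a sketch under stated assumptions: `struct sketch_uio` and `sketch_read()` are hypothetical stand-ins, not the kernel's `struct uio` or any procfs function, and rejecting a negative offset with EINVAL is a choice made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct uio. */
struct sketch_uio {
	long	offset;	/* current position in the file */
	size_t	resid;	/* bytes the caller still wants */
	char	*buf;	/* destination buffer */
};

/*
 * Copy up to uio->resid bytes of data[0..len) starting at uio->offset.
 * Returns 0 on success or an errno-style value.  A read at or beyond
 * EOF is not an error: the uio is left unchanged, so the caller sees a
 * zero-byte transfer and treats it as EOF.
 */
int
sketch_read(const char *data, size_t len, struct sketch_uio *uio)
{
	size_t n;

	assert(data != NULL && uio != NULL && uio->buf != NULL);

	if (uio->offset < 0)
		return EINVAL;	/* assumption of this sketch only */
	if ((size_t)uio->offset >= len)
		return 0;	/* beyond EOF: no error, uio untouched */

	n = len - (size_t)uio->offset;
	if (n > uio->resid)
		n = uio->resid;
	memcpy(uio->buf, data + uio->offset, n);
	uio->buf += n;
	uio->offset += (long)n;
	uio->resid -= n;
	return 0;
}
```

A caller that compares resid before and after the call sees no change for an out-of-range offset, which is the clean EOF that the log says mount -o union relied on.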
 1.66 14-Aug-1999  christos protect against large uio_offset
 1.65 03-Aug-1999  wrstuden Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.64 25-Jul-1999  thorpej Add calls to lock the proclist as appropriate.
 1.63 14-Jul-1999  thorpej Fix a paste-o in procfs_lookup() introduced with the vnode locking changes.
Fixes PR #7961, Mario Kemper <magick@bundy.lip.owl.de>.
 1.62 08-Jul-1999  wrstuden Bump osrelease to 1.4E. Add layerfs files, remove null_subr.c.

Update coda to new struct lock in struct vnode.

make fdescfs, kernfs, portalfs, and procfs actually lock their vnodes.
It's not that hard.

Make unionfs set v_vnlock = NULL so any overlayed fs will call its
VOP_LOCK.
 1.61 12-Mar-1999  christos branches: 1.61.2; 1.61.4;
PR/7143: Jaromir Dolecek: Add procfs/cmdline from Linux emulation
 1.60 25-Jan-1999  msaitoh Add /proc/#/map. From FreeBSD.
 1.59 08-Sep-1998  thorpej - Use proclists[], rather than checking allproc and zombproc explicitly.
- Add some comments about locking.
 1.58 13-Aug-1998  kleink Per POSIX, fail with EINVAL if advisory locking is attempted on a file type
that doesn't support it, rather than using a homegrown EBADF or EOPNOTSUPP.
 1.57 10-Aug-1998  matthias create miscfs/genfs/genfs_vnops.c:genfs_enoioctl and make all the other
filesystems use it instead of a private version.
 1.56 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.55 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.54 21-Apr-1998  fvdl procfs_readdir: in case of error, check if cookies actually have
been allocated before freeing them. From Wolfgang Solfrank.
 1.53 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.52 10-Oct-1997  fvdl Bump last argument to VOP_READDIR to off_t (from u_long).
 1.51 27-Aug-1997  thorpej Fix a reversed argument which caused procfs_checkioperm() to always return
"OK". Add a few comments to avoid further confusion.
 1.50 12-Aug-1997  thorpej Fix the procfs hole described on current-users, similar to a fix for
FreeBSD by Sean Eric Fagan, but a bit different. This makes the checks
in the same places as sef's FreeBSD patch, but does not hardcode the
"kmem" group into the kernel, and also does a check identical to the
(3) and (4) checks in the NetBSD ptrace(2):

(1) it's not owned by you, or is set-id on exec (unless
you're root), or

(2) it's init, which controls the security level of the
entire system, and the system was not compiled with
permanently insecure mode turned on.
 1.49 08-May-1997  mycroft branches: 1.49.4;
Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.48 05-May-1997  mycroft Need stat.h.
 1.47 05-May-1997  mycroft Eliminate bogus uses of V{READ,WRITE,EXEC}. Use S_I[RWX]{USR,GRP,OTH} where
appropriate.
 1.46 28-Apr-1997  mycroft Minor code cleanup.
 1.45 25-Oct-1996  cgd define path name string variables that we should not (and, thankfully, do
not) modify as 'const char *' rather than 'char *'.
 1.44 13-Oct-1996  christos backout previous kprintf changes
 1.43 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.42 07-Sep-1996  mycroft Implement poll(2).
 1.41 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.40 16-Mar-1996  christos Fix printf format follies.
 1.39 13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.38 12-Feb-1996  christos close PR/2063: procfs_rw prototyped twice with different prototypes
 1.37 09-Feb-1996  christos miscfs prototype changes
 1.36 09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.35 09-Oct-1995  mycroft Use the index number as the cookie, rather than multiplying by UIO_MX.
 1.34 09-Oct-1995  mycroft Add support for cookies, mostly from Greg Hudson.
 1.33 15-Apr-1995  cgd fix timeval vs. timespec warnings
 1.32 03-Feb-1995  mycroft Return EROFS rather than ENOENT in many cases. Also some cosmetic cleanup.
 1.31 27-Dec-1994  mycroft Format police.
 1.30 24-Dec-1994  ws Implement and use a common access checking routine
 1.29 14-Dec-1994  mycroft Remove a_fp.
 1.28 14-Nov-1994  christos fixed struct comment
 1.27 30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.26 20-Oct-1994  cgd update for new syscall args description mechanism
 1.25 30-Aug-1994  mycroft Convert process, file, and namei lists and hash tables to use queue.h.
 1.24 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.23 16-Jun-1994  mycroft Remove an unneeded test.
 1.22 15-Jun-1994  mycroft Minor update from JSP after merging my changes.
 1.21 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.20 05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.19 15-Apr-1994  cgd forgot these...
 1.18 12-Apr-1994  cgd be a bit smarter about determining if files shouldn't be seen by the user.
Also, DON'T allow a lookup to succeed on a file that's not visible!
 1.17 15-Feb-1994  mycroft Undo last change; the executable is `file', not `a.out'.
 1.16 14-Feb-1994  ws Rename file -> a.out
 1.15 14-Feb-1994  ws Don't try to show a file for a process if there is none
 1.14 28-Jan-1994  cgd make a fpregs file.
 1.13 20-Jan-1994  ws Make procfs really work for debugging.
Implement note & notepg files in procfs.
 1.12 09-Jan-1994  ws Bug fixes and enhancements:
Make NFS serving work (BUT DON'T USE "attach" TO /proc/*/ctl FOR NOW!!!)
Make `curproc' a symbolic link
Add `.' and `..' entries to the directories.
Return better guesses on the size of the files.
 1.11 05-Jan-1994  cgd return size of 'reg' from getattr()
 1.10 05-Jan-1994  cgd make it compile (cleanly) for us
 1.9 05-Jan-1994  cgd add new procfs code, from Jan-Simon Pendry, jsp@sequent.com.
This is pretty-much "virgin", so that diffs can be done later.
 1.8 18-Dec-1993  mycroft Canonicalize all #includes.
 1.7 16-Sep-1993  cgd kill volatile warning.
 1.6 07-Sep-1993  ws branches: 1.6.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.5 26-Aug-1993  pk Implement setattr: mode for process entries; mode + uid/gid for the
PROCFS root directory.
Fixed omission in pfs_root() which came to light as a result of the above:
hold on to vnode for root dir.
 1.4 25-Aug-1993  pk Fixed improperly initialized nfsnode in pfs_lookup()
 1.3 24-Aug-1993  pk copyright update.
 1.2 24-Aug-1993  pk Rcs Id added.
 1.1 24-Aug-1993  pk branches: 1.1.1;
Initial version of a proc filesystem.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.6.2.2 14-Nov-1993  mycroft Canonicalize all #includes.
 1.6.2.1 24-Sep-1993  mycroft Changes from trunk.
 1.49.4.3 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.49.4.2 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.49.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.61.4.1 02-Aug-1999  thorpej Update from trunk.
 1.61.2.2 14-Jan-2002  he Pull up revision 1.88 (via patch, requested by he):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.61.2.1 28-Aug-1999  he Pull up revisions 1.66-1.68:
Protect {fdesc,kernfs,procfs}_readdir against directory seeks
with bogus offsets. (sommerfeld)
 1.69.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.69.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.69.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.69.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.69.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.69.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.69.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.70.4.2 14-Jan-2002  he Pull up revision 1.88 (via patch, requested by christos):
Fix a ptrace/execve race condition which could be used to modify
the child process' image during execve. This would be a security
issue due to setuid programs.
 1.70.4.1 30-Mar-2001  he Pull up revisions 1.74-1.76 (via patch, requested by fvdl):
Add some required Linux emulation bits to support the Linux
version of VMware.
 1.78.2.13 07-Jan-2003  thorpej Sync with HEAD.
 1.78.2.12 15-Oct-2002  nathanw Make all the procfs_validfoo() routines go back to taking a proc
instead of an lwp; they aren't doing anything useful with the LWP.

Revert changes that changed /proc/curproc to /proc/curlwp, and broke it in
the process.
 1.78.2.11 13-Aug-2002  nathanw Catch up to -current.
 1.78.2.10 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.78.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
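
The NULL-safe definition described in 1.78.2.9 can be sketched in isolation. The `sketch_` names below are hypothetical stand-ins for this example, not the kernel's own types or variables; only the shape of the conditional expression is taken from the log message.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-ins for struct proc and struct lwp. */
struct sketch_proc {
	int p_pid;
};

struct sketch_lwp {
	struct sketch_proc *l_proc;	/* owning process */
};

/* Stand-in for curlwp; NULL means "no current LWP". */
struct sketch_lwp *sketch_curlwp = NULL;

/*
 * Evaluates to the owning process, or NULL when there is no current
 * LWP, so that merely referencing the macro never dereferences NULL.
 */
#define sketch_curproc \
	((sketch_curlwp) ? (sketch_curlwp)->l_proc : NULL)
```

As the log notes, this only makes the reference safe; code that dereferences the result still has to check for NULL first.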
 1.78.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.78.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.78.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.78.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.78.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.78.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.78.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.78.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.82.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.82.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.82.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.82.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.82.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.83.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.83.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.89.2.1 29-Aug-2002  gehenna catch up with -current.
 1.106.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.106.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.106.2.8 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.106.2.7 19-Oct-2004  skrll Sync with HEAD
 1.106.2.6 24-Sep-2004  skrll Sync with HEAD.
 1.106.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.106.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.106.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.106.2.2 03-Aug-2004  skrll Sync with HEAD
 1.106.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.120.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.120.4.1 29-Apr-2005  kent sync with -current
 1.123.2.10 24-Mar-2008  yamt sync with head.
 1.123.2.9 04-Feb-2008  yamt sync with head.
 1.123.2.8 21-Jan-2008  yamt sync with head
 1.123.2.7 07-Dec-2007  yamt sync with head
 1.123.2.6 15-Nov-2007  yamt sync with head.
 1.123.2.5 27-Oct-2007  yamt sync with head.
 1.123.2.4 03-Sep-2007  yamt sync with head.
 1.123.2.3 26-Feb-2007  yamt sync with head.
 1.123.2.2 30-Dec-2006  yamt sync with head.
 1.123.2.1 21-Jun-2006  yamt sync with head.
 1.126.2.1 20-Oct-2005  yamt adapt procfs.
 1.128.4.1 09-Sep-2006  rpaulo sync with head
 1.128.2.1 18-Feb-2006  yamt sync with head.
 1.129.8.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.129.6.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.129.6.2 10-Mar-2006  elad process_authorize() -> kauth_authorize_process(), to be closer to the
original and as requested by yamt@ and thorpej@.
 1.129.6.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.129.4.2 26-Jun-2006  yamt sync with head.
 1.129.4.1 24-May-2006  yamt sync with head.
 1.129.2.2 01-Jun-2006  kardel Sync with head.
 1.129.2.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.130.2.1 19-Jun-2006  chap Sync with head.
 1.133.8.2 10-Dec-2006  yamt sync with head.
 1.133.8.1 22-Oct-2006  yamt sync with head
 1.133.6.7 12-Jan-2007  ad Sync with head.
 1.133.6.6 29-Dec-2006  ad Checkpoint work in progress.
 1.133.6.5 18-Nov-2006  ad Sync with head.
 1.133.6.4 17-Nov-2006  ad Checkpoint work in progress.
 1.133.6.3 24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.133.6.2 21-Oct-2006  ad - Make this compile. XXX Needs more work on locking.
- Do FILE_UNUSE() as the current LWP, otherwise we will wipe out the
target's advisory locks. XXX Double check.
 1.133.6.1 11-Sep-2006  ad - Convert some locks to mutexes and RW locks.
- Use the proclist_lock to protect pgrps and sessions in some places.
 1.140.2.7 27-Sep-2007  xtraeme Pull up following revision(s) (requested by martti in ticket #905):
sys/miscfs/procfs/procfs_vnops.c: revision 1.152

Don't prepend rootvnode to the path in non-NULL case for exe links.
It breaks procfs in chroot.
from <christos>, tested by me.
 1.140.2.6 23-Jul-2007  liamjfoy Pull up following revision(s) (requested by pooka in ticket #785):
sys/miscfs/procfs/procfs_vnops.c: revision 1.158
Don't allow getcwd() on procfs vnodes and provide "/" as the path
instead of the result from getcwd(). This works around locking
panics caused by namei calling VOP_READLINK while holding on to a
directory lock and getcwd() trying to acquire that lock. The real
fix would be to get rid of getcwd() calls within VOPs (not locking
safe), but that's not a viable option in the netbsd-4 timeframe.
Suggestion for workaround from David Holland.
 1.140.2.5 31-Mar-2007  bouyer branches: 1.140.2.5.2;
pull up the following revisions (requested by pooka in ticket #537):
sys/miscfs/procfs/procfs_vnops.c 1.148, 1.150-1.151 via patch
Fixes a panic when doing stat */exe.
 1.140.2.4 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.140.2.3 03-Jan-2007  tron Pull up following revision(s) (requested by elad in ticket #308):
sys/secmodel/bsd44/secmodel_bsd44_suser.c: revision 1.21 via patch
sys/miscfs/procfs/procfs_vnops.c: revision 1.144
PR/35226: Johann Franz: Problems with permissions in
/usr/pkg/emul/linux/proc .
Okay mlelstv@
 1.140.2.2 03-Jan-2007  tron Pull up following revision(s) (requested by elad in ticket #307):
sys/miscfs/procfs/procfs_vnops.c: revision 1.142
From Nicolas Joly: restore previous behavior in procfs_validfile_linux,
since readdir passes a NULL lwp.
 1.140.2.1 06-Dec-2006  tron Pull up following revision(s) (requested by elad in ticket #248):
sys/miscfs/procfs/procfs_vnops.c: revision 1.141
Move kauth(9) call to where it belongs. Noticed by Nicolas Joly, thanks!
 1.140.2.5.2.2 30-Sep-2007  wrstuden Catch up on netbsd-4 as of a few days ago.
 1.140.2.5.2.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.148.2.3 15-Apr-2007  yamt sync with head.
 1.148.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.148.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.154.4.1 11-Jul-2007  mjf Sync with head.
 1.154.2.7 25-Oct-2007  ad - Simplify debugger/procfs reference counting of processes. Use a per-proc
rwlock: rw_tryenter(RW_READER) to gain a reference, and rw_enter(RW_WRITER)
by the process itself to drain out reference holders before major changes
like exiting.
- Fix numerous bugs and locking issues in procfs.
- Mark procfs MPSAFE.
 1.154.2.6 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.154.2.5 20-Aug-2007  ad Sync with HEAD.
 1.154.2.4 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.154.2.3 08-Jun-2007  ad Sync with head.
 1.154.2.2 10-Apr-2007  ad Sync with head.
 1.154.2.1 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.157.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.158.10.2 22-Jul-2007  pooka Don't allow getcwd() on procfs vnodes and provide "/" as the path
instead of the result from getcwd(). This works around locking
panics caused by namei calling VOP_READLINK while holding on to a
directory lock and getcwd() trying to acquire that lock. The real
fix would be to get rid of getcwd() calls within VOPs (not locking
safe), but that's not a viable option in the netbsd-4 timeframe.

Suggestion for workaround from David Holland.
 1.158.10.1 22-Jul-2007  pooka file procfs_vnops.c was added on branch matt-mips64 on 2007-07-22 13:37:14 +0000
 1.158.8.1 14-Oct-2007  yamt sync with head.
 1.158.6.4 23-Mar-2008  matt sync with HEAD
 1.158.6.3 09-Jan-2008  matt sync with HEAD
 1.158.6.2 08-Nov-2007  matt sync with -HEAD
 1.158.6.1 06-Nov-2007  matt sync with HEAD
 1.158.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.158.4.2 11-Nov-2007  joerg Sync with HEAD.
 1.158.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.160.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.160.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.160.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.160.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.163.6.2 23-Jan-2008  bouyer Sync with HEAD.
 1.163.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.163.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.165.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.165.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.165.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.165.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.166.2.1 18-May-2008  yamt sync with head.
 1.168.2.6 11-Aug-2010  yamt sync with head.
 1.168.2.5 11-Mar-2010  yamt sync with head
 1.168.2.4 18-Jul-2009  yamt sync with head.
 1.168.2.3 20-Jun-2009  yamt sync with head
 1.168.2.2 04-May-2009  yamt sync with head.
 1.168.2.1 16-May-2008  yamt sync with head.
 1.169.4.1 03-Jul-2008  simonb Sync with head.
 1.169.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.170.2.1 19-Oct-2008  haad Sync with HEAD.
 1.172.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.173.2.1 23-Jul-2009  jym Sync with HEAD.
 1.177.4.1 03-Jul-2010  rmind sync with head
 1.177.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.182.6.2 02-Jun-2012  mrg sync to latest -current.
 1.182.6.1 05-Apr-2012  mrg sync to latest -current.
 1.182.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.182.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.182.2.2 30-Oct-2012  yamt sync with head
 1.182.2.1 17-Apr-2012  yamt sync with head
 1.184.2.4 03-Dec-2017  jdolecek update from HEAD
 1.184.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.184.2.2 23-Jun-2013  tls resync from head
 1.184.2.1 25-Feb-2013  tls resync with head
 1.186.6.1 18-May-2014  rmind sync with head
 1.189.2.1 10-Aug-2014  tls Rebase.
 1.191.8.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.191.4.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.191.2.1 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1703):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.192.2.3 28-Aug-2017  skrll Sync with HEAD
 1.192.2.2 05-Oct-2016  skrll Sync with HEAD
 1.192.2.1 06-Jun-2015  skrll Sync with HEAD
 1.193.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.194.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.197.2.4 17-Jun-2022  martin Pull up following revision(s) (requested by shm in ticket #1748):

sys/miscfs/procfs/procfs_vnops.c: revision 1.229

Add missing permission check
 1.197.2.3 29-Aug-2019  martin Pull up following revision(s) (requested by hannken in ticket #1346):

sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207

Add missing operation VOP_GETPAGES() returning EFAULT.

Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.

Observed by maxv@
 1.197.2.2 12-Apr-2018  martin Pull up following revision(s) (requested by kamil in ticket #713):

sys/modules/procfs/Makefile: revision 1.4
sys/miscfs/procfs/procfs_vfsops.c: revision 1.98
bin/ps/ps.1: revision 1.108
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.32
sys/miscfs/procfs/procfs_vnops.c: revision 1.198
sys/kern/sys_ptrace_common.c: revision 1.23
sys/kern/sys_ptrace_common.c: revision 1.24
sbin/mount_procfs/mount_procfs.8: revision 1.36
sys/kern/sys_ptrace_common.c: revision 1.25
sys/kern/sys_ptrace.c: revision 1.5
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.30
sys/sys/proc.h: revision 1.342
sys/kern/sys_ptrace_common.c: revision 1.26
sys/miscfs/procfs/procfs_ctl.c: file removal
sys/kern/sys_ptrace_common.c: revision 1.27
sys/miscfs/procfs/procfs_subr.c: revision 1.109
sys/kern/sys_ptrace_common.c: revision 1.28
sys/secmodel/extensions/secmodel_extensions.c: revision 1.8
sys/kern/sys_ptrace_common.c: revision 1.29
sys/sys/ptrace.h: revision 1.62
sys/compat/netbsd32/netbsd32_signal.c: revision 1.45
share/man/man9/kauth.9: revision 1.109
sys/miscfs/procfs/files.procfs: revision 1.12
sys/compat/netbsd32/netbsd32.h: revision 1.115
sys/miscfs/procfs/procfs.h: revision 1.72
sys/compat/netbsd32/netbsd32_ptrace.c: revision 1.5
sys/kern/kern_sig.c: revision 1.337
sys/sys/kauth.h: revision 1.75
sys/sys/sysctl.h: revision 1.224
sys/kern/sys_ptrace_common.c: revision 1.30
sys/kern/sys_ptrace_common.c: revision 1.31
sys/kern/sys_ptrace_common.c: revision 1.32
sys/kern/sys_ptrace_common.c: revision 1.33
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.20
sys/kern/sys_ptrace_common.c: revision 1.34
sys/kern/sys_ptrace_common.c: revision 1.36
sys/kern/kern_proc.c: revision 1.207
sys/kern/kern_exit.c: revision 1.269
doc/TODO.ptrace: revision 1.29

Make {s,g}et{db,fp,}regs work again for PK_32 processes
XXX: pullup-8

add disgusting magic to handle compat_netbsd32 as a module.

use process_*reg32 instead of struct *reg32.

Remove the filesystem tracing feature

This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not suitable for modern
applications beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change affects neither the other procfs files nor the Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
- /proc/#/ctl from mount_procfs(8)
- P_FSTRACE note from the documentation of ps(1)
- /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
- KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
- source code file miscfs/procfs/procfs_ctl.c
- PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
- KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
- PSL_FSTRACE (0x00010000) from sys/sys/proc.h
- P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about the removed
PSL_FSTRACE / P_FSTRACE, as this interface had very few users
(close or equal to zero).
Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>

untangle the mess:
- factor out common code
- break each ptrace subcall to its own sub-function
.. more to come ...
- reduce ifdef ugliness by moving it up top.
- factor out PT_IO and make PT_{READ,WRITE}_{I,D} use it
- factor out PT_DUMPCORE
- factor out sendsig code
.. more to come ...

handle siginfo requests for ptrace32

ptrace: Partially undo PT_{READ,WRITE}_{I,D} and unbreak these commands

The refactored code did not work and was generating EFAULT.

Sponsored by <The NetBSD Foundation>

Merge the code back; the problem was that since we are reading/writing
to a kernel address for PT_{READ,WRITE}_{I,D} we need the kernel vmspace.
Provide separate read and write functions to accommodate register functions
that need a size argument.

don't ignore error from copyout_piod

Use the proper process (the tracee) to get information about lwps and
registers and the tracer for vmspace.

Add new sysctl(3) entry: security.models.extensions.user_set_dbregs

Model this new sysctl(3) entry after "user_set_cpu_affinity" in the same
level of sysctl(3) switches.

Allow Debug Registers to be read unconditionally (no change here). This is
convenient as even if a user of a debugger does not use hardware assisted
watchpoints/breakpoints, a debugger can still fetch these values to store
in an internal cache with the register context. Reading them should have
no security concerns.

Add a paranoid MI switch that prohibits by default setting these registers
by a regular user (non-superuser). Make this switch disabled by default.
There are enough reserved bits out there to allow using them
unconditionally on hardened hosts.

Features shipped with Debug Registers are optional features in debuggers.
There is no reduction in elementary functionality.

Reviewed by <christos>

Sponsored by <The NetBSD Foundation>
 1.197.2.1 08-Apr-2018  snj Pull up following revision(s) (requested by hannken in ticket #702):
sys/miscfs/procfs/procfs_vnops.c: 1.203
Lock the target cwdi and take an additional reference to the
vnode we are interested in to prevent it from disappearing
before getcwd_common().
Should fix PR kern/53096 (netbsd-8 crash on heavy disk I/O)
 1.202.2.3 20-Oct-2018  pgoyette Sync with head
 1.202.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.202.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.203.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.203.2.1 10-Jun-2019  christos Sync with HEAD
 1.206.4.3 20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)

kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

	... usage of resources associated with object ...
	membar_release();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_acquire();
	... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
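
The `last one out hit the lights' pattern above can be sketched in portable C11, outside the kernel. This is only an illustration: struct obj and the obj_* helpers are hypothetical names, and the C11 fences stand in for membar_release/membar_acquire rather than reproducing the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;	/* number of live references */
	bool freed;		/* set when the last reference is dropped */
};

static void
obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->freed = false;
}

static void
obj_hold(struct obj *o)
{
	assert(!o->freed);
	/* Taking a reference needs no ordering by itself. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference. */
static bool
obj_rele(struct obj *o)
{
	/* Stand-in for membar_release(): order our prior use before the drop. */
	atomic_thread_fence(memory_order_release);

	/* fetch_sub returns the old value; anything but 1 means others remain. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;

	/* Stand-in for membar_acquire(): order the freeing after all use. */
	atomic_thread_fence(memory_order_acquire);
	o->freed = true;	/* stands in for freeing the resources */
	return true;
}
```

Only the caller that observes the count reaching zero runs the freeing step, and the two fences are what make every earlier use happen-before that step.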
 1.206.4.2 17-Jun-2022  martin Pull up following revision(s) (requested by shm in ticket #1475):

sys/miscfs/procfs/procfs_vnops.c: revision 1.229

Add missing permission check
 1.206.4.1 01-Sep-2019  martin Pull up following revision(s) (requested by hannken in ticket #132):
sys/miscfs/kernfs/kernfs_vnops.c: revision 1.161
sys/miscfs/procfs/procfs_vnops.c: revision 1.207
Add missing operation VOP_GETPAGES() returning EFAULT.
Without this operation posix_fadvise(..., POSIX_FADV_WILLNEED)
would leave the v_interlock held.
Observed by maxv@
 1.207.2.2 29-Feb-2020  ad Sync with head.
 1.207.2.1 25-Jan-2020  ad Make cwdinfo use mostly lockless, and largely hide the details in vfs_cwd.c.
 1.210.4.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.215.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.229.4.1 18-Apr-2024  martin Pull up following revision(s) (requested by hannken in ticket #668):

sys/miscfs/procfs/procfs.h: revision 1.83
sys/miscfs/procfs/procfs.h: revision 1.84
sys/kern/vfs_mount.c: revision 1.104
sys/miscfs/procfs/procfs_vnops.c: revision 1.230
sys/kern/init_main.c: revision 1.547
sys/kern/kern_hook.c: revision 1.15
sys/miscfs/procfs/procfs_vfsops.c: revision 1.112
sys/miscfs/procfs/procfs_vfsops.c: revision 1.113
sys/miscfs/procfs/procfs_vfsops.c: revision 1.114
sys/miscfs/procfs/procfs_subr.c: revision 1.117

Print dangling vnode before panic() to help debug.

PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Protect kernel hooks exechook, exithook and forkhook with rwlock.

Lock as writer on establish/disestablish and as reader on list traverse.

For exechook ride "exec_lock" as it is already taken as reader when
traversing the list. Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"

Add a hashmap to access all procfs nodes by pid.

Using the exechook to revoke procfs nodes is racy and may deadlock:
one thread runs doexechooks() -> procfs_revoke_vnodes() and wants to suspend
the file system for vgone(), while another thread runs a forced unmount,
has the file system suspended, tries to disestablish the exechook and
waits for doexechooks() to complete.

Establish/disestablish the exechook on module load/unload instead of
mount/unmount and use the hashmap to access all procfs nodes for this pid.

May fix PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"

Remove all procfs nodes for this process on process exit.
 1.232.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.219 06-Jan-2025  mlelstv Use correct type function for block and character devices.
Use DIOCGMEDIASIZE ioctl when no partition info is available
to generate st_size information. This helps dk(4) and dm(4) devices.
 1.218 22-Apr-2023  riastradh branches: 1.218.6;
specfs: KNF. No functional change intended.
 1.217 22-Apr-2023  hannken Remove unused specdev member sd_rdev.

Ride 10.99.4
 1.216 15-Oct-2022  riastradh specfs(9): Attribute blame by stack trace for write to r/o medium.
 1.215 21-Sep-2022  riastradh specfs(9): XXX comment: what if read downgrades lock?
 1.214 12-Aug-2022  riastradh specfs: Refuse to open a closing-in-progress block device.

We could wait for close to complete, but if this happened ever so
slightly earlier it would lead to EBUSY anyway, so there's no point
in adding logic for that -- either way the caller neglected to wait
for the last close to finish before trying to open the device
again.

https://mail-index.netbsd.org/current-users/2022/08/09/msg042800.html

Reported-by: syzbot+4388f20706ec8a4c8db0@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=47c67ab6d3a87514d0707882a9ad6671beaa8642

Reported-by: syzbot+0f1756652dce4cb341ed@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=a632ce762d64241fc82a9bc57230b7b7c7095d1a
 1.213 12-Aug-2022  riastradh specfs: Assert !closing on successful open.

- If there's a prior concurrent close, it must have interrupted this
open.

- If there's a new concurrent close, it must wait until this open has
released device_lock before it can revoke.
 1.212 12-Aug-2022  riastradh specfs: Assert opencnt>0 on successful open.
 1.211 11-Aug-2022  riastradh specfs: Sprinkle opencnt/opened/closing assertions.

There seems to be a bug here but I'm not sure what it is yet:

https://mail-index.netbsd.org/current-users/2022/08/09/msg042800.html
https://syzkaller.appspot.com/bug?id=47c67ab6d3a87514d0707882a9ad6671beaa8642

The decision to actually invoke d_close is serialized under
device_lock, so it should not be possible for more than one process
to close at the same time, but syzbot and kre found a way for
sd_closing to be false later in spec_close. Let's make sure it's
false when we're making what should be the exclusive decision to
close.

We can't assert !sd_opened before cancel and spec_io_drain, because
those are necessary to interrupt and wait for pending opens that
might later set sd_opened, but we can assert !sd_opened afterward
because once sd_closing is true nothing should set sd_opened.
 1.210 28-Mar-2022  riastradh driver(9): New devsw d_cancel op to interrupt I/O before close.

If specified, when revoking a device node or closing its last open
node, specfs will:

1. Call d_cancel, which should return promptly without blocking.
2. Wait for all concurrent d_read/write/ioctl/&c. to drain.
3. Call d_close.

Otherwise, specfs will:

1. Call d_close.
2. Wait for all concurrent d_read/write/ioctl/&c. to drain.

This fallback is problematic because often parts of d_close rely on
concurrent devsw operations to have completed already, so it is up to
each driver to have its own mechanism for waiting, and the extra step
in (2) is almost redundant. But it is still important to ensure that
devsw operations are not active by the time a module tries to invoke
devsw_detach, because only d_open is protected against that.

The signature of d_cancel matches d_close, mostly so we don't raise
questions about `why is this different?'; the lwp argument is not
useful but we should remove it from open/cancel/close all at the same
time.

The only way d_cancel should fail, if it does at all, is with ENODEV,
meaning the driver doesn't support cancelling outstanding I/O, and
will take responsibility for that in d_close. I would make it return
void and only have bdev_cancel and cdev_cancel possibly return ENODEV
so specfs can detect whether a driver supports it, but this would
break the pattern around devsw operation types.

Drivers are allowed to omit it from struct bdevsw, struct cdevsw --
if so, it is as if they used a function that just returns ENODEV.

XXX kernel ABI change to struct bdevsw/cdevsw requires bump
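
The two close orderings described above can be sketched as a small user-space model. Everything here is hypothetical (struct toy_devsw, the toy_* functions, the one-letter trace); it only records the order of the steps and is not the specfs code. Treating any nonzero d_cancel result as "not supported" is a simplification of the ENODEV rule.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a devsw; not the kernel's struct. */
struct toy_devsw {
	int (*d_cancel)(void *);	/* may be NULL: treated as ENODEV */
	int (*d_close)(void *);
};

/* Records the order of steps: 'C' cancel, 'D' drain, 'X' close. */
struct trace {
	char steps[8];
	size_t n;
};

static void
trace_add(struct trace *t, char step)
{
	assert(t->n < sizeof(t->steps));
	t->steps[t->n++] = step;
}

static int
toy_cancel(void *cookie)
{
	trace_add(cookie, 'C');
	return 0;
}

static int
toy_close(void *cookie)
{
	trace_add(cookie, 'X');
	return 0;
}

/* Stand-in for waiting until in-flight read/write/ioctl calls finish. */
static void
toy_drain(struct trace *t)
{
	trace_add(t, 'D');
}

static int
toy_last_close(const struct toy_devsw *sw, struct trace *t)
{
	int error = (sw->d_cancel != NULL) ? sw->d_cancel(t) : ENODEV;

	if (error == 0) {
		/* Preferred order: cancel, drain, then close. */
		toy_drain(t);
		return sw->d_close(t);
	}

	/* Fallback order: close first, then drain. */
	error = sw->d_close(t);
	toy_drain(t);
	return error;
}
```

With a d_cancel hook the trace reads cancel, drain, close; without one it reads close, drain, which is the ordering the message calls problematic.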
 1.209 28-Mar-2022  riastradh specfs: Remove specnode from hash table in spec_node_revoke.

Previously, it was possible for spec_node_lookup_by_dev to handle a
specnode that a concurrent spec_node_destroy is about to remove from
the hash table and then free, as soon as spec_node_lookup_by_dev
releases device_lock.

Now, the ordering is:

1. Remove specnode from hash table in spec_node_revoke. At this
point, no _new_ vnode references are possible (other than possibly
one acquired by vcache_vget under v_interlock), but there may be
existing ones.

2. Mark vnode reclaimed so vcache_vget will fail.

3. The last vrele (or equivalent logic in vcache_vget) will then free
the specnode in spec_node_destroy.

This way, _if_ a thread in spec_node_lookup_by_dev finds a specnode
in the hash table under device_lock/v_interlock, _then_ it will not
be freed until the thread completes vcache_vget.

This change requires calling spec_node_revoke unconditionally for
device special nodes, not just for active ones. Might introduce
slightly more contention on device_lock but not much because we
already have to take it in this path anyway a little later in
spec_node_destroy.
 1.208 28-Mar-2022  riastradh specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.
 1.207 28-Mar-2022  riastradh specfs: Assert opencnt is nonzero before decrementing.
 1.206 28-Mar-2022  riastradh specfs: Take an I/O reference across bdev/cdev_open.

- Revoke is used to invalidate all prior access control checks when
device permissions are changing, so it must wait for .d_open to exit
so any new access must go through new access control checks.

- Revoke is used by vdevgone in xyz_detach to wait until all use of
the driver's data structures have completed before xyz_detach frees
them.

So we need to make sure spec_close waits for .d_open too.
 1.205 28-Mar-2022  riastradh specfs: Wait for last close in spec_node_revoke.

Otherwise, revoke -- and vdevgone, in the detach path of removable
devices -- may complete while I/O operations are still running
concurrently.
 1.204 28-Mar-2022  riastradh specfs: Prevent new opens while close is waiting to drain.

Otherwise, bdev/cdev_close could have cancelled all _existing_ opens,
and waited for them to complete (and freed resources used by them) --
but a new one could start, and hang (e.g., a tty), at the same time
spec_close tries to drain all pending I/O operations, one of which
(the new open) is now hanging indefinitely.

Preventing the new open from even starting until bdev/cdev_close is
finished and all I/O operations have drained avoids this deadlock.
 1.203 28-Mar-2022  riastradh specfs: Take an I/O reference in spec_node_setmountedfs.

This is not quite correct. We _should_ require the caller to hold a
vnode lock around spec_node_getmountedfs, and an exclusive vnode lock
around spec_node_setmountedfs, so that it is only necessary to check
whether revoke has already happened, not hold an I/O reference.

Unfortunately, various callers in various file systems don't follow
this sensible rule. So let's at least make sure the vnode can't be
revoked in spec_node_setmountedfs, while we're in bdev_ioctl, and
leave a comment explaining what the sorry state of affairs is and how
to fix it later.
 1.202 28-Mar-2022  riastradh specfs: Drain all I/O operations after last .d_close call.

New kind of I/O reference on specdevs, sd_iocnt. This could be done
with psref instead; I chose a reference count instead for now because
we already have to take a per-object lock anyway, v_interlock, for
vdead_check, so another atomic is not likely to hurt much more. We
can always change the mechanism inside spec_io_enter/exit/drain later
on.

Make sure every access to vp->v_rdev or vp->v_specnode and every call
to a devsw operation is protected either:

- by the vnode lock (with vdead_check if we unlocked/relocked),
- by positive sd_opencnt,
- by spec_io_enter/exit, or
- by sd_opencnt management in open/close.
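
A minimal sketch of the I/O reference idea, assuming a plain counter and a "dead" flag. The names (struct toy_specdev, toy_io_enter, toy_io_exit, toy_io_drained, toy_revoke_begin) are hypothetical, and locking and sleeping are left out on purpose: real code would take a lock around these fields and block in drain rather than poll.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a device with an I/O reference count. */
struct toy_specdev {
	unsigned int iocnt;	/* devsw operations currently in flight */
	bool dead;		/* set once revoke has started */
};

/* Take an I/O reference; refuse if the device is being revoked. */
static int
toy_io_enter(struct toy_specdev *sd)
{
	if (sd->dead)
		return ENXIO;
	sd->iocnt++;
	return 0;
}

/* Release an I/O reference taken by toy_io_enter. */
static void
toy_io_exit(struct toy_specdev *sd)
{
	assert(sd->iocnt > 0);
	sd->iocnt--;
}

/* Revoke side: stop new references from being taken. */
static void
toy_revoke_begin(struct toy_specdev *sd)
{
	sd->dead = true;
}

/* True once every in-flight operation has called toy_io_exit. */
static bool
toy_io_drained(const struct toy_specdev *sd)
{
	return sd->iocnt == 0;
}
```

The point of the split is that revoke first blocks new entries and then only has to wait for the existing count to reach zero.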
 1.201 28-Mar-2022  riastradh specfs: Resolve a race between close and a failing reopen.
 1.200 28-Mar-2022  riastradh specfs: Paranoia: Assert opencnt is zero on reclaim.
 1.199 28-Mar-2022  riastradh specfs: Omit needless vdead_check in spec_fdiscard.

The vnode lock is held, so the vnode cannot be revoked without also
changing v_op so subsequent uses under the vnode lock will go to
deadfs's VOP_FDISCARD instead (which is genfs_eopnotsupp).
 1.198 28-Mar-2022  riastradh specfs: Add a comment and assertion to spec_close about refcnts.
 1.197 28-Mar-2022  riastradh specfs: If sd_opencnt is zero, sn_opencnt had better be zero.
 1.196 28-Mar-2022  riastradh specfs: Factor KASSERT out of switch in spec_open.

No functional change.
 1.195 28-Mar-2022  riastradh specfs: sn_gone cannot be set while we hold the vnode lock.

Revoke runs with the vnode lock too, which is exclusive. Add an
assertion to this effect in spec_node_revoke to make it clear.
 1.194 28-Mar-2022  riastradh specfs: Reorganize D_DISK tail of spec_open and explain what's up.

No functional change intended.
 1.193 28-Mar-2022  riastradh specfs: Factor VOP_UNLOCK/vn_lock out of switch for clarity.

No functional change.
 1.192 28-Mar-2022  riastradh specfs: Factor common device_lock out of switch for clarity.

No functional change.
 1.191 28-Mar-2022  riastradh specfs: Delete bogus comment about .d_open/.d_close at same time.

Annoying as it is that .d_open and .d_close can run at the same time,
it is also necessary for tty semantics, where open can block
indefinitely, and it is the responsibility of close (called via
revoke) to interrupt it.
 1.190 28-Mar-2022  riastradh specfs: Split spec_open switch into three sections.

The sections are now:

1. Acquire open reference.

1a (intermezzo). Set VV_ISTTY.

2. Drop the vnode lock to call .d_open and autoload modules if
necessary.

3. Handle concurrent revoke if it happened, or release open reference
if .d_open failed.

No functional change. Sprinkle comments about problems.
 1.189 28-Mar-2022  riastradh specfs: Factor common kauth check out of switch in spec_open.

No functional change.
 1.188 28-Mar-2022  riastradh specfs: Assert v_type is VBLK or VCHR in spec_open.

Nothing else makes sense. Prune dead branches (and replace default
case by panic).
 1.187 28-Mar-2022  riastradh specfs: Call bdev_open without the vnode lock.

There is no need for it to serialize opens, because they are already
serialized by sd_opencnt which for block devices is always either 0
or 1.

There's not obviously any other reason why the vnode lock should be
held across bdev_open, other than that it might be nice to avoid
dropping it if not necessary. For character devices we always have
to drop the vnode lock because open might hang indefinitely, when
opening a tty, which is not allowed while holding the vnode lock.
 1.186 28-Mar-2022  riastradh specfs: Note lock order for vnode lock, device_lock, v_interlock.
 1.185 28-Mar-2022  riastradh driver(9): Eliminate D_MCLOSE.

D_MCLOSE was introduced a few years ago by mistake for audio(4),
which should have used -- and now does use -- fd_clone to create
per-open state. The semantics was originally to call close once
every time the device node is closed, not only for the last close.
Nothing uses it any more, and it complicates reasoning about the
system, so let's simplify it away.
 1.184 19-Mar-2022  hannken Switch spec_vnodeop vector to real vnode locking, VV_LOCKSWORK now.
 1.183 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.182 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
 1.181 25-Dec-2020  mlelstv branches: 1.181.4;
When reading from a block device, queue parallel block requests to
fill a buffer with breadn.
 1.180 27-Jun-2020  christos branches: 1.180.2;
Introduce genfs_pathconf() and use it for the default case in all filesystems.
 1.179 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.178 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.177 13-Apr-2020  jdolecek when determining I/O block size for VBLK device, only use pi_bsize
returned by DIOCGPARTINFO if it's bigger than DEV_BSIZE and less
than MAXBSIZE (MAXPHYS)

fixes panic "buf mem pool index 8" in buf_mempoolidx() when the
disklabel contains bsize 128KB and something reads the block device -
buffer cache can't allocate bufs bigger than MAXPHYS
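
The range check described above might look like the following sketch. The constants and the function name are hypothetical (TOY_DEV_BSIZE, TOY_MAXBSIZE and toy_pick_bsize are not kernel identifiers, and 512 and 64 KiB are assumed values), and treating both bounds as inclusive is also an assumption, since the message only says "bigger than" and "less than".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define TOY_DEV_BSIZE	512u
#define TOY_MAXBSIZE	(64u * 1024u)

/*
 * Use the block size reported by the partition info only when it lies
 * in the range the buffer cache can handle; otherwise keep the fallback.
 */
static uint32_t
toy_pick_bsize(uint32_t pi_bsize, uint32_t fallback)
{
	assert(fallback >= TOY_DEV_BSIZE && fallback <= TOY_MAXBSIZE);

	if (pi_bsize >= TOY_DEV_BSIZE && pi_bsize <= TOY_MAXBSIZE)
		return pi_bsize;
	return fallback;
}
```

With these assumed limits, a disklabel that reports a 128KB block size is ignored and the fallback is used instead.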
 1.176 22-Sep-2019  christos branches: 1.176.6;
Add a new member to struct vfsstat and grow the unused members.
The new member is called f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
 1.175 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
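
The truncation this rename guards against can be shown with a small user-space sketch. toy_uimin and TOY_MIN are hypothetical stand-ins for the libkern function and the per-subsystem macro, and the example assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

_Static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Function form: operands are unsigned int, like the helper renamed above. */
static unsigned int
toy_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit values are cut down to 32 bits before the comparison. */
static uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	return toy_uimin((unsigned int)len, (unsigned int)limit);
}

/* The comparison is done on the full 64-bit values. */
static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	return TOY_MIN(len, limit);
}
```

The explicit casts only make visible the conversion that a call to the function form would otherwise perform silently.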
 1.174 24-Jun-2017  hannken branches: 1.174.4; 1.174.6;
Refuse to open a block device with zero open count when it has
a mountpoint set. This may happen after forced detach or unplug
of a mounted block device.
 1.173 01-Jun-2017  chs branches: 1.173.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.172 26-May-2017  riastradh Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.171 12-Apr-2017  martin branches: 1.171.2;
Make the non-DIAGNOSTIC version compile
 1.170 11-Apr-2017  riastradh Make VOP_INACTIVE preserve vnode lock on return.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2017/04/01/msg021751.html

Ride 7.99.68, a bumpy bus of incremental vfs improvements!
 1.169 01-Mar-2017  hannken Add a diagnostic test for buffers written to a block device holding
a read-only mounted file system.

This will become a KASSERT in the near future.
 1.168 02-Jan-2017  hannken branches: 1.168.2;
Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54
 1.167 09-Dec-2016  nat Add functions to access device flags. This restores simultaneous audio
open/close.

OK hannken@ christos@
 1.166 08-Dec-2016  nat The audio sub-system now supports the following features as
posted to tech-kern:

* Simultaneous playback and mixing of multiple streams
* Playback streams can be of different encoding, frequency, precision
and number of channels
* Simultaneous recording to different formats
* One audio device per process
* Sysctls to set the common format frequency, precision and channels
* Independent mixer controls for recording/playback per stream
* Utilizes little cpu time for multiple streams / good performance
* Compatible with existing programs that use OSS/NetBSD audio
* Changes to audioctl(1) to allow specifying process id for corresponding
audio device
 1.165 08-Sep-2016  pgoyette Revert rev 1.164. This will be redone differently (using "dummy"
modules).

This implementation requires changes to a base kernel in order to
update the set of "special" modules, kinda defeating the purpose of
having modules in the first place. The new method will use dummy
modules (with name tap and tun) which will depend on the real
modules with the if_ prefix.

Coming soon to a NetBSD near you.
 1.164 08-Sep-2016  pgoyette if_config processing wants to auto-load modules named with an if_ prefix,
while specfs wants to auto-load modules without the prefix. For modules
which can be loaded both ways (ie, if_tap and if_tun), provide a simple
conversion table for specfs so it can auto-load the if_ module.

This table should always be quite small, and the auto-load operation is
relatively infrequent, so the additional overhead of comparing names should
be tolerable.
 1.163 20-Aug-2016  hannken Remove now obsolete operation vcache_remove().

Welcome to 7.99.36
 1.162 04-Apr-2016  hannken branches: 1.162.2;
Avoid a race with spec_revoke for the assertion too.

Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.161 26-Mar-2016  hannken When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.

Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
 1.160 05-Jan-2016  pgoyette Fix a couple of checks for kernel vm_space, and convert the "naked
panic" code to KASSERT/KASSERTMSG.

Thanks, Taylor!
 1.159 23-Dec-2015  pgoyette Revert previous
 1.158 22-Dec-2015  pgoyette If we attempt to autoload a driver module, make sure we return an error
if it fails. Otherwise we might end up calling a builtin-but-disabled
driver module and that can generate all sorts of issues...
 1.157 08-Dec-2015  christos Replace DIOCGPART -> DIOCGPARTINFO which returns the data needed instead of
pointers.
 1.156 08-Dec-2015  christos unfortunately it is not that easy to get rid of DIOCGPART. DTRT for the
raw partition and print a warning if we overflowed. I guess the right solution
for this is to create yet another version of disklabel that is 64 bit friendly.
 1.155 05-Dec-2015  jnemeth messing with uninitialized structs is a bad thing
 1.154 04-Dec-2015  christos Use DIOCGMEDIASIZE instead of DIOCGPART so that we are not limited to 2G.
XXX: All DIOCGPART code needs to be removed...
XXX: pullup-7
 1.153 01-Jul-2015  hannken Unfortunately MFS uses v_data of its anonymous device vnode so
it cannot be used as vcache key. Use v_interlock as key ...
 1.152 30-Jun-2015  hannken Redo previous again, v_specnode is invariant but not unique.

Set "vp->v_data = vp" and use v_data as key.
 1.151 29-Jun-2015  hannken Use the address of vp->v_specnode as vcache key. It is invariant
over the lifetime of the vnode.

The previous worked by luck, it took the first sizeof(void *) bytes
of struct vnode as key.

Resolves CID 1308957: wrong sizeof()
 1.150 29-Jun-2015  christos Revert previous, and explain why.
 1.149 29-Jun-2015  christos CID 1308957: Fix wrong sizeof()
 1.148 23-Jun-2015  hannken Add a vfs_newvnode() method to deadfs and use it to create
anonymous device vnodes with bdevvp() and cdevvp().

Implement spec_inactive() and spec_reclaim() to handle these nodes.
 1.147 20-Apr-2015  riastradh Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.
 1.146 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.145 25-Jul-2014  dholland branches: 1.145.2; 1.145.4; 1.145.6;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.144 25-Jul-2014  dholland Implement spec_fdiscard() using bdev_discard() and cdev_discard().
Also define spec_fallocate() to genfs_eopnotsupp().
 1.143 24-Mar-2014  hannken branches: 1.143.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.142 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.141 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.140 20-Jul-2013  dholland oops, spell b_bcount properly
 1.139 20-Jul-2013  dholland In spec_strategy, if fscow_run() fails, set b_resid along with b_error
to avoid panic in biodone. Noticed by riastradh.
 1.138 16-Jun-2013  dholland branches: 1.138.2; 1.138.4;
Hang a warning banner on some nasty code I just found.
 1.137 13-Feb-2013  hannken Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17
 1.136 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.135 29-Apr-2012  chs branches: 1.135.2;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.134 12-Jun-2011  rmind branches: 1.134.2; 1.134.6; 1.134.8;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.133 27-Apr-2011  hannken branches: 1.133.2;
Remove no longer needed flag FSYNC_VFS /* fsync: via FSYNC_VFS() */.
 1.132 26-Apr-2011  hannken Change vflushbuf() to return an error if a synchronous write fails.

Welcome to 5.99.51.
 1.131 21-Aug-2010  pgoyette branches: 1.131.2;
Update the rest of the kernel to conform to the module subsystem's new
locking protocol.
 1.130 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.129 13-Apr-2010  ahoka Revert my last change, it's not The Right Thing [tm].
 1.128 13-Apr-2010  ahoka Autoload modules with any class.

This fixes autoloading of pf, zfs and possibly others.
 1.127 14-Nov-2009  elad branches: 1.127.2; 1.127.4;
- Move kauth_init() a little bit higher.

- Add spec_init() to authorize special device actions (and passthru too for
the time being). Move policy out of secmodel_suser.
 1.126 06-Oct-2009  elad Factor out a block of code that appears in three places (Veriexec, keylock,
and securelevel) so that others can use it as well.
 1.125 04-Oct-2009  tsutsui Put workaround fix for LOCKDEBUG panic mentioned in PR kern/41078:
Don't try to load a driver module if the driver already exists but is just
not attached. [bc]dev_open() could return ENXIO even if the driver exists.

XXX: Maybe this should be handled by helper functions for
XXX: module_autoload() calls on demand.
 1.124 25-Apr-2009  rmind - Rearrange pg_delete() and pg_remove() (renamed pg_free), thus
proc_enterpgrp() with proc_leavepgrp() to free process group and/or
session without proc_lock held.
- Rename SESSHOLD() and SESSRELE() to proc_sesshold() and
proc_sessrele(). The latter releases proc_lock now.

Quick OK by <ad>.
 1.123 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.122 02-Feb-2009  haad branches: 1.122.2;
Add support for loading pseudo-device drivers. Try to autoload modules from
specs_open routine. If devsw_open fails, get driver name with devsw_getname
routine and autoload module.

For now only the dm driver can be loaded; other pseudo drivers need more work.

Ok by ad@.
 1.121 11-Jan-2009  christos merge christos-time_t
 1.120 29-Dec-2008  pooka Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.
 1.119 16-May-2008  hannken branches: 1.119.6; 1.119.10;
Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.118 29-Apr-2008  ad branches: 1.118.2;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.117 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.116 24-Apr-2008  ad branches: 1.116.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.115 25-Jan-2008  hannken branches: 1.115.6; 1.115.8;
Spec_open(): clear sd_bdevvp if bdev_open() failed.

Ok: Andrew Doran <ad@netbsd.org>
 1.114 25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.113 24-Jan-2008  ad spec_fsync: don't assert that 'vp' holds the block device open. If it's
not open, there shouldn't be dirty buffers so vinvalbuf() is harmless.
 1.112 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.111 02-Jan-2008  ad Merge vmlocking2 to head.
 1.110 02-Dec-2007  hannken branches: 1.110.2; 1.110.6;
Fscow_run(): add a flag "bool data_valid" to note still valid data.
Buffers run through copy-on-write are marked B_COWDONE. This condition
is valid until the buffer has run through bwrite() and gets cleared from
biodone().

Welcome to 4.99.39.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.109 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.108 10-Oct-2007  ad branches: 1.108.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.107 08-Oct-2007  ad Merge brelse() changes from the vmlocking branch.
 1.106 07-Oct-2007  hannken Update the file system copy-on-write handler.

- Instead of hooking the handler on the specdev of a mounted file system
hook directly on the `struct mount'.

- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'. Use
`mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.

- Replace the hand-made reader/writer lock with a krwlock.

- Keep `vn_cow_*' functions and mark as obsolete.

- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.105 01-Sep-2007  pooka branches: 1.105.2;
Make bioops a pointer and point it to the softdeps struct in softdep
init. Decouples "options SOFTDEP" from the main kernel and ffs code.
 1.104 03-Aug-2007  pooka branches: 1.104.2; 1.104.4; 1.104.6;
ANSI-fy
 1.103 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.102 27-Jul-2007  pooka vop_mmap parameter change
 1.101 22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.100 09-Jul-2007  ad branches: 1.100.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.99 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.98 04-Mar-2007  christos branches: 1.98.2; 1.98.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.97 26-Nov-2006  elad branches: 1.97.4;
Implement Veriexec's raw disk policy on-top of kauth(9)'s device scope,
using both the rawio_spec and passthru actions to detect raw disk
activity. Same for kernel memory policy.

Update documentation (no longer need to expose veriexec_rawchk()) and
remove all Veriexec-related bits from specfs.
 1.96 04-Nov-2006  elad Change KAUTH_SYSTEM_RAWIO to KAUTH_DEVICE_RAWIO_SPEC (moving the raw i/o
requests to the device scope) and add KAUTH_DEVICE_RAWIO_PASSTHRU.

Expose iskmemdev() through sys/conf.h.

okay yamt@
 1.95 02-Nov-2006  elad Redo Veriexec raw disk/memory access policies so they hold only if the
request is for write access.
 1.94 01-Nov-2006  elad Only use blkdev/bvp for the Veriexec case. While here, fix up IPS mode
restrictions on kernel memory.

okay yamt@
 1.93 30-Oct-2006  elad oops, remove debug printf slipped in. good catch from yamt@, thanks!
 1.92 30-Sep-2006  jld The poll routine needs to return POLLERR on error, not an errno. Sorry
about that. Pointed out by Juergen Hannken-Illjes in mail.
 1.91 21-Sep-2006  jld Protect spec_poll from racing against revocation and thus dereferencing a
NULL v_specinfo. Mostly copied (with understanding) from rev 1.83's fix
to spec_ioctl, and needed for the same reason (kern/vfs_subr.c r1.231).
 1.90 19-Sep-2006  elad For the VBLK case, we always check vfs_mountedon() and it has nothing
to do with the security model used. Move back the call to spec_open(),
which can now return the real return value from vfs_mountedon() (EBUSY)
and not EPERM, changing semantics.
 1.89 08-Sep-2006  elad branches: 1.89.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.88 11-Aug-2006  christos branches: 1.88.2;
Pretending to be Elad's keyboard:

fileassoc.diff adds a fileassoc_table_run() routine that allows you to
pass a callback to be called with every entry on a given mount.

veriexec.diff adds some raw device access policies: if raw disk is
opened at strict level 1, all fingerprints on this disk will be
invalidated as a safety measure. level 2 will not allow opening disk
for raw writing if we monitor it, and prevent raw writes to memory.
level 3 will not allow opening any disk for raw writing.

both update all relevant documentation.

veriexec concept is okay blymn@.
 1.87 14-May-2006  elad branches: 1.87.6;
integrate kauth.
 1.86 01-Mar-2006  yamt branches: 1.86.2; 1.86.4; 1.86.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.85 11-Dec-2005  christos branches: 1.85.2; 1.85.4; 1.85.6;
merge ktrace-lwp.
 1.84 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.83 11-Sep-2005  chs branches: 1.83.2;
in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
 1.82 30-Aug-2005  xtraeme Remove __P()
 1.81 21-Jun-2005  ws branches: 1.81.2;
PR-30566: Poll must not return <sys/errno.h> values.
Start with those places I can easily test.
 1.80 26-Feb-2005  perry branches: 1.80.2;
nuke trailing whitespace
 1.79 25-May-2004  hannken branches: 1.79.4; 1.79.6;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.78 12-May-2004  jrf caddr_t -> void * and removal of some more casts.
 1.77 14-Feb-2004  hannken branches: 1.77.2; 1.77.4; 1.77.6;
Add a generic copy-on-write hook to add/remove functions that will be
called with every buffer written through spec_strategy().

Used by fss(4). Future file-system-internal snapshots will need them too.

Welcome to 1.6ZK

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.76 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.75 10-Dec-2003  hannken The file system snapshot pseudo driver.

Uses a hook in spec_strategy() to save data written from a mounted
file system to its block device and a hook in dounmount().

Not enabled by default in any kernel config.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.74 26-Nov-2003  pk spec_close: asserting that the terminal's process group be set if it is
associated with a session is too strong; a foreground group may go away
without being immediately replaced with another.
 1.73 25-Nov-2003  pk spec_close: we don't need to lock the vnode just to make a copy of its flags.
 1.72 24-Nov-2003  pk spec_close: controlling terminal hack: drop session reference count only if
we actually had a reference.
 1.71 06-Nov-2003  dsl When closing a process's controlling terminal, also remove the links
to the session and pgrp from the tty. The way that the console is
handled means that the vrele() may not actually do the final close
on the tty itself.
 1.70 15-Oct-2003  dsl Set vnode size of character disk devices to that of the partition when they
are opened (was always done for block devices).
This means that fstat will report the partition size and hence newfs
needn't grovel into the disklabel to find the filesystem size.
 1.69 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68 29-Jun-2003  fvdl branches: 1.68.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.67 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.66 26-Oct-2002  jdolecek now that mem_no is emitted by config(8), there is no reason to keep
copy of more or less identical iskmemdev() for every arch; move the function
to spec_vnop.c, and g/c machine-dependent copies
 1.65 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.64 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.63 26-Aug-2002  thorpej Fix a signed/unsigned comparison warning from GCC 3.3.
 1.62 10-Jul-2002  wiz Spell acquire with a 'c'.
 1.61 12-May-2002  matt branches: 1.61.2;
Extern speclisth
 1.60 10-Nov-2001  lukem add RCSIDs
 1.59 23-Sep-2001  chs branches: 1.59.2;
change spec_{read,write}() to specify the device blkno in units of DEV_BSIZE
rather than the device's sector size. this allows /dev/rcd0a and /dev/cd0a
to return the same data. fixes PRs 3261 and 14026.
 1.58 21-Sep-2001  chs use shared locks instead of exclusive for VOP_READ() and VOP_READDIR().
 1.57 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.56 18-Aug-2001  chs branches: 1.56.2;
undo the part of the last revision that made user block device access
use the UBC interfaces. too many problems with that yet.
 1.55 17-Aug-2001  chs initialize the UVM vnode size for block devices.
UBCify user access to block devices.
 1.54 17-Apr-2001  thorpej branches: 1.54.2;
Don't hold vp->v_interlock when calling vcount(); vcount() calls
vgone(), which may sleep.
 1.53 22-Jan-2001  jdolecek branches: 1.53.2;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.52 08-Nov-2000  chs fix an LP64BE bogon.
 1.51 27-Oct-2000  jmc Remove usecount check in spec_open. It fails to catch VALIAS situations
and vfs_mountedon will handle them all correctly.
 1.50 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.
 1.49 22-Jul-2000  jdolecek change the lf_advlock() arguments from

int lf_advlock __P((struct lockf **,
off_t, caddr_t, int, struct flock *, int));
to

int lf_advlock __P((struct vop_advlock_args *, struct lockf **, off_t));

This matches common usage and is also compatible with similar change
in FreeBSD (though they use u_quad_t as last arg).
 1.48 30-Mar-2000  augustss branches: 1.48.4;
Register, begone!
 1.47 08-Dec-1999  sommerfeld Add appropriate VOP_FCNTL handlers to deadfs and specfs ops vectors.
 1.46 08-Dec-1999  sommerfeld Change to comment (only) indicating what the specfs ops vector is used for.
 1.45 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.44 16-Oct-1999  wrstuden branches: 1.44.2; 1.44.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.43 02-Oct-1998  ross branches: 1.43.6; 1.43.12;
Make spec_write() process errors and return them, otherwise we don't even
notice things like hitting the end of a partition or device. (To be sure,
writes to block special files are rare, but as long as we support them...)
 1.42 18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning).
 1.41 03-Aug-1998  kleink Recognize _PC_SYNC_IO.
 1.40 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.39 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.38 16-Oct-1997  christos Add missing cast to dev_t
 1.37 09-Oct-1997  mycroft Make various standard wmesg strings const.
 1.36 02-Apr-1997  kleink branches: 1.36.4;
Remove superfluous (uio_resid == 0) check.
 1.35 02-Apr-1997  kleink added advisory record locking support
 1.34 13-Oct-1996  christos backout previous kprintf changes
 1.33 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.32 07-Sep-1996  mycroft Implement poll(2).
 1.31 05-Sep-1996  thorpej Remove some unused variables.
 1.30 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.29 22-Apr-1996  christos remove include of <sys/cpu.h>
 1.28 09-Feb-1996  christos miscfs prototype changes
 1.27 15-Oct-1995  mycroft Implement VOP_BWRITE() using vn_bwrite(), per r_friedl@informatik.uni-kl.de.
 1.26 24-Jul-1995  cgd branches: 1.26.2;
avoid unnecessary aging of buffers. This used to make sense, when buffer
caches were much smaller, but makes little sense now, and will become more
useless as RAM (and buffer cache) sizes grow. Suggested by Bob Baron.
 1.25 08-Jul-1995  cgd add missing splx(), as suggested by enami@sys.ptg.sony.co.jp.
 1.24 02-Jul-1995  mycroft Make spec_read() and spec_write() vaguely consistent.
 1.23 10-Apr-1995  mycroft Use the new d_type field. Set VISTTY for vnodes of tty devices.
 1.22 14-Dec-1994  mycroft Remove a_fp.
 1.21 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.20 14-Nov-1994  christos fixed struct comment; passed extra argument (struct file *) to open
 1.19 29-Oct-1994  cgd light clean; make sure headers are properly included, types are OK, etc.
 1.18 20-Oct-1994  cgd update for new syscall args description mechanism
 1.17 16-Jul-1994  paulus Support for block special files with sector sizes other than DEV_BSIZE -
if the device has a disklabel with a non-zero sector size value, that
value is used instead of DEV_BSIZE.
 1.16 29-Jun-1994  cgd branches: 1.16.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.15 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.14 24-May-1994  cgd MIN -> min, MAX -> max
 1.13 21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.12 27-Jan-1994  cgd oops; fix that last...
 1.11 27-Jan-1994  cgd hack from Mike Karels to deal with the last close on a controlling
terminal. from 4.4BSD.
 1.10 22-Dec-1993  cgd fix return type of vnode print routine
 1.9 18-Dec-1993  mycroft Canonicalize all #includes.
 1.8 12-Nov-1993  cgd new specfs.h and fifo.h locations
 1.7 30-Oct-1993  glass fix chris typo.
 1.6 29-Oct-1993  cgd limit block sizes requested
 1.5 23-Aug-1993  cgd branches: 1.5.2;
changes from 0.9-ALPHA2 to 0.9-BETA
 1.4 27-Jun-1993  andrew branches: 1.4.2;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.3 20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.2 21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.4 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.1 20-Aug-1993  mycroft Add a small bit of debugging code.
 1.5.2.3 06-Jan-1994  pk Re-instate EOPNOTSUPP.
 1.5.2.2 28-Dec-1993  pk Use ENODEV rather than EOPNOTSUPP for unsupported operations on non-socket devices
 1.5.2.1 12-Nov-1993  cgd new specfs.h and fifo.h locations, and include file syntax updates
 1.16.2.1 16-Jul-1994  cgd update from trunk, per paulus
 1.26.2.1 15-Oct-1995  mycroft Update from main branch.
 1.36.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.43.12.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.43.12.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.43.6.2 27-Oct-2000  he Pull up revision 1.51 (requested by jmc):
Fix security problem in spec_open().
 1.43.6.1 18-Oct-1999  cgd pull up rev 1.44 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.44.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.44.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.44.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.44.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.44.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.48.4.4 27-Oct-2001  he Pull up revision 1.59 (via patch, requested by chs):
Change spec_{read,write}() to specify block number in units of
DEV_BSIZE instead of the device's sector size. With this,
/dev/rcd0a and /dev/cd0a return the same data. Fixes PR#3261
and PR#14026.
 1.48.4.3 14-Dec-2000  he Pull up revision 1.50 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.48.4.2 30-Oct-2000  tv Pullup 1.51 [jmc]:
Remove usecount check in spec_open. It fails to catch VALIAS situations
and vfs_mountedon will handle them all correctly.
 1.48.4.1 30-Jul-2000  jdolecek Pullup from trunk (approved by thorpej):
Change lf_advlock() to:
int lf_advlock (struct vop_advlock_args *, struct lockf **, off_t)

This matches its usage. Change inspired by FreeBSD, though we use
off_t instead of u_quad_t as the last argument.

sys/lockf.h rev. 1.9
msdosfs/msdosfs_vnops.c rev. 1.99
kern/vfs_lockf.c rev. 1.17
miscfs/specfs/spec_vnops.c rev. 1.49
nfs/nfs_vnops.c rev. 1.115
ufs/ext2fs/ext2fs_vnops.c rev. 1.28
ufs/ufs/ufs_vnops.c rev. 1.72
 1.53.2.15 11-Nov-2002  nathanw Catch up to -current
 1.53.2.14 17-Sep-2002  nathanw Catch up to -current.
 1.53.2.13 27-Aug-2002  nathanw Catch up to -current.
 1.53.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.53.2.11 15-Jul-2002  nathanw Whitespace.
 1.53.2.10 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.53.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
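A minimal user-space sketch of the macro described in 1.53.2.9, in a form with balanced parentheses. The two structs are cut down to one field each, and `set_curlwp()` and `current_pid()` are hypothetical helpers added for illustration; only `curlwp`, `curproc` and `l_proc` come from the log entry.

```c
#include <assert.h>
#include <stddef.h>

struct proc { int p_pid; };
struct lwp { struct proc *l_proc; };

/* Stand-in for the kernel's current LWP; may be NULL. */
static struct lwp *curlwp;

/* Referencing curproc is safe even when there is no current LWP. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Hypothetical helper: switch the current LWP. */
static void
set_curlwp(struct lwp *l)
{
	assert(l == NULL || l->l_proc != NULL);	/* an LWP belongs to a process */
	curlwp = l;
}

/* Dereferencing curproc still needs a NULL check, as the entry warns. */
static int
current_pid(void)
{
	struct proc *p = curproc;

	return (p != NULL) ? p->p_pid : -1;
}
```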
 1.53.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.53.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.53.2.6 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.53.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.53.2.4 24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.53.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.53.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.53.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.54.2.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.54.2.6 26-Sep-2002  jdolecek spec_kqfilter(): return EOPNOTSUPP for !VCHR case; block devices don't
support kevents, and we don't want to attempt support for any other
files ending up here either (i.e. those which get spec vnodeops via vflush())
 1.54.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.54.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.54.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.54.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.54.2.1 10-Jul-2001  lukem add spec_kqfilter()
 1.56.2.6 01-Oct-2001  fvdl Catch up with -current.
 1.56.2.5 28-Sep-2001  fvdl Bring locking state of VCHR vnodes across device entry points back
to what they used to be. The locking state of vnodes across
device functions really needs to be cleaned up, but at least
this is not worse than it was before.
 1.56.2.4 27-Sep-2001  fvdl Do real locking for cloned vnodes (most filesystems have real locking
for spec vnodes, so clones should have it too). Could probably do locking
all the time for spec vnodes, but need to check if vnodes created
during bootstrap with {b,c}devvp will cause trouble if they have actual
locks.
 1.56.2.3 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.56.2.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.56.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.59.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.61.2.3 29-Aug-2002  gehenna catch up with -current.
 1.61.2.2 15-Jul-2002  gehenna catch up with -current.
 1.61.2.1 16-May-2002  gehenna Replace the direct-access to devsw table with calling devsw APIs.
 1.68.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.68.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.68.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.68.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.68.2.2 03-Aug-2004  skrll Sync with HEAD
 1.68.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.77.6.1 29-Dec-2005  riz Pull up following revision(s) (requested by chs in ticket #10207):
sys/miscfs/specfs/spec_vnops.c: revision 1.83
in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
 1.77.4.1 29-Dec-2005  riz Pull up following revision(s) (requested by chs in ticket #10207):
sys/miscfs/specfs/spec_vnops.c: revision 1.83
in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
 1.77.2.1 29-Dec-2005  riz Pull up following revision(s) (requested by chs in ticket #10207):
sys/miscfs/specfs/spec_vnops.c: revision 1.83
in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
 1.79.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.79.4.1 29-Apr-2005  kent sync with -current
 1.80.2.2 11-Nov-2006  bouyer Pull up following revision(s) (requested by jld in ticket #1557):
sys/miscfs/specfs/spec_vnops.c: revision 1.91 via patch
Protect spec_poll from racing against revocation and thus dereferencing a
NULL v_specinfo. Mostly copied (with understanding) from rev 1.83's fix
to spec_ioctl, and needed for the same reason (kern/vfs_subr.c r1.231).
 1.80.2.1 26-Sep-2005  tron branches: 1.80.2.1.2; 1.80.2.1.4;
Pull up following revision(s) (requested by chs in ticket #812):
sys/miscfs/specfs/spec_vnops.c: revision 1.83
in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
 1.80.2.1.4.1 11-Nov-2006  bouyer Pull up following revision(s) (requested by jld in ticket #1557):
sys/miscfs/specfs/spec_vnops.c: revision 1.91 via patch
Protect spec_poll from racing against revocation and thus dereferencing a
NULL v_specinfo. Mostly copied (with understanding) from rev 1.83's fix
to spec_ioctl, and needed for the same reason (kern/vfs_subr.c r1.231).
 1.80.2.1.2.1 11-Nov-2006  bouyer Pull up following revision(s) (requested by jld in ticket #1557):
sys/miscfs/specfs/spec_vnops.c: revision 1.91 via patch
Protect spec_poll from racing against revocation and thus dereferencing a
NULL v_specinfo. Mostly copied (with understanding) from rev 1.83's fix
to spec_ioctl, and needed for the same reason (kern/vfs_subr.c r1.231).
 1.81.2.7 04-Feb-2008  yamt sync with head.
 1.81.2.6 21-Jan-2008  yamt sync with head
 1.81.2.5 07-Dec-2007  yamt sync with head
 1.81.2.4 27-Oct-2007  yamt sync with head.
 1.81.2.3 03-Sep-2007  yamt sync with head.
 1.81.2.2 30-Dec-2006  yamt sync with head.
 1.81.2.1 21-Jun-2006  yamt sync with head.
 1.83.2.1 20-Oct-2005  yamt adapt specfs and fifofs.
 1.85.6.2 01-Jun-2006  kardel Sync with head.
 1.85.6.1 22-Apr-2006  simonb Sync with head.
 1.85.4.1 09-Sep-2006  rpaulo sync with head
 1.85.2.1 31-Dec-2005  yamt adapt some random parts of kernel to uio_vmspace.
 1.86.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.86.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.86.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.86.2.3 14-Sep-2006  yamt sync with head.
 1.86.2.2 03-Sep-2006  yamt sync with head.
 1.86.2.1 24-May-2006  yamt sync with head.
 1.87.6.1 14-Aug-2006  tron Pull up following revision(s) (requested by elad in ticket #15):
sys/miscfs/specfs/spec_vnops.c: revision 1.88
share/man/man9/fileassoc.9: revision 1.7
sys/kern/kern_verifiedexec.c: revision 1.66
sys/sys/verified_exec.h: revision 1.39
sys/sys/fileassoc.h: revision 1.3
lib/libc/gen/sysctl.3: revision 1.178
share/man/man9/veriexec.9: revision 1.4
sys/kern/kern_fileassoc.c: revision 1.6
Pretending to be Elad's keyboard:
fileassoc.diff adds a fileassoc_table_run() routine that allows you to
pass a callback to be called with every entry on a given mount.
veriexec.diff adds some raw device access policies: if raw disk is
opened at strict level 1, all fingerprints on this disk will be
invalidated as a safety measure. level 2 will not allow opening disk
for raw writing if we monitor it, and prevent raw writes to memory.
level 3 will not allow opening any disk for raw writing.
both update all relevant documentation.
veriexec concept is okay blymn@.
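The raw-device policy listed above can be read as a small decision function. The sketch below is one possible reading, not the veriexec code: the enum, the function name, the treatment of level 0, and the idea that level 1 only matters for a monitored disk are all assumptions. The memory-write restriction at level 2 is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum raw_verdict {
	RAW_ALLOW,		/* open proceeds, nothing else happens */
	RAW_ALLOW_INVALIDATE,	/* open proceeds, fingerprints are dropped */
	RAW_DENY		/* open for raw writing is refused */
};

/* Decide what happens when a disk is opened for raw writing. */
static enum raw_verdict
raw_write_open_policy(int strict_level, bool monitored)
{
	assert(strict_level >= 0 && strict_level <= 3);

	if (strict_level == 3)
		return RAW_DENY;		/* no raw writes to any disk */
	if (strict_level == 2)
		return monitored ? RAW_DENY : RAW_ALLOW;
	if (strict_level == 1 && monitored)
		return RAW_ALLOW_INVALIDATE;	/* safety measure */
	return RAW_ALLOW;
}
```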
 1.88.2.2 12-Jan-2007  ad Sync with head.
 1.88.2.1 18-Nov-2006  ad Sync with head.
 1.89.2.2 10-Dec-2006  yamt sync with head.
 1.89.2.1 22-Oct-2006  yamt sync with head
 1.97.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.98.4.1 11-Jul-2007  mjf Sync with head.
 1.98.2.12 09-Oct-2007  ad Sync with head.
 1.98.2.11 09-Oct-2007  ad Sync with head.
 1.98.2.10 24-Aug-2007  ad Remove the only (and insane) reference to B_TAPE that came along with 386BSD.
 1.98.2.9 20-Aug-2007  ad Sync with HEAD.
 1.98.2.8 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.98.2.7 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.98.2.6 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.98.2.5 09-Jun-2007  ad Sync with head.
 1.98.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.2.3 13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
 1.98.2.2 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.98.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.100.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.100.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.104.6.2 03-Aug-2007  pooka ANSI-fy
 1.104.6.1 03-Aug-2007  pooka file spec_vnops.c was added on branch matt-mips64 on 2007-08-03 08:45:37 +0000
 1.104.4.3 23-Mar-2008  matt sync with HEAD
 1.104.4.2 09-Jan-2008  matt sync with HEAD
 1.104.4.1 06-Nov-2007  matt sync with HEAD
 1.104.2.5 03-Dec-2007  joerg Sync with HEAD.
 1.104.2.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.104.2.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.104.2.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.104.2.1 03-Aug-2007  jmcneill file spec_vnops.c was added on branch jmcneill-pm on 2007-09-03 16:48:52 +0000
 1.105.2.1 14-Oct-2007  yamt sync with head.
 1.108.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.108.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.110.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.110.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.115.8.1 18-May-2008  yamt sync with head.
 1.115.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.115.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.116.2.5 09-Oct-2010  yamt sync with head
 1.116.2.4 11-Aug-2010  yamt sync with head.
 1.116.2.3 11-Mar-2010  yamt sync with head
 1.116.2.2 04-May-2009  yamt sync with head.
 1.116.2.1 16-May-2008  yamt sync with head.
 1.118.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.119.10.3 30-Dec-2008  christos sync with head.
 1.119.10.2 09-Nov-2008  christos account for major and minor being unsigned long long
 1.119.10.1 16-May-2008  christos file spec_vnops.c was added on branch christos-time_t on 2008-11-09 02:05:20 +0000
 1.119.6.3 28-Apr-2009  skrll Sync with HEAD.
 1.119.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.119.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.122.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.127.4.4 31-May-2011  rmind sync with head
 1.127.4.3 05-Mar-2011  rmind sync with head
 1.127.4.2 03-Jul-2010  rmind sync with head
 1.127.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.127.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.127.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.131.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.133.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.134.8.2 10-May-2016  snj Pull up following revision(s) (requested by hannken in ticket #1376):
sys/miscfs/specfs/spec_vnops.c: revisions 1.161, 1.162 via patch
When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.
Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
--
Avoid a race with spec_revoke for the assertion too.
Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.134.8.1 07-May-2012  riz branches: 1.134.8.1.4; 1.134.8.1.6;
Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.134.8.1.6.1 10-May-2016  snj Pull up following revision(s) (requested by hannken in ticket #1376):
sys/miscfs/specfs/spec_vnops.c: revisions 1.161, 1.162 via patch
When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.
Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
--
Avoid a race with spec_revoke for the assertion too.
Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.134.8.1.4.1 10-May-2016  snj Pull up following revision(s) (requested by hannken in ticket #1376):
sys/miscfs/specfs/spec_vnops.c: revisions 1.161, 1.162 via patch
When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.
Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
--
Avoid a race with spec_revoke for the assertion too.
Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.134.6.1 02-Jun-2012  mrg sync to latest -current.
 1.134.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.134.2.2 23-Jan-2013  yamt sync with head
 1.134.2.1 23-May-2012  yamt sync with head.
 1.135.2.4 03-Dec-2017  jdolecek update from HEAD
 1.135.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.135.2.2 23-Jun-2013  tls resync from head
 1.135.2.1 25-Feb-2013  tls resync with head
 1.138.4.1 23-Jul-2013  riastradh sync with HEAD
 1.138.2.2 18-May-2014  rmind sync with head
 1.138.2.1 28-Aug-2013  rmind sync with head
 1.143.2.1 10-Aug-2014  tls Rebase.
 1.145.6.1 29-Apr-2016  snj Pull up following revision(s) (requested by hannken in ticket #1154):
sys/miscfs/specfs/spec_vnops.c: revisions 1.161, 1.162
When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.
Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
--
Avoid a race with spec_revoke for the assertion too.
Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.145.4.9 28-Aug-2017  skrll Sync with HEAD
 1.145.4.8 05-Feb-2017  skrll Sync with HEAD
 1.145.4.7 05-Oct-2016  skrll Sync with HEAD
 1.145.4.6 22-Apr-2016  skrll Sync with HEAD
 1.145.4.5 19-Mar-2016  skrll Sync with HEAD
 1.145.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.145.4.3 22-Sep-2015  skrll Sync with HEAD
 1.145.4.2 06-Jun-2015  skrll Sync with HEAD
 1.145.4.1 06-Apr-2015  skrll Sync with HEAD
 1.145.2.1 29-Apr-2016  snj Pull up following revision(s) (requested by hannken in ticket #1154):
sys/miscfs/specfs/spec_vnops.c: revisions 1.161, 1.162
When spec_strategy() extracts v_rdev take care to avoid a
race with spec_revoke.
Fixes PR kern/50467 Panic from disconnecting phone while reading its contents
--
Avoid a race with spec_revoke for the assertion too.
Final fix for PR kern/50467 Panic from disconnecting phone while reading
its contents
 1.162.2.8 26-Apr-2017  pgoyette Sync with HEAD
 1.162.2.7 20-Mar-2017  pgoyette Sync with HEAD
 1.162.2.6 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.162.2.5 23-Jul-2016  pgoyette Simplify, remove redundant code.
 1.162.2.4 23-Jul-2016  pgoyette Restore original handling of ioctl() returns. If the underlying disk's
ioctl() returns success, we call uvm_vnp_setsize(). Regardless of any
error from the ioctl() call we should return success.
 1.162.2.3 22-Jul-2016  pgoyette Return the actual error code, rather than blind success.
 1.162.2.2 21-Jul-2016  pgoyette fix an error path to call {b,c}devsw_release()
 1.162.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
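The acquire/release discipline from 1.162.2.1 might look roughly like this from a caller's point of view. This sketch swaps localcount(9) for a plain counter to stay self-contained; the table, the `_sketch` function names and the detach check are hypothetical and are not the branch's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device switch entry with a plain reference counter. */
struct devsw_sketch {
	const char *d_name;
	unsigned d_refs;
};

static struct devsw_sketch table[] = {
	{ "wd", 0 },
	{ "sd", 0 },
};

/* Look up an entry by major number and retain a reference on it. */
static struct devsw_sketch *
devsw_lookup_acquire_sketch(size_t major)
{
	if (major >= sizeof(table) / sizeof(table[0]))
		return NULL;
	table[major].d_refs++;
	return &table[major];
}

/* Drop the reference once the caller is done with the entry. */
static void
devsw_release_sketch(struct devsw_sketch *d)
{
	assert(d != NULL && d->d_refs > 0);
	d->d_refs--;
}

/* An entry may only go away while nobody holds a reference. */
static int
devsw_can_detach_sketch(const struct devsw_sketch *d)
{
	return d->d_refs == 0;
}
```

Every successful lookup is paired with exactly one release, including on error paths, which is the kind of fix 1.162.2.2 describes.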
 1.168.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.171.2.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.173.2.1 01-Jul-2017  snj Pull up following revision(s) (requested by hannken in ticket #76):
sys/miscfs/specfs/spec_vnops.c: revision 1.174
Refuse to open a block device with zero open count when it has
a mountpoint set. This may happen after forced detach or unplug
of a mounted block device.
 1.174.6.3 21-Apr-2020  martin Sync with HEAD
 1.174.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.174.6.1 10-Jun-2019  christos Sync with HEAD
 1.174.4.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.176.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.180.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.181.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.218.6.1 02-Aug-2025  perseant Sync with HEAD
 1.54 22-Apr-2023  hannken Remove unused specdev member sd_rdev.

Ride 10.99.4
 1.53 26-Oct-2022  riastradh miscfs/specfs/specdev.h: New home for extern spec_vnodeop_opv_desc.

Also use it for extern spec_vnodeop_p, which is already there.
 1.52 28-Mar-2022  riastradh specfs: Reorder struct specnode members to save padding.

Shrinks from 40 bytes to 32 bytes on LP64 systems this way.
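The effect of reordering members can be reproduced with a toy structure. The member names and types below are hypothetical (the log does not list the real members of struct specnode); they are chosen only so that an LP64 system shows the same 40 to 32 byte change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 4-byte members interleaved with pointers. */
struct node_before {
	void *next;
	uint32_t opencnt;	/* followed by 4 bytes of padding on LP64 */
	void *dev;
	uint32_t flags;		/* followed by 4 bytes of padding on LP64 */
	void *vnode;
};

/* Same members, pointers first and the 4-byte members packed together. */
struct node_after {
	void *next;
	void *dev;
	void *vnode;
	uint32_t opencnt;
	uint32_t flags;
};

/* Bytes saved per structure by the reordering. */
static size_t
padding_saved(void)
{
	assert(sizeof(struct node_after) <= sizeof(struct node_before));
	return sizeof(struct node_before) - sizeof(struct node_after);
}
```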
 1.51 28-Mar-2022  riastradh specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.
 1.50 28-Mar-2022  riastradh specfs: Prevent new opens while close is waiting to drain.

Otherwise, bdev/cdev_close could have cancelled all _existing_ opens,
and waited for them to complete (and freed resources used by them) --
but a new one could start, and hang (e.g., a tty), at the same time
spec_close tries to drain all pending I/O operations, one of which
(the new open) is now hanging indefinitely.

Preventing the new open from even starting until bdev/cdev_close is
finished and all I/O operations have drained avoids this deadlock.
 1.49 28-Mar-2022  riastradh specfs: Drain all I/O operations after last .d_close call.

New kind of I/O reference on specdevs, sd_iocnt. This could be done
with psref instead; I chose a reference count instead for now because
we already have to take a per-object lock anyway, v_interlock, for
vdead_check, so another atomic is not likely to hurt much more. We
can always change the mechanism inside spec_io_enter/exit/drain later
on.

Make sure every access to vp->v_rdev or vp->v_specnode and every call
to a devsw operation is protected either:

- by the vnode lock (with vdead_check if we unlocked/relocked),
- by positive sd_opencnt,
- by spec_io_enter/exit, or
- by sd_opencnt management in open/close.
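A single-threaded sketch of the I/O reference idea from 1.49 and 1.50. Only the name sd_iocnt and the enter/exit/drain terminology come from the log; the `sd_closing` flag, the ENXIO return value, and the split of drain into a "begin" step plus a predicate are assumptions. Real kernel code would hold a lock and sleep on a condition variable instead of polling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; sd_iocnt is the name used in the log. */
struct specdev_sketch {
	unsigned sd_iocnt;	/* I/O operations currently in flight */
	bool sd_closing;	/* set once the last close has begun */
};

/* Take an I/O reference, or fail if the device is being closed. */
static int
spec_io_enter_sketch(struct specdev_sketch *sd)
{
	if (sd->sd_closing)
		return ENXIO;
	sd->sd_iocnt++;
	return 0;
}

/* Drop an I/O reference taken by spec_io_enter_sketch(). */
static void
spec_io_exit_sketch(struct specdev_sketch *sd)
{
	assert(sd->sd_iocnt > 0);
	sd->sd_iocnt--;
}

/* Begin draining: refuse new references from now on. */
static void
spec_io_drain_begin_sketch(struct specdev_sketch *sd)
{
	sd->sd_closing = true;
}

/* True once every in-flight operation has called exit. */
static bool
spec_io_drained_sketch(const struct specdev_sketch *sd)
{
	return sd->sd_closing && sd->sd_iocnt == 0;
}
```

Refusing new references before waiting is what keeps a late arrival from stalling the drain, which is the deadlock 1.50 describes for new opens.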
 1.48 28-Mar-2022  riastradh specfs: Resolve a race between close and a failing reopen.
 1.47 28-Mar-2022  riastradh specfs: Document sn_opencnt, sd_opencnt, sd_refcnt.
 1.46 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.45 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
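The idea behind GENFS_SPECOP_ENTRIES in 1.45 can be illustrated with a toy ops table. Everything here is a simplified stand-in: the real tables are keyed by vnode operation descriptors rather than strings, and `SPECOP_ENTRIES_SKETCH`, `foofs_*` and `op_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*op_fn)(void);

struct op_entry {
	const char *name;	/* operation name; NULL terminates the table */
	op_fn fn;
};

static int spec_open_sketch(void)  { return 1; }
static int spec_close_sketch(void) { return 2; }
static int foofs_getattr(void)     { return 3; }
static int default_op(void)        { return 0; }

/* The canned part: identical in every file system's device ops table. */
#define SPECOP_ENTRIES_SKETCH \
	{ "open", spec_open_sketch }, \
	{ "close", spec_close_sketch }

/* A file system adds only its own entries around the canned ones. */
static const struct op_entry foofs_specop_entries[] = {
	{ "default", default_op },	/* the default entry comes first */
	{ "getattr", foofs_getattr },
	SPECOP_ENTRIES_SKETCH,
	{ NULL, NULL }
};

/* Lookup is by name, so the order of the remaining entries is irrelevant. */
static op_fn
op_lookup(const struct op_entry *tbl, const char *name)
{
	assert(tbl != NULL && name != NULL);
	for (; tbl->name != NULL; tbl++)
		if (strcmp(tbl->name, name) == 0)
			return tbl->fn;
	return NULL;
}
```

Because the lookup does not depend on position, the macro can be dropped anywhere after the default entry, which matches the note about the tables not being order-dependent.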
 1.44 23-Jun-2015  hannken branches: 1.44.34;
Add a vfs_newvnode() method to deadfs and use it to create
anonymous device vnodes with bdevvp() and cdevvp().

Implement spec_inactive() and spec_reclaim() to handle these nodes.
 1.43 25-Jul-2014  dholland branches: 1.43.4;
Implement spec_fdiscard() using bdev_discard() and cdev_discard().
Also define spec_fallocate() to genfs_eopnotsupp().
 1.42 30-Sep-2013  hannken branches: 1.42.2;
Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.41 21-Apr-2013  dholland branches: 1.41.4;
add missing spec_whiteout
 1.40 13-Feb-2013  hannken Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17
 1.39 14-Nov-2009  elad branches: 1.39.2; 1.39.12; 1.39.22;
- Move kauth_init() a little bit higher.

- Add spec_init() to authorize special device actions (and passthru too for
the time being). Move policy out of secmodel_suser.
 1.38 06-Oct-2009  elad Factor out a block of code that appears in three places (Veriexec, keylock,
and securelevel) so that others can use it as well.
 1.37 29-Dec-2008  pooka Rename specfs_lock as device_lock and move it from specfs to devsw.
Relaxes kernel dependency on vfs.
 1.36 28-Apr-2008  martin branches: 1.36.8;
Remove clause 3 and 4 from TNF licenses
 1.35 25-Jan-2008  ad branches: 1.35.6; 1.35.8; 1.35.10;
Remove VOP_LEASE. Discussed on tech-kern.
 1.34 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
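
The last item above, counting opens per vnode and per device and closing the driver only when the final open goes away, can be sketched as follows. This is a minimal user-space illustration, not the specfs code: the `sketch_*` names and both structures are hypothetical, locking is omitted, and it models the character-device case where several vnodes may have the same device open.

```c
#include <assert.h>

/* Hypothetical per-device state shared by every node naming the device. */
struct sketch_dev {
	int opencnt;      /* opens across all nodes */
	int close_calls;  /* how often the driver close hook ran */
};

/* Hypothetical per-node state; several nodes may share one device. */
struct sketch_node {
	struct sketch_dev *dev;
	int opencnt;      /* opens through this node only */
};

static void
sketch_open(struct sketch_node *np)
{
	np->opencnt++;
	np->dev->opencnt++;
}

static void
sketch_close(struct sketch_node *np)
{
	assert(np->opencnt > 0 && np->dev->opencnt > 0);
	np->opencnt--;
	/* Only the last open of the device triggers the driver close. */
	if (--np->dev->opencnt == 0)
		np->dev->close_calls++;
}
```

Keeping a separate counter on the device avoids inferring "is the device open" from a reference count that also tracks unrelated holders of the node.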
 1.33 07-Oct-2007  hannken branches: 1.33.4;
Update the file system copy-on-write handler.

- Instead of hooking the handler on the specdev of a mounted file system
hook directly on the `struct mount'.

- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'. Use
`mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.

- Replace the hand-made reader/writer lock with a krwlock.

- Keep `vn_cow_*' functions and mark as obsolete.

- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.32 03-Aug-2007  pooka branches: 1.32.2; 1.32.4; 1.32.6; 1.32.8;
cleanup unused prototype
 1.31 22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.30 14-May-2006  elad branches: 1.30.18; 1.30.28;
integrate kauth.
 1.29 11-Dec-2005  christos branches: 1.29.4; 1.29.6; 1.29.8; 1.29.10; 1.29.12;
merge ktrace-lwp.
 1.28 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.27 30-Aug-2005  xtraeme branches: 1.27.2;
Remove __P()
 1.26 25-May-2004  hannken branches: 1.26.12;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.25 14-Feb-2004  hannken Add a generic copy-on-write hook to add/remove functions that will be
called with every buffer written through spec_strategy().

Used by fss(4). Future file-system-internal snapshots will need them too.

Welcome to 1.6ZK

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.24 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.23 06-Jan-2003  matt branches: 1.23.2;
Add multiple inclusion protection.
 1.22 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.21 12-May-2002  matt Extern speclisth
 1.20 17-Aug-2001  chs branches: 1.20.2;
add definitions for UBCification of block devices.
 1.19 08-Dec-1999  sommerfeld branches: 1.19.6; 1.19.8;
Add appropriate VOP_FCNTL handlers to deadfs and specfs ops vectors.
 1.18 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.17 01-Mar-1998  fvdl branches: 1.17.14; 1.17.16; 1.17.20;
Merge with Lite2 + local changes
 1.16 11-Apr-1997  kleink Implement a POSIX compliant genfs VOP_SEEK() and use it in the appropriate
places; by Chris G. Demetriou and myself.
 1.15 02-Apr-1997  kleink added advisory record locking support
 1.14 07-Sep-1996  mycroft Implement poll(2).
 1.13 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.12 13-Feb-1996  mycroft GC *_nullop(). Minor nits.
 1.11 09-Feb-1996  christos miscfs prototype changes
 1.10 15-Oct-1995  mycroft Implement VOP_BWRITE() using vn_bwrite(), per r_friedl@informatik.uni-kl.de.
 1.9 13-Dec-1994  mycroft branches: 1.9.2;
Turn lease_check() into a vnode op, per CSRG.
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 08-Jun-1994  mycroft Update to 4.4-Lite fs code, with local changes.
 1.6 22-Dec-1993  cgd fix return type of vnode print routine
 1.5 07-Sep-1993  ws Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.4 27-Jun-1993  andrew ANSIfications - lots of function prototyping.
 1.3 20-May-1993  cgd add rcs ids as necessary, and also clean up headers
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.2.1 15-Oct-1995  mycroft Update from main branch.
 1.17.20.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.17.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.17.16.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.17.16.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.17.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.8.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.19.8.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.19.8.1 10-Jul-2001  lukem add spec_kqfilter()
 1.19.6.4 07-Jan-2003  thorpej Sync with HEAD.
 1.19.6.3 11-Nov-2002  nathanw Catch up to -current
 1.19.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.19.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.20.2.5 01-Oct-2001  fvdl Catch up with -current.
 1.20.2.4 27-Sep-2001  fvdl Do real locking for cloned vnodes (most filesystems have real locking
for spec vnodes, so clones should have it too). Could probably do locking
all the time for spec vnodes, but need to check if vnodes created
during bootstrap with {b,c}devvp will cause trouble if they have actual
locks.
 1.20.2.3 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.20.2.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.20.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.23.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.23.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.23.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.23.2.1 03-Aug-2004  skrll Sync with HEAD
 1.26.12.4 04-Feb-2008  yamt sync with head.
 1.26.12.3 27-Oct-2007  yamt sync with head.
 1.26.12.2 03-Sep-2007  yamt sync with head.
 1.26.12.1 21-Jun-2006  yamt sync with head.
 1.27.2.1 20-Oct-2005  yamt adapt specfs and fifofs.
 1.29.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.29.10.1 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.29.8.1 24-May-2006  yamt sync with head.
 1.29.6.1 01-Jun-2006  kardel Sync with head.
 1.29.4.1 09-Sep-2006  rpaulo sync with head
 1.30.28.1 15-Aug-2007  skrll Sync with HEAD.
 1.30.18.3 09-Oct-2007  ad Sync with head.
 1.30.18.2 20-Aug-2007  ad Sync with HEAD.
 1.30.18.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.32.8.2 03-Aug-2007  pooka cleanup unused prototype
 1.32.8.1 03-Aug-2007  pooka file specdev.h was added on branch matt-mips64 on 2007-08-03 08:50:24 +0000
 1.32.6.1 14-Oct-2007  yamt sync with head.
 1.32.4.2 23-Mar-2008  matt sync with HEAD
 1.32.4.1 06-Nov-2007  matt sync with HEAD
 1.32.2.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.32.2.1 03-Aug-2007  joerg file specdev.h was added on branch jmcneill-pm on 2007-10-26 15:48:57 +0000
 1.33.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.35.10.3 11-Mar-2010  yamt sync with head
 1.35.10.2 04-May-2009  yamt sync with head.
 1.35.10.1 16-May-2008  yamt sync with head.
 1.35.8.1 18-May-2008  yamt sync with head.
 1.35.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.35.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.36.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.39.22.4 03-Dec-2017  jdolecek update from HEAD
 1.39.22.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.39.22.2 23-Jun-2013  tls resync from head
 1.39.22.1 25-Feb-2013  tls resync with head
 1.39.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.39.2.3 28-May-2010  uebayasi specdev::v_phys_addr is now specdev::v_physseg.
 1.39.2.2 28-Apr-2010  uebayasi When mounting a block device as XIP, pass registered struct vm_physseg *
as a cookie from the block device to the caller (== mount code).
struct vm_physseg * will be passed to XIP vnode pager
(genfs_do_getpages_xip()), then converted back to paddr_t.

(My future plan is to pass struct vm_physseg * back to the fault handler,
and to pmap_enter() as is.)
 1.39.2.1 23-Mar-2010  uebayasi Put run-time XIP-specific per-mount data in struct specdev, not struct mount.
 1.41.4.1 18-May-2014  rmind sync with head
 1.42.2.1 10-Aug-2014  tls Rebase.
 1.43.4.1 22-Sep-2015  skrll Sync with HEAD
 1.44.34.1 01-Aug-2021  thorpej Sync with HEAD.
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.3 12-Oct-2014  uebayasi Define layerfs as an attribute.
 1.2 11-Oct-2014  uebayasi Define filesystem attributes with vfs dependency.
 1.1 16-Apr-2002  thorpej branches: 1.1.6; 1.1.8; 1.1.162;
Cleanup how file system configuration information is declared, grouping
related information together, with the file system code itself.

This is just low-hanging fruit -- more to come.
 1.1.162.1 03-Dec-2017  jdolecek update from HEAD
 1.1.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.8.1 16-Apr-2002  jdolecek file files.umapfs was added on branch kqueue on 2002-06-23 17:50:16 +0000
 1.1.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.1.6.1 16-Apr-2002  nathanw file files.umapfs was added on branch nathanw_sa on 2002-06-20 03:48:04 +0000
 1.19 20-Aug-2019  perseant Clean up debugging cruft that somehow made it into my previous commit.
 1.18 20-Aug-2019  perseant Allow the user to specify the filesystem ID for umapfs at mount time,
allowing a consistent filesystem ID across reboots. Closes PR #54471.
 1.17 11-Apr-2017  hannken branches: 1.17.12;
Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.16 28-Jun-2008  rumble branches: 1.16.40; 1.16.60; 1.16.64; 1.16.68;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.15 14-May-2006  elad branches: 1.15.58; 1.15.62; 1.15.64; 1.15.66;
integrate kauth.
 1.14 11-Dec-2005  christos branches: 1.14.4; 1.14.6; 1.14.8; 1.14.10; 1.14.12;
merge ktrace-lwp.
 1.13 30-Aug-2005  xtraeme Remove __P()
 1.12 26-Feb-2005  perry branches: 1.12.4;
nuke trailing whitespace
 1.11 20-May-2004  atatat branches: 1.11.4; 1.11.6;
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.10 07-Aug-2003  agc branches: 1.10.2;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 08-Jul-1999  wrstuden branches: 1.9.36;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.8 01-Mar-1998  fvdl branches: 1.8.10;
Merge with Lite2 + local changes
 1.7 06-Oct-1997  thorpej Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.6 09-Feb-1996  christos branches: 1.6.12;
miscfs prototype changes
 1.5 15-Apr-1995  cgd clean up some return-type warnings
 1.4 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.3 19-Aug-1994  mycroft Convert hash tables.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.1 19-Aug-1994  mycroft update from trunk
 1.6.12.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.8.10.1 02-Aug-1999  thorpej Update from trunk.
 1.9.36.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.36.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.9.36.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.36.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.36.1 03-Aug-2004  skrll Sync with HEAD
 1.10.2.1 23-May-2004  tron Pull up revision 1.11 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.11.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.11.4.1 29-Apr-2005  kent sync with -current
 1.12.4.1 21-Jun-2006  yamt sync with head.
 1.14.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.14.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.14.8.1 24-May-2006  yamt sync with head.
 1.14.6.1 01-Jun-2006  kardel Sync with head.
 1.14.4.1 09-Sep-2006  rpaulo sync with head
 1.15.66.1 03-Jul-2008  simonb Sync with head.
 1.15.64.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.15.62.1 04-May-2009  yamt sync with head.
 1.15.58.1 29-Jun-2008  mjf Sync with HEAD.
 1.16.68.1 21-Apr-2017  bouyer Sync with HEAD
 1.16.64.1 26-Apr-2017  pgoyette Sync with HEAD
 1.16.60.1 28-Aug-2017  skrll Sync with HEAD
 1.16.40.1 03-Dec-2017  jdolecek update from HEAD
 1.17.12.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.29 09-Nov-2014  maxv Do not uselessly include <sys/malloc.h>.
 1.28 15-Mar-2009  cegger branches: 1.28.22; 1.28.38;
ansify function definitions
 1.27 14-Mar-2009  dsl Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.26 13-Feb-2009  plunky While we remap credentials we should ignore cred == FSCRED as well as
cred == NOCRED.

This fixes a page fault occurring when a union is mounted over a umap,
as FSCRED is passed by union filesystem.
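
The fix described above amounts to treating every sentinel credential as "do not touch" before remapping. The following is a minimal sketch of that guard, assuming sentinel values are encoded as invalid pointer constants. The `sketch_*` names, the structure, and the sentinel encodings are hypothetical stand-ins for illustration, not the umapfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical credential with a single id to remap. */
struct sketch_cred {
	int uid;
};

/* Stand-in sentinels; they never point at a real object. */
#define SKETCH_NOCRED ((struct sketch_cred *)(intptr_t)-1)
#define SKETCH_FSCRED ((struct sketch_cred *)(intptr_t)-2)

/* True only for pointers that refer to an actual credential. */
static bool
sketch_cred_is_mappable(const struct sketch_cred *cred)
{
	return cred != NULL && cred != SKETCH_NOCRED &&
	    cred != SKETCH_FSCRED;
}

/* Remap the id, skipping sentinels instead of dereferencing them. */
static void
sketch_remap(struct sketch_cred *cred, int new_uid)
{
	/* Precondition of this sketch: ids are non-negative. */
	assert(new_uid >= 0);

	if (!sketch_cred_is_mappable(cred))
		return;
	cred->uid = new_uid;
}
```

Checking only one sentinel leaves the other to be dereferenced as if it were a real pointer, which is the kind of page fault the log entry reports.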
 1.25 30-Jun-2007  dsl branches: 1.25.32; 1.25.42; 1.25.44; 1.25.48;
Update for the changed prototypes of kauth_cred_set/getgroups().
 1.24 14-May-2006  elad branches: 1.24.18; 1.24.20;
integrate kauth.
 1.23 11-Dec-2005  christos branches: 1.23.4; 1.23.6; 1.23.8; 1.23.10; 1.23.12;
merge ktrace-lwp.
 1.22 30-Aug-2005  xtraeme Remove __P()
 1.21 26-Feb-2005  perry branches: 1.21.4;
nuke trailing whitespace
 1.20 07-Aug-2003  agc branches: 1.20.8; 1.20.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 15-Nov-2001  lukem branches: 1.19.16;
don't need <sys/types.h> when including <sys/param.h>
 1.18 10-Nov-2001  lukem add RCSIDs
 1.17 07-Jun-2001  wiz branches: 1.17.2; 1.17.6;
Typos and grammar fixes in comments (misc/13133 by Michael K. Sanders)
 1.16 13-Mar-2000  soren branches: 1.16.6;
Fix doubled 'the's in comments.
 1.15 08-Jul-1999  wrstuden branches: 1.15.2;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.14 19-Mar-1999  perseant branches: 1.14.4;
Apply patch from kern/5538:

Fix group mapping so members of group 0 get other group-ids mapped as well.
Avoid rename panic by checking (*this_vp_p) against NULLVP before
dereferencing it (same change as to NULLFS some time ago).
 1.13 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.12 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.11 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.10 13-Oct-1996  christos branches: 1.10.10;
backout previous kprintf changes
 1.9 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.8 05-Mar-1996  thorpej Don't deref a bad ucred pointer, from Dave Carrel <carrel@cisco.com>,
PR #1699.
 1.7 09-Feb-1996  christos miscfs prototype changes
 1.6 01-Jun-1995  jtc Moved egid credential from cr_groups[0] to new field cr_gid. POSIX.1
requires that sgid executables and the setuid() syscall *not* change
the supplemental group list.
 1.5 15-Apr-1995  cgd clean up some return-type warnings
 1.4 20-Sep-1994  cgd fix device aliasing and lost vnode problems.
 1.3 19-Aug-1994  mycroft Convert hash tables.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.2 20-Sep-1994  cgd from trunk.
 1.2.2.1 19-Aug-1994  mycroft update from trunk
 1.10.10.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.14.4.1 02-Aug-1999  thorpej Update from trunk.
 1.15.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.6.3 08-Jan-2002  nathanw Catch up to -current.
 1.16.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.16.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.16.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.16.1 03-Aug-2004  skrll Sync with HEAD
 1.20.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.8.1 29-Apr-2005  kent sync with -current
 1.21.4.2 03-Sep-2007  yamt sync with head.
 1.21.4.1 21-Jun-2006  yamt sync with head.
 1.23.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.23.10.2 14-Mar-2006  elad Use kauth_cred_[sg]etgroups() where appropriate.
 1.23.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.23.8.1 24-May-2006  yamt sync with head.
 1.23.6.1 01-Jun-2006  kardel Sync with head.
 1.23.4.1 09-Sep-2006  rpaulo sync with head
 1.24.20.1 11-Jul-2007  mjf Sync with head.
 1.24.18.1 15-Jul-2007  ad Sync with head.
 1.25.48.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.44.1 23-Feb-2009  snj Pull up following revision(s) (requested by plunky in ticket #461):
sys/miscfs/umapfs/umap_subr.c: revision 1.26
sys/miscfs/umapfs/umap_vnops.c: revision 1.44
While we remap credentials we should ignore cred == FSCRED as well as
cred == NOCRED.
This fixes a page fault occurring when a union is mounted over a umap,
as FSCRED is passed by union filesystem.
 1.25.42.2 28-Apr-2009  skrll Sync with HEAD.
 1.25.42.1 03-Mar-2009  skrll Sync with HEAD.
 1.25.32.1 04-May-2009  yamt sync with head.
 1.28.38.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.28.22.1 03-Dec-2017  jdolecek update from HEAD
 1.105 16-Feb-2025  joe remove unnecessary branches
 1.104 04-Nov-2022  hannken branches: 1.104.8;
Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
 1.103 13-Apr-2020  ad Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
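
The idea of hiding a reference-count field behind an accessor that performs a relaxed atomic load can be sketched in portable C11. This is an illustration only: `sketch_vnode` and `sketch_refcnt` are hypothetical names, and `atomic_load_explicit` with `memory_order_relaxed` is used here as a standard-C stand-in for the kernel's own `atomic_load_relaxed()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object whose count is read concurrently. */
struct sketch_vnode {
	atomic_uint usecount;
};

/*
 * Return a snapshot of the reference count. The relaxed load is atomic
 * but imposes no ordering on surrounding memory accesses.
 */
static unsigned int
sketch_refcnt(struct sketch_vnode *vp)
{
	assert(vp != NULL);
	return atomic_load_explicit(&vp->usecount, memory_order_relaxed);
}
```

Callers that go through the accessor no longer depend on how the count is stored, so the field's representation can change without touching them.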
 1.102 16-Mar-2020  pgoyette branches: 1.102.2;
Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.101 20-Aug-2019  perseant Allow the user to specify the filesystem ID for umapfs at mount time,
allowing a consistent filesystem ID across reboots. Closes PR #54471.
 1.100 20-Feb-2019  hannken Set "mnt_lower" before the first file system operation on the new file system.
 1.99 11-Apr-2017  hannken branches: 1.99.12;
Field "layerm_vfs" of "struct layer_mount" got superseded by "mnt_lower".
Adapt consumers and remove the now unused field.

Ride 7.99.68
 1.98 30-Mar-2017  hannken Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.
 1.97 06-Mar-2017  hannken Add field "mnt_lower" to "struct mount" to track the file system
a layered file system is mounted on.

Welcome to 7.99.65
 1.96 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.95 09-Nov-2014  maxv branches: 1.95.2; 1.95.4; 1.95.6;
Do not uselessly include <sys/malloc.h>.
 1.94 11-Aug-2014  maxv 1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.

Both triggerable from root only.
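
The second point above, a signed, user-controlled count feeding a copy size, can be illustrated with a small user-space sketch. This is not the code from umap_vfsops.c: the names demo_map_size, struct demo_id_pair and DEMO_MAP_MAX are hypothetical, and rejecting a count of zero simply follows the wording "greater than zero" in the log message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAP_MAX 64	/* hypothetical upper bound on table entries */

struct demo_id_pair {	/* hypothetical: one "from -> to" id mapping */
	unsigned int from;
	unsigned int to;
};

/*
 * Validate a caller-supplied entry count before turning it into a byte
 * size.  Without the range check, a negative int would be converted to a
 * huge size_t by the multiplication below.
 */
int
demo_map_size(int nentries, size_t *sizep)
{
	assert(sizep != NULL);
	if (nentries <= 0 || nentries > DEMO_MAP_MAX)
		return EINVAL;
	*sizep = (size_t)nentries * sizeof(struct demo_id_pair);
	return 0;
}
```

A caller would compute the size with demo_map_size() first and perform the copy only when it returns 0, so the error code actually reflects the failure instead of a stale zero.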
 1.93 25-May-2014  hannken branches: 1.93.2;
Change layerfs from hashlist to vcache.
Make VI_LOCKSHARE public again.

Ride 6.99.43
 1.92 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
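
As a rough illustration of the rule stated above (the file system, not the caller, must reject a NULL 'data' argument when it needs one), here is a minimal user-space sketch. The names demo_fs_mount and struct demo_mount_args are hypothetical and do not correspond to any kernel interface; the length check is an extra assumption, not something this log entry describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_mount_args {	/* hypothetical per-fs mount arguments */
	int flags;
};

/*
 * Mount-style entry point: refuse a NULL or too-short argument buffer
 * before touching it, instead of dereferencing it unconditionally.
 */
int
demo_fs_mount(const void *data, size_t data_len, struct demo_mount_args *out)
{
	assert(out != NULL);
	if (data == NULL)
		return EINVAL;
	if (data_len < sizeof(*out))
		return EINVAL;
	memcpy(out, data, sizeof(*out));
	return 0;
}
```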
 1.91 23-Mar-2014  hannken branches: 1.91.2;
Change all vfsops to use C99 designated initializers.

No functional changes intended.
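
For readers unfamiliar with the idiom, the following sketch contrasts a positional initializer with a C99 designated one. The structure demo_fs_ops and its members are hypothetical stand-ins, not the real struct vfsops.

```c
#include <assert.h>
#include <stddef.h>

struct demo_fs_ops {	/* hypothetical operations table */
	const char *name;
	int (*mount)(const char *path);
	int (*unmount)(int flags);
	int (*sync)(void);
};

static int demo_mount(const char *path) { return path == NULL ? -1 : 0; }
static int demo_unmount(int flags) { (void)flags; return 0; }

/* Positional form: every slot must appear, in declaration order. */
const struct demo_fs_ops demo_ops_positional = {
	"demo", demo_mount, demo_unmount, NULL
};

/* Designated form: members are named, order is free, omitted ones are zero. */
const struct demo_fs_ops demo_ops_designated = {
	.name = "demo",
	.unmount = demo_unmount,
	.mount = demo_mount,
	/* .sync is not listed, so it is initialized to NULL */
};

/* Dispatch through the table, as a caller of such an ops vector would. */
int
demo_ops_call_mount(const struct demo_fs_ops *ops, const char *path)
{
	assert(ops != NULL && ops->mount != NULL);
	return ops->mount(path);
}
```

Both tables end up with the same contents; the designated form keeps working if members are later reordered or added, which is consistent with the "no functional changes" note above.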
 1.90 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.89 10-Feb-2014  hannken Change layerfs_vget(), layerfs_fhtovp() and the various layer xxx_mount()
functions to unlock/relock the node for the call to layer_node_create().

Finally remove dirty hacks (LK_NOWAIT, kpause) from layer_node_find().
 1.88 30-Apr-2012  rmind branches: 1.88.2; 1.88.4;
- Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.87 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.86 19-Nov-2010  dholland branches: 1.86.8; 1.86.12; 1.86.14; 1.86.18; 1.86.20;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.85 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.84 11-Apr-2010  mlelstv The *_modcmd functions use the module name as prefix.
 1.83 11-Apr-2010  pooka Make module name match MOUNT_NAME. Inspired by PR kern/43110.
 1.82 14-Mar-2009  dsl branches: 1.82.2; 1.82.4;
Change about 4500 of the K&R function definitions to ANSI ones.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
 1.81 05-Dec-2008  ad branches: 1.81.4;
PR kern/40110: null, overlay and umap modules loading -> panic (layerfs symbols not there)

Add a layerfs module.
 1.80 28-Jun-2008  rumble branches: 1.80.2; 1.80.4; 1.80.6; 1.80.12; 1.80.16;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.79 13-May-2008  simonb branches: 1.79.2;
mnt_data is a pointer, set it to NULL not 0 when we're finished with it.
 1.78 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.77 05-May-2008  ad branches: 1.77.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.76 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.75 28-Jan-2008  dholland branches: 1.75.6; 1.75.8; 1.75.10;
Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.74 02-Jan-2008  ad Merge vmlocking2 to head.
 1.73 08-Dec-2007  pooka branches: 1.73.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.72 26-Nov-2007  pooka branches: 1.72.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.71 10-Oct-2007  ad branches: 1.71.4;
umapm_hashlock is a mutex.
 1.70 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.69 31-Jul-2007  pooka branches: 1.69.2; 1.69.4; 1.69.6; 1.69.8;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.68 26-Jul-2007  pooka Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter.
 1.67 17-Jul-2007  pooka branches: 1.67.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.66 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.65 08-Jul-2007  pooka * allow unmount even if rootvp has a usecount > 1 provided that
MNT_FORCE is given
* decrease cargo cult index by getting rid of commented sections
with mntflushbuf() in them - AFAICT the call was removed from our
kernel over 13 years ago with the 4.4BSDlite import
 1.64 08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.63 19-Jan-2007  hannken branches: 1.63.2; 1.63.6; 1.63.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.62 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.61 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.60 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.59 03-Sep-2006  christos branches: 1.59.2; 1.59.4;
add missing initializers
 1.58 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.57 14-May-2006  elad integrate kauth.
 1.56 11-Dec-2005  christos branches: 1.56.4; 1.56.6; 1.56.8; 1.56.10; 1.56.12;
merge ktrace-lwp.
 1.55 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.54 30-Aug-2005  xtraeme Remove __P()
 1.53 29-May-2005  christos branches: 1.53.2;
- sprinkle const
- avoid shadowed variables.
 1.52 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.51 26-Feb-2005  perry nuke trailing whitespace
 1.50 02-Jan-2005  thorpej branches: 1.50.2; 1.50.4;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.49 01-Jul-2004  hannken Keep a pointer to the leaf mount. Needed for write gating where a
file system gets suspended and has layered mounts above it.

Welcome to 2.0G

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.48 29-May-2004  wrstuden Add layerfs_snapshot() as a handler routine for VFS_SNAPSHOT() calls
through a layered file system.

Note: we don't actually support snapshots through a layered file system,
and this routine returns an error. However we: 1) have clearly documented
what needs fixing (which isn't trivial to fix) and 2) if we do fix
this, all layered file systems can take advantage of it at once.
 1.47 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.46 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.45 29-Apr-2004  jrf Removed remaining caddr_t casts we do not need in miscfs. Recompiled
kernel and ran for a day or so. There are still some caddr_t types in
the arguments of some calls, I will do those separately (later) as
they touch a lot more of the system.
Approved by christos@NetBSD.org.
 1.44 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.43 24-Mar-2004  atatat branches: 1.43.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.42 09-Mar-2004  atatat Remove pointless comment about layerfs_sysctl()
 1.41 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.40 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.39 29-Jun-2003  fvdl branches: 1.39.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.38 29-Jun-2003  thorpej Adjust for ktrace/lwp changes.
 1.37 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.36 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.35 21-Sep-2002  christos MNT_GETARGS support
 1.34 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.33 15-Nov-2001  lukem branches: 1.33.8;
don't need <sys/types.h> when including <sys/param.h>
 1.32 10-Nov-2001  lukem add RCSIDs
 1.31 15-Sep-2001  chs branches: 1.31.2;
add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.30 16-Aug-2001  tv branches: 1.30.2;
KNF on previous.
 1.29 03-Aug-2001  jdolecek bound check mount args more thoroughly
 1.28 02-Aug-2001  assar (*fs_mount): do not get the parent vnode back from namei to just release it
 1.27 22-Jan-2001  jdolecek branches: 1.27.2; 1.27.4;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.26 08-Nov-2000  ad Update for hashinit() change.
 1.25 10-Jun-2000  assar branches: 1.25.2;
make vfs_getnewfsid only take one argument and fetch the name of the
filesystem from the supplied mount argument. also make makefstype
take a const parameter. update all the callers.
 1.24 16-Mar-2000  jdolecek branches: 1.24.2;
Adapt to last VFS changes - add appropriate vfs_done routine.
 1.23 08-Jul-1999  wrstuden branches: 1.23.2; 1.23.8;
Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrow most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.22 19-Mar-1999  perseant branches: 1.22.4;
Apply patch from kern/5538:

Fix group mapping so members of group 0 get other group-ids mapped as well.
Avoid rename panic by checking (*this_vp_p) against NULLVP before
dereferencing it (same change as to NULLFS some time ago).
 1.21 12-Mar-1999  bouyer Restrict umap mounts to root. Letting any user use this has security
implications.
 1.20 26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.19 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
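
The mapping named in this entry can be sketched as follows. The struct demo_rec and the helper names are hypothetical; the points to note are that bcopy() takes (src, dst, len) while memcpy() takes (dst, src, len), and that memcpy(), unlike bcopy(), must not be used on overlapping buffers (memmove() covers that case).

```c
#include <assert.h>
#include <string.h>

struct demo_rec {	/* hypothetical record */
	int id;
	char name[8];
};

/* Formerly bzero(r, sizeof(*r)). */
void
demo_rec_clear(struct demo_rec *r)
{
	assert(r != NULL);
	memset(r, 0, sizeof(*r));
}

/* Formerly bcopy(src, dst, sizeof(*dst)); note the swapped argument order. */
void
demo_rec_copy(struct demo_rec *dst, const struct demo_rec *src)
{
	assert(dst != NULL && src != NULL && dst != src);
	memcpy(dst, src, sizeof(*dst));
}

/* Formerly bcmp(a, b, sizeof(*a)) == 0; both return zero for equal buffers. */
int
demo_rec_equal(const struct demo_rec *a, const struct demo_rec *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a, b, sizeof(*a)) == 0;
}
```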
 1.18 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.16 06-Oct-1997  thorpej Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.15 10-Sep-1997  christos PR/4098: Alan Barrett: Fix diagnostic printf formatting.
 1.14 11-Mar-1997  mikel branches: 1.14.4;
this is umapfs, not lofs
 1.13 20-Feb-1997  mikel use the proper entry count; from Yasufumi Itoh in PR kern/3175.
 1.12 22-Dec-1996  cgd branches: 1.12.6;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.11 13-Oct-1996  christos backout previous kprintf changes
 1.10 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.9 09-Feb-1996  christos miscfs prototype changes
 1.8 18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.7 09-Mar-1995  mycroft copy*str() should use size_t.
 1.6 25-Jan-1995  cgd return EOPNOTSUPP from fhtovp and vptofh functions; doing otherwise
correctly is not possible.
 1.5 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a (tiny) bit.
 1.4 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.3 15-Sep-1994  mycroft stat the file system at mount time, for `df -n', et al.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.1 16-Sep-1994  cgd from trunk, per mycroft
 1.12.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.14.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.14.4.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.22.4.1 02-Aug-1999  thorpej Update from trunk.
 1.23.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.23.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.23.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.23.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.24.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.25.2.1 16-Aug-2001  tv Pullup [jdolecek]:

sys/miscfs/umapfs/umap_vfsops.c 1.29-1.30
sys/kern/vfs_subr.c 1.156
sys/nfs/nfs.h 1.30

Bounds check mount args.
 1.27.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.27.4.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.27.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.27.4.1 03-Aug-2001  lukem update to -current
 1.27.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.27.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.27.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.27.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.27.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.27.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.30.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.31.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.33.8.1 29-Aug-2002  gehenna catch up with -current.
 1.39.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.39.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.39.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.39.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.39.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.39.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.39.2.3 03-Aug-2004  skrll Sync with HEAD
 1.39.2.2 03-Jul-2003  wrstuden LWP-ify. Changes all seem to be catching up with recent set_statfs_info()
changes.
 1.39.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.43.2.1 29-May-2004  tron Pull up revision 1.46 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.50.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.50.2.1 29-Apr-2005  kent sync with -current
 1.53.2.8 04-Feb-2008  yamt sync with head.
 1.53.2.7 21-Jan-2008  yamt sync with head
 1.53.2.6 07-Dec-2007  yamt sync with head
 1.53.2.5 27-Oct-2007  yamt sync with head.
 1.53.2.4 03-Sep-2007  yamt sync with head.
 1.53.2.3 26-Feb-2007  yamt sync with head.
 1.53.2.2 30-Dec-2006  yamt sync with head.
 1.53.2.1 21-Jun-2006  yamt sync with head.
 1.56.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.56.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.56.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.56.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.56.8.3 14-Sep-2006  yamt sync with head.
 1.56.8.2 11-Aug-2006  yamt sync with head
 1.56.8.1 24-May-2006  yamt sync with head.
 1.56.6.1 01-Jun-2006  kardel Sync with head.
 1.56.4.1 09-Sep-2006  rpaulo sync with head
 1.59.4.2 10-Dec-2006  yamt sync with head.
 1.59.4.1 22-Oct-2006  yamt sync with head
 1.59.2.3 01-Feb-2007  ad Sync with head.
 1.59.2.2 12-Jan-2007  ad Sync with head.
 1.59.2.1 18-Nov-2006  ad Sync with head.
 1.63.8.1 11-Jul-2007  mjf Sync with head.
 1.63.6.6 16-Sep-2007  ad Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.
 1.63.6.5 20-Aug-2007  ad Sync with HEAD.
 1.63.6.4 15-Jul-2007  ad Sync with head.
 1.63.6.3 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.63.6.2 10-Apr-2007  ad Sync with head.
 1.63.6.1 05-Apr-2007  ad Compile fixes.
 1.63.2.1 15-Apr-2007  yamt sync with head.
 1.67.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.69.8.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.69.8.1 31-Jul-2007  pooka file umap_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:17 +0000
 1.69.6.1 14-Oct-2007  yamt sync with head.
 1.69.4.3 23-Mar-2008  matt sync with HEAD
 1.69.4.2 09-Jan-2008  matt sync with HEAD
 1.69.4.1 06-Nov-2007  matt sync with HEAD
 1.69.2.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.69.2.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.69.2.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.71.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.71.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.71.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.72.2.2 26-Dec-2007  ad Sync with head.
 1.72.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.73.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.75.10.3 11-Aug-2010  yamt sync with head.
 1.75.10.2 04-May-2009  yamt sync with head.
 1.75.10.1 16-May-2008  yamt sync with head.
 1.75.8.1 18-May-2008  yamt sync with head.
 1.75.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.75.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.75.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.77.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.77.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.79.2.1 03-Jul-2008  simonb Sync with head.
 1.80.16.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1921):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.80.16.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.80.12.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1921):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
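
The second point above can be illustrated with a minimal userland sketch. It assumes a hypothetical cap (SKETCH_MAX_ENTRIES) and a hypothetical helper name (sketch_table_size); neither is taken from umap_vfsops.c. The idea is to reject a non-positive or oversized count before it is ever used to compute a copy size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound; the real limit is whatever the file system defines. */
#define SKETCH_MAX_ENTRIES 64

/*
 * Validate a user-supplied entry count and compute the byte size of the
 * table to copy in. Returns 0 and stores the size in *sizep on success,
 * or EINVAL if the count is outside 1..SKETCH_MAX_ENTRIES or the
 * multiplication would overflow.
 */
static int
sketch_table_size(int nentries, size_t entry_size, size_t *sizep)
{
	if (nentries <= 0 || nentries > SKETCH_MAX_ENTRIES)
		return EINVAL;
	if (entry_size != 0 && (size_t)nentries > SIZE_MAX / entry_size)
		return EINVAL;
	*sizep = (size_t)nentries * entry_size;
	return 0;
}
```

Returning an explicit error code from the check also avoids the first problem, where a variable that never received an error value was returned.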
 1.80.12.1 28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.80.6.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1921):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.80.6.1 25-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.80.4.2 28-Apr-2009  skrll Sync with HEAD.
 1.80.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.80.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.81.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.82.4.3 05-Mar-2011  rmind sync with head
 1.82.4.2 03-Jul-2010  rmind sync with head
 1.82.4.1 30-May-2010  rmind sync with head
 1.82.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.82.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.86.20.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1115):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.52
Overflow if *data_len == OSIZE and args->version >= PTYFS_ARGSVERSION.
Sent on tech-kern@, ok christos@
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.86.20.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
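
The four fixes described above can be sketched together in userland C. Everything here is an assumption made for illustration: SKETCH_MAX_DATA_LEN, sketch_do_mount() and the hook type are hypothetical names, and malloc()/free() stand in for kmem_alloc()/kmem_free(). The sketch bounds the length, skips the allocation when there is nothing to copy, leaves the NULL check to the file system hook, and remembers the allocated size separately from the length the hook may rewrite.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_DATA_LEN 4096	/* hypothetical cap on mount data */

/* Stand-in for a file system mount hook that may rewrite *data_len. */
typedef int (*sketch_mount_fn)(void *data, size_t *data_len);

/* Mimics the kmem_free(ptr, size) shape: the size must match the allocation. */
static void
sketch_free(void *p, size_t alloc_len)
{
	(void)alloc_len;
	free(p);
}

static int
sketch_do_mount(const void *udata, size_t data_len, sketch_mount_fn fs_mount)
{
	void *kdata = NULL;
	size_t alloc_len = 0;
	int error;

	if (data_len > SKETCH_MAX_DATA_LEN)	/* refuse huge requests */
		return EINVAL;
	if (udata != NULL && data_len != 0) {	/* never allocate zero bytes */
		alloc_len = data_len;
		kdata = malloc(alloc_len);
		if (kdata == NULL)
			return ENOMEM;
		memcpy(kdata, udata, alloc_len);
	}
	error = fs_mount(kdata, &data_len);	/* hook may change data_len */
	if (kdata != NULL)
		sketch_free(kdata, alloc_len);	/* free with the saved size */
	return error;
}

/* Example hook: needs data, so it rejects NULL itself, and rewrites the length. */
static int
sketch_fs_needs_data(void *data, size_t *data_len)
{
	if (data == NULL)
		return EINVAL;
	*data_len = 0;
	return 0;
}
```

Keeping alloc_len separate means the release call never sees a length the hook has modified.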
 1.86.18.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1115):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.52
Overflow if *data_len == OSIZE and args->version >= PTYFS_ARGSVERSION.
Sent on tech-kern@, ok christos@
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.86.18.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.86.14.2 27-Aug-2014  msaitoh Pull up following revision(s) (requested by maxv in ticket #1115):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.52
Overflow if *data_len == OSIZE and args->version >= PTYFS_ARGSVERSION.
Sent on tech-kern@, ok christos@
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.86.14.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.86.12.2 02-Jun-2012  mrg sync to latest -current.
 1.86.12.1 05-Apr-2012  mrg sync to latest -current.
 1.86.8.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.86.8.2 23-May-2012  yamt sync with head.
 1.86.8.1 17-Apr-2012  yamt sync with head
 1.88.4.1 18-May-2014  rmind sync with head
 1.88.2.2 03-Dec-2017  jdolecek update from HEAD
 1.88.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.91.2.1 10-Aug-2014  tls Rebase.
 1.93.2.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.93.2.1 21-Aug-2014  martin Pull up following revision(s) (requested by maxv in ticket #43):
sys/miscfs/umapfs/umap_vfsops.c: revision 1.94
1) 'error' is returned while it does not even hold an error code. Which
means that zero is returned, and the kernel keeps mounting (and it
probably ends up in a deadlock/memory corruption somewhere).
2) 'nentries' and 'gnentries' are int and user-controlled, and there's no
check to ensure they are greater than zero. Since they are used to
compute the size of two copyin's, a user can control the copied size
by giving a negative value (like 128-2^29), and thus overwrite kernel
memory.
Both triggerable from root only.
 1.95.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.95.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.95.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.95.2.1 28-Aug-2017  skrll Sync with HEAD
 1.99.12.4 21-Apr-2020  martin Sync with HEAD
 1.99.12.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.99.12.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.99.12.1 10-Jun-2019  christos Sync with HEAD
 1.102.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.104.8.1 02-Aug-2025  perseant Sync with HEAD
 1.62 20-Oct-2021  thorpej Overhaul of the EVFILT_VNODE kevent(2) filter:

- Centralize vnode kevent handling in the VOP_*() wrappers, rather than
forcing each individual file system to deal with it (except VOP_RENAME(),
because VOP_RENAME() is a mess and we currently have 2 different ways
of handling it; at least it's reasonably well-centralized in the "new"
way).
- Add support for NOTE_OPEN, NOTE_CLOSE, NOTE_CLOSE_WRITE, and NOTE_READ,
compatible with the same events in FreeBSD.
- Track which kevent notifications clients are interested in receiving
to avoid doing work for events no one cares about (e.g. avoiding
taking locks and traversing the klist to send a NOTE_WRITE when
someone is merely watching for a file to be deleted).
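
The interest tracking in the last item can be sketched as a bitmask test. This is a userland illustration under stated assumptions: the SK_NOTE_* bits and the sk_* names are hypothetical and are not the values or functions from <sys/event.h> or the kernel, and the counter stands in for taking locks and walking the klist.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits; not the values from <sys/event.h>. */
#define SK_NOTE_DELETE	0x1u
#define SK_NOTE_WRITE	0x2u

struct sk_watch_state {
	uint32_t interest;	/* union of events any watcher registered for */
	unsigned delivered;	/* number of deliveries actually performed */
};

/* Record that some watcher wants the given events. */
static void
sk_register(struct sk_watch_state *ws, uint32_t events)
{
	ws->interest |= events;
}

/* Deliver an event only if someone asked for it. */
static bool
sk_notify(struct sk_watch_state *ws, uint32_t event)
{
	if ((ws->interest & event) == 0)
		return false;	/* nobody cares: skip the expensive path */
	ws->delivered++;	/* stand-in for locking and walking the list */
	return true;
}
```

With only a delete watcher registered, a write notification returns early without touching the delivery path.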

In support of the above:

- Add support in vnode_if.sh for specifying PRE- and POST-op handlers,
to be invoked before and after vop_pre() and vop_post(), respectively.
Basic idea from FreeBSD, but implemented differently.
- Add support in vnode_if.sh for specifying CONTEXT fields in the
vop_*_args structures. These context fields are used to convey information
between the file system VOP function and the VOP wrapper, but do not
occupy an argument slot in the VOP_*() call itself. These context fields
are initialized and subsequently interpreted by PRE- and POST-op handlers.
- Version VOP_REMOVE(), using a context field for the file system to report
back the resulting link count of the target vnode. Return this in tmpfs,
udf, nfs, chfs, ext2fs, lfs, and ufs.

NetBSD 9.99.92.
 1.61 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.60 04-Jun-2017  hannken Locking a layer vnode using the regular bypass routine is no longer
racy. Undo the change from 2017-03-30 11:16:52, commitid eurqbzuGxGRlryLz
and make vi_lock a krwlock_t again.
 1.59 30-Mar-2017  hannken branches: 1.59.6;
Locking a layer vnode is racy as it may become reclaimed before
calling the operation on the lower vnode.

Replace vi_lock with a rw_obj and change layered file systems
to share the lock with the lower vnode.

Layered file systems now use genfs_lock()/_unlock/_islocked().

Welcome to 7.99.67
 1.58 27-Jan-2017  hannken Handle v_writecount from layer_open(), layer_close() and layer_revoke()
so lower file system vnodes get marked as open for writing.
 1.57 09-Nov-2014  maxv branches: 1.57.2; 1.57.4; 1.57.6;
Do not uselessly include <sys/malloc.h>.
 1.56 27-Feb-2014  hannken branches: 1.56.4;
The current implementation of vn_lock() is racy. Modification of
the vnode operations vector for active vnodes is unsafe because it
is not known whether deadfs or the original file system will be
called.

- Pass down LK_RETRY to the lock operation (hint for deadfs only).

- Change deadfs lock operation to return ENOENT if LK_RETRY is unset.

- Change all other lock operations to check for dead vnode once
the vnode is locked and unlock and return ENOENT in this case.

With these changes in place vnode lock operations will never succeed
after vclean() has marked the vnode as VI_XLOCK and before vclean()
has changed the operations vector.
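
The "check for dead vnode once the vnode is locked" rule can be sketched as follows. This is a single-threaded illustration only: struct sk_vnode, its boolean fields and sk_lock() are hypothetical stand-ins, and no real blocking or vnode API is modelled.

```c
#include <errno.h>
#include <stdbool.h>

struct sk_vnode {
	bool locked;	/* stand-in for the real vnode lock */
	bool dead;	/* set once the vnode has been cleaned out */
};

/*
 * Acquire the lock, then re-check whether the vnode died in the meantime.
 * If it did, drop the lock again and report ENOENT to the caller.
 */
static int
sk_lock(struct sk_vnode *vp)
{
	vp->locked = true;		/* stand-in for acquiring the lock */
	if (vp->dead) {
		vp->locked = false;	/* unlock before failing */
		return ENOENT;
	}
	return 0;
}
```

The check happens after the lock is held, so a caller that gets 0 back holds the lock on a vnode that was still live at that point.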

Addresses PR kern/37706 (Forced unmount of file systems is unsafe)

Discussed on tech-kern.

Welcome to 6.99.33
 1.55 09-Feb-2014  hannken Adjust comment and change vput() to vrele(). This change got missed
when changing vnode creation operations to return unlocked result.
 1.54 07-Feb-2014  hannken Change vnode operation lookup to return the resulting vnode *vpp unlocked.
Change cache_lookup() to return an unlocked vnode.

Discussed on tech-kern@

Welcome to 6.99.31
 1.53 11-Jul-2011  hannken branches: 1.53.2; 1.53.12; 1.53.16;
Change VOP_BWRITE() to take a vnode as its first argument like all other
VOPs do. Layered file systems no longer have to modify bp->b_vp and run
into trouble when an async VOP_BWRITE() uses the wrong vnode.

- change all occurrences of VOP_BWRITE(bp) to VOP_BWRITE(bp->b_vp, bp).
- remove layer_bwrite().
- welcome to 5.99.55

Addresses PR kern/38762 panic: vwakeup: neg numoutput

No objections from tech-kern@.
 1.52 03-Apr-2011  rmind - Use offsetof() in VOPARG_OFFSETOF() instead of re-implementing it.
- Remove VDESC_NOMAP_VPP and VDESC_VPP_WILLRELE.
- Remove VRELEL_NOINACTIVE and VRELEL_ONHEAD.
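
For the first item, a minimal sketch of a field-offset macro built on the standard offsetof() rather than a hand-rolled pointer computation. The struct and the SK_ARG_OFFSETOF name are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical argument structure, for illustration only. */
struct sk_args {
	int a_flags;
	void *a_vp;
};

/* Delegate to the standard macro instead of re-implementing it. */
#define SK_ARG_OFFSETOF(type, field)	offsetof(type, field)
```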
 1.51 10-Jan-2011  hannken branches: 1.51.2;
Add layer_revoke() that adjusts the lower vnode use count to be at least as
high as the upper vnode count before passing down the VOP_REVOKE().

This way the vclean() check for active (vp->v_usecount > 1) vnodes gets it right.

Should fix PR kern/43456.
 1.50 02-Jul-2010  hannken LK_INTERLOCK is no longer a valid flag for VOP_LOCK(). This makes
layer_*lock*() obsolete. Remove them and handle lock operations
with the generic bypass function.

Ride 5.99.34.
 1.49 06-Jun-2010  hannken Change layered file systems to always pass the locking VOP's down to the
leaf file system. Remove now unused member v_vnlock from struct vnode.
Welcome to 5.99.30

Discussed on tech-kern.
 1.48 08-Jan-2010  pooka branches: 1.48.2; 1.48.4;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.47 14-Mar-2009  dsl ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.46 14-Feb-2009  plunky add a comment re the vop (?) flag LAYERFS_MBYPASSDEBUG, that if set
could cause a bad pointer dereference in the debug printing when
credentials with values of NOCRED or FSCRED were passed to kauth.

I don't see any way to set such a flag, I think it's just a debug
thing that could be enabled at compile time by somebody who knew
how, hence the comment rather than a real fix.
 1.45 14-Feb-2009  plunky consistency checks made inside #ifdef SAFETY should really
be #ifdef DIAGNOSTIC
 1.44 13-Feb-2009  plunky While we remap credentials we should ignore cred == FSCRED as well as
cred == NOCRED.

This fixes a page fault occurring when a union is mounted over a umap,
as FSCRED is passed by union filesystem.
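
The guard described above can be sketched like this. The sentinel values, struct sk_cred and the sk_* function names are hypothetical stand-ins for NOCRED, FSCRED and the real credential type. The point is only that sentinel pointers must be filtered out before the credential is dereferenced.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_cred {
	int uid;
};

/* Hypothetical sentinel pointers standing in for NOCRED and FSCRED. */
#define SK_NOCRED	((struct sk_cred *)(intptr_t)-1)
#define SK_FSCRED	((struct sk_cred *)(intptr_t)-2)

/* True if the pointer is a marker value rather than a real credential. */
static bool
sk_cred_is_sentinel(const struct sk_cred *cred)
{
	return cred == SK_NOCRED || cred == SK_FSCRED;
}

/* Remap the uid of a real credential; leave sentinel values untouched. */
static bool
sk_remap_cred(struct sk_cred *cred, int new_uid)
{
	if (sk_cred_is_sentinel(cred))
		return false;	/* dereferencing a sentinel would fault */
	cred->uid = new_uid;
	return true;
}
```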
 1.43 09-Dec-2006  chs branches: 1.43.44; 1.43.54; 1.43.56; 1.43.60;
a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.42 25-Oct-2006  elad branches: 1.42.2;
kauth_cred_geteuid() is okay for the purposes of these checks. Revert
conversion to kauth_authorize_generic() done some time ago.
 1.41 13-Sep-2006  elad branches: 1.41.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
 1.40 08-Sep-2006  elad First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.39 14-May-2006  elad branches: 1.39.8;
integrate kauth.
 1.38 12-Apr-2006  christos Coverity CID 2851: Check for NULL before freeing.
 1.37 04-Apr-2006  christos Coverity CID 1002: Yes, this could really be NULL, so check against it.
 1.36 04-Apr-2006  christos Coverity CID 2413: NULL deref cannot happen, but nevertheless protect against
it.
 1.35 11-Dec-2005  christos branches: 1.35.4; 1.35.6; 1.35.8; 1.35.10; 1.35.12;
merge ktrace-lwp.
 1.34 30-Aug-2005  xtraeme Remove __P()
 1.33 26-Feb-2005  perry branches: 1.33.4;
nuke trailing whitespace
 1.32 30-Jun-2004  hannken branches: 1.32.4; 1.32.6;
Do LAYERFS_REMOVED for vop_rmdir.

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.31 16-Jun-2004  wrstuden Make sure we actually locked the parent vnode before we clear
PDIRUNLOCK. The whole reason we have the flag is to note (rare)
cases where we are supposed to have the parent directory locked
but don't. Permits error handling code to know what to do with
the parent vnode (vrele() vs vput()).
 1.30 16-Jun-2004  yamt - eliminate gratuitous differences between umap_bypass() and layer_bypass().
- fix a typo in a comment.
no functional changes are intended.
 1.29 16-Jun-2004  yamt missing error recover from layer_node_create failure.
 1.28 11-Jun-2004  yamt umap_lookup/layer_lookup: NULL out *ap->a_vpp after calling
underlying filesystem because some caller including lookup()
assume that *vpp is NULL on error.
 1.27 07-Jun-2004  yamt do a LAYERFS_REMOVED hack for vop_rename as well.
 1.26 28-May-2004  wrstuden Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).

We still need VOP_UPCALL(), but this is better than things were before.

This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.25 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.24 25-Jan-2004  hannken branches: 1.24.2;
Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.22 04-Jan-2002  chs branches: 1.22.16;
add the entry for layer_getpages() to the VOP tables of the
layered file systems that need it.
 1.21 06-Dec-2001  chs add VOP_GETPAGES and VOP_PUTPAGES methods for layered filesystems.
drop the interlock on the upper layer, acquire the interlock on the
lower layer.
 1.20 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.19 10-Nov-2001  lukem add RCSIDs
 1.18 22-Jan-2001  jdolecek branches: 1.18.2; 1.18.4; 1.18.8;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.17 18-Jan-2001  jdolecek constify
 1.16 16-Aug-1999  wrstuden branches: 1.16.2;
Fix uninitialized variable use noted by Simon Burge.
 1.15 08-Jul-1999  wrstuden Introduce layer library in genfs. This set of files abstracts most of
the functionality of nullfs. The latter is now just a mount & unmount
routine, and a few tables. umapfs borrows most of this infrastructure.

Both fs's are now nfs-exportable.

All layered fs's share a common format to private mount & private
vnode structs (which a particular fs can extend).

Also add genfs_noerr_rele(), a vnode op which will vrele/vput
operand vnodes appropriately.
 1.14 17-May-1999  wrstuden Remove explicit references to null_bypass (used in umap_lock() and
umap_unlock()) so as to not explicitly depend on nullfs being compiled
into the kernel.

umap_bypass won't be too slow as there are no credentials in these two ops
to need mapping.
 1.13 25-Mar-1999  bouyer branches: 1.13.2; 1.13.4; 1.13.6;
We must handle MNT_NODEV at open time, so add an open op for null and union,
and do proper checks in union_open(). Fix to nullfs from OpenBSD, extended
to umap and union by me.
 1.12 22-Mar-1999  sommerfe vinvalbuf, called from vclean, could cause a locking-against-self
deadlock in VOP_FSYNC() if the unreferenced vnode picked for
reclamation happened to be stacked on top of a vnode the process
already had locked. This could happen if the same filesystem was
accessed both through a union mount and directly; it seemed to happen
most frequently when the direct access was through NFS.

Avoid this deadlock by changing vinvalbuf to pass a new FSYNC_RECLAIM
flag bit to VOP_FSYNC() to indicate that a reclaim is in progress and
only a `shallow' fsync is necessary.

Do nothing in *_fsync() in umapfs, nullfs, and unionfs when
FSYNC_RECLAIM is set; the underlying vnodes will shortly be released
in *_reclaim and may be reclaimed (and fsync'ed) later.
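
The "shallow fsync" idea in 1.12 can be sketched in user space as follows.
This is an illustration under stated assumptions, not the kernel code: the
flag values, the two vnode structs and both functions are hypothetical names
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the real ones are defined by the kernel. */
#define FSYNC_WAIT_SKETCH	0x01
#define FSYNC_RECLAIM_SKETCH	0x02

/* Hypothetical lower (underlying) vnode that counts fsync calls. */
struct lower_vnode {
	int fsync_calls;
};

/* Hypothetical stacked vnode that refers to a lower vnode. */
struct layer_vnode {
	struct lower_vnode *lowervp;
};

static int
lower_fsync(struct lower_vnode *lvp, int flags)
{
	(void)flags;
	lvp->fsync_calls++;
	return 0;
}

/*
 * Stacked fsync: during reclaim, do nothing and do not descend into
 * the lower vnode, which the caller may already hold locked.
 */
static int
layer_fsync_sketch(struct layer_vnode *vp, int flags)
{
	assert(vp != NULL && vp->lowervp != NULL);
	if (flags & FSYNC_RECLAIM_SKETCH)
		return 0;	/* shallow: lower vnode is synced later */
	return lower_fsync(vp->lowervp, flags);
}
```

Returning early on the reclaim flag is what avoids touching the lower vnode
and hence the locking-against-self case described in the message.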
 1.11 19-Mar-1999  perseant Apply patch from kern/5538:

Fix group mapping so members of group 0 get other group-ids mapped as well.
Avoid rename panic by checking (*this_vp_p) against NULLVP before
dereferencing it (same change as to NULLFS some time ago).
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 06-Oct-1997  thorpej Make the vfs ops and vnodeop_opv symbols match the name of the
file-system option used to configure the file system into the kernel.
 1.8 13-Oct-1996  christos branches: 1.8.10;
backout previous kprintf changes
 1.7 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.6 23-May-1996  cgd print pointers with %p, rather than by printing with %x and casting to
(unsigned int).
 1.5 09-Feb-1996  christos branches: 1.5.4;
miscfs prototype changes
 1.4 15-Apr-1995  cgd clean up some return-type warnings
 1.3 19-Aug-1994  mycroft Convert hash tables.
 1.2 29-Jun-1994  cgd branches: 1.2.2;
New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.2.1 19-Aug-1994  mycroft update from trunk
 1.5.4.1 25-May-1996  jtc pulled up to the release branch by cgd's request
 1.8.10.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.13.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.13.4.2 02-Aug-1999  thorpej Update from trunk.
 1.13.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.13.2.1 22-Jun-1999  perry pullup 1.13->1.14 (wrstuden)
 1.16.2.1 11-Feb-2001  bouyer Sync with HEAD.
 1.18.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.18.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.18.2.3 11-Jan-2002  nathanw More catchup.
 1.18.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.22.16.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.16.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.22.16.4 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.22.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.16.1 03-Aug-2004  skrll Sync with HEAD
 1.24.2.4 02-Jul-2004  he Pull up revision 1.32 (requested by hannken in ticket #575):
Do LAYERFS_REMOVED for vop_rmdir.
 1.24.2.3 21-Jun-2004  tron Pull up revision 1.28 (requested by yamt in ticket #514):
umap_lookup/layer_lookup: NULL out *ap->a_vpp after calling
underlying filesystem because some callers, including lookup(),
assume that *vpp is NULL on error.
 1.24.2.2 21-Jun-2004  tron Pull up revision 1.27 (requested by yamt in ticket #512):
do a LAYERFS_REMOVED hack for vop_rename as well.
 1.24.2.1 30-May-2004  tron Pull up revision 1.26 (requested by wrstuden in ticket #424):
Since VOP_UPCALL() has been a long time in coming, add this partial
fix for layered-file-removal. It will work for the case of accessing
and deleting a file through the layered file system. Accessing via
the layer and deleting on the underlying still won't work, nor will
accessing via complicated structures (like two umap layers over a
given file system).
We still need VOP_UPCALL(), but this is better than things were before.
This patch has been discussed off & on for a while. This incarnation
was tested by hannken at netbsd dot org.
 1.32.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.32.4.1 29-Apr-2005  kent sync with -current
 1.33.4.2 30-Dec-2006  yamt sync with head.
 1.33.4.1 21-Jun-2006  yamt sync with head.
 1.35.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.35.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.35.10.2 19-Apr-2006  elad sync with head.
 1.35.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.35.8.3 14-Sep-2006  yamt sync with head.
 1.35.8.2 24-May-2006  yamt sync with head.
 1.35.8.1 11-Apr-2006  yamt sync with head
 1.35.6.2 01-Jun-2006  kardel Sync with head.
 1.35.6.1 22-Apr-2006  simonb Sync with head.
 1.35.4.1 09-Sep-2006  rpaulo sync with head
 1.39.8.2 12-Jan-2007  ad Sync with head.
 1.39.8.1 18-Nov-2006  ad Sync with head.
 1.41.2.1 10-Dec-2006  yamt sync with head.
 1.42.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.43.60.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.43.56.1 23-Feb-2009  snj Pull up following revision(s) (requested by plunky in ticket #461):
sys/miscfs/umapfs/umap_subr.c: revision 1.26
sys/miscfs/umapfs/umap_vnops.c: revision 1.44
While we remap credentials we should ignore cred == FSCRED as well as
cred == NOCRED.
This fixes a page fault occurring when a union is mounted over a umap,
as FSCRED is passed by union filesystem.
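
A rough sketch of the check described above, assuming the special credentials
are sentinel pointer values that must never be dereferenced. The struct cred
stand-in, the *_SKETCH sentinels and both functions are hypothetical; the
actual umapfs code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel credential. */
struct cred {
	unsigned uid;
};

/* Sentinels standing in for the kernel's NOCRED and FSCRED. */
#define NOCRED_SKETCH	((struct cred *)(intptr_t)-1)
#define FSCRED_SKETCH	((struct cred *)(intptr_t)-2)

/* Only real credentials may be remapped; sentinels are skipped. */
static bool
cred_needs_mapping(const struct cred *cred)
{
	return cred != NOCRED_SKETCH && cred != FSCRED_SKETCH;
}

/*
 * Remap the uid of a real credential. A sentinel is left alone,
 * since dereferencing it would fault.
 */
static void
map_cred_sketch(struct cred *cred, unsigned mapped_uid)
{
	if (!cred_needs_mapping(cred))
		return;
	cred->uid = mapped_uid;
}
```

Checking both sentinels before the first dereference corresponds to the page
fault fix the message describes for a union mounted over a umap.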
 1.43.54.2 28-Apr-2009  skrll Sync with HEAD.
 1.43.54.1 03-Mar-2009  skrll Sync with HEAD.
 1.43.44.3 11-Aug-2010  yamt sync with head.
 1.43.44.2 11-Mar-2010  yamt sync with head
 1.43.44.1 04-May-2009  yamt sync with head.
 1.48.4.3 21-Apr-2011  rmind sync with head
 1.48.4.2 05-Mar-2011  rmind sync with head
 1.48.4.1 03-Jul-2010  rmind sync with head
 1.48.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.51.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.53.16.1 18-May-2014  rmind sync with head
 1.53.12.2 03-Dec-2017  jdolecek update from HEAD
 1.53.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.53.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.57.6.1 21-Apr-2017  bouyer Sync with HEAD
 1.57.4.2 26-Apr-2017  pgoyette Sync with HEAD
 1.57.4.1 20-Mar-2017  pgoyette Sync with HEAD
 1.57.2.2 28-Aug-2017  skrll Sync with HEAD
 1.57.2.1 05-Feb-2017  skrll Sync with HEAD
 1.59.6.1 04-Jun-2017  bouyer pullup the following revisions, requested by hannken in ticket #2:
src/share/man/man9/fstrans.9 1.25
src/sys/kern/vfs_mount.c 1.66
src/sys/kern/vfs_subr.c 1.468
src/sys/kern/vfs_trans.c 1.46
src/sys/kern/vfs_vnode.c 1.94, 1.95, 1.96
src/sys/kern/vnode_if.c 1.105, 1.106
src/sys/kern/vnode_if.sh 1.65, 1.66
src/sys/kern/vnode_if.src 1.76
src/sys/miscfs/genfs/genfs_io.c 1.69
src/sys/miscfs/genfs/genfs_vnops.c 1.196, 1.197
src/sys/miscfs/genfs/layer_extern.h 1.40
src/sys/miscfs/genfs/layer_vfsops.c 1.51
src/sys/miscfs/genfs/layer_vnops.c 1.67
src/sys/miscfs/nullfs/null_vnops.c 1.42
src/sys/miscfs/overlay/overlay_vnops.c 1.24
src/sys/miscfs/umapfs/umap_vnops.c 1.60
src/sys/rump/include/rump/rumpvnode_if.h 1.29, 1.30
src/sys/rump/librump/rumpkern/emul.c 1.182
src/sys/rump/librump/rumpvfs/rumpvnode_if.c 1.29, 1.30
src/sys/sys/fstrans.h 1.11
src/sys/sys/vnode.h 1.278
src/sys/sys/vnode_if.h 1.100, 1.101
src/sys/sys/vnode_impl.h 1.14, 1.15
src/sys/ufs/lfs/lfs_pages.c 1.12

Vnode state, lock and fstrans cleanup:
- Rename vnode state "VS_ACTIVE" to "VS_LOADED" and add synthetic
state "VS_ACTIVE" to assert a loaded vnode with usecount > 0.

- Redo FSTRANS in vnode_if.c and use it for VOP_LOCK and VOP_UNLOCK.

- Cleanup the genfs lock operations.

- Make "struct vnode_impl" member "vi_lock" a krwlock_t again.

- Remove the lock type argument from fstrans_start and
fstrans_start_nowait,
remove now unused FSTRANS state "FSTRANS_SUSPENDING".
