History log of /src/sys/kern/vfs_syscalls.c
Revision  Date  Author  Comments
 1.571  16-Jul-2025  kre Kernel part of O_CLOFORK implementation (plus kernel revbump)

This is Ricardo Branco's implementation of O_CLOFORK (and
associated fcntl, etc) for NetBSD (with a few minor changes
by me).

For now, the header file symbols that should be exposed to
userland are hidden inside temporary #ifdef _KERNEL blocks,
just to avoid random userland apps, or config scripts, from
seeing any of this before it is better tested.

Userland parts of this will follow soon.

This also bumps the kernel version to 10.99.15 (changes to
data structs, and the signature of fd_dup()).
 1.570  07-Dec-2024  riastradh vfs(9): Fix some more whitespace issues.

No functional change intended.
 1.569  07-Dec-2024  riastradh vfs(9): Sprinkle KNF.

No functional change intended.
 1.568  11-Aug-2024  bad tweak restoration of asyncflag

Simply update mp->mnt_flag with asyncflag as it contains the correct value.
Use the same pattern as in the other two places (vfs_syscalls.c, ffs_wapbl.c).

NFC.
 1.567  11-Aug-2024  bad explain why MNT_ASYNC is temporarily cleared

related to PR kern/58564.
 1.566  04-Jul-2024  christos use the proper kernel pointer
 1.565  04-Jul-2024  mrg don't fd_putfile() if you haven't grabbed a ref already.

the condition to call fd_getvnode() was changed, but the condition
to call fd_putfile() afterwards was not changed, leading to a panic
seen by Chavdar on current-users, probably.

builds, runs, seems obvious.
 1.564  01-Jul-2024  christos refactor slightly so we don't try to read the buffer supplied by userland.
 1.563  01-Jul-2024  christos remove the part of previous that crashes for now.
 1.562  29-Jun-2024  christos Ignore the file descriptor argument for absolute pathnames, per posix eg:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html
 1.561  09-Sep-2023  ad do_sys_accessat(): copy credentials only when needed.
 1.560  10-Jul-2023  christos Add memfd_create(2) from GSoC 2023 by Theodore Preduta
 1.559  29-Apr-2023  riastradh kern/vfs_syscalls.c: Nix trailing whitespace.

No functional change intended.
 1.558  09-Apr-2023  riastradh kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.557  05-Mar-2023  riastradh open(2): Don't map ERESTART to EINTR.

If a file or device's open function returns ERESTART, respect that --
restart the syscall; don't pretend a signal has been delivered when
it was not. If an SA_RESTART signal was delivered, POSIX does not
allow it to fail with EINTR:

SA_RESTART
This flag affects the behavior of interruptible functions;
that is, those specified to fail with errno set to [EINTR].
If set, and a function specified as interruptible is
interrupted by this signal, the function shall restart and
shall not fail with [EINTR] unless otherwise specified. If
an interruptible function which uses a timeout is restarted,
the duration of the timeout following the restart is set to
an unspecified value that does not exceed the original
timeout value. If the flag is not set, interruptible
functions interrupted by this signal shall fail with errno
set to [EINTR].

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

Nothing in the POSIX definition of open specifies otherwise.

In 1990, Kirk McKusick added these lines with a mysterious commit
message:

Author: Kirk McKusick <mckusick>
Date: Tue Apr 10 19:36:33 1990 -0800

eliminate longjmp from the kernel (for karels)

diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 7bc7b39bbf..d572d3a32d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -14,7 +14,7 @@
* IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
*
- * @(#)vfs_syscalls.c 7.42 (Berkeley) 3/26/90
+ * @(#)vfs_syscalls.c 7.43 (Berkeley) 4/10/90
*/

#include "param.h"
@@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
crfree(fp->f_cred);
fp->f_count--;
- if (error == -1) /* XXX from fdopen */
- return (0); /* XXX from fdopen */
+ if (error == EJUSTRETURN) /* XXX from fdopen */
+ return (0); /* XXX from fdopen */
+ if (error == ERESTART)
+ error = EINTR;
scp->sc_ofile[indx] = NULL;
return (error);
}

(found via this git import of the CSRG history:
https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe)

This change appears to have served two related purposes:

1. The fdopen function (the erstwhile open routine for /dev/fd/N)
used to return -1 as a hack to mean it had just duplicated the fd;
it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
return EJUSTRETURN, now defined to be -2, presumably to avoid a
conflict with ERESTART, defined to be -1. So this change finished
part of the change by Mike Karels to use a different magic return
code from fdopen.

Of course, today we use still another disgusting hack, EDUPFD, for
the same purpose, so none of this is relevant any more.

2. Prior to April 1990, the kernel handled signals during tsleep(9)
by longjmping out to the system call entry point or similar. In
April 1990, Mike Karels worked to convert all of that into
explicit unwind logic by passing through EINTR or ERESTART as
appropriate, instead of setjmp at each entry point.

However, it's not clear to me why this setjmp/longjmp and
fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
ERESTART to EINTR in open(2). I suspect it was a mistake.

In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
was removed from FreeBSD:

r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines

Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks

Index: vfs_syscalls.c
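
For reference, a minimal userland sketch of the SA_RESTART setup the
POSIX text above refers to. The helper names (install_restarting_handler,
handler_restarts) are hypothetical and not part of this commit; the sketch
only shows how a handler is installed so that interruptible calls such as
open(2) are restarted rather than failing with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

static void
on_signal(int signo)
{
	(void)signo;
	got_signal = 1;
}

/* Install on_signal() for signo with SA_RESTART; returns 0 or -1. */
int
install_restarting_handler(int signo)
{
	struct sigaction sa;

	assert(signo > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;	/* restart interrupted calls */
	return sigaction(signo, &sa, NULL);
}

/* Returns 1 if the current disposition has SA_RESTART, 0 if not, -1 on error. */
int
handler_restarts(int signo)
{
	struct sigaction cur;

	if (sigaction(signo, NULL, &cur) == -1)
		return -1;
	return (cur.sa_flags & SA_RESTART) != 0;
}
```

With such a handler installed, an open(2) that the kernel reports as
ERESTART is expected to be retried transparently after this change.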
 1.556  02-Nov-2022  andvar branches: 1.556.2;
fix various typos in comments and messages.
 1.555  12-Feb-2022  thorpej Add inline functions to manipulate the klists that link up knotes
via kn_selnext:

- klist_init()
- klist_fini()
- klist_insert()
- klist_remove()

These provide some API insulation from the implementation details of these
lists (but not completely; see vn_knote_attach() and vn_knote_detach()).
Currently just a wrapper around SLIST(9).

This will make it significantly easier to switch kn_selnext linkage
to a different kind of list.
 1.554  07-Nov-2021  christos Merge the kernel portion of the posix-spawn-chdir project by Piyush Sachdeva.
 1.553  26-Sep-2021  thorpej Fix the locking around EVFILT_FS. Previously, the code would walk the
fs_klist and take the kqueue_misc_lock inside the event callback.
However, that list can be modified by the attach and detach callbacks,
which could result in the walker stepping right off a cliff.

Instead, we give the fs_klist its own lock, and hold it while we
call knote(), using the NOTE_SUBMIT protocol. Also, move fs_filtops
into vfs_syscalls.c so all of the locking logic is contained in one
file (there is precedent with sig_filtops). fs_filtops is now marked
MPSAFE.
 1.552  11-Sep-2021  riastradh sys/kern: Allow custom fileops to specify fo_seek method.

Previously only vnodes allowed lseek/pread[v]/pwrite[v], which meant
converting a regular device to a cloning device doesn't always work.

Semantics is:

(*fp->f_ops->fo_seek)(fp, delta, whence, newoffp, flags)

1. Compute a new offset according to whence + delta -- that is, if
whence is SEEK_CUR, add delta to fp->f_offset; if whence is
SEEK_END, add delta to end of file; if whence is SEEK_SET, use delta
as is.

2. If newoffp is nonnull, return the new offset in *newoffp.

3. If flags & FOF_UPDATE_OFFSET, set fp->f_offset to the new offset.

Access to fp->f_offset, and *newoffp if newoffp = &fp->f_offset, must
happen under the object lock (e.g., vnode lock), in order to
synchronize fp->f_offset reads and writes.

This change has the side effect that every call to VOP_SEEK happens
under the vnode lock now, when previously it didn't. However, from a
review of all the VOP_SEEK implementations, it does not appear that
any file system even examines the vnode, let alone locks it. So I
think this is safe -- and essentially the only reasonable way to do
things, given that it is used to validate a change from oldoff to
newoff, and oldoff becomes stale the moment we unlock the vnode.

No kernel bump because this reuses a spare entry in struct fileops,
and it is safe for the entry to be null, so all existing fileops will
continue to work as before (rejecting seek).
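
The three-step semantics above can be sketched in userland C as follows.
This is an illustration only, not the kernel code: struct sk_file, sk_seek
and SK_UPDATE_OFFSET are hypothetical stand-ins for struct file, fo_seek
and FOF_UPDATE_OFFSET, the overflow and negative-offset checks are
assumptions, and the object lock the commit message requires is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

#define SK_UPDATE_OFFSET 0x1	/* hypothetical stand-in for FOF_UPDATE_OFFSET */

struct sk_file {
	int64_t offset;		/* stand-in for fp->f_offset */
	int64_t size;		/* current end of file */
};

/* Returns 0 on success or an errno value; never touches *newoffp on error. */
int
sk_seek(struct sk_file *fp, int64_t delta, int whence, int64_t *newoffp,
    int flags)
{
	int64_t base, newoff;

	assert(fp != NULL);

	/* Step 1: compute the new offset from whence + delta. */
	switch (whence) {
	case SEEK_SET: base = 0; break;
	case SEEK_CUR: base = fp->offset; break;
	case SEEK_END: base = fp->size; break;
	default: return EINVAL;
	}
	if (delta > 0 && base > INT64_MAX - delta)
		return EOVERFLOW;	/* assumed policy: reject overflow */
	newoff = base + delta;
	if (newoff < 0)
		return EINVAL;		/* assumed policy: no negative offsets */

	/* Step 2: report the new offset if the caller asked for it. */
	if (newoffp != NULL)
		*newoffp = newoff;

	/* Step 3: update the file offset only when requested. */
	if (flags & SK_UPDATE_OFFSET)
		fp->offset = newoff;
	return 0;
}
```

Passing flags without SK_UPDATE_OFFSET gives a pread-style computation
that leaves the stored offset alone; lseek-style callers pass the flag.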
 1.551  03-Jul-2021  mlelstv Return error from fd_dupopen.
 1.550  29-Jun-2021  dholland Add containment for the cloning devices hack in vn_open.

Cloning devices (and also things like /dev/stderr) work by allocating
a struct file, stuffing it in the file table (which is a layer
violation), stuffing the file descriptor number for it in a magic
field of struct lwp (which is gross), and then "failing" with one of
two magic errnos, EDUPFD or EMOVEFD.

Before this commit, all callers of vn_open in the kernel (there are
quite a few) were expected to check for these errors and handle the
situation. Needless to say, none of them except for open() itself did,
resulting in internal negative errnos being returned to userspace.

This hack is fairly deeply rooted and cannot be eliminated all at
once. This commit adds logic to handle the magic errnos inside
vn_open; now on success vn_open returns either a vnode or an integer
file descriptor, along with a flag that says whether the underlying
code requested EDUPFD or EMOVEFD. Callers not prepared to cope with
file descriptors can pass NULL for the extra return values, in which
case if a file descriptor would be produced vn_open fails with
EOPNOTSUPP.

Since I'm rearranging vn_open's signature anyway, stop exposing struct
nameidata. Instead, take three arguments: an optional vnode to use as
the starting point (like openat()), the path, and additional namei
flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei
behavior, e.g. NOFOLLOW, can be requested via the open flags.)

This change requires a kernel bump. Ride the one an hour ago.
(That was supposed to be coordinated; did not intend to let an hour
slip by. My fault.)
 1.549  17-Feb-2021  dholland branches: 1.549.4;
Don't allow callers of fsync_range() to trigger UB in the kernel.

(also prohibit syncing ranges at start offsets less than zero)
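
A sketch of the kind of range validation this describes, assuming the UB
in question is signed overflow when computing start + length. The helper
check_sync_range is hypothetical, and rejecting a negative length is an
assumption not stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Validate [start, start + length) and store the end offset in *endp. */
int
check_sync_range(int64_t start, int64_t length, int64_t *endp)
{
	assert(endp != NULL);

	if (start < 0 || length < 0)
		return EINVAL;
	/* Check before adding, so the addition itself can never overflow. */
	if (length > INT64_MAX - start)
		return EINVAL;
	*endp = start + length;
	return 0;
}
```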
 1.548  16-May-2020  christos branches: 1.548.2;
Add ACL support for FFS. From FreeBSD.
 1.547  21-Apr-2020  ad Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
 1.546  20-Apr-2020  ad Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.
 1.545  04-Apr-2020  ad branches: 1.545.2;
Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.544  25-Mar-2020  gdt Relax fdatasync restriction that fd be writable

The restriction that a fd passed to fdatasync(2) must be writable was
added in 2003 in order to comply with POSIX. Since then, POSIX has
removed that requirement, and POSIX-valid programs have been therefore
encountering errors on NetBSD.

Patch by Paul Ripke after discussion on netbsd-users. Issue
discovered with pkgsrc/databases/mongodb3 as used by pkgsrc/net/unifi.
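
A small userland check of the relaxed behavior, as a sketch: it opens a
scratch file read-only and calls fdatasync(2) on it. The function name and
the /tmp path template are illustrative assumptions; with this change the
call is expected to succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 0 if fdatasync() succeeds on a read-only descriptor, else errno. */
int
fdatasync_on_readonly_fd(void)
{
	char path[] = "/tmp/fdatasync-demo-XXXXXX";
	int fd, error = 0;

	fd = mkstemp(path);		/* create the scratch file */
	if (fd == -1)
		return errno;
	close(fd);

	fd = open(path, O_RDONLY);	/* reopen without write access */
	if (fd == -1) {
		error = errno;
		unlink(path);
		return error;
	}
	assert((fcntl(fd, F_GETFL) & O_ACCMODE) == O_RDONLY);

	if (fdatasync(fd) == -1)
		error = errno;
	close(fd);
	unlink(path);
	return error;
}
```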
 1.543  03-Mar-2020  christos don't skip the rdir check for the lazy case; breaks chroot df(1) hiding.
 1.542  23-Feb-2020  ad Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
 1.541  22-Feb-2020  maxv Inline the block in the parent block, for clarity, and also to prevent a
false positive with kMSan.

Here, LLVM reorders the conditions and checks 'vattr' before 'error'. But
if 'error' is non-zero then 'vattr' is not initialized, and kMSan notices
the uninitialized memory read.
 1.540  17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.539  31-Dec-2019  ad branches: 1.539.2;
sys_fchdir: use LK_SHARED.
 1.538  22-Dec-2019  ad Make mntvnode_lock per-mount, and address false sharing of struct mount.
 1.537  26-Sep-2019  christos make nmountcompatnames unsigned (assigned from __arraycount, compared with
unsigned in compat code)
 1.536  22-Sep-2019  christos Add a new member to struct vfsstat and grow the unused members
The new member is called f_mntfromlabel and it is the dkw_wname
of the corresponding wedge. This is now used by df -W to display
the mountpoint name as NAME=
 1.535  20-Sep-2019  kamil Validate usec ranges in do_sys_utimes()

sys/kern/vfs_syscalls.c:3939:4, signed integer overflow: 503923632 * 1000 cannot be represented in type 'int'

Reported-by: syzbot+4cfc86ffd30e8678f68d@syzkaller.appspotmail.com
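
The overflow reported above comes from multiplying an unchecked tv_usec
by 1000. A minimal sketch of the validation, assuming the usual rule that
tv_usec must lie in [0, 1000000); timeval_to_timespec_checked is a
hypothetical helper, not the function touched by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Convert *tv to *ts, rejecting out-of-range microseconds with EINVAL. */
int
timeval_to_timespec_checked(const struct timeval *tv, struct timespec *ts)
{
	assert(tv != NULL && ts != NULL);

	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return EINVAL;
	ts->tv_sec = tv->tv_sec;
	/* Safe: tv_usec < 1000000, so the product is below 10^9. */
	ts->tv_nsec = (long)tv->tv_usec * 1000;
	return 0;
}
```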
 1.534  15-Sep-2019  christos Prevent O_EXEC for mq_open(2), and O_EXEC with a writable fd for open(2).
 1.533  06-Jul-2019  maxv branches: 1.533.2;
Fix bug: if seg == UIO_SYSSPACE, tv[] is not initialized. The branches
should depend on tptr[] instead.
 1.532  21-Jun-2019  kamil Restore ability to create regular files with mknod(2)

This behavior is requested in ATF tests.
 1.531  20-Jun-2019  kamil Add mkfifo{,at}(2) mode in mknod{,at}(2) as requested by POSIX

mknod with mode & S_IFIFO and dev=0 shall behave like mkfifo.

Update the documentation to reflect this state.

Add ATF tests.

This is an in-kernel implementation as typically user-space programs use
mkfifo(2) directly, however whenever there is need to bypass libc (like in
valgrind) then portable POSIX software calls the mknod syscall.

Noted on tech-kern@ by Greg Troxel.
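
A userland sketch of the behavior described: mknod(2) with S_IFIFO and
dev 0 should produce a FIFO, as mkfifo(2) would. The function name and
scratch directory template are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if mknod(S_IFIFO, dev 0) created a FIFO, 0 if not, -1 on error. */
int
fifo_via_mknod_demo(void)
{
	char dir[] = "/tmp/mknod-demo-XXXXXX";
	char path[64];
	struct stat st;
	int n, result = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	n = snprintf(path, sizeof(path), "%s/fifo", dir);
	assert(n > 0 && (size_t)n < sizeof(path));

	/* dev must be 0 for the mkfifo-equivalent behavior. */
	if (mknod(path, S_IFIFO | 0600, 0) == 0) {
		if (stat(path, &st) == 0)
			result = S_ISFIFO(st.st_mode) ? 1 : 0;
		unlink(path);
	}
	rmdir(dir);
	return result;
}
```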
 1.530  19-Jun-2019  kamil Correct wrong type of uio_seg passed to do_sys_mknodat()

It was introduced by an accident in previous commit to this file.

Detected by syzbot:
https://syzkaller.appspot.com/text?tag=CrashLog&x=16635d9ea00000
 1.529  18-Jun-2019  kamil Drop unused retval pointer from do_sys_mknod{,at}()

No functional change intended.
 1.528  13-May-2019  hannken do_sys_mkdir(): pass the requested segment down to do_sys_mkdirat().
 1.527  01-Mar-2019  pgoyette Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.
 1.526  20-Feb-2019  hannken Bracket do_sys_renameat() and nfsrv_rename() with fstrans.

The v_mount field for vnodes on the same file system as "from"
is now stable for referenced vnodes.

VFS_RENAMELOCK no longer may use lock from an unreferenced and
freed "struct mount".
 1.525  19-Feb-2019  mlelstv Don't allow MNT_UNION on the root, there is no covered filesystem.

Fixes PR 53850
 1.524  05-Feb-2019  kamil The panic for fopen(NULL, ... is back, fix it

Restore the original behavior before merging the compat refactoring branch.

Now:
- no compat_10 -> perform pathbuf_copyin() and report EFAULT
- compat_10 and error -> report error
- compat_10 and success -> return file descriptor for "."

PR kern/53948
 1.523  05-Feb-2019  pgoyette If the openat_10 hook is present and it returns success, continue with
the rest of the syscall; don't return prematurely, as we'll report
success (return value 0) but won't have set up the fd.
 1.522  05-Feb-2019  pgoyette Correctly handle the NULL path when no compat_10 code is available.

This should address kern/53948 (thanks, kamil@, for the PR and for
testing the fix)
 1.521  31-Jan-2019  manu Do not resolve fdat for openat(2) if path is absolute

Opengroup says "The openat() function shall be equivalent to the open() function except in the case where path specifies a relative path", but
says nothing about fdat usage when path is absolute;
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

We used to always resolve fdat, leading to an error if it was invalid (e.g. -1). That caused a portability problem with other systems that
just ignore it. See discussion in a pull request to work around that
problem with MariaDB: https://github.com/MariaDB/server/pull/838

We fix the problem by ignoring fdat when path is absolute.
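
A sketch of the case this fixes: openat(2) with an invalid descriptor
(-1) and an absolute path. The helper name is hypothetical; after this
change the call is expected to succeed, as it does on systems that ignore
the descriptor for absolute paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 if openat(-1, path, O_RDONLY) succeeds, else errno. */
int
open_absolute_with_bad_dirfd(const char *path)
{
	int fd;

	assert(path != NULL && path[0] == '/');

	/* -1 is never a valid descriptor; it must be ignored here. */
	fd = openat(-1, path, O_RDONLY);
	if (fd == -1)
		return errno;
	close(fd);
	return 0;
}
```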
 1.520  29-Jan-2019  pgoyette Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
 1.519  27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.518  09-Jan-2018  christos branches: 1.518.2; 1.518.4;
Merge autofs support from: Tomohiro Kusumi
XXX: Does not work yet
 1.517  07-Nov-2017  christos We computed the length of the string already, so use it...
 1.516  01-Jun-2017  chs branches: 1.516.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.515  07-May-2017  hannken Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().
 1.514  07-May-2017  hannken Return ENOENT if trying to suspend an unmounted file system.
 1.513  26-Apr-2017  riastradh branches: 1.513.2;
Change VOP_REMOVE and VOP_RMDIR to preserve lock/ref on dvp.

No change to vp -- the plan is to replace the node by the
componentname in the vop parameters, and let all directory vops do
lookups internally.

Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2017/04/17/msg021825.html
 1.512  17-Apr-2017  hannken Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.511  17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.510  12-Apr-2017  hannken Switch do_sys_sync() and do_sys_getvfsstat() to mountlist iterator.
 1.509  07-Mar-2017  hannken Fix a logic error introduced with Rev. 1.507: defer setting MNT_RDONLY
only if going from read-write to read-only.

Should fix PR kern/52045 (panic: ffs_sync: rofs mod, fs=/ after fsck)
 1.508  01-Mar-2017  hannken Suspend the mounted file system while updating.
 1.507  01-Mar-2017  hannken Change the protocol to update a mounted file system from read-write
to read-only and vice versa:

- Add an internal flag IMNT_WANTRDONLY.
- Set either IMNT_WANTRDWR or IMNT_WANTRDONLY if going from or to read-only.
- After a successful call to VFS_MOUNT() set or clear MNT_RDONLY.

Adapt tmpfs and rumpfs to the new protocol. Other file systems will be
updated when they get the IMNT_CAN_RWTORO property.

Welcome to 7.99.64
 1.506  17-Feb-2017  hannken Take fstrans_start before syncing a file system.
 1.505  31-Jul-2016  dholland branches: 1.505.2;
typo in comment
 1.504  28-Nov-2015  dholland branches: 1.504.2;
Fix kern/50841: races in sys_lseek.
 1.503  28-Oct-2015  martin Fix inverted KASSERT
 1.502  25-Oct-2015  martin Appease bogus gcc warning.
 1.501  23-Oct-2015  maxv Change do_sys_mount() so that it only takes as argument the type of the
drive instead of its associated vfsops. Makes it more friendly, and allows
compat binaries to autoload VFS modules if needed.

sent on tech-kern@, ok christos@
 1.500  24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.499  12-Jun-2015  dholland Use NOFOLLOW instead of <empty>. Purely cosmetic as NOFOLLOW is 0, but
it's supposed to be there for clarity.
 1.498  06-May-2015  hannken Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@
 1.497  21-Apr-2015  riastradh Cull unused INRENAME and INRELOOKUP from callers.
 1.496  20-Apr-2015  riastradh Make VOP_LINK return directory still locked and referenced.

Ride 7.99.10 bump.
 1.495  09-Apr-2015  riastradh But rename(..., "x/..") is still supposed to yield EINVAL. Go figure.
 1.494  09-Apr-2015  riastradh Tests claim rename(..., "x/.") yields EISDIR, so do that. Fixes zfs.
 1.493  15-Feb-2015  martin A syscall like posix_fallocate() that is not supposed to set errno in
userland needs to always return 0 and store the error code in *retval.
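
Seen from userland, this is the convention that posix_fallocate() reports
failure through its return value rather than through errno. A minimal
sketch, with a hypothetical helper name, using an invalid descriptor to
provoke an error:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Returns whatever posix_fallocate() returns for an invalid descriptor. */
int
fallocate_on_bad_fd(void)
{
	int error;

	/* The error number is the return value; -1/errno is not used. */
	error = posix_fallocate(-1, 0, 1);
	assert(error > 0);
	return error;
}
```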
 1.492  26-Nov-2014  manu branches: 1.492.2;
Do not follow symlinks in sys_unmount()

There are situations where the underlying filesystem is unreachable
(e.g: NFS) causing symlink resolution to hang. Such a situation
should be avoided by using umount -f -R (force and raw), but while -R
causes the symlink resolution to be skipped in umount(8), the kernel was
still doing it in sys_unmount(). This change fixes that.

When the -R flag is not given, umount(8) does symlink resolution through
realpath(3) before calling unmount(2), hence not doing it in the kernel
would not change behavior.
 1.491  05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.490  25-Jul-2014  maxv branches: 1.490.2;
'result' -> 'error'
 1.489  25-Jul-2014  dholland typo in comment
 1.488  25-Jul-2014  dholland Add fdiscard and posix_fallocate syscalls.
 1.487  30-Jun-2014  maxv This is weird; 'abort' already does all this, so simply use goto abort.
 1.486  28-Jun-2014  dholland Revert the following changes:

src/sys/sys/quotactl.h 1.37
src/sys/compat/netbsd32/netbsd32.h 1.101
src/sys/compat/netbsd32/netbsd32_netbsd.c 1.188, 1.189
src/sys/kern/vfs_quotactl.c 1.39
src/sys/kern/vfs_syscalls.c 1.483
src/sys/ufs/lfs/ulfs_quota.c 1.11
src/sys/ufs/ufs/ufs_quota.c 1.116
src/lib/libquota/quota_kernel.c 1.5

and do them correctly.

If you're going to change the name of something, you need to change
the name of *all* the things with the same name, not just a handful,
and you should change it to something similar so it still matches the
rest of the system rather than just picking an arbitrarily different
name.

Hi, Joerg.

To wit, rename the quotactl "delete" operation to "del", because
"delete" is a reserved word in C++ and for some reason Joerg wants to
run internal interfaces used only by C code through his C++ compiler.
Do not rename it to "remove" instead, because this doesn't match
libquota or the rest of the usage throughout the system; and rename
all the related identifiers, not just the ones that blew the mind of
Joerg's C++ compiler.

Because this is not a user-facing API (the only userland consumer of
sys/quotactl.h is libquota) it is sort of ok to make arbitrary
source-incompatible changes; however, by the same token it's completely
unnecessary. If it *were* a user-facing API that someone might have a
semi-rational reason to want to run a C++ compiler on, it would be
incorrect to change it at this point.
 1.485  26-Jun-2014  christos Don't initialize the fh pointer to NULL when the allocation functions fail
and allow NULL in the free functions. It just leads to writing sloppy code
for no good reason.
 1.484  14-Jun-2014  njoly Follow OpenGroup online documents for truncate[1] and ftruncate[2].
Fail with EINVAL for length argument negative values.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html
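
A userland sketch of the resulting behavior: ftruncate(2) with a negative
length should fail with EINVAL. The function name and scratch file
template are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the errno from ftruncate(fd, -1), or 0 if it unexpectedly succeeds. */
int
ftruncate_negative_length(void)
{
	char path[] = "/tmp/ftruncate-demo-XXXXXX";
	int fd, error = 0;

	fd = mkstemp(path);
	if (fd == -1)
		return errno;
	assert(fd >= 0);

	if (ftruncate(fd, (off_t)-1) == -1)
		error = errno;
	close(fd);
	unlink(path);
	return error;
}
```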
 1.483  12-Jun-2014  joerg Don't use a C++ keyword as field name.
 1.482  20-Apr-2014  maxv This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.

If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.481  18-Apr-2014  maxv Memory leak (only triggerable from root).

ok christos@
 1.480  16-Apr-2014  maxv Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
 1.479  16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.478  04-Apr-2014  maxv branches: 1.478.2;
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.

ok christos@
 1.477  22-Mar-2014  maxv Fix a potential - but very unlikely - NULL pointer dereference.
(it does not introduce a new error code for open(), since
pathbuf_copyin() is already there and can return ENOMEM)

Found by my code scanner.
 1.476  15-Feb-2014  njoly Remove argument name from prototype.
 1.475  25-Jan-2014  christos Add compat_10, open NULL == open "."
 1.474  25-Jan-2014  christos expose do_open
 1.473  23-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30
 1.472  17-Jan-2014  hannken Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29
 1.471  27-Nov-2013  christos Change the queue.3 *_END(&head) macros to NULL. Since we don't have CIRCLEQ
anymore, all the macros expand to NULL anyway, so this improves readability.
Requested by rmind@
 1.470  23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.469  18-Nov-2013  chs expose various do_*at() functions for compat_linux.
 1.468  17-Oct-2013  njoly Change mknodat(2) device argument type from uint32_t to dev_t.
Adds needed extra PAD argument for 64bit alignment, and libc wrapper.
 1.467  20-Jul-2013  njoly Remove, in do_sys_renameat(), wrong KASSERTs that check for non NULL
from/to arguments. Such values are correctly handled by later
pathbuf_maybe_copyin() calls, that will fail with EFAULT.

ok from dholland@.
 1.466  18-Jul-2013  matt Make do_sys_utimensat public
 1.465  18-Jul-2013  matt export do_sys_statat for netbsd32
 1.464  28-Jun-2013  christos branches: 1.464.2; 1.464.4;
don't store random values in retval
http://m00nbsd.net/ae123a9bae03f7dde5c6d654412daf5a.html
 1.463  13-Jan-2013  dholland Revert defective O_SEARCH implementation committed by manu@ along with
the *at system calls on November 18th of last year. Reasons to revert
it include:
- it is incorrect in a whole variety of ways (but fortunately, one
of them is that the missing and improper permission checks have
no net effect);
- it was committed without review or discussion;
- core ruled that all the new O_* flags pertaining to the *at calls
needed to wait until their semantics could be clarified.

manu was asked to revert it on these grounds but has ignored the request.

I have left O_SEARCH defined and visible and made open() explicitly
ignore it. This way, most code that tries to use it will continue to
build and run. I've also arranged lib/libc/c063/t_o_search.c so that
the tests that make use of the O_SEARCH semantics will disappear until
O_SEARCH comes back, and fixed some mistakes and/or incorrect hacks
that were causing some of these to succeed despite the broken O_SEARCH
implementation.
 1.462  30-Nov-2012  njoly Apply fix from hannken to ensure that VOP_ACCESS() is called on a
locked vnode for fd_nameiat(), fd_nameiat_simple() and do_sys_openat().
Fix both PR/47226 and PR/47255.
 1.461  19-Nov-2012  martin Use copyout to copy data from kernel out to userland!
Fixes PR kern/47217.
 1.460  18-Nov-2012  manu Add most system calls for POSIX extended API set, part 2, with test cases:
faccessat(2), fchmodat(2), fchownat(2), fstatat(2), mkdirat(2), mkfifoat(2),
mknodat(2), linkat(2), readlinkat(2), symlinkat(2), renameat(2), unlinkat(2),
utimensat(2), openat(2).

Also implement O_SEARCH for openat(2)

Still missing:
- some flags for openat(2)
- fexecve(2) implementation
 1.459  19-Oct-2012  riastradh No, we can't elide the fs-wide rename lock for same-directory rename.

rename("a/b", "a/c") and rename("a/c/x", "a/b/y") will deadlock.

Darn.
 1.458  12-Oct-2012  riastradh Disentangle do_sys_rename.

Elide the fs-wide rename lock for single-directory renames. This
required changing the order of lookups, so that we know what the
directories are before we lock the nodes.

Clean up error branches, explain why various nonsense happens and
what it does and doesn't do, and note some of what needs to change.
 1.457  27-Jun-2012  cheusov branches: 1.457.2;

Add new action KAUTH_CRED_CHROOT for kauth(9)'s credential scope.
Reviewed and approved by elad@.
 1.456  08-May-2012  gson Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.
 1.455  02-May-2012  rmind do_open: move pathbuf destruction to the callers, thus simplify and fix a
memory leak on error path.
 1.454  30-Apr-2012  manu Fix the extattr start fix. Looking up the filesystemroot vnode again
does not seem to be reliable. Instead save it before mount_domount()
sets it to NULL.
 1.453  30-Apr-2012  manu Fix mount -o extattr : previous patch fixed a panic but caused operation
to happen on the mount point instead of the mounted filesystem.
 1.452  28-Apr-2012  manu Do not use vp after mount_domount() call as it sets it to NULL on success.
This fixes a panic when starting extended attributes.
 1.451  17-Apr-2012  christos it is not an error if the kernel needs to clear the setuid/
setgid bit on write/chown/chgrp
 1.450  13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.449  12-Feb-2012  martin branches: 1.449.2;
fd_open(): fix confusion between userland and kernel encoding of open flags
 1.448  11-Feb-2012  martin Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.
 1.447  01-Feb-2012  dholland Be consistent about whether idtype and objtype codes are signed or
unsigned. They are signed. (While unsigned might have been a better
choice, it doesn't really matter and the majority of preexisting uses
were signed. And consistency is good.)
 1.446  01-Feb-2012  dholland Improve the names of some members of struct quotactl_args. These are
effectively function parameter names, but since they need to be
described with the same names in the man page the choices do matter.
Some.
 1.445  01-Feb-2012  dholland Split out a do_sys_quotactl for compat_netbsd32.
 1.444  01-Feb-2012  dholland Change the syscall API for quotas over to the new non-proplib one.

- struct vfs_quotactl_args -> struct quotactl_args
- add sys/stdint.h to sys/quotactl.h for clean userland build
- install sys/quotactl.h in /usr/include
- update set lists for same
- add new marshalling code in libquota
- add new unmarshalling code in vfs_syscalls.c
- discard proplib interpreter code in vfs_quotactl.c
- add dispatching code for the 14 quotactl ops in vfs_quotactl.c
- mark the proplib quotactl syscall obsolete
- add a new syscall number for the new quotactl syscall
- change the name of the syscall to __quotactl()
- remove the decl of the old quotactl from quota/quotaprop.h
- add a decl of the new quotactl to sys/quotactl.h
- update the libc build
- update ktruss
- remove proplib marshalling code from libquota
- update copy of syscall table in gdb ppc sources
- hack rumphijack to accommodate new quotactl name (as I recall,
pooka wanted such a name change to simplify something, but I
don't really see what/how)

This change appears to require a kernel version bump for rumpish
reasons.
 1.443  29-Jan-2012  dholland Add vfs_quotactl() in between the syscall and VFS_QUOTACTL. Call it
from the COMPAT_50 code as well as the current sys_quotactl instead
of going directly to VFS_QUOTACTL. Doesn't actually do anything yet.
 1.442  02-Dec-2011  yamt branches: 1.442.2;
fix an indent and unwrap a short line.
 1.441  18-Nov-2011  christos - collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.
 1.440  14-Oct-2011  hannken branches: 1.440.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.
 1.439  22-Aug-2011  enami Remove return statement which can't be reached.
 1.438  22-Aug-2011  enami When both nanoseconds fields of futimens/utimensat call are set
to UTIME_NOW, act as if NULL is passed to second argument, i.e.,
do same permission check and set exactly same value to both access
and modification time.
 1.437  18-Aug-2011  manu Fix utimes/futimes after utimensat/futimens addition
 1.436  17-Aug-2011  manu Missing bit in previous commit: do_sys_utimens prototype in the right
place.
 1.435  17-Aug-2011  martin add missing prototype
 1.434  17-Aug-2011  manu Add futimens(2) and part of utimensat(2)
 1.433  08-Aug-2011  manu First stage of support for Extended API set 2. Most of the thing is
unimplemented, except enough of linkat(2) to hardlink to a symlink.

Everything new in headers is guarded #ifdef _INCOMPLETE_XOPEN_C063 since
some software (e.g.: xcvs in our own tree) will assume they can use openat(2)
when AT_FDCWD is defined. _INCOMPLETE_XOPEN_C063 will go away once support
is complete.
 1.432  24-Jul-2011  martin Make sure to not overwrite error if it already is EEXIST - hopefully
will fix > 100 failing fs tests in my last test run.
 1.431  03-Jul-2011  hannken Return EINVAL when trying to create a device node with "rdev == VNOVAL".

Fixes PR #45111 "tmpfs panic with mknod(2)".
 1.430  17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.429  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.428  11-Jun-2011  uebayasi Fix build; p was not used, but l was passed to kauth. Use curlwp directly.
 1.427  10-Jun-2011  matt l isn't used. nuke it.
 1.426  10-Jun-2011  uebayasi do_sys_rename: Kill an unused variable.
 1.425  05-Jun-2011  dsl Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.
 1.424  02-Jun-2011  dsl Fix typo in comment
(before I replace the 'l' with 'curlwp')
 1.423  24-Apr-2011  rmind branches: 1.423.2;
sys_link: prevent hard links on directories (cross-mount operations are
already prevented). File systems are no longer responsible to check this.
Clean up and add asserts (note that dvp == vp cannot happen in vop_link).

OK dholland@
 1.422  10-Apr-2011  christos - Add O_CLOEXEC to open(2)
- Add fd_set_exclose() to encapsulate uses of FIO{,N}CLEX, O_CLOEXEC, F{G,S}ETFD
- Add a pipe1() function to allow passing flags to the fd's that pipe(2)
opens to ease implementation of linux pipe2(2)
- Factor out fp handling code from open(2) and fhopen(2)
 1.421  02-Apr-2011  rmind Remove unused M_MOUNT.
 1.420  02-Apr-2011  rmind Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.
 1.419  12-Mar-2011  yamt prevent cross-mount operations.
 1.418  06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.417  28-Feb-2011  dholland Revert previous, which doesn't cover all the cases if F_OK isn't 0,
and just CTASSERT that it is, as that's not remotely likely to change.
Per source-changes-d; ok by Christos.
 1.416  28-Feb-2011  christos don't depend on F_OK being 0.
 1.415  27-Feb-2011  dholland Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)

Pullup candidate.
 1.414  13-Jan-2011  pooka branches: 1.414.2; 1.414.4;
allow file system to decide if it can be downgraded from r/w to r/o
 1.413  02-Jan-2011  dholland Remove remaining references to SAVESTART.
 1.412  02-Jan-2011  dholland Remove the special refcount behavior (adding an extra reference to the
parent dir) associated with SAVESTART in relookup().

Check all call sites to make sure that SAVESTART wasn't set while
calling relookup(); if it was, adjust the refcount behavior. Remove
related references to SAVESTART.

The only code that was reaching the extra ref was msdosfs_rename,
where the refcount behavior was already fairly broken and/or gross;
repair it.

Add a dummy 4th argument to relookup to make sure code that hasn't
been inspected won't compile. (This will go away next time the
relookup semantics change, which they will.)
 1.411  02-Jan-2011  dholland Remove unused nameidata field ni_startdir.
 1.410  30-Nov-2010  dholland Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
 1.409  19-Nov-2010  dholland Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.408  21-Aug-2010  pgoyette Update the rest of the kernel to conform to the module subsystem's new
locking protocol.
 1.407  30-Jun-2010  pooka Enable kernel-internal symlink creation with do_sys_symlink().
I did this a while ago already, but can't remember why i didn't
commit it then.
 1.406  24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.405  15-Jun-2010  hannken When mounting a file system re-lookup and lock the directory we mount on
after the file system is setup by VFS_MOUNT(). This way recursive vnode
locks are no longer needed here and mounts on null mounts no longer fail
as described in PR #43439 (mount_null panic: lockdebug_wantlock: locking
against myself).

Based on a proposal from and
Reviewed by: David A. Holland <dholland@netbsd.org>
 1.404  03-Mar-2010  yamt branches: 1.404.2;
remove redundant checks of PK_MARKER.
 1.403  15-Jan-2010  pooka branches: 1.403.2;
Fix reference counting for vfsops in mount. Otherwise it's possible
(for an unprivileged user) to force vfs modules to remain loaded
forever. Also, it's possible for an admin with fat fingers to have
to curse out loud (a lot) and reboot.

.. or at least fix things as much as seems to be possible without
involving 1000 zorkmids. do_sys_mount() takes either struct vfsops
(which hopefully came properly referenced) or a userspace string
for file system type. The standard in-kernel calling convention
of "do_sys_mount(l, vfs_getopsbyname("nfs"), NULL," is not to be
considered healthy, kosher, or even tasty (although if vfs_getopsbyname()
fails the whole thing *currently* fails without the program counter
pointing to hyperspace).
 1.402  08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.401  23-Dec-2009  pooka Define namei flag INRENAME and set it if a lookup operation is part
of rename. This helps with building better asserts for rename in
the DELETE lookup ... the RENAME lookup is quite obviously a part
of rename.
 1.400  19-Dec-2009  martin Use the kernel space version of the vfs name, not the original userspace
pointer. Avoids crashes on archs with completely separate userspace VA.
 1.399  09-Aug-2009  haad Add enum uio_seg argument to do_sys_mknod and do_sys_mkdir so these functions
can be called from kernel, too.

Change needed for zfs device node creation, until we have proper devfs.

Oked by ad@.
 1.398  02-Aug-2009  bad Add a note to change_root() that the callers need to authorize the operation.
As requested by elad@.
 1.397  01-Aug-2009  bad As discussed on tech-kern:

Factor out common code of chroot-like syscalls into change_root() and export
that function for use in other parts of the kernel.
Rename change_dir() to chdir_lookup() as the latter describes better what
the function does. While there, move the namei_data initialisation into
chdir_lookup(), too. And export chdir_lookup().
 1.396  02-Jul-2009  pooka expose mkdir to in-kernel consumers
 1.395  29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.394  02-May-2009  pooka Move dovfsusermount from vfs_syscalls.c to param.c: secmodel bsd44
depends on it and we can't isolate it in vfs.
(no, it doesn't really belong in param.c, but I couldn't figure out
a better place for it)
 1.393  29-Apr-2009  dyoung Extract common code from vfs_rootmountalloc(9) and mount_domount() into
a new struct mount-allocation routine, vfs_mountalloc(9). Documentation
updates will follow.

Attention: Synchronization Oversight Committee! In mount_domount(),
I postpone the call mutex_enter(&mp->mnt_updating) until right before
the VFS_MOUNT(9) call because (1) that looks to me like the earliest
possible opportunity for mp to become visible to any other LWP, because
it was just kmem_zalloc(9)'d and (2) it made extracting the common code
much easier. Tell me if my reasoning is faulty.
 1.392  28-Apr-2009  yamt do_sys_utimes: fix a bug introduced by rev.1.367.
VA_UTIMES_NULL is in va_vaflags, not va_flags.
 1.391  13-Mar-2009  yamt do_sys_unlink: remove an unused credential.
 1.390  23-Feb-2009  ad Fix some comments.
 1.389  22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.388  15-Feb-2009  enami Simplify the code; we already have a hint to decide which string to copy.
(And at least gcc generates better code.)
 1.387  14-Feb-2009  christos from enami: Only apply rootdir changes if the chroot dir != /
 1.386  14-Feb-2009  christos PR/40634: Christoph Badura: "chroot / /sbin/mount" shows only / as mounted
 1.385  05-Feb-2009  enami branches: 1.385.2;
Make revoke(2) work as before:
- vfs_syscalls.c rev. 1.342 fails to invert condition correctly when
then-clause and else-clause are swapped. Since then, revoke(2) fails
if it is issued by file owner.
- Probably since rev. 1.160 of genfs_vnops.c, revoke(2) fails if it is
applied to non-device file and drops kernel into ddb.
 1.384  17-Jan-2009  yamt malloc -> kmem_alloc.
 1.383  11-Jan-2009  christos merge christos-time_t
 1.382  14-Dec-2008  elad Fix length passed to strlcpy(): we used to get names one character shorter
than reality.

Should be pulled up to netbsd-5.
 1.381  19-Nov-2008  ad Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.380  16-Nov-2008  pooka <sys/buf.h> police
 1.379  14-Nov-2008  ad Fix a comment.
 1.378  14-Nov-2008  ad - Move some more compat code into sys/compat.
- Split 4.3BSD ifioctl stuff into its own file.
- Remove some ifdefs that include small fragments of vfs compat code
which are difficult to relocate elsewhere.
 1.377  12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.376  22-Oct-2008  ad branches: 1.376.2; 1.376.4;
- Be clear about whether module load is explicit or system initiated (auto).
- Require that module_lock is held to autoload, so that any preconditions
can be safely checked.
 1.375  25-Sep-2008  wiz Fix typo in comment.
 1.374  25-Sep-2008  ad PR kern/39307 (mfs will sometimes panic at umount time)

Change dounmount() so that it never drops the caller provided reference.
Garbage collecting 'struct mount' is up to the caller.
 1.373  24-Sep-2008  ad PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata

Prevent r/w to r/o downgrade until such time as someone has verified all
the relevant file system code.
 1.372  24-Sep-2008  ad PR kern/39307 mfs will sometimes panic at umount time

Don't drop reference to the mount if VFS_START() fails - that's for unmount
to do.
 1.371  17-Sep-2008  hannken Replace the fss unmount hook with a vfs_hook.

fssvar.h: struct device * -> device_t.
fss.c: establish unmount hook on first attach, remove on last detach.
vfs_syscalls.c: remove the call of fss_umount_hook().
vfs_trans.c: destroy cow handlers on unmount as fstrans_unmount() will be
called before vfs_hooks.
 1.370  31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.369  24-Jun-2008  ad branches: 1.369.2;
Nothing uses getsock/getvnode any more.
 1.368  17-Jun-2008  christos set mtime/atime properly, not backwards.
 1.367  17-Jun-2008  christos PR/38942: Pedro F. Giffuni: no support for birthtime in utimes(2).
 1.366  10-Jun-2008  simonb In mount_domount() there is no need to initialise "mp" if the first time
we use it we set it.
 1.365  26-May-2008  christos branches: 1.365.2;
More fixes needed in the error paths for the chroot code to work.
 1.364  26-May-2008  christos PR/38745: Kouichirou Hiratsuka: chroot(8) can leak information of outside of
chrooted directory
 1.363  20-May-2008  ad Ignore return from module_load() and just try vfsop lookup again.
 1.362  20-May-2008  ad If autoloading a module, don't consider the current working directory.
 1.361  20-May-2008  ad Don't try to load a module while holding a vnode lock.
 1.360  20-May-2008  ad If mount fails because the needed file system code isn't in kernel, try
to autoload with the needed vfsops.
 1.359  06-May-2008  ad branches: 1.359.2;
sys_unmount: drop ref to root dir before dounmount(), otherwise we'll
always get EBUSY.
 1.358  06-May-2008  ad PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.357  06-May-2008  xtraeme Make this build again.
 1.356  06-May-2008  ad PR kern/38141 lookup/vfs_busy acquire rwlock recursively

- sys_sync: acquire a write lock on the mount since the operation modifies
the mount structure.
- sys_fchdir: use vfs_trybusy(). If an unmount is in progress, just fail it.
 1.355  06-May-2008  ad Fix a couple of problems with checkdirs():

- vnode and cwd locks were being taken with proc_lock held, which is bad
because proc_lock can only be held for a short period of time.

- Processes could have continually forked and escaped notice, keeping
a reference to the old directory on top of which a new mount exists.
 1.354  30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.353  29-Apr-2008  ad kern/38135 vfs_busy/vfs_trybusy confusion

The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:

- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
 1.352  29-Apr-2008  ad Ignore processes with PK_MARKER set.
 1.351  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.350  25-Apr-2008  joerg branches: 1.350.2;
Before allowing rmdir to progress into the netherhells called VFS,
check if no filesystem is mounted on this node. This can happen
for null mounts on top of null mounts.
 1.349  24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.348  28-Mar-2008  dholland branches: 1.348.2; 1.348.4;
Yet another rename workaround - this time check for . and .. early because
relookup() objects to being asked to handle them.
 1.347  25-Mar-2008  ad mount_domount: hold an additional reference to the mountpoint across the
call to VFS_START. The file system can be unmounted before VFS_START
returns. Partially addresses PR kern/38291.
 1.346  21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.345  30-Jan-2008  ad branches: 1.345.6;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.344  28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.343  25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.342  24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.341  10-Jan-2008  ad Remove hack that's no longer needed.
 1.340  09-Jan-2008  elad Refactor part of the sys_revoke() code so that it can be used in the
compat code. Allows for the removal of two redundant kauth(9) calls.

okay christos@.
 1.339  05-Jan-2008  dsl Use FILE_LOCK() and FILE_UNLOCK()
 1.338  02-Jan-2008  ad Merge vmlocking2 to head.
 1.337  26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.336  24-Dec-2007  ad Export do_sys_unlink, do_sys_rename to the rest of the kernel.
 1.335  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.334  08-Dec-2007  pooka branches: 1.334.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.333  30-Nov-2007  yamt branches: 1.333.2;
- reduce the number of VOP_ACCESS calls for O_RDWR. for nfs, it reduces
the number of rpcs.
- reduce code duplication.
 1.332  26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.331  24-Oct-2007  pooka branches: 1.331.2;
80col & whitespace police. no functional change.
 1.330  23-Oct-2007  pooka Don't take a reference to the vfsops structure in mount_domount().
It is now taken when the vfs structure is received instead of having
to randomly add references in random places. Fixes at least vfs
lkm unload.
 1.329  10-Oct-2007  ad branches: 1.329.2;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.328  08-Oct-2007  ad Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.327  01-Sep-2007  pooka branches: 1.327.2;
Make bioops a pointer and point it to the softdeps struct in softdep
init. Decouples "options SOFTDEP" from the main kernel and ffs code.
 1.326  28-Aug-2007  pooka In quotactl, move vrele() to after the VFS call: protects the
mountpoint from being wiped under us better.

from David Holland
 1.325  15-Aug-2007  ad branches: 1.325.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.
 1.324  31-Jul-2007  pooka branches: 1.324.2; 1.324.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.323  22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.322  17-Jul-2007  christos branches: 1.322.2;
get rid of MFSNAMELEN
 1.321  14-Jul-2007  dsl Version mount(2) so that the length of the 'data' buffer is passed into
the kernel.
 1.320  12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.319  16-Jun-2007  dsl Move the point at which sys_readv and sys_preadv (and writev) get merged
so that the same common code can be used with a kernel-resident 'iov'
array from the 32-bit compat code (which currently has its own copy
of these routines).
 1.318  07-Jun-2007  hannken Dounmount(): rearrange mountlist_slock. vfs_allocate_syncvnode() may sleep
getting a new vnode so it must not be called with this simple_lock taken.

Fixes PR #36395
 1.317  22-May-2007  tnn When renaming, copy the new name into the designated memory area.
Tested by martti@
 1.316  21-May-2007  dsl Fix logic inversion - probably PR kern/36284
 1.315  19-May-2007  christos - remove pathname_ interface.
- use macros to deal with pathnames in userspace, when veriexec is used.
- reorder the veriexec_ call arguments for consistency.
With help from elad@ finding the last bug.
 1.314  17-May-2007  christos - since mknod now can create regular files, make sure veriexec allows it.
Done in a way to minimize ifdefs. Per discussions with elad.
 1.313  15-May-2007  elad Some Veriexec stuff that's been rotting in my tree for months.

Bug fixes:
- Fix crash reported by Scott Ellis on current-users@.

- Fix race conditions in enforcing the Veriexec rename and remove
policies. These are NOT security issues.

- Fix memory leak in rename handling when overwriting a monitored
file.

- Fix table deletion logic.

- Don't prevent query requests if not in learning mode.


KPI updates:
- fileassoc_table_run() now takes a cookie to pass to the callback.

- veriexec_table_add() was removed, it is now done internally. As a
result, there's no longer a need for VERIEXEC_TABLESIZE.

- veriexec_report() was removed, it is now internal.

- Perform sanity checks on the entry type, and enforce default type
in veriexec_file_add() rather than in veriexecctl.

- Add veriexec_flush(), used to delete all Veriexec tables, and
veriexec_dump(), used to fill an array with all Veriexec entries.


New features:
- Add a '-k' flag to veriexecctl, to keep the filenames in the kernel
database. This allows Veriexec to produce slightly more accurate
logs under certain circumstances. In the future, this can be either
replaced by vnode->pathname translation, or combined with it.

- Add a VERIEXEC_DUMP ioctl, to dump the entire Veriexec database.
This can be used to recover a database if the file was lost.
Example usage:

# veriexecctl dump > /etc/signatures

Note that only entries with the filename kept (that is, were loaded
with the '-k' flag) will be dumped.

Idea from Brett Lymn.

- Add a VERIEXEC_FLUSH ioctl, to delete all Veriexec entries. Sample
usage:

# veriexecctl flush

- Add a 'veriexec_flags' rc(8) variable, and make its default have
the '-k' flag. On systems using the default signatures file
(generated from running 'veriexecgen' with no arguments), this will
use an additional 32kb of kernel memory on average.

- Add a '-e' flag to veriexecctl, to evaluate the fingerprint during
load. This is done automatically for files marked as 'untrusted'.


Misc. stuff:
- The code for veriexecctl was massively simplified as a result of
eliminating the need for VERIEXEC_TABLESIZE, and now uses a single
pass of the signatures file, making the loading somewhat faster.

- Lots of minor fixes found using the (still under development)
Veriexec regression testsuite.

- Some of the messages Veriexec prints were improved.

- Various documentation fixes.


All relevant man-pages were updated to reflect the above changes.

Binary compatibility with existing veriexecctl binaries is maintained.
 1.312  12-May-2007  dsl Change the compat sys_[fl]utime code to not use the stackgap.
 1.311  30-Apr-2007  dsl Split the statvfs functions so that the 'work' is done to a kernel buffer
which can either be copied directly to userspace, or converted then copied.
Saves replicating a lot of code in the compat functions (esp. for
getvfsstat) at a cost of an extra function call in the non-emulated case -
which is unlikely to be measurable given the other costs of the actions
involved (even on vax).
Remove dofhstat() and dofhstatvfs() (and the last caller).
Remove some redundant stackgap_init() calls.
 1.310  22-Apr-2007  dsl Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
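
The lookup order described for TRYEMULROOT can be modelled in userland as
follows. This is only a sketch under stated assumptions: toy_paths,
toy_exists() and toy_lookup() are hypothetical names, the "file system" is a
fixed table of strings, and none of this is the kernel's namei() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the file system: the only paths that "exist". */
static const char *toy_paths[] = {
	"/emul/linux/lib/libc.so",
	"/lib/libc.so",
	"/etc/passwd",
};

static int
toy_exists(const char *path)
{
	for (size_t i = 0; i < sizeof(toy_paths) / sizeof(toy_paths[0]); i++)
		if (strcmp(toy_paths[i], path) == 0)
			return 1;
	return 0;
}

/*
 * Resolve an absolute path under the emulation root first; if that
 * fails, retry from the normal root.  Returns 0 on success, -1 if the
 * path exists in neither place or does not fit in the output buffer.
 */
static int
toy_lookup(const char *emul_root, const char *path, char *out, size_t outlen)
{
	int n;

	assert(path != NULL && out != NULL);

	if (emul_root != NULL && path[0] == '/') {
		n = snprintf(out, outlen, "%s%s", emul_root, path);
		if (n >= 0 && (size_t)n < outlen && toy_exists(out))
			return 0;	/* found inside the emulation root */
	}
	n = snprintf(out, outlen, "%s", path);
	if (n < 0 || (size_t)n >= outlen || !toy_exists(out))
		return -1;
	return 0;			/* found under the real root */
}
```

With the table above, "/lib/libc.so" resolves inside the toy emulation root,
while "/etc/passwd", which has no copy there, falls back to the real root.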
 1.309  09-Apr-2007  pooka If mount(MNT_UPDATE) is called for a non-VROOT directory, don't vput()
the "mountpoint" vnode twice due to an error branch.

thanks go to Gert Doering for reporting the problem and testing the fix
and to Juergen Hannken-Illjes for much of the analysis work leading to
the discovery of the problem cause
 1.308  08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.307  01-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.306  10-Mar-2007  dsl branches: 1.306.2; 1.306.4;
Split the work for sys_stat, sys_lstat, sys_fstat and sys_fhstat out into
separate functions that don't do the copyout.
This allows all the compat_xxx versions to convert the 'struct stat' to
the correct format without using the 'stackgap'.
The stackgap isn't at all LWP friendly, and needs to be removed from
any compat functions that might involve threads (inc. clone()).
The code is still binary compatible with existing LKMs.
 1.305  09-Mar-2007  ad - Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.304  01-Mar-2007  pooka simplify previous a bit. no functional change.
 1.303  28-Feb-2007  pooka avoid lock leak in error branch of sys_fchdir()

thanks to Tom Spindler and Greg Oster in helping find the cure
 1.302  18-Feb-2007  pooka if doing VOP_CREATE via sys_mknod, set va_rdev to VNOVAL instead of 0
 1.301  18-Feb-2007  pooka Support creating regular files with mknod(2) to match Linux/Solaris
behaviour. This happens if mode contains S_IFREG. mknod(2) is
still restricted to the superuser.

no objections from tech-kern
 1.300  09-Feb-2007  ad branches: 1.300.2;
Merge newlock2 to head.
 1.299  04-Feb-2007  elad Initialize pathname_t objects to NULL.
 1.298  04-Feb-2007  chs more fixes for the new vnode locking scheme:
- don't use SAVESTART in calls to relookup() from unionfs,
just vref() the desired vnode when we need to.
- fix locking and refcounting in the unionfs EEXIST error cases.
- release any vnode locks before calling VFS_ROOT(), vfs_busy() is enough.
this allows us to simplify union_root() and fix PR 3006.
- union_lock() doesn't handle shared lock requests correctly,
so convert them to exclusive instead. fixes PR 34775.
- in relookup(), avoid reusing "dp" for different purposes,
the error handling wasn't right. (actually just get rid of dp.)
also, change relookup() to ignore LOCKLEAF and always return the
vnode locked since the callers already expect this.
 1.297  19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.296  15-Jan-2007  pooka TAILQ_INIT a mountpoint's vnode queue and always add vnodes to the
tail instead of an explicit check to add to the head for an empty
queue. Apparently TAILQ_INSERT_HEAD happens to work for a
non-initialized head and does implicit initialization so that
TAILQ_INSERT_TAIL works after that.
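
The pattern this change settles on (initialize the queue head once, then
always append at the tail) can be sketched with the <sys/queue.h> macros.
The toy_* structure and function names and the v_link field are invented for
illustration; only the TAILQ_* macros themselves are real.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for struct vnode and struct mount. */
struct toy_vnode {
	int id;
	TAILQ_ENTRY(toy_vnode) v_link;	/* linkage on the mount's queue */
};

TAILQ_HEAD(toy_vnodelst, toy_vnode);

struct toy_mount {
	struct toy_vnodelst mnt_vnodelist;
};

/* Explicitly initialize the head before any insertion. */
static void
toy_mount_init(struct toy_mount *mp)
{
	TAILQ_INIT(&mp->mnt_vnodelist);
}

/* Always append; no special case for an empty queue is needed. */
static void
toy_mount_add(struct toy_mount *mp, struct toy_vnode *vp)
{
	assert(mp != NULL && vp != NULL);
	TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_link);
}

/* Return the id of the n-th vnode (0-based) in queue order, or -1. */
static int
toy_mount_nth(struct toy_mount *mp, int n)
{
	struct toy_vnode *vp;
	int i = 0;

	TAILQ_FOREACH(vp, &mp->mnt_vnodelist, v_link) {
		if (i++ == n)
			return vp->id;
	}
	return -1;
}
```

Because the head is initialized up front, the first insertion goes through
the same TAILQ_INSERT_TAIL path as every later one.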
 1.295  05-Jan-2007  elad Use kauth(9).
 1.294  04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.293  03-Jan-2007  wrstuden Fix issue noted by Ilja van Sprundel and disclosed at 23C3.

Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.

Fix suggested by Elad, hand-typed by me.
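
The single "out:" exit path described here is a common C idiom. The sketch
below is a userland illustration with invented names (toy_file,
toy_file_use(), toy_file_unuse(), toy_op()); it is not the code that was
committed, and the error conditions are made up.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical file object with a use count, standing in for struct file. */
struct toy_file {
	int f_usecount;
	int f_ok;		/* nonzero if the operation can succeed */
};

static void
toy_file_use(struct toy_file *fp)
{
	fp->f_usecount++;
}

static void
toy_file_unuse(struct toy_file *fp)
{
	assert(fp->f_usecount > 0);
	fp->f_usecount--;
}

/*
 * Every failure sets `error` and jumps to `out`, so the use count is
 * dropped exactly once no matter which branch is taken.
 */
static int
toy_op(struct toy_file *fp, int arg)
{
	int error;

	toy_file_use(fp);

	if (arg < 0) {
		error = EINVAL;
		goto out;
	}
	if (!fp->f_ok) {
		error = EBADF;
		goto out;
	}
	error = 0;
out:
	toy_file_unuse(fp);
	return error;
}
```

Adding a new early-return condition then only requires setting error and
jumping to out, which makes a forgotten release much harder to write.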
 1.292  02-Jan-2007  elad Make mount(2) and unmount(2) use kauth(9) for security policy.

Okay yamt@.
 1.291  01-Jan-2007  pooka in rename_files(), match pre-1.280 locking behaviour by unlocking
fromnd's dvp only in case the dvp != vp
 1.290  01-Jan-2007  elad Add back MNT_NOEXEC propagation on new mounts by non-root users.
Mistakenly removed in revision 1.286.
 1.289  31-Dec-2006  elad Enforce exclusive MNT_GETARGS in mount_getargs().
 1.288  28-Dec-2006  yamt mount_domount: revive code to enforce MNT_NOSUID and MNT_NODEV for usermount,
which was removed mistakenly by rev.1.286. pointed by elad.
 1.287  27-Dec-2006  yamt mount_domount: don't forget to handle MNT_RDONLY.
PR/35327 from Christian Biere.
 1.286  26-Dec-2006  yamt - shorten the period to modify mnt_flag temporarily.
- desupport MNT_EXPORTED without MNT_UPDATE explicitly.
- fix a comment.
- unwrap short lines.
 1.285  25-Dec-2006  elad Don't reference userspace pointers.
 1.284  25-Dec-2006  elad Properly handle flags in mount_domount().
 1.283  24-Dec-2006  elad Slash sys_mount() and add three helper functions: mount_update(),
mount_getargs(), and mount_domount() to handle three main things it can
do.

This makes the code more readable and removes the horrible goto mess
that was lurking there since forever... it also makes it easier to
implement a security policy for that code.
 1.282  24-Dec-2006  elad PR/35278: YAMAMOTO Takashi: veriexec sometimes feeds user va to log(9)

Introduce the (intentionally undocumented) pathname_get(), pathname_path(),
and pathname_put(), to deal with allocating and copying of pathnames from
either kernel- or user-space.
 1.281  14-Dec-2006  yamt - just associate fileassoc "table" to struct mount.
because the latter is always available during the lifetime of the former,
there is little point to use another global list to keep track of them.
it also allows to remove an #ifdef FILEASSOC.

- avoid some operations (memory allocation and VOP) in fileassoc_file_lookup,
when fileassoc table is not used.

ok'ed by elad.
 1.280  09-Dec-2006  chs a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.279  30-Nov-2006  elad branches: 1.279.2;
Massive restructuring and cleanup of Veriexec, mainly in preparation
for work on some future functionality.

- Veriexec data-structures are no longer exposed.

- Thanks to using proplib for data passing now, the interface
changes further to accommodate that.

Introduce four new functions. First, veriexec_file_add(), to add
a new file to be monitored by Veriexec, to replace both
veriexec_load() and veriexec_hashadd(). veriexec_table_add(), to
replace veriexec_newtable(), will be used to optimize hash table
size (during preload), and finally, veriexec_convert(), to convert
an internal entry to one userland can read.

- Introduce veriexec_unmountchk(), to enforce Veriexec unmount
policy. This cleans up a bit of code in kern/vfs_syscalls.c.

- Rename veriexec_tblfind() to veriexec_table_lookup(), and make
it static. More functions that became static: veriexec_fp_cmp(),
veriexec_fp_calc().

- veriexec_verify() no longer returns the entry as well, but just
sets a boolean indicating whether an entry was found or not.

- veriexec_purge() now takes a struct vnode *.

- veriexec_add_fp_name() was merged into veriexec_add_fp_ops(), that
changed its name to veriexec_fpops_add(). veriexec_find_ops() was
also renamed to veriexec_fpops_lookup().

Also on the fp-ops front, the three function types used to initialize,
update, and finalize a hash context were renamed to
veriexec_fpop_init_t, veriexec_fpop_update_t, and veriexec_fpop_final_t
respectively.

- Introduce a new malloc(9) type, M_VERIEXEC, and use it instead of
M_TEMP, so we can tell exactly how much memory is used by Veriexec.

- And, most importantly, whitespace and indentation nits.

Built successfully for amd64, i386, sparc, and sparc64. Tested on amd64.
 1.278  21-Nov-2006  elad printf() -> log() for Veriexec messages.
 1.277  17-Nov-2006  hannken Add specificdata support to mount points.

Welcome to NetBSD 4.99.4

Approved by: Jason Thorpe <thorpej@netbsd.org>
 1.276  01-Nov-2006  yamt remove some __unused from function parameters.
 1.275  31-Oct-2006  mjf Revert the changes I introduced trying to solve tmpfs' NFS export problem.
Requested by yamt@
 1.274  24-Oct-2006  mjf Add support to allow a file system to not permit being exported over NFS.

Approved by elad@ and wrstuden@
 1.273  20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.272  17-Oct-2006  christos according to the manual, the last argument of quotactl(2) is a void *,
not a caddr_t.
 1.271  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.270  13-Sep-2006  elad branches: 1.270.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
 1.269  12-Sep-2006  elad Oops, add forgotten 'if'.

From Geoff Wing, thanks!
 1.268  08-Sep-2006  elad First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.267  08-Aug-2006  yamt branches: 1.267.2;
vfs_copyinfh_alloc: kludge for nfsv2 file handles.
 1.266  04-Aug-2006  yamt branches: 1.266.2;
sys___fhstatvfs140: update a comment.
 1.265  04-Aug-2006  yamt some filehandle syscall related changes.

- remove the support of variable-sized filehandle from compat version of
syscalls. (strictly speaking, it breaks abi. i don't think it's a problem
because this feature is short-lived and there are no affected in-tree
filesystems.)
- unify vfs_copyinfh_alloc and vfs_copyinfh_alloc_size.
- vfs_copyinfh_alloc_size: check fhsize strictly.
- reduce code duplication between compat and current syscalls.
 1.264  04-Aug-2006  yamt vfs_copyinfh_alloc_size: fix indent.
 1.263  31-Jul-2006  martin Make filehandles opaque to userland
 1.262  26-Jul-2006  elad sync kpi with docs, remove old comments
 1.261  26-Jul-2006  dogcow at the request of elad, as veriexec.h has returned, revert the changes
from 2006-07-25.
 1.260  25-Jul-2006  dogcow mechanically go through and
s,include "veriexec.h",include <sys/verified_exec.h>,
as the former has apparently gone away.
 1.259  24-Jul-2006  elad replace magic numbers for strict levels (0-3) with defines.
 1.258  24-Jul-2006  elad some fixes:
- adapt to NVERIEXEC in init_sysctl.c.
- we now need "veriexec.h" for NVERIEXEC.
- "opt_verified_exec.h" -> "opt_veriexec.h", and include it only where
it is needed.
 1.257  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.256  22-Jul-2006  elad deprecate the VERIFIED_EXEC option; now we only need the pseudo-device to
enable it. while here, some config file tweaks.

tons of input from cube@ (thanks!) and okay blymn@.
 1.255  20-Jul-2006  christos PR/34043: mrt at notwork dot org: 3.99.22 kernel crashes at *_vptofh() called
from vfs_composefh_alloc() due to uninitialized "fidsize".
 1.254  19-Jul-2006  blymn Add destination file vnode to rename checking.
 1.253  17-Jul-2006  elad move the fileassoc_delete_file() call above the VOP_REMOVE() one, yamt@
says vp might not be valid after it.
 1.252  15-Jul-2006  martin FHANDLE_SIZE_MIN is an allowed value for the requested size (it happens
to be the old static size on 32bit archs, so the compat_30 code uses it)
 1.251  14-Jul-2006  elad okay, since there was no way to divide this to two commits, here it goes..

introduce fileassoc(9), a kernel interface for associating meta-data with
files using in-kernel memory. this is very similar to what we had in
veriexec till now, only abstracted so it can be used more easily by more
consumers.

this also prompted the redesign of the interface, making it work on vnodes
and mounts and not directly on devices and inodes. internally, we still
use file-id but that's gonna change soon... the interface will remain
consistent.

as a result, veriexec went under some heavy changes to conform to the new
interface. since we no longer use device numbers to identify file-systems,
the veriexec sysctl stuff changed too: kern.veriexec.count.dev_N is now
kern.veriexec.tableN.* where 'N' is NOT the device number but rather a
way to distinguish several mounts.

also worth noting is the plugging of unmount/delete operations
wrt/fileassoc and veriexec.

tons of input from yamt@, wrstuden@, martin@, and christos@.
 1.250  14-Jul-2006  yamt introduce filehandle size limits:

- FHANDLE_SIZE_MAX: refuse unreasonable size allocation, esp. when
it's a user-specified value.

- FHANDLE_SIZE_MIN: pad small filehandles with zero for compatibility.
XXX it might be better to push this into filesystem dependent code so that
new filesystems can choose smaller handles.
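
The two limits can be illustrated with a small userland sketch. The constants
TOY_FHANDLE_SIZE_MIN and TOY_FHANDLE_SIZE_MAX, the function toy_fh_build()
and the error codes it returns are all assumptions made for this example;
the real values and code paths are not given in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Made-up limits; the kernel's actual values are not stated in this log. */
#define TOY_FHANDLE_SIZE_MIN	16
#define TOY_FHANDLE_SIZE_MAX	64

/*
 * Build a file handle from `fid`.  Oversized requests are refused
 * before anything is copied, and short handles are zero-padded up to
 * the minimum size.  On success *outlen holds the resulting length.
 */
static int
toy_fh_build(const unsigned char *fid, size_t fidlen,
    unsigned char *out, size_t outcap, size_t *outlen)
{
	size_t len;

	assert(fid != NULL && out != NULL && outlen != NULL);

	if (fidlen > TOY_FHANDLE_SIZE_MAX)
		return EINVAL;		/* unreasonable size: refuse */

	len = fidlen < TOY_FHANDLE_SIZE_MIN ? TOY_FHANDLE_SIZE_MIN : fidlen;
	if (len > outcap)
		return E2BIG;		/* caller's buffer is too small */

	memset(out, 0, len);		/* zero padding for short handles */
	memcpy(out, fid, fidlen);
	*outlen = len;
	return 0;
}
```

Checking the upper bound first means a user-supplied size never reaches an
allocation or copy unchecked.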
 1.249  14-Jul-2006  yamt - sys___getfh30:
- restructure code so that it doesn't try to allocate user-specified
unbound amount of memory.
- don't ignore copyout failure in the case of E2BIG.
- rename vfs_copyinfh to vfs_copyinfh_alloc for consistency.
 1.248  14-Jul-2006  yamt - fix buffer overruns in fhopen and friends.
- share some code among them.
 1.247  14-Jul-2006  yamt sys___getfh30: fix a vnode lock botch in rev.1.244.
 1.246  14-Jul-2006  yamt sys___getfh30: remove unnecessary casts.
 1.245  13-Jul-2006  martin fix typo
 1.244  13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.243  17-Jun-2006  yamt - introduce vfs_composefh() and use it where appropriate.
- fix lock/unlock mismatch in sys_getfh.
 1.242  14-May-2006  elad branches: 1.242.2; 1.242.4;
integrate kauth.
 1.241  10-May-2006  yamt don't allocate struct statvfs on stack as it's too large.
 1.240  04-May-2006  christos fhstat needs to be versioned too (for ino_t). Pointed out by Izumi Tsutsui
 1.239  27-Mar-2006  martin KASSERT that the returned file id length from VPTOFH is <= the
maximum allowed value (_VFS_MAXFIDSZ).
 1.238  01-Mar-2006  yamt branches: 1.238.2; 1.238.4; 1.238.6;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.237  12-Feb-2006  chs convert "magiclinks" from a per-fs mount option to a system-wide sysctl.
as discussed on tech-kern quite some time ago.
 1.236  04-Feb-2006  yamt for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)
 1.235  12-Dec-2005  elad branches: 1.235.2; 1.235.4; 1.235.6;
Catch up with ktrace-lwp merge.

While I'm here, stop using cur{lwp,proc}.
 1.234  11-Dec-2005  christos merge ktrace-lwp.
 1.233  01-Oct-2005  yamt change_utimes: use nanotime(9) rather than time(9).
 1.232  25-Sep-2005  jmmv Add some COMPAT_30 code to let old mountd binaries work after the NFS
exports rototill.
 1.231  23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.230  30-Aug-2005  jmmv Make all creation operations (mkdir, create, mknod and symlink) consistent
by changing the symlink one to set vap's vatype to VLNK. All the other three
already set vatype to the correct type. Note that, however, in the mkdir
case (and now symlink too) this is not strictly necessary.
 1.229  19-Aug-2005  elad Introduce veriexec_renamechk().

Rename policy:
- Strict levels 0, 1: Log renames of monitored files.
- Strict level 2: Prevent renames of monitored files.
- Strict level 3: Prevent renames.
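
The rename policy above amounts to a small decision table, sketched here in
plain C. The enum and the function toy_renamechk() are hypothetical, and the
treatment of unmonitored files below level 3 (allowed, not logged) is an
assumption inferred from the list rather than something it states.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rename_action {
	TOY_ALLOW,		/* rename proceeds silently */
	TOY_ALLOW_AND_LOG,	/* rename proceeds, but is logged */
	TOY_DENY		/* rename is prevented */
};

/* Map (strict level, is the file monitored?) to an action. */
static enum toy_rename_action
toy_renamechk(int strict, bool monitored)
{
	assert(strict >= 0 && strict <= 3);

	if (strict == 3)
		return TOY_DENY;		/* level 3: no renames at all */
	if (!monitored)
		return TOY_ALLOW;		/* assumed: unmonitored files pass */
	if (strict == 2)
		return TOY_DENY;		/* level 2: protect monitored files */
	return TOY_ALLOW_AND_LOG;		/* levels 0 and 1: log only */
}
```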
 1.228  19-Aug-2005  christos 64 bit inode changes.
 1.227  05-Aug-2005  jmmv Fix some typos in comments.
 1.226  16-Jul-2005  christos defopt verified_exec.
 1.225  10-Jul-2005  cube The comment listing the arguments of fsync_range(2) wrongly described
"length" as an int. It is an off_t.
 1.224  09-Jul-2005  thorpej Move VFS extended attribute support to its own file.
 1.223  23-Jun-2005  thorpej branches: 1.223.2;
Implement expansion of special "magic" strings in symlinks into
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slightly by me.

This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.

The following magic strings are supported by the implementation:

@machine value of MACHINE for the system
@machine_arch value of MACHINE_ARCH for the system
@hostname the system host name, as set with sethostname()
@domainname the system domain name, as set with setdomainname()
@kernel_ident the kernel config file name
@osrelease the release number of the OS
@ostype the name of the OS (always "NetBSD" for NetBSD)

Example usage:

mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
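
A rough userland model of the substitution is sketched below. The table
values, the names toy_magic, toy_table and toy_expand(), and the rule that a
magic string only matches when followed by '/' or the end of the target are
assumptions for illustration; the kernel's actual matching rules are not
spelled out in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_magic {
	const char *name;
	const char *value;
};

/* Example values only; a real system would supply its own. */
static const struct toy_magic toy_table[] = {
	{ "@machine_arch", "i386" },
	{ "@machine", "i386" },
	{ "@ostype", "NetBSD" },
};

/*
 * Copy `in` to `out`, replacing recognized magic strings.  Returns 0 on
 * success or -1 if the result does not fit in `outlen` bytes.
 */
static int
toy_expand(const char *in, char *out, size_t outlen)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	if (outlen == 0)
		return -1;

	while (*in != '\0') {
		const char *rep = NULL;
		size_t skip = 0;

		if (*in == '@') {
			for (size_t i = 0;
			    i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
				size_t n = strlen(toy_table[i].name);
				if (strncmp(in, toy_table[i].name, n) == 0 &&
				    (in[n] == '/' || in[n] == '\0')) {
					rep = toy_table[i].value;
					skip = n;
					break;
				}
			}
		}
		if (rep != NULL) {
			size_t r = strlen(rep);
			if (o + r >= outlen)
				return -1;
			memcpy(out + o, rep, r);
			o += r;
			in += skip;
		} else {
			if (o + 1 >= outlen)
				return -1;
			out[o++] = *in++;
		}
	}
	out[o] = '\0';
	return 0;
}
```

Requiring a terminator after the name keeps "@machine" from matching the
first part of an unrelated component such as "@machinery".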
 1.222  17-Jun-2005  elad More veriexec changes:

- Better organize strict level. Now we have 4 levels:
- Level 0, learning mode: Warnings only about anything that might've
resulted in 'access denied' or similar in a higher strict level.

- Level 1, IDS mode:
- Deny access on fingerprint mismatch.
- Deny modification of veriexec tables.

- Level 2, IPS mode:
- All implications of strict level 1.
- Deny write access to monitored files.
- Prevent removal of monitored files.
- Enforce access type - 'direct', 'indirect', or 'file'.

- Level 3, lockdown mode:
- All implications of strict level 2.
- Prevent creation of new files.
- Deny access to non-monitored files.

- Update sysctl(3) man-page with above. (date bumped too :)

- Remove FINGERPRINT_INDIRECT from possible fp_status values; it's no
longer needed.

- Simplify veriexec_removechk() in light of new strict level policies.

- Eliminate use of 'securelevel'; veriexec now behaves according to
its strict level only.
 1.221  05-Jun-2005  thorpej Use ANSI function decls.
 1.220  29-May-2005  christos - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.219  20-Apr-2005  blymn Rototill of the verified exec functionality.
* We now use hash tables instead of a list to store the in kernel
fingerprints.
* Fingerprint methods handling has been made more flexible, it is now
even simpler to add new methods.
* the loader no longer passes in magic numbers representing the
fingerprint method so veriexecctl is no longer kernel specific.
* fingerprint methods can be tailored out using options in the kernel
config file.
* more fingerprint methods added - rmd160, sha256/384/512
* veriexecctl can now report the fingerprint methods supported by the
running kernel.
* regularised the naming of some portions of veriexec.
 1.218  06-Apr-2005  yamt sys_mount:
- reject attempts of MNT_GETARGS + other MNT_xxx.
- don't modify mnt_flags needlessly for MNT_GETARGS.
a stopgap fix for PR/29898.
 1.217  26-Feb-2005  perry branches: 1.217.2;
nuke trailing whitespace
 1.216  25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.215  24-Jan-2005  dbj branches: 1.215.2;
clear p->p_cwdi of exiting processes and
avoid dereferencing invalid p_cwdi in checkdirs
this fixes a race condition between exiting processes and mount
see discussion on tech-kern:
http://mail-index.netbsd.org/tech-kern/2004/10/04/0006.html
http://mail-index.netbsd.org/tech-kern/2004/10/08/0005.html
 1.214  02-Jan-2005  thorpej branches: 1.214.2;
Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.213  30-Nov-2004  christos Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.212  01-Oct-2004  yamt introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
 1.211  13-Sep-2004  jdolecek dostatvfs(): zero the statvfs structure before calling filesystem code, so that
unset parts would have defined value and not pass random parts of kernel stack
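
The idea behind this fix can be shown in a few lines of userland C. The
structure layout and the names toy_statvfs, toy_fs_statvfs() and
toy_dostatvfs() are invented for the example; the point is only that the
buffer is cleared before a callee that fills in just some of its fields.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct statvfs. */
struct toy_statvfs {
	unsigned long f_bsize;
	unsigned long f_blocks;
	char f_spare[16];
};

/* A file system routine that sets only one field and ignores the rest. */
static void
toy_fs_statvfs(struct toy_statvfs *sb)
{
	sb->f_bsize = 512;
}

/*
 * Zero the whole structure first, so fields the file system does not
 * set have a defined value instead of leftover stack contents.
 */
static void
toy_dostatvfs(struct toy_statvfs *sb)
{
	assert(sb != NULL);
	memset(sb, 0, sizeof(*sb));
	toy_fs_statvfs(sb);
}
```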
 1.210  01-Jul-2004  hannken Keep a pointer to the leaf mount. Needed for write gating where a
file system gets suspended and has layered mounts above it.

Welcome to 2.0G

Reviewed by: Bill Studenmund <wrstuden@netbsd.org>
 1.209  25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.208  02-May-2004  pk Add a mutex for mount point I/O and wait counters (i.e. the `mnt_wcnt',
`mnt_writeopcountupper' and `mnt_writeopcountlower' members).
 1.207  02-May-2004  pk sys_access: use crdup().
 1.206  21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.205  23-Mar-2004  junyoung branches: 1.205.2; 1.205.4; 1.205.6;
- Nuke __P().
- Drop trailing spaces.
 1.204  25-Feb-2004  dbj fix typo in comment s/MNT_LAXY/MNT_LAZY/
 1.203  22-Feb-2004  jdolecek mount(2): if vinvalbuf() fails, we must also vput() the mountpoint vnode

fixes stale vnode lock after attempt to mount something on a NTFS directory
 1.202  10-Dec-2003  hannken The file system snapshot pseudo driver.

Uses a hook in spec_strategy() to save data written from a mounted
file system to its block device and a hook in dounmount().

Not enabled by default in any kernel config.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.201  15-Nov-2003  thorpej Kernel portion of the fsync_range(2) system call. Written by Bill
Studenmund, and contributed by Wasabi Systems, Inc.
 1.200  09-Nov-2003  yamt - fix a use-after-free bug in /dev/fd/* handling.
specifically, don't keep a stale pointer in fd_ofiles.
it isn't needed anymore as fd allocation is now done using bitmaps.
- clean up dupfdopen() a little.
- don't call fd_used() unnecessarily.
 1.199  25-Oct-2003  kleink POSIX: when attempting to fdatasync(2) a file which is not open for
writing, fail with EBADF.
 1.198  15-Oct-2003  thorpej Remove the superuser check for MNT_FORCE on new mounts. It's been
pointed out by several people that it offers no real protection.
 1.197  15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.196  14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.195  13-Oct-2003  thorpej * Shuffle some flags to make it easier to visually compare lists
of flags.
* In the new mount case, make sure to clear the mount "action" flags.
* Allow MNT_FORCE to be set by root on new mounts.
 1.194  13-Sep-2003  jdolecek move dupfd from struct proc to struct lwp - it's per-LWP, not per-process; we
use curlwp where the lwp is not directly available, i.e. in device open
routines

briefly discussed on tech-kern
 1.193  11-Sep-2003  christos PR/15397: Jason Thorpe: directory operations on pathnames that refer to
directories and have trailing slashes should succeed. Ok'd by kjk.
Fix provided by enami.
 1.192  02-Sep-2003  drochner also feed getdents/readdir data to KTRACE
 1.191  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.190  29-Jun-2003  fvdl branches: 1.190.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.189  29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.188  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.187  16-May-2003  itojun use strlcpy. [fixed off-by-one in subr_prop.c]
 1.186  20-Apr-2003  yamt add simple_locks that are missed in the previous.
 1.185  16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.184  21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.183  23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.182  14-Feb-2003  drochner fix typo in comment
 1.181  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.180  21-Jan-2003  christos step 2: fix sync so that it does not dereference null lwp and assign p properly.
 1.179  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.178  30-Oct-2002  kleink branches: 1.178.2;
Revert rev. 1.147, as per PR kern/17411.

While a hard link to a symbolic link is not ruled out by POSIX-2001,
the link(2) interface is to perform normal pathname resolution,
which includes the resolution of symbolic links.
 1.177  21-Sep-2002  christos Add special handling of VFS_GETARGS (similar to VFS_UPDATE) so that it
can be done non-root, and it does not affect the mount lists.
 1.176  04-Sep-2002  matt Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
 1.175  26-Aug-2002  thorpej Fix a signed/unsigned comparison warning from GCC 3.3.
 1.174  11-May-2002  enami branches: 1.174.2;
Don't release the lock on mount point vnode so early when doing update mount.
Otherwise, race condition occurs (e.g., between mountd(8) and next mount(8)
when multiple update mount commands are invoked from a shell script).
 1.173  12-Nov-2001  lukem branches: 1.173.4;
add RCSIDs
 1.172  29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.171  11-Oct-2001  christos branches: 1.171.2;
Allow userland to pass MNT_IGNORE (from enami)
 1.170  08-Sep-2001  christos Don't trash the ref count of cred. It causes a memory leak.
 1.169  08-Sep-2001  christos Hijack the credentials used to evaluate access, to avoid a potential lwp
race by modifying the proc's credentials temporarily. From Bill Sommerfeld.
Thanks for forcing me to do this right :-)
 1.168  24-Jul-2001  assar branches: 1.168.2;
change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it. update callers and
file systems that implement these vnode operations
 1.167  28-Jun-2001  jdolecek branches: 1.167.2;
Only define mountcompatnames[] for COMPAT_09 and COMPAT_43, make the
table actually match state in NetBSD 0.9 (checked against sys/mount.h
rev. 1.11).
The array is not to be modified from now on, comment updated accordingly.
 1.166  14-Jun-2001  thorpej Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
 1.165  16-Apr-2001  thorpej When unmounting a file system, acquire the syncer_lock before
vfs_busy'ing just before the dounmount() call. This is to avoid
sleeping with the mountlist_slock held -- but we must acquire
syncer_lock before vfs_busy because the syncer itself uses
syncer_lock -> vfs_busy locking order.
 1.164  27-Nov-2000  chs branches: 1.164.2;
Initial integration of the Unified Buffer Cache project.
 1.163  28-Sep-2000  enami Factor out common code to manipulate file flags into a separate function
like others do.
 1.162  19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Small optimization to shutdown code: only take the syncer lock if
the FS actually used it.
 1.161  03-Aug-2000  thorpej Convert namei pathname buffer allocation to use the pool allocator.
 1.160  09-Jul-2000  mycroft When unmounting, make sure to free the syncer vnode so that it can be reused.
 1.159  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.158  19-Jun-2000  pooka branches: 1.158.2;
Correct situation where vnode was left hanging around when trying to
mount a filesystem with securelevel 2. A second mount-attempt left
everything completely frozen.

Fix by Bill Sommerfeld.
 1.157  15-Jun-2000  fvdl Enable passing of the MNT_SOFTDEP flag in the mount system call.
 1.156  17-Apr-2000  mrg branches: 1.156.2;
implement lchflags(2), which does the chflags(2) dance without following
symlinks, and thus can operate on symlinks. remove a bogus comment in
chflags(1) that claims symlinks do not have file flags.

XXX: todo -- make chflags(1) use lchflags(2) when given the right options.
 1.155  30-Mar-2000  augustss Get rid of register declarations.
 1.154  30-Mar-2000  simonb Delete redundant decl of dounmount(), it's in <sys/mount.h>.
 1.153  23-Mar-2000  thorpej Implement fdremove() which is used in place of all the code that
did the "fdp->fd_ofiles[fd] = 0" assignment; fdremove() makes sure
the fd_freefiles hints stay in sync.

From OpenBSD.
 1.152  15-Mar-2000  fvdl In fdatasync, do not call bioops.io_fsync, since we're not flushing
metadata. If you do call it, there's actually a fair chance that it
will panic because its metadata dependencies were not cleared in
the VOP_FSYNC above (with FSYNC_DATAONLY).
 1.151  03-Mar-2000  mycroft Allow my disk to actually spin down using `-o async' again.

Note: This uses the same questionable logic as vfs_bio.c to check MNT_ASYNC.
Something needs to be done about this.
 1.150  16-Feb-2000  fvdl Introduce a sysctl to enable/disable if non-root users can mount filesystems.
Default: off.
 1.149  01-Feb-2000  assar (sys_open, sys_fhopen): remove declaration of vnops, now in
<sys/file.h>
 1.148  15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.147  05-Sep-1999  hubertf branches: 1.147.2; 1.147.4; 1.147.8;
Allow hardlinks to symlinks.

Reviewed by: Bill Studenmund, Klaus Klein
 1.146  31-Jul-1999  christos OpenBSD patch to prevent non-root users who own block or character devices
(typically ttys or ptys) from changing the flags on them. [Commit by cjs.]
 1.145  26-Jul-1999  wrstuden Add VLAYER to tests which will cause VOP_REVOKE to be called in sys_revoke().
 1.144  25-Jul-1999  thorpej Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.143  22-Jul-1999  thorpej Add proclist locking where appropriate (forgot to commit this file previously).
 1.142  04-Jul-1999  sommerfeld Fix kern/7906: race between unmount and getnewvnode()

mp->mnt_flags & MNT_MWAIT is replaced by mp->mnt_wcnt, and a new mount
flag MNT_GONE is created (reusing the same bit).

In insmntque(), add a DIAGNOSTIC check to fail if the filesystem the vnode
is being moved to is in the process of being unmounted.

getnewvnode() now protects the list of vnodes active on mp with
vfs_busy()/vfs_unbusy().

To avoid generating spurious errors during a doomed unmount, change
the "wait for unmount to finish" protocol between dounmount() and
vfs_busy(). In vfs_busy(), instead of only sleeping once, sleep until
either MNT_UNMOUNT is clear or MNT_GONE is set; also, maintain a count
of waiters in mp->mnt_wcnt so that dounmount() knows when it's safe to
free mp.

tested by running a "while :; do mount /d1; umount -f /d1; done" loop
against multiple find(1) processes.
 1.141  04-Jul-1999  sommerfeld fix typo in previous
 1.140  04-Jul-1999  sommerfeld Don't permanently lose the async bit on a failed unmount
 1.139  01-Jul-1999  wrstuden Make fhopen use FILE_UNUSE, and don't leak file descriptors.

Patch from Jason Thorpe. Also should close PR 7889 from
Assar Westerlund <assar@sics.se> describing this problem.
 1.138  30-Jun-1999  is Only check for ETXTBSY if the access would otherwise be allowed.
Needed to fix pr4134.
 1.137  29-Jun-1999  wrstuden Add fhopen, fhstat, fhstatfs syscalls. Also move getfh in from the nfs
syscall code.
 1.136  06-May-1999  christos Add NTFS for the compat names.
 1.135  05-May-1999  thorpej Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.
 1.134  30-Apr-1999  thorpej Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).
 1.133  31-Mar-1999  mycroft branches: 1.133.2; 1.133.4;
If copyout() fails, make sure to unbusy the mount point before returning.
 1.132  24-Mar-1999  mrg completely remove Mach VM support. all that is left is the
header files as UVM still uses (most of) these.
 1.131  22-Mar-1999  sommerfe Regen files based on changes to syscalls.master, vnode_if.src (latter
was changes to comments only, but..)
Build vfs_getcwd.c as standard part of kernel.
Add implementation of fchroot(), since two emulations already had it.
Call vn_isunder() in fchdir(), chroot(), and fchroot() to make it harder
to escape chroot().
 1.130  17-Mar-1999  bouyer Inherit MNT_NOEXEC from the mount point. Without this a user can exec
arbitrary binaries by doing a user mount, even if the admin has carefully
set up his system to avoid arbitrary binary execution.
 1.129  02-Mar-1999  fvdl Fill in vnodecovered in the mount structure before calling VFS_MOUNT anyway,
some things (e.g. unionfs) may depend on it. It's currently ok
for vnodecovered to be set already; it's not for v_mountedhere in
the vnode, though.

From John Darrow.

XXX should probably just extend VFS_MOUNT to take the vnode pointer as
an argument.
 1.128  28-Feb-1999  fvdl Use a SETRECURSE lock before calling VFS_MOUNT in the mount() system call,
since the lock may be taken again. This was the intention of the CANRECURSE
lock already there, but didn't work.

Only fill in the vnode<->mountpoint links (mountedhere and vnodecovered)
after VFS_MOUNT returned successfully. It might happen that something called
from VFS_MOUNT mistook the vnode for an already successfully mounted on
one because of this.
 1.127  10-Dec-1998  christos defopt COMPAT_43
 1.126  01-Dec-1998  kenh Pass MNT_NODEVMTIME flag to lower VFS layer.
 1.125  14-Nov-1998  tls At securelevel >=2, don't allow new mounts, only allow change from rw to ro.
 1.124  13-Nov-1998  thorpej Add a couple more file systems to mountcompatnames[] (even though they
didn't exist in 4.3BSD or NetBSD 0.9) and always put the table into
the kernel. It's going to be needed for VFS sysctls.
 1.123  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
 1.122  31-Jul-1998  perry fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.
 1.121  05-Jul-1998  jonathan branches: 1.121.2;
* defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.120  30-Jun-1998  thorpej Implement pread(2), pwrite(2), preadv(2), and pwritev(2).
 1.119  24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.118  22-Jun-1998  sommerfe defopt for options FIFO
 1.117  05-Jun-1998  kleink Per IEEE Std 1003.1b-1993, implement the fdatasync() system call which is
identical to fsync() with the exception of not being required to synchronize
file status information.
 1.116  05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.115  27-Mar-1998  kleink Per X/Open CAE Spec Issue 5 Version 2, change the buffer size argument of
readlink() from type `int' to type `size_t'. This isn't an ABI change, since
the calling convention of our only LP64 platform (the Alpha) already promotes
this argument to a `long'.

This may not be the final action on this matter; readlink() still returns
an `int', which may change in a future revision of the standard.
 1.114  10-Mar-1998  kleink Move the permission check in change_owner() back to ufs_vnops::ufs_chown()
again - the facility required in this context would be a filesystem-specific
super-user determination, which is not available yet. Also, add some
clarification to a comment.
 1.113  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.112  14-Feb-1998  kleink * Factor out some permission-checking code from ufs_setattr() into
change_owner().
* Change the semantics of chown(), fchown() and lchown(): when requesting a
change of the owner of a file, clear the set-user-id bit; analogous behaviour
for group changes.
* Since the above is a violation of the semantics specified by POSIX and
X/Open, add corresponding compatibility syscalls: __posix_chown(),
__posix_fchown(), __posix_lchown(). (Neither fchown() nor lchown() is
specified by POSIX; the prefix is intended to reflect the semantics.)
* Rename posix_rename() to __posix_rename() to follow the above convention.
 1.111  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.110  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.109  03-Feb-1998  thorpej sys_mount(): Use vfs_getopsbyname() rather than groveling the vfssw[]
manually.
 1.108  21-Dec-1997  kleink Update to last commit: do not pass the accounting flag to suser(), since the call does not actually *use* super-user privileges. Pointed out by Charles.
 1.107  21-Dec-1997  kleink Due to the feedback received, change chown(), fchown() and lchown() not to
clear the setgid and setuid bits if called by the superuser. Addresses
PR kern/4662.
 1.106  30-Oct-1997  enami Conditionalize the recognition of symbolic link permission by
per fs mount option `symperm'.
 1.105  20-Oct-1997  thorpej branches: 1.105.2;
Fix the shared library versioning snafu caused by the recent changes
to the stat(2) family and msync(2). This uses a primitive function
versioning scheme.

This reverts the libc shared library major version from 13 to 12, and
adds a few new interfaces to bring us to libc version 12.20.

From Frank van der Linden <fvdl@NetBSD.ORG>.
 1.104  19-Oct-1997  mycroft After conversion of the file flags, if neither FREAD nor FWRITE is set,
return EINVAL.
 1.103  19-Oct-1997  mycroft Update comment.
 1.102  11-Oct-1997  enami Check read permission of symbolic link in vfs layer, when doing readlink(2).
Suggested by der Mouse. Ok'ed by Jason R. Thorpe.
 1.101  10-Oct-1997  fvdl Add vn_readdir function for use in both the old getdirentries and
the new getdents(). Add getdents().
 1.100  09-Oct-1997  thorpej In sys_mount(), use vfs_getopsbyname() rather than using an explicit
reference to vfssw[].
 1.99  06-Oct-1997  thorpej If COMPAT_09 or COMPAT_43 are defined, include a table of "mount compatnames",
which maps the old file system index numbers to the new (well, since after
NetBSD 0.9) string-based method of finding a file system ops vector. Use
this table rather than assuming the ordering of the vfssw[] array when
emulating the old mount system call.
 1.98  03-Oct-1997  enami New functions sys_lchmod(), sys_lchown() and sys_lutimes() to manipulate
symbolic links.
 1.97  03-Oct-1997  enami - New function change_mode() to set mode given a vnode.
- New function change_utimes() to set access and modification times
given a vnode.
- In the function sys_chmod() and sys_fchmod(), call change_mode().
- In the function sys_utimes() and sys_futimes(), call
change_utimes().
 1.96  03-Oct-1997  enami Reorder some piece of code;

In the function sys_utimes, do NDINIT() and namei() first.
In the function sys_futimes, do getvnode() first.
 1.95  03-Oct-1997  enami In the function sys_chmod and sys_utimes, use VOP_UNLOCK(vp) and vrele(vp)
instead of vput(vp).
 1.94  03-Oct-1997  enami Fold lone line to fit column < 80.
 1.93  03-Oct-1997  enami Cosmetic change;

(error = ...) -> (error = ...) != 0, like other place.
 1.92  25-Aug-1997  kleink Lseek(2) usage cleanup: the use of L_SET/L_INCR/L_XTND is deprecated,
use SEEK_SET/SEEK_CUR/SEEK_END instead.
 1.91  24-Jun-1997  fvdl branches: 1.91.4;
Invalidate publicly exported FS info when unmounting it locally.
 1.90  18-May-1997  kleink Add posix_rename() syscall.
 1.89  08-May-1997  mycroft Pass the vnode type to vaccess(), and use it when checking VEXEC. Make sure
that the mode bits passed to vaccess() and returned by foo_getattr() contain
only permission bits.
 1.88  08-May-1997  mycroft VEXEC -> VLOOKUP, as appropriate.
 1.87  08-May-1997  mycroft va_mode contains stat bits. Use S_IS[UG]ID rather than VS[UG]ID.
 1.86  30-Apr-1997  kleink * Make chown()/fchown() use a piece of common code to set ownership.
* Setting the ownership of a file now implies clearing its set-{group,user}-id
bits.
 1.85  11-Apr-1997  kleink Addendum to last commit: "simplify" usage of a vnode pointer.
 1.84  11-Apr-1997  kleink Use VOP_SEEK() in lseek(2).
 1.83  09-Apr-1997  kleink Back out POSIX.1 conformance change to lseek(2); this will be attended to
in a different way.
 1.82  07-Apr-1997  kleink Back out last change to rename(2) until a sane solution for the coexistence
of both BSD and POSIX semantics is available.
 1.81  04-Apr-1997  kleink Changed lseek(2): return EINVAL upon attempt to seek to negative offset.
 1.80  04-Apr-1997  kleink Converted rename(2) to proper POSIX.1 behavior: if "from" and "to"
are links to the same file, do nothing. This also eliminates the
previous (and incorrect) check, which was far more complicated.
 1.79  13-Mar-1997  fvdl Add missing part of MNT_NOATIME commit: add it to the flags that can
be set by the mount system call.
 1.78  22-Feb-1997  fvdl Implement changes to make fix for NQNFS and MFS unmounting (race conditions)
work. Not quite as good as with the Lite2 merges, but it'll do until then.

* dounmount() expects to be called with the mountpoint marked busy
* all callers of dounmount() thus make the call themselves
* if a filesystem was being unmounted, and we're woken up in vfs_busy(),
don't reference the mountpoint struct pointer, as it has very probably
been freed.
 1.77  20-Feb-1997  mikel sync filesystems in reverse order. suggested originally by Jim Rees
<rees@citi.umich.edu>, with some updating by Greg Hudson <ghudson@mit.edu>.
 1.76  13-Feb-1997  tls sync needs to clean VM objects backed by vnode pagers
 1.75  10-Feb-1997  fvdl If the target for a rename() call exists, it will be removed. So, don't
leave any pages around (i.e., insert a vnode_pager_uncache()).
 1.74  22-Dec-1996  cgd branches: 1.74.4;
* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
 1.73  23-Oct-1996  cgd permit MNT_NOCOREDUMP as a generic mount flag.
 1.72  21-Oct-1996  jtc Return ESPIPE when filedes is associated with a FIFO.
 1.71  23-Apr-1996  mycroft branches: 1.71.4;
Implement futimes().
 1.70  22-Mar-1996  thorpej Move an #ifdef FIFO so this compiles on a SPARC (-Wall) if FIFO is not
defined.
 1.69  18-Mar-1996  fvdl Remove previously introduced bug: always make sure mappings of a removed
file don't stick around.
 1.68  09-Feb-1996  christos More proto fixes
 1.67  09-Feb-1996  mycroft Rearrange the locking in sys_unlink(), more like nfsrv_remove().
 1.66  09-Feb-1996  mycroft Fix vop_link, vop_symlink, and vop_remove semantics in several ways:
* Change the argument names to vop_link so they actually make sense.
* Implement vop_link and vop_symlink for all file systems, so they do proper
cleanup.
* Require the file system to decide whether or not linking and unlinking of
directories is allowed, and disable it for all current file systems.
 1.65  08-Feb-1996  mycroft No need for LOCKPARENT in sys_lstat(), and eliminate dead variables.
 1.64  07-Feb-1996  jtc Revert to sane symlink semantics. This is something we should have done
long ago. Fixes many PRs.
 1.63  04-Feb-1996  christos First pass at prototyping
 1.62  02-Feb-1996  mycroft Do the previous change a little differently.
 1.61  01-Feb-1996  jtc Rename struct timespec fields to conform to POSIX.1b
 1.60  30-Jan-1996  mycroft Add a vnode** argument to getvnode(), prototype it, and make it return
EBADF if the file descriptor has been revoked.
 1.59  11-Nov-1995  mycroft ffs -> ufs
 1.58  07-Nov-1995  gwr Make sys_mount accept "ufs" as an alias for "ffs"
 1.57  07-Oct-1995  mycroft branches: 1.57.2;
Prefix names of system call implementation functions with `sys_'.
 1.56  19-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.55  24-Jun-1995  christos Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).
 1.54  18-Jun-1995  cgd don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.53  01-Jun-1995  jtc Moved egid credential from cr_groups[0] to new field cr_gid. POSIX.1
requires that sgid executables and the setuid() syscall *not* change
the supplemental group list.
 1.52  10-May-1995  christos tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed
 1.51  09-Mar-1995  mycroft copy*str() should use size_t.
 1.50  08-Mar-1995  cgd use NULL rather than casted zero
 1.49  05-Mar-1995  fvdl Two more "|| defined(COMPAT_LINUX)" that I somehow missed first time around.
 1.48  05-Mar-1995  fvdl Extended a couple of defines with "|| defined(COMPAT_LINUX)" to make
things compile without requiring COMPAT_43 and/or COMPAT_09.
 1.47  18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.46  15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.45  14-Dec-1994  mycroft Revert open() completely.
 1.44  14-Dec-1994  mycroft Revert dup handling. Remove extra arg to vn_open().
 1.43  14-Dec-1994  mycroft Sync with CSRG.
 1.42  13-Dec-1994  mycroft LEASE_CHECK -> VOP_LEASE
 1.41  13-Dec-1994  mycroft Minor changes.
 1.40  04-Dec-1994  mycroft Abstract out the code to maintain fd_lastfile. Remove the old dup() compatibility
kluge. Rearrange fdopen() handling. Make a common function to handle closing
a particular file descriptor in a process. Some other cleanup.
 1.39  18-Nov-1994  christos Don't VOP_UNLOCK the vnode on a cloning operation. vput() will do it for
us.
 1.38  17-Nov-1994  christos Added ifdef COMPAT_SVR4 to the kernel compat code needed.
 1.37  14-Nov-1994  christos added extra argument in vn_open and VOP_OPEN to allow cloning devices
 1.36  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.35  20-Oct-1994  cgd update for new syscall args description mechanism
 1.34  22-Sep-1994  mycroft Maintain vfs reference counts.
 1.33  15-Aug-1994  mycroft Need ostat() and olstat() for iBCS2 syscall conversion.
 1.32  13-Aug-1994  mycroft Fix a problem in sync() where we might keep a stale pointer to the next mount
entry.
 1.31  29-Jun-1994  cgd branches: 1.31.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.30  22-Jun-1994  mycroft Make ogetdirentries() if COMPAT_HPUX.
 1.29  16-Jun-1994  mycroft Update to union mount code from JSP.
 1.28  08-Jun-1994  mycroft Update to 4.4-Lite fs code.
 1.27  18-May-1994  cgd put sync printing in one place
 1.26  07-May-1994  cgd stub pathconf, kill some spaces
 1.25  04-May-1994  cgd expand the rlimit struct, kill last vestiges of off_t bogosity.
 1.24  29-Apr-1994  cgd kill syscall name aliases. no user-visible changes
 1.23  25-Apr-1994  cgd some prototype cleanup, eliminate/replace bogus types (e.g. quad and
u_quad) -> use better types (e.g. quad_t & u_quad_t in inodes),
some cleanup.
 1.22  21-Apr-1994  cgd Convert mount, vnode, and buf structs to use <sys/queue.h>. Also,
some knf and structure frobbing to do along with it.
 1.21  16-Apr-1994  cgd start to phase out temp. off_t syscalls
 1.20  16-Apr-1994  cgd slightly loosen lseek restriction
 1.19  14-Apr-1994  cgd fs types are names now; accompanying changes.
 1.18  07-Apr-1994  cgd if MNT_USER is set, let fs authenticate unmount
 1.17  02-Apr-1994  cgd frob arguments a little bit
 1.16  27-Mar-1994  cgd expand uid_t/gid_t/off_t
 1.15  01-Feb-1994  mycroft Fix that last bug correctly.
 1.14  01-Feb-1994  pk Replace a bogus pointer-dereference with something that at least *looks*
more sensible.
 1.13  13-Jan-1994  cgd fix utimes() to deal with NULL timeval ptr
 1.12  04-Jan-1994  cgd add support for union and loopback mounts, from jsp
 1.11  04-Jan-1994  cgd generalize dupfdopen() to allow dups and moves. from jsp
 1.10  18-Dec-1993  mycroft Canonicalize all #includes.
 1.9  27-Oct-1993  cgd BSDI official patch #15:
SUMMARY:
"panic: vrele: null vp", the problem seems to be that two renames are
moving the same source, and the second one can't do it.
ALSO:
in sync, check that rootfs is non-null before using it.
 1.8  07-Sep-1993  ws branches: 1.8.2;
Changes to VFS readdir semantics
NFS changes for better cookie support
ISOFS changes for better Rockridge support and support for generation numbers
 1.7  03-Aug-1993  mycroft Cosmetic change to VOP_ADVLOCK() fix.
 1.6  02-Aug-1993  mycroft Collapse a bunch of `if (a & x) b |= x; else b &= ~x;' statements.
Whoever wrote this fugly code must've been on drugs.
 1.5  01-Aug-1993  mycroft Add RCS identifiers (this time on the correct side of the branch), and
incorporate recent changes in netbsd-0-9 branch.
 1.4  18-Jul-1993  mycroft branches: 1.4.2;
Nuke a kluge from Net/2. The argument list ocreat() creates for open() can
now be a struct open_args; no need to redefine the structure.
 1.3  15-Jul-1993  cgd gcc2 cleanup, and break args out of procedure def'ns
 1.2  20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.4  01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.1  02-Aug-1993  cgd fix from Charles Hannum for the "every process trying to ilock() /"
problem, which would show up because calls to VOP_ADVLOCK() were
being made with the vnode locked.
 1.8.2.1  14-Nov-1993  mycroft Canonicalize all #includes.
 1.31.2.3  06-Oct-1994  mycroft Update from trunk.
 1.31.2.2  15-Aug-1994  mycroft update from trunk
 1.31.2.1  13-Aug-1994  mycroft update from trunk
 1.57.2.2  02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.57.2.1  07-Nov-1995  gwr From HEAD: Make sys_mount accept "ufs" as an alias for "ffs"
 1.71.4.1  11-Dec-1996  mycroft From trunk:
Seeking on a FIFO should return ESPIPE.
 1.74.4.1  12-Mar-1997  is Merge in changes from Trunk
 1.91.4.2  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.91.4.1  28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.105.2.1  31-Oct-1997  mellon Pull rev 1.106 up from trunk (enami)
 1.121.2.1  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.133.4.5  02-Aug-1999  thorpej Update from trunk.
 1.133.4.4  11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.133.4.3  02-Jul-1999  thorpej Add UBC glue to sys_fhopen() (because it has some vn_open() stuff inline).
 1.133.4.2  01-Jul-1999  thorpej Sync w/ -current.
 1.133.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.133.2.2  27-Jun-2000  he Pull up revision 1.158 (requested by pooka):
Do not leave a vnode around when trying (and failing) to mount
a file system with securelevel 2. A second attempt would freeze
the system.
 1.133.2.1  01-Feb-2000  he Pull up revision 1.149 (via patch, requested by assar):
Move the declaration of `vnops' to a header file, for the
benefit of LKMs.
 1.147.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.147.4.5  26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.147.4.4  23-Oct-1999  fvdl Do previous better; add VT_VFS tag check to DIAGNOSTIC case in insmntqueue.
 1.147.4.3  23-Oct-1999  fvdl Clear the MNT_UNMOUNT flag before calling vfs_allocate_syncvnode. It
will cause insmntqueue to be called, which will cause a panic.
 1.147.4.2  21-Oct-1999  fvdl Add workaround hacks to enable the softdep code to call getnewvnode()
when a filesystem is being unmounted. The problem is that the softdep
code stored inode numbers in the worklist structures, and does not
use vnodes. So VFS_VGET must be used to get a vnode during the final
flush stages, and this can call getnewvnode(), resulting in
a vfs_busy() + MNT_UNMOUNT hang.

I've tried to make the softdep code use vnodes, but that's a pain,
since it gets called at points where vnode ops are dangerous (i.e.
interrupt context, and uncertainty whether a vnode is locked, etc).

This is all icky stuff, but it does get things much closer to a
working state.
 1.147.4.1  19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.147.2.3  21-Apr-2001  bouyer Sync with HEAD
 1.147.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.147.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.156.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.158.2.3  06-Jun-2002  he Pull up revision 1.174 (requested by enami):
Do not release the lock on the mount point vnode too early when
doing an update mount, to avoid a race condition and eventual
panic.
 1.158.2.2  14-Dec-2000  he Pull up revision 1.162 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.158.2.1  27-Jul-2000  mycroft Approved by thorpej:
Free the syncer vnode at unmount time.

syssrc/sys/kern/vfs_syscalls.c 1.159 -> 1.160
syssrc/sys/miscfs/syncfs/sync_subr.c 1.3 -> 1.4
syssrc/sys/miscfs/syncfs/sync_vnops.c 1.2 -> 1.3
syssrc/sys/miscfs/syncfs/syncfs.h 1.3 -> 1.4
syssrc/sys/sys/vnode.h 1.82 -> 1.83
 1.164.2.14  11-Nov-2002  nathanw Catch up to -current
 1.164.2.13  18-Oct-2002  nathanw Catch up to -current.
 1.164.2.12  17-Sep-2002  nathanw Catch up to -current.
 1.164.2.11  27-Aug-2002  nathanw Catch up to -current.
 1.164.2.10  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.164.2.9  20-Jun-2002  nathanw Catch up to -current.
 1.164.2.8  29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.164.2.7  14-Nov-2001  nathanw Catch up to -current.
 1.164.2.6  22-Oct-2001  nathanw Catch up to -current.
 1.164.2.5  25-Sep-2001  nathanw LWPify new code.
 1.164.2.4  21-Sep-2001  nathanw Catch up to -current.
 1.164.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.164.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.164.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.167.2.6  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.167.2.5  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.167.2.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.167.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.167.2.2  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.167.2.1  03-Aug-2001  lukem update to -current
 1.168.2.3  01-Oct-2001  fvdl Catch up with -current.
 1.168.2.2  26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.168.2.1  18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.171.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.173.4.1  11-Mar-2002  thorpej Make syncer_lock an adaptive mutex and rename it to syncer_mutex.
 1.174.2.1  29-Aug-2002  gehenna catch up with -current.
 1.178.2.2  19-Dec-2002  gmcgarry crget() + memcpy -> crdup(). From David Laight.
 1.178.2.1  18-Dec-2002  gmcgarry Merge pcred and ucred, and poolify. TBD: check backward compatibility
and factor-out some higher-level functionality.
 1.190.2.11  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.190.2.10  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.190.2.9  04-Feb-2005  skrll Sync with HEAD.
 1.190.2.8  17-Jan-2005  skrll Sync with HEAD.
 1.190.2.7  18-Dec-2004  skrll Sync with HEAD.
 1.190.2.6  19-Oct-2004  skrll Sync with HEAD
 1.190.2.5  21-Sep-2004  skrll Fix the sync with head I botched.
 1.190.2.4  18-Sep-2004  skrll Sync with HEAD.
 1.190.2.3  24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.190.2.2  03-Aug-2004  skrll Sync with HEAD
 1.190.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.205.6.1  06-Feb-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11015):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.

Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.

Fix suggested by Elad, hand-typed by me.
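
The single-exit pattern this fix describes can be sketched as follows. All names here (fake_file, file_use, file_unuse, do_op) are hypothetical stand-ins, not the kernel's FILE_USE/FILE_UNUSE macros or the actual patched function; the point is only that every error path sets error and leaves through the same label, so the reference is always dropped.

```c
#include <errno.h>

/* Hypothetical stand-in for a file object with a use count. */
struct fake_file {
	int usecount;
	int is_vnode;
};

static void file_use(struct fake_file *fp)   { fp->usecount++; }
static void file_unuse(struct fake_file *fp) { fp->usecount--; }

static int
do_op(struct fake_file *fp, int arg)
{
	int error = 0;

	file_use(fp);

	if (!fp->is_vnode) {
		error = EINVAL;		/* set error first ... */
		goto out;		/* ... then take the common exit */
	}
	if (arg < 0) {
		error = ERANGE;
		goto out;
	}

	/* The real work would happen here. */

out:
	file_unuse(fp);			/* runs on every path */
	return error;
}
```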
 1.205.4.1  06-Feb-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11015):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.

Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.

Fix suggested by Elad, hand-typed by me.
 1.205.2.1  06-Feb-2007  bouyer Pull up following revision(s) (requested by adrianp in ticket #11015):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.

Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.

Fix suggested by Elad, hand-typed by me.
 1.214.2.1  29-Apr-2005  kent sync with -current
 1.215.2.2  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.215.2.1  12-Feb-2005  yamt sync with head.
 1.217.2.13  26-Jun-2007  ghen Pull up following revision(s) (requested by blymn in ticket #1471):
sys/kern/kern_verifiedexec.c: patch
sys/kern/vfs_syscalls.c: patch
sys/sys/verified_exec.h: patch
Prevent users from renaming a file to a veriexec protected file and from
running unfingerprinted files at strict level two or above.
 1.217.2.12  26-Jun-2007  ghen Revert #1471 in favour of #1751.
 1.217.2.11  03-Mar-2007  bouyer Pull up following revision(s) (requested by wrstuden in ticket #1616):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.
Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.
Fix suggested by Elad, hand-typed by me.
 1.217.2.10  02-Nov-2006  tron Pull up following revision(s) (requested by elad in ticket #1471):
sys/kern/vfs_syscalls.c: revision 1.254 via patch
sys/kern/kern_verifiedexec.c: revision 1.58 via patch
Add destination file vnode to rename checking.
 1.217.2.9  20-Jan-2006  riz branches: 1.217.2.9.2;
Back out tickets 490, 559, and 560, which added "magic symlinks", at
the request of chs@ (thorpej@ concurs), as there is consensus that
this should be changed to a system-wide tunable, rather than a mount
option.
 1.217.2.8  29-Dec-2005  riz Pull up following revision(s) (requested by thorpej in ticket #490):
lib/libc/sys/mount.2: revision 1.33
sys/sys/systm.h: revision 1.179
sys/sys/fstypes.h: revision 1.4
include/mntopts.h: revision 1.6
sys/conf/newvers.sh: revision 1.41
sys/kern/vfs_syscalls.c: revision 1.223
sys/conf/files: revision 1.720
sys/kern/vfs_lookup.c: revision 1.61
share/man/man7/symlink.7: revision 1.7
sbin/mount/mount.8: revision 1.47
sys/kern/init_main.c: revision 1.248 via patch
share/man/man4/options.4: revision 1.280 via patch
Implement expansion of special "magic" strings in symlinks into
system-specific values. Submitted by Chris Demetriou in Nov 1995 (!)
in PR kern/1781, modified only slightly by me.
This is enabled on a per-mount basis with the MNT_MAGICLINKS mount
flag. It can be enabled at mountroot() time by building the kernel
with the ROOTFS_MAGICLINKS option.
The following magic strings are supported by the implementation:
    @machine        value of MACHINE for the system
    @machine_arch   value of MACHINE_ARCH for the system
    @hostname       the system host name, as set with sethostname()
    @domainname     the system domain name, as set with setdomainname()
    @kernel_ident   the kernel config file name
    @osrelease      the release number of the OS
    @ostype         the name of the OS (always "NetBSD" for NetBSD)
Example usage:
mkdir /arch/i386/bin
mkdir /arch/sparc/bin
ln -s /arch/@machine_arch/bin /bin
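
A userland sketch of the substitution idea follows. It is not the kernel implementation: the struct magic table, the expand_magic function, and the longest-match rule are assumptions made for illustration, and the real matching rules in vfs_lookup.c may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a magic name and its replacement value. */
struct magic {
	const char *name;	/* e.g. "@machine_arch" */
	const char *value;	/* e.g. "sparc" */
};

/*
 * Expand magic strings found in src into dst (capacity dstlen bytes,
 * including the terminating NUL).  Returns 0 on success, -1 if the
 * expanded result does not fit.
 */
static int
expand_magic(const char *src, char *dst, size_t dstlen,
    const struct magic *tab, size_t ntab)
{
	size_t out = 0;

	if (dstlen == 0)
		return -1;

	while (*src != '\0') {
		const struct magic *best = NULL;
		size_t bestlen = 0, i;

		if (*src == '@') {
			/*
			 * Assumption: prefer the longest matching name, so
			 * "@machine_arch" is not mistaken for "@machine".
			 */
			for (i = 0; i < ntab; i++) {
				size_t n = strlen(tab[i].name);

				if (n > bestlen &&
				    strncmp(src, tab[i].name, n) == 0) {
					best = &tab[i];
					bestlen = n;
				}
			}
		}

		if (best != NULL) {
			size_t vlen = strlen(best->value);

			if (out + vlen >= dstlen)
				return -1;	/* no room for value + NUL */
			memcpy(dst + out, best->value, vlen);
			out += vlen;
			src += bestlen;
		} else {
			if (out + 1 >= dstlen)
				return -1;	/* no room for char + NUL */
			dst[out++] = *src++;
		}
	}
	dst[out] = '\0';
	return 0;
}
```

Given a table mapping "@machine_arch" to "sparc", the link target /arch/@machine_arch/bin from the example usage would expand to /arch/sparc/bin.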
 1.217.2.7  08-Sep-2005  tron branches: 1.217.2.7.2;
Apply patch (requested by elad in ticket #740):
Defopt VERIFIED_EXEC.
 1.217.2.6  02-Sep-2005  tron Apply patch (requested by elad in ticket #709):
Implements the rename policy. Implications per strict level:
0, 1: Log renames of monitored files.
2: Prevent renames of monitored files.
3: Prevent renames.
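
The per-level rename policy listed above can be written as a small decision function. This is a sketch only: the enum, the function name rename_policy, and the assumption that unmonitored files are simply allowed below level 3 are illustrative and not taken from kern_verifiedexec.c.

```c
/* Hypothetical outcome of a rename check. */
enum rename_action { RENAME_ALLOW, RENAME_LOG, RENAME_DENY };

/*
 * strict_level is assumed to be in the range 0..3; monitored is nonzero
 * if the file being renamed is monitored by veriexec.
 */
static enum rename_action
rename_policy(int strict_level, int monitored)
{
	if (strict_level >= 3)
		return RENAME_DENY;	/* 3: prevent renames */
	if (!monitored)
		return RENAME_ALLOW;	/* assumption: unmonitored files pass */
	if (strict_level == 2)
		return RENAME_DENY;	/* 2: prevent renames of monitored files */
	return RENAME_LOG;		/* 0, 1: log renames of monitored files */
}
```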
 1.217.2.5  23-Aug-2005  tron Backout ticket 685. It causes build failures.
 1.217.2.4  23-Aug-2005  tron Pull up revision 1.226 (requested by elad in ticket #685):
defopt verified_exec.
 1.217.2.3  02-Jul-2005  tron Pull up revision 1.222 (requested by elad in ticket #487):
More veriexec changes:
- Better organize strict level. Now we have 4 levels:
  - Level 0, learning mode: Warnings only about anything that might've
    resulted in 'access denied' or similar in a higher strict level.
  - Level 1, IDS mode:
    - Deny access on fingerprint mismatch.
    - Deny modification of veriexec tables.
  - Level 2, IPS mode:
    - All implications of strict level 1.
    - Deny write access to monitored files.
    - Prevent removal of monitored files.
    - Enforce access type - 'direct', 'indirect', or 'file'.
  - Level 3, lockdown mode:
    - All implications of strict level 2.
    - Prevent creation of new files.
    - Deny access to non-monitored files.
- Update sysctl(3) man-page with above. (date bumped too :)
- Remove FINGERPRINT_INDIRECT from possible fp_status values; it's no
  longer needed.
- Simplify veriexec_removechk() in light of new strict level policies.
- Eliminate use of 'securelevel'; veriexec now behaves according to
  its strict level only.
 1.217.2.2  10-Jun-2005  tron Pull up revision 1.219 (requested by elad in ticket #389):
Rototill of the verified exec functionality.
* We now use hash tables instead of a list to store the in kernel
fingerprints.
* Fingerprint methods handling has been made more flexible, it is now
even simpler to add new methods.
* the loader no longer passes in magic numbers representing the
fingerprint method so veriexecctl is no longer kernel specific.
* fingerprint methods can be tailored out using options in the kernel
config file.
* more fingerprint methods added - rmd160, sha256/384/512
* veriexecctl can now report the fingerprint methods supported by the
running kernel.
* regularised the naming of some portions of veriexec.
 1.217.2.1  13-Apr-2005  tron Pull up revision 1.218 (requested by yamt in ticket #142):
sys_mount:
- reject attempts of MNT_GETARGS + other MNT_xxx.
- don't modify mnt_flags needlessly for MNT_GETARGS.
a stopgap fix for PR/29898.
 1.217.2.9.2.2  23-Jun-2007  ghen Pull up following revision(s) (requested by blymn in ticket #1471):
sys/kern/kern_verifiedexec.c: patch
sys/kern/vfs_syscalls.c: patch
Prevent users from renaming a file to a veriexec protected file and from
running unfingerprinted files at strict level two or above.
 1.217.2.9.2.1  03-Mar-2007  bouyer Pull up following revision(s) (requested by wrstuden in ticket #1616):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.
Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.
Fix suggested by Elad, hand-typed by me.
 1.217.2.7.2.2  23-Jun-2007  ghen Pull up following revision(s) (requested by blymn in ticket #1471):
sys/kern/kern_verifiedexec.c: patch
sys/kern/vfs_syscalls.c: patch
Prevent users from renaming a file to a veriexec protected file and from
running unfingerprinted files at strict level two or above.
 1.217.2.7.2.1  03-Mar-2007  bouyer Pull up following revision(s) (requested by wrstuden in ticket #1616):
sys/kern/vfs_syscalls.c: revision 1.293 via patch
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.
Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.
Fix suggested by Elad, hand-typed by me.
 1.223.2.9  24-Mar-2008  yamt sync with head.
 1.223.2.8  04-Feb-2008  yamt sync with head.
 1.223.2.7  21-Jan-2008  yamt sync with head
 1.223.2.6  07-Dec-2007  yamt sync with head
 1.223.2.5  27-Oct-2007  yamt sync with head.
 1.223.2.4  03-Sep-2007  yamt sync with head.
 1.223.2.3  26-Feb-2007  yamt sync with head.
 1.223.2.2  30-Dec-2006  yamt sync with head.
 1.223.2.1  21-Jun-2006  yamt sync with head.
 1.235.6.2  01-Jun-2006  kardel Sync with head.
 1.235.6.1  22-Apr-2006  simonb Sync with head.
 1.235.4.1  09-Sep-2006  rpaulo sync with head
 1.235.2.2  18-Feb-2006  yamt sync with head.
 1.235.2.1  31-Dec-2005  yamt uio_segflg/uio_lwp -> uio_vmspace.
 1.238.6.2  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.238.6.1  28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.238.4.5  11-May-2006  elad sync with head
 1.238.4.4  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.238.4.3  19-Apr-2006  elad sync with head.
 1.238.4.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.238.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.238.2.5  14-Sep-2006  yamt sync with head.
 1.238.2.4  11-Aug-2006  yamt sync with head
 1.238.2.3  26-Jun-2006  yamt sync with head.
 1.238.2.2  24-May-2006  yamt sync with head.
 1.238.2.1  01-Apr-2006  yamt sync with head.
 1.242.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.242.2.1  19-Jun-2006  chap Sync with head.
 1.266.2.1  16-Aug-2006  tron Pull up following revision(s) (requested by yamt in ticket #24):
sys/kern/vfs_syscalls.c: revision 1.267
vfs_copyinfh_alloc: kludge for nfsv2 file handles.
 1.267.2.6  09-Feb-2007  ad Sync with HEAD.
 1.267.2.5  01-Feb-2007  ad Sync with head.
 1.267.2.4  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.267.2.3  12-Jan-2007  ad Sync with head.
 1.267.2.2  18-Nov-2006  ad Sync with head.
 1.267.2.1  11-Sep-2006  ad - Convert some locks to mutexes and RW locks.
- Use the proclist_lock to protect pgrps and sessions in some places.
 1.270.2.3  18-Dec-2006  yamt sync with head.
 1.270.2.2  10-Dec-2006  yamt sync with head.
 1.270.2.1  22-Oct-2006  yamt sync with head
 1.279.2.6  20-Mar-2011  bouyer Pull up following revision(s) (requested by dholland in ticket #1417):
sys/kern/vfs_syscalls.c: revision 1.415 via patch
Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)
Pullup candidate.
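
An up-front check of this kind might look like the sketch below. It is written as a standalone userland function using the POSIX R_OK/W_OK/X_OK constants; the function name check_access_flags and the choice of EINVAL are assumptions, not a copy of revision 1.415.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Reject any bits outside the access modes POSIX defines, before doing
 * any further work.  F_OK (existence check) has no extra bits to test,
 * so it passes.
 */
static int
check_access_flags(int flags)
{
	if ((flags & ~(R_OK | W_OK | X_OK)) != 0)
		return EINVAL;	/* assumed error code for bogus flags */
	return 0;
}
```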
 1.279.2.5  28-Feb-2007  pavel branches: 1.279.2.5.6;
Pull up following revision(s) (requested by pooka in ticket #480):
sys/kern/vfs_syscalls.c: revision 1.303
avoid lock leak in error branch of sys_fchdir()
thanks to Tom Spindler and Greg Oster in helping find the cure
 1.279.2.4  17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.279.2.3  11-Feb-2007  tron Pull up following revision(s) (requested by elad in ticket #412):
sys/kern/vfs_syscalls.c: revision 1.299
Initialize pathname_t objects to NULL.
 1.279.2.2  21-Jan-2007  bouyer Pull up following revision(s) (requested by wrstuden in ticket #374):
sys/kern/vfs_syscalls.c: revision 1.293
Fix issue noted by Ilja van Sprundel and disclosed at 23C3.
Make sure we always FILE_UNUSE the file. To make it easier, exit
via a new "out:" exit path that does so, setting error beforehand.
Fix suggested by Elad, hand-typed by me.
 1.279.2.1  03-Jan-2007  tron Pull up following revision(s) (requested by elad in ticket #304):
sys/kern/vfs_syscalls.c: revision 1.282
sys/kern/vfs_lookup.c: revision 1.76
sys/sys/namei.h: revision 1.47
PR/35278: YAMAMOTO Takashi: veriexec sometimes feeds user va to log(9)
Introduce the (intentionally undocumented) pathname_get(), pathname_path(),
and pathname_put(), to deal with allocating and copying of pathnames from
either kernel- or user-space.
 1.279.2.5.6.1  20-Mar-2011  bouyer Pull up following revision(s) (requested by dholland in ticket #1417):
sys/kern/vfs_syscalls.c: revision 1.415 via patch
Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)
Pullup candidate.
 1.300.2.5  17-May-2007  yamt sync with head.
 1.300.2.4  07-May-2007  yamt sync with head.
 1.300.2.3  15-Apr-2007  yamt sync with head.
 1.300.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.300.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.306.4.1  11-Jul-2007  mjf Sync with head.
 1.306.2.16  09-Oct-2007  ad Sync with head.
 1.306.2.15  03-Oct-2007  ad - Don't do proc_vmspace_getref() in dofileread() and friends. They only
ever access the caller's vmspace so it's not going to go away. Instead
just use curproc->p_vmspace. Fixes high lock contention during file I/O
by multithreaded processes. Don't pass in lwp_t *l, it's confusing.

- Drain cleaned vnodes from mountpoints just before checking to see
if there are danglers. Cleaned vnodes now stick around on mountpoint
lists until reused, as it's too expensive to pull them off earlier.
 1.306.2.14  20-Aug-2007  ad Sync with HEAD.
 1.306.2.13  20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.306.2.12  15-Aug-2007  yamt don't destroy mnt_mutex twice.
 1.306.2.11  29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.306.2.10  15-Jul-2007  ad Sync with head.
 1.306.2.9  15-Jul-2007  ad Sync with head.
 1.306.2.8  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.306.2.7  09-Jun-2007  ad Sync with head.
 1.306.2.6  08-Jun-2007  ad Sync with head.
 1.306.2.5  13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.306.2.4  13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.306.2.3  10-Apr-2007  ad Sync with head.
 1.306.2.2  21-Mar-2007  ad - Put a lock around the proc's CWD info (work in progress).
- Replace some more simplelocks.
- Make lbolt a condvar.
 1.306.2.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.322.2.2  03-Sep-2007  skrll Sync with HEAD.
 1.322.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.324.4.2  31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.324.4.1  31-Jul-2007  pooka file vfs_syscalls.c was added on branch matt-mips64 on 2007-07-31 21:14:22 +0000
 1.324.2.7  09-Dec-2007  jmcneill Sync with HEAD.
 1.324.2.6  03-Dec-2007  joerg Sync with HEAD.
 1.324.2.5  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.324.2.4  28-Oct-2007  joerg Sync with HEAD.
 1.324.2.3  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.324.2.2  03-Sep-2007  jmcneill Sync with HEAD.
 1.324.2.1  16-Aug-2007  jmcneill Sync with HEAD.
 1.325.2.3  23-Mar-2008  matt sync with HEAD
 1.325.2.2  09-Jan-2008  matt sync with HEAD
 1.325.2.1  06-Nov-2007  matt sync with HEAD
 1.327.2.1  14-Oct-2007  yamt sync with head.
 1.329.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.331.2.3  18-Feb-2008  mjf Sync with HEAD.
 1.331.2.2  27-Dec-2007  mjf Sync with HEAD.
 1.331.2.1  08-Dec-2007  mjf Sync with HEAD.
 1.333.2.5  26-Dec-2007  ad Sync with head.
 1.333.2.4  15-Dec-2007  ad sys_sync: take kernel_lock around VFS_SYNC().
 1.333.2.3  09-Dec-2007  ad Fix error in previous.
 1.333.2.2  09-Dec-2007  ad do_sys_mount: use vn_setrecurse(), not LK_SETRECURSE.
 1.333.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.334.4.3  10-Jan-2008  bouyer Sync with HEAD
 1.334.4.2  08-Jan-2008  bouyer Sync with HEAD
 1.334.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.345.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.345.6.4  28-Sep-2008  mjf Sync with HEAD.
 1.345.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.345.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.345.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.348.4.3  17-Jun-2008  yamt sync with head.
 1.348.4.2  04-Jun-2008  yamt sync with head
 1.348.4.1  18-May-2008  yamt sync with head.
 1.348.2.5  27-Dec-2008  christos merge with head.
 1.348.2.4  20-Nov-2008  christos merge with head.
 1.348.2.3  09-Nov-2008  christos fix fhstat
make major and minor ull
 1.348.2.2  01-Nov-2008  christos Sync with head.
 1.348.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.350.2.9  09-Oct-2010  yamt sync with head
 1.350.2.8  26-Sep-2010  yamt locking changes
 1.350.2.7  11-Aug-2010  yamt sync with head.
 1.350.2.6  11-Mar-2010  yamt sync with head
 1.350.2.5  19-Aug-2009  yamt sync with head.
 1.350.2.4  18-Jul-2009  yamt sync with head.
 1.350.2.3  24-Jun-2009  yamt lock vnode when calling VOP_GETATTR because there's no reasonable way for
an implementation of VOP_GETATTR to prevent the vnode from being revoked.
 1.350.2.2  04-May-2009  yamt sync with head.
 1.350.2.1  16-May-2008  yamt sync with head.
 1.359.2.6  10-Oct-2008  skrll Sync with HEAD.
 1.359.2.5  24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.359.2.4  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.359.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.359.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.359.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.365.2.4  18-Jul-2008  simonb "mount -u -o log" works now - remove the code that explicitly disabled
this.
 1.365.2.3  27-Jun-2008  simonb Sync with head.
 1.365.2.2  18-Jun-2008  simonb Sync with head.
 1.365.2.1  10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.369.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.369.2.1  19-Oct-2008  haad Sync with HEAD.
 1.376.4.8  25-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.376.4.7  17-Sep-2011  bouyer branches: 1.376.4.7.2;
Pull up following revision(s) (requested by manu in ticket #1658):
sys/rump/include/rump/rump_syscalls.h: revision 1.52 via patch
sys/kern/init_sysent.c: revision 1.257 via patch
sys/rump/include/rump/rumpvnode_if.h: revision 1.12 via patch
lib/libc/sys/Makefile.inc: revision 1.208 via patch
sys/sys/syscallargs.h: revision 1.227 via patch
sys/kern/kern_exec.c: revision 1.317 via patch
sys/rump/librump/rumpkern/rump_syscalls.c: revision 1.74 via patch
include/limits.h: revision 1.30 via patch
sys/kern/syscalls.master: revision 1.251 via patch
sys/sys/vnode_if.h: revision 1.83 via patch
sys/sys/fcntl.h: revision 1.40 via patch
sys/sys/fcntl.h: revision 1.41 via patch
sys/kern/vfs_syscalls.c: revision 1.433 via patch
sys/rump/librump/rumpvfs/rumpvnode_if.c: revision 1.11 via patch
sys/kern/syscalls.c: revision 1.248 via patch
sys/sys/syscall.h: revision 1.244 via patch
lib/libc/sys/link.2: revision 1.25 via patch
include/unistd.h: revision 1.127 via patch
distrib/sets/lists/comp/mi: revision 1.1659 via patch
sys/sys/stat.h: revision 1.61 via patch
First stage of support for Extended API set 2. Most of it is
unimplemented, except enough of linkat(2) to hardlink to a symlink.
Everything new in headers is guarded #ifdef _INCOMPLETE_XOPEN_C063 since
some software (e.g.: xcvs in our own tree) will assume they can use openat(2)
when AT_FDCWD is defined. _INCOMPLETE_XOPEN_C063 will go away once support
is complete.
regen
improve comment about AT_* defines: they are not only used by linkat(2)
Add macros to hide OpenGroup extended API set 2 from GNU configure. This
is a temporary workaround until the implementation is completed.
 1.376.4.6  20-Mar-2011  bouyer Pull up following revision(s) (requested by dholland in ticket #1567):
sys/kern/vfs_syscalls.c: revision 1.415 via patch
Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)
Pullup candidate.
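
The up-front check described for revision 1.415 can be illustrated with a minimal userland sketch. This is not the kernel code: the function name sketch_check_access_flags is hypothetical, and only R_OK, W_OK, X_OK and F_OK are taken from <unistd.h>. The idea is to reject any bit outside the documented access(2) mode bits before any deeper layer sees the flags.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the NetBSD implementation): reject any bit
 * outside the documented access(2) mode bits before doing further work.
 * F_OK is 0, so it always passes this check.
 */
static int
sketch_check_access_flags(int flags)
{
	if ((flags & ~(R_OK | W_OK | X_OK)) != 0)
		return EINVAL;
	return 0;
}
```

A caller would run this check first and return the error immediately, so that later code never has to cope with unknown bits.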
 1.376.4.5  14-Feb-2010  bouyer branches: 1.376.4.5.2;
Pull up following revision(s) (requested by pooka in ticket #1289):
sys/sys/namei.src: revision 1.14
sys/kern/vfs_syscalls.c: revision 1.401
sys/nfs/nfs_serv.c: revision 1.149
sys/sys/namei.h: regen
Define namei flag INRENAME and set it if a lookup operation is part
of rename. This helps with building better asserts for rename in
the DELETE lookup ... the RENAME lookup is quite obviously a part
of rename.
 1.376.4.4  21-Dec-2009  sborrill Pull up the following revision(s) (requested by martin in ticket #1200):
sys/kern/vfs_syscalls.c: revision 1.400

Use the kernel space version of the vfs name, not the original userspace
pointer. Avoids crashes on archs with completely separate userspace VA.
 1.376.4.3  01-Jul-2009  snj Pull up following revision(s) (requested by rmind in ticket #841):
sys/kern/vfs_syscalls.c: revision 1.392
do_sys_utimes: fix a bug introduced by rev.1.367.
VA_UTIMES_NULL is in va_vaflags, not va_flags.
 1.376.4.2  16-Feb-2009  snj branches: 1.376.4.2.2;
Pull up following revision(s) (requested by enami in ticket #435):
sys/kern/vfs_subr.c: revision 1.368
sys/kern/vfs_syscalls.c: revision 1.385
Make revoke(2) work as before:
- vfs_syscalls.c rev. 1.342 fails to invert the condition correctly when
the then-clause and else-clause are swapped. Since then, revoke(2) fails
if it is issued by the file owner.
- Probably since rev. 1.160 of genfs_vnops.c, revoke(2) fails if it is
applied to a non-device file and drops the kernel into ddb.
 1.376.4.1  18-Dec-2008  snj Pull up following revision(s) (requested by elad in ticket #188):
sys/kern/vfs_syscalls.c: revision 1.382
Fix length passed to strlcpy(): we used to get names one character shorter
than reality.
Should be pulled up to netbsd-5.
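
The log does not show the code that was fixed, so the following is only a sketch of how a wrong strlcpy() size argument can yield a name one character short. The size passed to strlcpy(3) counts the terminating NUL, so passing the string length instead of length plus one drops the last character. sketch_strlcpy is a hypothetical local stand-in with the same contract as strlcpy(3), included only so the example compiles where the libc lacks it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strlcpy(3): copy at most size - 1 bytes of
 * src into dst, always NUL-terminate when size is non-zero, and return
 * strlen(src).
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size != 0) {
		size_t n = (len < size - 1) ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

With a name such as "ffs", a size of strlen(name) leaves room for only two characters plus the NUL, while strlen(name) + 1 (or the size of the destination buffer) copies the whole name.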
 1.376.4.7.2.1  28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.376.4.5.2.2  28-Apr-2014  sborrill Pull up the following revision(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.376.4.5.2.1  20-Mar-2011  bouyer Pull up following revision(s) (requested by dholland in ticket #1567):
sys/kern/vfs_syscalls.c: revision 1.415 via patch
Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)
Pullup candidate.
 1.376.4.2.2.3  20-Mar-2011  bouyer Pull up following revision(s) (requested by dholland in ticket #1567):
sys/kern/vfs_syscalls.c: revision 1.415 via patch
Check for bogus flags to access() up front. Otherwise we end up
calling VOP_ACCESS with flags 0 and something asserts deep in the
bowels of kauth. PR 44648 from Taylor Campbell. (I moved the check
earlier relative to the suggested patch.)
Pullup candidate.
 1.376.4.2.2.2  21-Dec-2009  sborrill Pull up the following revision(s) (requested by martin in ticket #1200):
sys/kern/vfs_syscalls.c: revision 1.400

Use the kernel space version of the vfs name, not the original userspace
pointer. Avoids crashes on archs with completely separate userspace VA.
 1.376.4.2.2.1  01-Jul-2009  snj branches: 1.376.4.2.2.1.2;
Pull up following revision(s) (requested by rmind in ticket #841):
sys/kern/vfs_syscalls.c: revision 1.392
do_sys_utimes: fix a bug introduced by rev.1.367.
VA_UTIMES_NULL is in va_vaflags, not va_flags.
 1.376.4.2.2.1.2.1  21-Apr-2010  matt sync to netbsd-5
 1.376.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.376.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.376.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.385.2.2  23-Jul-2009  jym Sync with HEAD.
 1.385.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.403.2.6  23-Oct-2010  uebayasi Propagate MNT_XIP in mount flags.
 1.403.2.5  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.403.2.4  17-Aug-2010  uebayasi Sync with HEAD.
 1.403.2.3  31-May-2010  uebayasi Remove the "xip" option from mount_ffs(8) for simplicity.
 1.403.2.2  30-Apr-2010  uebayasi Sync with HEAD.
 1.403.2.1  23-Feb-2010  uebayasi More bits to pass the new XIP mount option correctly.
 1.404.2.6  12-Jun-2011  rmind sync with head
 1.404.2.5  31-May-2011  rmind sync with head
 1.404.2.4  21-Apr-2011  rmind sync with head
 1.404.2.3  05-Mar-2011  rmind sync with head
 1.404.2.2  03-Jul-2010  rmind sync with head
 1.404.2.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.414.4.4  05-Mar-2011  bouyer Sync with HEAD
 1.414.4.3  15-Feb-2011  bouyer Implement COMPAT_50 quotactl(2)
 1.414.4.2  11-Feb-2011  bouyer Remove debug printf
 1.414.4.1  20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
Quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; on an unclean shutdown
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.414.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.423.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.440.2.6  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.440.2.5  23-Jan-2013  yamt sync with head
 1.440.2.4  16-Jan-2013  yamt sync with (a bit old) head
 1.440.2.3  30-Oct-2012  yamt sync with head
 1.440.2.2  23-May-2012  yamt sync with head.
 1.440.2.1  17-Apr-2012  yamt sync with head
 1.442.2.4  02-Jun-2012  mrg sync to latest -current.
 1.442.2.3  29-Apr-2012  mrg sync to latest -current.
 1.442.2.2  05-Apr-2012  mrg sync to latest -current.
 1.442.2.1  18-Feb-2012  mrg merge to -current.
 1.449.2.4  03-Nov-2014  msaitoh Pull up following revision(s) (requested by manu in ticket #1150):
lib/libc/sys/truncate.2: revision 1.27
sys/kern/vfs_syscalls.c: revision 1.484
Follow OpenGroup online documents for truncate[1] and ftruncate[2].
Fail with EINVAL for length argument negative values.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html
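
The behavior required for negative lengths can be probed from userland. The sketch below is an illustration only: the function name and the scratch file template are hypothetical. It creates a temporary file, calls ftruncate(2) with a length of -1, and reports the resulting errno, which a system following the cited specification is expected to set to EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a scratch file, attempt ftruncate(fd, -1), and return the
 * resulting errno. Returns 0 if the call unexpectedly succeeded, or the
 * errno from mkstemp() if the scratch file could not be created.
 */
static int
sketch_negative_ftruncate(void)
{
	char path[] = "/tmp/trunc_sketch.XXXXXX";	/* hypothetical name */
	int fd, error = 0;

	fd = mkstemp(path);
	if (fd == -1)
		return errno;
	if (ftruncate(fd, (off_t)-1) == -1)
		error = errno;
	close(fd);
	unlink(path);
	return error;
}
```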
 1.449.2.3  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
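
The checks described in this pull-up can be sketched in userland C. Everything here is an assumption made for illustration: the names sketch_copy_mount_data and SKETCH_DATA_LEN_MAX, the cap value, and the use of malloc() in place of kmem_alloc(); the log does not state the real limit. The sketch rejects zero-sized and oversized lengths before allocating, tolerates NULL data at this layer, and records the allocation size separately so the buffer can be released with the size it was allocated with even if a later consumer modifies its own copy of the length.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap; the real kernel limit is not given in the log. */
#define SKETCH_DATA_LEN_MAX 4096

/*
 * Validate a caller-supplied length and take a private copy of data.
 * Returns 0 or an errno value. On success *out is the copy (NULL when
 * data was NULL) and *alloc_len is the size it was allocated with.
 */
static int
sketch_copy_mount_data(const void *data, size_t data_len,
    void **out, size_t *alloc_len)
{
	void *buf;

	*out = NULL;
	*alloc_len = 0;

	/* NULL data is allowed at this layer; there is nothing to copy. */
	if (data == NULL)
		return 0;

	/* Reject zero-sized and oversized requests before allocating. */
	if (data_len == 0 || data_len > SKETCH_DATA_LEN_MAX)
		return EINVAL;

	buf = malloc(data_len);
	if (buf == NULL)
		return ENOMEM;
	memcpy(buf, data, data_len);
	*out = buf;
	*alloc_len = data_len;	/* remember the size used for allocation */
	return 0;
}
```

A file system that actually needs mount data would still have to reject a NULL pointer itself, as the commit message notes.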
 1.449.2.2  19-May-2012  riz branches: 1.449.2.2.4; 1.449.2.2.6;
Pull up following revision(s) (requested by manu in ticket #259):
sys/kern/vfs_syscalls.c: revision 1.456
sys/kern/vfs_mount.c: revision 1.14
sys/kern/vfs_syscalls.c: revision 1.452
sys/kern/vfs_syscalls.c: revision 1.453
sys/kern/vfs_syscalls.c: revision 1.454
Do not use vp after mount_domount() call as it sets it to NULL on success.
This fixes a panic when starting extended attributes.
Fix mount -o extattr : previous patch fixed a panic but caused operation
to happen on the mount point instead of the mounted filesystem.
Fix the extattr start fix. Looking up the filesystemroot vnode again
does not seem to be reliable. Instead save it before mount_domount()
sets it to NULL.
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.
 1.449.2.1  17-May-2012  riz Pull up following revision(s) (requested by rmind in ticket #246):
sys/kern/vfs_syscalls.c: revision 1.455
do_open: move pathbuf destruction to the callers, thus simplify and fix a
memory leak on error path.
 1.449.2.2.6.1  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.449.2.2.4.1  21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.457.2.4  03-Dec-2017  jdolecek update from HEAD
 1.457.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.457.2.2  25-Feb-2013  tls resync with head
 1.457.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.464.4.1  23-Jul-2013  riastradh sync with HEAD
 1.464.2.2  18-May-2014  rmind sync with head
 1.464.2.1  28-Aug-2013  rmind sync with head
 1.478.2.1  10-Aug-2014  tls Rebase.
 1.490.2.2  18-Feb-2015  snj Pull up following revision(s) (requested by martin in ticket #523):
sys/kern/vfs_syscalls.c: revision 1.493
A syscall like posix_fallocate() that is not supposed to set errno in
userland needs to always return 0 and store the error code in *retval.
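
From the caller's side, this convention means the error number is the function's return value rather than errno. The following is a small userland sketch with a hypothetical function name; it passes a descriptor that cannot be valid and returns whatever posix_fallocate() reports, which POSIX specifies as an error number (EBADF for a bad descriptor).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>

/*
 * Call posix_fallocate() on a descriptor that cannot be valid and hand
 * back its return value, which carries the error number directly.
 */
static int
sketch_fallocate_bad_fd(void)
{
	return posix_fallocate(-1, 0, 1);
}
```

Callers should therefore test the return value against 0 and not consult errno after posix_fallocate().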
 1.490.2.1  01-Dec-2014  martin Pull up following revision(s) (requested by manu in ticket #276):
sys/kern/vfs_syscalls.c: revision 1.492
Do not follow symlinks in sys_unmount()
There are situations where the underlying filesystem is unreachable
(e.g: NFS) causing symlink resolution to hang. Such a situation
should be avoided by using umount -f -R (force and raw), but while -R
causes the symlink resolution to be skipped in umount(8), the kernel was
still doing it in sys_unmount(). This change fixes that.
When the -R flag is not given, umount(8) does symlink resolution through
realpath(3) before calling unmount(2), hence not doing it in the kernel
would not change behavior.
 1.492.2.6  28-Aug-2017  skrll Sync with HEAD
 1.492.2.5  05-Oct-2016  skrll Sync with HEAD
 1.492.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.492.2.3  22-Sep-2015  skrll Sync with HEAD
 1.492.2.2  06-Jun-2015  skrll Sync with HEAD
 1.492.2.1  06-Apr-2015  skrll Sync with HEAD
 1.504.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.504.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.504.2.1  06-Aug-2016  pgoyette Sync with HEAD
 1.505.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.513.2.1  11-May-2017  pgoyette Sync with HEAD
 1.516.2.2  07-Mar-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1806):

sys/kern/vfs_syscalls.c: revision 1.557

open(2): Don't map ERESTART to EINTR.

If a file or device's open function returns ERESTART, respect that --
restart the syscall; don't pretend a signal has been delivered when
it was not. If an SA_RESTART signal was delivered, POSIX does not
allow it to fail with EINTR:

SA_RESTART
This flag affects the behavior of interruptible functions;
that is, those specified to fail with errno set to [EINTR].
If set, and a function specified as interruptible is
interrupted by this signal, the function shall restart and
shall not fail with [EINTR] unless otherwise specified. If
an interruptible function which uses a timeout is restarted,
the duration of the timeout following the restart is set to
an unspecified value that does not exceed the original
timeout value. If the flag is not set, interruptible
functions interrupted by this signal shall fail with errno
set to [EINTR].

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

Nothing in the POSIX definition of open specifies otherwise.

In 1990, Kirk McKusick added these lines with a mysterious commit
message:
Author: Kirk McKusick <mckusick>
Date: Tue Apr 10 19:36:33 1990 -0800
eliminate longjmp from the kernel (for karels)
diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 7bc7b39bbf..d572d3a32d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -14,7 +14,7 @@
* IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
*
- * @(#)vfs_syscalls.c 7.42 (Berkeley) 3/26/90
+ * @(#)vfs_syscalls.c 7.43 (Berkeley) 4/10/90
*/
#include "param.h"
@@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
crfree(fp->f_cred);
fp->f_count--;
- if (error == -1) /* XXX from fdopen */
- return (0); /* XXX from fdopen */
+ if (error == EJUSTRETURN) /* XXX from fdopen */
+ return (0); /* XXX from fdopen */
+ if (error == ERESTART)
+ error = EINTR;
scp->sc_ofile[indx] = NULL;
return (error);
}

(found via this git import of the CSRG history:
https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe )

This change appears to have served two related purposes:
1. The fdopen function (the erstwhile open routine for /dev/fd/N)
used to return -1 as a hack to mean it had just duplicated the fd;
it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
return EJUSTRETURN, now defined to be -2, presumably to avoid a
conflict with ERESTART, defined to be -1. So this change finished
part of the change by Mike Karels to use a different magic return
code from fdopen.
Of course, today we use still another disgusting hack, EDUPFD, for
the same purpose, so none of this is relevant any more.
2. Prior to April 1990, the kernel handled signals during tsleep(9)
by longjmping out to the system call entry point or similar. In
April 1990, Mike Karels worked to convert all of that into
explicit unwind logic by passing through EINTR or ERESTART as
appropriate, instead of setjmp at each entry point.

However, it's not clear to me why this setjmp/longjmp and
fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
ERESTART to EINTR in open(2). I suspect it was a mistake.

In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
was removed from FreeBSD:

r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.
Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks

Index: vfs_syscalls.c
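
For context on the userland side of SA_RESTART, here is a minimal sketch of how a program asks for interrupted calls to be restarted. The function names are hypothetical; sigaction(2), sigemptyset(3) and SA_RESTART are standard POSIX. It installs a do-nothing handler with SA_RESTART and then reads the disposition back to confirm the flag is set. It does not exercise open(2) itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void
sketch_handler(int signo)
{
	(void)signo;	/* nothing to do; the handler only needs to exist */
}

/* Install sketch_handler for signo with SA_RESTART; 0 on success. */
static int
sketch_install_restarting_handler(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(signo, &sa, NULL);
}

/* Return 1 if the current disposition has SA_RESTART, 0 if not, -1 on error. */
static int
sketch_handler_restarts(int signo)
{
	struct sigaction cur;

	if (sigaction(signo, NULL, &cur) == -1)
		return -1;
	return (cur.sa_flags & SA_RESTART) != 0;
}
```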
 1.516.2.1  22-Apr-2020  martin Pull up following revision(s) (requested by gdt in ticket #1534):

sys/kern/vfs_syscalls.c: revision 1.544
lib/libc/sys/fdatasync.2: revision 1.17

Relax fdatasync restriction that fd be writable

The restriction that a fd passed to fdatasync(2) must be writable was
added in 2003 in order to comply with POSIX. Since then, POSIX has
removed that requirement, and POSIX-valid programs have been therefore
encountering errors on NetBSD.

Patch by Paul Ripke after discussion on netbsd-users. Issue
discovered with pkgsrc/databases/mongodb3 as used by pkgsrc/net/unifi.
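
The relaxed behavior can be checked with a small userland sketch. The function name and scratch file template are hypothetical. It creates a temporary file, reopens it read-only, and calls fdatasync(2) on that descriptor; with the restriction removed, the call is expected to succeed and the function to return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Call fdatasync() on a descriptor opened O_RDONLY and return 0 on
 * success or the errno of the first step that failed.
 */
static int
sketch_fdatasync_readonly(void)
{
	char path[] = "/tmp/fdatasync_sketch.XXXXXX";	/* hypothetical name */
	int fd, error = 0;

	fd = mkstemp(path);
	if (fd == -1)
		return errno;
	close(fd);

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		error = errno;
		unlink(path);
		return error;
	}
	if (fdatasync(fd) == -1)
		error = errno;
	close(fd);
	unlink(path);
	return error;
}
```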
 1.518.4.4  21-Apr-2020  martin Sync with HEAD
 1.518.4.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.518.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.518.4.1  10-Jun-2019  christos Sync with HEAD
 1.518.2.6  22-Jan-2019  pgoyette Convert the MODULE_{,VOID_}HOOK_CALL macros to do everything in-line
rather than defining an intermediate hook##call function. Almost
all of the hooks are called only once, and although we lose the
ability of doing things like

if (MODULE_HOOK_CALL(...) == 0) ...

we simplify things quite a bit. With this change, we no longer need
to have both declaration and definition macros, and the definition
no longer needs to have both prototype argument list and a "real"
argument list.

FWIW, the above if now needs to be written as

int ret;

MODULE_HOOK_CALL(..., ret);
if (ret == 0) ...

with appropriate use of braces {}.
 1.518.2.5  18-Jan-2019  pgoyette Don't restrict hooks to having only int or void types. Pass the hook's
type to the various macros, as needed.

Allows us to reduce diffs to original in at least one or two places (we
no longer have to provide an additional parameter to the hook routine
for returning a non-int return value).
 1.518.2.4  14-Jan-2019  pgoyette Create a variant of the HOOK macros that handles hook routines of
type void, and use them where appropriate.
 1.518.2.3  13-Jan-2019  pgoyette Remove the HOOK2 versions of the MODULE_HOOK macros. There were
only a few uses, and using them led to some lack of clarity in the
code. Instead, we now use two separate hooks, with names that
make it clear(er) what we're doing.

This also positions us to start unraveling some of the rtsock_50
mess, which will need (at least) five hooks.
 1.518.2.2  15-Oct-2018  pgoyette Convert the openat_10 hook to use the MP-safe mechanism

XXX Still to do: compat70_unp_addsockcred and sysvipc50_sysctl
 1.518.2.1  14-Mar-2018  pgoyette Make do_openat()'s handling of path=NULL modular
 1.533.2.2  07-Mar-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1610):

sys/kern/vfs_syscalls.c: revision 1.557

open(2): Don't map ERESTART to EINTR.

If a file or device's open function returns ERESTART, respect that --
restart the syscall; don't pretend a signal has been delivered when
it was not. If an SA_RESTART signal was delivered, POSIX does not
allow it to fail with EINTR:

SA_RESTART
This flag affects the behavior of interruptible functions;
that is, those specified to fail with errno set to [EINTR].
If set, and a function specified as interruptible is
interrupted by this signal, the function shall restart and
shall not fail with [EINTR] unless otherwise specified. If
an interruptible function which uses a timeout is restarted,
the duration of the timeout following the restart is set to
an unspecified value that does not exceed the original
timeout value. If the flag is not set, interruptible
functions interrupted by this signal shall fail with errno
set to [EINTR].

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

Nothing in the POSIX definition of open specifies otherwise.

In 1990, Kirk McKusick added these lines with a mysterious commit
message:
Author: Kirk McKusick <mckusick>
Date: Tue Apr 10 19:36:33 1990 -0800
eliminate longjmp from the kernel (for karels)
diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 7bc7b39bbf..d572d3a32d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -14,7 +14,7 @@
* IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
*
- * @(#)vfs_syscalls.c 7.42 (Berkeley) 3/26/90
+ * @(#)vfs_syscalls.c 7.43 (Berkeley) 4/10/90
*/
#include "param.h"
@@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
crfree(fp->f_cred);
fp->f_count--;
- if (error == -1) /* XXX from fdopen */
- return (0); /* XXX from fdopen */
+ if (error == EJUSTRETURN) /* XXX from fdopen */
+ return (0); /* XXX from fdopen */
+ if (error == ERESTART)
+ error = EINTR;
scp->sc_ofile[indx] = NULL;
return (error);
}

(found via this git import of the CSRG history:
https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe)

This change appears to have served two related purposes:
1. The fdopen function (the erstwhile open routine for /dev/fd/N)
used to return -1 as a hack to mean it had just duplicated the fd;
it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
return EJUSTRETURN, now defined to be -2, presumably to avoid a
conflict with ERESTART, defined to be -1. So this change finished
part of the change by Mike Karels to use a different magic return
code from fdopen.
Of course, today we use still another disgusting hack, EDUPFD, for
the same purpose, so none of this is relevant any more.
2. Prior to April 1990, the kernel handled signals during tsleep(9)
by longjmping out to the system call entry point or similar. In
April 1990, Mike Karels worked to convert all of that into
explicit unwind logic by passing through EINTR or ERESTART as
appropriate, instead of setjmp at each entry point.

However, it's not clear to me why this setjmp/longjmp and
fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
ERESTART to EINTR in open(2). I suspect it was a mistake.
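
The difference can be sketched with a small userland model. This is
only an illustration under stated assumptions, not the NetBSD code:
K_ERESTART and K_EJUSTRETURN are hypothetical stand-ins that reuse the
values -1 and -2 quoted above, and sleep_interrupted(), open_error_old(),
open_error_new() and syscall_return() are invented names. The model
assumes that an interrupted sleep yields ERESTART when the signal was
installed with SA_RESTART and EINTR otherwise, and that the generic
syscall return path restarts the call on ERESTART.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel-internal codes (values as quoted). */
enum { K_ERESTART = -1, K_EJUSTRETURN = -2 };

enum outcome { OUT_RETURN_ERRNO, OUT_RESTART_SYSCALL };

struct result {
	enum outcome what;	/* what the syscall return path does */
	int err;		/* errno seen by userland, if returned */
};

/* Assumed model of an interrupted sleep: the signal's flag picks the code. */
static int
sleep_interrupted(int sa_restart)
{
	return sa_restart ? K_ERESTART : EINTR;
}

/* Old open(2) behaviour: ERESTART is folded into EINTR unconditionally. */
static int
open_error_old(int error)
{
	return error == K_ERESTART ? EINTR : error;
}

/* New open(2) behaviour: the error is passed through untouched. */
static int
open_error_new(int error)
{
	return error;
}

/* Assumed generic syscall return path: ERESTART re-executes the call. */
static struct result
syscall_return(int error)
{
	if (error == K_ERESTART)
		return (struct result){ OUT_RESTART_SYSCALL, 0 };
	return (struct result){ OUT_RETURN_ERRNO, error };
}

/* Usage: compare both behaviours for an SA_RESTART signal. */
static void
model_self_check(void)
{
	struct result r;

	r = syscall_return(open_error_old(sleep_interrupted(1)));
	assert(r.what == OUT_RETURN_ERRNO && r.err == EINTR);

	r = syscall_return(open_error_new(sleep_interrupted(1)));
	assert(r.what == OUT_RESTART_SYSCALL);
}
```

In this model the old mapping makes open(2) fail with EINTR even for an
SA_RESTART signal, which is the case the POSIX text above rules out;
without the mapping the call is restarted, and a signal without
SA_RESTART still produces EINTR.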

In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
was removed from FreeBSD:

r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.
Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks

Index: vfs_syscalls.c
 1.533.2.1  22-Apr-2020  martin Pull up following revision(s) (requested by gdt in ticket #840):

sys/kern/vfs_syscalls.c: revision 1.544
lib/libc/sys/fdatasync.2: revision 1.17

Relax fdatasync restriction that fd be writable

The restriction that a fd passed to fdatasync(2) must be writable was
added in 2003 in order to comply with POSIX. Since then, POSIX has
removed that requirement, and POSIX-valid programs have been therefore
encountering errors on NetBSD.

Patch by Paul Ripke after discussion on netbsd-users. Issue
discovered with pkgsrc/databases/mongodb3 as used by pkgsrc/net/unifi.
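
A minimal userland sketch of the affected pattern, for illustration
only: fdatasync_readonly() is a hypothetical helper name, not part of
the patch. It opens a file read-only and calls fdatasync(2) on the
descriptor. With the restriction in place such a call would be expected
to fail because the descriptor is not writable; with the restriction
relaxed it is expected to succeed on an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path read-only and fdatasync() it.
 * Returns -1 if the file cannot be opened, 0 if fdatasync() succeeded,
 * or the errno value that fdatasync() reported.
 */
int
fdatasync_readonly(const char *path)
{
	int fd, rc, saved;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;

	rc = fdatasync(fd);
	saved = (rc == -1) ? errno : 0;	/* save errno before close() */
	(void)close(fd);
	return saved;
}
```

A caller can compare the result against 0 to see whether the running
kernel accepts fdatasync(2) on a read-only descriptor.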
 1.539.2.4  29-Feb-2020  ad Sync with head.
 1.539.2.3  25-Jan-2020  ad Make cwdinfo use mostly lockless, and largely hide the details in vfs_cwd.c.
 1.539.2.2  19-Jan-2020  ad Use LOCKLEAF in the few cases it's useful for ffs/tmpfs/nullfs. Others need
to be checked.
 1.539.2.1  17-Jan-2020  ad Sync with head.
 1.545.2.1  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.548.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.549.4.1  01-Aug-2021  thorpej Sync with HEAD.
 1.556.2.1  07-Mar-2023  martin Pull up following revision(s) (requested by riastradh in ticket #115):

sys/kern/vfs_syscalls.c: revision 1.557

open(2): Don't map ERESTART to EINTR.

If a file or device's open function returns ERESTART, respect that --
restart the syscall; don't pretend a signal has been delivered when
it was not. If an SA_RESTART signal was delivered, POSIX does not
allow it to fail with EINTR:

SA_RESTART
This flag affects the behavior of interruptible functions;
that is, those specified to fail with errno set to [EINTR].
If set, and a function specified as interruptible is
interrupted by this signal, the function shall restart and
shall not fail with [EINTR] unless otherwise specified. If
an interruptible function which uses a timeout is restarted,
the duration of the timeout following the restart is set to
an unspecified value that does not exceed the original
timeout value. If the flag is not set, interruptible
functions interrupted by this signal shall fail with errno
set to [EINTR].

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

Nothing in the POSIX definition of open specifies otherwise.

In 1990, Kirk McKusick added these lines with a mysterious commit
message:
Author: Kirk McKusick <mckusick>
Date: Tue Apr 10 19:36:33 1990 -0800
eliminate longjmp from the kernel (for karels)
diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 7bc7b39bbf..d572d3a32d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -14,7 +14,7 @@
* IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
*
- * @(#)vfs_syscalls.c 7.42 (Berkeley) 3/26/90
+ * @(#)vfs_syscalls.c 7.43 (Berkeley) 4/10/90
*/
#include "param.h"
@@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
crfree(fp->f_cred);
fp->f_count--;
- if (error == -1) /* XXX from fdopen */
- return (0); /* XXX from fdopen */
+ if (error == EJUSTRETURN) /* XXX from fdopen */
+ return (0); /* XXX from fdopen */
+ if (error == ERESTART)
+ error = EINTR;
scp->sc_ofile[indx] = NULL;
return (error);
}

(found via this git import of the CSRG history:
https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe)

This change appears to have served two related purposes:
1. The fdopen function (the erstwhile open routine for /dev/fd/N)
used to return -1 as a hack to mean it had just duplicated the fd;
it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
return EJUSTRETURN, now defined to be -2, presumably to avoid a
conflict with ERESTART, defined to be -1. So this change finished
part of the change by Mike Karels to use a different magic return
code from fdopen.
Of course, today we use still another disgusting hack, EDUPFD, for
the same purpose, so none of this is relevant any more.
2. Prior to April 1990, the kernel handled signals during tsleep(9)
by longjmping out to the system call entry point or similar. In
April 1990, Mike Karels worked to convert all of that into
explicit unwind logic by passing through EINTR or ERESTART as
appropriate, instead of setjmp at each entry point.

However, it's not clear to me why this setjmp/longjmp and
fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
ERESTART to EINTR in open(2). I suspect it was a mistake.

In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
was removed from FreeBSD:

r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.
Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks

Index: vfs_syscalls.c
