History log of /src/sys/kern/subr_disk.c |
Revision | | Date | Author | Comments |
1.138 |
| 13-Apr-2025 |
jakllsch | Add physical sector and alignment info to struct disk_geom and the geometry plist, and handle in partutil.
Bump version for disk_geom addition.
Collect DIOCGSECTORALIGN handling into one place.
|
1.137 |
| 09-May-2023 |
riastradh | branches: 1.137.6; ioctl(DIOCRMWEDGES): Delete only idle wedges.
Don't forcibly delete busy wedges.
Reported-by: syzbot+e46f31fe56e04f567d88@syzkaller.appspotmail.com https://syzkaller.appspot.com/bug?id=8a00fd7f2e7459748d7a274098180a4708ff0f61
Fixes accidental destruction of the busy wedge that the root file system is mounted on, triggered by syzbot's ioctl(DIOCRMWEDGES).
|
1.136 |
| 22-Apr-2023 |
riastradh | disk(9): Fix missing unlock in error branch in previous change.
Reported-by: syzbot+870665adaf8911c0d94d@syzkaller.appspotmail.com https://syzkaller.appspot.com/bug?id=a4ae17cf66b5bb999182ae77fd3c7ad9ad18c891
|
1.135 |
| 21-Apr-2023 |
riastradh | disk(9): Fix use-after-free race with concurrent disk_set_info.
This can happen with dk(4), which allows wedges to have their size increased without destroying and recreating the device instance.
Drivers which allow concurrent disk_set_info and disk_ioctl must serialize disk_set_info with dk_openlock.
|
1.134 |
| 28-Mar-2022 |
riastradh | branches: 1.134.4; disk(9): New function disklabel_dev_unit.
Maps a dev_t like wd3e to an autoconf instance number like 3, with no partition. Same as DISKUNIT macro, but is a symbol whose pointer can be taken. Meant for use with struct bdevsw, cdevsw::d_devtounit.
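The mapping described above can be sketched roughly as follows. This is only an illustration, not the committed function: `HYP_MAXPARTITIONS` and every `hyp_` name are hypothetical, and the real partition count and minor-number layout are port-specific.

```c
#include <stdint.h>

/* Hypothetical partition count per unit; the real value is port-specific. */
#define HYP_MAXPARTITIONS 16

/* Unit number: which disk instance (the 3 in "wd3e"). */
static inline unsigned
hyp_disk_unit(uint32_t minor_num)
{
    return minor_num / HYP_MAXPARTITIONS;
}

/* Partition index within the unit: 0 for 'a', 4 for 'e', and so on. */
static inline unsigned
hyp_disk_part(uint32_t minor_num)
{
    return minor_num % HYP_MAXPARTITIONS;
}

/* Unlike a macro, a real function can be stored in a function pointer. */
typedef unsigned (*hyp_devtounit_fn)(uint32_t);
static const hyp_devtounit_fn hyp_devtounit = hyp_disk_unit;
```

Under these assumptions a minor number of 52 would decode to unit 3, partition 4 ('e'), and `hyp_devtounit` shows why a symbol is needed where a struct member wants a function pointer.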
|
1.133 |
| 17-May-2021 |
mrg | move bi-endian disklabel support from the kernel and libsa into libkern.
- dkcksum() and dkcksum_sized() move from subr_disk.c and from libsa into libkern/dkcksum.c (which is missing _sized() version), using the version from usr.sbin/disklabel.
- swap_disklabel() moves from subr_disk_mbr.c into libkern, now called disklabel_swap(). (the sh3 version should be updated to use this.)
- DISKLABEL_EI becomes a first-class option with opt_disklabel.h.
- add libkern.h to libsa/disklabel.c.
this enables future work for bi-endian libsa/ufs.c (relevant for ffsv1, ffsv2, lfsv1, and lfsv2), as well as making it possible for ports not using subr_disk_mbr.c to include bi-endian disklabel support (which, afaict, includes any disk on mbr-supporting platforms that do not have an mbr as well as disklabel.)
builds successfully on: alpha, i386, amd64, sun2, sun3, evbarm64, evbarm64-eb, sparc, and sparc64. tested in anita on i386 and sparc, testing in hardware on evbarm64*.
|
1.132 |
| 17-Oct-2020 |
mlelstv | branches: 1.132.6; 1.132.8; Attach disk info even for zero sized disks. Slight refactoring.
|
1.131 |
| 11-Jun-2020 |
thorpej | Update for proplib(3) API changes.
|
1.130 |
| 27-Mar-2020 |
mlelstv | Avoid division by zero if label isn't valid.
|
1.129 |
| 30-Sep-2019 |
cnst | kern/subr_disk: bounds_check_with_label: really protect against div by zero
Solves kernel panic in NetBSD 8.1 amd64 on VirtualBox 6.0.12 r133076.
Triggered with an NVMe controller without any actual discs behind it:
nvme0 at pci0 dev 14 function 0: vendor 80ee product 4e56 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: interrupting at ioapic0 pin 22
nvme0: ORCL-VBOX-NVME-VER12, firmware 1.0, serial VB1234-56789
ld0 at nvme0 nsid 1
ld0: 0, 0 cyl, 16 head, 63 sec, 1 bytes/sect x 0 sectors
Code path is reached 4 times during normal boot, each time after wd0a is already mounted; this patch avoids a crash with a dirty filesystem.
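A minimal sketch of this kind of guard, assuming the divisors are the sector size in device-block units and the sectors-per-cylinder count. The function name and `HYP_DEV_BSIZE` are hypothetical; this is not the committed code.

```c
#include <errno.h>
#include <stdint.h>

#define HYP_DEV_BSIZE 512   /* assumed device block size in bytes */

/*
 * Return 0 if the geometry is safe to divide by, or EINVAL otherwise.
 * secsize is in bytes; secpercyl is sectors per cylinder.
 */
static int
hyp_check_geometry(uint32_t secsize, uint32_t secpercyl)
{
    /* A 1-byte sector size makes secsize / HYP_DEV_BSIZE equal 0. */
    if (secsize / HYP_DEV_BSIZE == 0 || secpercyl == 0)
        return EINVAL;
    return 0;
}
```

With the geometry from the log above (1 byte per sector), the check fails before any division takes place.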
|
1.128 |
| 22-May-2019 |
hannken | branches: 1.128.2; Implement disk_rename()/iostat_rename() to rename a disk.
Use it from zvol_rename_minor() when renaming a ZVOL.
|
1.127 |
| 04-Apr-2019 |
christos | move setdisklabel(9) into a separate file.
|
1.126 |
| 04-Apr-2019 |
christos | one more __func__
|
1.125 |
| 04-Apr-2019 |
martin | Make the DEBUG version compile
|
1.124 |
| 03-Apr-2019 |
christos | centralize setdisklabel(9)
|
1.123 |
| 27-Mar-2019 |
martin | Add a disk ioctl DIOCRMWEDGES to remove all wedges of a given disk (if not busy).
|
1.122 |
| 07-Mar-2018 |
kre | branches: 1.122.2;
Fix typo in comment (s/is/if/) - NFC.
|
1.121 |
| 27-Oct-2017 |
joerg | branches: 1.121.2; Revert printf return value change.
|
1.120 |
| 27-Oct-2017 |
utkarsh009 | [syzkaller] Cast all the printf's to (void *) as a result of new printf(9) declaration.
|
1.119 |
| 01-Jun-2017 |
chs | branches: 1.119.2; remove checks for failure after memory allocation calls that cannot fail:
- kmem_alloc() with KM_SLEEP
- kmem_zalloc() with KM_SLEEP
- percpu_alloc()
- pserialize_create()
- psref_class_create()
all of these paths include an assertion that the allocation has not failed, so callers should not assert that again.
|
1.118 |
| 05-Mar-2017 |
mlelstv | Enhance disk metrics by calculating a weighted sum that is incremented by the number of concurrent I/O requests. Also introduce a new disk_wait() function to measure requests waiting in a bufq. iostat -y now reports data about waiting and active requests.
So far only drivers using dksubr and dk, ccd, wd and xbd collect data about waiting requests.
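One way to picture the weighted sum is the sketch below. It assumes the sum grows by elapsed time multiplied by the number of requests in flight; the structure and field names are hypothetical and do not mirror the kernel's iostat structure.

```c
#include <stdint.h>

struct hyp_iostat {
    uint64_t last_us;    /* time of the previous update, microseconds */
    uint32_t busy;       /* requests currently in flight */
    uint64_t busy_time;  /* time with at least one request in flight */
    uint64_t busy_sum;   /* time weighted by the number in flight */
};

/* Account for the interval since the last update, then apply the change. */
static void
hyp_iostat_update(struct hyp_iostat *s, uint64_t now_us, int delta)
{
    uint64_t dt = now_us - s->last_us;

    if (s->busy > 0) {
        s->busy_time += dt;
        s->busy_sum += dt * s->busy;
    }
    s->last_us = now_us;
    s->busy += delta;    /* +1 when a request starts, -1 when it completes */
}
```

Dividing `busy_sum` by `busy_time` then gives an average number of concurrent requests while the device was busy.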
|
1.117 |
| 28-Feb-2017 |
jakllsch | pi_bsize must be at least pi_secsize
Allows block device accesses to 4KiB logical sector disks to function on the vast majority of ports with 2KiB BLKDEV_IOSIZE.
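The constraint amounts to taking the larger of the two sizes. A tiny sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Pick a block-device I/O size that is never smaller than one sector. */
static inline uint32_t
hyp_pick_bsize(uint32_t blkdev_iosize, uint32_t secsize)
{
    return blkdev_iosize < secsize ? secsize : blkdev_iosize;
}
```

For a 2KiB default I/O size and a 4KiB-sector disk this yields 4096, so a single transfer always covers at least one whole sector.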
|
1.116 |
| 06-Jan-2016 |
christos | branches: 1.116.2; 1.116.4; print the disklabel information on error if DIAGNOSTIC.
|
1.115 |
| 08-Dec-2015 |
christos | Replace DIOCGPART -> DIOCGPARTINFO which returns the data needed instead of pointers.
|
1.114 |
| 28-Nov-2015 |
mlelstv | Handle sector sizes other than DEV_BSIZE when reading labels.
|
1.113 |
| 14-May-2015 |
chs | in bounds_check_with_*, reject negative block numbers and avoid a potential overflow in calculating the size of the request.
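A sketch of an overflow-safe range check of this kind. The function is hypothetical, and it simply rejects requests that do not fit, whereas the kernel routines may truncate a request that straddles the end of the medium.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate a request of nblks blocks starting at blkno against a
 * medium of total blocks. Returns 0 if it fits, EINVAL otherwise.
 */
static int
hyp_bounds_check(int64_t blkno, int64_t nblks, int64_t total)
{
    if (blkno < 0 || nblks < 0 || total < 0)
        return EINVAL;
    /* Compare without computing blkno + nblks, which could overflow. */
    if (blkno > total || nblks > total - blkno)
        return EINVAL;
    return 0;
}
```

Rearranging the comparison as `nblks > total - blkno` keeps every intermediate value within range once the negative cases have been rejected.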
|
1.112 |
| 05-May-2015 |
mlelstv | Always fixup zero sector size, even when other geometry values are invalid.
|
1.111 |
| 02-Jan-2015 |
christos | - Use NODEV instead of 0
- Return EBUSY if there was no label
|
1.110 |
| 31-Dec-2014 |
mlelstv | Retire disk_blocksize().
|
1.109 |
| 31-Dec-2014 |
christos | Mention which ioctls need to move to dk_ioctl, and don't allow wedges on wedges.
|
1.108 |
| 31-Dec-2014 |
christos | make more drivers use disk_ioctl, and add a dev parameter to it so that we can merge the "easy" disklabel ioctls to it. Ultimately all this will go to dk_ioctl once all the drivers have been converted.
|
1.107 |
| 31-Dec-2014 |
christos | Centralize wedge ioctls in disk_ioctl.
|
1.106 |
| 31-Dec-2014 |
mlelstv | disk_blocksize and disk_set_info relay the same information to the disk subsystem.
Make disk_set_info also set blocksize shift values. Remove every call to disk_blocksize.
Keep disk_blocksize for ABI compatibility, make it also set dg_secsize.
|
1.105 |
| 29-Dec-2014 |
mlelstv | clear error for new ioctls.
|
1.104 |
| 29-Dec-2014 |
mlelstv | Implement DIOCGMEDIASIZE and DIOCGSECTORSIZE from FreeBSD.
|
1.103 |
| 19-Oct-2013 |
mlelstv | branches: 1.103.4; 1.103.6; use 64bit arithmetic to compute sectors-per-unit
|
1.102 |
| 29-May-2013 |
christos | branches: 1.102.2; phase 1 of disk geometry cleanup:
- centralize the geometry -> plist code so that we don't have n useless copies of it.
|
1.101 |
| 09-Feb-2013 |
christos | printflike maintenance.
|
1.100 |
| 14-Oct-2010 |
mrg | branches: 1.100.8; 1.100.18; add some (uint64_t) casts to avoid 32 bit overflows. this fixes my 3TB disk with 4KB sectors and disklabel (which looks like it would work up to 16TB.)
idea from mlelstv@.
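The class of bug being fixed can be illustrated as below, assuming a byte count derived from two 32-bit quantities. The helper names are hypothetical and the actual expressions in the commit may differ.

```c
#include <stdint.h>

/* Buggy form: the product is computed in 32 bits and wraps. */
static uint64_t
hyp_media_bytes_32(uint32_t nsectors, uint32_t secsize)
{
    return nsectors * secsize;
}

/* Fixed form: widen one operand first so the multiply is 64-bit. */
static uint64_t
hyp_media_bytes_64(uint32_t nsectors, uint32_t secsize)
{
    return (uint64_t)nsectors * secsize;
}
```

For a 3TB disk with 4096-byte sectors (732421875 sectors) only the second form returns 3000000000000, since the first discards everything above 32 bits before the result is widened.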
|
1.99 |
| 28-Nov-2009 |
dsl | branches: 1.99.2; 1.99.4; When truncating a request in bounds_check_with_mediasize() multiply by the provided sector size instead of 512. Fixes last bit of PR/31565
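A simplified sketch of truncating with the caller's sector size rather than a hard-coded 512. It assumes block numbers and media size are both counted in sectors of `secsize` bytes, which may not match the kernel's units; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes a request may transfer when it starts at sector blkno and the
 * medium holds mediasize sectors of secsize bytes each.
 * Returns the (possibly truncated) byte count.
 */
static uint64_t
hyp_truncate_bcount(uint64_t blkno, uint64_t bcount,
    uint64_t mediasize, uint32_t secsize)
{
    if (blkno >= mediasize)
        return 0;
    uint64_t room = (mediasize - blkno) * secsize;  /* not * 512 */
    return bcount < room ? bcount : room;
}
```

With 4096-byte sectors, a 16384-byte request starting two sectors before the end is cut to 8192 bytes; multiplying by 512 instead would have cut it to 1024.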
|
1.98 |
| 27-Nov-2009 |
tsutsui | u_short -> uint16_t, some KNF.
|
1.97 |
| 20-May-2009 |
dyoung | On second thought, let's call disk_predetach() disk_begindetach(). Verbs are good.
|
1.96 |
| 19-May-2009 |
dyoung | Encapsulate the checks that I do before detaching a disk(9) provider in a pre-detachment routine, disk_predetach().
|
1.95 |
| 04-Apr-2009 |
ad | Add disk_isbusy(), iostat_isbusy().
|
1.94 |
| 22-Jan-2009 |
yamt | branches: 1.94.2; malloc -> kmem_alloc
|
1.93 |
| 28-Apr-2008 |
martin | branches: 1.93.8; 1.93.10; Remove clause 3 and 4 from TNF licenses
|
1.92 |
| 28-Feb-2008 |
matt | branches: 1.92.2; 1.92.4; constify dkdriver
|
1.91 |
| 31-Jan-2008 |
dyoung | branches: 1.91.2; 1.91.6; Constify both struct disk->dk_name and the `name' argument to disk_init().
|
1.90 |
| 02-Jan-2008 |
ad | Merge vmlocking2 to head.
|
1.89 |
| 08-Oct-2007 |
ad | branches: 1.89.4; 1.89.6; 1.89.10; 1.89.12; Merge disk init changes from the vmlocking branch. These separate init / destroy of 'struct disk' from attach / detach.
|
1.88 |
| 29-Jul-2007 |
ad | branches: 1.88.4; 1.88.6; 1.88.8; 1.88.10; It's not a good idea for device drivers to modify b_flags, as they don't need to understand the locking around that field. Instead of setting B_ERROR, set b_error instead. b_error is 'owned' by whoever completes the I/O request.
|
1.87 |
| 21-Jul-2007 |
ad | Replace some uses of lockmgr().
|
1.86 |
| 24-Jun-2007 |
dyoung | branches: 1.86.2; Extract common code from i386, xen, and sparc64, creating config_handle_wedges() and read_disk_sectors(). On x86, handle_wedges() is a thin wrapper for config_handle_wedges(). Share opendisk() across architectures.
Add kernel code in support of specifying a root partition by wedge name. E.g., root specifications "wedge:wd0a", "wedge:David's Root Volume" are possible. (Patches for config(1) coming soon.)
In support of moving disks between architectures (esp. i386 <-> evbmips), I've written a routine convertdisklabel() that ensures that the raw partition is at RAW_DISK by following these steps:
0 If we have read a disklabel that has a RAW_PART with p_offset == 0 and p_size != 0, then use that raw partition.
1 If we have read a disklabel that has both partitions 'c' and 'd', and RAW_PART has p_offset != 0 or p_size == 0, but the other partition is suitable for a raw partition (p_offset == 0, p_size != 0), then swap the two partitions and use the new raw partition.
2 If the architecture's raw partition is 'd', and if there is no partition 'd', but there is a partition 'c' that is suitable for a raw partition, then copy partition 'c' to partition 'd'.
3 Determine the drive's last sector, using either the d_secperunit the drive reported, or by guessing (0x1fffffff). If we cannot read the drive's last sector, then fail.
4 If we have read a disklabel that has no partition slot RAW_PART, then create a partition RAW_PART. Make it span the whole drive.
5 If there are fewer than MAXPARTITIONS partitions, then "slide" the unsuitable raw partition RAW_PART, and subsequent partitions, into partition slots RAW_PART+1 and subsequent slots. Create a raw partition at RAW_PART. Make it span the whole drive.
The convertdisklabel() procedure can probably stand to be simplified, but it ought to deal with all but an extraordinarily broken disklabel, now.
i386: compiled and tested, sparc64: compiled, evbmips: compiled.
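Steps 0 and 1 of the procedure above can be sketched as follows. The structure is a pared-down stand-in for a partition entry and all names are hypothetical; steps 2 through 5 are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down partition entry. */
struct hyp_part {
    uint32_t p_offset;  /* first sector */
    uint32_t p_size;    /* length in sectors; 0 means unused */
};

/* Suitable as a raw partition: starts at sector 0 and is non-empty. */
static bool
hyp_raw_ok(const struct hyp_part *p)
{
    return p->p_offset == 0 && p->p_size != 0;
}

/*
 * Steps 0 and 1 only: keep a good raw partition, or swap in the other
 * candidate ('c' vs 'd'). Returns true if parts[raw] is usable afterwards.
 */
static bool
hyp_fix_raw(struct hyp_part *parts, int raw, int other)
{
    if (hyp_raw_ok(&parts[raw]))
        return true;                    /* step 0 */
    if (hyp_raw_ok(&parts[other])) {    /* step 1 */
        struct hyp_part tmp = parts[raw];
        parts[raw] = parts[other];
        parts[other] = tmp;
        return true;
    }
    return false;   /* steps 2-5 would take over here */
}
```

Calling `hyp_fix_raw(parts, 2, 3)` would model a port whose raw partition is 'c' reading a label written where it was 'd'.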
|
1.85 |
| 04-Mar-2007 |
christos | branches: 1.85.2; 1.85.4; Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
|
1.84 |
| 01-Mar-2007 |
martin | Split the disklabel checksum function into two, so we can pass the length separately. Use this for foreign-endianness labels in wedge autodiscovery, and calculate the checksum of those before we swap various fields in the label.
|
1.83 |
| 25-Nov-2006 |
scw | branches: 1.83.2; 1.83.4; Replace the myriad copies of bounds_check_with_label() with a single MI version.
Add disk_blocksize(9) so that disk drivers can record the physical block size of a disk if it is different to DEV_BSIZE. Right now this simply initialises dk_blkshift and dk_byteshift according to the supplied block size. This information is used in the MI version of bounds_check_with_label().
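A rough sketch of how two such shift values might be derived from a block size. It assumes one shift is the base-2 log of the block size relative to DEV_BSIZE and the other the base-2 log of the block size in bytes; the names are hypothetical and the kernel's own macros may compute this differently.

```c
#include <stdint.h>

#define HYP_DEV_BSIZE 512   /* assumed DEV_BSIZE */

struct hyp_shifts {
    int blkshift;   /* log2(blocksize / HYP_DEV_BSIZE) */
    int byteshift;  /* log2(blocksize) */
};

/* Integer log2 for a power-of-two value (0 and 1 both give 0). */
static int
hyp_log2(uint32_t v)
{
    int n = 0;

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

static struct hyp_shifts
hyp_blocksize_shifts(uint32_t blocksize)
{
    struct hyp_shifts s;

    s.blkshift = hyp_log2(blocksize / HYP_DEV_BSIZE);
    s.byteshift = hyp_log2(blocksize);
    return s;
}
```

Under these assumptions a 2048-byte block size gives shifts of 2 and 11, so converting between device blocks, physical blocks, and bytes needs only shifts.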
|
1.82 |
| 25-Oct-2006 |
thorpej | - Add a new disk ioctl (DIOCGDISKINFO) to get the disk-info dictionary for the disk.
- Add a new function, disk_ioctl(), that does generic disk ioctl handling. DIOCGDISKINFO is handled here now, and others will be added in the future.
- In the wd driver, fill in the dk_info member of struct disk and use the new disk_ioctl() function.
|
1.81 |
| 22-Sep-2006 |
thorpej | - Define disk information, disk geometry, and disk partition dictionary schemas. Disk information and disk geometry are designed to replace information currently conveyed to user space using struct disklabel.
- Add a dk_info member to struct disk; a reference to a disk information dictionary. This dictionary is to be allocated and the reference stored in struct disk by individual drivers.
- disk_detach0() will release dk_info if non-NULL.
- Convert the wd(4) driver to stash geometry and other disk properties as the "disk-info" property in its properties dictionary. This needs some cleanup, but will serve as an example of what to do with other disk drivers.
|
1.80 |
| 23-Aug-2006 |
christos | branches: 1.80.2; 1.80.4; Change iostat_alloc() to take the parent pointer and the name directly, so that callers are not responsible for initializing the fields. Store the name inside the struct instead of maintaining a pointer to external storage, or leaked memory (nfs case).
|
1.79 |
| 07-Jun-2006 |
kardel | merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via {get,}{micro,nano,bin}time()
  get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
|
1.78 |
| 21-Apr-2006 |
yamt | branches: 1.78.2; iostat_find/disk_find: constify and simplify.
|
1.77 |
| 21-Apr-2006 |
yamt | remove some unnecessary #include.
|
1.76 |
| 21-Apr-2006 |
yamt | whitespace.
|
1.75 |
| 20-Apr-2006 |
blymn | Prefix iostat structure elements with io_
|
1.74 |
| 14-Apr-2006 |
blymn | Make i/o statistics collection more generic, include tape drives and nfs mounts in the set of devices that statistics will be reported on.
|
1.73 |
| 26-Dec-2005 |
perry | branches: 1.73.4; 1.73.6; 1.73.8; 1.73.10; 1.73.12; u_intN_t -> uintN_t
|
1.72 |
| 11-Dec-2005 |
christos | merge ktrace-lwp.
|
1.71 |
| 15-Oct-2005 |
yamt | - change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc, have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel from sys/bufq.h to sys/bufq_impl.h. (is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
|
1.70 |
| 20-Aug-2005 |
yamt | introduce a variant of disk_attach/detach, for pseudo disks which are opened by the user before being attached.
|
1.69 |
| 29-May-2005 |
christos | branches: 1.69.2; - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
|
1.68 |
| 31-Mar-2005 |
yamt | introduce a function to drain bufq and use it where appropriate.
|
1.67 |
| 08-Feb-2005 |
fvdl | branches: 1.67.4; Change the 'sz' variable in bounds_check_* to int64_t to avoid overflows when a very large blocknumber is passed in.
|
1.66 |
| 06-Feb-2005 |
christos | Change an if/panic statement to a KASSERT and disable a chatty printf.
|
1.65 |
| 25-Nov-2004 |
yamt | branches: 1.65.4; 1.65.6; lookup bufq using link_set rather than a switch statement.
|
1.64 |
| 28-Oct-2004 |
yamt | move buffer queue related stuffs from buf.h to their own header, bufq.h.
|
1.63 |
| 15-Oct-2004 |
thorpej | - Eliminate the need to call disk_init().
- disk_count needs to be protected with disklist_slock, too.
|
1.62 |
| 14-Oct-2004 |
yamt | move i/o schedulers to their own files. namely, from kern/subr_disk.c to kern/bufq_{fcfs,disksort,readprio,priocscan}.c.
|
1.61 |
| 25-Sep-2004 |
thorpej | Work-in-progress implementation of "wedges", a new way to represent partitions in the NetBSD kernel. See discussion on tech-kern for details.
|
1.60 |
| 09-Mar-2004 |
yamt | - add a function prototype.
- constify.
|
1.59 |
| 28-Feb-2004 |
yamt | change the way to handle NEW_BUFQ_STRATEGY option. instead of putting #ifdefs into each drivers, use a global variable to indicate default strategy.
XXX should have a way to specify other strategies.
|
1.58 |
| 10-Jan-2004 |
yamt | add a new bufq strategy, BUFQ_PRIOCSCAN (per-priority CSCAN). discussed on tech-kern@
|
1.57 |
| 06-Dec-2003 |
yamt | rev.1.55 didn't handle BUFQ_SORT_CYLINDER case correctly. pointed out by Juergen Hannken-Illjes. patch provided by him.
|
1.56 |
| 06-Dec-2003 |
he | Make sure buf_inorder() returns a value under all conditions.
|
1.55 |
| 05-Dec-2003 |
yamt | buf_inorder: deal with 64-bit daddr_t correctly.
|
1.54 |
| 04-Dec-2003 |
atatat | Dynamic sysctl.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(), vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all nodes are registered with the tree, and nodes can be added (or removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to number (and back again) can now be discovered, instead of having to be hard coded. Adding new nodes to the tree is likewise much simpler -- the new infrastructure handles almost all the work for simple types, and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking), so all existing consumers of sysctl information should notice no difference.
PS - I'm sorry, but there's a distinct lack of documentation at the moment. I'm working on sysctl(3/8/9) right now, and I promise to watch out for buses.
|
1.53 |
| 07-Aug-2003 |
agc | Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
|
1.52 |
| 13-Apr-2003 |
dsl | branches: 1.52.2; CONSTCONT should have been CONSTCOND
|
1.51 |
| 13-Apr-2003 |
dsl | Fix error message for 64bit daddr_t
|
1.50 |
| 03-Apr-2003 |
fvdl | Add a bounds_check_with_mediasize function, which is intended for checking RAW_PART transfers (and later raw disk devices).
|
1.49 |
| 06-Nov-2002 |
enami | Factor out the COMPAT_16 code.
|
1.48 |
| 05-Nov-2002 |
mrg | - do the COMPAT_16 dance in sysctl_diskstats() for the where == NULL case as well. pointed out by enami@.
- defflag COMPAT_16.
|
1.47 |
| 04-Nov-2002 |
mrg | repair backwards compatibility with netbsd 1.6 - if we are not given the wanted sizeof(struct disk_sysctl), use the old size. for non-COMPAT_16, however, we return EINVAL so that all future programs are forced into passing the wanted size. 1.6 iostat(8) works with -current kernel again.
as seen on tech-kern.
|
1.46 |
| 01-Nov-2002 |
simonb | When calculating the space needed for the data, use the supplied userland structure size (if passed in). Use the supplied userland structure size (if passed in) to check if there is enough room to copyout the next structure.
|
1.45 |
| 01-Nov-2002 |
mrg | implement separate read/write disk statistics:
- disk_unbusy() gets a new parameter to tell the IO direction.
- struct disk_sysctl gets 4 new members for read/write bytes/transfers.
when processing hw.diskstats, add the read&write bytes/transfers for the old combined stats to attempt to keep backwards compatibility.
unfortunately, due to multiple bugs, this will cause new kernels and old vmstat/iostat/systat programs to fail. however, the next time this is changed it will not fail again.
this is just the kernel portion.
|
1.44 |
| 01-Nov-2002 |
enami | Make this work with QUEUEDEBUG defined; don't use queue pointer after removing an element from queue.
|
1.43 |
| 01-Nov-2002 |
enami | Cosmetic changes.
|
1.42 |
| 30-Aug-2002 |
hannken | Remove the old device buffer queue interface.
Approved by: Jason R. Thorpe <thorpej@wasabisystems.com>
|
1.41 |
| 23-Jul-2002 |
hannken | The buffer returned by BUFQ_PEEK must remain the same until BUFQ_GET is called. It may be used as the "current" buffer.
|
1.40 |
| 21-Jul-2002 |
hannken | Rename bufq_init() to bufq_alloc(). Add bufq_free() to remove a buffer queue. Avoid MALLOC while holding a spinlock.
From Chuck Silvers.
|
1.39 |
| 16-Jul-2002 |
hannken | Implement a new device buffer queue interface. One basic struct, a function to setup a queue with a specific strategy and three macros to put buf's into the queue, get and remove the next buf or get the next buf without removal.
The BUFQ_XXX interface will be removed in the future. The B_ORDERED flag is no longer supported.
Approved by: Jason R. Thorpe <thorpej@wasabisystems.com>
|
1.38 |
| 28-Jun-2002 |
yamt | constify diskerr().
|
1.37 |
| 16-Feb-2002 |
enami | branches: 1.37.8; 1.37.10; Use sizeof correctly. Fixes PR#15613.
|
1.36 |
| 16-Feb-2002 |
enami | - Wrap long line.
- Remove unnecessary semi-colon.
|
1.35 |
| 28-Jan-2002 |
simonb | Remember to update the "size copied" counter in sysctl_diskstats().
|
1.34 |
| 28-Jan-2002 |
simonb | Use TAILQ_FOREACH().
|
1.33 |
| 27-Jan-2002 |
simonb | Implement the hw.disknames and hw.diskstats sysctl's that have been listed in <sys/sysctl.h> since day one but never implemented.
|
1.32 |
| 30-Nov-2001 |
enami | Use cached pointer to next buf instead of re-fetching it. GCC actually generates different code.
|
1.31 |
| 13-Nov-2001 |
lukem | add RCSID
|
1.30 |
| 09-Jul-2001 |
simonb | branches: 1.30.2; 1.30.4; ANSIfy.
|
1.29 |
| 30-Mar-2000 |
augustss | branches: 1.29.6; Get rid of register declarations.
|
1.28 |
| 07-Feb-2000 |
thorpej | Fix a bug in disksort_*() which caused non-optimal ordering when multiple active partitions were on a single spindle. Add a b_rawblkno member to struct buf which contains the non-partition-relative block number to sort by.
|
1.27 |
| 28-Jan-2000 |
hannken | The decision that `disksort_cylinder' uses to decide if the buffer needs to go to the inversion list is incomplete. If the cylinders are equal block numbers must be checked.
This caused lockups if some buffers with the same cylinder were cycling through the list, as it may happen with softdep enabled.
Fixes PR #9197.
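The ordering rule described above, cylinder first and block number as the tie-break, can be sketched as a comparison function. The request structure is a hypothetical stand-in for the two sort keys, not struct buf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down request: just the two sort keys. */
struct hyp_req {
    uint32_t cylinder;
    int64_t  blkno;
};

/* True if a sorts before b: cylinder first, then block number. */
static bool
hyp_req_before(const struct hyp_req *a, const struct hyp_req *b)
{
    if (a->cylinder != b->cylinder)
        return a->cylinder < b->cylinder;
    return a->blkno < b->blkno;     /* tie-break on equal cylinders */
}
```

Without the second comparison, two requests on the same cylinder have no defined order, which is the gap the commit message describes.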
|
1.26 |
| 21-Jan-2000 |
thorpej | - Add a B_ORDERED flag to communicate to drivers that an I/O request should be issued/completed in order; that is, provide a barrier for I/O queues.
- Change the buffer driver queue links to a TAILQ, rather than using a home-grown equivalent. Provide BUFQ_*() macros to manipulate buffer queues; these deal with the barrier provided by B_ORDERED.
- Update disksort() accordingly, and provide 3 versions:
  - disksort_cylinder(): historical disksort(), which keys on b_cylinder (and b_blkno for the case when b_cylinder matches).
  - disksort_blkno(): sorts only on b_blkno. Essentially the same as disksort_cylinder(), but with fewer comparisons.
  - disksort_tail(): requests are simply inserted into the queue at the tail. This is provided as an option so that drivers can simply have a pointer to the appropriate sort function.
Note that disksort() now pays attention to B_ORDERED.
|
1.25 |
| 22-Feb-1999 |
drochner | branches: 1.25.8; 1.25.14; PR kern/7033 (Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>): use device minor to unit/partition macros from sys/disklabel.h
|
1.24 |
| 04-Aug-1998 |
perry | Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
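The argument-order change is the part that is easy to get wrong, so here is a small illustration using the standard C functions. The wrapper names are hypothetical and exist only to carry the comments.

```c
#include <string.h>

/* Old BSD style was bcopy(src, dst, len); note the swapped order below. */
static void
hyp_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);      /* was: bcopy(src, dst, len) */
}

/* Overlapping regions need memmove, the replacement for ovbcopy. */
static void
hyp_copy_overlap(void *dst, const void *src, size_t len)
{
    memmove(dst, src, len);     /* was: ovbcopy(src, dst, len) */
}

/* bzero(p, len) becomes memset(p, 0, len). */
static void
hyp_zero(void *p, size_t len)
{
    memset(p, 0, len);
}
```

A mechanical conversion that keeps the old source-first order would compile cleanly and copy in the wrong direction, which is why the mapping is spelled out in the commit message.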
|
1.23 |
| 30-Dec-1997 |
thorpej | Rearrange disk_detach() slightly, and make a small run-time cosmetic change in disk_unbusy().
|
1.22 |
| 05-Oct-1997 |
thorpej | Copyright assigned to The NetBSD Foundation.
|
1.21 |
| 17-Oct-1996 |
perry | branches: 1.21.10; removed #ifdef tahoe
|
1.20 |
| 13-Oct-1996 |
christos | backout previous kprintf change
|
1.19 |
| 10-Oct-1996 |
christos | printf -> kprintf, sprintf -> ksprintf
|
1.18 |
| 12-Jul-1996 |
thorpej | Remove old-style disk instrumentation code.
|
1.17 |
| 16-Mar-1996 |
christos | Fix printf() formats.
|
1.16 |
| 09-Feb-1996 |
christos | More proto fixes
|
1.15 |
| 07-Jan-1996 |
thorpej | New generic disk framework. Highlights:
- New metrics handling. Metrics are now kept in the new `struct disk'. Busy time is now stored as a timeval, and transfer count in bytes.
- Storage for disklabels is now dynamically allocated, so that the size of the disk structure is not machine-dependent.
- Several new functions for attaching and detaching disks, and handling metrics calculation.
Old-style instrumentation is still supported in drivers that did it before. However, old-style instrumentation is being deprecated, and will go away once the userland utilities are updated for the new framework.
For usage and architectural details, see the forthcoming disk(9) manual page.
|
1.14 |
| 28-Dec-1995 |
thorpej | Move the old-style disk instrumentation "structures" to a central location (sys/kern/subr_disk.c) and note that they should/will be deprecated.
|
1.13 |
| 29-Mar-1995 |
mycroft | Make definition of b_cylinder global.
|
1.12 |
| 29-Jun-1994 |
cgd | New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
|
1.11 |
| 19-May-1994 |
mycroft | Update to 4.4-Lite.
|
1.10 |
| 10-Feb-1994 |
mycroft | Don't need back pointers for disksort().
|
1.9 |
| 06-Feb-1994 |
mycroft | Remove another use of b_actl.
|
1.8 |
| 06-Feb-1994 |
mycroft | Use b_actf, not av_forw.
|
1.7 |
| 23-Jan-1994 |
glass | remove warning
|
1.6 |
| 11-Jan-1994 |
mycroft | Get rid of disklabel indirection functions.
|
1.5 |
| 17-Dec-1993 |
mycroft | Canonicalize all #includes.
|
1.4 |
| 05-Sep-1993 |
mycroft | branches: 1.4.2; Add \n to end of error message.
|
1.3 |
| 20-May-1993 |
deraadt | more disklabel changes
|
1.2 |
| 20-May-1993 |
cgd | add rcs ids, and clean up headers where necessary
|
1.1 |
| 21-Mar-1993 |
cgd | branches: 1.1.1; Initial revision
|
1.1.1.1 |
| 21-Mar-1993 |
cgd | initial import of 386bsd-0.1 sources
|
1.4.2.3 |
| 14-Nov-1993 |
mycroft | Canonicalize all #includes.
|
1.4.2.2 |
| 30-Sep-1993 |
deraadt | delete hopping-functions cpu_{read,write,set}disklabel()
|
1.4.2.1 |
| 29-Sep-1993 |
mycroft | Strategy functions return void.
|
1.21.10.1 |
| 14-Oct-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.25.14.1 |
| 21-Dec-1999 |
wrstuden | Initial commit of recent changes to make DEV_BSIZE go away.
Runs on i386, needs work on other arch's. Main kernel routines should be fine, but a number of the stand programs need help.
cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512 byte block devices. vnd, raidframe, and lfs need work.
Non 2**n block support is automatic for LKM's and conditional for kernels on "options NON_PO2_BLOCKS".
|
1.25.8.1 |
| 20-Nov-2000 |
bouyer | Update thorpej_scsipi to -current as of a month ago
|
1.29.6.7 |
| 11-Nov-2002 |
nathanw | Catch up to -current
|
1.29.6.6 |
| 17-Sep-2002 |
nathanw | Catch up to -current.
|
1.29.6.5 |
| 01-Aug-2002 |
nathanw | Catch up to -current.
|
1.29.6.4 |
| 28-Feb-2002 |
nathanw | Catch up to -current.
|
1.29.6.3 |
| 08-Jan-2002 |
nathanw | Catch up to -current.
|
1.29.6.2 |
| 14-Nov-2001 |
nathanw | Catch up to -current.
|
1.29.6.1 |
| 24-Aug-2001 |
nathanw | Catch up with -current.
|
1.30.4.1 |
| 07-Sep-2001 |
thorpej | Commit my "devvp" changes to the thorpej-devvp branch. This replaces the use of dev_t in most places with a struct vnode *.
This will form the basic infrastructure for real cloning device support (besides being architecturally cleaner -- it'll be good to get away from using numbers to represent objects).
|
1.30.2.4 |
| 06-Sep-2002 |
jdolecek | sync kqueue branch with HEAD
|
1.30.2.3 |
| 16-Mar-2002 |
jdolecek | Catch up with -current.
|
1.30.2.2 |
| 11-Feb-2002 |
jdolecek | Sync w/ -current.
|
1.30.2.1 |
| 10-Jan-2002 |
thorpej | Sync kqueue branch with -current.
|
1.37.10.1 |
| 22-Jul-2002 |
lukem | Pull up revision 1.38 (requested by yamt in ticket #536): constify diskerr().
|
1.37.8.4 |
| 31-Aug-2002 |
gehenna | catch up with -current.
|
1.37.8.3 |
| 29-Aug-2002 |
gehenna | catch up with -current.
|
1.37.8.2 |
| 20-Jul-2002 |
gehenna | catch up with -current.
|
1.37.8.1 |
| 15-Jul-2002 |
gehenna | catch up with -current.
|
1.52.2.10 |
| 10-Nov-2005 |
skrll | Sync with HEAD. Here we go again...
|
1.52.2.9 |
| 01-Apr-2005 |
skrll | Sync with HEAD.
|
1.52.2.8 |
| 09-Feb-2005 |
skrll | Sync with HEAD.
|
1.52.2.7 |
| 07-Feb-2005 |
skrll | Sync with HEAD.
|
1.52.2.6 |
| 29-Nov-2004 |
skrll | Sync with HEAD.
|
1.52.2.5 |
| 02-Nov-2004 |
skrll | Sync with HEAD.
|
1.52.2.4 |
| 19-Oct-2004 |
skrll | Sync with HEAD
|
1.52.2.3 |
| 21-Sep-2004 |
skrll | Fix the sync with head I botched.
|
1.52.2.2 |
| 18-Sep-2004 |
skrll | Sync with HEAD.
|
1.52.2.1 |
| 03-Aug-2004 |
skrll | Sync with HEAD
|
1.65.6.1 |
| 12-Feb-2005 |
yamt | sync with head.
|
1.65.4.1 |
| 29-Apr-2005 |
kent | sync with -current
|
1.67.4.1 |
| 06-Apr-2005 |
tron | Pull up revision 1.68 (requested by yamt in ticket #112): introduce a function to drain bufq and use it where appropriate.
|
1.69.2.7 |
| 17-Mar-2008 |
yamt | sync with head.
|
1.69.2.6 |
| 04-Feb-2008 |
yamt | sync with head.
|
1.69.2.5 |
| 21-Jan-2008 |
yamt | sync with head
|
1.69.2.4 |
| 27-Oct-2007 |
yamt | sync with head.
|
1.69.2.3 |
| 03-Sep-2007 |
yamt | sync with head.
|
1.69.2.2 |
| 30-Dec-2006 |
yamt | sync with head.
|
1.69.2.1 |
| 21-Jun-2006 |
yamt | sync with head.
|
1.73.12.1 |
| 24-May-2006 |
tron | Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
|
1.73.10.2 |
| 11-May-2006 |
elad | sync with head
|
1.73.10.1 |
| 19-Apr-2006 |
elad | sync with head.
|
1.73.8.3 |
| 03-Sep-2006 |
yamt | sync with head.
|
1.73.8.2 |
| 26-Jun-2006 |
yamt | sync with head.
|
1.73.8.1 |
| 24-May-2006 |
yamt | sync with head.
|
1.73.6.2 |
| 22-Apr-2006 |
simonb | Sync with head.
|
1.73.6.1 |
| 04-Feb-2006 |
simonb | Adapt for timecounters: mostly use get*time() and use "time_second" instead of "time.tv_sec".
|
1.73.4.1 |
| 09-Sep-2006 |
rpaulo | sync with head
|
1.78.2.1 |
| 19-Jun-2006 |
chap | Sync with head.
|
1.80.4.2 |
| 10-Dec-2006 |
yamt | sync with head.
|
1.80.4.1 |
| 22-Oct-2006 |
yamt | sync with head
|
1.80.2.2 |
| 12-Jan-2007 |
ad | Sync with head.
|
1.80.2.1 |
| 18-Nov-2006 |
ad | Sync with head.
|
1.83.4.1 |
| 12-Mar-2007 |
rmind | Sync with HEAD.
|
1.83.2.1 |
| 21-Nov-2010 |
riz | Pull up following revision(s) (requested by mrg in ticket #1411):
sys/kern/subr_disk.c: revision 1.100
add some (uint64_t) casts to avoid 32 bit overflows. this fixes my 3TB disk with 4KB sectors and disklabel (which looks like it would work up to 16TB.)
idea from mlelstv@.
|
1.85.4.1 |
| 11-Jul-2007 |
mjf | Sync with head.
|
1.85.2.6 |
| 24-Aug-2007 |
ad | Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details. Some minor portions are incomplete and need to be verified as a whole.
|
1.85.2.5 |
| 20-Aug-2007 |
ad | Sync with head.
|
1.85.2.4 |
| 20-Aug-2007 |
ad | Sync with HEAD.
|
1.85.2.3 |
| 20-Aug-2007 |
ad | - Alter disk attach/detach to fix a panic when closing a vnd device.
- Sync with HEAD.
|
1.85.2.2 |
| 19-Aug-2007 |
ad | - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
|
1.85.2.1 |
| 15-Jul-2007 |
ad | Sync with head.
|
1.86.2.1 |
| 15-Aug-2007 |
skrll | Sync with HEAD.
|
1.88.10.2 |
| 29-Jul-2007 |
ad | It's not a good idea for device drivers to modify b_flags, as they don't need to understand the locking around that field. Instead of setting B_ERROR, set b_error instead. b_error is 'owned' by whoever completes the I/O request.
|
1.88.10.1 |
| 29-Jul-2007 |
ad | file subr_disk.c was added on branch matt-mips64 on 2007-07-29 12:15:46 +0000
|
1.88.8.1 |
| 14-Oct-2007 |
yamt | sync with head.
|
1.88.6.3 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.88.6.2 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.88.6.1 |
| 06-Nov-2007 |
matt | sync with HEAD
|
1.88.4.1 |
| 26-Oct-2007 |
joerg | Sync with HEAD.
Follow the merge of pmap.c on i386 and amd64 and move pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup code to restore CR4 before jumping back into kernel space as the large page option might cover that.
|
1.89.12.1 |
| 30-Jan-2008 |
cube | constify disk->dk_name.
|
1.89.10.1 |
| 02-Jan-2008 |
bouyer | Sync with HEAD
|
1.89.6.1 |
| 04-Dec-2007 |
ad | Pull the vmlocking changes into a new branch.
|
1.89.4.1 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.91.6.2 |
| 02-Jun-2008 |
mjf | Sync with HEAD.
|
1.91.6.1 |
| 03-Apr-2008 |
mjf | Sync with HEAD.
|
1.91.2.1 |
| 24-Mar-2008 |
keiichi | sync with head.
|
1.92.4.4 |
| 11-Mar-2010 |
yamt | sync with head
|
1.92.4.3 |
| 20-Jun-2009 |
yamt | sync with head
|
1.92.4.2 |
| 04-May-2009 |
yamt | sync with head.
|
1.92.4.1 |
| 16-May-2008 |
yamt | sync with head.
|
1.92.2.1 |
| 18-May-2008 |
yamt | sync with head.
|
1.93.10.3 |
| 07-Jan-2011 |
riz | Pull up following revision(s) (requested by mrg in ticket #1520):
sys/sys/device.h: revision 1.133
sys/kern/subr_disk.c: patch
Add helper function that determines the size and block size of a disk device. For now we query
- the disk label
- the wedge info and data from disk(9)
|
1.93.10.2 |
| 21-Nov-2010 |
riz | Pull up following revision(s) (requested by mrg in ticket #1463):
sys/kern/subr_disk.c: revision 1.100
add some (uint64_t) casts to avoid 32 bit overflows. this fixes my 3TB disk with 4KB sectors and disklabel (which looks like it would work up to 16TB.) idea from mlelstv@.
|
1.93.10.1 |
| 04-Apr-2009 |
snj | Pull up following revision(s) (requested by ad in ticket #657):
sys/kern/subr_disk.c: revision 1.95
sys/kern/subr_iostat.c: revision 1.17
sys/sys/disk.h: revision 1.52
sys/sys/iostat.h: revision 1.10
Add disk_isbusy(), iostat_isbusy().
|
1.93.8.2 |
| 28-Apr-2009 |
skrll | Sync with HEAD.
|
1.93.8.1 |
| 03-Mar-2009 |
skrll | Sync with HEAD.
|
1.94.2.2 |
| 23-Jul-2009 |
jym | Sync with HEAD.
|
1.94.2.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.
|
1.99.4.1 |
| 05-Mar-2011 |
rmind | sync with head
|
1.99.2.1 |
| 22-Oct-2010 |
uebayasi | Sync with HEAD (-D20101022).
|
1.100.18.7 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.100.18.6 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.100.18.5 |
| 23-Jun-2013 |
tls | resync from head
|
1.100.18.4 |
| 25-Feb-2013 |
tls | resync with head
|
1.100.18.3 |
| 10-Feb-2013 |
tls | Add an accessor -- ufs_maxphys() -- to check the maximum transfer size for a given UFS mountpoint, and move the code from mount that finds the underlying disk and resets the mountpoint max transfer size into a utility function, ufs_update_maxphys().
Add a global serial number that counts disk property changes to which filesystems are meant to accommodate themselves. Make ufs_maxphys() check it. This is a sort of flag-polling interface that avoids callbacks into the filesystem code, but will require freezing filesystems and draining in-flight transactions before a decrease in size that is mandatory (like attaching a disk with a smaller maximum transfer size as a spare in a RAIDframe set), rather than "advisory", like finding out set geometry from a RAID controller long after boot and deciding a smaller transfer size would be optimal, can be signalled. Still, the "advisory" case is the common one so this is progress.
Make a bit of an example of RAIDframe by making it bump this new serial number when disks are added to the subsystem. I will attack one of the hardware RAID drivers (probably arcmsr) next.
|
1.100.18.2 |
| 02-Dec-2012 |
tls | Don't pass NULL struct dkdriver to disk_init. That's seriously bogus.
|
1.100.18.1 |
| 12-Sep-2012 |
tls | Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for physio (I/O to raw devices); needs more doing to get it going with the filesystems, but it shouldn't damage data.
All work's been done on amd64 so far. Not hard to add support to other ports. If others want to pitch in, one very helpful thing would be to sort out when and how IDE disks can do 128K or larger transfers, and adjust the various PCI IDE (or at least ahcisata) drivers and wd.c accordingly -- it would make testing much easier. Another very helpful thing would be to implement a smart minphys() for RAIDframe along the lines detailed in the MAXPHYS-NOTES file.
|
1.100.8.1 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.102.2.1 |
| 18-May-2014 |
rmind | sync with head
|
1.103.6.5 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.103.6.4 |
| 19-Mar-2016 |
skrll | Sync with HEAD
|
1.103.6.3 |
| 27-Dec-2015 |
skrll | Sync with HEAD (as of 26th Dec)
|
1.103.6.2 |
| 06-Jun-2015 |
skrll | Sync with HEAD
|
1.103.6.1 |
| 06-Apr-2015 |
skrll | Sync with HEAD
|
1.103.4.2 |
| 01-Jun-2015 |
snj | Pull up following revision(s) (requested by jnemeth in ticket #775):
share/man/man9/disk.9: revision 1.37
sys/kern/subr_disk.c: revisions 1.104, 1.105
sys/dev/dksubr.c: revision 1.56
sys/sys/dkio.h: revision 1.21
Implement DIOCGMEDIASIZE and DIOCGSECTORSIZE from FreeBSD.
--
clear error for new ioctls.
|
1.103.4.1 |
| 19-May-2015 |
snj | Pull up following revision(s) (requested by chs in ticket #766):
sys/kern/subr_disk.c: revision 1.113
in bounds_check_with_*, reject negative block numbers and avoid a potential overflow in calculating the size of the request.
|
1.116.4.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.116.2.1 |
| 20-Mar-2017 |
pgoyette | Sync with HEAD
|
1.119.2.3 |
| 29-Mar-2020 |
martin | Pull up following revision(s) (requested by mlelstv in ticket #1527):
sys/dev/scsipi/cd.c: revision 1.343
sys/kern/subr_disk.c: revision 1.130
Avoid division by zero if label isn't valid.
Allow open of RAWPART even when no medium is loaded. Keep errors silent if no medium is loaded.
Fixes PR kern/55104
|
1.119.2.2 |
| 01-Nov-2019 |
martin | Pull up following revision(s) (requested by cnst in ticket #1397):
sys/kern/subr_disk.c: revision 1.129
kern/subr_disk: bounds_check_with_label: really protect against div by zero
Solves kernel panic in NetBSD 8.1 amd64 on VirtualBox 6.0.12 r133076.
Triggered with an NVMe controller without any actual discs behind it:
nvme0 at pci0 dev 14 function 0: vendor 80ee product 4e56 (rev. 0x00)
nvme0: NVMe 1.2
nvme0: interrupting at ioapic0 pin 22
nvme0: ORCL-VBOX-NVME-VER12, firmware 1.0, serial VB1234-56789
ld0 at nvme0 nsid 1
ld0: 0, 0 cyl, 16 head, 63 sec, 1 bytes/sect x 0 sectors
Code path is reached 4 times during normal boot, each time after wd0a is already mounted; this patch avoids a crash with a dirty filesystem.
|
1.119.2.1 |
| 05-Apr-2019 |
msaitoh | Pull up following revision(s) (requested by martin in ticket #1223):
sys/sys/dkio.h: revision 1.25
sys/kern/subr_disk.c: revision 1.123
sys/dev/dksubr.c: revision 1.107
sys/dev/ccd.c: revision 1.179
sys/dev/ofw/ofdisk.c: revision 1.53
Add a disk ioctl DIOCRMWEDGES to remove all wedges of a given disk (if not busy).
|
1.121.2.1 |
| 15-Mar-2018 |
pgoyette | Synch with HEAD
|
1.122.2.3 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.122.2.2 |
| 08-Apr-2020 |
martin | Merge changes from current as of 20200406
|
1.122.2.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.128.2.1 |
| 02-Apr-2020 |
martin | Pull up following revision(s) (requested by mlelstv in ticket #814):
sys/dev/scsipi/cd.c: revision 1.343
sys/kern/subr_disk.c: revision 1.130
Avoid division by zero if label isn't valid.
Allow open of RAWPART even when no medium is loaded. Keep errors silent if no medium is loaded.
Fixes PR kern/55104
|
1.132.8.1 |
| 31-May-2021 |
cjep | sync with head
|
1.132.6.1 |
| 17-Jun-2021 |
thorpej | Sync w/ HEAD.
|
1.134.4.1 |
| 01-Aug-2023 |
martin | Pull up following revision(s) (requested by riastradh in ticket #284):
sys/dev/dkwedge/dk.c 1.125-1.158
sys/kern/subr_disk.c 1.135-1.137
sys/sys/disk.h 1.78
dk(4): Explain why dk_rawopens can't overflow and assert it.
dk(4): Restore assertions in dklastclose.
We only enter dklastclose if the wedge is open (sc->sc_dk.dk_openmask != 0), which can happen only if dkfirstopen has succeeded, in which case we hold a dk_rawopens reference to the parent that prevents anyone else from closing it. Hence sc->sc_parent->dk_rawopens > 0.
On open, sc->sc_parent->dk_rawvp is set to nonnull, and it is only reset to null on close. Hence if the parent is still open, as it must be here, sc->sc_parent->dk_rawvp must be nonnull.
dk(4): Avoid holding dkwedges_lock while allocating array.
This is not great -- we shouldn't be choosing the unit number here anyway; we should just let autoconf do it for us -- but it's better than potentially blocking any dk_openlock or dk_rawlock (which are sometimes held when waiting for dkwedges_lock) for memory allocation.
dk(4): KNF: return (v) -> return v. No functional change intended.
dk(4): KNF: Whitespace. No functional change intended.
dk(4): Omit needless void * cast. No functional change intended.
dk(4): Fix typo in comment: dkstrategy, not dkstragegy. No functional change intended.
dk(4): ENXIO, not ENODEV, means no such device.
ENXIO is `device not configured', meaning there is no such device. ENODEV is `operation not supported by device', meaning the device is there but refuses the operation, like writing to a read-only medium.
Exception: For undefined ioctl commands, it's not ENODEV _or_ ENXIO, but rather ENOTTY, because why make any of this obvious when you could make it obscure Unix lore?
dk(4): KNF: Sort includes. No functional change intended.
dk(4): <sys/rwlock.h> for rwlock(9).
dk(4): Prevent races in access to struct dkwedge_softc::sc_size.
Rules:
1. Only ever increases, never decreases. (Decreases require removing and readding the wedge.)
2. Increases are serialized by dk_openlock.
3. Reads can happen unlocked in any context where the softc is valid.
Access is gathered into dkwedge_size* subroutines -- don't touch sc_size outside these. For now, we use rwlock(9) to keep the reasoning simple. This should be done with atomics on 64-bit platforms and a seqlock on 32-bit platforms to avoid contention.
However, we can do that in a later change.
dk(4): Move CFDRIVER_DECL and CFATTACH_DECL3_NEW earlier in file.
Follows the pattern of most drivers, and will be necessary for referencing dk_cd in dk_bdevsw and dk_cdevsw soon, to prevent open/detach races. No functional change intended.
dk(4): Don't touch dkwedges or ndkwedges outside dkwedges_lock.
dk(4): Assert parent vp is nonnull before we stash it away.
Let's enable early attribution if this goes wrong.
If it's not the parent's first open, also assert the parent vp is already nonnull.
dk(4): Assert dkwedges[unit] is the sc we're about to free.
dk(4): Require dk_openlock in dk_set_geometry.
Not strictly necessary but this makes reasoning easier and documents with an assertion how disk_set_info is serialized.
disk(9): Fix use-after-free race with concurrent disk_set_info.
This can happen with dk(4), which allows wedges to have their size increased without destroying and recreating the device instance.
Drivers which allow concurrent disk_set_info and disk_ioctl must serialize disk_set_info with dk_openlock.
dk(4): Add null d_cancel routine to devsw.
This way, dkclose is guaranteed that dkopen, dkread, dkwrite, dkioctl, &c., have all returned before it runs. For block opens, setting d_cancel also guarantees that any buffered writes are flushed with vinvalbuf before dkclose is called.
dk(4): Fix callout detach race.
1. Set a flag sc_iostop under the lock sc_iolock so dkwedge_detach and dkstart don't race over it.
2. Decline to schedule the callout if sc_iostop is set. The callout is already only ever scheduled while the lock is held.
3. Use callout_halt to wait for any concurrent callout to complete. At this point, it can't reschedule itself.
Without this change, the callout could be concurrently rescheduling itself as we issue callout_stop, leading to use-after-free later.
dk(4): Use disk_begindetach and rely on vdevgone to close instances.
The first step is to decide whether we can detach (if forced, yes; if not forced, only if not already open), and prevent new opens if so.
There's no need to start closing open instances at this point -- we're just making a decision to detach, and preventing new opens by transitioning state that dkopen will respect[*].
The second step is to force all open instances to close. This is done by vdevgone. By the time vdevgone returns, there can be no open instances, so if there _were_ any, closing them via vdevgone will have passed through dklastclose.
After that point, there can be no opens and no I/O operations, so dk_openmask must already be zero and the bufq must be empty.
Thus, there's no need to have an explicit call to dklastclose (via dkwedge_cleanup_parent) before or after making the decision to detach.
[*] Currently access to this state is racy: nothing serializes dkwedge_detach's state transition with dkopen's test. TBD in a separate commit shortly.
dk(4): Set .d_cfdriver and .d_devtounit to plug open/detach race.
This way, opening dkN or rdkN will wait if attach or detach is still in progress, and vdevgone will wake up such pending opens and make them fail. So it is no longer possible for a wedge to be detached after dkopen has already started using it.
For now, we use a custom .d_devtounit function that looks up the autoconf unit number via the dkwedges array, which conceivably may use an independent unit numbering system -- nothing guarantees they match up. (In practice they will mostly match up, but concurrent wedge creation could lead to different numbering.) Eventually this should be changed so the two numbering systems match, which would let us delete the new dkunit function and just use dev_minor_unit like many other drivers can.
dk(4): Take a read-lock on dkwedges_lock if we're only reading.
- dkwedge_find_by_name
- dkwedge_find_by_parent
- dkwedge_print_wnames
dk(4): Omit needless locking in dksize, dkdump.
All the members these use are stable after initialization, except for the wedge size, which dkwedge_size safely reads a snapshot of without locking in the caller.
dk(4): dkdump: Simplify. No functional change intended.
dk(4): Narrow the scope of the device numbering lookup on detach.
Just need it for vdevgone, order relative to other things in detach doesn't matter. No functional change intended.
disk(9): Fix missing unlock in error branch in previous change.
dk(4): Fix racy access to sc->sc_dk.dk_openmask in dkwedge_delall1.
Need sc->sc_parent->dk_rawlock for this, as used in dkopen/dkclose.
dk(4): Convert tests to assertions in various devsw operations.
.d_cancel, .d_strategy, .d_read, .d_write, .d_ioctl, and .d_discard are only ever used between successful .d_open return and entry to .d_close. .d_open doesn't return until sc is nonnull and sc_state is RUNNING, and dkwedge_detach waits for the last .d_close before setting sc_state to DEAD. So there is no possibility for sc to be null or for sc_state to be anything other than RUNNING or DYING.
There is a small functional change here but only in the event of a race: in the short window between when dkwedge_detach is entered, and when .d_close runs, any I/O operations (read, write, ioctl, &c.) may be issued that would have failed with ENXIO before.
This shouldn't matter for anything: disk I/O operations are supposed to complete reasonably promptly, and these operations _could_ have begun milliseconds prior, before dkwedge_detach was entered, so it's not a significant distinction.
Notes:
- .d_open must still contend with trying to open a nonexistent wedge, of course.
- .d_close must also contend with closing a nonexistent wedge, in case there were two calls to open in quick succession and the first failed while the second hadn't yet determined it would fail.
- .d_size and .d_dump are used from ddb without any open/close.
dk(4): Fix lock assertion in size increase: parent's, not wedge's.
dk(4): Rename label for consistency. No functional change intended.
dk(4): dkclose must handle a dying wedge too to close the parent.
Otherwise the parent open leaks on detach (or revoke) when the wedge was open and had to be forcibly closed.
Fixes assertion sc->sc_dk.dk_openmask == 0.
ioctl(DIOCRMWEDGES): Delete only idle wedges.
Don't forcibly delete busy wedges.
Fixes accidental destruction of the busy wedge that the root file system is mounted on, triggered by syzbot's ioctl(DIOCRMWEDGES).
dk(4): Omit needless sc_iopend, sc_dkdrn mechanism.
vdevgone guarantees that all instances are closed by the time it returns, which in turn guarantees all I/O operations (read, write, ioctl, &c.) have completed, and, if the block device is open, vinvalbuf(V_SAVE) -> vflushbuf has completed, which forces all buffered transfers to be issued and waits for them to complete. So by the time vdevgone returns, no further transfers can be submitted and the bufq must be empty.
dk(4): Fix typo: sc_state, not sc_satte.
Had tested a patch series, but not every patch in it, and I inadvertently fixed the typo in a later patch in the series, not in the one I committed.
dk(4): Make it clearer that dkopen EROFS branch doesn't leak.
It looked like we may need to sometimes call dklastclose in error branch for the case of (flags & ~sc->sc_mode & FWRITE) != 0, but it is not actually possible to reach that case: if the caller requested read/write, and the parent is read-only, and it is the first time we've opened the parent, then dkfirstopen will fail with EROFS so we never get there.
But this is confusing and it looked like the error branch is wrong, so let's rearrange the conditional to make it clearer that we cannot goto out after dkfirstopen has succeeded. And then assert that the case cannot happen when we do call dkfirstopen.
dk(4): Need pdk->dk_openlock to read pdk->dk_wedges.
|
1.137.6.1 |
| 02-Aug-2025 |
perseant | Sync with HEAD
|