.Pp
The parameters to an I/O transfer described by
.Fa bp
are specified by the following
.Vt "struct buf"
fields:
.Bl -tag -offset abcd
.It Fa bp Ns Li "->b_flags"
Flags specifying the type of transfer.
.Bl -tag -compact
.It Dv B_READ
Transfer is read from device.
If not set, transfer is write to device.
.It Dv B_ASYNC
Asynchronous I/O.
Caller must provide
.Fa bp Ns Li "->b_iodone"
and must not call
.Fn biowait bp .
.El
For legibility, callers should indicate writes by passing the
pseudo-flag
.Dv B_WRITE ,
which is zero.
.It Fa bp Ns Li "->b_data"
Pointer to kernel virtual address of source/target for transfer.
.It Fa bp Ns Li "->b_bcount"
Nonnegative number of bytes requested for transfer.
.It Fa bp Ns Li "->b_blkno"
Block number at which to do transfer.
.It Fa bp Ns Li "->b_iodone"
If
.Dv B_ASYNC
is set in
.Fa bp Ns Li "->b_flags" ,
an I/O completion callback.
.El
.Pp
Additionally, if the I/O transfer is a write associated with a
.Xr vnode 9
.Fa vp ,
then before the user submits it to a block device, the user must
increment
.Fa vp Ns Li "->v_numoutput" .
The user must not acquire
.Fa vp Ns Ap s
vnode lock between incrementing
.Fa vp Ns Li "->v_numoutput"
and submitting
.Fa bp
to a block device -- doing so will likely cause deadlock with the
syncer.
.Pp
Block I/O transfers may be synchronous or asynchronous:
.Bl -dash
.It
If synchronous, after submitting the transfer to a block device, the
user must call
.Fn biowait bp
to wait until the transfer has completed.
.It
If asynchronous, the user must set
.Dv B_ASYNC
in
.Fa bp Ns Li "->b_flags"
and provide
.Fa bp Ns Li "->b_iodone" .
After the transfer has been submitted to a block device,
.Fa bp Ns Li "->b_iodone"
will eventually be called with
.Fa bp
as its argument when the transfer has completed.
The user
.Em must not
call
.Fn biowait bp
in this case.
.El
.Sh NESTED I/O TRANSFERS
Sometimes an I/O transfer from a single buffer in memory cannot go to a
single location on a block device: it must be split up into smaller
transfers for each segment of the memory buffer.
.Pp
After initializing the
.Li b_flags ,
.Li b_data ,
and
.Li b_bcount
parameters of an I/O transfer for the buffer, called the
.Em master
buffer, the user can issue smaller transfers for segments of the buffer
using
.Fn nestiobuf_setup .
The nested I/O transfers are asynchronous -- when they complete, they
debit from the amount of work left to be done in the master buffer.
If any segments of the buffer were skipped, the user can report this
with
.Fn nestiobuf_done
to debit the skipped part of the work.
.Pp
The master buffer's I/O transfer is completed once all nested buffers'
I/O transfers have completed and, if any segments were skipped,
.Fn nestiobuf_done
has been called to account for them.
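.Pp
The debit accounting described above can be modeled in userland C.
This is a minimal sketch under stated assumptions, not the kernel's
implementation: the names
.Li struct model_master ,
.Fn model_master_init ,
and
.Fn model_master_debit
are hypothetical and invented for illustration, and only the
bookkeeping is modeled, not the I/O itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a master buffer; not struct buf. */
struct model_master {
	size_t resid;		/* bytes of work not yet debited */
	int error;		/* first error reported, or 0 */
	int completed;		/* nonzero once all work is debited */
};

/* Begin a master transfer covering len bytes of work. */
static void
model_master_init(struct model_master *m, size_t len)
{
	assert(len > 0);
	m->resid = len;
	m->error = 0;
	m->completed = 0;
}

/*
 * Debit donebytes from the master.  This models both a nested
 * transfer completing and the caller reporting skipped work.
 */
static void
model_master_debit(struct model_master *m, size_t donebytes, int error)
{
	assert(!m->completed);
	assert(donebytes <= m->resid);
	if (error != 0 && m->error == 0)
		m->error = error;	/* remember the first error only */
	m->resid -= donebytes;
	if (m->resid == 0)
		m->completed = 1;	/* all work accounted for */
}
```

.Pp
In this model, a 3000-byte master split into three 1000-byte segments
completes only after three debits, whether each debit comes from a
finished nested transfer or from a skipped segment.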
.Pp
For writes associated with a vnode
.Fa vp ,
.Fn nestiobuf_setup
accounts for
.Fa vp Ns Li "->v_numoutput" ,
so the caller must not acquire
.Fa vp Ns Ap s
vnode lock between calling
.Fn nestiobuf_setup
and submitting the nested I/O transfer to a block device.
However, the caller is responsible for accounting the master buffer in
.Fa vp Ns Li "->v_numoutput" .
This must be done very carefully because after incrementing
.Fa vp Ns Li "->v_numoutput" ,
the caller must not acquire
.Fa vp Ns Ap s
vnode lock until it has either called
.Fn nestiobuf_done
or submitted the last nested I/O transfer to a block device.
.Pp
For example:
.Bd -literal -offset abcd
	struct buf *mbp, *bp;
	struct vnode *devvp;	/* used after the loop, so declared here */
	size_t skipped = 0;
	unsigned i;
	int error = 0;

	mbp = getiobuf(vp, true);
	mbp->b_data = data;
	mbp->b_resid = mbp->b_bcount = datalen;
	mbp->b_flags = B_WRITE;

	KASSERT(0 < nsegs);
	for (i = 0; i < nsegs; i++) {
		daddr_t blkno;

		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
		error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
		VOP_UNLOCK(vp);
		if (error == 0 && blkno == -1)
			error = EIO;
		if (error) {
			/* Give up; debit all work not yet issued. */
			skipped += datalen - i*segsz;
			break;
		}

		bp = getiobuf(vp, true);
		nestiobuf_setup(bp, mbp, i*segsz, segsz);
		bp->b_blkno = blkno;
		if (i == nsegs - 1)	/* Last segment.  */
			break;
		VOP_STRATEGY(devvp, bp);
	}

	/*
	 * Account v_numoutput for master write.
	 * (Must not vn_lock before last VOP_STRATEGY!)
	 */
	mutex_enter(&vp->v_interlock);
	vp->v_numoutput++;
	mutex_exit(&vp->v_interlock);

	if (skipped)
		nestiobuf_done(mbp, skipped, error);
	else
		VOP_STRATEGY(devvp, bp);
.Ed
.Sh BLOCK DEVICE DRIVERS
Block device drivers implement a
.Sq strategy
method, in the
.Li d_strategy
member of
.Li struct bdevsw
.Pq Xr driver 9 ,
to queue a buffer for disk I/O.
The inputs to the strategy method are:
.Bl -tag -offset abcd
.It Fa bp Ns Li "->b_flags"
Flags specifying the type of transfer.
.Bl -tag -compact
.It Dv B_READ
Transfer is read from device.
If not set, transfer is write to device.
.El
.It Fa bp Ns Li "->b_data"
Pointer to kernel virtual address of source/target for transfer.
.It Fa bp Ns Li "->b_bcount"
Nonnegative number of bytes requested for transfer.
.It Fa bp Ns Li "->b_blkno"
Block number at which to do transfer, relative to partition start.
.El
.Pp
If the strategy method uses
.Xr bufq 9 ,
it must additionally initialize the following fields before queueing
.Fa bp
with
.Xr bufq_put 9 :
.Bl -tag -offset abcd
.It Fa bp Ns Li "->b_rawblkno"
Block number relative to volume start.
.El
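.Pp
The relationship between the partition-relative and volume-relative
block numbers can be sketched in userland C.
This is an illustration only:
.Li struct model_buf
and
.Fn model_set_rawblkno
are hypothetical names, and the sketch assumes that the partition's
starting block is expressed in the same units as
.Li b_blkno ,
which a real driver must verify for its device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; the kernel uses struct buf and daddr_t. */
struct model_buf {
	int64_t b_blkno;	/* block number relative to partition start */
	int64_t b_rawblkno;	/* block number relative to volume start */
};

/*
 * Fill in b_rawblkno before queueing.  part_offset is the block at
 * which the partition begins on the volume, assumed to be in the
 * same units as b_blkno.
 */
static void
model_set_rawblkno(struct model_buf *bp, int64_t part_offset)
{
	assert(bp->b_blkno >= 0);
	assert(part_offset >= 0);
	bp->b_rawblkno = part_offset + bp->b_blkno;
}
```

.Pp
Under these assumptions, a request for block 16 of a partition that
begins at block 2048 of the volume is queued with a raw block number
of 2064.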
.Pp
When the I/O transfer is complete, whether it succeeded or failed, the
strategy method must:
.Bl -dash
.It
Set
.Fa bp Ns Li "->b_error"
to zero on success, or to an
.Xr errno 2
error code on failure.
.It
Set
.Fa bp Ns Li "->b_resid"
to the number of bytes remaining to transfer, whether on success or on
failure.
If no bytes are transferred, this must be set to
.Fa bp Ns Li "->b_bcount" .
.It
Call
.Li "biodone(" Ns Fa bp Ns Li ")" .
.El
.Sh FUNCTIONS
.Bl -tag -width abcd
.It Fn biodone bp
Notify that the I/O transfer described by
.Fa bp
has completed.
.Pp
To be called by a block device driver.
Caller must first set
.Fa bp Ns Li "->b_error"
to zero on success or to an error code on failure, and
.Fa bp Ns Li "->b_resid"
to the number of bytes remaining to transfer.
.It Fn biowait bp
Wait for the synchronous I/O transfer described by
.Fa bp
to complete.
Returns the value of
.Fa bp Ns Li "->b_error" .
.Pp
To be called by a user requesting the I/O transfer.
.Pp
Must not be called if
.Fa bp
represents an asynchronous transfer, i.e. if
.Dv B_ASYNC
is set in
.Fa bp Ns Li "->b_flags" .
.It Fn getiobuf vp waitok
Allocate a
.Vt "struct buf"
for an I/O transfer.
If
.Fa vp
is nonnull, the transfer is associated with it.
If
.Fa waitok
is false, returns null if none can be allocated immediately.
.Pp
The resulting
.Vt "struct buf"
pointer must eventually be passed to
.Fn putiobuf
to release it.
Do
.Em not
use
.Xr brelse 9 .
.Pp
May sleep if
.Fa waitok
is true.
.It Fn putiobuf bp
Free
.Fa bp ,
which must have been allocated by
.Fn getiobuf .
Either
.Fa bp
must never have been submitted to a block device, or the I/O transfer
must have completed.
.El
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented in
.Pa sys/kern/vfs_bio.c .
.Sh SEE ALSO
.Xr buffercache 9 ,
.Xr bufq 9
.Sh BUGS
The
.Nm
abstraction provides no way to cancel an I/O transfer once it has been
submitted to a block device.
.Pp
The
.Nm
abstraction provides no way to do I/O transfers with non-kernel pages,
e.g. directly to buffers in userland without copying into the kernel
first.
.Pp
The
.Vt "struct buf"
type is all mixed up with the
.Xr buffercache 9 .
.Pp
The
.Nm
abstraction is a totally idiotic API design.
.Pp
The
.Li v_numoutput
accounting required of
.Nm
callers is asinine.