History log of /src/sys/dev/ic/nvmevar.h
Revision  Date  Author  Comments
 1.28  14-Aug-2022  jmcneill nvme: Make sure that q_ccb_list is always accessed with the q lock held.
 1.27  01-Aug-2022  mlelstv Now really restore 1.24.
 1.26  01-Aug-2022  mlelstv Revert last accidental commits.
 1.25  01-Aug-2022  mlelstv Also fix shift values for SCT constants.
 1.24  07-May-2022  skrll Add support for Apple silicon NVME. Ported from OpenBSD.
 1.23  16-Nov-2021  skrll Trailing whitespace
 1.22  29-May-2021  riastradh nvme(4): Add suspend/resume, derived from OpenBSD.
 1.21  28-Jul-2020  jdolecek branches: 1.21.6; 1.21.8;
add a quirk to disable MSI, and enable it for Intel SSD DC P4500

this device seems to cause serious system responsiveness issues when configured
to use MSI, while it works fine when configured for either INTx or MSI-X

this is important so this works well under Xen Dom0, which doesn't
support MSI-X yet

fixes another issue reported as feedback for PR port-xen/55285 by Frank Kardel
 1.20  28-Jun-2019  jmcneill branches: 1.20.2;
Fix a performance issue where one busy queue can starve all other queues.

In normal operations with multiple queues, the nvme driver will attempt
to schedule I/O requests on the submitting CPU. This breaks down when any
one of the queues becomes full; the driver returns EAGAIN to the disk
layer, which causes the disk layer to stop submitting more requests until
the blocked request is consumed. When space becomes available in the full
queue, it pulls the next buffer from the bufq and fills the queue again,
until finally hitting EAGAIN and preventing other queues from processing
requests.

Two changes here to fix the problem:

- When processing requests from the bufq, attempt to assign them to the
queue associated with the CPU that originated the request.
- If that queue is busy, try to find another queue with available space
before returning EAGAIN. This way, only when all queues are full will
the disk layer stop submitting more requests.
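
The two changes above can be sketched in a few lines of C. This is only an illustration of the selection policy as the commit message describes it, not the driver's code: `struct sketch_queue`, `sketch_pick_queue()` and the simple in-flight/capacity bookkeeping are hypothetical names and assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an I/O queue; not the driver's struct nvme_queue. */
struct sketch_queue {
	unsigned int inflight;	/* commands currently outstanding */
	unsigned int capacity;	/* most commands the queue can hold */
};

/* Nonzero if the queue can accept one more command. */
int
sketch_queue_has_space(const struct sketch_queue *q)
{
	return q->inflight < q->capacity;
}

/*
 * Choose a queue for a request that originated on CPU `cpu`.
 * The queue associated with that CPU is tried first; if it is full,
 * the remaining queues are scanned in order. Returns 0 and stores the
 * chosen index in *out, or EAGAIN only when every queue is full.
 */
int
sketch_pick_queue(const struct sketch_queue *qs, size_t nqueues,
    size_t cpu, size_t *out)
{
	assert(nqueues > 0);

	size_t preferred = cpu % nqueues;
	for (size_t i = 0; i < nqueues; i++) {
		size_t idx = (preferred + i) % nqueues;
		if (sketch_queue_has_space(&qs[idx])) {
			*out = idx;
			return 0;
		}
	}
	return EAGAIN;
}
```

With this policy a single full queue no longer produces EAGAIN on its own, so the disk layer keeps submitting as long as any queue has room.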

Now for some real numbers. On a Rockchip RK3399 board (6 CPUs), with 6
concurrent readers:

Old code:
4294967296 bytes transferred in 52.420 secs (81933752 bytes/sec)
4294967296 bytes transferred in 53.969 secs (79582117 bytes/sec)
4294967296 bytes transferred in 55.391 secs (77539082 bytes/sec)
4294967296 bytes transferred in 55.649 secs (77179595 bytes/sec)
4294967296 bytes transferred in 56.102 secs (76556402 bytes/sec)
4294967296 bytes transferred in 72.901 secs (58915066 bytes/sec)

New code:
4294967296 bytes transferred in 37.171 secs (115546186 bytes/sec)
4294967296 bytes transferred in 37.611 secs (114194445 bytes/sec)
4294967296 bytes transferred in 37.655 secs (114061009 bytes/sec)
4294967296 bytes transferred in 38.247 secs (112295534 bytes/sec)
4294967296 bytes transferred in 38.496 secs (111569183 bytes/sec)
4294967296 bytes transferred in 38.595 secs (111282997 bytes/sec)
 1.19  24-Apr-2019  mlelstv Expose device type. You can query it with e.g. drvctl -p ld0 disk-info/type.
 1.18  01-Dec-2018  jdolecek support DIOCSCACHE + DKCACHE_WRITE if volatile write cache is present

fix the Get Features call for DIOCGCACHE to actually retrieve the current
value properly
 1.17  19-Apr-2018  christos branches: 1.17.2;
s/static inline/static __inline/g for consistency.
 1.16  18-Apr-2018  nonaka nvme(4): Added some delay before check RDY bit quirk when disabling device.

Pick from FreeBSD nvme(4) r326937.
 1.15  16-Mar-2018  jdolecek refactor the locking code around DIOCGCACHE handling to be reusable
for other infrequent commands

it uses single condvar for simplicity, and uses it both when waiting
for ccb or command completion - this is fine, since usually there
will be just one such command queued anyway

use this to finally properly implement DIOCCACHESYNC - return only after
the command is confirmed as completed by the controller
 1.14  16-Mar-2018  jdolecek stop using q_nccbs_avail for deciding whether there are available ccbs;
no need to maintain a counter _and_ q_ccb_list

this fixes deadlock when all ccbs happen to be taken before completion
interrupt - nvme_q_complete() increased q_nccbs_avail only after
processing all the completed commands, by then there was nothing
left to actually kick the disk queue again into action

this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
or bus_dmamel_load() - q_nccbs_avail was never decreased on the error path

fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
for testing
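
A minimal sketch of the idea behind this change, assuming a singly linked free list: whether a ccb is available is decided by the list itself, so there is no separate counter that can fall out of step with it. The names `sketch_ccb`, `sketch_q`, `sketch_ccb_get()` and `sketch_ccb_put()` are hypothetical, and locking is omitted here (rev. 1.28 notes that q_ccb_list must be accessed with the q lock held).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command control block; only the list linkage is modelled. */
struct sketch_ccb {
	struct sketch_ccb *next;
};

/* Hypothetical queue holding the free ccbs. */
struct sketch_q {
	struct sketch_ccb *free_list;
};

/*
 * Take a ccb from the free list. An empty list is the only
 * "no ccb available" condition; NULL is returned in that case.
 */
struct sketch_ccb *
sketch_ccb_get(struct sketch_q *q)
{
	struct sketch_ccb *ccb = q->free_list;

	if (ccb != NULL) {
		q->free_list = ccb->next;
		ccb->next = NULL;
	}
	return ccb;
}

/*
 * Return a ccb to the free list. Calling this on every exit path,
 * including error paths, is all that is needed to make the ccb
 * available again.
 */
void
sketch_ccb_put(struct sketch_q *q, struct sketch_ccb *ccb)
{
	assert(ccb != NULL);
	ccb->next = q->free_list;
	q->free_list = ccb;
}
```

Because availability is derived from the list alone, a ccb returned on an error path is immediately usable again without any extra bookkeeping.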
 1.13  05-Apr-2017  jdolecek branches: 1.13.6; 1.13.12; 1.13.14;
expose disk device FUA/DPO support via DIOCGCACHE, and allow the flags
to be set for I/O; implement support in sd(4) and nvme(4)

discussed on tech-kern
 1.12  28-Feb-2017  jdolecek implement DIOCGCACHE
 1.11  01-Nov-2016  jdolecek branches: 1.11.2;
pass maxphys from device rather than assuming MAXPHYS; it's clipped in ld(4)
if bigger than MAXPHYS

multiply the queue size by number of queues for ld(4) sc_maxqueuecnt, so
that ld_diskstart() would try to use full capacity, instead of throttling
to one queue worth of commands
 1.10  01-Nov-2016  jdolecek tighter queue control - according to spec, the cap on the number of commands
in flight is one less than the queue size, since head == tail means an empty
queue
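
The "one less than queue size" rule follows from ordinary circular-buffer arithmetic, sketched below. The `sketch_ring` type and its functions are hypothetical and only model the head/tail indices, not the driver's submission queue handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring with the same head/tail convention as the commit message. */
struct sketch_ring {
	uint32_t head;		/* next entry the consumer will take */
	uint32_t tail;		/* next entry the producer will fill */
	uint32_t nentries;	/* total slots in the ring */
};

/* head == tail is reserved to mean "empty". */
bool
sketch_ring_empty(const struct sketch_ring *r)
{
	return r->head == r->tail;
}

/* Number of entries currently in flight. */
uint32_t
sketch_ring_inflight(const struct sketch_ring *r)
{
	return (r->tail + r->nentries - r->head) % r->nentries;
}

/*
 * The ring is full when advancing tail would make it equal to head;
 * filling that last slot would be indistinguishable from empty.
 */
bool
sketch_ring_full(const struct sketch_ring *r)
{
	return (r->tail + 1) % r->nentries == r->head;
}

/* Claim one slot; returns false if the ring is already full. */
bool
sketch_ring_submit(struct sketch_ring *r)
{
	assert(r->nentries >= 2);
	if (sketch_ring_full(r))
		return false;
	r->tail = (r->tail + 1) % r->nentries;
	return true;
}
```

For a ring of 4 entries, the fourth submission is refused, leaving at most 3 commands in flight.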
 1.9  20-Oct-2016  jdolecek revert change from rev. 1.12:
"""
slightly optimize memory access - change struct nvme_queue so that the
struct dmamem members are allocated as part of it, instead of separate
kmem_alloc()s
"""

that change quite curiously caused completion queue corruption on MP systems,
regardless of MPSAFE setting for the pci/softintr interrupt
 1.8  19-Oct-2016  jdolecek add debug code to check for completion queue corruption
 1.7  19-Oct-2016  jdolecek follow advice of spec and block interrupts via INTMS/INTMC for intx handler;
this also makes it possible to offload the actual interrupt processing to softintr
handler, as is done for MSI/MSI-X
 1.6  27-Sep-2016  pgoyette Modularize the ld driver and all of its attachments. Ensure that all
parents are capable of rescan (or otherwise provide a means of attaching
children post-initialization).
 1.5  19-Sep-2016  jdolecek slightly optimize memory access - change struct nvme_queue so that the
struct dmamem members are allocated as part of it, instead of separate
kmem_alloc()s
 1.4  19-Sep-2016  jdolecek on further thought, just remove the separately allocated nvme_ns_context
altogether and fold into nvme_ccb; allocating this separately just isn't useful
 1.3  18-Sep-2016  jdolecek fix several bugs, make nvme(4) MPSAFE by default and also bump default
number of ioq from 128 to 1024; tested with VirtualBox and QEMU

* remove NVME_INTMC/NVME_INTMS writes in hw intr handler as this is not MPSAFE,
fortunately they don't seem to be necessary; shaves two register writes
* need to use full mutex_enter() in nvme_q_complete(), to avoid small
race between one handler exiting the loop and another entering
* for MSI, handover the command result processing to softintr; unfortunately
can't easily do that for INTx interrupts as they require doorbell write
to deassert
* unlock/relock q->q_cq_mtx before calling ccb_done to avoid potential deadlocks
* make sure to destroy queue mutexes when destroying the queue (LOCKDEBUG)
* make ns ctx pool per-device, so that it's deallocated properly on module
unload
* handle ctx allocation failure in ld_nvme_dobio()
* remove splbio() calls in ld_nvme_dobio() and sync, the paths are exercised
only for dump/shutdown, and that already disables interrupts
* free the ns ctx in ld_nvme_biodone() before calling lddone() to avoid
memory starvation, as lddone() can trigger another i/o request
* be more careful with using PR_WAITOK, the paths are called from interrupt
context and there we can't wait
 1.2  04-Jun-2016  nonaka branches: 1.2.2;
Add NVMe command passthrough support.
 1.1  01-May-2016  nonaka branches: 1.1.2;
Added nvme(4) for Non-Volatile Memory Host Controller Interface devices.
Ported from OpenBSD.
 1.1.2.6  28-Aug-2017  skrll Sync with HEAD
 1.1.2.5  05-Dec-2016  skrll Sync with HEAD
 1.1.2.4  05-Oct-2016  skrll Sync with HEAD
 1.1.2.3  09-Jul-2016  skrll Sync with HEAD
 1.1.2.2  29-May-2016  skrll Sync with HEAD
 1.1.2.1  01-May-2016  skrll file nvmevar.h was added on branch nick-nhusb on 2016-05-29 08:44:21 +0000
 1.2.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.2.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.2.2.1  04-Nov-2016  pgoyette Sync with HEAD
 1.11.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.13.14.3  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.13.14.2  22-Apr-2018  pgoyette Sync with HEAD
 1.13.14.1  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.13.12.2  03-Dec-2017  jdolecek update from HEAD
 1.13.12.1  05-Apr-2017  jdolecek file nvmevar.h was added on branch tls-maxphys on 2017-12-03 11:37:03 +0000
 1.13.6.3  19-Apr-2018  martin Pull up following revision(s) (requested by nonaka in ticket #781):

sbin/nvmectl/Makefile 1.4
sbin/nvmectl/bignum.c 1.2
sbin/nvmectl/devlist.c 1.3-1.5
sbin/nvmectl/firmware.c 1.3,1.4
sbin/nvmectl/identify.c 1.3-1.5
sbin/nvmectl/logpage.c 1.5-1.7
sbin/nvmectl/nvme.h 1.3
sbin/nvmectl/nvmectl.8 1.5
sbin/nvmectl/nvmectl.c 1.5-1.7
sbin/nvmectl/nvmectl.h 1.5-1.8
sbin/nvmectl/perftest.c 1.3-1.5
sbin/nvmectl/power.c 1.3,1.4
sbin/nvmectl/reset.c 1.2,1.3
sbin/nvmectl/util.c 1.1,1.2
sbin/nvmectl/wdc.c 1.2-1.4
sys/dev/ic/ld_nvme.c 1.20
sys/dev/ic/nvme.c 1.38,1.39
sys/dev/ic/nvmeio.h 1.2
sys/dev/ic/nvmereg.h 1.10,1.11
sys/dev/ic/nvmevar.h 1.16
sys/dev/pci/nvme_pci.c 1.20

nvmectl(8): Sync with FreeBSD nvmecontrol(8) r328763.

nvmectl(8): fix wdc command usage.

nvme(4): Added some delay before check RDY bit quirk when disabling device.
Pick from FreeBSD nvme(4) r326937.

Add some new structure fields, opcodes and statuses from NVMe 1.3a.

nvmectl(8): Add big-endian support.
From FreeBSD nvmecontrol(8) r329824.

nvmectl(8): fix subcommand usage.

nvmectl(8): Remove some wdc subcommands from man page.
- wdc drive-log
- wdc get-crash-dump
- wdc purge
- wdc purge-monitor

Typos.

use setprogname()/getprogname(), do not hardcode the program name in fixed
strings
 1.13.6.2  18-Mar-2018  martin Pull up following revision(s) (requested by jdolecek in ticket #641):
sys/dev/ic/nvme.c: revision 1.34
sys/dev/ic/nvme.c: revision 1.35
sys/dev/ic/nvme.c: revision 1.36
sys/dev/ic/nvme.c: revision 1.37
sys/dev/ic/ld_nvme.c: revision 1.19
sys/dev/ic/nvmevar.h: revision 1.15

refactor the locking code around DIOCGCACHE handling to be reusable
for other infrequent commands; it uses single condvar for simplicity,
and uses it both when waiting for ccb or command completion - this
is fine, since usually there will be just one such command queued anyway

use this to finally properly implement DIOCCACHESYNC - return only after
the command is confirmed as completed by the controller.

switch handling of passthrough commands to use queue, instead of polling
should fix PR kern/53059 by Frank Kardel

fix passthrough command usage also in nvme_get_number_of_queues(), fixes
memory corruption and possible panic on boot

also remove now duplicate nvme_ccb_put() call from
nvme_get_number_of_queues()
 1.13.6.1  17-Mar-2018  martin Pull up following revision(s) (requested by jdolecek in ticket #636):
sys/dev/ic/nvme.c: revision 1.33
sys/dev/ic/nvmevar.h: revision 1.14
stop using q_nccbs_avail for deciding whether there are available ccbs;
no need to maintain a counter _and_ q_ccb_list
this fixes deadlock when all ccbs happen to be taken before completion
interrupt - nvme_q_complete() increased q_nccbs_avail only after
processing all the completed commands, by then there was nothing
left to actually kick the disk queue again into action
this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc()
or bus_dmamel_load() - q_nccbs_avail was never decreased on the error path
fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette
for testing
 1.17.2.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.17.2.1  10-Jun-2019  christos Sync with HEAD
 1.20.2.1  21-Jun-2021  martin Pull up following revision(s) (requested by riastradh in ticket #1305):

sys/dev/ic/nvmevar.h: revision 1.22
sys/dev/ic/nvme.c: revision 1.56
sys/dev/ic/nvme.c: revision 1.57
sys/dev/pci/nvme_pci.c: revision 1.30

nvme(4): Add suspend/resume, derived from OpenBSD.

nvme(4): Move disestablishment of admin q interrupt to nvme_detach.

Nothing re-established this after suspend/resume, so attempting
suspend/resume/suspend would crash, and presumably we would miss
interrupts after resume. This keeps the establish/disestablish more
symmetric in attach/detach.
 1.21.8.1  31-May-2021  cjep sync with head
 1.21.6.1  17-Jun-2021  thorpej Sync w/ HEAD.
