History log of /src/sys/dev/ic/nvmevar.h |
Revision | | Date | Author | Comments |
1.28 |
| 14-Aug-2022 |
jmcneill | nvme: Make sure that q_ccb_list is always accessed with the q lock held.
|
1.27 |
| 01-Aug-2022 |
mlelstv | Now really restore 1.24.
|
1.26 |
| 01-Aug-2022 |
mlelstv | Revert last accidental commits.
|
1.25 |
| 01-Aug-2022 |
mlelstv | Also fix shift values for SCT constants.
|
1.24 |
| 07-May-2022 |
skrll | Add support for Apple silicon NVME. Ported from OpenBSD.
|
1.23 |
| 16-Nov-2021 |
skrll | Trailing whitespace
|
1.22 |
| 29-May-2021 |
riastradh | nvme(4): Add suspend/resume, derived from OpenBSD.
|
1.21 |
| 28-Jul-2020 |
jdolecek | branches: 1.21.6; 1.21.8; add a quirk to disable MSI, and enable it for Intel SSD DC P4500
this device seems to cause serious system responsiveness issues when configured to use MSI, while it works fine when configured for either INTx or MSI-X
this is important so this works well under Xen Dom0, which doesn't support MSI-X yet
fixes another issue reported as feedback for PR port-xen/55285 by Frank Kardel
|
1.20 |
| 28-Jun-2019 |
jmcneill | branches: 1.20.2; Fix a performance issue where one busy queue can starve all other queues.
In normal operations with multiple queues, the nvme driver will attempt to schedule I/O requests on the submitting CPU. This breaks down when any one of the queues becomes full; the driver returns EAGAIN to the disk layer, which causes the disk layer to stop submitting more requests until the blocked request is consumed. When space becomes available in the full queue, it pulls the next buffer from the bufq and fills the queue again, until finally hitting EAGAIN and preventing other queues from processing requests.
Two changes here to fix the problem:
- When processing requests from the bufq, attempt to assign them to the queue associated with the CPU that originated the request.
- If that queue is busy, try to find another queue with available space before returning EAGAIN. This way, only when all queues are full will the disk layer stop submitting more requests.
Now for some real numbers. On a Rockchip RK3399 board (6 CPUs), with 6 concurrent readers:
Old code:
4294967296 bytes transferred in 52.420 secs (81933752 bytes/sec)
4294967296 bytes transferred in 53.969 secs (79582117 bytes/sec)
4294967296 bytes transferred in 55.391 secs (77539082 bytes/sec)
4294967296 bytes transferred in 55.649 secs (77179595 bytes/sec)
4294967296 bytes transferred in 56.102 secs (76556402 bytes/sec)
4294967296 bytes transferred in 72.901 secs (58915066 bytes/sec)
New code:
4294967296 bytes transferred in 37.171 secs (115546186 bytes/sec)
4294967296 bytes transferred in 37.611 secs (114194445 bytes/sec)
4294967296 bytes transferred in 37.655 secs (114061009 bytes/sec)
4294967296 bytes transferred in 38.247 secs (112295534 bytes/sec)
4294967296 bytes transferred in 38.496 secs (111569183 bytes/sec)
4294967296 bytes transferred in 38.595 secs (111282997 bytes/sec)
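The queue selection described in this entry can be illustrated with a minimal sketch. It is not the driver's code: `struct sketch_queue`, `sketch_queue_has_space()` and `sketch_pick_queue()` are hypothetical names, and the round-robin fallback order is an assumption (the commit message only says that another queue with available space is tried).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified queue; not the driver's struct nvme_queue. */
struct sketch_queue {
	unsigned int depth;    /* commands the queue can hold */
	unsigned int inflight; /* commands currently outstanding */
};

static bool
sketch_queue_has_space(const struct sketch_queue *q)
{
	return q->inflight < q->depth;
}

/*
 * Pick a queue for a request that originated on CPU `cpu`.
 * The queue associated with that CPU is tried first; if it is full,
 * the remaining queues are scanned in order (assumed policy).
 * Returns the queue index, or -1 when every queue is full, which is
 * the only case where the caller would report EAGAIN.
 */
static int
sketch_pick_queue(const struct sketch_queue *qs, size_t nqueues, size_t cpu)
{
	if (nqueues == 0)
		return -1;

	size_t preferred = cpu % nqueues;
	for (size_t i = 0; i < nqueues; i++) {
		size_t idx = (preferred + i) % nqueues;
		if (sketch_queue_has_space(&qs[idx]))
			return (int)idx;
	}
	return -1;
}
```

With this shape, one full queue no longer stalls submission as long as any other queue still has room.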
|
1.19 |
| 24-Apr-2019 |
mlelstv | Expose device type. You can query it with e.g. drvctl -p ld0 disk-info/type.
|
1.18 |
| 01-Dec-2018 |
jdolecek | support DIOCSCACHE + DKCACHE_WRITE if volatile write cache is present
fix the Get Features call for DIOCGCACHE to actually retrieve the current value properly
|
1.17 |
| 19-Apr-2018 |
christos | branches: 1.17.2; s/static inline/static __inline/g for consistency.
|
1.16 |
| 18-Apr-2018 |
nonaka | nvme(4): Added some delay before check RDY bit quirk when disabling device.
Pick from FreeBSD nvme(4) r326937.
|
1.15 |
| 16-Mar-2018 |
jdolecek | refactor the locking code around DIOCGCACHE handling to be reusable for other infrequent commands
it uses single condvar for simplicity, and uses it both when waiting for ccb or command completion - this is fine, since usually there will be just one such command queued anyway
use this to finally properly implement DIOCCACHESYNC - return only after the command is confirmed as completed by the controller
|
1.14 |
| 16-Mar-2018 |
jdolecek | stop using q_nccbs_avail for deciding whether there are available ccbs; no need to maintain a counter _and_ q_ccb_list
this fixes deadlock when all ccbs happen to be taken before completion interrupt - nvme_q_complete() increased q_nccbs_avail only after processing all the completed commands, by then there was nothing left to actually kick the disk queue again into action
this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc() or bus_dmamap_load() - q_nccbs_avail was never decreased on the error path
fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette for testing
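As a rough illustration of the idea in this entry, the sketch below uses the free list itself as the only record of availability, so there is no separate counter to fall out of sync on an error path. All names (`sketch_ccb`, `sketch_ccb_list`, `sketch_ccb_get()`, `sketch_ccb_put()`, `sketch_ccb_available()`) are hypothetical, and locking is omitted; the real driver would access its list with the queue lock held.

```c
#include <stddef.h>

/* Hypothetical command control block and free list. */
struct sketch_ccb {
	struct sketch_ccb *next;
};

struct sketch_ccb_list {
	struct sketch_ccb *head; /* NULL means no ccb is available */
};

/* Take a ccb off the free list, or return NULL if none is left. */
static struct sketch_ccb *
sketch_ccb_get(struct sketch_ccb_list *l)
{
	struct sketch_ccb *ccb = l->head;

	if (ccb != NULL) {
		l->head = ccb->next;
		ccb->next = NULL;
	}
	return ccb;
}

/* Return a ccb to the free list; used on completion and on error paths. */
static void
sketch_ccb_put(struct sketch_ccb_list *l, struct sketch_ccb *ccb)
{
	ccb->next = l->head;
	l->head = ccb;
}

/* Availability is derived from the list, not from a counter. */
static int
sketch_ccb_available(const struct sketch_ccb_list *l)
{
	return l->head != NULL;
}
```

Because availability is read directly from the list, a ccb becomes usable again at the moment it is put back, whichever path returned it.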
|
1.13 |
| 05-Apr-2017 |
jdolecek | branches: 1.13.6; 1.13.12; 1.13.14; expose disk device FUA/DPO support via DIOCGCACHE, and allow the flags to be set for I/O; implement support in sd(4) and nvme(4)
discussed on tech-kern
|
1.12 |
| 28-Feb-2017 |
jdolecek | implement DIOCGCACHE
|
1.11 |
| 01-Nov-2016 |
jdolecek | branches: 1.11.2; pass maxphys from device rather than assuming MAXPHYS; it's clipped in ld(4) if bigger than MAXPHYS
multiply the queue size by number of queues for ld(4) sc_maxqueuecnt, so that ld_diskstart() would try to use full capacity, instead of throttling to one queue worth of commands
|
1.10 |
| 01-Nov-2016 |
jdolecek | tighter queue control - according to spec the cap on number of commands in flight is one less than the queue size, since head == tail means empty queue
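The capacity rule mentioned here follows from ordinary circular-queue arithmetic, which can be sketched as below. The structure and helper names are hypothetical and only illustrate why a queue with `size` slots can hold at most `size - 1` commands when `head == tail` is reserved to mean "empty".

```c
#include <stdbool.h>

/* Hypothetical ring indices; not the driver's queue structure. */
struct sketch_ring {
	unsigned int size; /* number of slots, assumed >= 2 */
	unsigned int head; /* next slot the consumer will read */
	unsigned int tail; /* next slot the producer will write */
};

/* head == tail is reserved for the empty state. */
static bool
sketch_ring_empty(const struct sketch_ring *r)
{
	return r->head == r->tail;
}

/* Full when advancing tail would make it collide with head. */
static bool
sketch_ring_full(const struct sketch_ring *r)
{
	return (r->tail + 1) % r->size == r->head;
}

/* Number of entries currently in the ring. */
static unsigned int
sketch_ring_inflight(const struct sketch_ring *r)
{
	return (r->tail + r->size - r->head) % r->size;
}

/* One slot always stays unused, so the usable capacity is size - 1. */
static unsigned int
sketch_ring_capacity(const struct sketch_ring *r)
{
	return r->size - 1;
}
```

If all `size` slots were filled, `tail` would wrap around to equal `head` and the full queue would be indistinguishable from an empty one.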
|
1.9 |
| 20-Oct-2016 |
jdolecek | revert change from rev. 1.12: """ slightly optimize memory access - change struct nvme_queue so that the struct dmamem members are allocated as part of it, instead of separate kmem_alloc()s """
that change quite curiously caused completion queue corruption on MP systems, regardless of MPSAFE setting for the pci/softintr interrupt
|
1.8 |
| 19-Oct-2016 |
jdolecek | add debug code to check for completion queue corruption
|
1.7 |
| 19-Oct-2016 |
jdolecek | follow advice of spec and block interrupts via INTMS/INTMC for intx handler; this also makes it possible to offload the actual interrupt processing to softintr handler, similar as for MSI/MSI-X
|
1.6 |
| 27-Sep-2016 |
pgoyette | Modularize the ld driver and all of its attachments. Ensure that all parents are capable of rescan (or otherwise provide a means of attaching children post-initialization).
|
1.5 |
| 19-Sep-2016 |
jdolecek | slightly optimize memory access - change struct nvme_queue so that the struct dmamem members are allocated as part of it, instead of separate kmem_alloc()s
|
1.4 |
| 19-Sep-2016 |
jdolecek | on further thought, just remove the separately allocated nvme_ns_context altogether and fold into nvme_ccb; allocating this separately just isn't useful
|
1.3 |
| 18-Sep-2016 |
jdolecek | fix several bugs, make nvme(4) MPSAFE by default and also bump default number of ioq from 128 to 1024; tested with VirtualBox and QEMU
* remove NVME_INTMC/NVME_INTMS writes in hw intr handler as this is not MPSAFE, fortunately they don't seem to be necessary; shaves two register writes
* need to use full mutex_enter() in nvme_q_complete(), to avoid small race between one handler exiting the loop and another entering
* for MSI, handover the command result processing to softintr; unfortunately can't easily do that for INTx interrupts as they require doorbell write to deassert
* unlock/relock q->q_cq_mtx before calling ccb_done to avoid potential deadlocks
* make sure to destroy queue mutexes when destroying the queue (LOCKDEBUG)
* make ns ctx pool per-device, so that it's deallocated properly on module unload
* handle ctx allocation failure in ld_nvme_dobio()
* remove splbio() calls in ld_nvme_dobio() and sync, the paths are exercised only for dump/shutdown, and that already disables interrupts
* free the ns ctx in ld_nvme_biodone() before calling lddone() to avoid memory starvation, as lddone() can trigger another i/o request
* be more careful with using PR_WAITOK, the paths are called from interrupt context and there we can't wait
|
1.2 |
| 04-Jun-2016 |
nonaka | branches: 1.2.2; Add NVMe command passthrough support.
|
1.1 |
| 01-May-2016 |
nonaka | branches: 1.1.2; Added nvme(4) for Non-Volatile Memory Host Controller Interface devices. Ported from OpenBSD.
|
1.1.2.6 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.1.2.5 |
| 05-Dec-2016 |
skrll | Sync with HEAD
|
1.1.2.4 |
| 05-Oct-2016 |
skrll | Sync with HEAD
|
1.1.2.3 |
| 09-Jul-2016 |
skrll | Sync with HEAD
|
1.1.2.2 |
| 29-May-2016 |
skrll | Sync with HEAD
|
1.1.2.1 |
| 01-May-2016 |
skrll | file nvmevar.h was added on branch nick-nhusb on 2016-05-29 08:44:21 +0000
|
1.2.2.3 |
| 26-Apr-2017 |
pgoyette | Sync with HEAD
|
1.2.2.2 |
| 20-Mar-2017 |
pgoyette | Sync with HEAD
|
1.2.2.1 |
| 04-Nov-2016 |
pgoyette | Sync with HEAD
|
1.11.2.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.13.14.3 |
| 26-Dec-2018 |
pgoyette | Sync with HEAD, resolve a few conflicts
|
1.13.14.2 |
| 22-Apr-2018 |
pgoyette | Sync with HEAD
|
1.13.14.1 |
| 22-Mar-2018 |
pgoyette | Synch with HEAD, resolve conflicts
|
1.13.12.2 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.13.12.1 |
| 05-Apr-2017 |
jdolecek | file nvmevar.h was added on branch tls-maxphys on 2017-12-03 11:37:03 +0000
|
1.13.6.3 |
| 19-Apr-2018 |
martin | Pull up following revision(s) (requested by nonaka in ticket #781):
sbin/nvmectl/Makefile 1.4
sbin/nvmectl/bignum.c 1.2
sbin/nvmectl/devlist.c 1.3-1.5
sbin/nvmectl/firmware.c 1.3,1.4
sbin/nvmectl/identify.c 1.3-1.5
sbin/nvmectl/logpage.c 1.5-1.7
sbin/nvmectl/nvme.h 1.3
sbin/nvmectl/nvmectl.8 1.5
sbin/nvmectl/nvmectl.c 1.5-1.7
sbin/nvmectl/nvmectl.h 1.5-1.8
sbin/nvmectl/perftest.c 1.3-1.5
sbin/nvmectl/power.c 1.3,1.4
sbin/nvmectl/reset.c 1.2,1.3
sbin/nvmectl/util.c 1.1,1.2
sbin/nvmectl/wdc.c 1.2-1.4
sys/dev/ic/ld_nvme.c 1.20
sys/dev/ic/nvme.c 1.38,1.39
sys/dev/ic/nvmeio.h 1.2
sys/dev/ic/nvmereg.h 1.10,1.11
sys/dev/ic/nvmevar.h 1.16
sys/dev/pci/nvme_pci.c 1.20
nvmectl(8): Sync with FreeBSD nvmecontrol(8) r328763.
nvmectl(8): fix wdc command usage.
nvme(4): Added some delay before check RDY bit quirk when disabling device. Pick from FreeBSD nvme(4) r326937.
Add some new structure fields, opcodes and statuses from NVMe 1.3a.
nvmectl(8): Add big-endian support. From FreeBSD nvmecontrol(8) r329824.
nvmectl(8): fix subcommand usage.
nvmectl(8): Remove some wdc subcommands from man page.
- wdc drive-log
- wdc get-crash-dump
- wdc purge
- wdc purge-monitor
Typos.
use setprogname()/getprogname(), do not hardcode the program name in fixed strings
|
1.13.6.2 |
| 18-Mar-2018 |
martin | Pull up following revision(s) (requested by jdolecek in ticket #641):
sys/dev/ic/nvme.c: revision 1.34
sys/dev/ic/nvme.c: revision 1.35
sys/dev/ic/nvme.c: revision 1.36
sys/dev/ic/nvme.c: revision 1.37
sys/dev/ic/ld_nvme.c: revision 1.19
sys/dev/ic/nvmevar.h: revision 1.15
refactor the locking code around DIOCGCACHE handling to be reusable for other infrequent commands
it uses single condvar for simplicity, and uses it both when waiting for ccb or command completion - this is fine, since usually there will be just one such command queued anyway
use this to finally properly implement DIOCCACHESYNC - return only after the command is confirmed as completed by the controller.
switch handling of passthrough commands to use queue, instead of polling
should fix PR kern/53059 by Frank Kardel
fix passthrough command usage also in nvme_get_number_of_queues(), fixes memory corruption and possible panic on boot
also remove now duplicate nvme_ccb_put() call from nvme_get_number_of_queues()
|
1.13.6.1 |
| 17-Mar-2018 |
martin | Pull up following revision(s) (requested by jdolecek in ticket #636):
sys/dev/ic/nvme.c: revision 1.33
sys/dev/ic/nvmevar.h: revision 1.14
stop using q_nccbs_avail for deciding whether there are available ccbs; no need to maintain a counter _and_ q_ccb_list
this fixes deadlock when all ccbs happen to be taken before completion interrupt - nvme_q_complete() increased q_nccbs_avail only after processing all the completed commands, by then there was nothing left to actually kick the disk queue again into action
this also fixes ccb leak on command errors e.g. with bus_dmamem_alloc() or bus_dmamap_load() - q_nccbs_avail was never decreased on the error path
fixes PR kern/52769 by Martin Husemann, thanks to Paul Goyette for testing
|
1.17.2.2 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.17.2.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.20.2.1 |
| 21-Jun-2021 |
martin | Pull up following revision(s) (requested by riastradh in ticket #1305):
sys/dev/ic/nvmevar.h: revision 1.22 sys/dev/ic/nvme.c: revision 1.56 sys/dev/ic/nvme.c: revision 1.57 sys/dev/pci/nvme_pci.c: revision 1.30
nvme(4): Add suspend/resume, derived from OpenBSD.
nvme(4): Move disestablishment of admin q interrupt to nvme_detach.
Nothing re-established this after suspend/resume, so attempting suspend/resume/suspend would crash, and presumably we would miss interrupts after resume. This keeps the establish/disestablish more symmetric in attach/detach.
|
1.21.8.1 |
| 31-May-2021 |
cjep | sync with head
|
1.21.6.1 |
| 17-Jun-2021 |
thorpej | Sync w/ HEAD.
|