History log of /src/sys/dev/raidframe/rf_states.c
Revision  Date  Author  Comments
 1.53  23-Jul-2021  oster All IO is async in the RAIDframe kernel driver, so desc->async_flag
isn't needed. Cleanup the flag from rf_DoAccess() and its caller as
well.
 1.52  23-Jul-2021  oster Extensive mechanical changes to the pools used in RAIDframe.

Alloclist remains not per-RAID, so initialize that pool
separately/differently than the rest.

The remainder of pools in RF_Pools_s are now per-RAID pools. Mostly
mechanical changes to functions to allocate/destroy per-RAID pools.
Needed to make raidPtr available in certain cases to be able to find
the per-RAID pools.

Extend rf_pool_init() to now populate a per-RAID wchan value that is
unique to each pool for a given RAID device.

TODO: Complete the analysis of the minimum number of items that are
required for each pool to allow IO to progress (i.e. so that a request
for pool resources can always be satisfied), and dynamically scale
minimum pool sizes based on RAID configuration.
 1.51  10-Oct-2019  christos branches: 1.51.12;
fix the function pointer and callback mess:
- callback functions return 0 and their result is not checked; make them void.
- there are two types of callbacks and they used to overload their parameters
and the callback structure; separate them into "function" and "value"
callbacks.
- make the wait function signature consistent.
 1.50  03-Jan-2016  mlelstv branches: 1.50.18;
refactor driver to use common code in dksubr.
 1.49  11-May-2011  mrg branches: 1.49.14; 1.49.32;
convert the main raidPtr mutex to a kmutex, and add a couple of cv's to
cover the old sleep/wakeup points for adding_hot_spare and waitForReconCond.
convert all remaining simple_lock's to kmutexes (they're not used or compiled
right now... even with all options enabled) and remove the support for them.

this leaves just a pair of tsleep()/wakeup() calls using old scheduling APIs.
 1.48  10-May-2011  mrg convert RF_CommonLogData_s/RF_ReconMap_s mutex to a kmutex/cv.
 1.47  05-May-2011  mrg convert access_suspend_mutex to a kmutex/cv.
 1.46  27-Apr-2011  mrg prepare to convert more raidframe old lock/sleep APIs to mutex/condvar:

- remove RF_DECLARE_EXTERN_MUTEX and RF_DECLARE_STATIC_MUTEX, the qualifier
can be provided at the use point with the normal define
- rename the *LGMGR_MUTEX() macros to *mutex2() names, and add some more
defines for use:
rf_declare_mutex2()
rf_declare_cond2()
rf_lock_mutex2()
rf_unlock_mutex2()
rf_init_mutex2()
rf_destroy_mutex2()
rf_init_cond2()
rf_destroy_cond2()
rf_wait_cond2()
rf_signal_cond2()
rf_broadcast_cond2()
- use the new names for the configureMutex(), which previously used some combo
of direct mutex* calls and macros
- convert the node_queue to use a mutex/cv combo
- in rf_ShutdownEngine() and DAGExecutionThread(), also signal the former from
the latter when it is done and about to exit
- convert iodone_lock to use the new macros
 1.45  23-Apr-2011  mrg convert the iodone_lock to a mutex, and use a condvar for signalling.

this only handles the smallest use of old simple_lock/tsleep/wakeup
APIs inside raidframe, and it points out that cv(9)'s have only one
wait channel per cv, whereas each tsleep() caller can specify a
different wait channel. this change removes the difference between
normal raidio and waiting for IO during shutdown.

i've tested this on 3 systems, ran atf, and had mlelstv and rmind
review the change.
 1.44  17-Nov-2009  jld branches: 1.44.4; 1.44.6;
Finally commit the RAIDframe parity map Summer Of Code project.

Drastically reduces the amount of time spent rewriting parity after an
unclean shutdown by keeping better track of which regions might have had
outstanding writes. Enabled by default; can be disabled on a per-set
basis, or tuned, with the new raidctl(8) commands.

Discussed on tech-kern@ to a general air of approval; exhortations to
commit from mrg@, christos@, and others.

Thanks to Google for their sponsorship, oster@ for mentoring the
project, assorted developers for trying very hard to break it, and
probably more I'm forgetting.
 1.43  20-May-2008  oster branches: 1.43.8; 1.43.16;
Add in a missing "bp->b_resid = bp->b_bcount" in the EIO case.
Spotted by Juergen Hannken-Illjes. Thanks!
 1.42  12-Feb-2008  oster branches: 1.42.6; 1.42.8; 1.42.10; 1.42.12;
rf_debugMem.c: remove unused 'rc' variable for RF_DEBUG_MEM.
rf_driver.c: minor comment tweak. Improve debugging output in
RF_DEBUG_QUIESCE.
rf_states.c: fix argument to rf_PrintDAGList() in the
RF_DEBUG_VALIDATE_DAG case.


Changes from Olivier Cherrier. Thanks!!
 1.41  29-Jul-2007  ad branches: 1.41.6; 1.41.12; 1.41.22;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
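As a rough illustration of the convention described in revisions 1.41 and 1.43, the sketch below fails a request by setting the error field and marking the whole transfer as residual, without touching any flags word. `struct fake_buf` and `fake_buf_fail()` are hypothetical stand-ins invented for this example; they are not the kernel's `struct buf` or any RAIDframe function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a buffer header (not struct buf). */
struct fake_buf {
    int    b_error;   /* error code, owned by whoever completes the IO */
    size_t b_bcount;  /* bytes requested */
    size_t b_resid;   /* bytes not transferred */
};

/*
 * Complete a request with an error: record the error code and report
 * that nothing was transferred. No flags field is modified.
 */
void fake_buf_fail(struct fake_buf *bp, int error)
{
    assert(bp != NULL);
    bp->b_error = error;
    bp->b_resid = bp->b_bcount;
}
```

A caller would invoke `fake_buf_fail(bp, EIO)` just before handing the buffer back to its owner.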
 1.40  11-Dec-2005  christos branches: 1.40.24; 1.40.30; 1.40.38; 1.40.40;
merge ktrace-lwp.
 1.39  25-Sep-2005  oster Re-work the handling of incoming I/O in RAIDframe:
- introduce rf_buf_queue_check() which checks to see if there
is work to do in the incoming buffer queue
- rf_RaidIOThread() is now responsible for calling raidstart(), and is
also now the only place that calls raidstart()
- raidstrategy() now just queues requests in buf_queue
and signals rf_RaidIOThread() that work has arrived

Hopefully addresses PR#30233
 1.38  27-Feb-2005  perry branches: 1.38.2; 1.38.4;
nuke trailing whitespace
 1.37  14-Jan-2005  oster branches: 1.37.2; 1.37.4;
After walking through desc->dagList nuking entries, make sure
desc->dagList is set to NULL before continuing. If we don't,
there's a danger that we'll try to re-free these items later.
(This should fix a panic reported to me via private communication.)
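The hazard described in revision 1.37 can be sketched in a few lines: once every entry of a list has been freed, the head pointer must be cleared, or a later cleanup pass will walk freed memory. The types and function names below (`struct dag_list`, `struct access_desc`, `desc_push()`, `desc_free_dags()`) are hypothetical simplifications for illustration only, not the RAIDframe definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real RAIDframe structures. */
struct dag_list {
    struct dag_list *next;
};

struct access_desc {
    struct dag_list *dagList;
};

/* Prepend one entry; returns 0 on success, -1 if allocation fails. */
int desc_push(struct access_desc *desc)
{
    struct dag_list *n;

    assert(desc != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->next = desc->dagList;
    desc->dagList = n;
    return 0;
}

/*
 * Free every entry and return how many were freed. Clearing the head
 * afterwards makes a second call a harmless no-op instead of a
 * double free.
 */
int desc_free_dags(struct access_desc *desc)
{
    int freed = 0;
    struct dag_list *cur;

    assert(desc != NULL);
    cur = desc->dagList;
    while (cur != NULL) {
        struct dag_list *next = cur->next;
        free(cur);
        cur = next;
        freed++;
    }
    desc->dagList = NULL;
    return freed;
}
```

Calling `desc_free_dags()` twice on the same descriptor is safe here only because of the final assignment to `desc->dagList`.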
 1.36  16-Nov-2004  oster On an idea from Thor (tls@), do not fail a component if doing so would
render the RAID set completely dead. Instead, we retry the IO a
maximum of RF_RETRY_THRESHOLD times (currently '5'), and then just
return an IO error if the IO fails. This should reduce the damage
caused by having multiple disks appear to fail when the culprit is
really something else (power, controllers, etc.)
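The policy in revision 1.36 amounts to a small decision rule, sketched below under stated assumptions. The enum, the function `on_io_failure()`, and its parameters are hypothetical names made up for this example; only the threshold value of 5 comes from the log message.

```c
#include <assert.h>

#define RETRY_THRESHOLD 5  /* value quoted in the log message */

/* Hypothetical outcomes for a failed IO. */
enum io_action {
    IO_FAIL_COMPONENT,  /* mark the component as failed */
    IO_RETRY,           /* reissue the IO */
    IO_RETURN_ERROR     /* give up and return an IO error */
};

/*
 * Decide how to react to a failed IO.
 * would_kill_set: nonzero if failing this component would leave the
 *                 RAID set completely dead.
 * retries_so_far: number of retries already attempted for this IO.
 */
enum io_action on_io_failure(int would_kill_set, int retries_so_far)
{
    assert(retries_so_far >= 0);
    if (!would_kill_set)
        return IO_FAIL_COMPONENT;
    if (retries_so_far < RETRY_THRESHOLD)
        return IO_RETRY;
    return IO_RETURN_ERROR;
}
```

The point of the design is that a transient fault elsewhere (power, controller) produces a bounded number of retries and an IO error rather than a second failed component.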
 1.35  23-Mar-2004  oster branches: 1.35.4;
Ooops.. this free should come at the end of the loop. Thanks
to Juergen Hannken-Illjes for pointing it out.
 1.34  22-Mar-2004  oster If the DAG failed, need to make sure we wipe the dagList structures too.
 1.33  21-Mar-2004  oster Why start a timer, and then just ignore it? *punt*
 1.32  20-Mar-2004  oster NO_STRIPE_LOCKS is never set, so this code will always execute.
Remove conditionals, and left-shift code.
 1.31  19-Mar-2004  oster Re-work rf_State_Quiesce() so that we don't have to hold a lock
while doing a pool_get().
 1.30  13-Mar-2004  oster - don't use rf_PrintUserStats() for recon statistics.
rf_PrintUserStats() was meant for the simulator, and doesn't provide
any real info in kernel-space, especially for reconstructs.
Reconstructing actually renders the stats even more useless, since it
resets them all to zero before the reconstruct starts!

- since rf_PrintUserStats() is no longer used, nuke it along with the
routines that feed it. Nothing was using this code, and if we ever
need it again, we know where to find it.
 1.29  02-Mar-2004  oster A few more cases where RF_DEBUG_PSS can be used.
 1.28  29-Feb-2004  oster Minor shuffling of variable declarations to clean RF_ACC_TRACE #defines
up a bit. No functional changes.
 1.27  29-Feb-2004  oster Add "RF_ACC_TRACE" as a new #define to rf_archs.h.
Use it to conditionalize some of the access tracing and tracerec bits.
Chops about 4 K off of an i386 GENERIC kernel.
 1.26  27-Feb-2004  oster Use a dynamically allocated linked list of dagLists instead of using a
dynamically allocated variable-sized array (dagArray). Convert code
to use the new linked list stuff instead of the array stuff (the ratio
of one dagList per stripe still applies). The big advantage is in
being able to more efficiently allocate the dagLists on-the-fly, and
not have to know the size(s) of the array beforehand.
 1.25  02-Jan-2004  oster Fix the "We panic if we can't create a DAG" problem that's existed
~forever. This requires a number of things:

1) If we can't create a DAG, set desc->numStripes to 0 in
rf_SelectAlgorithm. This will ensure that we don't attempt to free
any dagArray[] elements in rf_State_Cleanup().

2) Modify rf_State_CreateDAG() to not panic in the event of a DAG
failure. Instead, set the bp->b_flags and bp->b_error, and set things
up to skip to rf_State_Cleanup().

3) Need to mark desc->status as "bad" so that we actually stop looking
for a different DAG. (which we won't find... no matter how many times
we try).

4) rf_State_LastState() will then do the biodone(), and return EIO for
the IO in question.

5) Remove some " || 1 "'s from ProcessNode(). These were for
debugging, and we don't need the failure notices spewing
over and over again as the failing DAGs are processed.

6) Needed to change

if (asmap->numDataFailed + asmap->numParityFailed > 1)

to

if ((asmap->numDataFailed + asmap->numParityFailed > 1) ||
(raidPtr->numFailures > 1)){

in rf_raid5.c so that it doesn't try to return
rf_CreateNonRedundantWriteDAG as the creation function.

7) Note that we can't apply the above change to the RAID 1 code as
with the silly "fake 2-D" RAID 1 sets, it is possible to have 2 failed
components in the RAID 1 set, and that would stop them from working.
(I really don't know why/how those "fake 2-D" RAID 1 sets even work
with all the "single-fault" assumptions present in the rest of the
code.)

8) Needed to protect rf_RAID0DagSelect() in a similar way -- it should
return NULL as the createFunc.

9) No point printing out "Multiple disks failed..." a zillion times.
 1.24  01-Jan-2004  oster Cleanup some unused desc->flags:
RF_DAG_RETURN_DAG
RF_DAG_RETURN_ASM
RF_DAG_TEST_ACCESS
and the code that goes with them. A couple more of these
can probably go too, but I might need them in a bit.
 1.23  31-Dec-2003  oster Clean up a bunch of comments.
 1.22  30-Dec-2003  oster Some days you wonder if some of the function declaration consistency
was just an accident in the first place. Cleanup function decls and
a few comments. [ok.. so I wasn't going to fix this many.. but once
you're on a roll....]
 1.21  29-Dec-2003  oster [Having received a definite lack of strenuous objection, a small amount
of strenuous agreement, and some general agreement, this commit is
going ahead because it's now starting to block some other changes I
wish to make.]

Remove most of the support for the concept of "rows" from RAIDframe.
While the "row" interface has been exported to the world, RAIDframe
internals have really only supported a single row, even though they
have feigned support of multiple rows.

Nothing changes in configuration land -- config files still need to
specify a single row, etc. All auto-config structures remain fully
forward/backwards compatible.

The only visible difference to the average user should be a
reduction in the size of a GENERIC kernel (i386) by 4.5K. For those
of us trolling through RAIDframe kernel code, a lot of the driver
configuration code has become a LOT easier to read.
 1.20  23-Sep-2002  oster branches: 1.20.6;
The 'reconDesc' argument to rf_SignalQuiescenceLock() is a holdover from
simulation code. *poof* Thanks to Simon B.
 1.19  19-Sep-2002  oster Introduce and use RF_DEBUG_STATES to save a bit more kernel space.
 1.18  17-Sep-2002  oster RF_DEBUG_ACCESS and RF_DEBUG_QUIESCE make things a little smaller.
 1.17  13-Jul-2002  oster Most folks won't need the DAG printing and verification routines.
Introduce a #define to toggle them on/off. Disable calls to
rf_PrintDAGList(). Saves ~6K on GENERIC+DEBUG kernel on i386.
 1.16  13-Nov-2001  lukem branches: 1.16.8;
add RCSIDs
 1.15  20-Oct-2000  oster branches: 1.15.2; 1.15.4;
Move disk_busy() and disk_unbusy() to more sane locations. Values
reported by 'systat iostat' and friends are now much more correct for
RAIDframe devices. Thanks to Andrew Doran for poking me about this,
and for suggestions on and review of the changes.
 1.14  12-Oct-2000  oster Minor fixup for a printf(). Noted by Robert Elz.
 1.13  09-Jan-2000  oster branches: 1.13.4;
Nuke desc->tid.
 1.12  08-Jan-2000  oster - nuke calls to rf_get_threadid() and associated #include
- change a bunch of debugging printfs from
"[%d] ...", tid (where tid is the "thread id")
to
"raid%d: ...", raidPtr->raidid
- other minor rototillage
 1.11  07-Jan-2000  oster nuke one call to rf_get_threadid() and cleanup rf_State_Cleanup a bit.
 1.10  12-Dec-1999  oster Rework how we do the 'wakeup' when an IO completes.
 1.9  07-Dec-1999  oster More cleanup. DKUSAGE (what little was left of it) goes bye-bye.
 1.8  03-Dec-1999  oster We don't support RF_DAG_TEST_ACCESS.
 1.7  08-Jul-1999  oster branches: 1.7.2; 1.7.8;
Once upon a time, long long ago, there was a "fix" added to the
RAIDframe driver to stop it from eating too much kernel memory when
writing data. But that fix had a nasty side effect of hurting write
performance (*much* more than I thought it would). These changes nuke
that "fix", and instead put in a more reasonable mechanism for limiting
the number of simultaneous IO's which can be happening for each RAID device.
The result is a noticeable improvement in write throughput. The End.
 1.6  05-Feb-1999  oster branches: 1.6.2; 1.6.4;
Phase 2 of the RAIDframe cleanup. The source is now closer to KNF
and is much easier to read. No functionality changes.
 1.5  26-Jan-1999  oster Nuke more bits of RAIDframe "demo" code. We're not "demoing" here,
we're doing the Real Thing!
 1.4  26-Jan-1999  oster RAIDframe cleanup, phase 1. Nuke simulator support, user-land driver,
out-dated comments, and other unneeded stuff. This helps prepare
for cleaning up the rest of the code, and adding new functionality.

No functional changes to the kernel code in this commit.
 1.3  15-Jan-1999  explorer Make it so raidframe will only perform synchronous writes, and async
reads. This avoids a problem where many writes will cause the driver
to allocate way too much memory.

This needs to change to a queueing system later, which will provide a
way to limit the memory consumed by the driver.

Without these changes, raidframe would use 24M or more on my machine when
the buffer cache dumped all its dirty blocks. Now it uses around 200k
or so.
 1.2  13-Nov-1998  drochner fix egcs warning
 1.1  13-Nov-1998  oster RAIDframe, version 1.1, from the Parallel Data Laboratory at
Carnegie Mellon University. Full RAID implementation, including
levels 0, 1, 4, 5, 6, parity logging, and a few other goodies.
Ported to NetBSD by Greg Oster.
 1.6.4.1  02-Aug-1999  thorpej Update from trunk.
 1.6.2.2  20-Dec-1999  he Pull up revisions 1.8-1.10 (requested by oster):
Re-work the IO throttle code. Fixes potential panics under high
loads under FFS.
 1.6.2.1  26-Sep-1999  cgd pull up rev 1.7 from trunk (requested by oster):
Add a more reasonable throttling mechanism to the RAIDframe code.
Increases write performance, and helps prevent the I/O routines from
using too much kernel memory.
 1.7.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
A i386 GENERIC kernel compiles without the siop, ahc and bha drivers
(will be updated later). i386 IDE/ATAPI and ncr work, as well as
sparc/esp_sbus. alpha should work as well (untested yet).
siop, ahc and bha will be updated once I've updated the branch to current
-current, as well as machine-dependent code.
 1.13.4.2  20-Oct-2000  tv Pullup 1.15 [oster]:
Move disk_busy() and disk_unbusy() to more sane locations. Values
reported by 'systat iostat' and friends are now much more correct for
RAIDframe devices. Thanks to Andrew Doran for poking me about this,
and for suggestions on and review of the changes.
 1.13.4.1  16-Oct-2000  tv Pullup 1.14 [oster]:
Minor fixup for a printf(). Noted by Robert Elz.
 1.15.4.3  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.15.4.2  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.15.4.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.2.3  18-Oct-2002  nathanw Catch up to -current.
 1.15.2.2  01-Aug-2002  nathanw Catch up to -current.
 1.15.2.1  14-Nov-2001  nathanw Catch up to -current.
 1.16.8.1  15-Jul-2002  gehenna catch up with -current.
 1.20.6.7  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.6.6  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.20.6.5  17-Jan-2005  skrll Sync with HEAD.
 1.20.6.4  29-Nov-2004  skrll Sync with HEAD.
 1.20.6.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.6.2  18-Sep-2004  skrll Sync with HEAD.
 1.20.6.1  03-Aug-2004  skrll Sync with HEAD
 1.35.4.2  16-Apr-2005  tron Pull up revision 1.37 (requested by oster in ticket #1095):
After walking through desc->dagList nuking entries, make sure
desc->dagList is set to NULL before continuing. If we don't,
there's a danger that we'll try to re-free these items later.
(This should fix a panic reported to me via private communication.)
 1.35.4.1  06-Apr-2005  tron Pull up revision 1.36 (requested by oster in ticket #1038):
On an idea from Thor (tls@), do not fail a component if doing so would
render the RAID set completely dead. Instead, we retry the IO a
maximum of RF_RETRY_THRESHOLD times (currently '5'), and then just
return an IO error if the IO fails. This should reduce the damage
caused by having multiple disks appear to fail when the culprit is
really something else (power, controllers, etc.)
 1.37.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.37.2.1  29-Apr-2005  kent sync with -current
 1.38.4.3  27-Feb-2008  yamt sync with head.
 1.38.4.2  03-Sep-2007  yamt sync with head.
 1.38.4.1  21-Jun-2006  yamt sync with head.
 1.38.2.2  25-May-2008  bouyer Pull up following revision(s) (requested by oster in ticket #1934):
sys/dev/raidframe/rf_states.c: revision 1.43
Add in a missing "bp->b_resid = bp->b_bcount" in the EIO case.
Spotted by Juergen Hannken-Illjes. Thanks!
 1.38.2.1  04-Oct-2005  tron Pull up following revision(s) (requested by oster in ticket #853):
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.190
sys/dev/raidframe/rf_netbsd.h: revision 1.24
sys/dev/raidframe/rf_states.c: revision 1.39
sys/dev/raidframe/rf_engine.c: revision 1.36
Re-work the handling of incoming I/O in RAIDframe:
- introduce rf_buf_queue_check() which checks to see if there
is work to do in the incoming buffer queue
- rf_RaidIOThread() is now responsible for calling raidstart(), and is
also now the only place that calls raidstart()
- raidstrategy() now just queues requests in buf_queue
and signals rf_RaidIOThread() that work has arrived
Hopefully addresses PR#30233
 1.40.40.1  15-Aug-2007  skrll Sync with HEAD.
 1.40.38.1  03-Jun-2008  skrll Sync with netbsd-4.
 1.40.30.2  19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.40.30.1  09-Jun-2007  ad Sync with head.
 1.40.24.1  25-May-2008  bouyer Pull up following revision(s) (requested by oster in ticket #1154):
sys/dev/raidframe/rf_states.c: revision 1.43
Add in a missing "bp->b_resid = bp->b_bcount" in the EIO case.
Spotted by Juergen Hannken-Illjes. Thanks!
 1.41.22.2  29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.41.22.1  29-Jul-2007  ad file rf_states.c was added on branch matt-mips64 on 2007-07-29 12:50:23 +0000
 1.41.12.1  18-Feb-2008  mjf Sync with HEAD.
 1.41.6.1  23-Mar-2008  matt sync with HEAD
 1.42.12.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.42.10.2  11-Mar-2010  yamt sync with head
 1.42.10.1  04-May-2009  yamt sync with head.
 1.42.8.1  04-Jun-2008  yamt sync with head
 1.42.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.43.16.1  21-Apr-2010  matt sync to netbsd-5
 1.43.8.1  10-Dec-2009  snj Pull up following revision(s) (requested by tron in ticket #1187):
sbin/raidctl/raidctl.8: revisions 1.57-1.59 via patch
sbin/raidctl/raidctl.c: revision 1.42 via patch
sys/dev/raidframe/files.raidframe: revision 1.8 via patch
sys/dev/raidframe/rf_copyback.c: revision 1.42 via patch
sys/dev/raidframe/rf_disks.c: revision 1.72 via patch
sys/dev/raidframe/rf_driver.c: revision 1.122 via patch
sys/dev/raidframe/rf_engine.c: revision 1.40 via patch
sys/dev/raidframe/rf_kintf.h: revision 1.21 via patch
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.269 via patch
sys/dev/raidframe/rf_paritymap.c: revisions 1.1-1.3 via patch
sys/dev/raidframe/rf_paritymap.h: revision 1.1 via patch
sys/dev/raidframe/rf_parityscan.c: revision 1.33 via patch
sys/dev/raidframe/rf_parityscan.h: revision 1.8 via patch
sys/dev/raidframe/rf_raid.h: revision 1.38 via patch
sys/dev/raidframe/rf_reconstruct.c: revision 1.108 via patch
sys/dev/raidframe/rf_states.c: revision 1.44 via patch
sys/dev/raidframe/raidframeio.h: revision 1.6 via patch
sys/dev/raidframe/raidframevar.h: revision 1.13 via patch
Pull up the RAIDframe parity map Summer Of Code project.
Drastically reduces the amount of time spent rewriting parity after an
unclean shutdown by keeping better track of which regions might have had
outstanding writes. Enabled by default; can be disabled on a per-set
basis, or tuned, with the new raidctl(8) commands.
 1.44.6.1  06-Jun-2011  jruoho Sync with HEAD.
 1.44.4.1  31-May-2011  rmind sync with head
 1.49.32.1  19-Mar-2016  skrll Sync with HEAD
 1.49.14.1  03-Dec-2017  jdolecek update from HEAD
 1.50.18.1  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.51.12.1  01-Aug-2021  thorpej Sync with HEAD.