History log of /src/sys/dev/raidframe/rf_reconstruct.c |
Revision | | Date | Author | Comments |
1.129 |
| 17-Sep-2023 |
oster | Implement hot removal of spares and components. From manu@.
Implement a long desired feature of automatically incorporating a used spare into the array after a reconstruct.
Given the configuration:
  Components:
    /dev/wd0e: failed
    /dev/wd1e: optimal
    /dev/wd2e: optimal
  Spares:
    /dev/wd3e: spare
Running 'raidctl -F /dev/wd0e raid0' will now result in the following configuration after a successful rebuild:
  Components:
    /dev/wd3e: optimal
    /dev/wd1e: optimal
    /dev/wd2e: optimal
  No spares.
Thanks to manu@ for the development of the initial set of changes which allowed the changes to automatically incorporate a used spare to come to fruition. Thanks also to manu@ for useful discussions about and additional testing of these changes.
|
1.128 |
| 08-Sep-2023 |
oster | Revision 1.104 actually fixed the issues that were preventing us from freeing the ReconControl structures. So free them and thus also prevent a panic on shutdown due to items not being correctly returned to the pool.
Thanks to manu@ for report of the panic, and for initial testing of the changes.
XXX pullup-9 XXX pullup-10
|
1.127 |
| 27-Jul-2021 |
oster | branches: 1.127.10; rf_CreateDiskQueueData() no longer uses waitflag, and will always succeed. Cleanup the error path for the (no longer needed) PR_NOWAIT cases.
|
1.126 |
| 23-Jul-2021 |
oster | Extensive mechanical changes to the pools used in RAIDframe.
Alloclist remains not per-RAID, so initialize that pool separately/differently than the rest.
The remainder of pools in RF_Pools_s are now per-RAID pools. Mostly mechanical changes to functions to allocate/destroy per-RAID pools. Needed to make raidPtr available in certain cases to be able to find the per-RAID pools.
Extend rf_pool_init() to now populate a per-RAID wchan value that is unique to each pool for a given RAID device.
TODO: Complete the analysis of the minimum number of items that are required for each pool to allow IO to progress (i.e. so that a request for pool resources can always be satisfied), and dynamically scale minimum pool sizes based on RAID configuration.
|
1.125 |
| 15-Feb-2021 |
oster | branches: 1.125.4; Fix a long-standing off-by-one error in computing lastPSID.
SUsPerPU is only really supported for a value of 1, and since the first PSID is 0, the last will be numStripe-1. Also update the setting of pending_writes to reflect the change to lastPSID.
Needs pullups to -8 and -9.
|
1.124 |
| 08-Dec-2019 |
mlelstv | branches: 1.124.8; Switch to vn_bdev_open* functions.
|
1.123 |
| 10-Oct-2019 |
christos | fix the function pointer and callback mess:
- callback functions return 0 and their result is not checked; make them void.
- there are two types of callbacks and they used to overload their parameters and the callback structure; separate them into "function" and "value" callbacks.
- make the wait function signature consistent.
|
1.122 |
| 09-Feb-2019 |
christos | branches: 1.122.4;
- Change the allocation macros to be more like function calls
- Change sizeof(type) -> sizeof(*variable)
- Use macros for the long buffer length allocations
- Remove "bit polishing" memsets() -- do them only once
- Remove unnecessary casts
Thanks to oster@ for finding bugs and testing.
|
1.121 |
| 14-Nov-2014 |
oster | branches: 1.121.12; 1.121.20;
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete.
The issue was that a component was being marked as a used_spare when the rebuild started, not when the rebuild was actually finished. Marking it as a used_spare meant that the component label on the spare was being updated such that after a reboot the component would be considered up-to-date, regardless of whether the rebuild actually completed!
This fix includes:
1) Add an additional state "rf_ds_rebuilding_spare" which is used to denote that a spare is currently being rebuilt from the live components.
2) Update the comments on the disk states, which were out-of-sync with reality.
3) When rebuilding to a spare component, that spare now enters the state rf_ds_rebuilding_spare instead of the state rf_ds_used_spare.
4) When the rebuild is actually complete then the spare component enters the rf_ds_used_spare state. rf_ds_used_spare is now used exclusively for the case where the rebuilding to the spare has completed successfully.
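The state change described above can be sketched as a small transition model. This is only an illustration: the enum and function names below are hypothetical stand-ins rather than the RAIDframe definitions, and the model covers only the successful-rebuild path.

```c
#include <assert.h>

/* Hypothetical stand-ins for the spare-related disk states in the log. */
enum demo_disk_state {
    DEMO_SPARE,             /* idle hot spare */
    DEMO_REBUILDING_SPARE,  /* rebuild to the spare is in progress */
    DEMO_USED_SPARE         /* rebuild to the spare completed successfully */
};

/* Starting a rebuild moves the spare into the "rebuilding" state only. */
enum demo_disk_state demo_start_rebuild(enum demo_disk_state s)
{
    assert(s == DEMO_SPARE);
    return DEMO_REBUILDING_SPARE;
}

/* Only a completed rebuild promotes the spare to "used". */
enum demo_disk_state demo_finish_rebuild(enum demo_disk_state s)
{
    assert(s == DEMO_REBUILDING_SPARE);
    return DEMO_USED_SPARE;
}

/* A component counts as up to date only after the rebuild has finished. */
int demo_is_up_to_date(enum demo_disk_state s)
{
    return s == DEMO_USED_SPARE;
}
```

With this split, a reboot in the middle of a rebuild finds the spare in the "rebuilding" state, so it is not mistaken for an up-to-date component.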
XXX: Someday we need to teach raidctl(8) about this new state, and take out the backwards compatibility code in rf_netbsdkintf.c (see RAIDFRAME_GET_INFO in raidioctl()). For today, this fix needs to be generic enough that it can get backported without major grief.
XXX: Needs pullup to netbsd-5*, netbsd-6*, and netbsd-7
Fixes PR#49244.
|
1.120 |
| 14-Jun-2014 |
hannken | branches: 1.120.2; Change dk_lookup() to return an anonymous vnode not associated with any file system. Change all consumers of dk_lookup() to get the device from "v_rdev" instead of VOP_GETATTR() as specfs does not support VOP_GETATTR(). Devices obtained with dk_lookup() will no longer disappear on forced unmounts.
Fix for PR kern/48849 (root mirror raid fails on shutdown)
Welcome to 6.99.44
|
1.119 |
| 06-Mar-2013 |
yamt | branches: 1.119.10; fix parens in a message
|
1.118 |
| 20-Feb-2012 |
oster | branches: 1.118.2; Add logic to the main reconstruction loop to handle RAID5 with rotated spares. While here, observe that we were actually doing one more stripe than we thought we were, and correct that too (it didn't matter for non-RAID5_RS, but it definitely does for RAID5_RS). Add some bounds-checking at the beginning to handle the case where the number of stripes in the set is smaller than the sliding reconstruction window.
XXX: this problem likely needs to be fixed for PARITY_DECLUSTERING too.
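A minimal sketch of the chunked walk and the bounds check mentioned above, assuming parity stripe IDs run from 0 to the number of stripes minus one. The function and parameter names are hypothetical and are not taken from rf_reconstruct.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk parity stripes 0..num_stripes-1 in chunks of at most `window`
 * stripes. Returns the number of chunks and stores the total number of
 * stripes visited in *visited.
 */
uint64_t demo_walk_chunks(uint64_t num_stripes, uint64_t window,
    uint64_t *visited)
{
    uint64_t chunks = 0, start = 0, last_psid;

    assert(window > 0);
    *visited = 0;
    if (num_stripes == 0)
        return 0;

    last_psid = num_stripes - 1;    /* first ID is 0, so last is count - 1 */
    if (window > num_stripes)       /* set smaller than the window */
        window = num_stripes;

    for (;;) {
        uint64_t end = start + window - 1;

        if (end > last_psid)        /* clamp the final, partial chunk */
            end = last_psid;
        *visited += end - start + 1;
        chunks++;
        if (end == last_psid)
            break;
        start = end + 1;
    }
    return chunks;
}
```

Clamping both the window size and the end of each chunk keeps the loop from processing one stripe more than the set actually contains.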
|
1.117 |
| 14-Oct-2011 |
hannken | branches: 1.117.2; 1.117.6; 1.117.8; Change the vnode locking protocol of VOP_GETATTR() to request at least a shared lock. Make all calls outside of file systems respect it.
The calls from file systems need review.
No objections from tech-kern.
|
1.116 |
| 03-Aug-2011 |
oster | Address part of PR kern/44972. From YAMAMOTO Takashi. Thanks!
|
1.115 |
| 28-May-2011 |
yamt | rf_ReconstructInPlace: don't leave a vnode open on errors. fixes a part of PR/44972.
|
1.114 |
| 24-May-2011 |
buhrow | Suggested to oster@ and approved via private e-mail as a help to people who are getting reconstruction failures.
|
1.113 |
| 11-May-2011 |
mrg | convert the main raidPtr mutex to a kmutex, and add a couple of cv's to cover the old sleep/wakeup points for adding_hot_spare and waitForReconCond. convert all remaining simple_lock's to kmutexes (they're not used or compiled right now... even with all options enabled) and remove the support for them.
this leaves just a pair of tsleep()/wakeup() calls using old scheduling APIs.
|
1.112 |
| 02-May-2011 |
mrg | convert rb_mutex to a kmutex/cv.
|
1.111 |
| 19-Feb-2011 |
enami | Define accessors for number of blocks and partition size in the component label and use them where appropriate. Discussed on tech-kern.
|
1.110 |
| 19-Nov-2010 |
dholland | branches: 1.110.2; 1.110.4; Introduce struct pathbuf. This is an abstraction to hold a pathname and the metadata required to interpret it. Callers of namei must now create a pathbuf and pass it to NDINIT (instead of a string and a uio_seg), then destroy the pathbuf after the namei session is complete.
Update all namei call sites accordingly. Add a pathbuf(9) man page and update namei(9).
The pathbuf interface also now appears in a couple of related additional places that were passing string/uio_seg pairs that were later fed into NDINIT. Update other call sites accordingly.
|
1.109 |
| 01-Nov-2010 |
mrg | add support for >2TB raid devices.
- add two new members to the component label, u_int numBlocksHi and u_int partitionSizeHi, and store the top 32 bits of the real number of blocks and partition size. modify rf_print_component_label(), rf_does_it_fit(), rf_AutoConfigureDisks() and rf_ReconstructFailedDiskBasic().
- call disk_blocksize() after disk_attach() [ from mlelstv ]
- shift the block number relative to DEV_BSHIFT in raidstart() and InitBP() so that accesses work for non 512-byte devices. [ from mlelstv ]
- update rf_getdisksize() to use the new getdisksize() [ from mlelstv. this part needs a separate change for netbsd-5. ]
reviewed by: oster, christos and darrenr
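The split of a 64-bit block count across a low and a high 32-bit field can be sketched as follows. The struct and helper names are hypothetical; only the idea of keeping the top 32 bits in a separate "Hi" member comes from the log entry above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the size-related fields of a component label. */
struct demo_label {
    uint32_t numBlocks;     /* low 32 bits of the block count */
    uint32_t numBlocksHi;   /* top 32 bits of the block count */
};

/* Store a 64-bit block count into the two 32-bit fields. */
void demo_set_num_blocks(struct demo_label *l, uint64_t n)
{
    assert(l != NULL);
    l->numBlocks = (uint32_t)(n & 0xffffffffu);
    l->numBlocksHi = (uint32_t)(n >> 32);
}

/* Reassemble the 64-bit block count from the two fields. */
uint64_t demo_get_num_blocks(const struct demo_label *l)
{
    assert(l != NULL);
    return ((uint64_t)l->numBlocksHi << 32) | l->numBlocks;
}
```

A label written before the high field existed would read back with a zero high half, which leaves counts below 2^32 unchanged.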
|
1.108 |
| 17-Nov-2009 |
jld | branches: 1.108.2; 1.108.4; Finally commit the RAIDframe parity map Summer Of Code project.
Drastically reduces the amount of time spent rewriting parity after an unclean shutdown by keeping better track of which regions might have had outstanding writes. Enabled by default; can be disabled on a per-set basis, or tuned, with the new raidctl(8) commands.
Discussed on tech-kern@ to a general air of approval; exhortations to commit from mrg@, christos@, and others.
Thanks to Google for their sponsorship, oster@ for mentoring the project, assorted developers for trying very hard to break it, and probably more I'm forgetting.
|
1.107 |
| 11-Feb-2009 |
oster | If we see a RF_RECON_WRITE_ERROR event we know a write has finished and we need to account for that. Failure to do so means we can end up waiting forever for writes we think are outstanding, but which have already completed.
Addresses the RAIDframe part of PR#40569. Thanks to Matthias Scheler for reporting the issue and verifying the fix.
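The accounting rule behind this fix can be sketched as a counter that is decremented on every write completion, whether the write succeeded or failed. The event names and the helper below are hypothetical and only illustrate the idea.

```c
#include <assert.h>

/* Hypothetical event kinds; not the RAIDframe event definitions. */
enum demo_recon_event {
    DEMO_WRITE_ISSUED,
    DEMO_WRITE_DONE,
    DEMO_WRITE_ERROR
};

/* Return the number of outstanding writes after processing one event. */
int demo_pending_after(int pending, enum demo_recon_event ev)
{
    switch (ev) {
    case DEMO_WRITE_ISSUED:
        return pending + 1;
    case DEMO_WRITE_DONE:
    case DEMO_WRITE_ERROR:
        /* A failed write has still finished, so it is no longer pending. */
        assert(pending > 0);
        return pending - 1;
    }
    return pending;
}
```

If the error case did not decrement the counter, a waiter that sleeps until the count reaches zero would never wake up.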
|
1.106 |
| 20-Dec-2008 |
oster | branches: 1.106.2; When unconfiguring an array where a reconstruct is in progress, abort the reconstruct and wait for IOs to drain before pulling the plug.
Should fix the panic reported by der Mouse on tech-kern.
|
1.105 |
| 23-Sep-2008 |
oster | branches: 1.105.2; 1.105.4; Nuke unneeded printf(). Spotted by pooka@.
|
1.104 |
| 19-May-2008 |
oster | branches: 1.104.4; Re-work some of the guts of the reconstruction code.
Reconmap used to have one pointer for every reconstruction unit. This does not scale well in the land of 1TB disks, where some 100MB+ of "status pointers" are required for typical configurations. Convert the reconstruction code to use a "sliding status window" which will scale nicely regardless of the number of stripes/reconstruction units in the RAID set. Convert the main reconstruction loop to rebuild the array in chunks rather than in one big lump.
As part of these changes, introduce a function to kick any waiters on the head separation callback list, and use that in the main reconstruction event queue to wake up the waiters if things have stalled. (I believe this may fix a race condition that could occur at least at the very end of a disk during reconstruction under heavy IO load.)
Thanks to Brian Buhrow for all his help, support, and patience in testing these changes.
|
1.103 |
| 15-Apr-2008 |
oster | branches: 1.103.2; 1.103.4; 1.103.6; A forced recon read should not default to indicating that the reads for that disk have stopped, since this will bump us out of the normal reconstruction loop prematurely.
Fixes the (mostly cosmetic) bug where the reconstruction status values stop updating, and from raidctl it appears that reconstruction has totally stalled (which it actually hasn't -- the reconstruction does complete properly, but not in the normal way).
|
1.102 |
| 14-Apr-2008 |
oster | Print out the status value if a reconstruction read fails. Don't print out write promotions during reconstruct unless we are debugging reconstructs.
|
1.101 |
| 26-Jan-2008 |
oster | branches: 1.101.6; In a land before time, when kernel processes roamed the system, we needed to keep track of the kernel process that opened a device in order to close it with the right credentials. Flash forward to today where curlwp is now quite sufficient.
|
1.100 |
| 26-Nov-2007 |
pooka | Remove the "struct lwp *" argument from all VFS and VOP interfaces. The general trend is to remove it from all kernel interfaces and this is a start. In case the calling lwp is desired, curlwp should be used.
quick consensus on tech-kern
|
1.99 |
| 21-Sep-2007 |
oster | branches: 1.99.6; Fix wording in a comment and correct a debug line. From Olivier Cherrier (via private mail). Thanks!
|
1.98 |
| 18-Jul-2007 |
ad | branches: 1.98.4; 1.98.6; 1.98.8; Fix fallout from recent kthread changes.
|
1.97 |
| 09-Jul-2007 |
ad | branches: 1.97.2; Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
|
1.96 |
| 26-Jun-2007 |
cube | Change dk_lookup() to accept an additional argument of the type enum uio_seg that tells whether the given path is in user space or kernel space, so it can tell NDINIT().
While the raidframe calls were ok, both ccd(4) and cgd(4) were passing pointers to user space data, which leads to strange error on i386, as reported by Jukka Salmi on current-users.
The issue has been there since last august, I'm actually a bit surprised that no one in the meantime has used ccd(4) or cgd(4) on an arch where it would have simply faulted.
|
1.95 |
| 16-Nov-2006 |
christos | branches: 1.95.2; 1.95.8; 1.95.10; 1.95.16; __unused removal on arguments; approved by core.
|
1.94 |
| 12-Oct-2006 |
christos |
- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
|
1.93 |
| 27-Aug-2006 |
christos | branches: 1.93.2; 1.93.4;
- use dk_lookup instead of our home-spun version.
- allow raid to be configured in a wedge
- allow wedges to be configured in a raid
- add autoconfiguration of wedges in a raid
|
1.92 |
| 21-Jul-2006 |
ad |
- Use the LWP cached credentials where sane.
- Minor cosmetic changes.
|
1.91 |
| 14-May-2006 |
elad | integrate kauth.
|
1.90 |
| 11-Dec-2005 |
christos | branches: 1.90.4; 1.90.6; 1.90.8; 1.90.10; 1.90.12; merge ktrace-lwp.
|
1.89 |
| 18-Jul-2005 |
oster | If rf_SubmitReconBuffer indicates the submission was blocked (for whatever reason), return 0 instead of the default RF_RECON_READ_STOPPED. Returning RF_RECON_READ_STOPPED would result in rf_ContinueReconstructFailedDisk() thinking that the given component was "done" and breaking out of the main reconstruction loop far too early. Reconstruction still worked correctly as long as there were no errors, but RAIDframe wouldn't be in a position to properly handle read/write errors during reconstruction.
This fixes the "raidctl's progress bar spins at 0% until reconstruction finishes" problem.
|
1.88 |
| 08-Jun-2005 |
oster | branches: 1.88.2; - initialize numRUsTotal before we indicate that we are doing a reconstruct.
- make numRUsComplete and numRUsTotal 64-bit quantities like everything else that records this information.
|
1.87 |
| 27-Feb-2005 |
perry | branches: 1.87.2; nuke trailing whitespace
|
1.86 |
| 12-Feb-2005 |
oster | The 'next' argument to rf_CreateDiskQueueData is always NULL. Since there is no particular reason to pass an extra NULL argument, turf it, and initialize p->next to NULL within the function.
|
1.85 |
| 12-Feb-2005 |
oster | Add a 'waitflag' argument to rf_CreateDiskQueueData() and use it to determine if we are willing to wait for memory to come from the diskqueuedata (dqd) and bufpool pools. Cleanup the mess related to code calling rf_CreateDiskQueueData() with different expectations (and/or blatant disregard) of what might happen if there were insufficient pool resources.
|
1.84 |
| 06-Feb-2005 |
oster | It's not a bad idea to update the component labels whether or not the reconstruction was successful.
|
1.83 |
| 05-Feb-2005 |
oster | rf_GetNextReconEvent() *will* return a valid event, so no need for the assert. (we'd have panic'ed in there long before this assert if that wasn't the case).
Minor whitespace changes.
|
1.82 |
| 05-Feb-2005 |
oster | Vastly improve the error handling in the case of a read/write error that occurs during a reconstruction. We go from zero error handling and likely panicking if something goes amiss, to gracefully bailing and leaving the system in the best, usable state possible.
- introduce rf_DrainReconEventQueue() to allow easy cleaning of the reconstruction event queue
- change how we cleanup the floating recon buffers in rf_FreeReconControl(). Detect the end of the list rather than traversing according to a count.
- keep track of the number of pending reconstruction writes. In the event of a read error, use this to wait long enough for the pending writes to (hopefully) drain.
- more cleanup is still needed on this code, but I didn't want to start mixing major functional changes with minor cleanups.
XXX: There is a known issue with pool items left outstanding due to the IO failure, and this can show up in the form of a panic at the tail end of a shutdown. This problem is much less severe than before these changes, and the hope/plan is that this problem will go away once this code gets overhauled again.
|
1.81 |
| 22-Jan-2005 |
oster | branches: 1.81.2; Torch some #define's missed in last commit.
|
1.80 |
| 22-Jan-2005 |
oster | Reconstruction Descriptors are only allocated once per reconstruction, and don't need their own pool or freelist or anything fancier than a malloc/free.
|
1.79 |
| 18-Jan-2005 |
oster | ForceReconReadDoneProc() needs a return after doing the first rf_CauseReconEvent().
|
1.78 |
| 12-Dec-2004 |
oster | branches: 1.78.2; The switch() in rf_ContinueReconstructFailedDisk() is never actually used in non-simulation code, and thus is just wasting space (and making the code more confusing to read!). Turf the switch, left-shift the indentation of code, and nuke 'state' field of struct RF_RaidReconDesc_s.
No real functional changes.
|
1.77 |
| 15-Nov-2004 |
oster | continueFunc and continueArg aren't used. Turf. Simplify calls to rf_GetNextReconEvent().
|
1.76 |
| 18-Mar-2004 |
oster | branches: 1.76.4; Re-work the locking mechanisms for reconstruct and PSS structures such that we don't actually hold a simplelock while we are doing a pool_get(), but that we are still effectively protecting critical code.
This should fix all of the outstanding LOCKDEBUG warnings related to rebuilding RAID sets.
|
1.75 |
| 13-Mar-2004 |
oster | - don't use rf_PrintUserStats() for recon statistics. rf_PrintUserStats() was meant for the simulator, and doesn't provide any real info in kernel-space, especially for reconstructs. Reconstructing actually renders the stats even more useless, since it resets them all to zero before the reconstruct starts!
- since rf_PrintUserStats() is no longer used, nuke it along with the routines that feed it. Nothing was using this code, and if we ever need it again, we know where to find it.
|
1.74 |
| 07-Mar-2004 |
oster |
- Introduce rf_pools which contains all of the various global pools used by RAIDframe. Convert all other RAIDframe global pools to use pools defined within this new structure.
- Introduce rf_pool_init(), used for initializing a single pool in RAIDframe. Teach each of the configuration routines to use rf_pool_init().
- Cleanup a few pool-related comments.
- Cleanup revent initialization and #defines.
- Add a missing pool_destroy() for the reconbuffer pool.
(Saves another 1K off of an i386 GENERIC kernel, and makes stuff a lot more readable)
|
1.73 |
| 07-Mar-2004 |
oster |
- fix up initialization of rf_recond_pool
- introduce rf_reconbuffer_pool and teach rf_MakeReconBuffer() to use it
|
1.72 |
| 05-Mar-2004 |
oster | Use RF_INCLUDE_PARITY_DECLUSTERING_DS to #if-out more unneeded bits. (We can't do RF_DISTRIBUTE_SPARE bits without the parity declustering stuff.)
|
1.71 |
| 03-Mar-2004 |
oster | Nuke some unnecessary casts. No functional changes.
|
1.70 |
| 03-Mar-2004 |
oster | Introduce RF_REVENT_READ_FAILED, RF_REVENT_WRITE_FAILED and RF_REVENT_FORCEREAD_FAILED. This removes 3 more RF_PANIC()'s (but we'll currently still panic if any of these cases occur). fix up a few printf's. XXX: still needs more cleanup and testing (and be taught to not panic).
|
1.69 |
| 03-Mar-2004 |
oster | Cleanup function prototypes.
|
1.68 |
| 03-Mar-2004 |
oster |
- cleanup memory allocation in rf_AllocPSStatus()
- change function signature of rf_LookupRUStatus(). The last argument is now a pointer to a new PSS, in case one is needed. Rather than having rf_LookupRUStatus() allocate a new PSS, we pre-allocate one beforehand, where necessary, just in case.
- change callers of rf_LookupRUStatus() to deal with the new way of calling rf_LookupRUStatus().
[no improvement or worsening of parity rebuild/initialization performance.]
|
1.67 |
| 01-Mar-2004 |
oster | Use RF_ACC_TRACE to #if out more chunks of code related only to access tracing. (not turned on yet)
|
1.66 |
| 29-Feb-2004 |
oster | Adjust _rf_ShutdownCreate() so that it is willing to wait for more memory. Since we only now ever "return(0)", just return (void) instead.
Cleanup all uses of rf_ShutdownCreate() to not worry about it ever failing. Shaves another 600 bytes off of an i386 GENERIC kernel.
|
1.65 |
| 04-Jan-2004 |
oster | raidPtr->reconControl->percentCompleted only gets used in one debugging printf, and in rf_netbsdkintf.c. We can do the calculations inside of RF_DEBUG_RECON for the one debugging printf, and only perform the percentCompleted calculation "on demand" in the rf_netbsdkintf.c case. Shaves a few more bytes off an i386 GENERIC kernel, and ever-so-slightly decreases the amount of work performed during a reconstruct.
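Computing the percentage "on demand" can be sketched as deriving it from two counters only when a caller asks for it, instead of storing it on every update. The function and parameter names are hypothetical, and returning 0 for an empty total is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive a completion percentage from the number of reconstruction units
 * finished so far and the total number of units.
 */
int demo_percent_complete(uint64_t units_done, uint64_t units_total)
{
    if (units_total == 0)           /* assumption: nothing to do reads as 0% */
        return 0;
    assert(units_done <= units_total);
    /* Integer division truncates, so 1 of 3 units reports 33. */
    return (int)(units_done * 100 / units_total);
}
```

The value reported is what has been completed, not what is left to do; the remaining share would be 100 minus this result.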
|
1.64 |
| 31-Dec-2003 |
oster | Add in a bunch of RF_SIGNAL_COND()'s that were missing. Tidy up a few lines.
|
1.63 |
| 31-Dec-2003 |
oster | Left-shift another else{} chunk. No functional changes.
|
1.62 |
| 31-Dec-2003 |
oster | left-shift the "else" part of the if(!lp_SubmitReconBuffer) condition. Cleanup. No real functional changes, just more readable.
|
1.61 |
| 31-Dec-2003 |
oster | Negate a condition, and flip if/else parts. Preparation for left-shifting the (now) else part. No real functional change.
|
1.60 |
| 30-Dec-2003 |
oster | Some days you wonder if some of the function declaration consistency was just an accident in the first place. Cleanup function decls and a few comments. [ok.. so I wasn't going to fix this many.. but once you're on a roll....]
|
1.59 |
| 29-Dec-2003 |
oster | Let's see... raidPtr->recon_done_procs is never set to anything (other than NULL when raidPtr is initialized). That means SignalReconDone() never does anything useful. Bye-bye!
Say good-bye to recon_done_procs and recon_done_procs_mutex (and its initializer) as well.
|
1.58 |
| 29-Dec-2003 |
oster | - first kick at a major reworking of RAIDframe's memory allocation code:
  - all freelists converted to pools
  - initialization of structure members in certain cases where code was relying on specific allocation and usage properties to keep structures in a "known state" (that doesn't work with pools!).
  - make most pool_get() be "PR_WAITOK" until they can be analyzed further, and/or have proper error handling added.
  - all RF_Mallocs zero the space returned, so there is no difference between RF_Calloc and RF_Malloc. In fact, all the RF_Calloc()'s tend to do is get things horribly confused. Make RF_Malloc() the "general memory allocator", with RF_MallocAndAdd() the "general memory allocator with allocation list".
  - some of these RF_Malloc's et al. are destined to disappear.
  - remove rf_rdp_freelist entirely (it's not used anywhere!)
  - remove: #include "rf_freelist.h"
  - to the files that were relying on the above, add: #include "rf_general.h"
  - add: #include "rf_debugMem.h" to rf_shutdown.h to make it happy about the loss of: #include "rf_freelist.h".
This shrinks an i386 GENERIC kernel by approx 5K. RAIDframe now weighs in at about 162K on i386.
|
1.57 |
| 29-Dec-2003 |
oster | [Having received a definite lack of strenuous objection, a small amount of strenuous agreement, and some general agreement, this commit is going ahead because it's now starting to block some other changes I wish to make.]
Remove most of the support for the concept of "rows" from RAIDframe. While the "row" interface has been exported to the world, RAIDframe internals have really only supported a single row, even though they have feigned support of multiple rows.
Nothing changes in configuration land -- config files still need to specify a single row, etc. All auto-config structures remain fully forward/backwards compatible.
The only visible difference to the average user should be a reduction in the size of a GENERIC kernel (i386) by 4.5K. For those of us trolling through RAIDframe kernel code, a lot of the driver configuration code has become a LOT easier to read.
|
1.56 |
| 29-Jun-2003 |
fvdl | branches: 1.56.2; Back out the lwp/ktrace changes. They contained a lot of collateral damage, and need to be examined and discussed more.
|
1.55 |
| 28-Jun-2003 |
darrenr | Pass lwp pointers throughout the kernel, as required, so that the lwpid can be inserted into ktrace records. The general change has been to replace "struct proc *" with "struct lwp *" in various function prototypes, pass the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
|
1.54 |
| 10-Apr-2003 |
simonb | Remove an assigned-to but unused variable.
|
1.53 |
| 21-Mar-2003 |
dsl | Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
|
1.52 |
| 09-Feb-2003 |
jdolecek | constify some
|
1.51 |
| 19-Nov-2002 |
oster | For reconstructs, move checks for failed components to before the kernel threads are created.
|
1.50 |
| 16-Nov-2002 |
oster | Cleanup more printfs.
|
1.49 |
| 15-Nov-2002 |
oster | After a rebuild-in-place, a reconstruct, or a copyback, we should really be updating the component labels.
|
1.48 |
| 18-Oct-2002 |
oster | Improve and/or re-arrange a number of locks. While much of the locking is still a mess, and there are a number of unresolved issues here, this gets us closer to being happier in LOCKDEBUG land.
|
1.47 |
| 06-Oct-2002 |
oster | Add a missing RF_LOCK_MUTEX().
|
1.46 |
| 06-Oct-2002 |
oster | Introduce a temp variable, and allocate the ReconCtrl structure before we protect raidPtr. One less thing for LOCKDEBUG to complain about.
|
1.45 |
| 23-Sep-2002 |
oster | Nuke "baddisk". Thanks to Simon B.
|
1.44 |
| 21-Sep-2002 |
oster | rf_RegisterReconDoneProc() isn't needed.
This is the last of the 'easy' ones that Krister made me aware of. Total savings on i386 GENERIC kernel: 13151 bytes RAIDframe in GENERIC is now at: 179033 Thanks again Krister!
|
1.43 |
| 19-Sep-2002 |
oster | Introduce and use RF_DEBUG_PSS, and save a few more bytes.
|
1.42 |
| 19-Sep-2002 |
oster | One signal will do, thanks.
|
1.41 |
| 17-Sep-2002 |
oster | Cast the RF_DEBUG_RECON net a little wider.
|
1.40 |
| 17-Sep-2002 |
oster | Rename RF_DEBUG_RECONBUFFER to RF_DEBUG_RECON in order to facilitate disabling other stuff without having to introduce another #define.
|
1.39 |
| 16-Sep-2002 |
oster | Cleanup some comments.
|
1.38 |
| 16-Sep-2002 |
oster | rf_CheckFloatingRbufCount() is only really useful when debugging the reconstruct buffer stuff. #if it out in the general case.
|
1.37 |
| 16-Sep-2002 |
oster | Cleanup some printf's, and disable some (debugging) output.
|
1.36 |
| 14-Sep-2002 |
oster | Everyone and their dog was using RF_ERRORMSG3 to print out the same sort of error message, over and over again, in different files. Rather than having the same text repeated in multiple .o files, create a couple of little functions to do the printing, and save a bundle of space. Also improves readability of code.
|
1.35 |
| 09-Sep-2002 |
oster | Disallow 'reconstruct-in-place' on a component that has failed and has already been reconstructed to a hot spare.
|
1.34 |
| 13-Jul-2002 |
oster | Nuke a redundant wakeup().
|
1.33 |
| 09-Jan-2002 |
oster | branches: 1.33.8; Move a bunch of debugging stuff to be only used if DEBUG is turned on.
|
1.32 |
| 15-Nov-2001 |
lukem | don't need <sys/types.h> when including <sys/param.h>
|
1.31 |
| 13-Nov-2001 |
lukem | add RCSIDs
|
1.30 |
| 04-Oct-2001 |
oster | Step 2 of the disentanglement. We now look to <dev/raidframe/*> for the stuff that used to live in rf_types.h, rf_raidframe.h, rf_layout.h, rf_netbsd.h, rf_raid.h, rf_decluster.h, and a few other places. Believe it or not, when this is all done, things will be cleaner.
No functional changes to RAIDframe.
|
1.29 |
| 18-Jul-2001 |
thorpej | branches: 1.29.2; bzero -> memset
|
1.28 |
| 14-Jun-2001 |
oster | branches: 1.28.2; It's silly to need a parity rebuild after a reconstruction has completed. If we've just reconstructed a disk, then the parity is known to be correct. (XXX doesn't hold for RAID 6!)
|
1.27 |
| 26-Jan-2001 |
oster | branches: 1.27.2; Ensure we update the 'partitionSize' field of the component labels when doing a reconstruct or a copyback. If we don't, junk might be there, and that could cause the component to be not correctly autoconfigured on reboot. Thanks to Simon Burge for helping track this down.
|
1.26 |
| 04-Jun-2000 |
oster | branches: 1.26.2; Merge rf_update_component_labels() and rf_final_update_component_labels().
|
1.25 |
| 31-May-2000 |
oster | Oops.. reconstruction percentages were being reported incorrectly. Thanks to Manuel Bouyer for noting this.
|
1.24 |
| 28-May-2000 |
oster | Umm.. Complete is not equal to 'left to do'. Fix the math.
|
1.23 |
| 28-May-2000 |
oster | - Add a mechanism for obtaining finer-grained 'progress' information regarding reconstructs, copybacks, etc.
- RAID 0 doesn't do copybacks, but don't make raidctl sweat about it.
|
1.22 |
| 13-Mar-2000 |
soren | branches: 1.22.2; Fix doubled 'the's in comments.
|
1.21 |
| 07-Mar-2000 |
oster | Create a new rf_close_component() to handle vnode operations for closing components. Teach rf_UnconfigureVnodes() how to use it, and tell the copyback and reconstruction code about it too.
|
1.20 |
| 25-Feb-2000 |
oster | When we close autoconfigured components, we need to note that they are no longer in 'autoconfigured' status.
|
1.19 |
| 25-Feb-2000 |
oster | Fix a (slightly) bogus status message.
|
1.18 |
| 24-Feb-2000 |
oster | Make sure we close auto-configured components appropriately when attempting a rebuild-in-place.
|
1.17 |
| 23-Feb-2000 |
oster | Be more aggressive about updating component labels in the event of a real component failure (or a simulated failure):
- add 'numNewFailures' to keep track of the number of disk failures since mod_counter was last updated for each component label.
- make sure we call rf_update_component_labels() upon any component failure, real or simulated.
|
1.16 |
| 23-Feb-2000 |
oster | Do a better job of (re)initializing the component labels after a reconstruct or a copyback.
|
1.15 |
| 13-Feb-2000 |
oster | Get recent changes into the tree:
- make component_label variables more consistent (==> clabel)
- re-work incorrect component configuration code
- re-work disk configuration code
- cleanup initial configuration of raidPtr info
- add auto-detection of components and RAID sets (Disabled, for now)
- allow / on RAID sets (Disabled, for now)
- rename "config_disk_queue" to "rf_ConfigureDiskQueue" and properly prototype in rf_diskqueue.h
- protect some headers with #if _KERNEL (XXX this needs to be fixed properly) and cleanup header formatting.
- expand the component labels (yes, they should be backward/forward compatible)
- other bits and pieces (some function names are still bogus, and will get changed soon)
|
1.14 |
| 09-Jan-2000 |
oster | Nuke dependencies on rf_cpuutils.h.
|
1.13 |
| 09-Jan-2000 |
oster | Nuke unused debugging stuff. Clean up a whole bunch of comments.
|
1.12 |
| 09-Jan-2000 |
oster |
- move a bunch of function prototypes to rf_kintf.h
- general cleanup of a number of prototypes that were scattered around.
|
1.11 |
| 09-Jan-2000 |
oster | Nuke #if 0'ed code.
|
1.10 |
| 08-Jan-2000 |
oster |
- nuke calls to rf_get_threadid() and associated #include
- change a bunch of debugging printfs from "[%d] ...", tid (where tid is the "thread id") to "raid%d: ...", raidPtr->raidid
- other minor rototillage
|
1.9 |
| 05-Jan-2000 |
oster | - update RF_CREATE_THREAD to handle a 'process name' argument.
- fire up a new thread for parity re-writes, copybacks, and reconstructs. The ioctl's which trigger these actions now return immediately.
- add progress accounting for the above actions.
- minor rototillage of rf_netbsdkintf.c to deal with all of the above.
|
1.8 |
| 14-Aug-1999 |
oster | branches: 1.8.2; Remove a 'struct proc *'-passing abomination that's been bugging me for quite some time.
|
1.7 |
| 13-Aug-1999 |
oster | rf_sys.h does not need to be #included in any of these files, and, actually, is no longer needed at all.
|
1.6 |
| 13-Aug-1999 |
oster | Clean up reconstruction accounting a bit. While it worked before, it was slightly broken in the case where the RAID set did not support reconstruction.
|
1.5 |
| 02-Mar-1999 |
oster | branches: 1.5.2; Update for recent changes including component label support, clean bits, rebuilding components in-place, adding hot spares, shutdownhooks, etc.
|
1.4 |
| 05-Feb-1999 |
oster | Phase 2 of the RAIDframe cleanup. The source is now closer to KNF and is much easier to read. No functionality changes.
|
1.3 |
| 26-Jan-1999 |
oster | Nuke more bits of RAIDframe "demo" code. We're not "demoing" here, we're doing the Real Thing!
|
1.2 |
| 26-Jan-1999 |
oster | RAIDframe cleanup, phase 1. Nuke simulator support, user-land driver, out-dated comments, and other unneeded stuff. This helps prepare for cleaning up the rest of the code, and adding new functionality.
No functional changes to the kernel code in this commit.
|
1.1 |
| 13-Nov-1998 |
oster | RAIDframe, version 1.1, from the Parallel Data Laboratory at Carnegie Mellon University. Full RAID implementation, including levels 0, 1, 4, 5, 6, parity logging, and a few other goodies. Ported to NetBSD by Greg Oster.
|
1.5.2.1 |
| 28-Sep-1999 |
cgd | pull up rev 1.6 from trunk (requested by oster): Clean up reconstruction accounting a bit. While it worked before, it was slightly broken in the case where the RAID set did not support reconstruction.
|
1.8.2.2 |
| 11-Feb-2001 |
bouyer | Sync with HEAD.
|
1.8.2.1 |
| 20-Nov-2000 |
bouyer | Update thorpej_scsipi to -current as of a month ago. An i386 GENERIC kernel compiles without the siop, ahc and bha drivers (will be updated later). i386 IDE/ATAPI and ncr work, as well as sparc/esp_sbus. alpha should work as well (untested yet). siop, ahc and bha will be updated once I've updated the branch to current -current, as well as machine-dependent code.
|
1.22.2.1 |
| 22-Jun-2000 |
minoura | Sync w/ netbsd-1-5-base.
|
1.26.2.1 |
| 03-Feb-2001 |
he | Pull up revision 1.27 (requested by oster): Make sure we update the ``partitionSize'' field of the component labels when doing a reconstruct or copyback, instead of leaving behind possibly uninitialized junk, which could cause autoconfig failure on reboot.
|
1.27.2.12 |
| 11-Dec-2002 |
thorpej | Sync with HEAD.
|
1.27.2.11 |
| 11-Nov-2002 |
nathanw | Catch up to -current
|
1.27.2.10 |
| 18-Oct-2002 |
nathanw | Catch up to -current.
|
1.27.2.9 |
| 17-Sep-2002 |
nathanw | Catch up to -current.
|
1.27.2.8 |
| 01-Aug-2002 |
nathanw | Catch up to -current.
|
1.27.2.7 |
| 28-Feb-2002 |
nathanw | Catch up to -current.
|
1.27.2.6 |
| 11-Jan-2002 |
nathanw | More catchup.
|
1.27.2.5 |
| 08-Jan-2002 |
nathanw | Catch up to -current.
|
1.27.2.4 |
| 14-Nov-2001 |
nathanw | Catch up to -current.
|
1.27.2.3 |
| 22-Oct-2001 |
nathanw | Catch up to -current.
|
1.27.2.2 |
| 24-Aug-2001 |
nathanw | Catch up with -current.
|
1.27.2.1 |
| 21-Jun-2001 |
nathanw | Catch up to -current.
|
1.28.2.4 |
| 10-Oct-2002 |
jdolecek | sync kqueue with -current; this includes merge of gehenna-devsw branch, merge of i386 MP branch, and part of autoconf rototil work
|
1.28.2.3 |
| 06-Sep-2002 |
jdolecek | sync kqueue branch with HEAD
|
1.28.2.2 |
| 10-Jan-2002 |
thorpej | Sync kqueue branch with -current.
|
1.28.2.1 |
| 03-Aug-2001 |
lukem | update to -current
|
1.29.2.2 |
| 11-Oct-2001 |
fvdl | Catch up with -current. Fix some bogons in the sparc64 kbd/ms attach code. cd18xx conversion provided by mrg.
|
1.29.2.1 |
| 07-Sep-2001 |
thorpej | Commit my "devvp" changes to the thorpej-devvp branch. This replaces the use of dev_t in most places with a struct vnode *.
This will form the basic infrastructure for real cloning device support (besides being architecturally cleaner -- it'll be good to get away from using numbers to represent objects).
|
1.33.8.1 |
| 15-Jul-2002 |
gehenna | catch up with -current.
|
1.56.2.11 |
| 10-Nov-2005 |
skrll | Sync with HEAD. Here we go again...
|
1.56.2.10 |
| 04-Mar-2005 |
skrll | Sync with HEAD.
Hi Perry!
|
1.56.2.9 |
| 15-Feb-2005 |
skrll | Sync with HEAD.
|
1.56.2.8 |
| 06-Feb-2005 |
skrll | Sync with HEAD.
|
1.56.2.7 |
| 24-Jan-2005 |
skrll | Sync with HEAD.
|
1.56.2.6 |
| 18-Dec-2004 |
skrll | Sync with HEAD.
|
1.56.2.5 |
| 29-Nov-2004 |
skrll | Sync with HEAD.
|
1.56.2.4 |
| 21-Sep-2004 |
skrll | Fix the sync with head I botched.
|
1.56.2.3 |
| 18-Sep-2004 |
skrll | Sync with HEAD.
|
1.56.2.2 |
| 03-Aug-2004 |
skrll | Sync with HEAD
|
1.56.2.1 |
| 02-Jul-2003 |
darrenr | Apply the aborted ktrace-lwp changes to a specific branch. This is just for others to review. I'm concerned that patch fuzziness may have resulted in some errant code being generated, but I'll look at that later by comparing the diff from the base to the branch with the file I attempt to apply to it. This will, at the very least, put the changes in a better context for others to review them and attempt to tinker with removing passing of 'struct lwp' through the kernel.
|
1.76.4.1 |
| 16-Apr-2005 |
tron | Pull up revision 1.79 (requested by oster in ticket #1104): ForceReconReadDoneProc() needs a return after doing the first rf_CauseReconEvent().
|
1.78.2.1 |
| 29-Apr-2005 |
kent | sync with -current
|
1.81.2.2 |
| 19-Mar-2005 |
yamt | sync with head. xen and whitespace. xen part is not finished.
|
1.81.2.1 |
| 12-Feb-2005 |
yamt | sync with head.
|
1.87.2.4 |
| 25-May-2008 |
bouyer | Pull up following revision(s) (requested by oster in ticket #1933):
sys/dev/raidframe/rf_reconmap.h: revision 1.11
sys/dev/raidframe/rf_reconmap.c: revision 1.31
sys/dev/raidframe/rf_reconstruct.h: revision 1.24
sys/dev/raidframe/rf_reconstruct.c: revision 1.104
sys/dev/raidframe/rf_revent.c: revision 1.25
Convert the reconstruction code to use a "sliding status window" which will scale nicely regardless of the number of stripes/reconstruction units in the RAID set. Convert the main reconstruction loop to rebuild the array in chunks rather than in one big lump. May fix bin/38471.
|
1.87.2.3 |
| 19-Apr-2008 |
bouyer | Pull up following revision(s) (requested by oster in ticket #1923): sys/dev/raidframe/rf_reconstruct.c: revision 1.103 A forced recon read should not default to indicating that the reads for that disk have stopped, since this will bump us out of the normal reconstruction loop prematurely. Fixes the (mostly cosmetic) bug where the reconstruction status values stop updating, and from raidctl it appears that reconstruction has totally stalled (which it actually hasn't -- the reconstruction does complete properly, but not in the normal way).
|
1.87.2.2 |
| 20-Jul-2005 |
tron | Pull up revision 1.89 (requested by oster in ticket #602): If rf_SubmitReconBuffer indicates the submission was blocked (for whatever reason), return 0 instead of the default RF_RECON_READ_STOPPED. Returning RF_RECON_READ_STOPPED would result in rf_ContinueReconstructFailedDisk() thinking that the given component was "done" and breaking out of the main reconstruction loop far too early. Reconstruction still worked correctly as long as there were no errors, but RAIDframe wouldn't be in a position to properly handle read/write errors during reconstruction. This fixes the "raidctl's progress bar spins at 0% until reconstruction finishes" problem.
|
1.87.2.1 |
| 09-Jun-2005 |
tron | Pull up revision 1.88 (requested by oster in ticket #435):
- initialize numRUsTotal before we indicate that we are doing a reconstruct.
- make numRUsComplete and numRUsTotal 64-bit quantities like everything else that records this information.
|
1.88.2.6 |
| 04-Feb-2008 |
yamt | sync with head.
|
1.88.2.5 |
| 07-Dec-2007 |
yamt | sync with head
|
1.88.2.4 |
| 27-Oct-2007 |
yamt | sync with head.
|
1.88.2.3 |
| 03-Sep-2007 |
yamt | sync with head.
|
1.88.2.2 |
| 30-Dec-2006 |
yamt | sync with head.
|
1.88.2.1 |
| 21-Jun-2006 |
yamt | sync with head.
|
1.90.12.1 |
| 24-May-2006 |
tron | Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
|
1.90.10.1 |
| 08-Mar-2006 |
elad | Adapt to kernel authorization KPI.
|
1.90.8.3 |
| 03-Sep-2006 |
yamt | sync with head.
|
1.90.8.2 |
| 11-Aug-2006 |
yamt | sync with head
|
1.90.8.1 |
| 24-May-2006 |
yamt | sync with head.
|
1.90.6.1 |
| 01-Jun-2006 |
kardel | Sync with head.
|
1.90.4.1 |
| 09-Sep-2006 |
rpaulo | sync with head
|
1.93.4.2 |
| 10-Dec-2006 |
yamt | sync with head.
|
1.93.4.1 |
| 22-Oct-2006 |
yamt | sync with head
|
1.93.2.1 |
| 18-Nov-2006 |
ad | Sync with head.
|
1.95.16.2 |
| 03-Jun-2008 |
skrll | Sync with netbsd-4.
|
1.95.16.1 |
| 03-Sep-2007 |
wrstuden | Sync w/ NetBSD-4-RC_1
|
1.95.10.1 |
| 11-Jul-2007 |
mjf | Sync with head.
|
1.95.8.3 |
| 09-Oct-2007 |
ad | Sync with head.
|
1.95.8.2 |
| 20-Aug-2007 |
ad | Sync with HEAD.
|
1.95.8.1 |
| 15-Jul-2007 |
ad | Sync with head.
|
1.95.2.5 |
| 26-Feb-2009 |
snj | Pull up following revision(s) (requested by oster in ticket #1276): sys/dev/raidframe/rf_reconstruct.c: revision 1.107 If we see a RF_RECON_WRITE_ERROR event we know a write has finished and we need to account for that. Failure to do so means we can end up waiting forever for writes we think are outstanding, but which have already completed. Addresses the RAIDframe part of PR#40569. Thanks to Matthias Scheler for reporting the issue and verifying the fix.
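The accounting rule in this fix (a write that fails has still finished) can be sketched as follows. This is only an illustration under stated assumptions: the `demo_` names, the event enum, and the struct are hypothetical and are not taken from rf_reconstruct.c; only the idea of a pending-write counter comes from the log.

```c
/* Hypothetical event kinds for illustration only. */
enum demo_event {
	DEMO_WRITE_DONE,	/* write completed successfully */
	DEMO_WRITE_ERROR	/* write completed with an error */
};

/* Hypothetical, cut-down reconstruction bookkeeping. */
struct demo_recon {
	int pending_writes;	/* writes issued but not yet finished */
	int write_errors;	/* writes that finished with an error */
};

/* Account for one completion event, successful or not. */
static void
demo_handle_event(struct demo_recon *r, enum demo_event ev)
{
	switch (ev) {
	case DEMO_WRITE_DONE:
		r->pending_writes--;
		break;
	case DEMO_WRITE_ERROR:
		/* A failed write is no longer outstanding either. */
		r->write_errors++;
		r->pending_writes--;
		break;
	}
}

/* Nonzero once no writes remain outstanding. */
static int
demo_writes_drained(const struct demo_recon *r)
{
	return r->pending_writes == 0;
}
```

If the error case skipped the decrement, `demo_writes_drained()` would never become true after a failed write, which is the "waiting forever" symptom the log describes.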
|
1.95.2.4 |
| 27-Dec-2008 |
bouyer | Pull up following revision(s) (requested by oster in ticket #1249): sys/dev/raidframe/rf_driver.c: revision 1.120 sys/dev/raidframe/rf_reconstruct.c: revision 1.106 When unconfiguring an array where a reconstruct is in progress, abort the reconstruct and wait for IOs to drain before pulling the plug. Should fix the panic reported by der Mouse on tech-kern.
|
1.95.2.3 |
| 25-May-2008 |
bouyer | Pull up following revision(s) (requested by oster in ticket #1153):
sys/dev/raidframe/rf_reconmap.h: revision 1.11
sys/dev/raidframe/rf_reconmap.c: revision 1.31
sys/dev/raidframe/rf_reconstruct.h: revision 1.24
sys/dev/raidframe/rf_reconstruct.c: revision 1.104
sys/dev/raidframe/rf_revent.c: revision 1.25
Convert the reconstruction code to use a "sliding status window" which will scale nicely regardless of the number of stripes/reconstruction units in the RAID set. Convert the main reconstruction loop to rebuild the array in chunks rather than in one big lump. May fix bin/38471.
|
1.95.2.2 |
| 19-Apr-2008 |
bouyer | Pull up following revision(s) (requested by oster in ticket #1127): sys/dev/raidframe/rf_reconstruct.c: revision 1.103 A forced recon read should not default to indicating that the reads for that disk have stopped, since this will bump us out of the normal reconstruction loop prematurely. Fixes the (mostly cosmetic) bug where the reconstruction status values stop updating, and from raidctl it appears that reconstruction has totally stalled (which it actually hasn't -- the reconstruction does complete properly, but not in the normal way).
|
1.95.2.1 |
| 01-Jul-2007 |
bouyer | Pull up following revision(s) (requested by cube in ticket #748): sys/dev/dksubr.c: revision 1.29 sys/dev/ccd.c: revision 1.120 sys/dev/raidframe/rf_disks.c: revision 1.66 sys/dev/raidframe/rf_reconstruct.c: revision 1.96 sys/dev/cgd.c: revision 1.45 sys/dev/dkvar.h: revision 1.11 sys/dev/raidframe/rf_copyback.c: revision 1.38 Change dk_lookup() to accept an additional argument of the type enum uio_seg that tells whether the given path is in user space or kernel space, so it can tell NDINIT(). While the raidframe calls were ok, both ccd(4) and cgd(4) were passing pointers to user space data, which leads to strange error on i386, as reported by Jukka Salmi on current-users.
|
1.97.2.1 |
| 15-Aug-2007 |
skrll | Sync with HEAD.
|
1.98.8.2 |
| 18-Jul-2007 |
ad | Fix fallout from recent kthread changes.
|
1.98.8.1 |
| 18-Jul-2007 |
ad | file rf_reconstruct.c was added on branch matt-mips64 on 2007-07-18 19:04:59 +0000
|
1.98.6.3 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.98.6.2 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.98.6.1 |
| 06-Nov-2007 |
matt | sync with HEAD
|
1.98.4.2 |
| 27-Nov-2007 |
joerg | Sync with HEAD. amd64 Xen support needs testing.
|
1.98.4.1 |
| 02-Oct-2007 |
joerg | Sync with HEAD.
|
1.99.6.2 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.99.6.1 |
| 08-Dec-2007 |
mjf | Sync with HEAD.
|
1.101.6.3 |
| 17-Jan-2009 |
mjf | Sync with HEAD.
|
1.101.6.2 |
| 28-Sep-2008 |
mjf | Sync with HEAD.
|
1.101.6.1 |
| 02-Jun-2008 |
mjf | Sync with HEAD.
|
1.103.6.2 |
| 10-Oct-2008 |
skrll | Sync with HEAD.
|
1.103.6.1 |
| 23-Jun-2008 |
wrstuden | Sync w/ -current. 34 merge conflicts to follow.
|
1.103.4.2 |
| 11-Mar-2010 |
yamt | sync with head
|
1.103.4.1 |
| 04-May-2009 |
yamt | sync with head.
|
1.103.2.1 |
| 04-Jun-2008 |
yamt | sync with head
|
1.104.4.1 |
| 19-Oct-2008 |
haad | Sync with HEAD.
|
1.105.4.7 |
| 20-Nov-2014 |
sborrill | Pull up the following revision(s) (requested by oster in ticket #1933):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121 via patch
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete. Fixes PR kern/49244.
|
1.105.4.6 |
| 13-Jun-2012 |
sborrill | branches: 1.105.4.6.2; Pull up the following revision(s) (requested by mrg in ticket #1774):
sbin/raidctl/raidctl.c: revision 1.52
sys/dev/raidframe/raidframevar.h: revision 1.15
sys/dev/raidframe/rf_copyback.c: revision 1.45
sys/dev/raidframe/rf_disks.c: revision 1.78
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.282,1.284
sys/dev/raidframe/rf_reconstruct.c: revision 1.111
Fix garbage values in partitionSizeHi with RAID array > 2TB. Stops the check against rf_component_label_partitionsize() failing and stopping auto-configure.
|
1.105.4.5 |
| 24-Feb-2012 |
sborrill | Pull up the following revision(s) (requested by oster in ticket #1728):
sys/dev/raidframe/rf_reconmap.c: revision 1.34
sys/dev/raidframe/rf_reconstruct.c: revision 1.118
Remove a DIAGNOSTIC check that is invalid for RAID5_RS. Add logic to the main reconstruction loop to handle RAID5 with rotated spares. Correct issue where we were doing one more stripe than necessary.
|
1.105.4.4 |
| 21-Nov-2010 |
riz | Pull up following revision(s) (requested by mrg in ticket #1468):
sys/dev/raidframe/rf_disks.c: revision 1.74
sys/dev/raidframe/raidframevar.h: revision 1.14
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.275
sys/dev/raidframe/rf_copyback.c: revision 1.43
sys/dev/raidframe/rf_reconstruct.c: revision 1.109
add support for >2TB raid devices.
- add two new members to the component label:
    u_int numBlocksHi
    u_int partitionSizeHi
  and store the top 32 bits of the real number of blocks and partition size. modify rf_print_component_label(), rf_does_it_fit(), rf_AutoConfigureDisks() and rf_ReconstructFailedDiskBasic().
- call disk_blocksize() after disk_attach() [ from mlelstv ]
- shift the block number relative to DEV_BSHIFT in raidstart() and InitBP() so that accesses work for non 512-byte devices. [ from mlelstv ]
- update rf_getdisksize() to use the new getdisksize() [ from mlelstv. this part needs a separate change for netbsd-5. ]
reviewed by: oster, christos and darrenr
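The >2TB change above keeps the existing 32-bit label field and adds a second field for the top 32 bits. A minimal sketch of that split and recombination, assuming a hypothetical cut-down `struct demo_label` and hypothetical helper names; only the field names `numBlocks` and `numBlocksHi` echo the real component label, and `uint32_t` stands in for the `u_int` named in the log:

```c
#include <stdint.h>

/* Hypothetical, cut-down label: only the two fields relevant here. */
struct demo_label {
	uint32_t numBlocks;	/* low 32 bits of the block count */
	uint32_t numBlocksHi;	/* top 32 bits of the block count */
};

/* Store a 64-bit block count into the two 32-bit label fields. */
static void
demo_label_set_blocks(struct demo_label *l, uint64_t blocks)
{
	l->numBlocks = (uint32_t)(blocks & 0xffffffffULL);
	l->numBlocksHi = (uint32_t)(blocks >> 32);
}

/* Recombine the two halves into the full 64-bit block count. */
static uint64_t
demo_label_get_blocks(const struct demo_label *l)
{
	return ((uint64_t)l->numBlocksHi << 32) | l->numBlocks;
}
```

Because the high half is a separate field, it has to be written explicitly every time a label is produced; a later entry in this log (the partitionSizeHi garbage fix) shows what happens when it is left uninitialized.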
|
1.105.4.3 |
| 10-Dec-2009 |
snj | branches: 1.105.4.3.2; Pull up following revision(s) (requested by tron in ticket #1187):
sbin/raidctl/raidctl.8: revisions 1.57-1.59 via patch
sbin/raidctl/raidctl.c: revision 1.42 via patch
sys/dev/raidframe/files.raidframe: revision 1.8 via patch
sys/dev/raidframe/rf_copyback.c: revision 1.42 via patch
sys/dev/raidframe/rf_disks.c: revision 1.72 via patch
sys/dev/raidframe/rf_driver.c: revision 1.122 via patch
sys/dev/raidframe/rf_engine.c: revision 1.40 via patch
sys/dev/raidframe/rf_kintf.h: revision 1.21 via patch
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.269 via patch
sys/dev/raidframe/rf_paritymap.c: revisions 1.1-1.3 via patch
sys/dev/raidframe/rf_paritymap.h: revision 1.1 via patch
sys/dev/raidframe/rf_parityscan.c: revision 1.33 via patch
sys/dev/raidframe/rf_parityscan.h: revision 1.8 via patch
sys/dev/raidframe/rf_raid.h: revision 1.38 via patch
sys/dev/raidframe/rf_reconstruct.c: revision 1.108 via patch
sys/dev/raidframe/rf_states.c: revision 1.44 via patch
sys/dev/raidframe/raidframeio.h: revision 1.6 via patch
sys/dev/raidframe/raidframevar.h: revision 1.13 via patch
Pull up the RAIDframe parity map Summer Of Code project. Drastically reduces the amount of time spent rewriting parity after an unclean shutdown by keeping better track of which regions might have had outstanding writes. Enabled by default; can be disabled on a per-set basis, or tuned, with the new raidctl(8) commands.
|
1.105.4.2 |
| 19-Feb-2009 |
snj | branches: 1.105.4.2.4; Pull up following revision(s) (requested by oster in ticket #454): sys/dev/raidframe/rf_reconstruct.c: revision 1.107 If we see a RF_RECON_WRITE_ERROR event we know a write has finished and we need to account for that. Failure to do so means we can end up waiting forever for writes we think are outstanding, but which have already completed. Addresses the RAIDframe part of PR#40569. Thanks to Matthias Scheler for reporting the issue and verifying the fix.
|
1.105.4.1 |
| 23-Dec-2008 |
snj | Pull up following revision(s) (requested by oster in ticket #203): sys/dev/raidframe/rf_driver.c: revision 1.120 sys/dev/raidframe/rf_reconstruct.c: revision 1.106 When unconfiguring an array where a reconstruct is in progress, abort the reconstruct and wait for IOs to drain before pulling the plug. Should fix the panic reported by der Mouse on tech-kern.
|
1.105.4.6.2.1 |
| 20-Nov-2014 |
sborrill | Pull up the following revision(s) (requested by oster in ticket #1933):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121 via patch
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete. Fixes PR kern/49244.
|
1.105.4.3.2.1 |
| 20-Nov-2014 |
sborrill | Pull up the following revision(s) (requested by oster in ticket #1933):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121 via patch
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete. Fixes PR kern/49244.
|
1.105.4.2.4.1 |
| 21-Apr-2010 |
matt | sync to netbsd-5
|
1.105.2.2 |
| 03-Mar-2009 |
skrll | Sync with HEAD.
|
1.105.2.1 |
| 19-Jan-2009 |
skrll | Sync with HEAD.
|
1.106.2.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.
|
1.108.4.2 |
| 31-May-2011 |
rmind | sync with head
|
1.108.4.1 |
| 05-Mar-2011 |
rmind | sync with head
|
1.108.2.1 |
| 06-Nov-2010 |
uebayasi | Sync with HEAD.
|
1.110.4.1 |
| 05-Mar-2011 |
bouyer | Sync with HEAD
|
1.110.2.1 |
| 06-Jun-2011 |
jruoho | Sync with HEAD.
|
1.117.8.2 |
| 02-Dec-2014 |
snj | Pull up following revision(s) (requested by oster in ticket #1194):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete.
The issue was that a component was being marked as a used_spare when the rebuild started, not when the rebuild was actually finished. Marking it as a used_spare meant that the component label on the spare was being updated such that after a reboot the component would be considered up-to-date, regardless of whether the rebuild actually completed!
This fix includes:
1) Add an additional state "rf_ds_rebuilding_spare" which is used to denote that a spare is currently being rebuilt from the live components.
2) Update the comments on the disk states, which were out-of-sync with reality.
3) When rebuilding to a spare component, that spare now enters the state rf_ds_rebuilding_spare instead of the state rf_ds_used_spare.
4) When the rebuild is actually complete then the spare component enters the rf_ds_used_spare state. rf_ds_used_spare is now used exclusively for the case where the rebuilding to the spare has completed successfully.
XXX: Someday we need to teach raidctl(8) about this new state, and take out the backwards compatibility code in rf_netbsdkintf.c (see RAIDFRAME_GET_INFO in raidioctl()). For today, this fix needs to be generic enough that it can get backported without major grief.
XXX: Needs pullup to netbsd-5*, netbsd-6*, and netbsd-7
Fixes PR#49244.
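The state change in this fix can be pictured as a small state machine in which the spare is only treated as up-to-date once the rebuild has finished. The sketch below is an assumption-laden illustration: the `demo_` enum and functions are hypothetical, and only the two state names they mirror (rebuilding_spare, used_spare) come from the log; how the real driver consults these states is not shown here.

```c
/* Hypothetical, reduced set of disk states; names loosely follow the log. */
enum demo_disk_state {
	demo_ds_spare,			/* idle spare, not in use */
	demo_ds_rebuilding_spare,	/* rebuild to this spare in progress */
	demo_ds_used_spare		/* rebuild to this spare completed */
};

/* Start of a rebuild: the spare is NOT yet marked as used. */
static enum demo_disk_state
demo_rebuild_start(enum demo_disk_state s)
{
	return (s == demo_ds_spare) ? demo_ds_rebuilding_spare : s;
}

/* Successful end of a rebuild: only now does it become a used spare. */
static enum demo_disk_state
demo_rebuild_finish(enum demo_disk_state s)
{
	return (s == demo_ds_rebuilding_spare) ? demo_ds_used_spare : s;
}

/* After a reboot, only a fully rebuilt spare may be treated as current. */
static int
demo_trust_after_reboot(enum demo_disk_state s)
{
	return s == demo_ds_used_spare;
}
```

Under the old behaviour described in the log, the start of the rebuild went straight to the used-spare state, so a reboot in the middle of a rebuild left a spare that looked complete.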
|
1.117.8.1 |
| 23-Feb-2012 |
riz | branches: 1.117.8.1.4; 1.117.8.1.6; Pull up following revision(s) (requested by oster in ticket #23):
sys/dev/raidframe/rf_reconstruct.c: revision 1.118
sys/dev/raidframe/rf_reconmap.c: revision 1.34
comment, and effectively remove, a DIAGNOSTIC check that is invalid for RAID5_RS.
Add logic to the main reconstruction loop to handle RAID5 with rotated spares. While here, observe that we were actually doing one more stripe than we thought we were, and correct that too (it didn't matter for non-RAID5_RS, but it definitely does for RAID5_RS).
Add some bounds-checking at the beginning to handle the case where the number of stripes in the set is smaller than the sliding reconstruction window.
XXX: this problem likely needs to be fixed for PARITY_DECLUSTERING too.
|
1.117.8.1.6.1 |
| 02-Dec-2014 |
snj | Pull up following revision(s) (requested by oster in ticket #1194):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete.
The issue was that a component was being marked as a used_spare when the rebuild started, not when the rebuild was actually finished. Marking it as a used_spare meant that the component label on the spare was being updated such that after a reboot the component would be considered up-to-date, regardless of whether the rebuild actually completed!
This fix includes:
1) Add an additional state "rf_ds_rebuilding_spare" which is used to denote that a spare is currently being rebuilt from the live components.
2) Update the comments on the disk states, which were out-of-sync with reality.
3) When rebuilding to a spare component, that spare now enters the state rf_ds_rebuilding_spare instead of the state rf_ds_used_spare.
4) When the rebuild is actually complete then the spare component enters the rf_ds_used_spare state. rf_ds_used_spare is now used exclusively for the case where the rebuilding to the spare has completed successfully.
XXX: Someday we need to teach raidctl(8) about this new state, and take out the backwards compatibility code in rf_netbsdkintf.c (see RAIDFRAME_GET_INFO in raidioctl()). For today, this fix needs to be generic enough that it can get backported without major grief.
XXX: Needs pullup to netbsd-5*, netbsd-6*, and netbsd-7
Fixes PR#49244.
|
1.117.8.1.4.1 |
| 02-Dec-2014 |
snj | Pull up following revision(s) (requested by oster in ticket #1194):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete.
The issue was that a component was being marked as a used_spare when the rebuild started, not when the rebuild was actually finished. Marking it as a used_spare meant that the component label on the spare was being updated such that after a reboot the component would be considered up-to-date, regardless of whether the rebuild actually completed!
This fix includes:
1) Add an additional state "rf_ds_rebuilding_spare" which is used to denote that a spare is currently being rebuilt from the live components.
2) Update the comments on the disk states, which were out-of-sync with reality.
3) When rebuilding to a spare component, that spare now enters the state rf_ds_rebuilding_spare instead of the state rf_ds_used_spare.
4) When the rebuild is actually complete then the spare component enters the rf_ds_used_spare state. rf_ds_used_spare is now used exclusively for the case where the rebuilding to the spare has completed successfully.
XXX: Someday we need to teach raidctl(8) about this new state, and take out the backwards compatibility code in rf_netbsdkintf.c (see RAIDFRAME_GET_INFO in raidioctl()). For today, this fix needs to be generic enough that it can get backported without major grief.
XXX: Needs pullup to netbsd-5*, netbsd-6*, and netbsd-7
Fixes PR#49244.
|
1.117.6.1 |
| 24-Feb-2012 |
mrg | sync to -current.
|
1.117.2.2 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.117.2.1 |
| 17-Apr-2012 |
yamt | sync with head
|
1.118.2.3 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.118.2.2 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.118.2.1 |
| 23-Jun-2013 |
tls | resync from head
|
1.119.10.1 |
| 10-Aug-2014 |
tls | Rebase.
|
1.120.2.1 |
| 18-Nov-2014 |
snj | Pull up following revision(s) (requested by oster in ticket #243):
sys/dev/raidframe/raidframevar.h: revision 1.17
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.316
sys/dev/raidframe/rf_reconstruct.c: revision 1.121
Fix a long-standing bug related to rebooting while a reconstruct-to-spare is underway but not yet complete.
The issue was that a component was being marked as a used_spare when the rebuild started, not when the rebuild was actually finished. Marking it as a used_spare meant that the component label on the spare was being updated such that after a reboot the component would be considered up-to-date, regardless of whether the rebuild actually completed!
This fix includes:
1) Add an additional state "rf_ds_rebuilding_spare" which is used to denote that a spare is currently being rebuilt from the live components.
2) Update the comments on the disk states, which were out-of-sync with reality.
3) When rebuilding to a spare component, that spare now enters the state rf_ds_rebuilding_spare instead of the state rf_ds_used_spare.
4) When the rebuild is actually complete then the spare component enters the rf_ds_used_spare state. rf_ds_used_spare is now used exclusively for the case where the rebuilding to the spare has completed successfully.
XXX: Someday we need to teach raidctl(8) about this new state, and take out the backwards compatibility code in rf_netbsdkintf.c (see RAIDFRAME_GET_INFO in raidioctl()). For today, this fix needs to be generic enough that it can get backported without major grief.
XXX: Needs pullup to netbsd-5*, netbsd-6*, and netbsd-7
Fixes PR#49244.
|
1.121.20.3 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.121.20.2 |
| 08-Apr-2020 |
martin | Merge changes from current as of 20200406
|
1.121.20.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.121.12.1 |
| 17-Feb-2021 |
martin | Pull up following revision(s) (requested by oster in ticket #1655):
sys/dev/raidframe/rf_reconstruct.c: revision 1.125
Fix a long long-standing off-by-one error in computing lastPSID.
SUsPerPU is only really supported for a value of 1, and since the first PSID is 0, the last will be numStripe-1. Also update the setting of pending_writes to reflect the change to lastPSID.
Needs pullups to -8 and -9.
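The off-by-one described above comes down to zero-based numbering: with SUsPerPU equal to 1 and parity stripe IDs starting at 0, the last valid ID is numStripe - 1. A minimal sketch of that arithmetic, assuming hypothetical helper names (`demo_last_psid`, `demo_count_psids`) that do not appear in rf_reconstruct.c:

```c
#include <stdint.h>

/*
 * Hypothetical helper: with parity stripe IDs numbered from 0 and one
 * parity stripe per stripe, the last valid ID is numStripe - 1.
 * The caller must pass numStripe > 0.
 */
static uint64_t
demo_last_psid(uint64_t numStripe)
{
	return numStripe - 1;
}

/* Count the IDs visited by an inclusive loop over [0, lastPSID]. */
static uint64_t
demo_count_psids(uint64_t numStripe)
{
	uint64_t psid, visited = 0;

	for (psid = 0; psid <= demo_last_psid(numStripe); psid++)
		visited++;
	return visited;
}
```

An inclusive loop bounded by numStripe rather than numStripe - 1 would visit one ID too many, and any count of outstanding work derived from that bound would be off by the same amount.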
|
1.122.4.2 |
| 09-Sep-2023 |
martin | Pull up following revision(s) (requested by oster in ticket #1729):
sys/dev/raidframe/rf_reconstruct.c: revision 1.128
Revision 1.104 actually fixed the issues that were preventing us from freeing the ReconControl structures. So free them and thus also prevent a panic on shutdown due to items not being correctly returned to the pool.
Thanks to manu@ for report of the panic, and for initial testing of the changes.
|
1.122.4.1 |
| 17-Feb-2021 |
martin | Pull up following revision(s) (requested by oster in ticket #1206):
sys/dev/raidframe/rf_reconstruct.c: revision 1.125
Fix a long long-standing off-by-one error in computing lastPSID.
SUsPerPU is only really supported for a value of 1, and since the first PSID is 0, the last will be numStripe-1. Also update the setting of pending_writes to reflect the change to lastPSID.
Needs pullups to -8 and -9.
|
1.124.8.1 |
| 03-Apr-2021 |
thorpej | Sync with HEAD.
|
1.125.4.1 |
| 01-Aug-2021 |
thorpej | Sync with HEAD.
|
1.127.10.2 |
| 28-Apr-2024 |
martin | Pull up following revision(s) (requested by oster in ticket #674):
sys/dev/raidframe/rf_raid.h: revision 1.52 sbin/raidctl/raidctl.8: revision 1.80 sys/dev/raidframe/rf_driver.c: revision 1.141 sys/dev/raidframe/rf_disks.c: revision 1.94 sys/dev/raidframe/rf_diskqueue.c: revision 1.64 sys/dev/raidframe/rf_diskqueue.h: revision 1.30 sys/dev/raidframe/rf_disks.h: revision 1.15 sys/dev/raidframe/rf_netbsdkintf.c: revision 1.414 sys/dev/raidframe/rf_reconstruct.c: revision 1.129 sys/dev/raidframe/raidframeio.h: revision 1.12 sbin/raidctl/raidctl.c: revision 1.79
Implement hot removal of spares and components. From manu@.
Implement a long desired feature of automatically incorporating a used spare into the array after a reconstruct.
Given the configuration:
Components: /dev/wd0e: failed /dev/wd1e: optimal /dev/wd2e: optimal Spares: /dev/wd3e: spare
Running 'raidctl -F /dev/wd0e raid0' will now result in the following configuration after a successful rebuild:
Components: /dev/wd3e: optimal /dev/wd1e: optimal /dev/wd2e: optimal No spares.
Thanks to manu@ for the development of the initial set of changes which allowed the changes to automatically incorporate a used spare to come to fruition. Thanks also to manu@ for useful discussions about and additional testing of these changes.
|
1.127.10.1 |
| 09-Sep-2023 |
martin | Pull up following revision(s) (requested by oster in ticket #359):
sys/dev/raidframe/rf_reconstruct.c: revision 1.128
Revision 1.104 actually fixed the issues that were preventing us from freeing the ReconControl structures. So free them and thus also prevent a panic on shutdown due to items not being correctly returned to the pool.
Thanks to manu@ for report of the panic, and for initial testing of the changes.
|