| History log of /src/usr.bin/sort/msort.c |
| Revision | | Date | Author | Comments |
| 1.31 |
| 01-Jun-2016 |
kre | Add the posix -C option (-c but quieter). Fix -R to work properly when setting \n as the record delimited using a numeric value rather than literal \n - and to not incorrectly turn \n into a field separator if -R is used to make some other char the record separator (\n becomes a field separator in that case as long as the field separator remains "white space" but should not be in any other case - unless set explicitly of course.)
Plus more cosmetic changes - the man page and usage are updated to make it more clear that the 2 (or 1) params to -k are not fields (field1 and field2) but specifiers of the beginning and end of one key field. There was an unused 'x' option in the GETOPTS string. The usage message is reformatted to display properly on both 80 col and > 80 col displays (on < 80 it will still probably look pretty ugly ... perhaps not quite so bad though), and is also updated to show the different usage for the -c case (and -C) from the others (only 1 file permitted) - the man page synopsis has a similar update.
Using more than one of -c -C or -m generates a usage message rather than just ignoring the -m as it did before (there was no -C before of course).
Aside from the bug fix to the interaction between -R and -t, there are no changes that affect the way anything is sorted (or read, or written).
Discussed on tech-userlevel earlier this week.
|
| 1.30 |
| 05-Feb-2010 |
enami | Don't touch past the end of allocated region. It results segmentation violation.
|
| 1.29 |
| 06-Nov-2009 |
joerg | Retire __SCCSID. It has only archeological value now. Also retire lint conditional around __RCSID, lint can handle that fine.
|
| 1.28 |
| 09-Oct-2009 |
dsl | When we need to merge more than 16 files, do them in a hierarchy. Reduces the amount of data written to temporary files. The 3-level stack has to do a simple reduce after 4352 input files, for a normal file sort this is 35GB of data or about 500 million records. This needs about 50 open fd's - which should be ok. Clearly the merge sort could process more input files in one go - speeding up the sort, but at some point the number of input files would exceed whatever limit was applied.
|
| 1.27 |
| 26-Sep-2009 |
dsl | Move all the fopen() calls out of the record read routines into the callers. Split the merge sort so that fsort() can pass the 'FILE *' of the temporary files to be merged into the merge code. Don't rely on realloc() not moving the end address of a buffer! Rework merge sort so that it sorts pointers to 'struct mfile' and only copies about sort record descriptors. No functional change intended.
|
| 1.26 |
| 10-Sep-2009 |
dsl | Save length of key instead of relying of the weight of the record sep. This frees a byte value to use for 'end of key' (to correctly sort short keys) while still having a weight assigned to the field sep. (Unless -t is given, the field sep is in the field data.) Do reverse sorts by writing the output file in reverse order (rather than reversing the sort - apart from merges). All key compares are now unweighted. For 'sort -u' mark duplicates keys during the sort and don't write to the output. Use -S to mean a posix sort - where equal keys are sorted using the raw record (rather than being kept in the original order). For 'sort -f' (no keys) generate a key of the folded data (as for -n -i and -d), simplifies the code and allows a 'posix' sort.
|
| 1.25 |
| 05-Sep-2009 |
dsl | Now we have our own radix_sort() change the interface so that we pass an array of 'RECHEADER *' and remove all the crappy stuff that backed up by REC_DATA_OFFSET (etc). Also change radix_sort() to return the number of elements, soon to be used to drop duplicate keys (for sort -u).
|
| 1.24 |
| 22-Aug-2009 |
dsl | Add some comments and clarifications to this inpeneterable code. When merging ensure we accurable sort records with identical keys by file-number, otherwise a 'stable' sort won't be!
|
| 1.23 |
| 22-Aug-2009 |
dsl | Rework the way sort generates sort keys: - If we generate a key, it is always sortable using memcmp() - If we are sorting the whole record, then a weight-table must be used during compares. - Major surgery to encoding of numbers to ensure unique keys for equal numeric values. Reverse numerics are handled by inverting the sign. - Case folding (-f) is handled when the sort keys are generated. No other code has to care at all. - Key uniqueness (-u) is done during merge for large datasets. It only has to be done when writing the output file for small files. Since the file is in key order this is simple! Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504 PR/36816 PR/37860 PR/39308 Also PR/18614 should no longer die, but a little more work needs to be done on the merging for very large files.
|
| 1.22 |
| 20-Aug-2009 |
dsl | Delete more unwanted/unused cruft. Simplify logic for reading input records. Do a merge sort whenever we have 16 partial sorted blocks. The patient is breathing, but still carrying a lot of extra weight.
|
| 1.21 |
| 16-Aug-2009 |
dsl | Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
|
| 1.20 |
| 15-Aug-2009 |
dsl | linebuf and linebuf_size are only used inside seq() - which also not only has its own static variable, but will also extend the buffer. Remove linebuf/size and change seq() to use a private, locally managed buffer.
|
| 1.19 |
| 15-Aug-2009 |
dsl | Ansify. I'm looking at fixing the 'sort -n' fubars, but this code is an inpeneterable mess - which needs some fixing first!
|
| 1.18 |
| 28-Apr-2008 |
martin | branches: 1.18.6; 1.18.12; Remove clause 3 and 4 from TNF licenses
|
| 1.17 |
| 17-Feb-2004 |
jdolecek | branches: 1.17.32; initialize malloc()ated memory
|
| 1.16 |
| 18-Oct-2003 |
itojun | KNF (mostly whitespace)
|
| 1.15 |
| 16-Oct-2003 |
itojun | safer use of realloc
|
| 1.14 |
| 07-Aug-2003 |
jdolecek | add TNF copyright
|
| 1.13 |
| 07-Aug-2003 |
agc | Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22365, verified by myself.
|
| 1.12 |
| 20-Mar-2003 |
jdolecek | get rid of one memmove() (not very significant) remove ()'s from error messages move some error checks immediatelly after appropriate realloc() calls
|
| 1.11 |
| 25-Dec-2002 |
jdolecek | make function merge() static in msort.c cosmetic change to how local variable is incremented (moved to for(;;))
|
| 1.10 |
| 19-Feb-2001 |
jdolecek | Pull up various cosmetic (mostly whitespace) changes from OpenBSD. This is primarily to ease syncing the two versions.
|
| 1.9 |
| 19-Jan-2001 |
jdolecek | merge(): use array of buffers instead of one big buffer for all records, and enlarge them as necessary to read records from merged files; the buffers are allocated once per program run, so there shouldn't be any performance difference This makes sort(1) pass also regression 40B and should make it fully arbitrary long record capable. XXX the buffer array could probably be freed on end of fmerge() to save memory
|
| 1.8 |
| 13-Jan-2001 |
jdolecek | when merging stuff from several files, make merge handle records correctly for stable sort so that the records are not swapped arbitrarily - this makes in-tree BSD sort(1) pass regression test 38
while here, do couple of cleanups, like s/16/MERGE_FNUM/ where appropriate, making local stuff static and some intendation/code format changes
|
| 1.7 |
| 11-Jan-2001 |
jdolecek | general cleanup of file list passing: * get rid of union f_handle, replace by passing explicit int parameter and (new) struct filelist * add new typedefs gen_func_t and put_func_t and use where appropriate
|
| 1.6 |
| 17-Oct-2000 |
jdolecek | order(): since getline()/getnext() behaviour wrt passed end pointer has changed (full buffer is used instead of first DEFLLEN bytes) the end pointer cannot be shared for crec and prec, we need to pass different value in each case
|
| 1.5 |
| 16-Oct-2000 |
jdolecek | constify, rename MAXLLEN to DEFLLEN
|
| 1.4 |
| 15-Oct-2000 |
jdolecek | don't use register declarations
|
| 1.3 |
| 07-Oct-2000 |
bjh21 | Two classes of changes from the initial OpenBSD commit of this sort(1): FILE * variables are called "fp" rather than "fd". Better (safer) temporary-file handling.
|
| 1.2 |
| 07-Oct-2000 |
bjh21 | Hit sort(1) with a hammer till it compiles. Also add RCSIDs.
|
| 1.1 |
| 07-Oct-2000 |
bjh21 | branches: 1.1.1; Initial revision
|
| 1.1.1.1 |
| 07-Oct-2000 |
bjh21 | 4.4BSD-Lite2 contrib/sort
|
| 1.17.32.1 |
| 18-May-2008 |
yamt | sync with head.
|
| 1.18.12.2 |
| 20-May-2011 |
matt | bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE
|
| 1.18.12.1 |
| 21-Apr-2010 |
matt | sync to netbsd-5
|
| 1.18.6.2 |
| 29-Jun-2010 |
riz | Pull up following revision(s) (requested by dholland in ticket #1420): usr.bin/sort/sort.h: revision 1.31 usr.bin/sort/sort.c: revision 1.58 usr.bin/sort/fsort.c: revision 1.47 usr.bin/sort/msort.c: revision 1.30 Don't touch past the end of allocated region. It results segmentation violation.
|
| 1.18.6.1 |
| 14-Oct-2009 |
sborrill | Pull up the following revisions(s) (requested by dsl in ticket #1084): usr.bin/sort/Makefile: revision 1.6-1.8 usr.bin/sort/append.c: revision 1.15-1.22 usr.bin/sort/fields.c: revision 1.20-1.30 usr.bin/sort/files.c: revision 1.27-1.40 usr.bin/sort/fsort.c: revision 1.33-1.45 usr.bin/sort/fsort.h: revision 1.14-1.17 usr.bin/sort/init.c: revision 1.19-1.23 usr.bin/sort/msort.c: revision 1.19-1.28 usr.bin/sort/radix_sort.c: revision 1.1-1.4 usr.bin/sort/sort.1: revision 1.27-1.29 usr.bin/sort/sort.c: revision 1.47-1.56 usr.bin/sort/sort.h: revision 1.20-1.30 usr.bin/sort/tmp.c: revision 1.14-1.15
Only use radix sort for in-memory sort, always merge temporary files. Use a local radixsort() function so we can pass record length. Avoid use of weight tables for key compares. Fix generation of keys for numbers, negate value for reverse sort. Write file in reverse-key order for 'sort -n'. 'sort -S' now does a posix sort (sort matching keys by record data). Ensure merge sort doesn't have too many temporary files open. Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816 PR#37860 PR#39308 PR#42094
|