Cross Reference: /src/usr.bin/sort/append.c

History log of /src/usr.bin/sort/append.c
Revision	Date	Author	Comments
1.23	06-Nov-2009	joerg	Retire __SCCSID. It has only archeological value now. Also retire lint conditional around __RCSID, lint can handle that fine.
1.22	10-Sep-2009	dsl	Save length of key instead of relying of the weight of the record sep. This frees a byte value to use for 'end of key' (to correctly sort short keys) while still having a weight assigned to the field sep. (Unless -t is given, the field sep is in the field data.) Do reverse sorts by writing the output file in reverse order (rather than reversing the sort - apart from merges). All key compares are now unweighted. For 'sort -u' mark duplicates keys during the sort and don't write to the output. Use -S to mean a posix sort - where equal keys are sorted using the raw record (rather than being kept in the original order). For 'sort -f' (no keys) generate a key of the folded data (as for -n -i and -d), simplifies the code and allows a 'posix' sort.
1.21	05-Sep-2009	dsl	Now we have our own radix_sort() change the interface so that we pass an array of 'RECHEADER *' and remove all the crappy stuff that backed up by REC_DATA_OFFSET (etc). Also change radix_sort() to return the number of elements, soon to be used to drop duplicate keys (for sort -u).
1.20	22-Aug-2009	dsl	Rework the way sort generates sort keys: - If we generate a key, it is always sortable using memcmp() - If we are sorting the whole record, then a weight-table must be used during compares. - Major surgery to encoding of numbers to ensure unique keys for equal numeric values. Reverse numerics are handled by inverting the sign. - Case folding (-f) is handled when the sort keys are generated. No other code has to care at all. - Key uniqueness (-u) is done during merge for large datasets. It only has to be done when writing the output file for small files. Since the file is in key order this is simple! Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504 PR/36816 PR/37860 PR/39308 Also PR/18614 should no longer die, but a little more work needs to be done on the merging for very large files.
1.19	20-Aug-2009	dsl	Delete more unwanted/unused cruft. Simplify logic for reading input records. Do a merge sort whenever we have 16 partial sorted blocks. The patient is breathing, but still carrying a lot of extra weight.
1.18	18-Aug-2009	dsl	The code that attempted to sort large files by sorting each chunk by the first key byte and writing to a temp file, then sorting the records from each temp file that had the same first key byte (and repeating for upto 4 key bytes) was a nice idea, but completely doomed to failure. Eg PR/9308 where a 70MB file has all but one record the same and short keys. Not only does the code not work, it is rather guaranteed to be slow. Instead always use a merge sort for fully sorted chunk of records (each temporary file contains one lot of sorted records). The -H option already did this, so just rip out all the code and variables that can't be used when -H was specified. Further cleanup to come ...
1.17	16-Aug-2009	dsl	'depth' is used for the number of bytes into the key that the pointers reference, when we want to find the record header put the larger value into 'hdr_off' to avoid any confusion that the code might be changing 'depth'! There is now no need to save the original value as 'odepth' in append.c. All an a vague attempt to make this code slightly readable.
1.16	16-Aug-2009	dsl	Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
1.15	15-Aug-2009	dsl	Ansify. I'm looking at fixing the 'sort -n' fubars, but this code is an inpeneterable mess - which needs some fixing first!
1.14	28-Apr-2008	martin	branches: 1.14.6; 1.14.12; Remove clause 3 and 4 from TNF licenses
1.13	15-Feb-2004	jdolecek	branches: 1.13.32; fix some cases of use of unitialized variables
1.12	07-Aug-2003	jdolecek	add TNF copyright
1.11	07-Aug-2003	agc	Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22365, verified by myself.
1.10	19-Feb-2001	jdolecek	Pull up various cosmetic (mostly whitespace) changes from OpenBSD. This is primarily to ease syncing the two versions.
1.9	18-Jan-2001	jdolecek	cosmetic style change
1.8	11-Jan-2001	jdolecek	general cleanup of file list passing: * get rid of union f_handle, replace by passing explicit int parameter and (new) struct filelist * add new typedefs gen_func_t and put_func_t and use where appropriate
1.7	08-Jan-2001	jdolecek	by default, use stable sort add -S flag to switch to non-stable sort; for GNU sort compatibility, provide -s flag too
1.6	17-Oct-2000	jdolecek	fix bugs caused by implicit assumption that 'length' and 'offset' members of struct recheader/trecheader are shorts - they are size_t now this makes sort pass all tests in TEST/stests again after my last change other misc cosmetic changes
1.5	16-Oct-2000	jdolecek	constify
1.4	15-Oct-2000	jdolecek	don't use register declarations
1.3	07-Oct-2000	bjh21	Two classes of changes from the initial OpenBSD commit of this sort(1): FILE * variables are called "fp" rather than "fd". Better (safer) temporary-file handling.
1.2	07-Oct-2000	bjh21	Hit sort(1) with a hammer till it compiles. Also add RCSIDs.
1.1	07-Oct-2000	bjh21	branches: 1.1.1; Initial revision
1.1.1.1	07-Oct-2000	bjh21	4.4BSD-Lite2 contrib/sort
1.13.32.1	18-May-2008	yamt	sync with head.
1.14.12.1	21-Apr-2010	matt	sync to netbsd-5
1.14.6.1	14-Oct-2009	sborrill	Pull up the following revisions(s) (requested by dsl in ticket #1084): usr.bin/sort/Makefile: revision 1.6-1.8 usr.bin/sort/append.c: revision 1.15-1.22 usr.bin/sort/fields.c: revision 1.20-1.30 usr.bin/sort/files.c: revision 1.27-1.40 usr.bin/sort/fsort.c: revision 1.33-1.45 usr.bin/sort/fsort.h: revision 1.14-1.17 usr.bin/sort/init.c: revision 1.19-1.23 usr.bin/sort/msort.c: revision 1.19-1.28 usr.bin/sort/radix_sort.c: revision 1.1-1.4 usr.bin/sort/sort.1: revision 1.27-1.29 usr.bin/sort/sort.c: revision 1.47-1.56 usr.bin/sort/sort.h: revision 1.20-1.30 usr.bin/sort/tmp.c: revision 1.14-1.15 Only use radix sort for in-memory sort, always merge temporary files. Use a local radixsort() function so we can pass record length. Avoid use of weight tables for key compares. Fix generation of keys for numbers, negate value for reverse sort. Write file in reverse-key order for 'sort -n'. 'sort -S' now does a posix sort (sort matching keys by record data). Ensure merge sort doesn't have too many temporary files open. Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816 PR#37860 PR#39308 PR#42094

OpenGrok