History log of /src/usr.bin/sort
Revision  Date  Author  Comments
 1.12 03-Aug-2023  rin Revert CC_WNO_USE_AFTER_FREE from Makefiles (thanks uwe@)
 1.11 03-Aug-2023  rin Sprinkle CC_WNO_USE_AFTER_FREE for GCC 12

All of them are blamed for an idiom equivalent to:
newbuf = realloc(buf, size);
p = newbuf + (p - buf);
 1.10 03-Jun-2023  lukem bsd.own.mk: rename GCC_NO_* to CC_WNO_*

Rename compiler-warning-disable variables from
GCC_NO_warning
to
CC_WNO_warning
where warning is the full warning name as used by the compiler.

GCC_NO_IMPLICIT_FALLTHRU is CC_WNO_IMPLICIT_FALLTHROUGH

Using the convention CC_compilerflag, where compilerflag
is based on the full compiler flag name.
 1.9 13-Oct-2019  mrg introduce some common variables for use in GCC warning disables:

GCC_NO_FORMAT_TRUNCATION -Wno-format-truncation (GCC 7/8)
GCC_NO_STRINGOP_TRUNCATION -Wno-stringop-truncation (GCC 8)
GCC_NO_STRINGOP_OVERFLOW -Wno-stringop-overflow (GCC 8)
GCC_NO_CAST_FUNCTION_TYPE -Wno-cast-function-type (GCC 8)

use these to turn off warnings for most GCC-8 complaints. many
of these are false positives, most of the real bugs are already
committed, or are yet to come.


we plan to introduce versions of (some?) of these that use the
"-Wno-error=" form, which still displays the warnings but does
not make it an error, and all of the above will be re-considered
as either being "fix me" (warning still displayed) or "warning
is wrong."
 1.8 10-Sep-2009  dsl branches: 1.8.46;
Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), which simplifies the code and allows a 'posix' sort.
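A minimal C sketch of two of the ideas above, assuming a hypothetical `struct keyrec` that stores the key length explicitly (not sort's real record layout): keys compare with plain memcmp() and the shorter key wins a tie, and a reverse sort is produced by walking the already-sorted array backwards at output time rather than by reversing the compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key descriptor: the key length is stored, not inferred. */
struct keyrec {
	size_t keylen;
	const unsigned char *key;
};

/* Unweighted compare: memcmp() the common prefix; on a tie the shorter key sorts first. */
static int
keycmp(const struct keyrec *a, const struct keyrec *b)
{
	size_t n = a->keylen < b->keylen ? a->keylen : b->keylen;
	int r = memcmp(a->key, b->key, n);

	if (r != 0)
		return r;
	return (a->keylen > b->keylen) - (a->keylen < b->keylen);
}

/*
 * Write keys that are already sorted ascending into dst, one per line.
 * A reverse sort walks the array backwards instead of sorting with a
 * reversed compare.  dst must hold the keys, n newlines and a NUL.
 * Returns the number of bytes written, excluding the NUL.
 */
static size_t
emit_keys(const struct keyrec *recs, size_t n, int reverse, char *dst)
{
	size_t off = 0;

	assert(dst != NULL);
	for (size_t k = 0; k < n; k++) {
		const struct keyrec *r = &recs[reverse ? n - 1 - k : k];

		memcpy(dst + off, r->key, r->keylen);
		off += r->keylen;
		dst[off++] = '\n';
	}
	dst[off] = '\0';
	return off;
}
```

With keys "a", "ab", "b" sorted ascending, a reverse run emits "b", "ab", "a" without a second sort.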
 1.7 05-Sep-2009  dsl Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
 1.6 14-Apr-2009  lukem Enable WARNS=4 by default for usr.bin, except for:
awk bdes checknr compile_et error gss hxtool kgetcred kinit
klist ldd less lex locale login m4 man menuc mk_cmds
mklocale msgc openssl rpcgen rpcinfo sdiff spell ssh
string2key telnet tn3270 verify_krb5_conf xlint
 1.5 20-Mar-2003  jdolecek branches: 1.5.40; 1.5.42; 1.5.46;
this builds with WARNS=2
 1.4 19-Feb-2001  jdolecek put tmp.c back to Makefile, too
 1.3 08-Jan-2001  jdolecek make ftmp() a wrapper around tmpfile(), there is no need to reimplement it
move ftmp() from tmp.c to files.c
g/c no longer needed stuff
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.5.46.1 21-Apr-2010  matt sync to netbsd-5
 1.5.42.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.5.40.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
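The log does not show the local radixsort() itself. As a hedged illustration of why passing the record length helps, here is a generic stable MSD radix sort over byte keys with explicit lengths; `struct rec`, `msd_radix()` and `radix_sort_recs()` are hypothetical names. A key that has already ended sorts before any byte value, so no byte value has to be reserved as a terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: the key length is passed explicitly. */
struct rec {
	size_t len;
	const unsigned char *data;
};

/* Bucket of byte `depth`: 0 means "key already ended", 1..256 are byte value + 1. */
static int
bucket(const struct rec *r, size_t depth)
{
	return depth < r->len ? r->data[depth] + 1 : 0;
}

/* Stable MSD radix sort of a[0..n) on bytes from `depth` on; tmp has room for n records. */
static void
msd_radix(struct rec *a, struct rec *tmp, size_t n, size_t depth)
{
	size_t start[258] = {0};	/* start[b] = first slot of bucket b */
	size_t pos[257];		/* next free slot in each bucket */

	if (n < 2)
		return;
	for (size_t i = 0; i < n; i++)
		start[bucket(&a[i], depth) + 1]++;
	for (int b = 0; b < 257; b++)
		start[b + 1] += start[b];
	memcpy(pos, start, sizeof pos);
	for (size_t i = 0; i < n; i++)	/* scatter in input order: stable */
		tmp[pos[bucket(&a[i], depth)]++] = a[i];
	memcpy(a, tmp, n * sizeof *a);
	/* Bucket 0 holds keys that ended here: already in final order. */
	for (int b = 1; b < 257; b++)
		msd_radix(a + start[b], tmp, start[b + 1] - start[b], depth + 1);
}

/* Returns 0 on success, -1 if the scratch array cannot be allocated. */
static int
radix_sort_recs(struct rec *a, size_t n)
{
	struct rec *tmp;

	assert(a != NULL || n == 0);
	if (n < 2)
		return 0;
	tmp = malloc(n * sizeof *tmp);
	if (tmp == NULL)
		return -1;
	msd_radix(a, tmp, n, 0);
	free(tmp);
	return 0;
}
```

Recursion depth grows with key length and each level keeps two small count arrays on the stack; a sketch like this could fall back to insertion sort for small buckets.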
 1.8.46.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.23 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.22 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), which simplifies the code and allows a 'posix' sort.
 1.21 05-Sep-2009  dsl Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
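Returning an element count lets a caller drop duplicate keys by compacting the pointer array, without moving any record data. A sketch, assuming a hypothetical `rechdr` type standing in for `RECHEADER` and an array of pointers that is already sorted by key:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a record header: key length plus key bytes. */
typedef struct {
	size_t keylen;
	const char *key;
} rechdr;

/*
 * recs[] is sorted by key.  Keep only the first record of each run of
 * equal keys, compacting the pointer array in place, and return the new
 * element count.  The records themselves are never copied.
 */
static size_t
drop_duplicate_keys(rechdr **recs, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (kept > 0 &&
		    recs[kept - 1]->keylen == recs[i]->keylen &&
		    memcmp(recs[kept - 1]->key, recs[i]->key, recs[i]->keylen) == 0)
			continue;	/* same key as the last record kept */
		recs[kept++] = recs[i];
	}
	assert(kept <= n);
	return kept;
}
```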
 1.20 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
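Folding case once, when the key is generated, is what lets every later comparison be a plain byte compare. A minimal sketch with a hypothetical `make_folded_key()`; it assumes single-byte characters and the C locale's toupper():

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Build a case-folded key for 'sort -f'.  Folding happens once, here, so
 * the compare code can use memcmp() and never has to know about -f.
 * Returns the key length (equal to the field length).
 */
static size_t
make_folded_key(const char *field, size_t len, unsigned char *key)
{
	assert(field != NULL && key != NULL);
	for (size_t i = 0; i < len; i++)
		key[i] = (unsigned char)toupper((unsigned char)field[i]);
	return len;
}
```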
 1.19 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.18 18-Aug-2009  dsl The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
Eg PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
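The replacement strategy, sorting each in-memory chunk and then merging the sorted temporary files, comes down to a k-way merge. A sketch, assuming hypothetical in-memory `struct run` arrays in place of temporary files; it picks the smallest head with a linear scan, whereas the later entries describe sorting pointers to `struct mfile`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one sorted temporary file. */
struct run {
	const char **lines;
	size_t n;
	size_t pos;	/* index of the next unread line */
};

/* k-way merge of sorted runs into out[]; returns the number of lines written. */
static size_t
merge_runs(struct run *runs, size_t nruns, const char **out)
{
	size_t nout = 0;

	assert(out != NULL);
	for (;;) {
		struct run *best = NULL;

		for (size_t i = 0; i < nruns; i++) {
			struct run *r = &runs[i];

			if (r->pos == r->n)
				continue;	/* this run is exhausted */
			if (best == NULL ||
			    strcmp(r->lines[r->pos], best->lines[best->pos]) < 0)
				best = r;
		}
		if (best == NULL)
			return nout;	/* every run is exhausted */
		out[nout++] = best->lines[best->pos++];
	}
}
```

Because `best` only changes on a strictly smaller line, equal lines come from the earlier run first, which keeps the merge stable across runs.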
 1.17 16-Aug-2009  dsl 'depth' is used for the number of bytes into the key that the pointers
reference, when we want to find the record header put the larger value
into 'hdr_off' to avoid any confusion that the code might be changing
'depth'!
There is now no need to save the original value as 'odepth' in append.c.
All in a vague attempt to make this code slightly readable.
 1.16 16-Aug-2009  dsl Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
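Using offsetof() ties the data offset to the structure definition itself, so no second header type has to be kept in sync. A sketch, assuming a hypothetical `RECHEADER` with `length`, `offset` and a flexible `data[]` member (the real field list in sort.h may differ); `rec_alloc()` and `rec_from_data()` are also hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: fixed header followed by the record bytes. */
typedef struct {
	size_t length;
	size_t offset;
	unsigned char data[];	/* C99 flexible array member */
} RECHEADER;

#define REC_DATA_OFFSET offsetof(RECHEADER, data)

/* Allocate a header plus len bytes of data in one block. */
static RECHEADER *
rec_alloc(const void *src, size_t len)
{
	RECHEADER *r = malloc(REC_DATA_OFFSET + len);

	if (r == NULL)
		return NULL;
	r->length = len;
	r->offset = 0;
	memcpy(r->data, src, len);
	return r;
}

/* Given a pointer to the start of a record's data, recover its header. */
static RECHEADER *
rec_from_data(unsigned char *p)
{
	assert(p != NULL);
	return (RECHEADER *)(void *)(p - REC_DATA_OFFSET);
}
```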
 1.15 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.14 28-Apr-2008  martin branches: 1.14.6; 1.14.12;
Remove clause 3 and 4 from TNF licenses
 1.13 15-Feb-2004  jdolecek branches: 1.13.32;
fix some cases of use of uninitialized variables
 1.12 07-Aug-2003  jdolecek add TNF copyright
 1.11 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.10 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.9 18-Jan-2001  jdolecek cosmetic style change
 1.8 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.7 08-Jan-2001  jdolecek by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
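qsort(3) is not guaranteed to be stable, so one common way to get a stable sort is to break key ties on the input position. The sketch below uses hypothetical names and also shows the alternative tie-break that the 2009 entries describe for -S, comparing the raw record:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input line: extracted key, whole record, input position. */
struct line {
	const char *key;
	const char *record;
	size_t seq;
};

/* 0: equal keys keep input order (stable); 1: equal keys ordered by raw record. */
static int tie_on_record;

static int
cmp_lines(const void *a, const void *b)
{
	const struct line *x = a, *y = b;
	int r;

	assert(x != NULL && y != NULL);
	r = strcmp(x->key, y->key);
	if (r != 0)
		return r;
	if (tie_on_record)
		return strcmp(x->record, y->record);
	return (x->seq > y->seq) - (x->seq < y->seq);
}
```

Call it as `qsort(v, n, sizeof v[0], cmp_lines)`. A file-scope flag is used because qsort() passes no context pointer to the comparator.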
 1.6 17-Oct-2000  jdolecek fix bugs caused by implicit assumption that 'length' and
'offset' members of struct recheader/trecheader are shorts - they are size_t
now
this makes sort pass all tests in TEST/stests again after my last change

other misc cosmetic changes
 1.5 16-Oct-2000  jdolecek constify
 1.4 15-Oct-2000  jdolecek don't use register declarations
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.13.32.1 18-May-2008  yamt sync with head.
 1.14.12.1 21-Apr-2010  matt sync to netbsd-5
 1.14.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.7 24-Dec-2002  jdolecek put contents of extern.h directly to sort.h, and g/c extern.h
de-__P()
 1.6 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.5 12-Jan-2001  jdolecek cosmetic prototype adjustment
 1.4 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.3 16-Oct-2000  jdolecek constify, prototype for seq() moved to files.c
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.33 20-Jan-2013  apb When parsing numbers, allow a leading '+'.
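A minimal sketch of accepting an optional sign before the digits; `scan_sign()` is a hypothetical helper, not sort's actual number parser:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accept an optional leading '-' or '+' in [*pp, end).  Returns -1 or +1
 * and advances *pp past the sign if one was present.
 */
static int
scan_sign(const char **pp, const char *end)
{
	const char *p = *pp;
	int sign = 1;

	assert(p <= end);
	if (p < end && (*p == '-' || *p == '+')) {
		if (*p == '-')
			sign = -1;
		p++;
	}
	*pp = p;
	return sign;
}
```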
 1.32 18-Dec-2010  christos branches: 1.32.6; 1.32.12;
Add an 'l' style for sorting that sorts by the string length of the field.
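One way to implement such a style is to ignore the field contents and compare only the field lengths. A sketch, assuming NUL-terminated records and a single-character field separator; `field_length()` and `cmp_by_field_length()` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of 0-based field n of rec, fields separated by sep; 0 if absent. */
static size_t
field_length(const char *rec, char sep, unsigned n)
{
	char set[2] = { sep, '\0' };

	assert(rec != NULL);
	while (n > 0) {
		const char *s = strchr(rec, sep);

		if (s == NULL)
			return 0;	/* record has fewer fields */
		rec = s + 1;
		n--;
	}
	return strcspn(rec, set);
}

/* Order two records by the length of field n only. */
static int
cmp_by_field_length(const char *a, const char *b, char sep, unsigned n)
{
	size_t la = field_length(a, sep, n), lb = field_length(b, sep, n);

	return (la > lb) - (la < lb);
}
```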
 1.31 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.30 07-Oct-2009  dsl When encoding numbers, we can use all 8 bits for exponent values.
 1.29 16-Sep-2009  dsl Minor tweaks to the key generation for numeric fields.
Use 1's complement for -ve numbers to avoid conditionals.
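sort's real numeric keys are built from the decimal text (the entry above mentions exponent values), and that encoding is not spelled out in this log. As a simplified analogue only, the sketch below encodes a fixed-width int64_t so that memcmp() order equals numeric order, and shows how a ones' complement of every byte reverses that order with no per-comparison branch; `encode_i64()` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>	/* memcmp(), used to compare encoded keys */

/*
 * Flipping the sign bit maps INT64_MIN..INT64_MAX monotonically onto
 * 0..UINT64_MAX; writing the result big-endian makes memcmp() agree with
 * numeric order.  For a reverse key every bit is inverted.
 */
static void
encode_i64(int64_t v, int reverse, unsigned char out[8])
{
	uint64_t u = (uint64_t)v ^ UINT64_C(0x8000000000000000);

	assert(out != NULL);
	if (reverse)
		u = ~u;
	for (int i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(u & 0xff);
		u >>= 8;
	}
}
```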
 1.28 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), which simplifies the code and allows a 'posix' sort.
 1.27 22-Aug-2009  dsl Fix generation of unmasked alpha keys.
 1.26 22-Aug-2009  dsl Only process each number digit once.
 1.25 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.24 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.23 15-Aug-2009  dsl Always add an REC_D char (usually \n) as the last sort key char - we
almost always need one.
But do ADD it, instead of overwriting the last byte of the last key since
that may be requesting the other end of the sort order.
There is no need to check for space for the line after adding the key,
but we might as well check before - just to optimise that case.
This might fix some of the sort bugs - but not the one I'm looking at!
 1.22 15-Aug-2009  dsl Remove reference to db.h by using separate ptr+len fields for the only
structure that used it.
Pass end of keybuf area, not size to enterkey() - largely to remove a
variable whose use isn't obvious from the name!
The structure of this code sucks.
 1.21 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.20 13-Apr-2009  lukem Fix WARNS=4 issues (-Wcast-qual -Wsign-compare)
 1.19 28-Apr-2008  martin branches: 1.19.6; 1.19.8; 1.19.12;
Remove clause 3 and 4 from TNF licenses
 1.18 14-Mar-2004  heas branches: 1.18.32;
Do not step over the edge of the buffer (check for '\0'). This just happens
to not lose on i386 because another buffer appears immediately following.
Regress tests all passed.
 1.17 15-Feb-2004  jdolecek make sure zero is recognized as regular number in number(), and thus sorted
properly with -n
fixes PR bin/20259 by Giles Lean, PR bin/20542 by Peter Seebach, and
part of PR bin/24316 by MLH
 1.16 15-Feb-2004  jdolecek fix -Wuninitialized warnings
 1.15 18-Oct-2003  itojun KNF (mostly whitespace)
 1.14 07-Aug-2003  jdolecek add TNF copyright
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.12 09-Apr-2003  jdolecek rename local macro blancmange() to SKIP_BLANKS(), to clarify what
it does and to better signal it might modify its arguments
fixes PR bin/20546 by Peter Seebach
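A bounded variant of such a macro, as a hedged sketch: the capitalised name signals that it advances its first argument, and the explicit `end` pointer keeps it from stepping past the buffer, the kind of bug the 2004 '\0' fix above addressed. `first_nonblank()` is a hypothetical wrapper:

```c
#include <assert.h>
#include <ctype.h>

/* Advance p over blanks, but never past end.  Modifies p. */
#define SKIP_BLANKS(p, end) do {				\
	while ((p) < (end) && isblank((unsigned char)*(p)))	\
		(p)++;						\
} while (0)

/* Return the first non-blank position in [p, end), or end. */
static const char *
first_nonblank(const char *p, const char *end)
{
	assert(p <= end);
	SKIP_BLANKS(p, end);
	return p;
}
```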
 1.11 24-Dec-2002  jdolecek add extern definition for ncols and clist[] to sort.h, eliminate extra
definitions in init.c and field.c
g/c MAXMERGE
 1.10 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.9 19-Feb-2001  jdolecek enterfield(): test the buffer size BEFORE assignment also for the other code
branch, since we might get called with tablepos == endkey for some special
input files (where a record would happen to fit exactly to the input
buffer) - BTW, this bug looks like it has been here ~forever ...

This seems to fix the sort crash for 'make british' build of ispell package,
as reported by Mark White at current-users@.
 1.8 19-Feb-2001  jdolecek enterkey():
* move the test for keybuf size before keypos[-1] assignment, "just in case"
* move the keypos assignment to improve readability
 1.7 13-Jan-2001  jdolecek also remove the clpos++ added in rev 1.4
 1.6 13-Jan-2001  jdolecek undo broken revision 1.4
 1.5 12-Jan-2001  jdolecek for stable sort, arrange so that really only relevant part of line
is used for sort - this makes sort pass regression test number 36

while here, slightly adjust code formatting in a couple of places
 1.4 17-Oct-2000  jdolecek cosmetic change in way one of for variables is updated
 1.3 15-Oct-2000  jdolecek don't use register declarations
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.18.32.1 18-May-2008  yamt sync with head.
 1.19.12.1 21-Apr-2010  matt sync to netbsd-5
 1.19.8.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.19.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.32.12.1 25-Feb-2013  tls resync with head
 1.32.6.1 23-Jan-2013  yamt sync with head
 1.43 10-Aug-2023  mrg avoid various use-after-free issues.

create a ptrdiff_t offset between the start of an allocation region and
some interesting pointer, so it can be adjusted with this offset after
realloc() returns. for pdisk(), realloc() is a locally inlined malloc()
and free() pair.

for mail(1), this required a little bit more effort as the old pointer
was passed into another file for fix-ups there, and that code needed to
be adjusted for offset vs old pointer usage.

found by GCC 12.
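A minimal sketch of the offset technique, with a hypothetical `grow_and_rebase()` helper: the offset is taken while the old pointer is still valid, so nothing reads or does arithmetic on the stale pointer after realloc() may have freed it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Grow *bufp to newsize bytes and keep *posp pointing at the same logical
 * byte.  Returns 0 on success, -1 if realloc() fails (the old buffer is
 * then still valid and still owned by the caller).
 */
static int
grow_and_rebase(char **bufp, size_t newsize, char **posp)
{
	ptrdiff_t off;
	char *nbuf;

	assert(*posp >= *bufp);
	off = *posp - *bufp;		/* computed while *bufp is still valid */
	nbuf = realloc(*bufp, newsize);
	if (nbuf == NULL)
		return -1;
	*bufp = nbuf;
	*posp = nbuf + off;		/* rebuilt from the new pointer only */
	return 0;
}
```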
 1.42 05-Aug-2015  mrg add a description about what was being attempted to failed writes messages.
 1.41 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.40 07-Oct-2009  dsl long align records written to temporary files.
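Long alignment here presumably means padding each record so that the next header read back from the temporary file starts on a `sizeof(long)` boundary. A sketch of the rounding, assuming that interpretation; `long_align()` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of sizeof(long). */
static size_t
long_align(size_t n)
{
	const size_t a = sizeof(long);

	assert(n <= SIZE_MAX - (a - 1));	/* the addition must not wrap */
	return (n + a - 1) / a * a;
}
```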
 1.39 28-Sep-2009  dsl Fix borked fix for sort relying on realloc() changing the buffer end.
Sorts of more than 8MB data now probably work again.
 1.38 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.37 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), which simplifies the code and allows a 'posix' sort.
 1.36 05-Sep-2009  dsl Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.35 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.34 18-Aug-2009  dsl The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
Eg PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
 1.33 16-Aug-2009  dsl Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
 1.32 15-Aug-2009  dsl Remove reference to db.h by using separate ptr+len fields for the only
structure that used it.
Pass end of keybuf area, not size to enterkey() - largely to remove a
variable whose use isn't obvious from the name!
The structure of this code sucks.
 1.31 15-Aug-2009  dsl linebuf and linebuf_size are only used inside seq() - which also not
only has its own static variable, but will also extend the buffer.
Remove linebuf/size and change seq() to use a private, locally managed
buffer.
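A sketch of a reader that owns its buffer privately and grows it on demand, in the spirit of the change above; `read_line()` is hypothetical and reads a byte at a time with getc() for clarity:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Return the next line of fp without its '\n', NUL-terminated, and store
 * its length in *lenp.  The buffer is private to this function and grown
 * with realloc() as needed.  Returns NULL at EOF or if allocation fails.
 */
static char *
read_line(FILE *fp, size_t *lenp)
{
	static char *buf;
	static size_t size;
	size_t len = 0;
	int c;

	assert(fp != NULL && lenp != NULL);
	for (;;) {
		c = getc(fp);
		if (len + 1 >= size) {	/* keep room for one byte plus NUL */
			size_t nsize = size ? size * 2 : 64;
			char *nbuf = realloc(buf, nsize);

			if (nbuf == NULL)
				return NULL;
			buf = nbuf;
			size = nsize;
		}
		if (c == EOF || c == '\n')
			break;
		buf[len++] = (char)c;
	}
	if (c == EOF && len == 0)
		return NULL;
	buf[len] = '\0';
	*lenp = len;
	return buf;
}
```

The returned pointer is only valid until the next call, and the static buffer is kept for the life of the program.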
 1.30 15-Aug-2009  dsl Remove the unused 'DBT *key' parameter from seq().
 1.29 15-Aug-2009  dsl In makeline() change 'pos' from 'char *' to 'u_char *' and remove all
the casts associated with its use.
None of the uses can possibly care about the signedness of the pointer.
 1.28 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.27 13-Apr-2009  lukem Fix WARNS=4 issues (-Wcast-qual -Wsign-compare)
 1.26 28-Apr-2008  martin branches: 1.26.6; 1.26.8; 1.26.12;
Remove clause 3 and 4 from TNF licenses
 1.25 11-May-2006  mrg branches: 1.25.20;
char -> u_char in a couple of places to match other variables.
 1.24 07-Jun-2005  he Initialize a local variable to appease -Wuninitialized.
Marked with XXXGCC for pmppc (found while compiling for it).

Reviewed by lukem.
 1.23 15-Feb-2004  jdolecek fix some cases of use of uninitialized variables
 1.22 18-Oct-2003  itojun KNF (mostly whitespace)
 1.21 16-Oct-2003  itojun safer use of realloc
 1.20 07-Aug-2003  jdolecek add TNF copyright
 1.19 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.18 24-Dec-2002  jdolecek max_o in struct tempfile needs to be off_t
use fseeko() rather than fseek() when changing file offset using max_o
 1.17 15-May-2001  jdolecek Make compilable with -Wshadow
 1.16 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.15 19-Feb-2001  jdolecek resurrect old ftmp() - it supports alternative directory for temporary
file, which is needed for -T support
 1.14 18-Jan-2001  jdolecek makeline(): make the overflow handling code safe vs. buffer realloc, add
a comment explaining what we do here
 1.13 13-Jan-2001  jdolecek makeline(): put back the memmove(3) removed in rev 1.5 in belief it's been
redundant. "Oops"
This fixes bug reported to me by Simon Burge.
 1.12 13-Jan-2001  itojun fix a few confusing indentations. XXX still broken
 1.11 13-Jan-2001  jdolecek one more warning to kill
 1.10 13-Jan-2001  jdolecek Since SUS explicitly specifies sort(1) should append a record
delimiter to file if it doesn't end with one, don't warn when this
happens.
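A sketch of supplying the missing final delimiter, with a hypothetical `terminate_last_record()` that works on an in-memory buffer and reports when there is no room for the extra byte:

```c
#include <assert.h>
#include <stddef.h>

/*
 * If buf[0..len) is non-empty and does not end with delim, append one.
 * Returns the new length, or (size_t)-1 if cap leaves no room.
 */
static size_t
terminate_last_record(char *buf, size_t len, size_t cap, char delim)
{
	assert(len <= cap);
	if (len == 0 || buf[len - 1] == delim)
		return len;
	if (len == cap)
		return (size_t)-1;
	buf[len] = delim;
	return len + 1;
}
```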
 1.9 12-Jan-2001  jdolecek remove #if 0 part
 1.8 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.7 08-Jan-2001  jdolecek make ftmp() a wrapper around tmpfile(), there is no need to reimplement it
move ftmp() from tmp.c to files.c
g/c no longer needed stuff
 1.6 17-Oct-2000  jdolecek fix bugs caused by implicit assumption that 'length' and
'offset' members of struct recheader/trecheader are shorts - they are size_t
now
this makes sort pass all tests in TEST/stests again after my last change

other misc cosmetic changes
 1.5 16-Oct-2000  jdolecek enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
 1.4 15-Oct-2000  jdolecek don't use register declarations
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.25.20.1 18-May-2008  yamt sync with head.
 1.26.12.1 21-Apr-2010  matt sync to netbsd-5
 1.26.8.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.26.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.47 05-Feb-2010  enami Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.46 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.45 09-Oct-2009  dsl If anyone is stupid enough to feed records longer than 8MB into sort, don't
sit in an infinite loop, instead eat memory until we have read 8 records.
 1.44 09-Oct-2009  dsl Don't give merge an empty file when we detect EOF with nothing in our
buffer.
 1.43 28-Sep-2009  dsl Fix borked fix for sort relying on realloc() changing the buffer end.
Sorts of more than 8MB data now probably work again.
 1.42 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.41 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), which simplifies the code and allows a 'posix' sort.
 1.40 05-Sep-2009  dsl Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.39 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
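The entry notes that for small files 'sort -u' only has to drop duplicates when writing the output, because the records are already in key order. A minimal sketch of that idea, assuming a hypothetical `struct rec` in which each record is just its key bytes plus a stored length: a duplicate can only sit next to its twin, so comparing each record with its predecessor is enough.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record descriptor: here the record is its own key. */
struct rec {
	const unsigned char *key;
	size_t keylen;
};

/* memcmp()-style compare; a key that is a prefix of another sorts first. */
static int
key_cmp(const struct rec *a, const struct rec *b)
{
	size_t n = a->keylen < b->keylen ? a->keylen : b->keylen;
	int c = memcmp(a->key, b->key, n);

	if (c != 0)
		return c;
	return (a->keylen > b->keylen) - (a->keylen < b->keylen);
}

/*
 * Write records that are already in key order, skipping any record whose
 * key equals the previous one (the first of each run is kept).
 * Returns the number of records written.
 */
static size_t
write_unique(FILE *fp, const struct rec *recs, size_t n)
{
	size_t written = 0;

	assert(fp != NULL);
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && key_cmp(&recs[i - 1], &recs[i]) == 0)
			continue;	/* same key as the previous record */
		fwrite(recs[i].key, 1, recs[i].keylen, fp);
		fputc('\n', fp);
		written++;
	}
	return written;
}
```

Storing the key length (as revision 1.41 does) is what lets "a" and "ab" compare as different keys without reserving an end-of-key byte value.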
 1.38 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.37 18-Aug-2009  dsl The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308, where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
 1.36 16-Aug-2009  dsl 'depth' is used for the number of bytes into the key that the pointers
reference; when we want to find the record header, put the larger value
into 'hdr_off' to avoid any confusion that the code might be changing
'depth'!
There is now no need to save the original value as 'odepth' in append.c.
All in a vague attempt to make this code slightly readable.
 1.35 16-Aug-2009  dsl Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
 1.34 15-Aug-2009  dsl linebuf and linebuf_size are only used inside seq() - which not
only has its own static variable, but will also extend the buffer.
Remove linebuf/size and change seq() to use a private, locally managed
buffer.
 1.33 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.32 28-Apr-2008  martin branches: 1.32.6; 1.32.12;
Remove clause 3 and 4 from TNF licenses
 1.31 10-Jun-2005  jmc branches: 1.31.20;
Init some variables the compiler is complaining about and mark w. XXGCC as it
affects only m68k compilers.
 1.30 15-Feb-2004  jdolecek fix -Wuninitialized warnings
 1.29 18-Oct-2003  itojun KNF (mostly whitespace)
 1.28 17-Oct-2003  enami Test the value returned by realloc() rather than anything else.
 1.27 16-Oct-2003  itojun safer use of realloc
 1.26 07-Aug-2003  jdolecek add TNF copyright
 1.25 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.24 24-Dec-2002  jdolecek improve previous slightly - need >= (not just >) in CHECKFSTACK()
 1.23 24-Dec-2002  jdolecek make sure we don't attempt to write past end of fstack[], error out instead

this fixes second part ('tmpdir get smashed') of bin/18614 by Michael Graff
 1.22 10-Oct-2002  jdolecek g/c extern reference to toutpath
 1.21 30-Sep-2002  enami Use the right file to output merge result.
 1.20 15-May-2001  jdolecek branches: 1.20.2;
Only try to copy the extra incomplete record data if there is anything
actually read already. Although it's not damaging to copy zero data
for bufend == crec->data case, the buffer end could also be between
memory position 'crec' and 'crec->data'. Thus, we could end up with
negative 'bufend - crec->data' value, and obvious havoc.

This change fixes lib/12673, though the problem was masked and no longer
repeatable with the provided example after the recent buffer size bump.
The change was tested with the buffer size change backed off, and really
fixes the problem in the PR.
 1.19 15-May-2001  jdolecek fsort(): rearrange the push code to reduce one level of indentation,
free keylist, buffer on end of work; no functional changes
 1.18 14-May-2001  jdolecek Bump the initial record buffer size to 1MB and allow it to grow to 8MB,
if needed and record count is within bounds (<MAXNUM), rather than
sorting the input by 64KB chunks. This cuts the number of needed
temporary files considerably (and improves performance, too).
Slightly adjust some #defines, mostly to power of 2 values.

This addresses bin/12673 and bin/12614, as well as complaints from other
people.
 1.17 20-Feb-2001  jdolecek fsort(): don't call append() with zero nelem
This fixes the 'sort -f /dev/null' coredump reported on current-users.
 1.16 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.15 19-Feb-2001  jdolecek oops - wrong file, backoff local test change
 1.14 19-Feb-2001  jdolecek enterkey():
* move the test for keybuf size before keypos[-1] assignment, "just in case"
* move the keypos assignment to improve readability
 1.13 19-Feb-2001  jdolecek cosmetic changes - make keylist[] static and remove extern definition
in fsort.h, move macro SALIGN() from sort.h to fsort.c
 1.12 05-Feb-2001  itojun make sure to initialize malloc'ed region. PR 12138. found by malloc.conf=AJ
 1.11 19-Jan-2001  jdolecek use MERGE_FNUM instead of magic value 16
 1.10 18-Jan-2001  jdolecek keep bumping the record buffer up to 8 records - this is to avoid making
excessive number of temporary files for oversized records; the way the
buffer is enlarged is now also safer

initialize 'bufsize' statically, so that the value can be safely used
in e.g. msort.c:fmerge()
 1.9 13-Jan-2001  itojun fix a few confusing indentations. XXX still broken
 1.8 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.7 08-Jan-2001  jdolecek by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
 1.6 17-Oct-2000  jdolecek fix bugs caused by implicit assumption that 'length' and
'offset' members of struct recheader/trecheader are shorts - they are size_t
now
this makes sort pass all tests in TEST/stests again after my last change

other misc cosmetic changes
 1.5 16-Oct-2000  jdolecek enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
 1.4 15-Oct-2000  jdolecek don't use register declarations
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.20.2.1 01-Oct-2002  lukem Pull up revision 1.21 (requested by enami in ticket #883):
Use the right file to output merge result.
 1.31.20.1 18-May-2008  yamt sync with head.
 1.32.12.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE
 1.32.12.1 21-Apr-2010  matt sync to netbsd-5
 1.32.6.2 29-Jun-2010  riz Pull up following revision(s) (requested by dholland in ticket #1420):
usr.bin/sort/sort.h: revision 1.31
usr.bin/sort/sort.c: revision 1.58
usr.bin/sort/fsort.c: revision 1.47
usr.bin/sort/msort.c: revision 1.30
Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.32.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.18 25-Oct-2023  simonb Correct a comment - 8 * 1 million is 8 million, not 10 million (!).
 1.17 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.16 05-Sep-2009  dsl Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.15 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.14 15-Aug-2009  dsl linebuf and linebuf_size are only used inside seq() - which not
only has its own static variable, but will also extend the buffer.
Remove linebuf/size and change seq() to use a private, locally managed
buffer.
 1.13 28-Apr-2008  martin branches: 1.13.6; 1.13.12;
Remove clause 3 and 4 from TNF licenses
 1.12 07-Aug-2003  jdolecek branches: 1.12.32;
add TNF copyright
 1.11 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.10 24-Dec-2002  jdolecek max_o in struct tempfile needs to be off_t
use fseeko() rather than fseek() when changing file offset using max_o
 1.9 14-May-2001  jdolecek Bump the initial record buffer size to 1MB and allow it to grow to 8MB,
if needed and record count is within bounds (<MAXNUM), rather than
sorting the input by 64KB chunks. This cuts the number of needed
temporary files considerably (and improves performance, too).
Slightly adjust some #defines, mostly to power of 2 values.

This addresses bin/12673 and bin/12614, as well as complaints from other
people.
 1.8 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.7 19-Feb-2001  jdolecek cosmetic changes - make keylist[] static and remove extern definition
in fsort.h, move macro SALIGN() from sort.h to fsort.c
 1.6 19-Jan-2001  jdolecek put MERGE_FNUM here, slightly clean up other defines
 1.5 18-Jan-2001  jdolecek make DEFLLEN plain 1 << 16, don't subtract magic value
 1.4 16-Oct-2000  jdolecek enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.12.32.1 18-May-2008  yamt sync with head.
 1.13.12.1 21-Apr-2010  matt sync to netbsd-5
 1.13.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.30 19-Sep-2021  andvar fix few more typos in comments, messages and documentation.
 1.29 18-Oct-2013  christos fix unused variable warnings
 1.28 18-Dec-2010  christos branches: 1.28.6; 1.28.12;
Add an 'l' style for sorting that sorts by the string length of the field.
 1.27 06-Jun-2010  wiz Fix typo in comment.
 1.26 05-Jun-2010  dholland Rework previous change to fixit() to not trip on option arguments. (Noticed
by wiz.) Clarify the loop logic involved.
 1.25 27-May-2010  dholland Don't recognize "+3" after -- or after the first non-option argument.
This prevents converting "+3" into "-k4.1" in places where getopt
won't recognize it, which in turn prevents silly error messages and
lossage trying to sort files whose names begin with +. PR 43358.
 1.24 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.23 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n,
-i and -d); this simplifies the code and allows a 'posix' sort.
 1.22 05-Sep-2009  dsl Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
 1.21 22-Aug-2009  dsl <space> and <tab> at the start of key fields are supposed to be sorted
as if part of the data.
This is a bit fubar since we need a value that sorts before any byte value
as a key field separator - so need 257 byte values (since radixsort() doesn't
take a length for each record).
For now map '\t' to 0x01 and hope no one will notice!
 1.20 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.19 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.18 28-Apr-2008  martin branches: 1.18.6; 1.18.12;
Remove clause 3 and 4 from TNF licenses
 1.17 23-Oct-2006  jdolecek branches: 1.17.16;
fix check for field order to allow .0 form in "-k 1.2,1.0"

fix provided in PR bin/25572 by Ross Patterson
 1.16 03-Nov-2004  dsl Add (unsigned char) cast to ctype functions
 1.15 18-Feb-2004  jdolecek insertcol() may insert up to two items to clist, so allocate memory accordingly
this fixes sort regression test 28A and 28B
 1.14 17-Feb-2004  jdolecek fix parsing of some +POS -POS variants, as pointed out by sort regression
tests
 1.13 17-Feb-2004  itojun safer realloc idiom
minor knf
 1.12 15-Feb-2004  jdolecek remove compile-time limit on number of -k options, allocate necessary
structures as-needed
 1.11 15-Feb-2004  jdolecek rewrite fixit() to duplicate less code, and comment the contents better;
also removes compile-time dependency on ND constant
 1.10 15-Feb-2004  jdolecek g/c redundant setfield() prototype
clean up setcolumn() somewhat - use strtol() instead of sscanf(), and
simplify flag setting code
 1.9 07-Aug-2003  jdolecek add TNF copyright
 1.8 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.7 24-Dec-2002  jdolecek add extern definition for ncols and clist[] to sort.h, eliminate extra
definitions in init.c and field.c
g/c MAXMERGE
 1.6 31-Dec-2001  thorpej Change some:

foo += sscanf(++foo, ...);

constructs to:

++foo;
foo += sscanf(foo, ...);

to avoid the following warning from gcc 3.1:

warning: operation on `pos' may be undefined
 1.5 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.4 12-Jan-2001  jdolecek use toupper() where appropriate
whitespace/parenthesis police
 1.3 16-Oct-2000  jdolecek cosmetic change: make setcolumn() static, remove bogus redundant setcolumn() prototype
inside setcolumn() function, constify
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.17.16.1 18-May-2008  yamt sync with head.
 1.18.12.1 21-Apr-2010  matt sync to netbsd-5
 1.18.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.28.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.28.6.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.31 01-Jun-2016  kre Add the posix -C option (-c but quieter). Fix -R to work properly when
setting \n as the record delimiter using a numeric value rather than literal
\n - and to not incorrectly turn \n into a field separator if -R is used to
make some other char the record separator (\n becomes a field separator in
that case as long as the field separator remains "white space" but should not
be in any other case - unless set explicitly of course.)

Plus more cosmetic changes - the man page and usage are updated to make it
more clear that the 2 (or 1) params to -k are not fields (field1 and field2)
but specifiers of the beginning and end of one key field. There was an
unused 'x' option in the GETOPTS string. The usage message is reformatted
to display properly on both 80 col and > 80 col displays (on < 80 it will
still probably look pretty ugly ... perhaps not quite so bad though), and
is also updated to show the different usage for the -c case (and -C) from the
others (only 1 file permitted) - the man page synopsis has a similar update.

Using more than one of -c -C or -m generates a usage message rather than
just ignoring the -m as it did before (there was no -C before of course).

Aside from the bug fix to the interaction between -R and -t, there are no
changes that affect the way anything is sorted (or read, or written).

Discussed on tech-userlevel earlier this week.
 1.30 05-Feb-2010  enami Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.29 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.28 09-Oct-2009  dsl When we need to merge more than 16 files, do them in a hierarchy.
Reduces the amount of data written to temporary files.
The 3-level stack has to do a simple reduce after 4352 input files, for
a normal file sort this is 35GB of data or about 500 million records.
This needs about 50 open fd's - which should be ok.
Clearly the merge sort could process more input files in one go - speeding
up the sort, but at some point the number of input files would exceed
whatever limit was applied.
 1.27 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.26 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n,
-i and -d); this simplifies the code and allows a 'posix' sort.
 1.25 05-Sep-2009  dsl Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.24 22-Aug-2009  dsl Add some comments and clarifications to this impenetrable code.
When merging, ensure we accurately sort records with identical keys by
file-number, otherwise a 'stable' sort won't be!
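A minimal sketch of the tie-break this entry describes, assuming a hypothetical `struct mfile_head` holding the current record of each merge input: equal keys are ordered by the input's file number, so records from earlier files are emitted first and the merge stays stable. The linear `pick_next()` scan stands in for whatever ordering structure the real merge uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical merge input: the current record from one input file. */
struct mfile_head {
	const unsigned char *key;
	size_t keylen;
	int fnum;	/* position of the input file in the original order */
};

/* Order by key; on identical keys the earlier file number wins. */
static int
merge_cmp(const struct mfile_head *a, const struct mfile_head *b)
{
	size_t n = a->keylen < b->keylen ? a->keylen : b->keylen;
	int c = memcmp(a->key, b->key, n);

	if (c != 0)
		return c;
	if (a->keylen != b->keylen)
		return a->keylen < b->keylen ? -1 : 1;
	return (a->fnum > b->fnum) - (a->fnum < b->fnum);
}

/* Index of the input whose record should be written next. */
static size_t
pick_next(const struct mfile_head *in, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (merge_cmp(&in[i], &in[best]) < 0)
			best = i;
	return best;
}
```

Without the final file-number comparison, two inputs with identical keys would compare equal and the merge could emit them in either order.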
 1.23 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.22 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.21 16-Aug-2009  dsl Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
 1.20 15-Aug-2009  dsl linebuf and linebuf_size are only used inside seq() - which not
only has its own static variable, but will also extend the buffer.
Remove linebuf/size and change seq() to use a private, locally managed
buffer.
 1.19 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.18 28-Apr-2008  martin branches: 1.18.6; 1.18.12;
Remove clause 3 and 4 from TNF licenses
 1.17 17-Feb-2004  jdolecek branches: 1.17.32;
initialize malloc()ated memory
 1.16 18-Oct-2003  itojun KNF (mostly whitespace)
 1.15 16-Oct-2003  itojun safer use of realloc
 1.14 07-Aug-2003  jdolecek add TNF copyright
 1.13 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.12 20-Mar-2003  jdolecek get rid of one memmove() (not very significant)
remove ()'s from error messages
move some error checks immediately after appropriate realloc() calls
 1.11 25-Dec-2002  jdolecek make function merge() static in msort.c
cosmetic change to how local variable is incremented (moved to for(;;))
 1.10 19-Feb-2001  jdolecek Pull up various cosmetic (mostly whitespace) changes from OpenBSD.
This is primarily to ease syncing the two versions.
 1.9 19-Jan-2001  jdolecek merge(): use array of buffers instead of one big buffer for all records, and
enlarge them as necessary to read records from merged files; the buffers
are allocated once per program run, so there shouldn't be any
performance difference
This makes sort(1) pass also regression 40B and should make it
fully arbitrary long record capable.
XXX the buffer array could probably be freed on end of fmerge() to save memory
 1.8 13-Jan-2001  jdolecek when merging stuff from several files, make merge handle records correctly
for stable sort so that the records are not swapped arbitrarily - this makes
in-tree BSD sort(1) pass regression test 38

while here, do couple of cleanups, like s/16/MERGE_FNUM/ where appropriate,
making local stuff static and some indentation/code format changes
 1.7 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.6 17-Oct-2000  jdolecek order(): since getline()/getnext() behaviour wrt passed
end pointer has changed (full buffer is used instead of first DEFLLEN bytes)
the end pointer cannot be shared for crec and prec, we need to pass
different value in each case
 1.5 16-Oct-2000  jdolecek constify, rename MAXLLEN to DEFLLEN
 1.4 15-Oct-2000  jdolecek don't use register declarations
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.17.32.1 18-May-2008  yamt sync with head.
 1.18.12.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE
 1.18.12.1 21-Apr-2010  matt sync to netbsd-5
 1.18.6.2 29-Jun-2010  riz Pull up following revision(s) (requested by dholland in ticket #1420):
usr.bin/sort/sort.h: revision 1.31
usr.bin/sort/sort.c: revision 1.58
usr.bin/sort/fsort.c: revision 1.47
usr.bin/sort/msort.c: revision 1.30
Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.18.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.6 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.5 07-Aug-2003  jdolecek branches: 1.5.32;
add TNF copyright
 1.4 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.5.32.1 18-May-2008  yamt sync with head.
 1.4 19-Sep-2009  dsl branches: 1.4.2; 1.4.4;
Fix sort -u, PR/42094
 1.3 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n,
-i and -d); this simplifies the code and allows a 'posix' sort.
 1.2 05-Sep-2009  dsl Now we have our own radix_sort() change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.1 05-Sep-2009  dsl Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
 1.4.4.2 21-Apr-2010  matt sync to netbsd-5
 1.4.4.1 19-Sep-2009  matt file radix_sort.c was added on branch matt-nb5-mips64 on 2010-04-21 05:27:12 +0000
 1.4.2.2 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.4.2.1 19-Sep-2009  sborrill file radix_sort.c was added on branch netbsd-5 on 2009-10-14 20:41:53 +0000
 1.41 17-Feb-2025  wiz capitalize POSIX
 1.40 01-Sep-2019  sevan branches: 1.40.10;
sort was there since v1
https://www.bell-labs.com/usr/dmr/www/man61.pdf
 1.39 11-Jul-2019  msaitoh branches: 1.39.2;
Fix typo (s/supress/suppress/).
 1.38 03-Jul-2017  wiz branches: 1.38.6;
Remove workaround for ancient HTML generation code.
 1.37 21-Dec-2016  abhinav Add missing full stop.
 1.36 01-Jun-2016  wiz branches: 1.36.2;
Sort options and their descriptions. Sync usage more with man page.
Bump date in man page for new option -C.
 1.35 01-Jun-2016  kre Add the posix -C option (-c but quieter). Fix -R to work properly when
setting \n as the record delimiter using a numeric value rather than literal
\n - and to not incorrectly turn \n into a field separator if -R is used to
make some other char the record separator (\n becomes a field separator in
that case as long as the field separator remains "white space" but should not
be in any other case - unless set explicitly of course.)

Plus more cosmetic changes - the man page and usage are updated to make it
more clear that the 2 (or 1) params to -k are not fields (field1 and field2)
but specifiers of the beginning and end of one key field. There was an
unused 'x' option in the GETOPTS string. The usage message is reformatted
to display properly on both 80 col and > 80 col displays (on < 80 it will
still probably look pretty ugly ... perhaps not quite so bad though), and
is also updated to show the different usage for the -c case (and -C) from the
others (only 1 file permitted) - the man page synopsis has a similar update.

Using more than one of -c, -C or -m generates a usage message rather than
just ignoring the -m as it did before (there was no -C before of course).

Aside from the bug fix to the interaction between -R and -t, there are no
changes that affect the way anything is sorted (or read, or written).

Discussed on tech-userlevel earlier this week.
 1.34 29-May-2013  wiz - Remove redundant argument to non-first `.Nm' macro;
- reference `-u' at `-c', to make more clear that the former can
be used with the latter;
- bump date.

From Bug Hunting.

While here, use Aq.
 1.33 20-Jan-2013  apb As from today, numeric fields may begin with an optional
plus or minus sign, not only an optional minus sign.
 1.32 18-Dec-2010  wiz branches: 1.32.6; 1.32.12;
Sort sections.
 1.31 18-Dec-2010  christos Add an 'l' style for sorting that sorts by the string length of the field.
 1.30 14-May-2010  jruoho RETURN VALUES -> EXIT STATUS.
 1.29 23-Aug-2009  wiz Fix pasto.
 1.28 22-Aug-2009  dsl Bring nearer to reality.
Note that -H is now ignored.
Move -S and -s (and -H) to the first list of options since they are
global ones, not ones that override the ordering rules.
 1.27 11-Mar-2009  joerg Don't workaround ancient macro argument limit with .Xo/.Xc.
 1.26 02-May-2008  martin branches: 1.26.6; 1.26.8; 1.26.12;
Move TNF licenses to 2 clause form
 1.25 23-Jul-2004  wiz branches: 1.25.26;
Sort options in SYNOPSIS. From Kouichirou Hiratsuka in PR 26278.
 1.24 07-Aug-2003  jdolecek add TNF copyright
 1.23 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.22 27-Jun-2003  wiz Pa Ar -> Ar.
 1.21 25-Feb-2003  wiz .Nm does not need a dummy argument ("") before punctuation or
for correct formatting of the SYNOPSIS any longer.
 1.20 04-Feb-2003  perry "Utilize" has exactly the same meaning as "use," but it is more
difficult to read and understand. Most manuals of English style
therefore say that you should use "use".
 1.19 06-Jan-2003  wiz compatibility, not compatiblity.
 1.18 08-Feb-2002  ross Generate <>& symbolically. I'm avoiding .../dist/... directories for now.
 1.17 08-Dec-2001  wiz Punctuation nits, sort SEE ALSO.
 1.16 16-Mar-2001  fair Add cross references for qsort(3), and radixsort(3), per PR 10567
 1.15 19-Feb-2001  jdolecek Pull in various cosmetic changes from OpenBSD version of this manpage - mostly
whitespace changes, which don't influence the layout of result manpage at all,
but also add -H to SYNOPSIS and state sort(1) appeared in v5, not v6 of
AT&T Unix.
 1.14 19-Feb-2001  jdolecek document -T and TMPDIR handling
resurrect ENVIRONMENT and FILES, adjust to be more correct
slightly adjust SYNOPSIS line, so that it looks a little nicer :)
 1.13 07-Feb-2001  jdolecek move sections so that the order is more like the one specified by
mdoc.samples(7)
 1.12 07-Feb-2001  jdolecek use -R instead of -w, to be compatible with OpenBSD
 1.11 07-Feb-2001  jdolecek s/-T/-w/
 1.10 16-Jan-2001  jdolecek set date to when this utility became default system sort(1) on NetBSD
add information about when it came to NetBSD to HISTORY
 1.9 13-Jan-2001  jdolecek note this sort(1) implementation appeared in 4.4BSD
 1.8 13-Jan-2001  jdolecek add -s/-S to synopsis
remove TMPDIR stuff - it no longer applies, at least for now
move the note about link/unlink from BUGS to NOTES
add note about trailing record separator and lack of restriction on
line length or allowed bytes
 1.7 08-Jan-2001  jdolecek by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
 1.6 07-Nov-2000  lukem fix up various .Nm abuses:
- keep the case consistent between the actual name and what's referenced.
e.g., if it's `foo', don't use '.Nm Foo' at the start of a sentence.
- remove unnecessary `.Nm foo' after the first occurrence (except for
using `.Nm ""' if there's stuff following, or for the 2nd and so on
occurrences in a SYNOPSIS)
- use Sx, Ic, Li, Em, Sq, and Xr as appropriate
 1.5 16-Oct-2000  jdolecek enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
 1.4 14-Oct-2000  bjh21 HEAVY formatting cleanup.
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.25.26.1 18-May-2008  yamt sync with head.
 1.26.12.1 21-Apr-2010  matt sync to netbsd-5
 1.26.8.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.26.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.32.12.2 23-Jun-2013  tls resync from head
 1.32.12.1 25-Feb-2013  tls resync with head
 1.32.6.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.32.6.1 23-Jan-2013  yamt sync with head
 1.36.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.38.6.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.39.2.1 05-Sep-2019  martin Pull up following revision(s) (requested by sevan in ticket #174):
lib/libc/sys/chmod.2: revision 1.48
lib/libc/sys/stat.2: revision 1.59
lib/libc/sys/unlink.2: revision 1.30
lib/libc/sys/lseek.2: revision 1.25
lib/libc/sys/getuid.2: revision 1.18
lib/libc/sys/chown.2: revision 1.37
lib/libm/man/exp.3: revision 1.32
lib/libm/man/log.3: revision 1.7
lib/libc/sys/open.2: revision 1.60
lib/libc/stdio/fopen.3: revision 1.36
lib/libc/stdio/putc.3: revision 1.14
lib/libc/sys/mount.2: revision 1.51
share/man/man9/copy.9: revision 1.22
share/man/man9/uiomove.9: revision 1.20
lib/libc/sys/setuid.2: revision 1.23
lib/libc/sys/close.2: revision 1.18
sbin/init/init.8: revision 1.61
lib/libc/sys/write.2: revision 1.36
lib/libc/sys/read.2: revision 1.39
sbin/init/init.8: revision 1.62
lib/libc/sys/wait.2: revision 1.40
usr.bin/tty/tty.1: revision 1.10
lib/libc/sys/link.2: revision 1.33
usr.bin/du/du.1: revision 1.24
lib/libc/stdlib/exit.3: revision 1.17
usr.bin/su/su.1: revision 1.53
usr.bin/mail/mail.1: revision 1.66
lib/libc/sys/fork.2: revision 1.25
usr.bin/su/su.1: revision 1.54
usr.bin/mail/mail.1: revision 1.67
lib/libm/man/sin.3: revision 1.15
share/man/man9/intro.9: revision 1.26
share/man/man5/utmp.5: revision 1.17
lib/libc/compat-43/creat.3: revision 1.17
lib/libc/time/ctime.3: revision 1.61
lib/libcompat/4.1/stty.3: revision 1.10
usr.bin/dc/dc.1: revision 1.3
lib/libm/man/cos.3: revision 1.17
lib/libc/sys/chdir.2: revision 1.23
lib/libc/gen/exec.3: revision 1.30
lib/libc/gen/exec.3: revision 1.31
games/bcd/bcd.6: revision 1.18
games/bcd/bcd.6: revision 1.19
usr.bin/write/write.1: revision 1.7
usr.bin/wc/wc.1: revision 1.18
usr.bin/pr/pr.1: revision 1.24
usr.bin/who/who.1: revision 1.25
lib/libc/sys/mkdir.2: revision 1.30
lib/libc/stdio/getc.3: revision 1.13
usr.bin/sort/sort.1: revision 1.40
usr.bin/mesg/mesg.1: revision 1.11
share/man/man5/passwd.5: revision 1.34
sort was there since v1
https://www.bell-labs.com/usr/dmr/www/man61.pdf

dc was in v1
https://www.bell-labs.com/usr/dmr/www/man12.pdf

du was in v1
https://www.bell-labs.com/usr/dmr/www/man12.pdf

mail was in v1
https://www.bell-labs.com/usr/dmr/www/man12.pdf

mesg was in v1
https://www.bell-labs.com/usr/dmr/www/man12.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man13.pdf

su was in v1
https://www.bell-labs.com/usr/dmr/www/man13.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man13.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man14.pdf
Update URL

write was in v1
https://www.bell-labs.com/usr/dmr/www/man14.pdf
grammar

passwd(5) was in v1
https://www.bell-labs.com/usr/dmr/www/man51.pdf

utmp(5) was present in v1
https://www.bell-labs.com/usr/dmr/www/man51.pdf

Earliest version of wtmp I could find was in v3
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man5/wtmp.5

Document history of chdir(2)
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history of chmod(2)
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history of chown(2)
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man21.pdf

create was present in v1
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history of exec()
Move statement on execlpe() & execvpe() to HISTORY section.

Document history
https://www.bell-labs.com/usr/dmr/www/man21.pdf

fork was present in v1
https://www.bell-labs.com/usr/dmr/www/man21.pdf

stat() was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

document history of fstat()
https://www.bell-labs.com/usr/dmr/www/man21.pdf

getuid was present in v1
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man21.pdf

Document history
https://www.bell-labs.com/usr/dmr/www/man21.pdf

stty & gtty were around since v1
https://www.bell-labs.com/usr/dmr/www/man21.pdf
https://www.bell-labs.com/usr/dmr/www/man22.pdf

mount & umount were present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

Open was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

read was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

seek was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

setuid was in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

unlink was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

wait was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

write was present in v1
https://www.bell-labs.com/usr/dmr/www/man22.pdf

start documenting history
exp was present in v1
https://www.bell-labs.com/usr/dmr/www/man31.pdf

Start documenting history
https://www.bell-labs.com/usr/dmr/www/man31.pdf

Start documenting history
https://www.bell-labs.com/usr/dmr/www/man31.pdf

log appeared in v1
https://www.bell-labs.com/usr/dmr/www/man31.pdf

putc & putw were in v1
https://www.bell-labs.com/usr/dmr/www/man31.pdf

putchar was in v4
https://minie.tuhs.org/cgi-bin/utree.pl?file=V4/man/man3/putchr.3

Start documenting history
https://www.bell-labs.com/usr/dmr/www/man31.pdf

Document history.
https://www.bell-labs.com/usr/dmr/www/man11.pdf
Between v1 & v6 UNIX, bcd was rewritten in C, but I don't know in which
version, hence I've skipped mentioning it.
End sentence with a dot.
Remove superfluous Pp.
Remove superfluous Pp.
Remove superfluous Ns.
Remove superfluous Pp.
fetch(9) -> ufetch(9)
fetch(9) -> ufetch(9). Remove superfluous Pp.
fetch(9) -> ufetch(9). Remove reference to unimplemented ppi(9).
 1.40.10.1 02-Aug-2025  perseant Sync with HEAD
 1.64 10-Jan-2017  christos refactor includes, add <sys/stat.h>
 1.63 01-Jun-2016  wiz branches: 1.63.2;
Sort options and their descriptions. Sync usage more with man page.
Bump date in man page for new option -C.
 1.62 01-Jun-2016  kre Add the posix -C option (-c but quieter). Fix -R to work properly when
setting \n as the record delimiter using a numeric value rather than literal
\n - and to not incorrectly turn \n into a field separator if -R is used to
make some other char the record separator (\n becomes a field separator in
that case as long as the field separator remains "white space" but should not
be in any other case - unless set explicitly of course.)

Plus more cosmetic changes - the man page and usage are updated to make it
more clear that the 2 (or 1) params to -k are not fields (field1 and field2)
but specifiers of the beginning and end of one key field. There was an
unused 'x' option in the GETOPTS string. The usage message is reformatted
to display properly on both 80 col and > 80 col displays (on < 80 it will
still probably look pretty ugly ... perhaps not quite so bad though), and
is also updated to show the different usage for the -c case (and -C) from the
others (only 1 file permitted) - the man page synopsis has a similar update.

Using more than one of -c -C or -m generates a usage message rather than
just ignoring the -m as it did before (there was no -C before of course).

Aside from the bug fix to the interaction between -R and -t, there are no
changes that affect the way anything is sorted (or read, or written).

Discussed on tech-userlevel earlier this week.
 1.61 16-Sep-2011  joerg Use __dead
 1.60 18-Dec-2010  christos Add an 'l' style for sorting that sorts by the string length of the field.
 1.59 05-Jun-2010  dholland fixit() needs to know the getopt options list to do its thing correctly.
 1.58 05-Feb-2010  enami Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.57 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.56 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.55 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), simplifies the code and allows a 'posix' sort.
 1.54 05-Sep-2009  dsl Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
 1.53 22-Aug-2009  dsl Put radixsort() and sradixsort() the correct way around.
 1.52 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.51 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.50 18-Aug-2009  dsl The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
 1.49 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.48 13-Apr-2009  lukem Fix WARNS=4 issues (-Wcast-qual -Wsign-compare)
 1.47 08-Nov-2008  christos branches: 1.47.2;
Make -R accept numeric arguments so one can say -R '\0' to be used in
pipelines like find . -print0 | sort -R '\0'. From Anon Ymous
 1.46 21-Jul-2008  lukem branches: 1.46.4; 1.46.8;
Remove the \n and tabs from the __COPYRIGHT() strings.
Tweak to use a consistent format.
 1.45 28-Apr-2008  martin branches: 1.45.2;
Remove clause 3 and 4 from TNF licenses
 1.44 23-Oct-2006  jdolecek branches: 1.44.16;
when using -o into file which already exists, copy the permissions
of the original file to the new (sorted) file

addresses PR bin/26860 by Michael van Elst
 1.43 23-Oct-2006  jdolecek replace access(2) + /dev/ prefix check with lstat(2) and S_ISCHR()/S_ISBLK()

part of PR bin/26860 by Michael van Elst

while here, put output file fopen() inside the code block of the
only code path where it's actually needed, to make the logic more obvious;
and in the "stdout" case, initialize toutpath to empty string rather
than /dev/stdout, to make it clear /dev/stdout is not actually used
 1.42 23-Oct-2006  jdolecek use F_OK instead of 0 for second parameter of access(2)

part of PR bin/26860 by Michael van Elst
 1.41 23-Jul-2004  wiz Sync usage with man page. From Kouichirou Hiratsuka in PR 26278.
 1.40 14-Mar-2004  heas remove double initialisation of SINGL_FLD & SEP_FLAG
 1.39 17-Feb-2004  jdolecek ftpos pointer was not updated when fldtab was reallocated; drop completely
in favour of an index counter
fixes bin/24449 by Jun-ichiro itojun
 1.38 17-Feb-2004  jdolecek fldtab[] needs to have one extra element - this marks end of array
addresses part of PR bin/24449 by Jun-ichiro itojun
 1.37 17-Feb-2004  itojun use safer realloc idiom
memset new region got by realloc
 1.36 17-Feb-2004  itojun initialize fldtab
 1.35 15-Feb-2004  jdolecek remove compile-time limit on number of -k options, allocate necessary
structures as-needed
 1.34 07-Aug-2003  jdolecek add TNF copyright
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.32 24-Dec-2002  jdolecek g/c many_files(), too
 1.31 24-Dec-2002  jdolecek bump 'soft' limit for number of files to hard limit on startup; we
want to be able to open as many temporary files as possible
 1.30 24-Dec-2002  jdolecek move fltab outside main and make it static, eliminate two memset()s
g/c superfluous extern definition for clist[] and ncols
make toutpath[] static
 1.29 27-Nov-2002  tron Remove the statically initialized "sigaction" structure completely because
such usage is broken. Problem pointed out by Klaus Klein on
"sources-changes@netbsd.org".
 1.28 27-Nov-2002  tron Add braces in a statically initialized "sigaction" structure to fix a
build problem after siginfo(2) has been added.
 1.27 14-May-2001  jdolecek disable the code which maxes nofiles limit, it should not be normally
needed now
 1.26 30-Apr-2001  ross XXX
For some reason this program wants to open _hundreds_ of temporary files.
Make it setrlimit(RLIMIT_NOFILE, ...), so this rather dubious strategy at
least works well enough to ctag(1) our own kernel.
XXX
 1.25 22-Feb-2001  christos - use MAXPATHLEN (1024) instead of _POSIX_PATH_MAX (255) for the temporary
path buffer
- provide better error messages about why the temp file creation is failing
- explicitly compare syscall return to -1 instead of < 0 and fdopen return
to NULL instead of 0.
 1.24 21-Feb-2001  christos Fix problem when using sort >> foo
If no output file was specified sort fopened("/dev/stdout", "w").
This is *wrong* because "/dev/stdout" will truncate the output file,
thus undoing the append effect the shell had set up. The simple fix
here is to just arrange for outfp = stdout and don't play with /dev/stdout.

While I am here:
- KNF
- make pattern for mkstemp have 6 X's.
 1.23 19-Feb-2001  jdolecek full -T support
 1.22 19-Feb-2001  jdolecek resurrect old ftmp() - it supports alternative directory for temporary
file, which is needed for -T support
 1.21 07-Feb-2001  jdolecek use -R instead of -w, since that's what OpenBSD is using and there is no reason
to be different
 1.20 07-Feb-2001  jdolecek Since -T is used to select directory for temporary files in other sort
implementations, we should avoid using it for something else.
Use (new) flag -w for setting record delimiter, make -T noop.
 1.19 07-Feb-2001  jdolecek use errx(), not err() within section for '-t' flag
 1.18 13-Jan-2001  soren And make usage() test for NULL explicitly..
 1.17 13-Jan-2001  soren usage() expects a NULL when there is no specific error message.
 1.16 13-Jan-2001  jdolecek save couple of cycles and bytes by static initialization of sigaction act
and sigtable[]
 1.15 12-Jan-2001  jdolecek alltable[], itable[], dtable[] were moved to init.c, g/c from sort.[ch]
put extern declaration for gweights[] to sort.h
add -s/-S to usage(), couple of formatting nits
 1.14 11-Jan-2001  jdolecek the g/c in rev 1.12 was too aggressive - put back code
to change file '-' to '/dev/stdin'
 1.13 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.12 08-Jan-2001  jdolecek make ftmp() a wrapper around tmpfile(), there is no need to reimplement it
move ftmp() from tmp.c to files.c
g/c no longer needed stuff
 1.11 08-Jan-2001  jdolecek call setlocale() on startup
reformat the switch contents in main() a little, sort flags alphabetically
where possible
 1.10 08-Jan-2001  jdolecek constify a bit, small cleanups
 1.9 08-Jan-2001  jdolecek by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
 1.8 16-Oct-2000  jdolecek include a bit more information in error messages, constify
put temporary files in _PATH_TMP by default
 1.7 11-Oct-2000  thorpej Format string fixes.
 1.6 07-Oct-2000  bjh21 OpenBSD revision 1.5:
Normalize treatment of -n option. Don't know why it was ever special-cased
(since it was broken that way).
 1.5 07-Oct-2000  bjh21 OpenBSD revision 1.3:
for implied stdin, do not corrupt argv[0]
 1.4 07-Oct-2000  bjh21 Part of OpenBSD revision 1.2:
Fix err(3) usage.
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.44.16.1 18-May-2008  yamt sync with head.
 1.45.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.46.8.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.46.8.1 21-Apr-2010  matt sync to netbsd-5
 1.46.4.2 29-Jun-2010  riz Pull up following revision(s) (requested by dholland in ticket #1420):
usr.bin/sort/sort.h: revision 1.31
usr.bin/sort/sort.c: revision 1.58
usr.bin/sort/fsort.c: revision 1.47
usr.bin/sort/msort.c: revision 1.30
Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.46.4.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.47.2.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.63.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.36 01-Jun-2016  kre Add the posix -C option (-c but quieter). Fix -R to work properly when
setting \n as the record delimiter using a numeric value rather than literal
\n - and to not incorrectly turn \n into a field separator if -R is used to
make some other char the record separator (\n becomes a field separator in
that case as long as the field separator remains "white space" but should not
be in any other case - unless set explicitly of course.)

Plus more cosmetic changes - the man page and usage are updated to make it
more clear that the 2 (or 1) params to -k are not fields (field1 and field2)
but specifiers of the beginning and end of one key field. There was an
unused 'x' option in the GETOPTS string. The usage message is reformatted
to display properly on both 80 col and > 80 col displays (on < 80 it will
still probably look pretty ugly ... perhaps not quite so bad though), and
is also updated to show the different usage for the -c case (and -C) from the
others (only 1 file permitted) - the man page synopsis has a similar update.

Using more than one of -c -C or -m generates a usage message rather than
just ignoring the -m as it did before (there was no -C before of course).

Aside from the bug fix to the interaction between -R and -t, there are no
changes that affect the way anything is sorted (or read, or written).

Discussed on tech-userlevel earlier this week.
 1.35 05-Aug-2015  mrg add a description of what was being attempted to failed-write messages.
 1.34 16-Sep-2011  joerg Use __dead
 1.33 18-Dec-2010  christos Add an 'l' style for sorting that sorts by the string length of the field.
 1.32 05-Jun-2010  dholland fixit() needs to know the getopt options list to do its thing correctly.
 1.31 05-Feb-2010  enami Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.30 28-Sep-2009  dsl Fix borked fix for sort relying on realloc() changing the buffer end.
Sorts of more than 8MB data now probably work again.
 1.29 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.28 10-Sep-2009  dsl Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), simplifies the code and allows a 'posix' sort.
 1.27 05-Sep-2009  dsl Now we have our own radix_sort() change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
 1.26 05-Sep-2009  dsl Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
 1.25 22-Aug-2009  dsl Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
 1.24 20-Aug-2009  dsl Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
 1.23 18-Aug-2009  dsl The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
 1.22 16-Aug-2009  dsl Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data). Delete TRECHEADER.
 1.21 15-Aug-2009  dsl Remove reference to db.h by using separate ptr+len fields for the only
structure that used it.
Pass end of keybuf area, not size to enterkey() - largely to remove a
variable whose use isn't obvious from the name!
The structure of this code sucks.
 1.20 13-Apr-2009  lukem Fix WARNS=4 issues (-Wcast-qual -Wsign-compare)
 1.19 28-Apr-2008  martin branches: 1.19.6; 1.19.8; 1.19.12;
Remove clause 3 and 4 from TNF licenses
 1.18 15-Feb-2004  jdolecek branches: 1.18.32;
remove compile-time limit on number of -k options, allocate necessary
structures as-needed
 1.17 07-Aug-2003  jdolecek add TNF copyright
 1.16 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.15 25-Dec-2002  jdolecek make function merge() static in msort.c
cosmetic change to how local variable is incremented (moved to for(;;))
 1.14 24-Dec-2002  jdolecek put contents of extern.h directly to sort.h, and g/c extern.h
de-__P()
 1.13 24-Dec-2002  jdolecek add extern definition for ncols and clist[] to sort.h, eliminate extra
definitions in init.c and field.c
g/c MAXMERGE
 1.12 19-Feb-2001  jdolecek cosmetic changes - make keylist[] static and remove extern definition
in fsort.h, move macro SALIGN() from sort.h to fsort.c
 1.11 19-Jan-2001  jdolecek adjust indentation
 1.10 16-Jan-2001  shin - fix alignment problem.
 1.9 12-Jan-2001  jdolecek alltable[], itable[], dtable[] were moved to init.c, g/c from sort.[ch]
put extern declaration for gweights[] to sort.h
 1.8 11-Jan-2001  jdolecek general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
 1.7 08-Jan-2001  jdolecek constify a bit, small cleanups
 1.6 08-Jan-2001  jdolecek by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
 1.5 16-Oct-2000  jdolecek enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
 1.4 07-Oct-2000  simonb Include <string.h> to get prototype for memcpy(). Fixed compile problems
on alpha (and other LP64 archs?).

XXX: Can't gcc be fixed so that it doesn't auto-prototype mem*()??
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.18.32.1 18-May-2008  yamt sync with head.
 1.19.12.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE
 1.19.12.1 21-Apr-2010  matt sync to netbsd-5
 1.19.8.1 13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.19.6.2 29-Jun-2010  riz Pull up following revision(s) (requested by dholland in ticket #1420):
usr.bin/sort/sort.h: revision 1.31
usr.bin/sort/sort.c: revision 1.58
usr.bin/sort/fsort.c: revision 1.47
usr.bin/sort/msort.c: revision 1.30
Don't touch past the end of the allocated region. It results in a segmentation
violation.
 1.19.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
 1.16 06-Nov-2009  joerg Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
 1.15 26-Sep-2009  dsl Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
 1.14 15-Aug-2009  dsl Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
 1.13 28-Apr-2008  martin branches: 1.13.6; 1.13.12;
Remove clause 3 and 4 from TNF licenses
 1.12 21-Feb-2007  hubertf branches: 1.12.10;
<ctype.h> is unused. What's still needed is <sys/cdefs.h> (which is
usually included at that place anyways).

From Slava Semushin <slava.semushin@gmail.com>.
 1.11 07-Aug-2003  jdolecek add TNF copyright
 1.10 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.9 23-Dec-2002  jdolecek simplify a bit (no need for separate 'char *path')
 1.8 23-Feb-2001  jdolecek Use MAXPATHLEN (which is 1024) instead of _POSIX_PATH_MAX (which is only 255).
This change tracks change in rev 1.25 of sort.c by Christos Zoulas.
While here, improve error messages slightly.
 1.7 19-Feb-2001  jdolecek resurrect old ftmp() - it supports alternative directory for temporary
file, which is needed for -T support
 1.6 08-Jan-2001  jdolecek make ftmp() a wrapper around tmpfile(), there is no need to reimplement it
move ftmp() from tmp.c to files.c
g/c no longer needed stuff
 1.5 16-Oct-2000  jdolecek include a bit more information in error messages
 1.4 11-Oct-2000  thorpej Format string fixes.
 1.3 07-Oct-2000  bjh21 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
 1.2 07-Oct-2000  bjh21 Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
 1.1 07-Oct-2000  bjh21 branches: 1.1.1;
Initial revision
 1.1.1.1 07-Oct-2000  bjh21 4.4BSD-Lite2 contrib/sort
 1.12.10.1 18-May-2008  yamt sync with head.
 1.13.12.1 21-Apr-2010  matt sync to netbsd-5
 1.13.6.1 14-Oct-2009  sborrill Pull up the following revision(s) (requested by dsl in ticket #1084):
usr.bin/sort/Makefile: revision 1.6-1.8
usr.bin/sort/append.c: revision 1.15-1.22
usr.bin/sort/fields.c: revision 1.20-1.30
usr.bin/sort/files.c: revision 1.27-1.40
usr.bin/sort/fsort.c: revision 1.33-1.45
usr.bin/sort/fsort.h: revision 1.14-1.17
usr.bin/sort/init.c: revision 1.19-1.23
usr.bin/sort/msort.c: revision 1.19-1.28
usr.bin/sort/radix_sort.c: revision 1.1-1.4
usr.bin/sort/sort.1: revision 1.27-1.29
usr.bin/sort/sort.c: revision 1.47-1.56
usr.bin/sort/sort.h: revision 1.20-1.30
usr.bin/sort/tmp.c: revision 1.14-1.15

Only use radix sort for in-memory sort, always merge temporary files.
Use a local radixsort() function so we can pass record length.
Avoid use of weight tables for key compares.
Fix generation of keys for numbers, negate value for reverse sort.
Write file in reverse-key order for 'sort -n'.
'sort -S' now does a posix sort (sort matching keys by record data).
Ensure merge sort doesn't have too many temporary files open.
Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
PR#37860 PR#39308 PR#42094
