        @(#)README      8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
        o compress() speedup (10-50%) by changing division hash to xor
        o decompress() speedup (5-10%)
        o Memory requirements reduced (3-30%)
        o Stack requirements reduced to less than 4kb
        o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
        o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
        o Default to 'quiet' mode
        o Unification of 'force' flags
        o Manual page overhaul
        o Portability enhancement for M_XENIX
        o Removed text on #else and #endif
        o Added "-V" switch to print version and options
        o Added #defines for SIGNED_COMPARE_SLOW
        o Added Makefile and "usermem" program
        o Removed all floating point computations
        o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.]  If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

        o USERMEM               Maximum process memory on the system
        o SACREDMEM             Amount to reserve for other processes
        o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
        o NO_UCHAR              Don't use "unsigned char" types
        o BITS                  Overrules default set by USERMEM-SACREDMEM
        o vax                   Generate inline assembler
        o interdata             Defines SIGNED_COMPARE_SLOW
        o M_XENIX               Makes arrays < 65536 bytes each
        o pdp11                 BITS=12, NO_UCHAR
        o z8000                 BITS=12
        o pcxt                  BITS=12
        o BSD4_2                Allow long filenames ( > 14 characters) &
                                Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least                BITS
------  -- -----                ----
     433,484                     16
     229,600                     15
     127,536                     14
      73,464                     13
           0                     12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.
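
As a rough illustration of the memory table above, the following Python
sketch looks up the maximum BITS for a given "usermem-sacredmem" difference
and checks the case described in the WARNING.  The names BITS_BY_MEMORY,
max_bits and can_decompress are made up for this example and do not appear
in compress.c.  The sketch ignores any "-DBITS=bits" override, and its
handling of a negative difference is an assumption.

```python
# Thresholds copied from the compress 4.0 table in this README:
# (minimum "usermem-sacredmem" in bytes, maximum BITS).
BITS_BY_MEMORY = [
    (433484, 16),
    (229600, 15),
    (127536, 14),
    (73464, 13),
    (0, 12),
]


def max_bits(usermem, sacredmem=0):
    """Return the largest BITS the table allows for usermem - sacredmem."""
    usable = usermem - sacredmem
    for minimum, bits in BITS_BY_MEMORY:
        if usable >= minimum:
            return bits
    # Assumption: a negative difference is treated like the smallest row.
    return 12


def can_decompress(file_bits, usermem, sacredmem=0):
    """True if a file written with file_bits fits the local maximum BITS."""
    return file_bits <= max_bits(usermem, sacredmem)
```

For example, max_bits(750000, 400000) gives 15, and can_decompress(16, 65536)
is false, which is the situation the WARNING describes; a file written with
"-b12" passes the same check.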

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed.  After the BITS run out, the
        compression ratio is checked every so often.  If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 not compatible with that of
        compress 2.0.  However, compress 3.0 still accepts the output of
        compress 2.0.  To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing.  This improves the speed of the program, especially
        during decompression.  Other speed improvements have been made,
        such as using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified.  This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.  Note that the
        speed improvement only occurs when the input file is > 30000
        characters, and the -b BITS is less than or equal to the cutoff
        described below.

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least                BITS            cutoff
------  -- -----                ----            ------
   4,718,592                     16               13
   2,621,440                     16               12
   1,572,864                     16               11
   1,048,576                     16               10
     631,808                     16               --
     329,728                     15               --
     178,176                     14               --
      99,328                     13               --
           0                     12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.     The packed files produced by compress are different on different
>       machines and dependent on the vax sysgen option.
>               The bug was in the different byte/bit ordering on the
>               various machines.  This has been fixed.
>
>               This version is NOT compatible with the original vax posting
>               unless the '-DCOMPATIBLE' option is specified to the C
>               compiler.  The original posting has a bug which I fixed,
>               causing incompatible files.  I recommend you NOT to use this
>               option unless you already have a lot of packed files from
>               the original posting by thomas.
>2.     The exit status is not well defined (on some machines) causing the
>       scripts to fail.
>               The exit status is now 0,1 or 2 and is documented in
>               compress.l.
>3.     The function getopt() is not available in all C libraries.
>               The function getopt() is no longer referenced by the
>               program.
>4.     Error status is not being checked on the fwrite() and fflush() calls.
>               Fixed.
>
>The following enhancements have been made:
>
>1.     Added facilities of "compact" into the compress program.  "Pack",
>       "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.     Installed work around for C compiler bug with "-O".
>3.     Added a magic number header (\037\235).  Put the bits specified
>       in the file.
>4.     Added "-f" flag to force overwrite of output file.
>5.     Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>       compile.
>6.     The 'uncompress' script has been deleted; simply
>       'ln compress uncompress' after you compile and it will work.
>7.     Removed extra bit masking for machines that support unsigned
>       characters.  If your machine doesn't support unsigned characters,
>       define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844