	@(#)README	5.3 (Berkeley) 9/17/85

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
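
The first item in the list above (division hash changed to xor) can be
illustrated with a minimal sketch.  This is not the code from compress.c;
the names hash_div and hash_xor, the table size HSIZE, and the shift
amounts are assumptions chosen for illustration only.

```c
/* Hypothetical table size: a prime somewhat larger than 2^16, so that
 * every 16-bit code has a slot in an open-addressed hash table. */
#define HSIZE 69001L

/* Division-style hash: fold the (character, prefix code) pair into one
 * key and reduce it modulo the table size.  One division per lookup. */
static long hash_div(unsigned char c, unsigned int prefix_code)
{
    long key = ((long)c << 16) + (long)(prefix_code & 0xFFFFu);
    return key % HSIZE;
}

/* Xor-style hash: shift the character into the high byte of a 16-bit
 * value and xor in the prefix code.  The result is always below 65536,
 * so it is a valid index for HSIZE without any division. */
static long hash_xor(unsigned char c, unsigned int prefix_code)
{
    return ((long)c << 8) ^ (long)(prefix_code & 0xFFFFu);
}
```

Both functions map a pair to a slot below HSIZE; the xor form only needs
a shift and an xor, which is the kind of saving the speedup item refers to.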

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.
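
The table above can be read as a simple threshold lookup.  The following
is a sketch only: the function name max_bits_for_memory is made up for
illustration, and compress.c itself makes this choice with preprocessor
conditionals rather than at run time.

```c
/* Largest BITS permitted for `avail` bytes, where
 * avail = USERMEM - SACREDMEM.  The thresholds are copied from the
 * version 4.0 table; the function name is hypothetical. */
static int max_bits_for_memory(long avail)
{
    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;                      /* the floor: any machine gets 12 */
}
```

For example, with 300,000 bytes available the lookup gives 15, so "-b16"
would be refused on such a build.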

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
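
The warning above comes down to a check on the file header.  The sketch
below assumes the header layout used since version 2.0: the magic number
\037\235 followed by one byte that records the bits.  That the bits sit in
the low five bits of that byte is an assumption taken from the compress.c
source, not from this README, and the function names are hypothetical.

```c
#include <stddef.h>

#define MAGIC_0   0x1F   /* \037 */
#define MAGIC_1   0x9D   /* \235 */
#define BIT_MASK  0x1F   /* assumed: low five bits of byte 3 hold BITS */

/* Returns the BITS value recorded in a compress header, or -1 if the
 * buffer is too short or does not begin with the magic number. */
static int header_bits(const unsigned char *hdr, size_t len)
{
    if (len < 3 || hdr[0] != MAGIC_0 || hdr[1] != MAGIC_1)
        return -1;
    return hdr[2] & BIT_MASK;
}

/* Nonzero if a decompressor built for at most `local_max_bits` can
 * handle a file that starts with this header. */
static int can_decompress(const unsigned char *hdr, size_t len,
                          int local_max_bits)
{
    int bits = header_bits(hdr, len);
    return bits >= 0 && bits <= local_max_bits;
}
```

A file written with 16 bits fails this check on a build limited to 12,
which is the situation "-b12" is meant to avoid.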

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings are generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
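
The "block" compression check in item 1 above can be sketched as follows.
This is an illustration under stated assumptions, not the routine from
compress.c: the struct, the function name, and the 8-bit fixed-point
scaling are all hypothetical, and the sketch treats "not improving" the
same as "decreasing".

```c
/* Running state for the periodic check (hypothetical names). */
struct ratio_state {
    long best_ratio;    /* best scaled ratio seen since the last clear */
};

/* Called every so often once the code table is full.  Returns 1 if the
 * table should be cleared and rebuilt, 0 if compression should carry on
 * with the current table.  Integer arithmetic only: the ratio is
 * input bytes / output bytes, scaled by 256. */
static int should_clear_table(struct ratio_state *st,
                              long bytes_in, long bytes_out)
{
    long ratio;

    if (bytes_out <= 0)
        return 0;                       /* nothing written yet */

    /* long long intermediate so that bytes_in * 256 cannot overflow
     * where long is only 32 bits wide */
    ratio = (long)(((long long)bytes_in * 256) / bytes_out);

    if (ratio > st->best_ratio) {       /* still improving: keep table */
        st->best_ratio = ratio;
        return 0;
    }
    st->best_ratio = 0;                 /* start over after a clear */
    return 1;
}
```

When the function returns 1 the compressor would emit a clear code and
start a fresh table, which is why 2.0 cannot read such output.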

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844