	@(#)README	8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed trailing text on #else and #endif lines
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
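The move from a division hash to an xor hash can be illustrated with a small open-addressing string table. This is a minimal sketch, not the compress.c source: the table size HSIZE (the prime 69001), the shift of 8, the secondary-probe rule, and the names hash_div, hash_xor, table_clear, and table_find_or_add are assumptions chosen for illustration. Each entry keys a string by its prefix code plus one appended byte.

```c
#define MAXBITS 16       /* largest code width handled by this sketch */
#define HSIZE   69001L   /* assumed table size: a prime a little above 2^16 */
#define HSHIFT  8        /* assumed: c << 8 spreads the byte over 16 bits */

static long           htab[HSIZE];    /* key (c << MAXBITS) + prefix, or -1 if empty */
static unsigned short codetab[HSIZE]; /* code assigned to the key in the same slot */

/* Older style: one long division (modulo) to find the first slot. */
long hash_div(int c, int ent)
{
    return (((long)c << MAXBITS) + ent) % HSIZE;
}

/* Newer style: a shift and an xor; the result is below 65536, so it
   always falls inside the table. */
long hash_xor(int c, int ent)
{
    return ((long)c << HSHIFT) ^ ent;
}

void table_clear(void)
{
    for (long i = 0; i < HSIZE; i++)
        htab[i] = -1;
}

/* Look up the string "prefix code ent followed by byte c".  If present,
   return its code; otherwise store it under next_code and return -1.
   Assumes the table is never completely full. */
long table_find_or_add(int ent, int c, unsigned short next_code)
{
    long fcode = ((long)c << MAXBITS) + ent;
    long i = hash_xor(c, ent);
    long disp = (i == 0) ? 1 : HSIZE - i;   /* secondary probe step */

    while (htab[i] >= 0) {
        if (htab[i] == fcode)
            return codetab[i];
        if ((i -= disp) < 0)
            i += HSIZE;
    }
    htab[i] = fcode;
    codetab[i] = next_code;
    return -1;
}
```

Collisions are resolved by stepping through the table by a fixed displacement; because HSIZE is prime, the probe sequence eventually reaches every slot.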

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.]  If you can't get it to work at all, just create a file
named "USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames (> 14 characters) and
				call setlinebuf(stderr)
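NO_UCHAR matters because, on a compiler where plain char is signed, a byte such as 0xE9 reads back as a negative number and has to be masked before it is used as a code or table index. The sketch below shows the kind of conditional typedef involved; the names char_type and BYTE_VALUE and the byte_sum helper are hypothetical, not taken from compress.c.

```c
/* Build with -DNO_UCHAR on compilers without "unsigned char". */
#ifdef NO_UCHAR
typedef char char_type;
/* Plain char may be signed: mask off any sign extension. */
#define BYTE_VALUE(b) ((int)(b) & 0xFF)
#else
typedef unsigned char char_type;
/* Already 0..255: no masking needed. */
#define BYTE_VALUE(b) ((int)(b))
#endif

/* Sum of the byte values in a buffer, each treated as 0..255. */
long byte_sum(const char_type *buf, int len)
{
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += BYTE_VALUE(buf[i]);
    return sum;
}
```

This is also why the 2.0 notes could drop the extra masking on machines that do have unsigned characters: the mask is only needed in the NO_UCHAR branch.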

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
----------------		----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.
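The table amounts to a chain of threshold tests on usermem-sacredmem. In compress.c the choice is made at compile time from the USERMEM and SACREDMEM symbols; the sketch below restates the same thresholds as an ordinary run-time function so they can be checked. The function name max_bits_for_memory is hypothetical.

```c
/* Thresholds from the table above: minimum usermem-sacredmem (bytes)
   needed for each maximum code width. */
static const struct {
    long min_memory;
    int  bits;
} bits_table[] = {
    { 433484L, 16 },
    { 229600L, 15 },
    { 127536L, 14 },
    {  73464L, 13 },
    {      0L, 12 },
};

/* Return the largest BITS allowed for the given memory budget. */
int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    int n = (int)(sizeof bits_table / sizeof bits_table[0]);

    for (int i = 0; i < n; i++)
        if (avail >= bits_table[i].min_memory)
            return bits_table[i].bits;
    return 12;   /* negative budget: fall back to the smallest width */
}
```

For example, usermem=300,000 with sacredmem=50,000 leaves 250,000 bytes, which clears the 229,600 threshold but not 433,484, so the maximum is BITS=15.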

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed there!  Use
the "-b12" flag to generate a file on a large machine that can be
uncompressed on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.
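Both compatibility notes come down to the three-byte header at the start of every compressed file: the magic bytes \037\235, then a byte describing the code width. The sketch below assumes the layout used by compress.c, in which the low five bits of the third byte hold the maximum bits and the high bit (0x80) marks block compression; output made with "-C" is assumed to leave that bit clear. The function check_header and its return codes are hypothetical.

```c
#include <stddef.h>

#define MAGIC_1    0x1F   /* \037 */
#define MAGIC_2    0x9D   /* \235 */
#define BIT_MASK   0x1F   /* low five bits: maximum code width */
#define BLOCK_MASK 0x80   /* high bit: block compression (3.0 and later) */

enum header_status { HDR_OK, HDR_NOT_COMPRESSED, HDR_TOO_MANY_BITS };

/* Inspect the first bytes of a compressed file.  local_bits is the
   BITS this copy of compress was built with. */
enum header_status check_header(const unsigned char *hdr, size_t len,
                                int local_bits, int *maxbits, int *block_mode)
{
    if (len < 3 || hdr[0] != MAGIC_1 || hdr[1] != MAGIC_2)
        return HDR_NOT_COMPRESSED;
    *maxbits = hdr[2] & BIT_MASK;
    *block_mode = (hdr[2] & BLOCK_MASK) != 0;
    if (*maxbits > local_bits)
        return HDR_TOO_MANY_BITS;   /* e.g. a 16-bit file on a BITS=12 build */
    return HDR_OK;
}
```

A file written on a large machine with "-b12" carries 12 in this byte, which is why a build limited to BITS=12 can still read it.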

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out (the code
	table is full), the compression ratio is checked periodically.  If it
	is decreasing, the table is cleared and a new set of substrings is
	generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual-memory machine.  Note that
	the speed improvement only occurs when the input file is larger than
	30,000 characters and the -b BITS is less than or equal to the
	cutoff described below.
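The block-compression test in item 1 can be sketched as a periodic ratio check. This is a minimal sketch of the idea, not code copied from compress.c: the check interval CHECK_GAP (10,000 input bytes), the 8-bit fixed-point ratio, and the names ratio_state and ratio_check are assumptions. Integer arithmetic is used throughout, in keeping with the 4.0 note that floating point was removed.

```c
#define CHECK_GAP 10000L   /* assumed: input bytes between ratio checks */

struct ratio_state {
    long checkpoint;   /* in_count at which the next check is due */
    long ratio;        /* best ratio seen since the last table clear */
};

/* Called once the code table is full.  Returns 1 if the table should be
   cleared (compression has stopped improving), 0 otherwise. */
int ratio_check(struct ratio_state *st, long in_count, long bytes_out)
{
    long rat;

    if (in_count < st->checkpoint)
        return 0;                       /* not time to check yet */
    st->checkpoint = in_count + CHECK_GAP;

    /* input/output ratio with 8 fractional bits, no floating point */
    if (in_count > 0x007FFFFFL) {       /* in_count << 8 could overflow 32 bits */
        long out = bytes_out >> 8;
        rat = out > 0 ? in_count / out : 0x7FFFFFFFL;
    } else {
        rat = bytes_out > 0 ? (in_count << 8) / bytes_out : 0x7FFFFFFFL;
    }

    if (rat > st->ratio) {
        st->ratio = rat;                /* still improving: keep the table */
        return 0;
    }
    st->ratio = 0;                      /* getting worse: start a new table */
    return 1;
}
```

Resetting the saved ratio to zero after a clear means the first check on a fresh table always keeps it.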

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, also specify
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
----------------		----		------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 bytes, which gives a maximum BITS=16 and
no large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has 16-bit "int"s, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
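The links work because a single binary can choose its behavior from the name it was invoked under (argv[0]). A minimal sketch of that dispatch, assuming only the three names used above; the enum and the function mode_from_argv0 are hypothetical, and the real program also honors flags such as "-c".

```c
#include <string.h>

enum run_mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Decide what to do from the program name, ignoring any directory prefix,
   so /usr/local/zcat and ./zcat behave alike. */
enum run_mode mode_from_argv0(const char *argv0)
{
    const char *name = strrchr(argv0, '/');

    name = name ? name + 1 : argv0;
    if (strcmp(name, "uncompress") == 0)
        return MODE_UNCOMPRESS;            /* decompress the named files */
    if (strcmp(name, "zcat") == 0)
        return MODE_ZCAT;                  /* decompress to standard output */
    return MODE_COMPRESS;                  /* "compress" or any other name */
}
```

Because the decision rests on the name alone, hard links (rather than copies or wrapper scripts) are enough to install all three commands.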

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting had a bug; fixing it made
>		the files incompatible.  I recommend that you NOT use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines), causing
>	scripts to fail.
>		The exit status is now 0, 1, or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added the facilities of "compact" to the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed a workaround for a C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  The number of bits
>	specified is stored in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>early.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June issue of Computer magazine.  The author
>>>claimed blinding speed and good compression ratios.  It's certainly faster
>>>than compact (but, then, what wouldn't be), but it's also the same speed
>>>as pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code; you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
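The byte-order problems described in the 2.0 notes and in the original author's README (machine-dependent packing, and the VAX extv/insv instructions) go away if variable-width codes are packed into bytes with shifts and masks alone. The sketch below packs codes least significant bit first, which is assumed here to match the order compress writes; it leaves out compress's own output buffering, and the bitbuf type and the functions put_code, flush_codes, and get_code are hypothetical.

```c
#include <stddef.h>

/* Accumulates codes and writes whole bytes, independent of host byte order. */
struct bitbuf {
    unsigned char *out;    /* output buffer */
    size_t         len;    /* bytes written so far */
    unsigned long  acc;    /* pending bits, lowest bit first */
    int            nbits;  /* number of pending bits in acc */
};

/* Append an n_bits-wide code (n_bits <= 16), least significant bit first. */
void put_code(struct bitbuf *bb, unsigned code, int n_bits)
{
    bb->acc |= (unsigned long)(code & ((1u << n_bits) - 1)) << bb->nbits;
    bb->nbits += n_bits;
    while (bb->nbits >= 8) {
        bb->out[bb->len++] = (unsigned char)(bb->acc & 0xFF);
        bb->acc >>= 8;
        bb->nbits -= 8;
    }
}

/* Write out any remaining bits, padded with zeros to a whole byte. */
void flush_codes(struct bitbuf *bb)
{
    if (bb->nbits > 0)
        bb->out[bb->len++] = (unsigned char)(bb->acc & 0xFF);
    bb->acc = 0;
    bb->nbits = 0;
}

/* Read the n_bits-wide code starting at bit offset *bitpos in buf. */
unsigned get_code(const unsigned char *buf, size_t *bitpos, int n_bits)
{
    unsigned long value = 0;
    for (int i = 0; i < n_bits; i++, (*bitpos)++) {
        unsigned bit = (buf[*bitpos >> 3] >> (*bitpos & 7)) & 1u;
        value |= (unsigned long)bit << i;
    }
    return (unsigned)value;
}
```

Because every byte is produced by shifting and masking an integer, the packed file comes out the same on any machine, whatever its native byte order.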