Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility. The format includes a
   cyclic redundancy check value for detecting data corruption. The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods. The format can be
   implemented readily in a manner not covered by patents.


Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
            2.3.1.1. Extra field
            2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes). It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here. The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.) See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification. In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do
      pre- and post-conditioning. Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit. However, a byte considered as
      an integer between 0 and 255 does have a most- and
      least-significant bit, and since we write numbers with the
      most-significant digit on the left, we also write bytes with the
      most-significant bit on the left. In the diagrams below, we
      number the bits of a byte so that bit 0 is the least-significant
      bit, i.e., the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes. All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

                0        1
            +--------+--------+
            |00001000|00000010|
            +--------+--------+
             ^        ^
             |        |
             |        + more significant byte = 2 x 256
             + less significant byte = 8
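The byte order above can be illustrated with a short C sketch. The helper names get_le16 and get_le32 are hypothetical and not part of this specification; they assemble each value with shifts, so the result is the same whether the host CPU is big- or little-endian.

```c
/* Hypothetical helpers: decode multi-byte numbers stored
   least-significant byte first, independent of host byte order. */
unsigned int get_le16(const unsigned char *p)
{
    /* p[0] is the less significant byte, p[1] the more significant. */
    return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

unsigned long get_le32(const unsigned char *p)
{
    return (unsigned long) p[0]
         | ((unsigned long) p[1] << 8)
         | ((unsigned long) p[2] << 16)
         | ((unsigned long) p[3] << 24);
}
```

For the example above, the two bytes 0x08, 0x02 decode to 8 + 2 x 256 = 520.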

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets). The format of each member is specified in the following
      section. The members simply appear one after another in the file,
      with no additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+
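As an illustration of the fixed-length part of this layout (the first 10 bytes), the following C sketch copies those bytes into a struct. The struct and function names are assumptions made for this example. MTIME is decoded least-significant byte first, per section 2.1. The sketch does not validate any field; section 2.3.1.2 states what a decompressor must check.

```c
#include <stddef.h>

/* Hypothetical struct for the fixed 10-byte part of a member header,
   in the order shown in the diagram. */
struct gz_fixed_header {
    unsigned char id1, id2;   /* expected 31 and 139 */
    unsigned char cm;         /* compression method, 8 = deflate */
    unsigned char flg;        /* flag bits */
    unsigned long mtime;      /* 4 bytes, least-significant byte first */
    unsigned char xfl;        /* extra flags */
    unsigned char os;         /* operating system / file system */
};

/* Fill *h from the first 10 bytes of buf. Returns 0 on success,
   -1 if fewer than 10 bytes are available. No values are checked. */
int parse_fixed_header(const unsigned char *buf, size_t len,
                       struct gz_fixed_header *h)
{
    if (len < 10)
        return -1;
    h->id1 = buf[0];
    h->id2 = buf[1];
    h->cm  = buf[2];
    h->flg = buf[3];
    h->mtime = (unsigned long) buf[4]
             | ((unsigned long) buf[5] << 8)
             | ((unsigned long) buf[6] << 16)
             | ((unsigned long) buf[7] << 24);
    h->xfl = buf[8];
    h->os  = buf[9];
    return 0;
}
```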

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.
            CM = 0-7 are reserved. CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved
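The bit assignments above can be expressed as masks. The macro names and the helper in this minimal sketch are hypothetical, chosen for illustration; the mask values follow directly from the bit numbers, with bit 0 as the least-significant bit.

```c
/* Illustrative masks for the FLG byte (bit 0 = least significant). */
#define GZ_FTEXT     0x01  /* bit 0 */
#define GZ_FHCRC     0x02  /* bit 1 */
#define GZ_FEXTRA    0x04  /* bit 2 */
#define GZ_FNAME     0x08  /* bit 3 */
#define GZ_FCOMMENT  0x10  /* bit 4 */
#define GZ_FRESERVED 0xE0  /* bits 5, 6 and 7 */

/* Returns nonzero if all reserved FLG bits are zero, as required. */
int flg_reserved_bits_clear(unsigned char flg)
{
    return (flg & GZ_FRESERVED) == 0;
}
```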

            If FTEXT is set, the file is probably ASCII text. This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present. In case of doubt, FTEXT
            is cleared, indicating binary data. For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data. The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]
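A sketch of the CRC16 computation, assuming the header bytes (from ID1 up to but not including the CRC16 field) are already in one buffer. To stay self-contained it uses a bitwise CRC-32 loop instead of the table-driven code in the appendix; both use the polynomial 0xedb88320 with one's-complement pre- and post-conditioning, so they should produce the same value. The function names are hypothetical.

```c
#include <stddef.h>

/* Bitwise CRC-32: reflected polynomial 0xedb88320, register preset to
   all ones and complemented at the end, like the appendix code. */
unsigned long crc32_bitwise(const unsigned char *buf, size_t len)
{
    unsigned long c = 0xffffffffUL;
    size_t n;
    int k;

    for (n = 0; n < len; n++) {
        c ^= buf[n];                      /* fold in the next byte */
        for (k = 0; k < 8; k++)           /* then process it bit by bit */
            c = (c & 1) ? 0xedb88320UL ^ (c >> 1) : c >> 1;
    }
    return c ^ 0xffffffffUL;
}

/* FHCRC value: the two least significant bytes of the CRC-32 over all
   header bytes that precede the CRC16 field. */
unsigned int gzip_header_crc16(const unsigned char *hdr, size_t hdr_len)
{
    return (unsigned int) (crc32_bitwise(hdr, hdr_len) & 0xffffUL);
}
```

The 16-bit result is stored least-significant byte first, like every other multi-byte number in the format. The ASCII string "123456789", whose CRC-32 is 0xCBF43926, is a conventional input for checking any CRC-32 implementation.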

            If FEXTRA is set, optional extra fields are present, as
            described in section 2.3.1.1.

            If FNAME is set, an original file name is present,
            terminated by a zero byte. The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set. This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case-insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present. This comment is not interpreted; it is only
            intended for human consumption. The comment must consist of
            ISO 8859-1 (LATIN-1) characters. Line breaks should be
            denoted by a single line feed character (10 decimal).
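A decompressor that never displays FNAME or FCOMMENT must still find where each field ends. This minimal C sketch uses a hypothetical helper name; it guards against a missing terminator in truncated input and does not decode the ISO 8859-1 bytes it skips.

```c
#include <stddef.h>

/* Hypothetical helper: starting at offset pos, skip a zero-terminated
   field such as FNAME or FCOMMENT. Returns the offset just past the
   terminating zero byte, or -1 if no zero byte occurs before len. */
long skip_zero_terminated(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len) {
        if (buf[pos++] == 0)
            return (long) pos;
    }
    return -1;   /* truncated or malformed header */
}
```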

            Reserved FLG bits must be zero.

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed. The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.) If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started. MTIME = 0 means no time stamp is
            available.
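A sketch of interpreting MTIME, assuming the host's time_t counts seconds from the same 1970 epoch (true on POSIX systems, but an assumption elsewhere). The helper name is hypothetical; it uses the standard gmtime and strftime functions and treats MTIME = 0 as "no time stamp". On hosts with a 32-bit signed time_t, MTIME values beyond 2^31 - 1 will not convert correctly.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format MTIME as a UTC date string.
   Returns 1 on success, 0 if MTIME == 0 (no time stamp) or if the
   value cannot be converted or formatted. */
int format_mtime(unsigned long mtime, char *out, size_t out_size)
{
    time_t t;
    struct tm *tm;

    if (mtime == 0)
        return 0;                 /* no time stamp available */
    t = (time_t) mtime;           /* assumes a 1970-based time_t */
    tm = gmtime(&t);              /* MTIME is in GMT/UTC, not local time */
    if (tm == NULL)
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", tm) != 0;
}
```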

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods. The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place. This may be useful in determining the end-of-line
            convention for text files. The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown
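A small lookup table can map the OS byte to the names listed above. This C sketch is illustrative only: the function name and the "undefined" label for unlisted values (14 through 254) are assumptions, and per section 2.3.1.2 a decompressor may ignore OS entirely.

```c
#include <stddef.h>

/* Hypothetical lookup of the OS byte, following the table above. */
const char *gzip_os_name(unsigned char os)
{
    static const char *const names[] = {
        "FAT filesystem (MS-DOS, OS/2, NT/Win32)", "Amiga",
        "VMS (or OpenVMS)", "Unix", "VM/CMS", "Atari TOS",
        "HPFS filesystem (OS/2, NT)", "Macintosh", "Z-System",
        "CP/M", "TOPS-20", "NTFS filesystem (NT)", "QDOS",
        "Acorn RISCOS"
    };

    if ((size_t) os < sizeof names / sizeof names[0])
        return names[os];         /* values 0 through 13 */
    if (os == 255)
        return "unknown";
    return "undefined";           /* not assigned by this table */
}
```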

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field. See section 2.3.1.1 for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42. (See http://www.iso.ch for
            ordering ISO documents. See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.
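A sketch of trailer verification, assuming the decompressor has already computed the CRC-32 of its output (for example with the crc() and update_crc() functions in the appendix) and counted the output bytes in a 64-bit counter. The function name is hypothetical, and unsigned long long requires a C99 compiler. Masking the count applies the modulo 2^32 rule, so members longer than 4 GiB still verify.

```c
/* Hypothetical check of the 8-byte member trailer. crc is the CRC-32
   the decompressor computed over its output; size is the number of
   bytes it produced. Returns 1 if CRC32 and ISIZE both match. */
int check_trailer(const unsigned char trailer[8],
                  unsigned long crc, unsigned long long size)
{
    /* Both fields are stored least-significant byte first. */
    unsigned long stored_crc = (unsigned long) trailer[0]
                             | ((unsigned long) trailer[1] << 8)
                             | ((unsigned long) trailer[2] << 16)
                             | ((unsigned long) trailer[3] << 24);
    unsigned long stored_isize = (unsigned long) trailer[4]
                               | ((unsigned long) trailer[5] << 8)
                               | ((unsigned long) trailer[6] << 16)
                               | ((unsigned long) trailer[7] << 24);

    /* ISIZE holds the input length modulo 2^32. */
    return stored_crc == (crc & 0xffffffffUL)
        && stored_isize == (unsigned long) (size & 0xffffffffULL);
}
```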

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes. It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value. Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use. Subfield
         IDs with SI2 = 0 are reserved for future use. The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.
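Walking the subfields of an extra field could look like the following C sketch. It only counts subfields and checks that each LEN (stored least-significant byte first) stays within XLEN. The function name is hypothetical; a real reader would act on the SI1/SI2 IDs it recognizes and skip the rest.

```c
#include <stddef.h>

/* Hypothetical walker over the XLEN bytes of an extra field. Returns the
   number of well-formed subfields, or -1 if a subfield header or its
   data would run past the end of the field. */
int count_extra_subfields(const unsigned char *extra, size_t xlen)
{
    size_t pos = 0;
    int count = 0;

    while (pos < xlen) {
        size_t len;

        if (xlen - pos < 4)
            return -1;                  /* SI1, SI2, LEN need 4 bytes */
        /* LEN is little-endian and excludes the 4-byte prefix. */
        len = (size_t) extra[pos + 2] | ((size_t) extra[pos + 3] << 8);
        if (xlen - pos - 4 < len)
            return -1;                  /* data extends beyond XLEN */
        /* A real reader would inspect extra[pos] (SI1) and
           extra[pos + 1] (SI2) here and act on IDs it knows. */
        pos += 4 + len;
        count++;
    }
    return count;
}
```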

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others). The compressor must set all reserved
         bits to zero.
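The compressor rules above permit a very small fixed header. The following hypothetical C helper sketches such a default header. It covers only the header: the CRC32 and ISIZE in the trailer must still be computed from the actual input.

```c
/* Hypothetical helper: write the minimal 10-byte header a compliant
   compressor may emit, with required ID1, ID2 and CM values and every
   other fixed-length field at its default. FLG = 0, so no optional
   fields follow. */
void write_minimal_header(unsigned char out[10])
{
    out[0] = 31;                            /* ID1 */
    out[1] = 139;                           /* ID2 */
    out[2] = 8;                             /* CM = deflate */
    out[3] = 0;                             /* FLG: reserved bits zero */
    out[4] = out[5] = out[6] = out[7] = 0;  /* MTIME = 0: no time stamp */
    out[8] = 0;                             /* XFL */
    out[9] = 255;                           /* OS = unknown */
}
```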

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present. It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant. A
         compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.
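Combining the decompressor rules, header validation might look like the following minimal C sketch. The function names are hypothetical, and the sketch assumes the whole header is already in memory. It rejects any CM other than 8, which suits a decompressor that supports only deflate, and it skips the FHCRC value without verifying it.

```c
#include <stddef.h>

/* Advance *pos past a zero-terminated field (FNAME or FCOMMENT).
   Returns 0 on success, -1 if no terminating zero byte is found. */
static int skip_string(const unsigned char *buf, size_t len, size_t *pos)
{
    while (*pos < len) {
        if (buf[(*pos)++] == 0)
            return 0;
    }
    return -1;
}

/* Hypothetical validator: checks ID1, ID2, CM and the reserved FLG bits,
   then skips the optional fields. Returns the offset of the first
   compressed block, or -1 if the header is invalid or truncated. */
long gzip_header_end(const unsigned char *buf, size_t len)
{
    size_t pos = 10;                        /* end of fixed-length part */
    unsigned char flg;

    if (len < 10 || buf[0] != 31 || buf[1] != 139)
        return -1;                          /* not a gzip member */
    if (buf[2] != 8)
        return -1;                          /* only deflate supported */
    flg = buf[3];
    if (flg & 0xE0)
        return -1;                          /* reserved bits 5-7 set */

    if (flg & 0x04) {                       /* FEXTRA: XLEN, then data */
        size_t xlen;
        if (len - pos < 2)
            return -1;
        xlen = (size_t) buf[pos] | ((size_t) buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return -1;
        pos += xlen;
    }
    if ((flg & 0x08) && skip_string(buf, len, &pos) != 0)
        return -1;                          /* FNAME unterminated */
    if ((flg & 0x10) && skip_string(buf, len, &pos) != 0)
        return -1;                          /* FCOMMENT unterminated */
    if (flg & 0x02) {                       /* FHCRC: skipped, not checked */
        if (len - pos < 2)
            return -1;
        pos += 2;
    }
    return (long) pos;
}
```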

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII. Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data. Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct. Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification. Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>. Since this
   implementation is a de facto standard, we mention some more of its
   features here. Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself. Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language. Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;
        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
       the updated crc. The crc should be initialized to zero. Pre- and
       post-conditioning (one's complement) is performed within this
       function so it shouldn't be done by the caller. Usage example:

         unsigned long crc = 0L;

         while (read_buffer(buffer, length) != EOF) {
           crc = update_crc(crc, buffer, length);
         }
         if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }