Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996
10 1.1.1.1.4.2 pgoyette
11 1.1.1.1.4.2 pgoyette
12 1.1.1.1.4.2 pgoyette GZIP file format specification version 4.3
13 1.1.1.1.4.2 pgoyette
14 1.1.1.1.4.2 pgoyette Status of This Memo
15 1.1.1.1.4.2 pgoyette
16 1.1.1.1.4.2 pgoyette This memo provides information for the Internet community. This memo
17 1.1.1.1.4.2 pgoyette does not specify an Internet standard of any kind. Distribution of
18 1.1.1.1.4.2 pgoyette this memo is unlimited.
19 1.1.1.1.4.2 pgoyette
20 1.1.1.1.4.2 pgoyette IESG Note:
21 1.1.1.1.4.2 pgoyette
22 1.1.1.1.4.2 pgoyette The IESG takes no position on the validity of any Intellectual
23 1.1.1.1.4.2 pgoyette Property Rights statements contained in this document.
24 1.1.1.1.4.2 pgoyette
25 1.1.1.1.4.2 pgoyette Notices
26 1.1.1.1.4.2 pgoyette
27 1.1.1.1.4.2 pgoyette Copyright (c) 1996 L. Peter Deutsch
28 1.1.1.1.4.2 pgoyette
29 1.1.1.1.4.2 pgoyette Permission is granted to copy and distribute this document for any
30 1.1.1.1.4.2 pgoyette purpose and without charge, including translations into other
31 1.1.1.1.4.2 pgoyette languages and incorporation into compilations, provided that the
32 1.1.1.1.4.2 pgoyette copyright notice and this notice are preserved, and that any
33 1.1.1.1.4.2 pgoyette substantive changes or deletions from the original are clearly
34 1.1.1.1.4.2 pgoyette marked.
35 1.1.1.1.4.2 pgoyette
36 1.1.1.1.4.2 pgoyette A pointer to the latest version of this and related documentation in
37 1.1.1.1.4.2 pgoyette HTML format can be found at the URL
38 1.1.1.1.4.2 pgoyette <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39 1.1.1.1.4.2 pgoyette
40 1.1.1.1.4.2 pgoyette Abstract
41 1.1.1.1.4.2 pgoyette
42 1.1.1.1.4.2 pgoyette This specification defines a lossless compressed data format that is
43 1.1.1.1.4.2 pgoyette compatible with the widely used GZIP utility. The format includes a
44 1.1.1.1.4.2 pgoyette cyclic redundancy check value for detecting data corruption. The
45 1.1.1.1.4.2 pgoyette format presently uses the DEFLATE method of compression but can be
46 1.1.1.1.4.2 pgoyette easily extended to use other compression methods. The format can be
47 1.1.1.1.4.2 pgoyette implemented readily in a manner not covered by patents.
48 1.1.1.1.4.2 pgoyette
49 1.1.1.1.4.2 pgoyette
50 1.1.1.1.4.2 pgoyette
51 1.1.1.1.4.2 pgoyette
52 1.1.1.1.4.2 pgoyette
53 1.1.1.1.4.2 pgoyette
54 1.1.1.1.4.2 pgoyette
55 1.1.1.1.4.2 pgoyette
56 1.1.1.1.4.2 pgoyette
57 1.1.1.1.4.2 pgoyette
58 1.1.1.1.4.2 pgoyette Deutsch Informational [Page 1]
59 1.1.1.1.4.2 pgoyette
61 1.1.1.1.4.2 pgoyette RFC 1952 GZIP File Format Specification May 1996
62 1.1.1.1.4.2 pgoyette
63 1.1.1.1.4.2 pgoyette
64 1.1.1.1.4.2 pgoyette Table of Contents
65 1.1.1.1.4.2 pgoyette
      1. Introduction
         1.1. Purpose
         1.2. Intended audience
         1.3. Scope
         1.4. Compliance
         1.5. Definitions of terms and conventions used
         1.6. Changes from previous versions
      2. Detailed specification
         2.1. Overall conventions
         2.2. File format
         2.3. Member format
            2.3.1. Member header and trailer
               2.3.1.1. Extra field
               2.3.1.2. Compliance
      3. References
      4. Security Considerations
      5. Acknowledgements
      6. Author's Address
      7. Appendix: Jean-Loup Gailly's gzip utility
      8. Appendix: Sample CRC Code
86 1.1.1.1.4.2 pgoyette
87 1.1.1.1.4.2 pgoyette 1. Introduction
88 1.1.1.1.4.2 pgoyette
89 1.1.1.1.4.2 pgoyette 1.1. Purpose
90 1.1.1.1.4.2 pgoyette
91 1.1.1.1.4.2 pgoyette The purpose of this specification is to define a lossless
92 1.1.1.1.4.2 pgoyette compressed data format that:
93 1.1.1.1.4.2 pgoyette
94 1.1.1.1.4.2 pgoyette * Is independent of CPU type, operating system, file system,
95 1.1.1.1.4.2 pgoyette and character set, and hence can be used for interchange;
96 1.1.1.1.4.2 pgoyette * Can compress or decompress a data stream (as opposed to a
97 1.1.1.1.4.2 pgoyette randomly accessible file) to produce another data stream,
98 1.1.1.1.4.2 pgoyette using only an a priori bounded amount of intermediate
99 1.1.1.1.4.2 pgoyette storage, and hence can be used in data communications or
100 1.1.1.1.4.2 pgoyette similar structures such as Unix filters;
101 1.1.1.1.4.2 pgoyette * Compresses data with efficiency comparable to the best
102 1.1.1.1.4.2 pgoyette currently available general-purpose compression methods,
103 1.1.1.1.4.2 pgoyette and in particular considerably better than the "compress"
104 1.1.1.1.4.2 pgoyette program;
105 1.1.1.1.4.2 pgoyette * Can be implemented readily in a manner not covered by
106 1.1.1.1.4.2 pgoyette patents, and hence can be practiced freely;
107 1.1.1.1.4.2 pgoyette * Is compatible with the file format produced by the current
108 1.1.1.1.4.2 pgoyette widely used gzip utility, in that conforming decompressors
109 1.1.1.1.4.2 pgoyette will be able to read data produced by the existing gzip
110 1.1.1.1.4.2 pgoyette compressor.
111 1.1.1.1.4.2 pgoyette
121 1.1.1.1.4.2 pgoyette The data format defined by this specification does not attempt to:
122 1.1.1.1.4.2 pgoyette
123 1.1.1.1.4.2 pgoyette * Provide random access to compressed data;
124 1.1.1.1.4.2 pgoyette * Compress specialized data (e.g., raster graphics) as well as
125 1.1.1.1.4.2 pgoyette the best currently available specialized algorithms.
126 1.1.1.1.4.2 pgoyette
127 1.1.1.1.4.2 pgoyette 1.2. Intended audience
128 1.1.1.1.4.2 pgoyette
129 1.1.1.1.4.2 pgoyette This specification is intended for use by implementors of software
130 1.1.1.1.4.2 pgoyette to compress data into gzip format and/or decompress data from gzip
131 1.1.1.1.4.2 pgoyette format.
132 1.1.1.1.4.2 pgoyette
133 1.1.1.1.4.2 pgoyette The text of the specification assumes a basic background in
134 1.1.1.1.4.2 pgoyette programming at the level of bits and other primitive data
135 1.1.1.1.4.2 pgoyette representations.
136 1.1.1.1.4.2 pgoyette
137 1.1.1.1.4.2 pgoyette 1.3. Scope
138 1.1.1.1.4.2 pgoyette
139 1.1.1.1.4.2 pgoyette The specification specifies a compression method and a file format
140 1.1.1.1.4.2 pgoyette (the latter assuming only that a file can store a sequence of
141 1.1.1.1.4.2 pgoyette arbitrary bytes). It does not specify any particular interface to
142 1.1.1.1.4.2 pgoyette a file system or anything about character sets or encodings
143 1.1.1.1.4.2 pgoyette (except for file names and comments, which are optional).
144 1.1.1.1.4.2 pgoyette
145 1.1.1.1.4.2 pgoyette 1.4. Compliance
146 1.1.1.1.4.2 pgoyette
147 1.1.1.1.4.2 pgoyette Unless otherwise indicated below, a compliant decompressor must be
148 1.1.1.1.4.2 pgoyette able to accept and decompress any file that conforms to all the
149 1.1.1.1.4.2 pgoyette specifications presented here; a compliant compressor must produce
150 1.1.1.1.4.2 pgoyette files that conform to all the specifications presented here. The
151 1.1.1.1.4.2 pgoyette material in the appendices is not part of the specification per se
152 1.1.1.1.4.2 pgoyette and is not relevant to compliance.
153 1.1.1.1.4.2 pgoyette
154 1.1.1.1.4.2 pgoyette 1.5. Definitions of terms and conventions used
155 1.1.1.1.4.2 pgoyette
156 1.1.1.1.4.2 pgoyette byte: 8 bits stored or transmitted as a unit (same as an octet).
157 1.1.1.1.4.2 pgoyette (For this specification, a byte is exactly 8 bits, even on
158 1.1.1.1.4.2 pgoyette machines which store a character on a number of bits different
159 1.1.1.1.4.2 pgoyette from 8.) See below for the numbering of bits within a byte.
160 1.1.1.1.4.2 pgoyette
161 1.1.1.1.4.2 pgoyette 1.6. Changes from previous versions
162 1.1.1.1.4.2 pgoyette
163 1.1.1.1.4.2 pgoyette There have been no technical changes to the gzip format since
164 1.1.1.1.4.2 pgoyette version 4.1 of this specification. In version 4.2, some
165 1.1.1.1.4.2 pgoyette terminology was changed, and the sample CRC code was rewritten for
166 1.1.1.1.4.2 pgoyette clarity and to eliminate the requirement for the caller to do pre-
167 1.1.1.1.4.2 pgoyette and post-conditioning. Version 4.3 is a conversion of the
168 1.1.1.1.4.2 pgoyette specification to RFC style.
169 1.1.1.1.4.2 pgoyette
178 1.1.1.1.4.2 pgoyette 2. Detailed specification
179 1.1.1.1.4.2 pgoyette
180 1.1.1.1.4.2 pgoyette 2.1. Overall conventions
181 1.1.1.1.4.2 pgoyette
182 1.1.1.1.4.2 pgoyette In the diagrams below, a box like this:
183 1.1.1.1.4.2 pgoyette
184 1.1.1.1.4.2 pgoyette +---+
185 1.1.1.1.4.2 pgoyette | | <-- the vertical bars might be missing
186 1.1.1.1.4.2 pgoyette +---+
187 1.1.1.1.4.2 pgoyette
188 1.1.1.1.4.2 pgoyette represents one byte; a box like this:
189 1.1.1.1.4.2 pgoyette
190 1.1.1.1.4.2 pgoyette +==============+
191 1.1.1.1.4.2 pgoyette | |
192 1.1.1.1.4.2 pgoyette +==============+
193 1.1.1.1.4.2 pgoyette
194 1.1.1.1.4.2 pgoyette represents a variable number of bytes.
195 1.1.1.1.4.2 pgoyette
196 1.1.1.1.4.2 pgoyette Bytes stored within a computer do not have a "bit order", since
197 1.1.1.1.4.2 pgoyette they are always treated as a unit. However, a byte considered as
198 1.1.1.1.4.2 pgoyette an integer between 0 and 255 does have a most- and least-
199 1.1.1.1.4.2 pgoyette significant bit, and since we write numbers with the most-
200 1.1.1.1.4.2 pgoyette significant digit on the left, we also write bytes with the most-
201 1.1.1.1.4.2 pgoyette significant bit on the left. In the diagrams below, we number the
202 1.1.1.1.4.2 pgoyette bits of a byte so that bit 0 is the least-significant bit, i.e.,
203 1.1.1.1.4.2 pgoyette the bits are numbered:
204 1.1.1.1.4.2 pgoyette
205 1.1.1.1.4.2 pgoyette +--------+
206 1.1.1.1.4.2 pgoyette |76543210|
207 1.1.1.1.4.2 pgoyette +--------+
208 1.1.1.1.4.2 pgoyette
209 1.1.1.1.4.2 pgoyette This document does not address the issue of the order in which
210 1.1.1.1.4.2 pgoyette bits of a byte are transmitted on a bit-sequential medium, since
211 1.1.1.1.4.2 pgoyette the data format described here is byte- rather than bit-oriented.
212 1.1.1.1.4.2 pgoyette
213 1.1.1.1.4.2 pgoyette Within a computer, a number may occupy multiple bytes. All
214 1.1.1.1.4.2 pgoyette multi-byte numbers in the format described here are stored with
215 1.1.1.1.4.2 pgoyette the least-significant byte first (at the lower memory address).
216 1.1.1.1.4.2 pgoyette For example, the decimal number 520 is stored as:
217 1.1.1.1.4.2 pgoyette
             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8
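
      The byte order described above can be sketched in C as follows.
      This is an illustration only, not part of the specification; the
      helper names get_le16, get_le32 and put_le16 are hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit value stored least-significant byte first. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store a 16-bit value least-significant byte first. */
static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);  /* less significant byte */
    p[1] = (unsigned char)(v >> 8);    /* more significant byte */
}
```

      Storing the decimal number 520 with put_le16 yields the bytes 8
      and 2, in that order, matching the diagram.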
226 1.1.1.1.4.2 pgoyette
235 1.1.1.1.4.2 pgoyette 2.2. File format
236 1.1.1.1.4.2 pgoyette
237 1.1.1.1.4.2 pgoyette A gzip file consists of a series of "members" (compressed data
238 1.1.1.1.4.2 pgoyette sets). The format of each member is specified in the following
239 1.1.1.1.4.2 pgoyette section. The members simply appear one after another in the file,
240 1.1.1.1.4.2 pgoyette with no additional information before, between, or after them.
241 1.1.1.1.4.2 pgoyette
242 1.1.1.1.4.2 pgoyette 2.3. Member format
243 1.1.1.1.4.2 pgoyette
244 1.1.1.1.4.2 pgoyette Each member has the following structure:
245 1.1.1.1.4.2 pgoyette
         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+
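
      As an illustration, the ten fixed bytes at the start of a member
      might be decoded as sketched below. The structure and function
      names (gz_fixed_header, gz_parse_fixed) are hypothetical and are
      not defined by this specification; the sketch only copies the
      fields out and does not validate them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the 10 fixed header bytes. */
struct gz_fixed_header {
    uint8_t  id1, id2;  /* identification bytes */
    uint8_t  cm;        /* compression method */
    uint8_t  flg;       /* flag bits */
    uint32_t mtime;     /* stored least-significant byte first */
    uint8_t  xfl;       /* extra flags */
    uint8_t  os;        /* operating system code */
};

/* Decode the fixed part of a member header from buf.
   Returns 0 on success, -1 if fewer than 10 bytes are available. */
static int gz_parse_fixed(const unsigned char *buf, size_t len,
                          struct gz_fixed_header *h)
{
    if (len < 10)
        return -1;
    h->id1 = buf[0];
    h->id2 = buf[1];
    h->cm  = buf[2];
    h->flg = buf[3];
    h->mtime = (uint32_t)buf[4]
             | ((uint32_t)buf[5] << 8)
             | ((uint32_t)buf[6] << 16)
             | ((uint32_t)buf[7] << 24);
    h->xfl = buf[8];
    h->os  = buf[9];
    return 0;
}
```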
282 1.1.1.1.4.2 pgoyette
292 1.1.1.1.4.2 pgoyette 2.3.1. Member header and trailer
293 1.1.1.1.4.2 pgoyette
294 1.1.1.1.4.2 pgoyette ID1 (IDentification 1)
295 1.1.1.1.4.2 pgoyette ID2 (IDentification 2)
296 1.1.1.1.4.2 pgoyette These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
297 1.1.1.1.4.2 pgoyette (0x8b, \213), to identify the file as being in gzip format.
298 1.1.1.1.4.2 pgoyette
299 1.1.1.1.4.2 pgoyette CM (Compression Method)
300 1.1.1.1.4.2 pgoyette This identifies the compression method used in the file. CM
301 1.1.1.1.4.2 pgoyette = 0-7 are reserved. CM = 8 denotes the "deflate"
302 1.1.1.1.4.2 pgoyette compression method, which is the one customarily used by
303 1.1.1.1.4.2 pgoyette gzip and which is documented elsewhere.
304 1.1.1.1.4.2 pgoyette
305 1.1.1.1.4.2 pgoyette FLG (FLaGs)
306 1.1.1.1.4.2 pgoyette This flag byte is divided into individual bits as follows:
307 1.1.1.1.4.2 pgoyette
308 1.1.1.1.4.2 pgoyette bit 0 FTEXT
309 1.1.1.1.4.2 pgoyette bit 1 FHCRC
310 1.1.1.1.4.2 pgoyette bit 2 FEXTRA
311 1.1.1.1.4.2 pgoyette bit 3 FNAME
312 1.1.1.1.4.2 pgoyette bit 4 FCOMMENT
313 1.1.1.1.4.2 pgoyette bit 5 reserved
314 1.1.1.1.4.2 pgoyette bit 6 reserved
315 1.1.1.1.4.2 pgoyette bit 7 reserved
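
            The bit assignments above might be written as C constants
            as sketched below. The GZ_ prefix and the helper
            gz_flags_valid are hypothetical names chosen for this
            illustration.

```c
/* FLG bit masks, following the bit numbering given above. */
enum {
    GZ_FTEXT     = 1 << 0,
    GZ_FHCRC     = 1 << 1,
    GZ_FEXTRA    = 1 << 2,
    GZ_FNAME     = 1 << 3,
    GZ_FCOMMENT  = 1 << 4,
    GZ_FRESERVED = 0xe0   /* bits 5, 6 and 7 */
};

/* Returns nonzero if no reserved bit is set in the flag byte. */
static int gz_flags_valid(unsigned flg)
{
    return (flg & GZ_FRESERVED) == 0;
}
```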
316 1.1.1.1.4.2 pgoyette
            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data.  For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.
329 1.1.1.1.4.2 pgoyette
330 1.1.1.1.4.2 pgoyette If FHCRC is set, a CRC16 for the gzip header is present,
331 1.1.1.1.4.2 pgoyette immediately before the compressed data. The CRC16 consists
332 1.1.1.1.4.2 pgoyette of the two least significant bytes of the CRC32 for all
333 1.1.1.1.4.2 pgoyette bytes of the gzip header up to and not including the CRC16.
334 1.1.1.1.4.2 pgoyette [The FHCRC bit was never set by versions of gzip up to
335 1.1.1.1.4.2 pgoyette 1.2.4, even though it was documented with a different
336 1.1.1.1.4.2 pgoyette meaning in gzip 1.2.4.]
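
            A minimal sketch of the CRC16 computation follows. It takes
            "the two least significant bytes of the CRC32" to mean the
            low-order 16 bits of the CRC-32 value, and it uses a slow
            bit-at-a-time CRC-32 rather than a table-driven one. The
            function names crc32_bytes and gz_header_crc16 are
            hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 with pre- and post-conditioning,
   using the reflected polynomial 0xedb88320. */
static uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xffffffffUL;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        c ^= buf[i];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xedb88320UL : c >> 1;
    }
    return c ^ 0xffffffffUL;
}

/* CRC16 over the header bytes that precede the CRC16 field:
   the low-order 16 bits of their CRC-32. */
static uint16_t gz_header_crc16(const unsigned char *hdr, size_t hdr_len)
{
    return (uint16_t)(crc32_bytes(hdr, hdr_len) & 0xffff);
}
```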
337 1.1.1.1.4.2 pgoyette
338 1.1.1.1.4.2 pgoyette If FEXTRA is set, optional extra fields are present, as
339 1.1.1.1.4.2 pgoyette described in a following section.
340 1.1.1.1.4.2 pgoyette
349 1.1.1.1.4.2 pgoyette If FNAME is set, an original file name is present,
350 1.1.1.1.4.2 pgoyette terminated by a zero byte. The name must consist of ISO
351 1.1.1.1.4.2 pgoyette 8859-1 (LATIN-1) characters; on operating systems using
352 1.1.1.1.4.2 pgoyette EBCDIC or any other character set for file names, the name
353 1.1.1.1.4.2 pgoyette must be translated to the ISO LATIN-1 character set. This
354 1.1.1.1.4.2 pgoyette is the original name of the file being compressed, with any
355 1.1.1.1.4.2 pgoyette directory components removed, and, if the file being
356 1.1.1.1.4.2 pgoyette compressed is on a file system with case insensitive names,
357 1.1.1.1.4.2 pgoyette forced to lower case. There is no original file name if the
358 1.1.1.1.4.2 pgoyette data was compressed from a source other than a named file;
359 1.1.1.1.4.2 pgoyette for example, if the source was stdin on a Unix system, there
360 1.1.1.1.4.2 pgoyette is no file name.
361 1.1.1.1.4.2 pgoyette
362 1.1.1.1.4.2 pgoyette If FCOMMENT is set, a zero-terminated file comment is
363 1.1.1.1.4.2 pgoyette present. This comment is not interpreted; it is only
364 1.1.1.1.4.2 pgoyette intended for human consumption. The comment must consist of
365 1.1.1.1.4.2 pgoyette ISO 8859-1 (LATIN-1) characters. Line breaks should be
366 1.1.1.1.4.2 pgoyette denoted by a single line feed character (10 decimal).
367 1.1.1.1.4.2 pgoyette
368 1.1.1.1.4.2 pgoyette Reserved FLG bits must be zero.
369 1.1.1.1.4.2 pgoyette
370 1.1.1.1.4.2 pgoyette MTIME (Modification TIME)
371 1.1.1.1.4.2 pgoyette This gives the most recent modification time of the original
372 1.1.1.1.4.2 pgoyette file being compressed. The time is in Unix format, i.e.,
373 1.1.1.1.4.2 pgoyette seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
374 1.1.1.1.4.2 pgoyette may cause problems for MS-DOS and other systems that use
375 1.1.1.1.4.2 pgoyette local rather than Universal time.) If the compressed data
376 1.1.1.1.4.2 pgoyette did not come from a file, MTIME is set to the time at which
377 1.1.1.1.4.2 pgoyette compression started. MTIME = 0 means no time stamp is
378 1.1.1.1.4.2 pgoyette available.
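
            As an illustration, MTIME might be rendered as text as
            sketched below. The sketch assumes a platform whose time_t
            counts seconds since 00:00:00 GMT, Jan. 1, 1970 (as on
            POSIX systems); the function name gz_format_mtime is
            hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Format MTIME as a UTC time string in out.
   Returns 0 on success, -1 if MTIME is 0 (no time stamp available)
   or if the conversion fails. */
static int gz_format_mtime(uint32_t mtime, char *out, size_t outlen)
{
    time_t t = (time_t)mtime;  /* assumes time_t uses the Unix epoch */
    struct tm *tm;

    if (mtime == 0)
        return -1;
    tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S UTC", tm) ? 0 : -1;
}
```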
379 1.1.1.1.4.2 pgoyette
380 1.1.1.1.4.2 pgoyette XFL (eXtra FLags)
381 1.1.1.1.4.2 pgoyette These flags are available for use by specific compression
382 1.1.1.1.4.2 pgoyette methods. The "deflate" method (CM = 8) sets these flags as
383 1.1.1.1.4.2 pgoyette follows:
384 1.1.1.1.4.2 pgoyette
385 1.1.1.1.4.2 pgoyette XFL = 2 - compressor used maximum compression,
386 1.1.1.1.4.2 pgoyette slowest algorithm
387 1.1.1.1.4.2 pgoyette XFL = 4 - compressor used fastest algorithm
388 1.1.1.1.4.2 pgoyette
389 1.1.1.1.4.2 pgoyette OS (Operating System)
390 1.1.1.1.4.2 pgoyette This identifies the type of file system on which compression
391 1.1.1.1.4.2 pgoyette took place. This may be useful in determining end-of-line
392 1.1.1.1.4.2 pgoyette convention for text files. The currently defined values are
393 1.1.1.1.4.2 pgoyette as follows:
394 1.1.1.1.4.2 pgoyette
406 1.1.1.1.4.2 pgoyette 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
407 1.1.1.1.4.2 pgoyette 1 - Amiga
408 1.1.1.1.4.2 pgoyette 2 - VMS (or OpenVMS)
409 1.1.1.1.4.2 pgoyette 3 - Unix
410 1.1.1.1.4.2 pgoyette 4 - VM/CMS
411 1.1.1.1.4.2 pgoyette 5 - Atari TOS
412 1.1.1.1.4.2 pgoyette 6 - HPFS filesystem (OS/2, NT)
413 1.1.1.1.4.2 pgoyette 7 - Macintosh
414 1.1.1.1.4.2 pgoyette 8 - Z-System
415 1.1.1.1.4.2 pgoyette 9 - CP/M
416 1.1.1.1.4.2 pgoyette 10 - TOPS-20
417 1.1.1.1.4.2 pgoyette 11 - NTFS filesystem (NT)
418 1.1.1.1.4.2 pgoyette 12 - QDOS
419 1.1.1.1.4.2 pgoyette 13 - Acorn RISCOS
420 1.1.1.1.4.2 pgoyette 255 - unknown
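
            The table above might be represented as a lookup function,
            sketched below. The function name gz_os_name is
            hypothetical, and the string "undefined" for values with no
            entry in the table is a choice made for this illustration.

```c
#include <stddef.h>

/* Map an OS byte to the description given in the table above. */
static const char *gz_os_name(unsigned os)
{
    static const char *const names[] = {
        "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
        "Amiga",
        "VMS (or OpenVMS)",
        "Unix",
        "VM/CMS",
        "Atari TOS",
        "HPFS filesystem (OS/2, NT)",
        "Macintosh",
        "Z-System",
        "CP/M",
        "TOPS-20",
        "NTFS filesystem (NT)",
        "QDOS",
        "Acorn RISCOS"
    };

    if (os < sizeof names / sizeof names[0])
        return names[os];
    /* 255 is defined as "unknown"; other values have no entry. */
    return os == 255 ? "unknown" : "undefined";
}
```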
421 1.1.1.1.4.2 pgoyette
422 1.1.1.1.4.2 pgoyette XLEN (eXtra LENgth)
423 1.1.1.1.4.2 pgoyette If FLG.FEXTRA is set, this gives the length of the optional
424 1.1.1.1.4.2 pgoyette extra field. See below for details.
425 1.1.1.1.4.2 pgoyette
         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32
            algorithm used in the ISO 3309 standard and in section
            8.1.1.6.2 of ITU-T recommendation V.42.  (See
            http://www.iso.ch for ordering ISO documents.  See
            gopher://info.itu.ch for an online version of ITU-T V.42.)
433 1.1.1.1.4.2 pgoyette
434 1.1.1.1.4.2 pgoyette ISIZE (Input SIZE)
435 1.1.1.1.4.2 pgoyette This contains the size of the original (uncompressed) input
436 1.1.1.1.4.2 pgoyette data modulo 2^32.
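
            A sketch of decoding the 8-byte trailer and checking ISIZE
            follows. Because ISIZE is stored modulo 2^32, a 64-bit
            count of the uncompressed bytes is reduced before the
            comparison. The names gz_trailer, gz_parse_trailer and
            gz_isize_matches are hypothetical.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the member trailer. */
struct gz_trailer {
    uint32_t crc32;
    uint32_t isize;
};

/* Read a 32-bit value stored least-significant byte first. */
static uint32_t gz_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode CRC32 (bytes 0-3) and ISIZE (bytes 4-7) of the trailer. */
static struct gz_trailer gz_parse_trailer(const unsigned char *last8)
{
    struct gz_trailer t;
    t.crc32 = gz_le32(last8);
    t.isize = gz_le32(last8 + 4);
    return t;
}

/* Returns nonzero if the uncompressed length, taken modulo 2^32,
   equals the stored ISIZE. */
static int gz_isize_matches(uint64_t total_len, uint32_t isize)
{
    return (uint32_t)(total_len & 0xffffffffULL) == isize;
}
```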
437 1.1.1.1.4.2 pgoyette
438 1.1.1.1.4.2 pgoyette 2.3.1.1. Extra field
439 1.1.1.1.4.2 pgoyette
440 1.1.1.1.4.2 pgoyette If the FLG.FEXTRA bit is set, an "extra field" is present in
441 1.1.1.1.4.2 pgoyette the header, with total length XLEN bytes. It consists of a
442 1.1.1.1.4.2 pgoyette series of subfields, each of the form:
443 1.1.1.1.4.2 pgoyette
         +---+---+---+---+==================================+
         |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
         +---+---+---+---+==================================+
447 1.1.1.1.4.2 pgoyette
         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:
454 1.1.1.1.4.2 pgoyette
         SI1         SI2         Data
         ----------  ----------  ----
         0x41 ('A')  0x70 ('P')  Apollo file type information
466 1.1.1.1.4.2 pgoyette
467 1.1.1.1.4.2 pgoyette LEN gives the length of the subfield data, excluding the 4
468 1.1.1.1.4.2 pgoyette initial bytes.
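
         Walking the subfields of an extra field might look like the
         sketch below. It assumes LEN is a 2-byte value stored
         least-significant byte first, per the overall conventions, and
         it treats a subfield that runs past XLEN as malformed. The
         function name gz_find_subfield is hypothetical.

```c
#include <stddef.h>

/* Search the extra field (xlen bytes at extra) for the subfield with
   ID (si1, si2).  On success, returns a pointer to the subfield data
   and stores its length in *len.  Returns NULL if the subfield is not
   present or if a subfield overruns the extra field. */
static const unsigned char *gz_find_subfield(const unsigned char *extra,
                                             size_t xlen,
                                             unsigned char si1,
                                             unsigned char si2,
                                             size_t *len)
{
    size_t pos = 0;

    while (xlen - pos >= 4) {
        /* LEN excludes the 4 initial bytes (SI1, SI2, LEN). */
        size_t n = extra[pos + 2] | ((size_t)extra[pos + 3] << 8);

        if (n > xlen - pos - 4)
            return NULL;  /* subfield data runs past XLEN */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *len = n;
            return extra + pos + 4;
        }
        pos += 4 + n;
    }
    return NULL;
}
```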
469 1.1.1.1.4.2 pgoyette
470 1.1.1.1.4.2 pgoyette 2.3.1.2. Compliance
471 1.1.1.1.4.2 pgoyette
472 1.1.1.1.4.2 pgoyette A compliant compressor must produce files with correct ID1,
473 1.1.1.1.4.2 pgoyette ID2, CM, CRC32, and ISIZE, but may set all the other fields in
474 1.1.1.1.4.2 pgoyette the fixed-length part of the header to default values (255 for
475 1.1.1.1.4.2 pgoyette OS, 0 for all others). The compressor must set all reserved
476 1.1.1.1.4.2 pgoyette bits to zero.
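
         For illustration, the smallest header permitted by the
         paragraph above (defaults for every optional field) might be
         emitted as sketched below. The function name
         gz_write_min_header is hypothetical; CM = 8 assumes the
         "deflate" method.

```c
#include <stddef.h>
#include <string.h>

/* Write a 10-byte header using default values: FLG = 0, MTIME = 0,
   XFL = 0, OS = 255 (unknown).  Returns the number of bytes written. */
static size_t gz_write_min_header(unsigned char out[10])
{
    static const unsigned char hdr[10] = {
        0x1f, 0x8b,   /* ID1, ID2 */
        8,            /* CM = deflate */
        0,            /* FLG: no optional fields, reserved bits zero */
        0, 0, 0, 0,   /* MTIME = 0: no time stamp available */
        0,            /* XFL */
        255           /* OS = unknown */
    };

    memcpy(out, hdr, sizeof hdr);
    return sizeof hdr;
}
```

         The compressed blocks follow the header, and a correct CRC32
         and ISIZE are still required in the trailer.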
477 1.1.1.1.4.2 pgoyette
478 1.1.1.1.4.2 pgoyette A compliant decompressor must check ID1, ID2, and CM, and
479 1.1.1.1.4.2 pgoyette provide an error indication if any of these have incorrect
480 1.1.1.1.4.2 pgoyette values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
481 1.1.1.1.4.2 pgoyette at least so it can skip over the optional fields if they are
482 1.1.1.1.4.2 pgoyette present. It need not examine any other part of the header or
483 1.1.1.1.4.2 pgoyette trailer; in particular, a decompressor may ignore FTEXT and OS
484 1.1.1.1.4.2 pgoyette and always produce binary output, and still be compliant. A
485 1.1.1.1.4.2 pgoyette compliant decompressor must give an error indication if any
486 1.1.1.1.4.2 pgoyette reserved bit is non-zero, since such a bit could indicate the
487 1.1.1.1.4.2 pgoyette presence of a new field that would cause subsequent data to be
488 1.1.1.1.4.2 pgoyette interpreted incorrectly.
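
         The checks required of a decompressor might be sketched as
         below: validate ID1, ID2 and CM, reject reserved flag bits,
         and skip the optional fields to locate the compressed data.
         The function name gz_header_length is hypothetical, the sketch
         accepts only CM = 8, and it does not verify the CRC16.

```c
#include <stddef.h>

/* Returns the offset of the compressed data within buf, or 0 on error
   (bad ID1/ID2/CM, reserved flag bit set, or truncated header). */
static size_t gz_header_length(const unsigned char *buf, size_t len)
{
    size_t pos = 10;
    unsigned flg;

    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return 0;
    flg = buf[3];
    if (flg & 0xe0)               /* reserved bits must be zero */
        return 0;
    if (flg & 0x04) {             /* FEXTRA: XLEN, then XLEN bytes */
        size_t xlen;
        if (len - pos < 2)
            return 0;
        xlen = buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return 0;
        pos += xlen;
    }
    if (flg & 0x08) {             /* FNAME: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x10) {             /* FCOMMENT: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x02) {             /* FHCRC: 2-byte CRC16 */
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```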
489 1.1.1.1.4.2 pgoyette
490 1.1.1.1.4.2 pgoyette 3. References
491 1.1.1.1.4.2 pgoyette
492 1.1.1.1.4.2 pgoyette [1] "Information Processing - 8-bit single-byte coded graphic
493 1.1.1.1.4.2 pgoyette character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
494 1.1.1.1.4.2 pgoyette The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
495 1.1.1.1.4.2 pgoyette ASCII. Files defining this character set are available as
496 1.1.1.1.4.2 pgoyette iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/
497 1.1.1.1.4.2 pgoyette
498 1.1.1.1.4.2 pgoyette [2] ISO 3309
499 1.1.1.1.4.2 pgoyette
500 1.1.1.1.4.2 pgoyette [3] ITU-T recommendation V.42
501 1.1.1.1.4.2 pgoyette
502 1.1.1.1.4.2 pgoyette [4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
503 1.1.1.1.4.2 pgoyette available in ftp://ftp.uu.net/pub/archiving/zip/doc/
504 1.1.1.1.4.2 pgoyette
505 1.1.1.1.4.2 pgoyette [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
506 1.1.1.1.4.2 pgoyette ftp://prep.ai.mit.edu/pub/gnu/
507 1.1.1.1.4.2 pgoyette
508 1.1.1.1.4.2 pgoyette [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
509 1.1.1.1.4.2 pgoyette Look-Up", Communications of the ACM, 31(8), pp.1008-1013.
510 1.1.1.1.4.2 pgoyette
520 1.1.1.1.4.2 pgoyette [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
521 1.1.1.1.4.2 pgoyette pp.118-133.
522 1.1.1.1.4.2 pgoyette
523 1.1.1.1.4.2 pgoyette [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
524 1.1.1.1.4.2 pgoyette describing the CRC concept.
525 1.1.1.1.4.2 pgoyette
526 1.1.1.1.4.2 pgoyette 4. Security Considerations
527 1.1.1.1.4.2 pgoyette
528 1.1.1.1.4.2 pgoyette Any data compression method involves the reduction of redundancy in
529 1.1.1.1.4.2 pgoyette the data. Consequently, any corruption of the data is likely to have
530 1.1.1.1.4.2 pgoyette severe effects and be difficult to correct. Uncompressed text, on
531 1.1.1.1.4.2 pgoyette the other hand, will probably still be readable despite the presence
532 1.1.1.1.4.2 pgoyette of some corrupted bytes.
533 1.1.1.1.4.2 pgoyette
534 1.1.1.1.4.2 pgoyette It is recommended that systems using this data format provide some
535 1.1.1.1.4.2 pgoyette means of validating the integrity of the compressed data, such as by
536 1.1.1.1.4.2 pgoyette setting and checking the CRC-32 check value.
537 1.1.1.1.4.2 pgoyette
538 1.1.1.1.4.2 pgoyette 5. Acknowledgements
539 1.1.1.1.4.2 pgoyette
540 1.1.1.1.4.2 pgoyette Trademarks cited in this document are the property of their
541 1.1.1.1.4.2 pgoyette respective owners.
542 1.1.1.1.4.2 pgoyette
543 1.1.1.1.4.2 pgoyette Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
544 1.1.1.1.4.2 pgoyette the related software described in this specification. Glenn
545 1.1.1.1.4.2 pgoyette Randers-Pehrson converted this document to RFC and HTML format.
546 1.1.1.1.4.2 pgoyette
547 1.1.1.1.4.2 pgoyette 6. Author's Address
548 1.1.1.1.4.2 pgoyette
   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
568 1.1.1.1.4.2 pgoyette
577 1.1.1.1.4.2 pgoyette 7. Appendix: Jean-Loup Gailly's gzip utility
578 1.1.1.1.4.2 pgoyette
   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>.  Since this
   implementation is a de facto standard, we mention some more of its
   features here.  Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.
586 1.1.1.1.4.2 pgoyette
587 1.1.1.1.4.2 pgoyette When compressing or decompressing a file, gzip preserves the
588 1.1.1.1.4.2 pgoyette protection, ownership, and modification time attributes on the local
589 1.1.1.1.4.2 pgoyette file system, since there is no provision for representing protection
590 1.1.1.1.4.2 pgoyette attributes in the gzip file format itself. Since the file format
591 1.1.1.1.4.2 pgoyette includes a modification time, the gzip decompressor provides a
592 1.1.1.1.4.2 pgoyette command line switch that assigns the modification time from the file,
593 1.1.1.1.4.2 pgoyette rather than the local modification time of the compressed input, to
594 1.1.1.1.4.2 pgoyette the decompressed output.
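
As an illustration of where that stored modification time lives, the
following is a minimal sketch in C that reads the MTIME field from the
fixed ten-byte member header described in section 2.3 (ID1, ID2, CM,
FLG, then four MTIME bytes, least significant byte first). The function
names gzip_header_mtime and gzip_header_mtime_example are hypothetical
and are not part of gzip or of this specification. The sketch does not
set the time on any output file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extract MTIME (seconds since 00:00:00 GMT,
   Jan. 1, 1970) from the start of a gzip member header.
   Returns 0 on success, or -1 if fewer than 10 bytes are supplied
   or the ID1/ID2 magic bytes do not match.
   A stored MTIME of 0 means that no time stamp is available. */
int gzip_header_mtime(const unsigned char *hdr, size_t len,
                      unsigned long *mtime)
{
    if (len < 10 || hdr[0] != 0x1f || hdr[1] != 0x8b)
        return -1;

    /* Bytes 4..7 hold MTIME, least significant byte first. */
    *mtime = (unsigned long) hdr[4]
           | ((unsigned long) hdr[5] << 8)
           | ((unsigned long) hdr[6] << 16)
           | ((unsigned long) hdr[7] << 24);
    return 0;
}

/* Usage example on a hand-built header (assumed values). */
void gzip_header_mtime_example(void)
{
    /* ID1 ID2 CM FLG MTIME(4 bytes) XFL OS */
    const unsigned char hdr[10] = {
        0x1f, 0x8b, 8, 0, 0x78, 0x56, 0x34, 0x12, 0, 3
    };
    unsigned long t = 0;

    assert(gzip_header_mtime(hdr, sizeof hdr, &t) == 0);
    assert(t == 0x12345678UL);
}
```

A decompressor that wanted to restore this time would pass the value to
whatever facility the local file system provides for setting a file's
modification time.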
595 1.1.1.1.4.2 pgoyette
596 1.1.1.1.4.2 pgoyette 8. Appendix: Sample CRC Code
597 1.1.1.1.4.2 pgoyette
598 1.1.1.1.4.2 pgoyette The following sample code represents a practical implementation of
599 1.1.1.1.4.2 pgoyette the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
600 1.1.1.1.4.2 pgoyette for a formal specification.)
601 1.1.1.1.4.2 pgoyette
602 1.1.1.1.4.2 pgoyette The sample code is in the ANSI C programming language. Non-C users
603 1.1.1.1.4.2 pgoyette may find it easier to read with these hints:
604 1.1.1.1.4.2 pgoyette
605 1.1.1.1.4.2 pgoyette &       Bitwise AND operator.
606 1.1.1.1.4.2 pgoyette ^       Bitwise exclusive-OR operator.
607 1.1.1.1.4.2 pgoyette >>      Bitwise right shift operator. When applied to an
608 1.1.1.1.4.2 pgoyette         unsigned quantity, as here, right shift inserts zero
609 1.1.1.1.4.2 pgoyette         bit(s) at the left.
610 1.1.1.1.4.2 pgoyette !       Logical NOT operator.
611 1.1.1.1.4.2 pgoyette ++      "n++" increments the variable n.
612 1.1.1.1.4.2 pgoyette 0xNNN   0x introduces a hexadecimal (base 16) constant.
613 1.1.1.1.4.2 pgoyette         Suffix L indicates a long value (at least 32 bits).
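
To see these operators on concrete values, the sketch below computes a
single CRC table entry with the same shift-and-XOR step used in the
sample code, and spot-checks four entries. The helper names
crc_table_entry and crc_table_spot_check are hypothetical. The expected
values follow from the polynomial constant 0xedb88320 and can be
reproduced by hand for small inputs.

```c
#include <assert.h>

/* Hypothetical helper: CRC of the single 8-bit message n, computed
   with &, ^, >> and ++ exactly as in the table-building loop. */
unsigned long crc_table_entry(unsigned int n)
{
    unsigned long c = (unsigned long) (n & 0xffU);
    int k;

    for (k = 0; k < 8; k++) {
        if (c & 1) {
            /* Low bit set: shift right, then XOR in the polynomial. */
            c = 0xedb88320UL ^ (c >> 1);
        } else {
            /* Low bit clear: shift right only. */
            c = c >> 1;
        }
    }
    return c;
}

/* Spot-check a few entries. */
void crc_table_spot_check(void)
{
    assert(crc_table_entry(0)   == 0x00000000UL);
    assert(crc_table_entry(1)   == 0x77073096UL);
    assert(crc_table_entry(128) == 0xedb88320UL);
    assert(crc_table_entry(255) == 0x2d02ef8dUL);
}
```

Entry 128 is the easiest to follow by hand: the single set bit is
shifted right seven times without any XOR, and on the eighth step it
reaches the low position, so the result is the polynomial constant
itself.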
614 1.1.1.1.4.2 pgoyette
615 1.1.1.1.4.2 pgoyette /* Table of CRCs of all 8-bit messages. */
616 1.1.1.1.4.2 pgoyette unsigned long crc_table[256];
617 1.1.1.1.4.2 pgoyette
618 1.1.1.1.4.2 pgoyette /* Flag: has the table been computed? Initially false. */
619 1.1.1.1.4.2 pgoyette int crc_table_computed = 0;
620 1.1.1.1.4.2 pgoyette
621 1.1.1.1.4.2 pgoyette /* Make the table for a fast CRC. */
622 1.1.1.1.4.2 pgoyette void make_crc_table(void)
623 1.1.1.1.4.2 pgoyette {
624 1.1.1.1.4.2 pgoyette   unsigned long c;
634 1.1.1.1.4.2 pgoyette   int n, k;
635 1.1.1.1.4.2 pgoyette   for (n = 0; n < 256; n++) {
636 1.1.1.1.4.2 pgoyette     c = (unsigned long) n;
637 1.1.1.1.4.2 pgoyette     for (k = 0; k < 8; k++) {
638 1.1.1.1.4.2 pgoyette       if (c & 1) {
639 1.1.1.1.4.2 pgoyette         c = 0xedb88320L ^ (c >> 1);
640 1.1.1.1.4.2 pgoyette       } else {
641 1.1.1.1.4.2 pgoyette         c = c >> 1;
642 1.1.1.1.4.2 pgoyette       }
643 1.1.1.1.4.2 pgoyette     }
644 1.1.1.1.4.2 pgoyette     crc_table[n] = c;
645 1.1.1.1.4.2 pgoyette   }
646 1.1.1.1.4.2 pgoyette   crc_table_computed = 1;
647 1.1.1.1.4.2 pgoyette }
648 1.1.1.1.4.2 pgoyette
649 1.1.1.1.4.2 pgoyette /*
650 1.1.1.1.4.2 pgoyette    Update a running crc with the bytes buf[0..len-1] and return
651 1.1.1.1.4.2 pgoyette    the updated crc. The crc should be initialized to zero. Pre- and
652 1.1.1.1.4.2 pgoyette    post-conditioning (one's complement) is performed within this
653 1.1.1.1.4.2 pgoyette    function so it shouldn't be done by the caller. Usage example:
654 1.1.1.1.4.2 pgoyette
655 1.1.1.1.4.2 pgoyette      unsigned long crc = 0L;
656 1.1.1.1.4.2 pgoyette
657 1.1.1.1.4.2 pgoyette      while (read_buffer(buffer, length) != EOF) {
658 1.1.1.1.4.2 pgoyette        crc = update_crc(crc, buffer, length);
659 1.1.1.1.4.2 pgoyette      }
660 1.1.1.1.4.2 pgoyette      if (crc != original_crc) error();
661 1.1.1.1.4.2 pgoyette */
662 1.1.1.1.4.2 pgoyette unsigned long update_crc(unsigned long crc,
663 1.1.1.1.4.2 pgoyette                 unsigned char *buf, int len)
664 1.1.1.1.4.2 pgoyette {
665 1.1.1.1.4.2 pgoyette   unsigned long c = crc ^ 0xffffffffL;
666 1.1.1.1.4.2 pgoyette   int n;
667 1.1.1.1.4.2 pgoyette
668 1.1.1.1.4.2 pgoyette   if (!crc_table_computed)
669 1.1.1.1.4.2 pgoyette     make_crc_table();
670 1.1.1.1.4.2 pgoyette   for (n = 0; n < len; n++) {
671 1.1.1.1.4.2 pgoyette     c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
672 1.1.1.1.4.2 pgoyette   }
673 1.1.1.1.4.2 pgoyette   return c ^ 0xffffffffL;
674 1.1.1.1.4.2 pgoyette }
675 1.1.1.1.4.2 pgoyette
676 /* Return the CRC of the bytes buf[0..len-1]. */
677 unsigned long crc(unsigned char *buf, int len)
678 {
679   return update_crc(0L, buf, len);
680 }
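
One way to gain confidence in an implementation of this CRC is to
compare it against the widely published check value for the nine ASCII
bytes "123456789", which is 0xcbf43926. The sketch below does so with a
table-free variant of the update logic: it uses the same polynomial
constant and the same pre- and post-conditioning as the sample code,
but processes one bit at a time instead of indexing a table. This is a
swapped-in technique chosen to keep the example short and
self-contained. The names crc32_update_bitwise and crc32_self_check are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table-free equivalent of update_crc(): same polynomial
   (0xedb88320), same one's complement on entry and exit.
   Results are masked to 32 bits for platforms where unsigned long
   is wider. */
unsigned long crc32_update_bitwise(unsigned long crc,
                                   const unsigned char *buf, size_t len)
{
    unsigned long c = (crc ^ 0xffffffffUL) & 0xffffffffUL;
    size_t n;
    int k;

    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++) {
            c = (c & 1) ? (0xedb88320UL ^ (c >> 1)) : (c >> 1);
        }
    }
    return (c ^ 0xffffffffUL) & 0xffffffffUL;
}

/* Self-check: one-shot and piecewise updates must agree with the
   published check value. */
void crc32_self_check(void)
{
    const unsigned char *msg = (const unsigned char *) "123456789";
    unsigned long whole, split;

    whole = crc32_update_bitwise(0UL, msg, 9);
    assert(whole == 0xcbf43926UL);

    /* Feed the same bytes in two pieces, carrying the running crc. */
    split = crc32_update_bitwise(0UL, msg, 4);
    split = crc32_update_bitwise(split, msg + 4, 5);
    assert(split == whole);
}
```

The piecewise case works because the one's complement applied on exit
from one call is undone by the one's complement applied on entry to the
next. This is why the caller starts from zero and never conditions the
value itself.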