Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility. The format includes a
   cyclic redundancy check value for detecting data corruption.
   The format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods. The format can be
   implemented readily in a manner not covered by patents.

Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
            2.3.1.1. Extra field
            2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes). It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.
      The material in the appendices is not part of the specification
      per se and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.) See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification. In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning. Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.
      However, a byte considered as an integer between 0 and 255 does
      have a most- and least-significant bit, and since we write numbers
      with the most-significant digit on the left, we also write bytes
      with the most-significant bit on the left. In the diagrams below,
      we number the bits of a byte so that bit 0 is the
      least-significant bit, i.e., the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes. All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets). The format of each member is specified in the following
      section.
      The members simply appear one after another in the file, with no
      additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file. CM
            = 0-7 are reserved. CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text. This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present. In case of doubt, FTEXT
            is cleared, indicating binary data. For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.
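
            As an illustration only, the FLG bit assignments listed
            above might be written as C masks along the following
            lines. The GZ_* names and the two helper functions are
            hypothetical and are not part of this specification; the
            sketch assumes only that bit 0 is the least-significant
            bit and that bits 5-7 are the reserved bits.

```c
/* Bit masks for the FLG byte; bit 0 is the least-significant bit.
   The GZ_* names are illustrative, not defined by the specification. */
enum {
    GZ_FTEXT    = 1 << 0,
    GZ_FHCRC    = 1 << 1,
    GZ_FEXTRA   = 1 << 2,
    GZ_FNAME    = 1 << 3,
    GZ_FCOMMENT = 1 << 4,
    GZ_RESERVED = 0xe0      /* bits 5, 6 and 7 */
};

/* Hypothetical helper: nonzero if any reserved FLG bit is set. */
static int flg_has_reserved_bits(unsigned char flg)
{
    return (flg & GZ_RESERVED) != 0;
}

/* Hypothetical helper: nonzero if FLG announces at least one optional
   header field (CRC16, extra field, file name, or comment). */
static int flg_has_optional_fields(unsigned char flg)
{
    return (flg & (GZ_FHCRC | GZ_FEXTRA | GZ_FNAME | GZ_FCOMMENT)) != 0;
}
```

            FTEXT is left out of the second helper on purpose, since it
            is only a hint and does not add any bytes to the header.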

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data. The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte. The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set. This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present. This comment is not interpreted; it is only
            intended for human consumption. The comment must consist of
            ISO 8859-1 (LATIN-1) characters.
            Line breaks should be denoted by a single line feed
            character (10 decimal).

            Reserved FLG bits must be zero.

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed. The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.) If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started. MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods. The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place. This may be useful in determining end-of-line
            convention for text files.
            The currently defined values are as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field. See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42. (See http://www.iso.ch for
            ordering ISO documents. See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.
         It consists of a series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value. Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use. Subfield
         IDs with SI2 = 0 are reserved for future use. The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others). The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.
         It need not examine any other part of the header or trailer; in
         particular, a decompressor may ignore FTEXT and OS and always
         produce binary output, and still be compliant. A compliant
         decompressor must give an error indication if any reserved bit
         is non-zero, since such a bit could indicate the presence of a
         new field that would cause subsequent data to be interpreted
         incorrectly.

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII. Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data. Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct. Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification. Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>. Since this
   implementation is a de facto standard, we mention some more of its
   features here. Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.
   Since the file format includes a modification time, the gzip
   decompressor provides a command line switch that assigns the
   modification time from the file, rather than the local modification
   time of the compressed input, to the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language. Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
       the updated crc. The crc should be initialized to zero. Pre- and
       post-conditioning (one's complement) is performed within this
       function so it shouldn't be done by the caller. Usage example:

         unsigned long crc = 0L;

         while (read_buffer(buffer, length) != EOF) {
           crc = update_crc(crc, buffer, length);
         }
         if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
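
   As a further illustration, and like the rest of this appendix not
   part of the specification, the following sketch shows one way a
   decompressor might check the 8-byte member trailer (CRC32 followed by
   ISIZE, both stored least-significant byte first) against data it has
   already decompressed. The names crc32_bitwise, read_le32 and
   trailer_matches are hypothetical. To stay self-contained the sketch
   uses a table-free bitwise CRC rather than the table-driven sample
   above; it is assumed to yield the same values.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the same polynomial constant (0xedb88320) and
   the same one's-complement pre- and post-conditioning as the sample
   code; slower than the table version but shorter. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xffffffffu;
    size_t n;
    int k;

    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xedb88320u ^ (c >> 1)) : (c >> 1);
    }
    return c ^ 0xffffffffu;
}

/* Read a 4-byte number stored least-significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return 1 if the 8-byte trailer agrees with the
   uncompressed data in data[0..len-1], and 0 otherwise. */
static int trailer_matches(const unsigned char *trailer,
                           const unsigned char *data, size_t len)
{
    uint32_t stored_crc = read_le32(trailer);
    uint32_t stored_isize = read_le32(trailer + 4);

    /* ISIZE is the input size modulo 2^32. */
    return stored_crc == crc32_bitwise(data, len) &&
           stored_isize == (uint32_t)(len & 0xffffffffu);
}
```

   For the nine ASCII bytes "123456789" the CRC-32 is 0xcbf43926, so a
   matching trailer would hold the bytes 26 39 f4 cb 09 00 00 00 (hex).
   A streaming decompressor would more likely keep a running CRC, as in
   the update_crc usage example, than buffer the whole output.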