Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility.  The format includes a
   cyclic redundancy check value for detecting data corruption.  The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods.  The format can be
   implemented readily in a manner not covered by patents.

Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
          2.3.1. Member header and trailer
              2.3.1.1. Extra field
              2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes).  It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.  The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification.  In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning.  Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8
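
      The byte order rule can be checked with a short sketch.  The
      following Python fragment is an illustration only, not part of
      the format; it assumes the standard struct module and reuses the
      number 520 from the example above.

```python
import struct

# "<H" selects an unsigned 16-bit integer, least-significant byte first.
encoded = struct.pack("<H", 520)

# Show each byte with bit 7 on the left and bit 0 on the right.
bit_strings = [format(byte, "08b") for byte in encoded]
print(bit_strings)  # ['00001000', '00000010']

# Rebuild the number by weighting byte i with 256**i.
decoded = sum(byte << (8 * i) for i, byte in enumerate(encoded))
print(decoded)  # 520
```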

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or after them.
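
      A minimal sketch of this property, assuming Python's standard
      gzip module as a stand-in for any conforming compressor and
      decompressor: two members are produced separately, joined byte
      for byte, and read back as one stream.

```python
import gzip

# Two independently compressed members.
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world\n")

# A gzip file is just the members one after another.
combined = first + second

# gzip.decompress reads every member and concatenates the output.
restored = gzip.decompress(combined)
print(restored)  # b'hello, world\n'
```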

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+
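
      The ten fixed bytes at the start of a member can be unpacked in
      one step.  The sketch below is illustrative: FixedHeader and
      parse_fixed_header are hypothetical names, and the sample bytes
      are a hand-built header rather than output of any particular
      compressor.

```python
import struct
from typing import NamedTuple


class FixedHeader(NamedTuple):
    id1: int
    id2: int
    cm: int
    flg: int
    mtime: int
    xfl: int
    os_id: int


def parse_fixed_header(member: bytes) -> FixedHeader:
    """Unpack ID1, ID2, CM, FLG, MTIME, XFL and OS (MTIME is little-endian)."""
    if len(member) < 10:
        raise ValueError("a gzip member needs at least 10 header bytes")
    return FixedHeader(*struct.unpack_from("<BBBBIBB", member, 0))


# Hand-built header: deflate, no flags, no time stamp, unknown OS.
sample = bytes([0x1F, 0x8B, 8, 0]) + struct.pack("<I", 0) + bytes([0, 255])
header = parse_fixed_header(sample)
print(header)
```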

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.  CM
            = 0-7 are reserved.  CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data.  For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data.  The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]
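
            As a hedged sketch of the FHCRC rule, the CRC16 can be
            derived from an ordinary CRC-32 routine.  The example
            assumes Python's zlib.crc32 as the CRC-32 implementation
            and uses a hypothetical ten-byte header in which only FHCRC
            is set; header_crc16 is an illustrative name.

```python
import struct
import zlib


def header_crc16(header_bytes: bytes) -> int:
    """Return the two least significant bytes of the header's CRC-32."""
    return zlib.crc32(header_bytes) & 0xFFFF


# Hypothetical header: deflate, FLG = 0x02 (FHCRC), MTIME = 0, unknown OS.
header = bytes([0x1F, 0x8B, 8, 0x02]) + struct.pack("<I", 0) + bytes([0, 255])

# The CRC16 is stored like every multi-byte number: low byte first.
crc16_field = struct.pack("<H", header_crc16(header))
print(crc16_field.hex())
```

            With optional fields present, header_bytes would also have
            to cover them, since the CRC16 spans every header byte that
            precedes it.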

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case.  There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.
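
            One possible way to model the FLG byte, sketched with
            Python's enum.IntFlag.  The names Flg, RESERVED_BITS and
            decode_flg are hypothetical; only the bit positions come
            from the list above.

```python
from enum import IntFlag


class Flg(IntFlag):
    FTEXT = 0x01     # bit 0
    FHCRC = 0x02     # bit 1
    FEXTRA = 0x04    # bit 2
    FNAME = 0x08     # bit 3
    FCOMMENT = 0x10  # bit 4


RESERVED_BITS = 0xE0  # bits 5, 6 and 7


def decode_flg(flg: int) -> Flg:
    """Reject reserved bits, then return the defined flags that are set."""
    if flg & RESERVED_BITS:
        raise ValueError(f"reserved FLG bits set: {flg:#04x}")
    return Flg(flg)


flags = decode_flg(0x18)  # FNAME and FCOMMENT
print(Flg.FNAME in flags, Flg.FHCRC in flags)  # True False
```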

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.
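
            A small sketch of how a reader might interpret MTIME,
            assuming Python's datetime module.  The function name
            mtime_to_datetime is hypothetical, and returning None for
            MTIME = 0 is one possible way to represent "no time stamp".

```python
from datetime import datetime, timezone
from typing import Optional


def mtime_to_datetime(mtime: int) -> Optional[datetime]:
    """Convert MTIME (seconds since 1970-01-01 00:00:00 GMT) to UTC."""
    if mtime == 0:
        return None  # no time stamp is available
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


print(mtime_to_datetime(86400))  # 1970-01-02 00:00:00+00:00
print(mtime_to_datetime(0))      # None
```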

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.  The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place.  This may be useful in determining end-of-line
            convention for text files.  The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown
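
            The OS values lend themselves to a lookup table.  The
            sketch below copies the values listed above; the name
            describe_os and the "undefined" label for codes that are
            not listed are assumptions made for the example.

```python
OS_NAMES = {
    0: "FAT filesystem (MS-DOS, OS/2, NT/Win32)",
    1: "Amiga",
    2: "VMS (or OpenVMS)",
    3: "Unix",
    4: "VM/CMS",
    5: "Atari TOS",
    6: "HPFS filesystem (OS/2, NT)",
    7: "Macintosh",
    8: "Z-System",
    9: "CP/M",
    10: "TOPS-20",
    11: "NTFS filesystem (NT)",
    12: "QDOS",
    13: "Acorn RISCOS",
    255: "unknown",
}


def describe_os(code: int) -> str:
    """Map the OS header byte to a readable name."""
    if not 0 <= code <= 255:
        raise ValueError("OS is a single byte")
    # Codes 14-254 are not defined here; label them instead of failing.
    return OS_NAMES.get(code, f"undefined ({code})")


print(describe_os(3))    # Unix
print(describe_os(255))  # unknown
```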

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field.  See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42.  (See http://www.iso.ch for
            ordering ISO documents.  See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.
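
            The trailer fields can be checked against the uncompressed
            data as sketched below.  The example assumes that Python's
            zlib.crc32 computes the CRC-32 referred to above and that
            the member is the last one in the buffer; trailer_matches
            is a hypothetical name.

```python
import gzip
import struct
import zlib


def trailer_matches(member: bytes, uncompressed: bytes) -> bool:
    """Compare CRC32 and ISIZE in the last 8 bytes with the given data."""
    crc32, isize = struct.unpack("<II", member[-8:])
    expected_crc = zlib.crc32(uncompressed)
    expected_size = len(uncompressed) % 2**32  # ISIZE wraps at 4 GiB
    return crc32 == expected_crc and isize == expected_size


data = b"gzip trailer example " * 100
member = gzip.compress(data)
print(trailer_matches(member, data))         # True
print(trailer_matches(member, data + b"!"))  # False
```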

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('p')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.
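
         A sketch of walking the subfields, assuming the XLEN bytes of
         the extra field have already been isolated.  The function
         name parse_extra_field and the sample subfield contents are
         made up for the example.

```python
import struct
from typing import List, Tuple


def parse_extra_field(extra: bytes) -> List[Tuple[bytes, bytes]]:
    """Split an extra field into (subfield ID, subfield data) pairs."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        # SI1, SI2, then LEN as a little-endian 16-bit number.
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4
        if pos + length > len(extra):
            raise ValueError("subfield data runs past XLEN")
        subfields.append((bytes([si1, si2]), extra[pos:pos + length]))
        pos += length
    return subfields


# Made-up extra field holding two subfields.
extra = b"Ap" + struct.pack("<H", 3) + b"abc" + b"Zz" + struct.pack("<H", 0)
print(parse_extra_field(extra))  # [(b'Ap', b'abc'), (b'Zz', b'')]
```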

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).  The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.  It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.  A
         compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.
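
         The decompressor requirements can be sketched for a single
         member as follows.  This is an illustration under stated
         assumptions, not a reference implementation: it relies on
         Python's zlib module for the "deflate" stream, handles one
         member only, and leaves the trailer unchecked, which the text
         above permits.  The name decompress_member is hypothetical.

```python
import struct
import zlib

FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10
RESERVED = 0xE0


def decompress_member(data: bytes) -> bytes:
    """Check ID1, ID2, CM and reserved bits, skip optional fields, inflate."""
    if len(data) < 10:
        raise ValueError("truncated header")
    id1, id2, cm, flg = data[0], data[1], data[2], data[3]
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not in gzip format")
    if cm != 8:
        raise ValueError("unsupported compression method")
    if flg & RESERVED:
        raise ValueError("reserved FLG bit is set")
    pos = 10
    if flg & FEXTRA:  # XLEN, then XLEN bytes of extra field
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flg & FNAME:  # zero-terminated original file name
        pos = data.index(b"\x00", pos) + 1
    if flg & FCOMMENT:  # zero-terminated comment
        pos = data.index(b"\x00", pos) + 1
    if flg & FHCRC:  # CRC16 of the header, skipped here
        pos += 2
    # A negative window size asks zlib for a raw deflate stream.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    output = inflater.decompress(data[pos:])
    if not inflater.eof:
        raise ValueError("compressed data ended early")
    return output
```

         The optional fields are skipped in the order in which the
         member diagram lists them: extra field, file name, comment,
         then CRC16.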

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII.  Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.
    525  1.1  christos 
    526  1.1  christos 4. Security Considerations
    527  1.1  christos 
    528  1.1  christos    Any data compression method involves the reduction of redundancy in
    529  1.1  christos    the data.  Consequently, any corruption of the data is likely to have
    530  1.1  christos    severe effects and be difficult to correct.  Uncompressed text, on
    531  1.1  christos    the other hand, will probably still be readable despite the presence
    532  1.1  christos    of some corrupted bytes.
    533  1.1  christos 
    534  1.1  christos    It is recommended that systems using this data format provide some
    535  1.1  christos    means of validating the integrity of the compressed data, such as by
    536  1.1  christos    setting and checking the CRC-32 check value.
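   As one way to act on this recommendation, the sketch below compares
   the two trailer fields of a gzip member (CRC32 followed by ISIZE,
   each stored least-significant byte first) with values the
   decompressor has computed itself.  The names read_le32 and
   gzip_trailer_ok are hypothetical and not part of this
   specification; computed_crc is assumed to come from a CRC-32
   routine such as the sample code in the appendix, and total_len is
   the number of uncompressed bytes produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored least-significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return 1 if the 8-byte member trailer matches
   the CRC-32 and length computed over the uncompressed data, else 0.
   ISIZE holds the length modulo 2^32, so only the low 32 bits of
   total_len are compared. */
int gzip_trailer_ok(const unsigned char trailer[8],
                    uint32_t computed_crc, uint64_t total_len)
{
    assert(trailer != NULL);
    return read_le32(trailer) == computed_crc &&
           read_le32(trailer + 4) == (uint32_t)(total_len & 0xffffffffu);
}
```

   A decompressor would typically call such a check once per member,
   after the last compressed block, and treat a mismatch as corrupt
   input rather than attempt a repair.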
    537  1.1  christos 
5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
    568  1.1  christos 
7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>.  Since this
   implementation is a de facto standard, we mention some more of its
   features here.  Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.  Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.
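   To illustrate where that stored modification time comes from, the
   sketch below reads the MTIME field from the fixed 10-byte member
   header (ID1, ID2, CM, FLG, MTIME, XFL, OS).  MTIME occupies bytes 4
   through 7, least-significant byte first, in Unix time format.  The
   function name gzip_header_mtime is hypothetical; this is not a
   description of how the gzip utility itself is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract MTIME from the start of a gzip member.
   Returns 0 and stores the time stamp on success, or -1 if the buffer
   is shorter than the fixed header or lacks the gzip magic bytes.
   A stored value of 0 means that no time stamp is available. */
int gzip_header_mtime(const unsigned char *hdr, size_t len,
                      uint32_t *mtime)
{
    assert(hdr != NULL && mtime != NULL);
    if (len < 10 || hdr[0] != 0x1f || hdr[1] != 0x8b)
        return -1;
    *mtime = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
             ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
    return 0;
}
```

   A decompressor that wants to restore the original time stamp could
   pass a nonzero result to whatever file-time call its platform
   provides, and fall back to the time of the compressed input when
   the field is zero.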
    595  1.1  christos 
8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language. Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).
    614  1.1  christos 
      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
         the updated crc. The crc should be initialized to zero. Pre-
         and post-conditioning (one's complement) is performed within
         this function so it shouldn't be done by the caller. Usage
         example:

           unsigned long crc = 0L;

           while (read_buffer(buffer, length) != EOF) {
             crc = update_crc(crc, buffer, length);
           }
           if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
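   As a cross-check on the sample code, the sketch below computes the
   same CRC one bit at a time, without a table, using a fixed-width
   32-bit type.  The name update_crc32 is hypothetical; the function
   is intended to follow the same convention as update_crc above (the
   running value starts at zero, and the one's complement steps happen
   inside the function), so that data can be fed in pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width, table-free variant of update_crc.
   Slower than the table-driven sample, but short enough to use as a
   reference when testing another implementation. */
uint32_t update_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    uint32_t c = crc ^ 0xffffffffu;   /* pre-conditioning */
    size_t n;
    int k;

    assert(buf != NULL || len == 0);
    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xedb88320u ^ (c >> 1)) : (c >> 1);
    }
    return c ^ 0xffffffffu;           /* post-conditioning */
}
```

   A common test vector is the nine ASCII bytes "123456789", whose
   CRC-32 is 0xcbf43926.  Feeding "1234" and then "56789" through two
   successive calls, passing the first result as the crc argument of
   the second, yields the same value.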