Home | History | Annotate | Line # | Download | only in examples
enough.c revision 1.1.1.1.4.2
      1  1.1.1.1.4.2  pgoyette /* enough.c -- determine the maximum size of inflate's Huffman code tables over
      2  1.1.1.1.4.2  pgoyette  * all possible valid and complete Huffman codes, subject to a length limit.
      3  1.1.1.1.4.2  pgoyette  * Copyright (C) 2007, 2008, 2012 Mark Adler
      4  1.1.1.1.4.2  pgoyette  * Version 1.4  18 August 2012  Mark Adler
      5  1.1.1.1.4.2  pgoyette  */
      6  1.1.1.1.4.2  pgoyette 
      7  1.1.1.1.4.2  pgoyette /* Version history:
      8  1.1.1.1.4.2  pgoyette    1.0   3 Jan 2007  First version (derived from codecount.c version 1.4)
      9  1.1.1.1.4.2  pgoyette    1.1   4 Jan 2007  Use faster incremental table usage computation
     10  1.1.1.1.4.2  pgoyette                      Prune examine() search on previously visited states
     11  1.1.1.1.4.2  pgoyette    1.2   5 Jan 2007  Comments clean up
     12  1.1.1.1.4.2  pgoyette                      As inflate does, decrease root for short codes
     13  1.1.1.1.4.2  pgoyette                      Refuse cases where inflate would increase root
     14  1.1.1.1.4.2  pgoyette    1.3  17 Feb 2008  Add argument for initial root table size
     15  1.1.1.1.4.2  pgoyette                      Fix bug for initial root table size == max - 1
     16  1.1.1.1.4.2  pgoyette                      Use a macro to compute the history index
     17  1.1.1.1.4.2  pgoyette    1.4  18 Aug 2012  Avoid shifts more than bits in type (caused endless loop!)
     18  1.1.1.1.4.2  pgoyette                      Clean up comparisons of different types
     19  1.1.1.1.4.2  pgoyette                      Clean up code indentation
     20  1.1.1.1.4.2  pgoyette  */
     21  1.1.1.1.4.2  pgoyette 
     22  1.1.1.1.4.2  pgoyette /*
     23  1.1.1.1.4.2  pgoyette    Examine all possible Huffman codes for a given number of symbols and a
     24  1.1.1.1.4.2  pgoyette    maximum code length in bits to determine the maximum table size for zilb's
     25  1.1.1.1.4.2  pgoyette    inflate.  Only complete Huffman codes are counted.
     26  1.1.1.1.4.2  pgoyette 
     27  1.1.1.1.4.2  pgoyette    Two codes are considered distinct if the vectors of the number of codes per
     28  1.1.1.1.4.2  pgoyette    length are not identical.  So permutations of the symbol assignments result
     29  1.1.1.1.4.2  pgoyette    in the same code for the counting, as do permutations of the assignments of
     30  1.1.1.1.4.2  pgoyette    the bit values to the codes (i.e. only canonical codes are counted).
     31  1.1.1.1.4.2  pgoyette 
     32  1.1.1.1.4.2  pgoyette    We build a code from shorter to longer lengths, determining how many symbols
     33  1.1.1.1.4.2  pgoyette    are coded at each length.  At each step, we have how many symbols remain to
     34  1.1.1.1.4.2  pgoyette    be coded, what the last code length used was, and how many bit patterns of
     35  1.1.1.1.4.2  pgoyette    that length remain unused. Then we add one to the code length and double the
     36  1.1.1.1.4.2  pgoyette    number of unused patterns to graduate to the next code length.  We then
     37  1.1.1.1.4.2  pgoyette    assign all portions of the remaining symbols to that code length that
     38  1.1.1.1.4.2  pgoyette    preserve the properties of a correct and eventually complete code.  Those
     39  1.1.1.1.4.2  pgoyette    properties are: we cannot use more bit patterns than are available; and when
     40  1.1.1.1.4.2  pgoyette    all the symbols are used, there are exactly zero possible bit patterns
     41  1.1.1.1.4.2  pgoyette    remaining.
     42  1.1.1.1.4.2  pgoyette 
     43  1.1.1.1.4.2  pgoyette    The inflate Huffman decoding algorithm uses two-level lookup tables for
     44  1.1.1.1.4.2  pgoyette    speed.  There is a single first-level table to decode codes up to root bits
     45  1.1.1.1.4.2  pgoyette    in length (root == 9 in the current inflate implementation).  The table
     46  1.1.1.1.4.2  pgoyette    has 1 << root entries and is indexed by the next root bits of input.  Codes
     47  1.1.1.1.4.2  pgoyette    shorter than root bits have replicated table entries, so that the correct
     48  1.1.1.1.4.2  pgoyette    entry is pointed to regardless of the bits that follow the short code.  If
     49  1.1.1.1.4.2  pgoyette    the code is longer than root bits, then the table entry points to a second-
     50  1.1.1.1.4.2  pgoyette    level table.  The size of that table is determined by the longest code with
     51  1.1.1.1.4.2  pgoyette    that root-bit prefix.  If that longest code has length len, then the table
     52  1.1.1.1.4.2  pgoyette    has size 1 << (len - root), to index the remaining bits in that set of
     53  1.1.1.1.4.2  pgoyette    codes.  Each subsequent root-bit prefix then has its own sub-table.  The
     54  1.1.1.1.4.2  pgoyette    total number of table entries required by the code is calculated
     55  1.1.1.1.4.2  pgoyette    incrementally as the number of codes at each bit length is populated.  When
     56  1.1.1.1.4.2  pgoyette    all of the codes are shorter than root bits, then root is reduced to the
     57  1.1.1.1.4.2  pgoyette    longest code length, resulting in a single, smaller, one-level table.
     58  1.1.1.1.4.2  pgoyette 
     59  1.1.1.1.4.2  pgoyette    The inflate algorithm also provides for small values of root (relative to
     60  1.1.1.1.4.2  pgoyette    the log2 of the number of symbols), where the shortest code has more bits
     61  1.1.1.1.4.2  pgoyette    than root.  In that case, root is increased to the length of the shortest
     62  1.1.1.1.4.2  pgoyette    code.  This program, by design, does not handle that case, so it is verified
     63  1.1.1.1.4.2  pgoyette    that the number of symbols is less than 2^(root + 1).
     64  1.1.1.1.4.2  pgoyette 
     65  1.1.1.1.4.2  pgoyette    In order to speed up the examination (by about ten orders of magnitude for
     66  1.1.1.1.4.2  pgoyette    the default arguments), the intermediate states in the build-up of a code
     67  1.1.1.1.4.2  pgoyette    are remembered and previously visited branches are pruned.  The memory
     68  1.1.1.1.4.2  pgoyette    required for this will increase rapidly with the total number of symbols and
     69  1.1.1.1.4.2  pgoyette    the maximum code length in bits.  However this is a very small price to pay
     70  1.1.1.1.4.2  pgoyette    for the vast speedup.
     71  1.1.1.1.4.2  pgoyette 
     72  1.1.1.1.4.2  pgoyette    First, all of the possible Huffman codes are counted, and reachable
     73  1.1.1.1.4.2  pgoyette    intermediate states are noted by a non-zero count in a saved-results array.
     74  1.1.1.1.4.2  pgoyette    Second, the intermediate states that lead to (root + 1) bit or longer codes
     75  1.1.1.1.4.2  pgoyette    are used to look at all sub-codes from those junctures for their inflate
     76  1.1.1.1.4.2  pgoyette    memory usage.  (The amount of memory used is not affected by the number of
     77  1.1.1.1.4.2  pgoyette    codes of root bits or less in length.)  Third, the visited states in the
     78  1.1.1.1.4.2  pgoyette    construction of those sub-codes and the associated calculation of the table
     79  1.1.1.1.4.2  pgoyette    size is recalled in order to avoid recalculating from the same juncture.
     80  1.1.1.1.4.2  pgoyette    Beginning the code examination at (root + 1) bit codes, which is enabled by
     81  1.1.1.1.4.2  pgoyette    identifying the reachable nodes, accounts for about six of the orders of
     82  1.1.1.1.4.2  pgoyette    magnitude of improvement for the default arguments.  About another four
     83  1.1.1.1.4.2  pgoyette    orders of magnitude come from not revisiting previous states.  Out of
     84  1.1.1.1.4.2  pgoyette    approximately 2x10^16 possible Huffman codes, only about 2x10^6 sub-codes
     85  1.1.1.1.4.2  pgoyette    need to be examined to cover all of the possible table memory usage cases
     86  1.1.1.1.4.2  pgoyette    for the default arguments of 286 symbols limited to 15-bit codes.
     87  1.1.1.1.4.2  pgoyette 
     88  1.1.1.1.4.2  pgoyette    Note that an unsigned long long type is used for counting.  It is quite easy
     89  1.1.1.1.4.2  pgoyette    to exceed the capacity of an eight-byte integer with a large number of
     90  1.1.1.1.4.2  pgoyette    symbols and a large maximum code length, so multiple-precision arithmetic
     91  1.1.1.1.4.2  pgoyette    would need to replace the unsigned long long arithmetic in that case.  This
     92  1.1.1.1.4.2  pgoyette    program will abort if an overflow occurs.  The big_t type identifies where
     93  1.1.1.1.4.2  pgoyette    the counting takes place.
     94  1.1.1.1.4.2  pgoyette 
     95  1.1.1.1.4.2  pgoyette    An unsigned long long type is also used for calculating the number of
     96  1.1.1.1.4.2  pgoyette    possible codes remaining at the maximum length.  This limits the maximum
     97  1.1.1.1.4.2  pgoyette    code length to the number of bits in a long long minus the number of bits
     98  1.1.1.1.4.2  pgoyette    needed to represent the symbols in a flat code.  The code_t type identifies
     99  1.1.1.1.4.2  pgoyette    where the bit pattern counting takes place.
    100  1.1.1.1.4.2  pgoyette  */
    101  1.1.1.1.4.2  pgoyette 
    102  1.1.1.1.4.2  pgoyette #include <stdio.h>
    103  1.1.1.1.4.2  pgoyette #include <stdlib.h>
    104  1.1.1.1.4.2  pgoyette #include <string.h>
    105  1.1.1.1.4.2  pgoyette #include <assert.h>
    106  1.1.1.1.4.2  pgoyette 
    107  1.1.1.1.4.2  pgoyette #define local static
    108  1.1.1.1.4.2  pgoyette 
    109  1.1.1.1.4.2  pgoyette /* special data types */
    110  1.1.1.1.4.2  pgoyette typedef unsigned long long big_t;   /* type for code counting */
    111  1.1.1.1.4.2  pgoyette typedef unsigned long long code_t;  /* type for bit pattern counting */
    112  1.1.1.1.4.2  pgoyette struct tab {                        /* type for been here check */
    113  1.1.1.1.4.2  pgoyette     size_t len;         /* length of bit vector in char's */
    114  1.1.1.1.4.2  pgoyette     char *vec;          /* allocated bit vector */
    115  1.1.1.1.4.2  pgoyette };
    116  1.1.1.1.4.2  pgoyette 
    117  1.1.1.1.4.2  pgoyette /* The array for saving results, num[], is indexed with this triplet:
    118  1.1.1.1.4.2  pgoyette 
    119  1.1.1.1.4.2  pgoyette       syms: number of symbols remaining to code
    120  1.1.1.1.4.2  pgoyette       left: number of available bit patterns at length len
    121  1.1.1.1.4.2  pgoyette       len: number of bits in the codes currently being assigned
    122  1.1.1.1.4.2  pgoyette 
    123  1.1.1.1.4.2  pgoyette    Those indices are constrained thusly when saving results:
    124  1.1.1.1.4.2  pgoyette 
    125  1.1.1.1.4.2  pgoyette       syms: 3..totsym (totsym == total symbols to code)
    126  1.1.1.1.4.2  pgoyette       left: 2..syms - 1, but only the evens (so syms == 8 -> 2, 4, 6)
    127  1.1.1.1.4.2  pgoyette       len: 1..max - 1 (max == maximum code length in bits)
    128  1.1.1.1.4.2  pgoyette 
    129  1.1.1.1.4.2  pgoyette    syms == 2 is not saved since that immediately leads to a single code.  left
    130  1.1.1.1.4.2  pgoyette    must be even, since it represents the number of available bit patterns at
    131  1.1.1.1.4.2  pgoyette    the current length, which is double the number at the previous length.
    132  1.1.1.1.4.2  pgoyette    left ends at syms-1 since left == syms immediately results in a single code.
    133  1.1.1.1.4.2  pgoyette    (left > sym is not allowed since that would result in an incomplete code.)
    134  1.1.1.1.4.2  pgoyette    len is less than max, since the code completes immediately when len == max.
    135  1.1.1.1.4.2  pgoyette 
    136  1.1.1.1.4.2  pgoyette    The offset into the array is calculated for the three indices with the
    137  1.1.1.1.4.2  pgoyette    first one (syms) being outermost, and the last one (len) being innermost.
    138  1.1.1.1.4.2  pgoyette    We build the array with length max-1 lists for the len index, with syms-3
    139  1.1.1.1.4.2  pgoyette    of those for each symbol.  There are totsym-2 of those, with each one
    140  1.1.1.1.4.2  pgoyette    varying in length as a function of sym.  See the calculation of index in
    141  1.1.1.1.4.2  pgoyette    count() for the index, and the calculation of size in main() for the size
    142  1.1.1.1.4.2  pgoyette    of the array.
    143  1.1.1.1.4.2  pgoyette 
    144  1.1.1.1.4.2  pgoyette    For the deflate example of 286 symbols limited to 15-bit codes, the array
    145  1.1.1.1.4.2  pgoyette    has 284,284 entries, taking up 2.17 MB for an 8-byte big_t.  More than
    146  1.1.1.1.4.2  pgoyette    half of the space allocated for saved results is actually used -- not all
    147  1.1.1.1.4.2  pgoyette    possible triplets are reached in the generation of valid Huffman codes.
    148  1.1.1.1.4.2  pgoyette  */
    149  1.1.1.1.4.2  pgoyette 
    150  1.1.1.1.4.2  pgoyette /* The array for tracking visited states, done[], is itself indexed identically
    151  1.1.1.1.4.2  pgoyette    to the num[] array as described above for the (syms, left, len) triplet.
    152  1.1.1.1.4.2  pgoyette    Each element in the array is further indexed by the (mem, rem) doublet,
    153  1.1.1.1.4.2  pgoyette    where mem is the amount of inflate table space used so far, and rem is the
    154  1.1.1.1.4.2  pgoyette    remaining unused entries in the current inflate sub-table.  Each indexed
    155  1.1.1.1.4.2  pgoyette    element is simply one bit indicating whether the state has been visited or
    156  1.1.1.1.4.2  pgoyette    not.  Since the ranges for mem and rem are not known a priori, each bit
    157  1.1.1.1.4.2  pgoyette    vector is of a variable size, and grows as needed to accommodate the visited
    158  1.1.1.1.4.2  pgoyette    states.  mem and rem are used to calculate a single index in a triangular
    159  1.1.1.1.4.2  pgoyette    array.  Since the range of mem is expected in the default case to be about
    160  1.1.1.1.4.2  pgoyette    ten times larger than the range of rem, the array is skewed to reduce the
    161  1.1.1.1.4.2  pgoyette    memory usage, with eight times the range for mem than for rem.  See the
    162  1.1.1.1.4.2  pgoyette    calculations for offset and bit in beenhere() for the details.
    163  1.1.1.1.4.2  pgoyette 
    164  1.1.1.1.4.2  pgoyette    For the deflate example of 286 symbols limited to 15-bit codes, the bit
    165  1.1.1.1.4.2  pgoyette    vectors grow to total approximately 21 MB, in addition to the 4.3 MB done[]
    166  1.1.1.1.4.2  pgoyette    array itself.
    167  1.1.1.1.4.2  pgoyette  */
    168  1.1.1.1.4.2  pgoyette 
    169  1.1.1.1.4.2  pgoyette /* Globals to avoid propagating constants or constant pointers recursively */
    170  1.1.1.1.4.2  pgoyette local int max;          /* maximum allowed bit length for the codes */
    171  1.1.1.1.4.2  pgoyette local int root;         /* size of base code table in bits */
    172  1.1.1.1.4.2  pgoyette local int large;        /* largest code table so far */
    173  1.1.1.1.4.2  pgoyette local size_t size;      /* number of elements in num and done */
    174  1.1.1.1.4.2  pgoyette local int *code;        /* number of symbols assigned to each bit length */
    175  1.1.1.1.4.2  pgoyette local big_t *num;       /* saved results array for code counting */
    176  1.1.1.1.4.2  pgoyette local struct tab *done; /* states already evaluated array */
    177  1.1.1.1.4.2  pgoyette 
    178  1.1.1.1.4.2  pgoyette /* Index function for num[] and done[] */
    179  1.1.1.1.4.2  pgoyette #define INDEX(i,j,k) (((size_t)((i-1)>>1)*((i-2)>>1)+(j>>1)-1)*(max-1)+k-1)
    180  1.1.1.1.4.2  pgoyette 
    181  1.1.1.1.4.2  pgoyette /* Free allocated space.  Uses globals code, num, and done. */
    182  1.1.1.1.4.2  pgoyette local void cleanup(void)
    183  1.1.1.1.4.2  pgoyette {
    184  1.1.1.1.4.2  pgoyette     size_t n;
    185  1.1.1.1.4.2  pgoyette 
    186  1.1.1.1.4.2  pgoyette     if (done != NULL) {
    187  1.1.1.1.4.2  pgoyette         for (n = 0; n < size; n++)
    188  1.1.1.1.4.2  pgoyette             if (done[n].len)
    189  1.1.1.1.4.2  pgoyette                 free(done[n].vec);
    190  1.1.1.1.4.2  pgoyette         free(done);
    191  1.1.1.1.4.2  pgoyette     }
    192  1.1.1.1.4.2  pgoyette     if (num != NULL)
    193  1.1.1.1.4.2  pgoyette         free(num);
    194  1.1.1.1.4.2  pgoyette     if (code != NULL)
    195  1.1.1.1.4.2  pgoyette         free(code);
    196  1.1.1.1.4.2  pgoyette }
    197  1.1.1.1.4.2  pgoyette 
    198  1.1.1.1.4.2  pgoyette /* Return the number of possible Huffman codes using bit patterns of lengths
    199  1.1.1.1.4.2  pgoyette    len through max inclusive, coding syms symbols, with left bit patterns of
    200  1.1.1.1.4.2  pgoyette    length len unused -- return -1 if there is an overflow in the counting.
    201  1.1.1.1.4.2  pgoyette    Keep a record of previous results in num to prevent repeating the same
    202  1.1.1.1.4.2  pgoyette    calculation.  Uses the globals max and num. */
    203  1.1.1.1.4.2  pgoyette local big_t count(int syms, int len, int left)
    204  1.1.1.1.4.2  pgoyette {
    205  1.1.1.1.4.2  pgoyette     big_t sum;          /* number of possible codes from this juncture */
    206  1.1.1.1.4.2  pgoyette     big_t got;          /* value returned from count() */
    207  1.1.1.1.4.2  pgoyette     int least;          /* least number of syms to use at this juncture */
    208  1.1.1.1.4.2  pgoyette     int most;           /* most number of syms to use at this juncture */
    209  1.1.1.1.4.2  pgoyette     int use;            /* number of bit patterns to use in next call */
    210  1.1.1.1.4.2  pgoyette     size_t index;       /* index of this case in *num */
    211  1.1.1.1.4.2  pgoyette 
    212  1.1.1.1.4.2  pgoyette     /* see if only one possible code */
    213  1.1.1.1.4.2  pgoyette     if (syms == left)
    214  1.1.1.1.4.2  pgoyette         return 1;
    215  1.1.1.1.4.2  pgoyette 
    216  1.1.1.1.4.2  pgoyette     /* note and verify the expected state */
    217  1.1.1.1.4.2  pgoyette     assert(syms > left && left > 0 && len < max);
    218  1.1.1.1.4.2  pgoyette 
    219  1.1.1.1.4.2  pgoyette     /* see if we've done this one already */
    220  1.1.1.1.4.2  pgoyette     index = INDEX(syms, left, len);
    221  1.1.1.1.4.2  pgoyette     got = num[index];
    222  1.1.1.1.4.2  pgoyette     if (got)
    223  1.1.1.1.4.2  pgoyette         return got;         /* we have -- return the saved result */
    224  1.1.1.1.4.2  pgoyette 
    225  1.1.1.1.4.2  pgoyette     /* we need to use at least this many bit patterns so that the code won't be
    226  1.1.1.1.4.2  pgoyette        incomplete at the next length (more bit patterns than symbols) */
    227  1.1.1.1.4.2  pgoyette     least = (left << 1) - syms;
    228  1.1.1.1.4.2  pgoyette     if (least < 0)
    229  1.1.1.1.4.2  pgoyette         least = 0;
    230  1.1.1.1.4.2  pgoyette 
    231  1.1.1.1.4.2  pgoyette     /* we can use at most this many bit patterns, lest there not be enough
    232  1.1.1.1.4.2  pgoyette        available for the remaining symbols at the maximum length (if there were
    233  1.1.1.1.4.2  pgoyette        no limit to the code length, this would become: most = left - 1) */
    234  1.1.1.1.4.2  pgoyette     most = (((code_t)left << (max - len)) - syms) /
    235  1.1.1.1.4.2  pgoyette             (((code_t)1 << (max - len)) - 1);
    236  1.1.1.1.4.2  pgoyette 
    237  1.1.1.1.4.2  pgoyette     /* count all possible codes from this juncture and add them up */
    238  1.1.1.1.4.2  pgoyette     sum = 0;
    239  1.1.1.1.4.2  pgoyette     for (use = least; use <= most; use++) {
    240  1.1.1.1.4.2  pgoyette         got = count(syms - use, len + 1, (left - use) << 1);
    241  1.1.1.1.4.2  pgoyette         sum += got;
    242  1.1.1.1.4.2  pgoyette         if (got == (big_t)0 - 1 || sum < got)   /* overflow */
    243  1.1.1.1.4.2  pgoyette             return (big_t)0 - 1;
    244  1.1.1.1.4.2  pgoyette     }
    245  1.1.1.1.4.2  pgoyette 
    246  1.1.1.1.4.2  pgoyette     /* verify that all recursive calls are productive */
    247  1.1.1.1.4.2  pgoyette     assert(sum != 0);
    248  1.1.1.1.4.2  pgoyette 
    249  1.1.1.1.4.2  pgoyette     /* save the result and return it */
    250  1.1.1.1.4.2  pgoyette     num[index] = sum;
    251  1.1.1.1.4.2  pgoyette     return sum;
    252  1.1.1.1.4.2  pgoyette }
    253  1.1.1.1.4.2  pgoyette 
    254  1.1.1.1.4.2  pgoyette /* Return true if we've been here before, set to true if not.  Set a bit in a
    255  1.1.1.1.4.2  pgoyette    bit vector to indicate visiting this state.  Each (syms,len,left) state
    256  1.1.1.1.4.2  pgoyette    has a variable size bit vector indexed by (mem,rem).  The bit vector is
    257  1.1.1.1.4.2  pgoyette    lengthened if needed to allow setting the (mem,rem) bit. */
    258  1.1.1.1.4.2  pgoyette local int beenhere(int syms, int len, int left, int mem, int rem)
    259  1.1.1.1.4.2  pgoyette {
    260  1.1.1.1.4.2  pgoyette     size_t index;       /* index for this state's bit vector */
    261  1.1.1.1.4.2  pgoyette     size_t offset;      /* offset in this state's bit vector */
    262  1.1.1.1.4.2  pgoyette     int bit;            /* mask for this state's bit */
    263  1.1.1.1.4.2  pgoyette     size_t length;      /* length of the bit vector in bytes */
    264  1.1.1.1.4.2  pgoyette     char *vector;       /* new or enlarged bit vector */
    265  1.1.1.1.4.2  pgoyette 
    266  1.1.1.1.4.2  pgoyette     /* point to vector for (syms,left,len), bit in vector for (mem,rem) */
    267  1.1.1.1.4.2  pgoyette     index = INDEX(syms, left, len);
    268  1.1.1.1.4.2  pgoyette     mem -= 1 << root;
    269  1.1.1.1.4.2  pgoyette     offset = (mem >> 3) + rem;
    270  1.1.1.1.4.2  pgoyette     offset = ((offset * (offset + 1)) >> 1) + rem;
    271  1.1.1.1.4.2  pgoyette     bit = 1 << (mem & 7);
    272  1.1.1.1.4.2  pgoyette 
    273  1.1.1.1.4.2  pgoyette     /* see if we've been here */
    274  1.1.1.1.4.2  pgoyette     length = done[index].len;
    275  1.1.1.1.4.2  pgoyette     if (offset < length && (done[index].vec[offset] & bit) != 0)
    276  1.1.1.1.4.2  pgoyette         return 1;       /* done this! */
    277  1.1.1.1.4.2  pgoyette 
    278  1.1.1.1.4.2  pgoyette     /* we haven't been here before -- set the bit to show we have now */
    279  1.1.1.1.4.2  pgoyette 
    280  1.1.1.1.4.2  pgoyette     /* see if we need to lengthen the vector in order to set the bit */
    281  1.1.1.1.4.2  pgoyette     if (length <= offset) {
    282  1.1.1.1.4.2  pgoyette         /* if we have one already, enlarge it, zero out the appended space */
    283  1.1.1.1.4.2  pgoyette         if (length) {
    284  1.1.1.1.4.2  pgoyette             do {
    285  1.1.1.1.4.2  pgoyette                 length <<= 1;
    286  1.1.1.1.4.2  pgoyette             } while (length <= offset);
    287  1.1.1.1.4.2  pgoyette             vector = realloc(done[index].vec, length);
    288  1.1.1.1.4.2  pgoyette             if (vector != NULL)
    289  1.1.1.1.4.2  pgoyette                 memset(vector + done[index].len, 0, length - done[index].len);
    290  1.1.1.1.4.2  pgoyette         }
    291  1.1.1.1.4.2  pgoyette 
    292  1.1.1.1.4.2  pgoyette         /* otherwise we need to make a new vector and zero it out */
    293  1.1.1.1.4.2  pgoyette         else {
    294  1.1.1.1.4.2  pgoyette             length = 1 << (len - root);
    295  1.1.1.1.4.2  pgoyette             while (length <= offset)
    296  1.1.1.1.4.2  pgoyette                 length <<= 1;
    297  1.1.1.1.4.2  pgoyette             vector = calloc(length, sizeof(char));
    298  1.1.1.1.4.2  pgoyette         }
    299  1.1.1.1.4.2  pgoyette 
    300  1.1.1.1.4.2  pgoyette         /* in either case, bail if we can't get the memory */
    301  1.1.1.1.4.2  pgoyette         if (vector == NULL) {
    302  1.1.1.1.4.2  pgoyette             fputs("abort: unable to allocate enough memory\n", stderr);
    303  1.1.1.1.4.2  pgoyette             cleanup();
    304  1.1.1.1.4.2  pgoyette             exit(1);
    305  1.1.1.1.4.2  pgoyette         }
    306  1.1.1.1.4.2  pgoyette 
    307  1.1.1.1.4.2  pgoyette         /* install the new vector */
    308  1.1.1.1.4.2  pgoyette         done[index].len = length;
    309  1.1.1.1.4.2  pgoyette         done[index].vec = vector;
    310  1.1.1.1.4.2  pgoyette     }
    311  1.1.1.1.4.2  pgoyette 
    312  1.1.1.1.4.2  pgoyette     /* set the bit */
    313  1.1.1.1.4.2  pgoyette     done[index].vec[offset] |= bit;
    314  1.1.1.1.4.2  pgoyette     return 0;
    315  1.1.1.1.4.2  pgoyette }
    316  1.1.1.1.4.2  pgoyette 
    317  1.1.1.1.4.2  pgoyette /* Examine all possible codes from the given node (syms, len, left).  Compute
    318  1.1.1.1.4.2  pgoyette    the amount of memory required to build inflate's decoding tables, where the
    319  1.1.1.1.4.2  pgoyette    number of code structures used so far is mem, and the number remaining in
    320  1.1.1.1.4.2  pgoyette    the current sub-table is rem.  Uses the globals max, code, root, large, and
    321  1.1.1.1.4.2  pgoyette    done. */
    322  1.1.1.1.4.2  pgoyette local void examine(int syms, int len, int left, int mem, int rem)
    323  1.1.1.1.4.2  pgoyette {
    324  1.1.1.1.4.2  pgoyette     int least;          /* least number of syms to use at this juncture */
    325  1.1.1.1.4.2  pgoyette     int most;           /* most number of syms to use at this juncture */
    326  1.1.1.1.4.2  pgoyette     int use;            /* number of bit patterns to use in next call */
    327  1.1.1.1.4.2  pgoyette 
    328  1.1.1.1.4.2  pgoyette     /* see if we have a complete code */
    329  1.1.1.1.4.2  pgoyette     if (syms == left) {
    330  1.1.1.1.4.2  pgoyette         /* set the last code entry */
    331  1.1.1.1.4.2  pgoyette         code[len] = left;
    332  1.1.1.1.4.2  pgoyette 
    333  1.1.1.1.4.2  pgoyette         /* complete computation of memory used by this code */
    334  1.1.1.1.4.2  pgoyette         while (rem < left) {
    335  1.1.1.1.4.2  pgoyette             left -= rem;
    336  1.1.1.1.4.2  pgoyette             rem = 1 << (len - root);
    337  1.1.1.1.4.2  pgoyette             mem += rem;
    338  1.1.1.1.4.2  pgoyette         }
    339  1.1.1.1.4.2  pgoyette         assert(rem == left);
    340  1.1.1.1.4.2  pgoyette 
    341  1.1.1.1.4.2  pgoyette         /* if this is a new maximum, show the entries used and the sub-code */
    342  1.1.1.1.4.2  pgoyette         if (mem > large) {
    343  1.1.1.1.4.2  pgoyette             large = mem;
    344  1.1.1.1.4.2  pgoyette             printf("max %d: ", mem);
    345  1.1.1.1.4.2  pgoyette             for (use = root + 1; use <= max; use++)
    346  1.1.1.1.4.2  pgoyette                 if (code[use])
    347  1.1.1.1.4.2  pgoyette                     printf("%d[%d] ", code[use], use);
    348  1.1.1.1.4.2  pgoyette             putchar('\n');
    349  1.1.1.1.4.2  pgoyette             fflush(stdout);
    350  1.1.1.1.4.2  pgoyette         }
    351  1.1.1.1.4.2  pgoyette 
    352  1.1.1.1.4.2  pgoyette         /* remove entries as we drop back down in the recursion */
    353  1.1.1.1.4.2  pgoyette         code[len] = 0;
    354  1.1.1.1.4.2  pgoyette         return;
    355  1.1.1.1.4.2  pgoyette     }
    356  1.1.1.1.4.2  pgoyette 
    357  1.1.1.1.4.2  pgoyette     /* prune the tree if we can */
    358  1.1.1.1.4.2  pgoyette     if (beenhere(syms, len, left, mem, rem))
    359  1.1.1.1.4.2  pgoyette         return;
    360  1.1.1.1.4.2  pgoyette 
    361  1.1.1.1.4.2  pgoyette     /* we need to use at least this many bit patterns so that the code won't be
    362  1.1.1.1.4.2  pgoyette        incomplete at the next length (more bit patterns than symbols) */
    363  1.1.1.1.4.2  pgoyette     least = (left << 1) - syms;
    364  1.1.1.1.4.2  pgoyette     if (least < 0)
    365  1.1.1.1.4.2  pgoyette         least = 0;
    366  1.1.1.1.4.2  pgoyette 
    367  1.1.1.1.4.2  pgoyette     /* we can use at most this many bit patterns, lest there not be enough
    368  1.1.1.1.4.2  pgoyette        available for the remaining symbols at the maximum length (if there were
    369  1.1.1.1.4.2  pgoyette        no limit to the code length, this would become: most = left - 1) */
    370  1.1.1.1.4.2  pgoyette     most = (((code_t)left << (max - len)) - syms) /
    371  1.1.1.1.4.2  pgoyette             (((code_t)1 << (max - len)) - 1);
    372  1.1.1.1.4.2  pgoyette 
    373  1.1.1.1.4.2  pgoyette     /* occupy least table spaces, creating new sub-tables as needed */
    374  1.1.1.1.4.2  pgoyette     use = least;
    375  1.1.1.1.4.2  pgoyette     while (rem < use) {
    376  1.1.1.1.4.2  pgoyette         use -= rem;
    377  1.1.1.1.4.2  pgoyette         rem = 1 << (len - root);
    378  1.1.1.1.4.2  pgoyette         mem += rem;
    379  1.1.1.1.4.2  pgoyette     }
    380  1.1.1.1.4.2  pgoyette     rem -= use;
    381  1.1.1.1.4.2  pgoyette 
    382  1.1.1.1.4.2  pgoyette     /* examine codes from here, updating table space as we go */
    383  1.1.1.1.4.2  pgoyette     for (use = least; use <= most; use++) {
    384  1.1.1.1.4.2  pgoyette         code[len] = use;
    385  1.1.1.1.4.2  pgoyette         examine(syms - use, len + 1, (left - use) << 1,
    386  1.1.1.1.4.2  pgoyette                 mem + (rem ? 1 << (len - root) : 0), rem << 1);
    387  1.1.1.1.4.2  pgoyette         if (rem == 0) {
    388  1.1.1.1.4.2  pgoyette             rem = 1 << (len - root);
    389  1.1.1.1.4.2  pgoyette             mem += rem;
    390  1.1.1.1.4.2  pgoyette         }
    391  1.1.1.1.4.2  pgoyette         rem--;
    392  1.1.1.1.4.2  pgoyette     }
    393  1.1.1.1.4.2  pgoyette 
    394  1.1.1.1.4.2  pgoyette     /* remove entries as we drop back down in the recursion */
    395  1.1.1.1.4.2  pgoyette     code[len] = 0;
    396  1.1.1.1.4.2  pgoyette }
    397  1.1.1.1.4.2  pgoyette 
    398  1.1.1.1.4.2  pgoyette /* Look at all sub-codes starting with root + 1 bits.  Look at only the valid
    399  1.1.1.1.4.2  pgoyette    intermediate code states (syms, left, len).  For each completed code,
    400  1.1.1.1.4.2  pgoyette    calculate the amount of memory required by inflate to build the decoding
    401  1.1.1.1.4.2  pgoyette    tables. Find the maximum amount of memory required and show the code that
    402  1.1.1.1.4.2  pgoyette    requires that maximum.  Uses the globals max, root, and num. */
    403  1.1.1.1.4.2  pgoyette local void enough(int syms)
    404  1.1.1.1.4.2  pgoyette {
    405  1.1.1.1.4.2  pgoyette     int n;              /* number of remaing symbols for this node */
    406  1.1.1.1.4.2  pgoyette     int left;           /* number of unused bit patterns at this length */
    407  1.1.1.1.4.2  pgoyette     size_t index;       /* index of this case in *num */
    408  1.1.1.1.4.2  pgoyette 
    409  1.1.1.1.4.2  pgoyette     /* clear code */
    410  1.1.1.1.4.2  pgoyette     for (n = 0; n <= max; n++)
    411  1.1.1.1.4.2  pgoyette         code[n] = 0;
    412  1.1.1.1.4.2  pgoyette 
    413  1.1.1.1.4.2  pgoyette     /* look at all (root + 1) bit and longer codes */
    414  1.1.1.1.4.2  pgoyette     large = 1 << root;              /* base table */
    415  1.1.1.1.4.2  pgoyette     if (root < max)                 /* otherwise, there's only a base table */
    416  1.1.1.1.4.2  pgoyette         for (n = 3; n <= syms; n++)
    417  1.1.1.1.4.2  pgoyette             for (left = 2; left < n; left += 2)
    418  1.1.1.1.4.2  pgoyette             {
    419  1.1.1.1.4.2  pgoyette                 /* look at all reachable (root + 1) bit nodes, and the
    420  1.1.1.1.4.2  pgoyette                    resulting codes (complete at root + 2 or more) */
    421  1.1.1.1.4.2  pgoyette                 index = INDEX(n, left, root + 1);
    422  1.1.1.1.4.2  pgoyette                 if (root + 1 < max && num[index])       /* reachable node */
    423  1.1.1.1.4.2  pgoyette                     examine(n, root + 1, left, 1 << root, 0);
    424  1.1.1.1.4.2  pgoyette 
    425  1.1.1.1.4.2  pgoyette                 /* also look at root bit codes with completions at root + 1
    426  1.1.1.1.4.2  pgoyette                    bits (not saved in num, since complete), just in case */
    427  1.1.1.1.4.2  pgoyette                 if (num[index - 1] && n <= left << 1)
    428  1.1.1.1.4.2  pgoyette                     examine((n - left) << 1, root + 1, (n - left) << 1,
    429  1.1.1.1.4.2  pgoyette                             1 << root, 0);
    430  1.1.1.1.4.2  pgoyette             }
    431  1.1.1.1.4.2  pgoyette 
    432  1.1.1.1.4.2  pgoyette     /* done */
    433  1.1.1.1.4.2  pgoyette     printf("done: maximum of %d table entries\n", large);
    434  1.1.1.1.4.2  pgoyette }
    435  1.1.1.1.4.2  pgoyette 
    436  1.1.1.1.4.2  pgoyette /*
    437  1.1.1.1.4.2  pgoyette    Examine and show the total number of possible Huffman codes for a given
    438  1.1.1.1.4.2  pgoyette    maximum number of symbols, initial root table size, and maximum code length
    439  1.1.1.1.4.2  pgoyette    in bits -- those are the command arguments in that order.  The default
    440  1.1.1.1.4.2  pgoyette    values are 286, 9, and 15 respectively, for the deflate literal/length code.
    441  1.1.1.1.4.2  pgoyette    The possible codes are counted for each number of coded symbols from two to
    442  1.1.1.1.4.2  pgoyette    the maximum.  The counts for each of those and the total number of codes are
    443  1.1.1.1.4.2  pgoyette    shown.  The maximum number of inflate table entires is then calculated
    444  1.1.1.1.4.2  pgoyette    across all possible codes.  Each new maximum number of table entries and the
    445  1.1.1.1.4.2  pgoyette    associated sub-code (starting at root + 1 == 10 bits) is shown.
    446  1.1.1.1.4.2  pgoyette 
    447  1.1.1.1.4.2  pgoyette    To count and examine Huffman codes that are not length-limited, provide a
    448  1.1.1.1.4.2  pgoyette    maximum length equal to the number of symbols minus one.
    449  1.1.1.1.4.2  pgoyette 
    450  1.1.1.1.4.2  pgoyette    For the deflate literal/length code, use "enough".  For the deflate distance
    451  1.1.1.1.4.2  pgoyette    code, use "enough 30 6".
    452  1.1.1.1.4.2  pgoyette 
    453  1.1.1.1.4.2  pgoyette    This uses the %llu printf format to print big_t numbers, which assumes that
    454  1.1.1.1.4.2  pgoyette    big_t is an unsigned long long.  If the big_t type is changed (for example
    455  1.1.1.1.4.2  pgoyette    to a multiple precision type), the method of printing will also need to be
    456  1.1.1.1.4.2  pgoyette    updated.
    457  1.1.1.1.4.2  pgoyette  */
    458  1.1.1.1.4.2  pgoyette int main(int argc, char **argv)
    459  1.1.1.1.4.2  pgoyette {
    460  1.1.1.1.4.2  pgoyette     int syms;           /* total number of symbols to code */
    461  1.1.1.1.4.2  pgoyette     int n;              /* number of symbols to code for this run */
    462  1.1.1.1.4.2  pgoyette     big_t got;          /* return value of count() */
    463  1.1.1.1.4.2  pgoyette     big_t sum;          /* accumulated number of codes over n */
    464  1.1.1.1.4.2  pgoyette     code_t word;        /* for counting bits in code_t */
    465  1.1.1.1.4.2  pgoyette 
    466  1.1.1.1.4.2  pgoyette     /* set up globals for cleanup() */
    467  1.1.1.1.4.2  pgoyette     code = NULL;
    468  1.1.1.1.4.2  pgoyette     num = NULL;
    469  1.1.1.1.4.2  pgoyette     done = NULL;
    470  1.1.1.1.4.2  pgoyette 
    471  1.1.1.1.4.2  pgoyette     /* get arguments -- default to the deflate literal/length code */
    472  1.1.1.1.4.2  pgoyette     syms = 286;
    473  1.1.1.1.4.2  pgoyette     root = 9;
    474  1.1.1.1.4.2  pgoyette     max = 15;
    475  1.1.1.1.4.2  pgoyette     if (argc > 1) {
    476  1.1.1.1.4.2  pgoyette         syms = atoi(argv[1]);
    477  1.1.1.1.4.2  pgoyette         if (argc > 2) {
    478  1.1.1.1.4.2  pgoyette             root = atoi(argv[2]);
    479  1.1.1.1.4.2  pgoyette             if (argc > 3)
    480  1.1.1.1.4.2  pgoyette                 max = atoi(argv[3]);
    481  1.1.1.1.4.2  pgoyette         }
    482  1.1.1.1.4.2  pgoyette     }
    483  1.1.1.1.4.2  pgoyette     if (argc > 4 || syms < 2 || root < 1 || max < 1) {
    484  1.1.1.1.4.2  pgoyette         fputs("invalid arguments, need: [sym >= 2 [root >= 1 [max >= 1]]]\n",
    485  1.1.1.1.4.2  pgoyette               stderr);
    486  1.1.1.1.4.2  pgoyette         return 1;
    487  1.1.1.1.4.2  pgoyette     }
    488  1.1.1.1.4.2  pgoyette 
    489  1.1.1.1.4.2  pgoyette     /* if not restricting the code length, the longest is syms - 1 */
    490  1.1.1.1.4.2  pgoyette     if (max > syms - 1)
    491  1.1.1.1.4.2  pgoyette         max = syms - 1;
    492  1.1.1.1.4.2  pgoyette 
    493  1.1.1.1.4.2  pgoyette     /* determine the number of bits in a code_t */
    494  1.1.1.1.4.2  pgoyette     for (n = 0, word = 1; word; n++, word <<= 1)
    495  1.1.1.1.4.2  pgoyette         ;
    496  1.1.1.1.4.2  pgoyette 
    497  1.1.1.1.4.2  pgoyette     /* make sure that the calculation of most will not overflow */
    498  1.1.1.1.4.2  pgoyette     if (max > n || (code_t)(syms - 2) >= (((code_t)0 - 1) >> (max - 1))) {
    499  1.1.1.1.4.2  pgoyette         fputs("abort: code length too long for internal types\n", stderr);
    500  1.1.1.1.4.2  pgoyette         return 1;
    501  1.1.1.1.4.2  pgoyette     }
    502  1.1.1.1.4.2  pgoyette 
    503  1.1.1.1.4.2  pgoyette     /* reject impossible code requests */
    504  1.1.1.1.4.2  pgoyette     if ((code_t)(syms - 1) > ((code_t)1 << max) - 1) {
    505  1.1.1.1.4.2  pgoyette         fprintf(stderr, "%d symbols cannot be coded in %d bits\n",
    506  1.1.1.1.4.2  pgoyette                 syms, max);
    507  1.1.1.1.4.2  pgoyette         return 1;
    508  1.1.1.1.4.2  pgoyette     }
    509  1.1.1.1.4.2  pgoyette 
    510  1.1.1.1.4.2  pgoyette     /* allocate code vector */
    511  1.1.1.1.4.2  pgoyette     code = calloc(max + 1, sizeof(int));
    512  1.1.1.1.4.2  pgoyette     if (code == NULL) {
    513  1.1.1.1.4.2  pgoyette         fputs("abort: unable to allocate enough memory\n", stderr);
    514  1.1.1.1.4.2  pgoyette         return 1;
    515  1.1.1.1.4.2  pgoyette     }
    516  1.1.1.1.4.2  pgoyette 
    517  1.1.1.1.4.2  pgoyette     /* determine size of saved results array, checking for overflows,
    518  1.1.1.1.4.2  pgoyette        allocate and clear the array (set all to zero with calloc()) */
    519  1.1.1.1.4.2  pgoyette     if (syms == 2)              /* iff max == 1 */
    520  1.1.1.1.4.2  pgoyette         num = NULL;             /* won't be saving any results */
    521  1.1.1.1.4.2  pgoyette     else {
    522  1.1.1.1.4.2  pgoyette         size = syms >> 1;
    523  1.1.1.1.4.2  pgoyette         if (size > ((size_t)0 - 1) / (n = (syms - 1) >> 1) ||
    524  1.1.1.1.4.2  pgoyette                 (size *= n, size > ((size_t)0 - 1) / (n = max - 1)) ||
    525  1.1.1.1.4.2  pgoyette                 (size *= n, size > ((size_t)0 - 1) / sizeof(big_t)) ||
    526  1.1.1.1.4.2  pgoyette                 (num = calloc(size, sizeof(big_t))) == NULL) {
    527  1.1.1.1.4.2  pgoyette             fputs("abort: unable to allocate enough memory\n", stderr);
    528  1.1.1.1.4.2  pgoyette             cleanup();
    529  1.1.1.1.4.2  pgoyette             return 1;
    530  1.1.1.1.4.2  pgoyette         }
    531  1.1.1.1.4.2  pgoyette     }
    532  1.1.1.1.4.2  pgoyette 
    533  1.1.1.1.4.2  pgoyette     /* count possible codes for all numbers of symbols, add up counts */
    534  1.1.1.1.4.2  pgoyette     sum = 0;
    535  1.1.1.1.4.2  pgoyette     for (n = 2; n <= syms; n++) {
    536  1.1.1.1.4.2  pgoyette         got = count(n, 1, 2);
    537  1.1.1.1.4.2  pgoyette         sum += got;
    538  1.1.1.1.4.2  pgoyette         if (got == (big_t)0 - 1 || sum < got) {     /* overflow */
    539  1.1.1.1.4.2  pgoyette             fputs("abort: can't count that high!\n", stderr);
    540  1.1.1.1.4.2  pgoyette             cleanup();
    541  1.1.1.1.4.2  pgoyette             return 1;
    542  1.1.1.1.4.2  pgoyette         }
    543  1.1.1.1.4.2  pgoyette         printf("%llu %d-codes\n", got, n);
    544  1.1.1.1.4.2  pgoyette     }
    545  1.1.1.1.4.2  pgoyette     printf("%llu total codes for 2 to %d symbols", sum, syms);
    546  1.1.1.1.4.2  pgoyette     if (max < syms - 1)
    547  1.1.1.1.4.2  pgoyette         printf(" (%d-bit length limit)\n", max);
    548  1.1.1.1.4.2  pgoyette     else
    549  1.1.1.1.4.2  pgoyette         puts(" (no length limit)");
    550  1.1.1.1.4.2  pgoyette 
    551  1.1.1.1.4.2  pgoyette     /* allocate and clear done array for beenhere() */
    552  1.1.1.1.4.2  pgoyette     if (syms == 2)
    553  1.1.1.1.4.2  pgoyette         done = NULL;
    554  1.1.1.1.4.2  pgoyette     else if (size > ((size_t)0 - 1) / sizeof(struct tab) ||
    555  1.1.1.1.4.2  pgoyette              (done = calloc(size, sizeof(struct tab))) == NULL) {
    556  1.1.1.1.4.2  pgoyette         fputs("abort: unable to allocate enough memory\n", stderr);
    557  1.1.1.1.4.2  pgoyette         cleanup();
    558  1.1.1.1.4.2  pgoyette         return 1;
    559  1.1.1.1.4.2  pgoyette     }
    560  1.1.1.1.4.2  pgoyette 
    561  1.1.1.1.4.2  pgoyette     /* find and show maximum inflate table usage */
    562  1.1.1.1.4.2  pgoyette     if (root > max)                 /* reduce root to max length */
    563  1.1.1.1.4.2  pgoyette         root = max;
    564  1.1.1.1.4.2  pgoyette     if ((code_t)syms < ((code_t)1 << (root + 1)))
    565  1.1.1.1.4.2  pgoyette         enough(syms);
    566  1.1.1.1.4.2  pgoyette     else
    567  1.1.1.1.4.2  pgoyette         puts("cannot handle minimum code lengths > root");
    568  1.1.1.1.4.2  pgoyette 
    569  1.1.1.1.4.2  pgoyette     /* done */
    570  1.1.1.1.4.2  pgoyette     cleanup();
    571  1.1.1.1.4.2  pgoyette     return 0;
    572  1.1.1.1.4.2  pgoyette }
    573