Home | History | Annotate | Line # | Download | only in wind
      1 
      2 
      3 
      4 
      5 
      6 
      7 Network Working Group                                        A. Costello
      8 Request for Comments: 3492                 Univ. of California, Berkeley
      9 Category: Standards Track                                     March 2003
     10 
     11 
     12               Punycode: A Bootstring encoding of Unicode
     13        for Internationalized Domain Names in Applications (IDNA)
     14 
     15 Status of this Memo
     16 
     17    This document specifies an Internet standards track protocol for the
     18    Internet community, and requests discussion and suggestions for
     19    improvements.  Please refer to the current edition of the "Internet
     20    Official Protocol Standards" (STD 1) for the standardization state
     21    and status of this protocol.  Distribution of this memo is unlimited.
     22 
     23 Copyright Notice
     24 
     25    Copyright (C) The Internet Society (2003).  All Rights Reserved.
     26 
     27 Abstract
     28 
     29    Punycode is a simple and efficient transfer encoding syntax designed
     30    for use with Internationalized Domain Names in Applications (IDNA).
     31    It uniquely and reversibly transforms a Unicode string into an ASCII
     32    string.  ASCII characters in the Unicode string are represented
     33    literally, and non-ASCII characters are represented by ASCII
     34    characters that are allowed in host name labels (letters, digits, and
     35    hyphens).  This document defines a general algorithm called
     36    Bootstring that allows a string of basic code points to uniquely
     37    represent any string of code points drawn from a larger set.
     38    Punycode is an instance of Bootstring that uses particular parameter
     39    values specified by this document, appropriate for IDNA.
     40 
     41 Table of Contents
     42 
     43    1. Introduction...............................................2
     44        1.1 Features..............................................2
     45        1.2 Interaction of protocol parts.........................3
     46    2. Terminology................................................3
     47    3. Bootstring description.....................................4
     48        3.1 Basic code point segregation..........................4
     49        3.2 Insertion unsort coding...............................4
     50        3.3 Generalized variable-length integers..................5
     51        3.4 Bias adaptation.......................................7
     52    4. Bootstring parameters......................................8
     53    5. Parameter values for Punycode..............................8
     54    6. Bootstring algorithms......................................9
     55 
     56 
     57 
     58 Costello                    Standards Track                     [Page 1]
     59 
     61 RFC 3492                     IDNA Punycode                    March 2003
     62 
     63 
     64        6.1 Bias adaptation function.............................10
     65        6.2 Decoding procedure...................................11
     66        6.3 Encoding procedure...................................12
     67        6.4 Overflow handling....................................13
     68    7. Punycode examples.........................................14
     69        7.1 Sample strings.......................................14
     70        7.2 Decoding traces......................................17
     71        7.3 Encoding traces......................................19
     72    8. Security Considerations...................................20
     73    9. References................................................21
     74        9.1 Normative References.................................21
     75        9.2 Informative References...............................21
     76    A. Mixed-case annotation.....................................22
     77    B. Disclaimer and license....................................22
     78    C. Punycode sample implementation............................23
     79    Author's Address.............................................34
     80    Full Copyright Statement.....................................35
     81 
     82 1. Introduction
     83 
     84    [IDNA] describes an architecture for supporting internationalized
     85    domain names.  Labels containing non-ASCII characters can be
     86    represented by ACE labels, which begin with a special ACE prefix and
     87    contain only ASCII characters.  The remainder of the label after the
     88    prefix is a Punycode encoding of a Unicode string satisfying certain
     89    constraints.  For the details of the prefix and constraints, see
     90    [IDNA] and [NAMEPREP].
     91 
     92    Punycode is an instance of a more general algorithm called
     93    Bootstring, which allows strings composed from a small set of "basic"
     94    code points to uniquely represent any string of code points drawn
     95    from a larger set.  Punycode is Bootstring with particular parameter
     96    values appropriate for IDNA.
     97 
     98 1.1 Features
     99 
    100    Bootstring has been designed to have the following features:
    101 
    102    *  Completeness:  Every extended string (sequence of arbitrary code
    103       points) can be represented by a basic string (sequence of basic
    104       code points).  Restrictions on what strings are allowed, and on
    105       length, can be imposed by higher layers.
    106 
    107    *  Uniqueness:  There is at most one basic string that represents a
    108       given extended string.
    109 
    110    *  Reversibility:  Any extended string mapped to a basic string can
    111       be recovered from that basic string.
    112 
    113 
    114 
    115 Costello                    Standards Track                     [Page 2]
    116 
    118 RFC 3492                     IDNA Punycode                    March 2003
    119 
    120 
    121    *  Efficient encoding:  The ratio of basic string length to extended
    122       string length is small.  This is important in the context of
    123       domain names because RFC 1034 [RFC1034] restricts the length of a
    124       domain label to 63 characters.
    125 
    126    *  Simplicity:  The encoding and decoding algorithms are reasonably
    127       simple to implement.  The goals of efficiency and simplicity are
    128       at odds; Bootstring aims at a good balance between them.
    129 
    130    *  Readability:  Basic code points appearing in the extended string
    131       are represented as themselves in the basic string (although the
    132       main purpose is to improve efficiency, not readability).
    133 
    134    Punycode can also support an additional feature that is not used by
    135    the ToASCII and ToUnicode operations of [IDNA].  When extended
    136    strings are case-folded prior to encoding, the basic string can use
    137    mixed case to tell how to convert the folded string into a mixed-case
    138    string.  See appendix A "Mixed-case annotation".
    139 
    140 1.2 Interaction of protocol parts
    141 
    142    Punycode is used by the IDNA protocol [IDNA] for converting domain
    143    labels into ASCII; it is not designed for any other purpose.  It is
    144    explicitly not designed for processing arbitrary free text.
    145 
    146 2. Terminology
    147 
    148    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    149    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    150    document are to be interpreted as described in BCP 14, RFC 2119
    151    [RFC2119].
    152 
    153    A code point is an integral value associated with a character in a
    154    coded character set.
    155 
    156    As in the Unicode Standard [UNICODE], Unicode code points are denoted
    157    by "U+" followed by four to six hexadecimal digits, while a range of
    158    code points is denoted by two hexadecimal numbers separated by "..",
    159    with no prefixes.
    160 
    161    The operators div and mod perform integer division; (x div y) is the
    162    quotient of x divided by y, discarding the remainder, and (x mod y)
    163    is the remainder, so (x div y) * y + (x mod y) == x.  Bootstring uses
    164    these operators only with nonnegative operands, so the quotient and
    165    remainder are always nonnegative.
    166 
    167    The break statement jumps out of the innermost loop (as in C).
    168 
    169 
    170 
    171 
    172 Costello                    Standards Track                     [Page 3]
    173 
    175 RFC 3492                     IDNA Punycode                    March 2003
    176 
    177 
    178    An overflow is an attempt to compute a value that exceeds the maximum
    179    value of an integer variable.
    180 
    181 3. Bootstring description
    182 
    183    Bootstring represents an arbitrary sequence of code points (the
    184    "extended string") as a sequence of basic code points (the "basic
    185    string").  This section describes the representation.  Section 6
    186    "Bootstring algorithms" presents the algorithms as pseudocode.
    187    Sections 7.1 "Decoding traces" and 7.2 "Encoding traces" trace the
    188    algorithms for sample inputs.
    189 
    190    The following sections describe the four techniques used in
    191    Bootstring.  "Basic code point segregation" is a very simple and
    192    efficient encoding for basic code points occurring in the extended
    193    string: they are simply copied all at once.  "Insertion unsort
    194    coding" encodes the non-basic code points as deltas, and processes
    195    the code points in numerical order rather than in order of
    196    appearance, which typically results in smaller deltas.  The deltas
    197    are represented as "generalized variable-length integers", which use
    198    basic code points to represent nonnegative integers.  The parameters
    199    of this integer representation are dynamically adjusted using "bias
    200    adaptation", to improve efficiency when consecutive deltas have
    201    similar magnitudes.
    202 
    203 3.1 Basic code point segregation
    204 
    205    All basic code points appearing in the extended string are
    206    represented literally at the beginning of the basic string, in their
    207    original order, followed by a delimiter if (and only if) the number
    208    of basic code points is nonzero.  The delimiter is a particular basic
    209    code point, which never appears in the remainder of the basic string.
    210    The decoder can therefore find the end of the literal portion (if
    211    there is one) by scanning for the last delimiter.
    212 
    213 3.2 Insertion unsort coding
    214 
    215    The remainder of the basic string (after the last delimiter if there
    216    is one) represents a sequence of nonnegative integral deltas as
    217    generalized variable-length integers, described in section 3.3.  The
    218    meaning of the deltas is best understood in terms of the decoder.
    219 
    220    The decoder builds the extended string incrementally.  Initially, the
    221    extended string is a copy of the literal portion of the basic string
    222    (excluding the last delimiter).  The decoder inserts non-basic code
    223    points, one for each delta, into the extended string, ultimately
    224    arriving at the final decoded string.
    225 
    226 
    227 
    228 
    229 Costello                    Standards Track                     [Page 4]
    230 
    232 RFC 3492                     IDNA Punycode                    March 2003
    233 
    234 
    235    At the heart of this process is a state machine with two state
    236    variables: an index i and a counter n.  The index i refers to a
    237    position in the extended string; it ranges from 0 (the first
    238    position) to the current length of the extended string (which refers
    239    to a potential position beyond the current end).  If the current
    240    state is <n,i>, the next state is <n,i+1> if i is less than the
    241    length of the extended string, or <n+1,0> if i equals the length of
    242    the extended string.  In other words, each state change causes i to
    243    increment, wrapping around to zero if necessary, and n counts the
    244    number of wrap-arounds.
    245 
    246    Notice that the state always advances monotonically (there is no way
    247    for the decoder to return to an earlier state).  At each state, an
    248    insertion is either performed or not performed.  At most one
    249    insertion is performed in a given state.  An insertion inserts the
    250    value of n at position i in the extended string.  The deltas are a
    251    run-length encoding of this sequence of events: they are the lengths
    252    of the runs of non-insertion states preceeding the insertion states.
    253    Hence, for each delta, the decoder performs delta state changes, then
    254    an insertion, and then one more state change.  (An implementation
    255    need not perform each state change individually, but can instead use
    256    division and remainder calculations to compute the next insertion
    257    state directly.)  It is an error if the inserted code point is a
    258    basic code point (because basic code points were supposed to be
    259    segregated as described in section 3.1).
    260 
    261    The encoder's main task is to derive the sequence of deltas that will
    262    cause the decoder to construct the desired string.  It can do this by
    263    repeatedly scanning the extended string for the next code point that
    264    the decoder would need to insert, and counting the number of state
    265    changes the decoder would need to perform, mindful of the fact that
    266    the decoder's extended string will include only those code points
    267    that have already been inserted.  Section 6.3 "Encoding procedure"
    268    gives a precise algorithm.
    269 
    270 3.3 Generalized variable-length integers
    271 
    272    In a conventional integer representation the base is the number of
    273    distinct symbols for digits, whose values are 0 through base-1.  Let
    274    digit_0 denote the least significant digit, digit_1 the next least
    275    significant, and so on.  The value represented is the sum over j of
    276    digit_j * w(j), where w(j) = base^j is the weight (scale factor) for
    277    position j.  For example, in the base 8 integer 437, the digits are
    278    7, 3, and 4, and the weights are 1, 8, and 64, so the value is 7 +
    279    3*8 + 4*64 = 287.  This representation has two disadvantages:  First,
    280    there are multiple encodings of each value (because there can be
    281    extra zeros in the most significant positions), which is inconvenient
    282 
    283 
    284 
    285 
    286 Costello                    Standards Track                     [Page 5]
    287 
    289 RFC 3492                     IDNA Punycode                    March 2003
    290 
    291 
    292    when unique encodings are needed.  Second, the integer is not self-
    293    delimiting, so if multiple integers are concatenated the boundaries
    294    between them are lost.
    295 
    296    The generalized variable-length representation solves these two
    297    problems.  The digit values are still 0 through base-1, but now the
    298    integer is self-delimiting by means of thresholds t(j), each of which
    299    is in the range 0 through base-1.  Exactly one digit, the most
    300    significant, satisfies digit_j < t(j).  Therefore, if several
    301    integers are concatenated, it is easy to separate them, starting with
    302    the first if they are little-endian (least significant digit first),
    303    or starting with the last if they are big-endian (most significant
    304    digit first).  As before, the value is the sum over j of digit_j *
    305    w(j), but the weights are different:
    306 
    307       w(0) = 1
    308       w(j) = w(j-1) * (base - t(j-1)) for j > 0
    309 
    310    For example, consider the little-endian sequence of base 8 digits
    311    734251...  Suppose the thresholds are 2, 3, 5, 5, 5, 5...  This
    312    implies that the weights are 1, 1*(8-2) = 6, 6*(8-3) = 30, 30*(8-5) =
    313    90, 90*(8-5) = 270, and so on.  7 is not less than 2, and 3 is not
    314    less than 3, but 4 is less than 5, so 4 is the last digit.  The value
    315    of 734 is 7*1 + 3*6 + 4*30 = 145.  The next integer is 251, with
    316    value 2*1 + 5*6 + 1*30 = 62.  Decoding this representation is very
    317    similar to decoding a conventional integer:  Start with a current
    318    value of N = 0 and a weight w = 1.  Fetch the next digit d and
    319    increase N by d * w.  If d is less than the current threshold (t)
    320    then stop, otherwise increase w by a factor of (base - t), update t
    321    for the next position, and repeat.
    322 
    323    Encoding this representation is similar to encoding a conventional
    324    integer:  If N < t then output one digit for N and stop, otherwise
    325    output the digit for t + ((N - t) mod (base - t)), then replace N
    326    with (N - t) div (base - t), update t for the next position, and
    327    repeat.
    328 
    329    For any particular set of values of t(j), there is exactly one
    330    generalized variable-length representation of each nonnegative
    331    integral value.
    332 
    333    Bootstring uses little-endian ordering so that the deltas can be
    334    separated starting with the first.  The t(j) values are defined in
    335    terms of the constants base, tmin, and tmax, and a state variable
    336    called bias:
    337 
    338       t(j) = base * (j + 1) - bias,
    339       clamped to the range tmin through tmax
    340 
    341 
    342 
    343 Costello                    Standards Track                     [Page 6]
    344 
    346 RFC 3492                     IDNA Punycode                    March 2003
    347 
    348 
    349    The clamping means that if the formula yields a value less than tmin
    350    or greater than tmax, then t(j) = tmin or tmax, respectively.  (In
    351    the pseudocode in section 6 "Bootstring algorithms", the expression
    352    base * (j + 1) is denoted by k for performance reasons.)  These t(j)
    353    values cause the representation to favor integers within a particular
    354    range determined by the bias.
    355 
    356 3.4 Bias adaptation
    357 
    358    After each delta is encoded or decoded, bias is set for the next
    359    delta as follows:
    360 
    361    1. Delta is scaled in order to avoid overflow in the next step:
    362 
    363          let delta = delta div 2
    364 
    365       But when this is the very first delta, the divisor is not 2, but
    366       instead a constant called damp.  This compensates for the fact
    367       that the second delta is usually much smaller than the first.
    368 
    369    2. Delta is increased to compensate for the fact that the next delta
    370       will be inserting into a longer string:
    371 
    372          let delta = delta + (delta div numpoints)
    373 
    374       numpoints is the total number of code points encoded/decoded so
    375       far (including the one corresponding to this delta itself, and
    376       including the basic code points).
    377 
    378    3. Delta is repeatedly divided until it falls within a threshold, to
    379       predict the minimum number of digits needed to represent the next
    380       delta:
    381 
    382          while delta > ((base - tmin) * tmax) div 2
    383          do let delta = delta div (base - tmin)
    384 
    385    4. The bias is set:
    386 
    387          let bias =
    388            (base * the number of divisions performed in step 3) +
    389            (((base - tmin + 1) * delta) div (delta + skew))
    390 
    391       The motivation for this procedure is that the current delta
    392       provides a hint about the likely size of the next delta, and so
    393       t(j) is set to tmax for the more significant digits starting with
    394       the one expected to be last, tmin for the less significant digits
    395       up through the one expected to be third-last, and somewhere
    396       between tmin and tmax for the digit expected to be second-last
    397 
    398 
    399 
    400 Costello                    Standards Track                     [Page 7]
    401 
    403 RFC 3492                     IDNA Punycode                    March 2003
    404 
    405 
    406       (balancing the hope of the expected-last digit being unnecessary
    407       against the danger of it being insufficient).
    408 
    409 4. Bootstring parameters
    410 
    411    Given a set of basic code points, one needs to be designated as the
    412    delimiter.  The base cannot be greater than the number of
    413    distinguishable basic code points remaining.  The digit-values in the
    414    range 0 through base-1 need to be associated with distinct non-
    415    delimiter basic code points.  In some cases multiple code points need
    416    to have the same digit-value; for example, uppercase and lowercase
    417    versions of the same letter need to be equivalent if basic strings
    418    are case-insensitive.
    419 
    420    The initial value of n cannot be greater than the minimum non-basic
    421    code point that could appear in extended strings.
    422 
    423    The remaining five parameters (tmin, tmax, skew, damp, and the
    424    initial value of bias) need to satisfy the following constraints:
    425 
    426       0 <= tmin <= tmax <= base-1
    427       skew >= 1
    428       damp >= 2
    429       initial_bias mod base <= base - tmin
    430 
    431    Provided the constraints are satisfied, these five parameters affect
    432    efficiency but not correctness.  They are best chosen empirically.
    433 
    434    If support for mixed-case annotation is desired (see appendix A),
    435    make sure that the code points corresponding to 0 through tmax-1 all
    436    have both uppercase and lowercase forms.
    437 
    438 5. Parameter values for Punycode
    439 
    440    Punycode uses the following Bootstring parameter values:
    441 
    442       base         = 36
    443       tmin         = 1
    444       tmax         = 26
    445       skew         = 38
    446       damp         = 700
    447       initial_bias = 72
    448       initial_n    = 128 = 0x80
    449 
    450    Although the only restriction Punycode imposes on the input integers
    451    is that they be nonnegative, these parameters are especially designed
    452    to work well with Unicode [UNICODE] code points, which are integers
    453    in the range 0..10FFFF (but not D800..DFFF, which are reserved for
    454 
    455 
    456 
    457 Costello                    Standards Track                     [Page 8]
    458 
    460 RFC 3492                     IDNA Punycode                    March 2003
    461 
    462 
    463    use by the UTF-16 encoding of Unicode).  The basic code points are
    464    the ASCII [ASCII] code points (0..7F), of which U+002D (-) is the
    465    delimiter, and some of the others have digit-values as follows:
    466 
    467       code points    digit-values
    468       ------------   ----------------------
    469       41..5A (A-Z) =  0 to 25, respectively
    470       61..7A (a-z) =  0 to 25, respectively
    471       30..39 (0-9) = 26 to 35, respectively
    472 
    473    Using hyphen-minus as the delimiter implies that the encoded string
    474    can end with a hyphen-minus only if the Unicode string consists
    475    entirely of basic code points, but IDNA forbids such strings from
    476    being encoded.  The encoded string can begin with a hyphen-minus, but
    477    IDNA prepends a prefix.  Therefore IDNA using Punycode conforms to
    478    the RFC 952 rule that host name labels neither begin nor end with a
    479    hyphen-minus [RFC952].
    480 
    481    A decoder MUST recognize the letters in both uppercase and lowercase
    482    forms (including mixtures of both forms).  An encoder SHOULD output
    483    only uppercase forms or only lowercase forms, unless it uses mixed-
    484    case annotation (see appendix A).
    485 
    486    Presumably most users will not manually write or type encoded strings
    487    (as opposed to cutting and pasting them), but those who do will need
    488    to be alert to the potential visual ambiguity between the following
    489    sets of characters:
    490 
    491       G 6
    492       I l 1
    493       O 0
    494       S 5
    495       U V
    496       Z 2
    497 
    498    Such ambiguities are usually resolved by context, but in a Punycode
    499    encoded string there is no context apparent to humans.
    500 
    501 6. Bootstring algorithms
    502 
    503    Some parts of the pseudocode can be omitted if the parameters satisfy
    504    certain conditions (for which Punycode qualifies).  These parts are
    505    enclosed in {braces}, and notes immediately following the pseudocode
    506    explain the conditions under which they can be omitted.
    507 
    508 
    509 
    510 
    511 
    512 
    513 
    514 Costello                    Standards Track                     [Page 9]
    515 
    517 RFC 3492                     IDNA Punycode                    March 2003
    518 
    519 
    520    Formally, code points are integers, and hence the pseudocode assumes
    521    that arithmetic operations can be performed directly on code points.
    522    In some programming languages, explicit conversion between code
    523    points and integers might be necessary.
    524 
    525 6.1 Bias adaptation function
    526 
    527    function adapt(delta,numpoints,firsttime):
    528      if firsttime then let delta = delta div damp
    529      else let delta = delta div 2
    530      let delta = delta + (delta div numpoints)
    531      let k = 0
    532      while delta > ((base - tmin) * tmax) div 2 do begin
    533        let delta = delta div (base - tmin)
    534        let k = k + base
    535      end
    536      return k + (((base - tmin + 1) * delta) div (delta + skew))
    537 
    538    It does not matter whether the modifications to delta and k inside
    539    adapt() affect variables of the same name inside the
    540    encoding/decoding procedures, because after calling adapt() the
    541    caller does not read those variables before overwriting them.
    542 
    543 
    544 
    545 
    546 
    547 
    548 
    549 
    550 
    551 
    552 
    553 
    554 
    555 
    556 
    557 
    558 
    559 
    560 
    561 
    562 
    563 
    564 
    565 
    566 
    567 
    568 
    569 
    570 
    571 Costello                    Standards Track                    [Page 10]
    572 
    574 RFC 3492                     IDNA Punycode                    March 2003
    575 
    576 
    577 6.2 Decoding procedure
    578 
    579    let n = initial_n
    580    let i = 0
    581    let bias = initial_bias
    582    let output = an empty string indexed from 0
    583    consume all code points before the last delimiter (if there is one)
    584      and copy them to output, fail on any non-basic code point
    585    if more than zero code points were consumed then consume one more
    586      (which will be the last delimiter)
    587    while the input is not exhausted do begin
    588      let oldi = i
    589      let w = 1
    590      for k = base to infinity in steps of base do begin
    591        consume a code point, or fail if there was none to consume
    592        let digit = the code point's digit-value, fail if it has none
    593        let i = i + digit * w, fail on overflow
    594        let t = tmin if k <= bias {+ tmin}, or
    595                tmax if k >= bias + tmax, or k - bias otherwise
    596        if digit < t then break
    597        let w = w * (base - t), fail on overflow
    598      end
    599      let bias = adapt(i - oldi, length(output) + 1, test oldi is 0?)
    600      let n = n + i div (length(output) + 1), fail on overflow
    601      let i = i mod (length(output) + 1)
    602      {if n is a basic code point then fail}
    603      insert n into output at position i
    604      increment i
    605    end
    606 
    607    The full statement enclosed in braces (checking whether n is a basic
    608    code point) can be omitted if initial_n exceeds all basic code points
    609    (which is true for Punycode), because n is never less than initial_n.
    610 
    611    In the assignment of t, where t is clamped to the range tmin through
    612    tmax, "+ tmin" can always be omitted.  This makes the clamping
    613    calculation incorrect when bias < k < bias + tmin, but that cannot
    614    happen because of the way bias is computed and because of the
    615    constraints on the parameters.
    616 
    617    Because the decoder state can only advance monotonically, and there
    618    is only one representation of any delta, there is therefore only one
    619    encoded string that can represent a given sequence of integers.  The
    620    only error conditions are invalid code points, unexpected end-of-
    621    input, overflow, and basic code points encoded using deltas instead
    622    of appearing literally.  If the decoder fails on these errors as
    623    shown above, then it cannot produce the same output for two distinct
    624    inputs.  Without this property it would have been necessary to re-
    625 
    626 
    627 
    628 Costello                    Standards Track                    [Page 11]
    629 
    631 RFC 3492                     IDNA Punycode                    March 2003
    632 
    633 
    634    encode the output and verify that it matches the input in order to
    635    guarantee the uniqueness of the encoding.
    636 
    637 6.3 Encoding procedure
    638 
    639    let n = initial_n
    640    let delta = 0
    641    let bias = initial_bias
    642    let h = b = the number of basic code points in the input
    643    copy them to the output in order, followed by a delimiter if b > 0
    644    {if the input contains a non-basic code point < n then fail}
    645    while h < length(input) do begin
    646      let m = the minimum {non-basic} code point >= n in the input
    647      let delta = delta + (m - n) * (h + 1), fail on overflow
    648      let n = m
    649      for each code point c in the input (in order) do begin
    650        if c < n {or c is basic} then increment delta, fail on overflow
    651        if c == n then begin
    652          let q = delta
    653          for k = base to infinity in steps of base do begin
    654            let t = tmin if k <= bias {+ tmin}, or
    655                    tmax if k >= bias + tmax, or k - bias otherwise
    656            if q < t then break
    657            output the code point for digit t + ((q - t) mod (base - t))
    658            let q = (q - t) div (base - t)
    659          end
    660          output the code point for digit q
    661          let bias = adapt(delta, h + 1, test h equals b?)
    662          let delta = 0
    663          increment h
    664        end
    665      end
    666      increment delta and n
    667    end
    668 
    669    The full statement enclosed in braces (checking whether the input
    670    contains a non-basic code point less than n) can be omitted if all
    671    code points less than initial_n are basic code points (which is true
    672    for Punycode if code points are unsigned).
    673 
    674    The brace-enclosed conditions "non-basic" and "or c is basic" can be
    675    omitted if initial_n exceeds all basic code points (which is true for
    676    Punycode), because the code point being tested is never less than
    677    initial_n.
    678 
    679    In the assignment of t, where t is clamped to the range tmin through
    680    tmax, "+ tmin" can always be omitted.  This makes the clamping
    681    calculation incorrect when bias < k < bias + tmin, but that cannot
    682 
    683 
    684 
    685 Costello                    Standards Track                    [Page 12]
    686 
    688 RFC 3492                     IDNA Punycode                    March 2003
    689 
    690 
    691    happen because of the way bias is computed and because of the
    692    constraints on the parameters.
    693 
    694    The checks for overflow are necessary to avoid producing invalid
    695    output when the input contains very large values or is very long.
    696 
    697    The increment of delta at the bottom of the outer loop cannot
    698    overflow because delta < length(input) before the increment, and
    699    length(input) is already assumed to be representable.  The increment
    700    of n could overflow, but only if h == length(input), in which case
    701    the procedure is finished anyway.
    702 
    703 6.4 Overflow handling
    704 
    705    For IDNA, 26-bit unsigned integers are sufficient to handle all valid
    706    IDNA labels without overflow, because any string that needed a 27-bit
    707    delta would have to exceed either the code point limit (0..10FFFF) or
    708    the label length limit (63 characters).  However, overflow handling
    709    is necessary because the inputs are not necessarily valid IDNA
    710    labels.
    711 
    712    If the programming language does not provide overflow detection, the
    713    following technique can be used.  Suppose A, B, and C are
    714    representable nonnegative integers and C is nonzero.  Then A + B
    715    overflows if and only if B > maxint - A, and A + (B * C) overflows if
    716    and only if B > (maxint - A) div C, where maxint is the greatest
    717    integer for which maxint + 1 cannot be represented.  Refer to
    718    appendix C "Punycode sample implementation" for demonstrations of
    719    this technique in the C language.
    720 
    721    The decoding and encoding algorithms shown in sections 6.2 and 6.3
    722    handle overflow by detecting it whenever it happens.  Another
    723    approach is to enforce limits on the inputs that prevent overflow
    724    from happening.  For example, if the encoder were to verify that no
    725    input code points exceed M and that the input length does not exceed
    726    L, then no delta could ever exceed (M - initial_n) * (L + 1), and
    727    hence no overflow could occur if integer variables were capable of
    728    representing values that large.  This prevention approach would
    729    impose more restrictions on the input than the detection approach
    730    does, but might be considered simpler in some programming languages.
    731 
    732    In theory, the decoder could use an analogous approach, limiting the
    733    number of digits in a variable-length integer (that is, limiting the
    734    number of iterations in the innermost loop).  However, the number of
    735    digits that suffice to represent a given delta can sometimes
    736    represent much larger deltas (because of the adaptation), and hence
    737    this approach would probably need integers wider than 32 bits.
    738 
    739 
    740 
    741 
    742 Costello                    Standards Track                    [Page 13]
    743 
    745 RFC 3492                     IDNA Punycode                    March 2003
    746 
    747 
    748    Yet another approach for the decoder is to allow overflow to occur,
    749    but to check the final output string by re-encoding it and comparing
    750    to the decoder input.  If and only if they do not match (using a
    751    case-insensitive ASCII comparison) overflow has occurred.  This
    752    delayed-detection approach would not impose any more restrictions on
    753    the input than the immediate-detection approach does, and might be
    754    considered simpler in some programming languages.
    755 
    756    In fact, if the decoder is used only inside the IDNA ToUnicode
    757    operation [IDNA], then it need not check for overflow at all, because
    758    ToUnicode performs a higher level re-encoding and comparison, and a
    759    mismatch has the same consequence as if the Punycode decoder had
    760    failed.
    761 
    762 7. Punycode examples
    763 
    764 7.1 Sample strings
    765 
    766    In the Punycode encodings below, the ACE prefix is not shown.
    767    Backslashes show where line breaks have been inserted in strings too
    768    long for one line.
    769 
    770    The first several examples are all translations of the sentence "Why
    771    can't they just speak in <language>?" (courtesy of Michael Kaplan's
    772    "provincial" page [PROVINCIAL]).  Word breaks and punctuation have
    773    been removed, as is often done in domain names.
    774 
    775    (A) Arabic (Egyptian):
    776        u+0644 u+064A u+0647 u+0645 u+0627 u+0628 u+062A u+0643 u+0644
    777        u+0645 u+0648 u+0634 u+0639 u+0631 u+0628 u+064A u+061F
    778        Punycode: egbpdaj6bu4bxfgehfvwxn
    779 
    780    (B) Chinese (simplified):
    781        u+4ED6 u+4EEC u+4E3A u+4EC0 u+4E48 u+4E0D u+8BF4 u+4E2D u+6587
    782        Punycode: ihqwcrb4cv8a8dqg056pqjye
    783 
    784    (C) Chinese (traditional):
    785        u+4ED6 u+5011 u+7232 u+4EC0 u+9EBD u+4E0D u+8AAA u+4E2D u+6587
    786        Punycode: ihqwctvzc91f659drss3x8bo0yb
    787 
    788    (D) Czech: Pro<ccaron>prost<ecaron>nemluv<iacute><ccaron>esky
    789        U+0050 u+0072 u+006F u+010D u+0070 u+0072 u+006F u+0073 u+0074
    790        u+011B u+006E u+0065 u+006D u+006C u+0075 u+0076 u+00ED u+010D
    791        u+0065 u+0073 u+006B u+0079
    792        Punycode: Proprostnemluvesky-uyb24dma41a
    793 
    794 
    795 
    796 
    797 
    798 
    799 Costello                    Standards Track                    [Page 14]
    800 
    802 RFC 3492                     IDNA Punycode                    March 2003
    803 
    804 
    805    (E) Hebrew:
    806        u+05DC u+05DE u+05D4 u+05D4 u+05DD u+05E4 u+05E9 u+05D5 u+05D8
    807        u+05DC u+05D0 u+05DE u+05D3 u+05D1 u+05E8 u+05D9 u+05DD u+05E2
    808        u+05D1 u+05E8 u+05D9 u+05EA
    809        Punycode: 4dbcagdahymbxekheh6e0a7fei0b
    810 
    811    (F) Hindi (Devanagari):
    812        u+092F u+0939 u+0932 u+094B u+0917 u+0939 u+093F u+0928 u+094D
    813        u+0926 u+0940 u+0915 u+094D u+092F u+094B u+0902 u+0928 u+0939
    814        u+0940 u+0902 u+092C u+094B u+0932 u+0938 u+0915 u+0924 u+0947
    815        u+0939 u+0948 u+0902
    816        Punycode: i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
    817 
    818    (G) Japanese (kanji and hiragana):
    819        u+306A u+305C u+307F u+3093 u+306A u+65E5 u+672C u+8A9E u+3092
    820        u+8A71 u+3057 u+3066 u+304F u+308C u+306A u+3044 u+306E u+304B
    821        Punycode: n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
    822 
    823    (H) Korean (Hangul syllables):
    824        u+C138 u+ACC4 u+C758 u+BAA8 u+B4E0 u+C0AC u+B78C u+B4E4 u+C774
    825        u+D55C u+AD6D u+C5B4 u+B97C u+C774 u+D574 u+D55C u+B2E4 u+BA74
    826        u+C5BC u+B9C8 u+B098 u+C88B u+C744 u+AE4C
    827        Punycode: 989aomsvi5e83db1d2a355cv1e0vak1dwrv93d5xbh15a0dt30a5j\
    828                  psd879ccm6fea98c
    829 
    830    (I) Russian (Cyrillic):
    831        U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
    832        u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
    833        u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
    834        u+0438
    835        Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
    836 
    837    (J) Spanish: Porqu<eacute>nopuedensimplementehablarenEspa<ntilde>ol
    838        U+0050 u+006F u+0072 u+0071 u+0075 u+00E9 u+006E u+006F u+0070
    839        u+0075 u+0065 u+0064 u+0065 u+006E u+0073 u+0069 u+006D u+0070
    840        u+006C u+0065 u+006D u+0065 u+006E u+0074 u+0065 u+0068 u+0061
    841        u+0062 u+006C u+0061 u+0072 u+0065 u+006E U+0045 u+0073 u+0070
    842        u+0061 u+00F1 u+006F u+006C
    843        Punycode: PorqunopuedensimplementehablarenEspaol-fmd56a
    844 
    845    (K) Vietnamese:
    846        T<adotbelow>isaoh<odotbelow>kh<ocirc>ngth<ecirchookabove>ch\
    847        <ihookabove>n<oacute>iti<ecircacute>ngVi<ecircdotbelow>t
    848        U+0054 u+1EA1 u+0069 u+0073 u+0061 u+006F u+0068 u+1ECD u+006B
    849        u+0068 u+00F4 u+006E u+0067 u+0074 u+0068 u+1EC3 u+0063 u+0068
    850        u+1EC9 u+006E u+00F3 u+0069 u+0074 u+0069 u+1EBF u+006E u+0067
    851        U+0056 u+0069 u+1EC7 u+0074
    852        Punycode: TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g
    853 
    854 
    855 
    856 Costello                    Standards Track                    [Page 15]
    857 
    859 RFC 3492                     IDNA Punycode                    March 2003
    860 
    861 
    862    The next several examples are all names of Japanese music artists,
    863    song titles, and TV programs, just because the author happens to have
    864    them handy (but Japanese is useful for providing examples of single-
    865    row text, two-row text, ideographic text, and various mixtures
    866    thereof).
    867 
    868    (L) 3<nen>B<gumi><kinpachi><sensei>
    869        u+0033 u+5E74 U+0042 u+7D44 u+91D1 u+516B u+5148 u+751F
    870        Punycode: 3B-ww4c5e180e575a65lsy2b
    871 
    872    (M) <amuro><namie>-with-SUPER-MONKEYS
    873        u+5B89 u+5BA4 u+5948 u+7F8E u+6075 u+002D u+0077 u+0069 u+0074
    874        u+0068 u+002D U+0053 U+0055 U+0050 U+0045 U+0052 u+002D U+004D
    875        U+004F U+004E U+004B U+0045 U+0059 U+0053
    876        Punycode: -with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
    877 
    878    (N) Hello-Another-Way-<sorezore><no><basho>
    879        U+0048 u+0065 u+006C u+006C u+006F u+002D U+0041 u+006E u+006F
    880        u+0074 u+0068 u+0065 u+0072 u+002D U+0057 u+0061 u+0079 u+002D
    881        u+305D u+308C u+305E u+308C u+306E u+5834 u+6240
    882        Punycode: Hello-Another-Way--fc4qua05auwb3674vfr0b
    883 
    884    (O) <hitotsu><yane><no><shita>2
    885        u+3072 u+3068 u+3064 u+5C4B u+6839 u+306E u+4E0B u+0032
    886        Punycode: 2-u9tlzr9756bt3uc0v
    887 
    888    (P) Maji<de>Koi<suru>5<byou><mae>
    889        U+004D u+0061 u+006A u+0069 u+3067 U+004B u+006F u+0069 u+3059
    890        u+308B u+0035 u+79D2 u+524D
    891        Punycode: MajiKoi5-783gue6qz075azm5e
    892 
    893    (Q) <pafii>de<runba>
    894        u+30D1 u+30D5 u+30A3 u+30FC u+0064 u+0065 u+30EB u+30F3 u+30D0
    895        Punycode: de-jg4avhby1noc0d
    896 
    897    (R) <sono><supiido><de>
    898        u+305D u+306E u+30B9 u+30D4 u+30FC u+30C9 u+3067
    899        Punycode: d9juau41awczczp
    900 
    901    The last example is an ASCII string that breaks the existing rules
    902    for host name labels.  (It is not a realistic example for IDNA,
    903    because IDNA never encodes pure ASCII labels.)
    904 
    905    (S) -> $1.00 <-
    906        u+002D u+003E u+0020 u+0024 u+0031 u+002E u+0030 u+0030 u+0020
    907        u+003C u+002D
    908        Punycode: -> $1.00 <--
    909 
    910 
    911 
    912 
    913 Costello                    Standards Track                    [Page 16]
    914 
    916 RFC 3492                     IDNA Punycode                    March 2003
    917 
    918 
    919 7.2 Decoding traces
    920 
    921    In the following traces, the evolving state of the decoder is shown
    922    as a sequence of hexadecimal values, representing the code points in
    923    the extended string.  An asterisk appears just after the most
    924    recently inserted code point, indicating both n (the value preceeding
    925    the asterisk) and i (the position of the value just after the
    926    asterisk).  Other numerical values are decimal.
    927 
    928    Decoding trace of example B from section 7.1:
    929 
    930    n is 128, i is 0, bias is 72
    931    input is "ihqwcrb4cv8a8dqg056pqjye"
    932    there is no delimiter, so extended string starts empty
    933    delta "ihq" decodes to 19853
    934    bias becomes 21
    935    4E0D *
    936    delta "wc" decodes to 64
    937    bias becomes 20
    938    4E0D 4E2D *
    939    delta "rb" decodes to 37
    940    bias becomes 13
    941    4E3A * 4E0D 4E2D
    942    delta "4c" decodes to 56
    943    bias becomes 17
    944    4E3A 4E48 * 4E0D 4E2D
    945    delta "v8a" decodes to 599
    946    bias becomes 32
    947    4E3A 4EC0 * 4E48 4E0D 4E2D
    948    delta "8d" decodes to 130
    949    bias becomes 23
    950    4ED6 * 4E3A 4EC0 4E48 4E0D 4E2D
    951    delta "qg" decodes to 154
    952    bias becomes 25
    953    4ED6 4EEC * 4E3A 4EC0 4E48 4E0D 4E2D
    954    delta "056p" decodes to 46301
    955    bias becomes 84
    956    4ED6 4EEC 4E3A 4EC0 4E48 4E0D 4E2D 6587 *
    957    delta "qjye" decodes to 88531
    958    bias becomes 90
    959    4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 * 4E2D 6587
    960 
    961 
    962 
    963 
    964 
    965 
    966 
    967 
    968 
    969 
    970 Costello                    Standards Track                    [Page 17]
    971 
    973 RFC 3492                     IDNA Punycode                    March 2003
    974 
    975 
    976    Decoding trace of example L from section 7.1:
    977 
    978    n is 128, i is 0, bias is 72
    979    input is "3B-ww4c5e180e575a65lsy2b"
    980    literal portion is "3B-", so extended string starts as:
    981    0033 0042
    982    delta "ww4c" decodes to 62042
    983    bias becomes 27
    984    0033 0042 5148 *
    985    delta "5e" decodes to 139
    986    bias becomes 24
    987    0033 0042 516B * 5148
    988    delta "180e" decodes to 16683
    989    bias becomes 67
    990    0033 5E74 * 0042 516B 5148
    991    delta "575a" decodes to 34821
    992    bias becomes 82
    993    0033 5E74 0042 516B 5148 751F *
    994    delta "65l" decodes to 14592
    995    bias becomes 67
    996    0033 5E74 0042 7D44 * 516B 5148 751F
    997    delta "sy2b" decodes to 42088
    998    bias becomes 84
    999    0033 5E74 0042 7D44 91D1 * 516B 5148 751F
   1000 
   1001 
   1002 
   1003 
   1004 
   1005 
   1006 
   1007 
   1008 
   1009 
   1010 
   1011 
   1012 
   1013 
   1014 
   1015 
   1016 
   1017 
   1018 
   1019 
   1020 
   1021 
   1022 
   1023 
   1024 
   1025 
   1026 
   1027 Costello                    Standards Track                    [Page 18]
   1028 
   1030 RFC 3492                     IDNA Punycode                    March 2003
   1031 
   1032 
   1033 7.3 Encoding traces
   1034 
   1035    In the following traces, code point values are hexadecimal, while
   1036    other numerical values are decimal.
   1037 
   1038    Encoding trace of example B from section 7.1:
   1039 
   1040    bias is 72
   1041    input is:
   1042    4ED6 4EEC 4E3A 4EC0 4E48 4E0D 8BF4 4E2D 6587
   1043    there are no basic code points, so no literal portion
   1044    next code point to insert is 4E0D
   1045    needed delta is 19853, encodes as "ihq"
   1046    bias becomes 21
   1047    next code point to insert is 4E2D
   1048    needed delta is 64, encodes as "wc"
   1049    bias becomes 20
   1050    next code point to insert is 4E3A
   1051    needed delta is 37, encodes as "rb"
   1052    bias becomes 13
   1053    next code point to insert is 4E48
   1054    needed delta is 56, encodes as "4c"
   1055    bias becomes 17
   1056    next code point to insert is 4EC0
   1057    needed delta is 599, encodes as "v8a"
   1058    bias becomes 32
   1059    next code point to insert is 4ED6
   1060    needed delta is 130, encodes as "8d"
   1061    bias becomes 23
   1062    next code point to insert is 4EEC
   1063    needed delta is 154, encodes as "qg"
   1064    bias becomes 25
   1065    next code point to insert is 6587
   1066    needed delta is 46301, encodes as "056p"
   1067    bias becomes 84
   1068    next code point to insert is 8BF4
   1069    needed delta is 88531, encodes as "qjye"
   1070    bias becomes 90
   1071    output is "ihqwcrb4cv8a8dqg056pqjye"
   1072 
   1073 
   1074 
   1075 
   1076 
   1077 
   1078 
   1079 
   1080 
   1081 
   1082 
   1083 
   1084 Costello                    Standards Track                    [Page 19]
   1085 
   1087 RFC 3492                     IDNA Punycode                    March 2003
   1088 
   1089 
   1090    Encoding trace of example L from section 7.1:
   1091 
   1092    bias is 72
   1093    input is:
   1094    0033 5E74 0042 7D44 91D1 516B 5148 751F
   1095    basic code points (0033, 0042) are copied to literal portion: "3B-"
   1096    next code point to insert is 5148
   1097    needed delta is 62042, encodes as "ww4c"
   1098    bias becomes 27
   1099    next code point to insert is 516B
   1100    needed delta is 139, encodes as "5e"
   1101    bias becomes 24
   1102    next code point to insert is 5E74
   1103    needed delta is 16683, encodes as "180e"
   1104    bias becomes 67
   1105    next code point to insert is 751F
   1106    needed delta is 34821, encodes as "575a"
   1107    bias becomes 82
   1108    next code point to insert is 7D44
   1109    needed delta is 14592, encodes as "65l"
   1110    bias becomes 67
   1111    next code point to insert is 91D1
   1112    needed delta is 42088, encodes as "sy2b"
   1113    bias becomes 84
   1114    output is "3B-ww4c5e180e575a65lsy2b"
   1115 
   1116 8. Security Considerations
   1117 
   1118    Users expect each domain name in DNS to be controlled by a single
   1119    authority.  If a Unicode string intended for use as a domain label
   1120    could map to multiple ACE labels, then an internationalized domain
   1121    name could map to multiple ASCII domain names, each controlled by a
   1122    different authority, some of which could be spoofs that hijack
   1123    service requests intended for another.  Therefore Punycode is
   1124    designed so that each Unicode string has a unique encoding.
   1125 
   1126    However, there can still be multiple Unicode representations of the
   1127    "same" text, for various definitions of "same".  This problem is
   1128    addressed to some extent by the Unicode standard under the topic of
   1129    canonicalization, and this work is leveraged for domain names by
   1130    Nameprep [NAMEPREP].
   1131 
   1132 
   1133 
   1134 
   1135 
   1136 
   1137 
   1138 
   1139 
   1140 
   1141 Costello                    Standards Track                    [Page 20]
   1142 
   1144 RFC 3492                     IDNA Punycode                    March 2003
   1145 
   1146 
   1147 9. References
   1148 
   1149 9.1 Normative References
   1150 
   1151    [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
   1152                 Requirement Levels", BCP 14, RFC 2119, March 1997.
   1153 
   1154 9.2 Informative References
   1155 
   1156    [RFC952]     Harrenstien, K., Stahl, M. and E. Feinler, "DOD Internet
   1157                 Host Table Specification", RFC 952, October 1985.
   1158 
   1159    [RFC1034]    Mockapetris, P., "Domain Names - Concepts and
   1160                 Facilities", STD 13, RFC 1034, November 1987.
   1161 
   1162    [IDNA]       Faltstrom, P., Hoffman, P. and A. Costello,
   1163                 "Internationalizing Domain Names in Applications
   1164                 (IDNA)", RFC 3490, March 2003.
   1165 
   1166    [NAMEPREP]   Hoffman, P. and  M. Blanchet, "Nameprep: A Stringprep
   1167                 Profile for Internationalized Domain Names (IDN)", RFC
   1168                 3491, March 2003.
   1169 
   1170    [ASCII]      Cerf, V., "ASCII format for Network Interchange", RFC
   1171                 20, October 1969.
   1172 
   1173    [PROVINCIAL] Kaplan, M., "The 'anyone can be provincial!' page",
   1174                 http://www.trigeminal.com/samples/provincial.html.
   1175 
   1176    [UNICODE]    The Unicode Consortium, "The Unicode Standard",
   1177                 http://www.unicode.org/unicode/standard/standard.html.
   1178 
   1179 
   1180 
   1181 
   1182 
   1183 
   1184 
   1185 
   1186 
   1187 
   1188 
   1189 
   1190 
   1191 
   1192 
   1193 
   1194 
   1195 
   1196 
   1197 
   1198 Costello                    Standards Track                    [Page 21]
   1199 
   1201 RFC 3492                     IDNA Punycode                    March 2003
   1202 
   1203 
   1204 A. Mixed-case annotation
   1205 
   1206    In order to use Punycode to represent case-insensitive strings,
   1207    higher layers need to case-fold the strings prior to Punycode
   1208    encoding.  The encoded string can use mixed case as an annotation
   1209    telling how to convert the folded string into a mixed-case string for
   1210    display purposes.  Note, however, that mixed-case annotation is not
   1211    used by the ToASCII and ToUnicode operations specified in [IDNA], and
   1212    therefore implementors of IDNA can disregard this appendix.
   1213 
   1214    Basic code points can use mixed case directly, because the decoder
   1215    copies them verbatim, leaving lowercase code points lowercase, and
   1216    leaving uppercase code points uppercase.  Each non-basic code point
   1217    is represented by a delta, which is represented by a sequence of
   1218    basic code points, the last of which provides the annotation.  If it
   1219    is uppercase, it is a suggestion to map the non-basic code point to
   1220    uppercase (if possible); if it is lowercase, it is a suggestion to
   1221    map the non-basic code point to lowercase (if possible).
   1222 
   1223    These annotations do not alter the code points returned by decoders;
   1224    the annotations are returned separately, for the caller to use or
   1225    ignore.  Encoders can accept annotations in addition to code points,
   1226    but the annotations do not alter the output, except to influence the
   1227    uppercase/lowercase form of ASCII letters.
   1228 
   1229    Punycode encoders and decoders need not support these annotations,
   1230    and higher layers need not use them.
   1231 
   1232 B. Disclaimer and license
   1233 
   1234    Regarding this entire document or any portion of it (including the
   1235    pseudocode and C code), the author makes no guarantees and is not
   1236    responsible for any damage resulting from its use.  The author grants
   1237    irrevocable permission to anyone to use, modify, and distribute it in
   1238    any way that does not diminish the rights of anyone else to use,
   1239    modify, and distribute it, provided that redistributed derivative
   1240    works do not contain misleading author or version information.
   1241    Derivative works need not be licensed under similar terms.
   1242 
   1243 
   1244 
   1245 
   1246 
   1247 
   1248 
   1249 
   1250 
   1251 
   1252 
   1253 
   1254 
   1255 Costello                    Standards Track                    [Page 22]
   1256 
   1258 RFC 3492                     IDNA Punycode                    March 2003
   1259 
   1260 
   1261 C. Punycode sample implementation
   1262 
   1263 /*
   1264 punycode.c from RFC 3492
   1265 http://www.nicemice.net/idn/
   1266 Adam M. Costello
   1267 http://www.nicemice.net/amc/
   1268 
   1269 This is ANSI C code (C89) implementing Punycode (RFC 3492).
   1270 
   1271 */
   1272 
   1273 
   1274 /************************************************************/
   1275 /* Public interface (would normally go in its own .h file): */
   1276 
   1277 #include <limits.h>
   1278 
   1279 enum punycode_status {
   1280   punycode_success,
   1281   punycode_bad_input,   /* Input is invalid.                       */
   1282   punycode_big_output,  /* Output would exceed the space provided. */
   1283   punycode_overflow     /* Input needs wider integers to process.  */
   1284 };
   1285 
   1286 #if UINT_MAX >= (1 << 26) - 1
   1287 typedef unsigned int punycode_uint;
   1288 #else
   1289 typedef unsigned long punycode_uint;
   1290 #endif
   1291 
   1292 enum punycode_status punycode_encode(
   1293   punycode_uint input_length,
   1294   const punycode_uint input[],
   1295   const unsigned char case_flags[],
   1296   punycode_uint *output_length,
   1297   char output[] );
   1298 
   1299     /* punycode_encode() converts Unicode to Punycode.  The input     */
   1300     /* is represented as an array of Unicode code points (not code    */
   1301     /* units; surrogate pairs are not allowed), and the output        */
   1302     /* will be represented as an array of ASCII code points.  The     */
   1303     /* output string is *not* null-terminated; it will contain        */
   1304     /* zeros if and only if the input contains zeros.  (Of course     */
   1305     /* the caller can leave room for a terminator and add one if      */
   1306     /* needed.)  The input_length is the number of code points in     */
   1307     /* the input.  The output_length is an in/out argument: the       */
   1308     /* caller passes in the maximum number of code points that it     */
   1309 
   1310 
   1311 
   1312 Costello                    Standards Track                    [Page 23]
   1313 
   1315 RFC 3492                     IDNA Punycode                    March 2003
   1316 
   1317 
   1318     /* can receive, and on successful return it will contain the      */
   1319     /* number of code points actually output.  The case_flags array   */
   1320     /* holds input_length boolean values, where nonzero suggests that */
   1321     /* the corresponding Unicode character be forced to uppercase     */
   1322     /* after being decoded (if possible), and zero suggests that      */
   1323     /* it be forced to lowercase (if possible).  ASCII code points    */
   1324     /* are encoded literally, except that ASCII letters are forced    */
   1325     /* to uppercase or lowercase according to the corresponding       */
   1326     /* uppercase flags.  If case_flags is a null pointer then ASCII   */
   1327     /* letters are left as they are, and other code points are        */
   1328     /* treated as if their uppercase flags were zero.  The return     */
   1329     /* value can be any of the punycode_status values defined above   */
   1330     /* except punycode_bad_input; if not punycode_success, then       */
   1331     /* output_size and output might contain garbage.                  */
   1332 
   1333 enum punycode_status punycode_decode(
   1334   punycode_uint input_length,
   1335   const char input[],
   1336   punycode_uint *output_length,
   1337   punycode_uint output[],
   1338   unsigned char case_flags[] );
   1339 
   1340     /* punycode_decode() converts Punycode to Unicode.  The input is  */
   1341     /* represented as an array of ASCII code points, and the output   */
   1342     /* will be represented as an array of Unicode code points.  The   */
   1343     /* input_length is the number of code points in the input.  The   */
   1344     /* output_length is an in/out argument: the caller passes in      */
   1345     /* the maximum number of code points that it can receive, and     */
   1346     /* on successful return it will contain the actual number of      */
   1347     /* code points output.  The case_flags array needs room for at    */
   1348     /* least output_length values, or it can be a null pointer if the */
   1349     /* case information is not needed.  A nonzero flag suggests that  */
   1350     /* the corresponding Unicode character be forced to uppercase     */
   1351     /* by the caller (if possible), while zero suggests that it be    */
   1352     /* forced to lowercase (if possible).  ASCII code points are      */
   1353     /* output already in the proper case, but their flags will be set */
   1354     /* appropriately so that applying the flags would be harmless.    */
   1355     /* The return value can be any of the punycode_status values      */
   1356     /* defined above; if not punycode_success, then output_length,    */
   1357     /* output, and case_flags might contain garbage.  On success, the */
   1358     /* decoder will never need to write an output_length greater than */
   1359     /* input_length, because of how the encoding is defined.          */
   1360 
   1361 /**********************************************************/
   1362 /* Implementation (would normally go in its own .c file): */
   1363 
   1364 #include <string.h>
   1365 
   1366 
   1367 
   1368 
   1369 Costello                    Standards Track                    [Page 24]
   1370 
   1372 RFC 3492                     IDNA Punycode                    March 2003
   1373 
   1374 
   1375 /*** Bootstring parameters for Punycode ***/
   1376 
   1377 enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
   1378        initial_bias = 72, initial_n = 0x80, delimiter = 0x2D };
   1379 
   1380 /* basic(cp) tests whether cp is a basic code point: */
   1381 #define basic(cp) ((punycode_uint)(cp) < 0x80)
   1382 
   1383 /* delim(cp) tests whether cp is a delimiter: */
   1384 #define delim(cp) ((cp) == delimiter)
   1385 
   1386 /* decode_digit(cp) returns the numeric value of a basic code */
   1387 /* point (for use in representing integers) in the range 0 to */
   1388 /* base-1, or base if cp is does not represent a value.       */
   1389 
   1390 static punycode_uint decode_digit(punycode_uint cp)
   1391 {
   1392   return  cp - 48 < 10 ? cp - 22 :  cp - 65 < 26 ? cp - 65 :
   1393           cp - 97 < 26 ? cp - 97 :  base;
   1394 }
   1395 
   1396 /* encode_digit(d,flag) returns the basic code point whose value      */
   1397 /* (when used for representing integers) is d, which needs to be in   */
   1398 /* the range 0 to base-1.  The lowercase form is used unless flag is  */
   1399 /* nonzero, in which case the uppercase form is used.  The behavior   */
   1400 /* is undefined if flag is nonzero and digit d has no uppercase form. */
   1401 
   1402 static char encode_digit(punycode_uint d, int flag)
   1403 {
   1404   return d + 22 + 75 * (d < 26) - ((flag != 0) << 5);
   1405   /*  0..25 map to ASCII a..z or A..Z */
   1406   /* 26..35 map to ASCII 0..9         */
   1407 }
   1408 
   1409 /* flagged(bcp) tests whether a basic code point is flagged */
   1410 /* (uppercase).  The behavior is undefined if bcp is not a  */
   1411 /* basic code point.                                        */
   1412 
   1413 #define flagged(bcp) ((punycode_uint)(bcp) - 65 < 26)
   1414 
   1415 /* encode_basic(bcp,flag) forces a basic code point to lowercase */
   1416 /* if flag is zero, uppercase if flag is nonzero, and returns    */
   1417 /* the resulting code point.  The code point is unchanged if it  */
   1418 /* is caseless.  The behavior is undefined if bcp is not a basic */
   1419 /* code point.                                                   */
   1420 
   1421 static char encode_basic(punycode_uint bcp, int flag)
   1422 {
   1423 
   1424 
   1425 
   1426 Costello                    Standards Track                    [Page 25]
   1427 
   1429 RFC 3492                     IDNA Punycode                    March 2003
   1430 
   1431 
   1432   bcp -= (bcp - 97 < 26) << 5;
   1433   return bcp + ((!flag && (bcp - 65 < 26)) << 5);
   1434 }
   1435 
   1436 /*** Platform-specific constants ***/
   1437 
   1438 /* maxint is the maximum value of a punycode_uint variable: */
   1439 static const punycode_uint maxint = -1;
   1440 /* Because maxint is unsigned, -1 becomes the maximum value. */
   1441 
   1442 /*** Bias adaptation function ***/
   1443 
   1444 static punycode_uint adapt(
   1445   punycode_uint delta, punycode_uint numpoints, int firsttime )
   1446 {
   1447   punycode_uint k;
   1448 
   1449   delta = firsttime ? delta / damp : delta >> 1;
   1450   /* delta >> 1 is a faster way of doing delta / 2 */
   1451   delta += delta / numpoints;
   1452 
   1453   for (k = 0;  delta > ((base - tmin) * tmax) / 2;  k += base) {
   1454     delta /= base - tmin;
   1455   }
   1456 
   1457   return k + (base - tmin + 1) * delta / (delta + skew);
   1458 }
   1459 
   1460 /*** Main encode function ***/
   1461 
   1462 enum punycode_status punycode_encode(
   1463   punycode_uint input_length,
   1464   const punycode_uint input[],
   1465   const unsigned char case_flags[],
   1466   punycode_uint *output_length,
   1467   char output[] )
   1468 {
   1469   punycode_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;
   1470 
   1471   /* Initialize the state: */
   1472 
   1473   n = initial_n;
   1474   delta = out = 0;
   1475   max_out = *output_length;
   1476   bias = initial_bias;
   1477 
   1478   /* Handle the basic code points: */
   1479 
   1480 
   1481 
   1482 
   1483 Costello                    Standards Track                    [Page 26]
   1484 
   1486 RFC 3492                     IDNA Punycode                    March 2003
   1487 
   1488 
   1489   for (j = 0;  j < input_length;  ++j) {
   1490     if (basic(input[j])) {
   1491       if (max_out - out < 2) return punycode_big_output;
   1492       output[out++] =
   1493         case_flags ?  encode_basic(input[j], case_flags[j]) : input[j];
   1494     }
   1495     /* else if (input[j] < n) return punycode_bad_input; */
   1496     /* (not needed for Punycode with unsigned code points) */
   1497   }
   1498 
   1499   h = b = out;
   1500 
   1501   /* h is the number of code points that have been handled, b is the  */
   1502   /* number of basic code points, and out is the number of characters */
   1503   /* that have been output.                                           */
   1504 
   1505   if (b > 0) output[out++] = delimiter;
   1506 
   1507   /* Main encoding loop: */
   1508 
   1509   while (h < input_length) {
   1510     /* All non-basic code points < n have been     */
   1511     /* handled already.  Find the next larger one: */
   1512 
   1513     for (m = maxint, j = 0;  j < input_length;  ++j) {
   1514       /* if (basic(input[j])) continue; */
   1515       /* (not needed for Punycode) */
   1516       if (input[j] >= n && input[j] < m) m = input[j];
   1517     }
   1518 
   1519     /* Increase delta enough to advance the decoder's    */
   1520     /* <n,i> state to <m,0>, but guard against overflow: */
   1521 
   1522     if (m - n > (maxint - delta) / (h + 1)) return punycode_overflow;
   1523     delta += (m - n) * (h + 1);
   1524     n = m;
   1525 
   1526     for (j = 0;  j < input_length;  ++j) {
   1527       /* Punycode does not need to check whether input[j] is basic: */
   1528       if (input[j] < n /* || basic(input[j]) */ ) {
   1529         if (++delta == 0) return punycode_overflow;
   1530       }
   1531 
   1532       if (input[j] == n) {
   1533         /* Represent delta as a generalized variable-length integer: */
   1534 
   1535         for (q = delta, k = base;  ;  k += base) {
   1536           if (out >= max_out) return punycode_big_output;
   1537 
   1538 
   1539 
   1540 Costello                    Standards Track                    [Page 27]
   1541 
   1543 RFC 3492                     IDNA Punycode                    March 2003
   1544 
   1545 
   1546           t = k <= bias /* + tmin */ ? tmin :     /* +tmin not needed */
   1547               k >= bias + tmax ? tmax : k - bias;
   1548           if (q < t) break;
   1549           output[out++] = encode_digit(t + (q - t) % (base - t), 0);
   1550           q = (q - t) / (base - t);
   1551         }
   1552 
   1553         output[out++] = encode_digit(q, case_flags && case_flags[j]);
   1554         bias = adapt(delta, h + 1, h == b);
   1555         delta = 0;
   1556         ++h;
   1557       }
   1558     }
   1559 
   1560     ++delta, ++n;
   1561   }
   1562 
   1563   *output_length = out;
   1564   return punycode_success;
   1565 }
   1566 
   1567 /*** Main decode function ***/
   1568 
   1569 enum punycode_status punycode_decode(
   1570   punycode_uint input_length,
   1571   const char input[],
   1572   punycode_uint *output_length,
   1573   punycode_uint output[],
   1574   unsigned char case_flags[] )
   1575 {
   1576   punycode_uint n, out, i, max_out, bias,
   1577                  b, j, in, oldi, w, k, digit, t;
   1578 
   1579   /* Initialize the state: */
   1580 
   1581   n = initial_n;
   1582   out = i = 0;
   1583   max_out = *output_length;
   1584   bias = initial_bias;
   1585 
   1586   /* Handle the basic code points:  Let b be the number of input code */
   1587   /* points before the last delimiter, or 0 if there is none, then    */
   1588   /* copy the first b code points to the output.                      */
   1589 
   1590   for (b = j = 0;  j < input_length;  ++j) if (delim(input[j])) b = j;
   1591   if (b > max_out) return punycode_big_output;
   1592 
   1593   for (j = 0;  j < b;  ++j) {
   1594 
   1595 
   1596 
   1597 Costello                    Standards Track                    [Page 28]
   1598 
   1600 RFC 3492                     IDNA Punycode                    March 2003
   1601 
   1602 
   1603     if (case_flags) case_flags[out] = flagged(input[j]);
   1604     if (!basic(input[j])) return punycode_bad_input;
   1605     output[out++] = input[j];
   1606   }
   1607 
   1608   /* Main decoding loop:  Start just after the last delimiter if any  */
   1609   /* basic code points were copied; start at the beginning otherwise. */
   1610 
   1611   for (in = b > 0 ? b + 1 : 0;  in < input_length;  ++out) {
   1612 
   1613     /* in is the index of the next character to be consumed, and */
   1614     /* out is the number of code points in the output array.     */
   1615 
   1616     /* Decode a generalized variable-length integer into delta,  */
   1617     /* which gets added to i.  The overflow checking is easier   */
   1618     /* if we increase i as we go, then subtract off its starting */
   1619     /* value at the end to obtain delta.                         */
   1620 
   1621     for (oldi = i, w = 1, k = base;  ;  k += base) {
   1622       if (in >= input_length) return punycode_bad_input;
   1623       digit = decode_digit(input[in++]);
   1624       if (digit >= base) return punycode_bad_input;
   1625       if (digit > (maxint - i) / w) return punycode_overflow;
   1626       i += digit * w;
   1627       t = k <= bias /* + tmin */ ? tmin :     /* +tmin not needed */
   1628           k >= bias + tmax ? tmax : k - bias;
   1629       if (digit < t) break;
   1630       if (w > maxint / (base - t)) return punycode_overflow;
   1631       w *= (base - t);
   1632     }
   1633 
   1634     bias = adapt(i - oldi, out + 1, oldi == 0);
   1635 
   1636     /* i was supposed to wrap around from out+1 to 0,   */
   1637     /* incrementing n each time, so we'll fix that now: */
   1638 
   1639     if (i / (out + 1) > maxint - n) return punycode_overflow;
   1640     n += i / (out + 1);
   1641     i %= (out + 1);
   1642 
   1643     /* Insert n at position i of the output: */
   1644 
   1645     /* not needed for Punycode: */
   1646     /* if (decode_digit(n) <= base) return punycode_invalid_input; */
   1647     if (out >= max_out) return punycode_big_output;
   1648 
   1649     if (case_flags) {
   1650       memmove(case_flags + i + 1, case_flags + i, out - i);
   1651 
   1652 
   1653 
   1654 Costello                    Standards Track                    [Page 29]
   1655 
   1657 RFC 3492                     IDNA Punycode                    March 2003
   1658 
   1659 
   1660       /* Case of last character determines uppercase flag: */
   1661       case_flags[i] = flagged(input[in - 1]);
   1662     }
   1663 
   1664     memmove(output + i + 1, output + i, (out - i) * sizeof *output);
   1665     output[i++] = n;
   1666   }
   1667 
   1668   *output_length = out;
   1669   return punycode_success;
   1670 }
   1671 
   1672 /******************************************************************/
   1673 /* Wrapper for testing (would normally go in a separate .c file): */
   1674 
   1675 #include <assert.h>
   1676 #include <stdio.h>
   1677 #include <stdlib.h>
   1678 #include <string.h>
   1679 
   1680 /* For testing, we'll just set some compile-time limits rather than */
   1681 /* use malloc(), and set a compile-time option rather than using a  */
   1682 /* command-line option.                                             */
   1683 
   1684 enum {
   1685   unicode_max_length = 256,
   1686   ace_max_length = 256
   1687 };
   1688 
   1689 static void usage(char **argv)
   1690 {
   1691   fprintf(stderr,
   1692     "\n"
   1693     "%s -e reads code points and writes a Punycode string.\n"
   1694     "%s -d reads a Punycode string and writes code points.\n"
   1695     "\n"
   1696     "Input and output are plain text in the native character set.\n"
   1697     "Code points are in the form u+hex separated by whitespace.\n"
   1698     "Although the specification allows Punycode strings to contain\n"
   1699     "any characters from the ASCII repertoire, this test code\n"
   1700     "supports only the printable characters, and needs the Punycode\n"
   1701     "string to be followed by a newline.\n"
   1702     "The case of the u in u+hex is the force-to-uppercase flag.\n"
   1703     , argv[0], argv[0]);
   1704   exit(EXIT_FAILURE);
   1705 }
   1706 
   1707 static void fail(const char *msg)
   1708 
   1709 
   1710 
   1711 Costello                    Standards Track                    [Page 30]
   1712 
   1714 RFC 3492                     IDNA Punycode                    March 2003
   1715 
   1716 
   1717 {
   1718   fputs(msg,stderr);
   1719   exit(EXIT_FAILURE);
   1720 }
   1721 
   1722 static const char too_big[] =
   1723   "input or output is too large, recompile with larger limits\n";
   1724 static const char invalid_input[] = "invalid input\n";
   1725 static const char overflow[] = "arithmetic overflow\n";
   1726 static const char io_error[] = "I/O error\n";
   1727 
   1728 /* The following string is used to convert printable */
   1729 /* characters between ASCII and the native charset:  */
   1730 
   1731 static const char print_ascii[] =
   1732   "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
   1733   "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
   1734   " !\"#$%&'()*+,-./"
   1735   "0123456789:;<=>?"
   1736   "@ABCDEFGHIJKLMNO"
   1737   "PQRSTUVWXYZ[\\]^_"
   1738   "`abcdefghijklmno"
   1739   "pqrstuvwxyz{|}~\n";
   1740 
   1741 int main(int argc, char **argv)
   1742 {
   1743   enum punycode_status status;
   1744   int r;
   1745   unsigned int input_length, output_length, j;
   1746   unsigned char case_flags[unicode_max_length];
   1747 
   1748   if (argc != 2) usage(argv);
   1749   if (argv[1][0] != '-') usage(argv);
   1750   if (argv[1][2] != 0) usage(argv);
   1751 
   1752   if (argv[1][1] == 'e') {
   1753     punycode_uint input[unicode_max_length];
   1754     unsigned long codept;
   1755     char output[ace_max_length+1], uplus[3];
   1756     int c;
   1757 
   1758     /* Read the input code points: */
   1759 
   1760     input_length = 0;
   1761 
   1762     for (;;) {
   1763       r = scanf("%2s%lx", uplus, &codept);
   1764       if (ferror(stdin)) fail(io_error);
   1765 
   1766 
   1767 
   1768 Costello                    Standards Track                    [Page 31]
   1769 
   1771 RFC 3492                     IDNA Punycode                    March 2003
   1772 
   1773 
   1774       if (r == EOF || r == 0) break;
   1775 
   1776       if (r != 2 || uplus[1] != '+' || codept > (punycode_uint)-1) {
   1777         fail(invalid_input);
   1778       }
   1779 
   1780       if (input_length == unicode_max_length) fail(too_big);
   1781 
   1782       if (uplus[0] == 'u') case_flags[input_length] = 0;
   1783       else if (uplus[0] == 'U') case_flags[input_length] = 1;
   1784       else fail(invalid_input);
   1785 
   1786       input[input_length++] = codept;
   1787     }
   1788 
   1789     /* Encode: */
   1790 
   1791     output_length = ace_max_length;
   1792     status = punycode_encode(input_length, input, case_flags,
   1793                              &output_length, output);
   1794     if (status == punycode_bad_input) fail(invalid_input);
   1795     if (status == punycode_big_output) fail(too_big);
   1796     if (status == punycode_overflow) fail(overflow);
   1797     assert(status == punycode_success);
   1798 
   1799     /* Convert to native charset and output: */
   1800 
   1801     for (j = 0;  j < output_length;  ++j) {
   1802       c = output[j];
   1803       assert(c >= 0 && c <= 127);
   1804       if (print_ascii[c] == 0) fail(invalid_input);
   1805       output[j] = print_ascii[c];
   1806     }
   1807 
   1808     output[j] = 0;
   1809     r = puts(output);
   1810     if (r == EOF) fail(io_error);
   1811     return EXIT_SUCCESS;
   1812   }
   1813 
   1814   if (argv[1][1] == 'd') {
   1815     char input[ace_max_length+2], *p, *pp;
   1816     punycode_uint output[unicode_max_length];
   1817 
   1818     /* Read the Punycode input string and convert to ASCII: */
   1819 
   1820     fgets(input, ace_max_length+2, stdin);
   1821     if (ferror(stdin)) fail(io_error);
   1822 
   1823 
   1824 
   1825 Costello                    Standards Track                    [Page 32]
   1826 
   1828 RFC 3492                     IDNA Punycode                    March 2003
   1829 
   1830 
   1831     if (feof(stdin)) fail(invalid_input);
   1832     input_length = strlen(input) - 1;
   1833     if (input[input_length] != '\n') fail(too_big);
   1834     input[input_length] = 0;
   1835 
   1836     for (p = input;  *p != 0;  ++p) {
   1837       pp = strchr(print_ascii, *p);
   1838       if (pp == 0) fail(invalid_input);
   1839       *p = pp - print_ascii;
   1840     }
   1841 
   1842     /* Decode: */
   1843 
   1844     output_length = unicode_max_length;
   1845     status = punycode_decode(input_length, input, &output_length,
   1846                              output, case_flags);
   1847     if (status == punycode_bad_input) fail(invalid_input);
   1848     if (status == punycode_big_output) fail(too_big);
   1849     if (status == punycode_overflow) fail(overflow);
   1850     assert(status == punycode_success);
   1851 
   1852     /* Output the result: */
   1853 
   1854     for (j = 0;  j < output_length;  ++j) {
   1855       r = printf("%s+%04lX\n",
   1856                  case_flags[j] ? "U" : "u",
   1857                  (unsigned long) output[j] );
   1858       if (r < 0) fail(io_error);
   1859     }
   1860 
   1861     return EXIT_SUCCESS;
   1862   }
   1863 
   1864   usage(argv);
   1865   return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
   1866 }
   1867 
   1868 
   1869 
   1870 
   1871 
   1872 
   1873 
   1874 
   1875 
   1876 
   1877 
   1878 
   1879 
   1880 
   1881 
   1882 Costello                    Standards Track                    [Page 33]
   1883 
   1885 RFC 3492                     IDNA Punycode                    March 2003
   1886 
   1887 
   1888 Author's Address
   1889 
   1890    Adam M. Costello
   1891    University of California, Berkeley
   1892    http://www.nicemice.net/amc/
   1893 
   1894 
   1895 
   1896 
   1897 
   1898 
   1899 
   1900 
   1901 
   1902 
   1903 
   1904 
   1905 
   1906 
   1907 
   1908 
   1909 
   1910 
   1911 
   1912 
   1913 
   1914 
   1915 
   1916 
   1917 
   1918 
   1919 
   1920 
   1921 
   1922 
   1923 
   1924 
   1925 
   1926 
   1927 
   1928 
   1929 
   1930 
   1931 
   1932 
   1933 
   1934 
   1935 
   1936 
   1937 
   1938 
   1939 Costello                    Standards Track                    [Page 34]
   1940 
   1942 RFC 3492                     IDNA Punycode                    March 2003
   1943 
   1944 
   1945 Full Copyright Statement
   1946 
   1947    Copyright (C) The Internet Society (2003).  All Rights Reserved.
   1948 
   1949    This document and translations of it may be copied and furnished to
   1950    others, and derivative works that comment on or otherwise explain it
   1951    or assist in its implementation may be prepared, copied, published
   1952    and distributed, in whole or in part, without restriction of any
   1953    kind, provided that the above copyright notice and this paragraph are
   1954    included on all such copies and derivative works.  However, this
   1955    document itself may not be modified in any way, such as by removing
   1956    the copyright notice or references to the Internet Society or other
   1957    Internet organizations, except as needed for the purpose of
   1958    developing Internet standards in which case the procedures for
   1959    copyrights defined in the Internet Standards process must be
   1960    followed, or as required to translate it into languages other than
   1961    English.
   1962 
   1963    The limited permissions granted above are perpetual and will not be
   1964    revoked by the Internet Society or its successors or assigns.
   1965 
   1966    This document and the information contained herein is provided on an
   1967    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   1968    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   1969    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   1970    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   1971    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
   1972 
   1973 Acknowledgement
   1974 
   1975    Funding for the RFC Editor function is currently provided by the
   1976    Internet Society.
   1977 
   1978 
   1979 
   1980 
   1981 
   1982 
   1983 
   1984 
   1985 
   1986 
   1987 
   1988 
   1989 
   1990 
   1991 
   1992 
   1993 
   1994 
   1995 
   1996 Costello                    Standards Track                    [Page 35]
   1997 
   1999