# Zone files

Zone files are text files that contain resource records (RRs) in text form.
Zones can be defined by expressing them in the form of a list of RRs.

Zone files were originally specified in RFC 1035 Section 5, but the DNS
has seen many additions since and the specification is rather ambiguous.
Consequently, various name servers implement slightly different dialects. This
document aims to clarify the format by listing (some of) the relevant
specifications and then explaining why certain design decisions were
made in simdzone.

* [RFC 1034 Section 3.6.1][rfc1034#3.6.1]
* [RFC 1035 Section 5][rfc1035#5]
* [RFC 2065 Section 4.5][rfc2065#4.5]
* [RFC 2181 Section 8][rfc2181#8]
* [RFC 2308 Section 4][rfc2308#4]
* [RFC 3597 Section 5][rfc3597#5]
* [RFC 9460 Section 2.1][rfc9460#2.1]


## Clarification (work-in-progress)

> NOTE: BIND behavior is more-or-less considered the de facto standard.

Historically, master files were edited by hand, which is reflected in the
syntax. Consider the format a tabular serialization format with provisions
for convenient editing. For example, the owner, class and TTL fields may be
omitted (provided the line starts with \<blank\> for the owner) and $INCLUDE
directives can be used for templating.

The format is NOT context-free. The field following the owner (if specified)
may represent a type, a class or a TTL, and a symbolic constant, e.g. A
or NS, may have a different meaning if specified as an RDATA field.

The DNS is intentionally extensible. The specification is not explicit about
how that affects syntax, but it explains why no specific notation for
data-types can be enforced by RFC 1035. To make it easier for data-types to
be added at a later stage the syntax cannot enforce a certain notation (or
the scanner would need to be revised). Consequently, the scanner only
identifies items (or fields) and structural characters, which can be
expressed as either a contiguous set of characters without interior spaces,
or as a quoted string.
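
A scanner along these lines can be sketched as follows. This is a minimal Python illustration, not simdzone's implementation: `Token` and `scan` are hypothetical names, comments are assumed to start with `;`, and parentheses are consumed inside the scanner rather than emitted as tokens. Escape sequences are skipped over but left untouched in the token text.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    text: str      # raw text; escape sequences are left untouched
    quoted: bool   # True if the field was written as "..."


def scan(text: str) -> Iterator[Token]:
    """Yield fields and end-of-record newlines; track ( ) grouping."""
    i, n, grouped = 0, len(text), False
    while i < n:
        c = text[i]
        if c in " \t\r":
            i += 1
        elif c == ";":                      # comment runs to end of line
            while i < n and text[i] != "\n":
                i += 1
        elif c == "\n":
            if not grouped:                 # newlines inside ( ) act as blanks
                yield Token("\n", False)
            i += 1
        elif c in "()":
            if (c == "(") == grouped:
                raise SyntaxError("nested or unbalanced parenthesis")
            grouped = c == "("
            i += 1
        elif c == '"':
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1   # skip escaped character
            if j >= n:
                raise SyntaxError("unterminated quoted string")
            yield Token(text[i + 1:j], True)
            i = j + 1
        else:
            j = i
            while j < n and text[j] not in ' \t\r\n;()"':
                j += 2 if text[j] == "\\" else 1   # skip escaped character
            j = min(j, n)
            yield Token(text[i:j], False)
            i = j
```

Because each token carries a `quoted` flag and its raw text, the parser can later decide per field whether a quoted form or an escape sequence is acceptable.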

The format allows for including structural characters in fields by means of
escaping the actual character or enclosing the field in quotes. The example
provided by the specification here is using ASCII dots in domain name labels.
The dot is normally a label separator, replaced by the length of the label
on the wire. If a domain name includes an actual ASCII dot, the character
must be escaped in the textual representation (`\X` or `\DDD`).

Note that ASCII dot characters strictly speaking do not have to be escaped
in a quoted string. RFC 1035 clearly states labels in domain names are
expressed as character strings. However, behavior differs across
implementations, so support for quoted labels is best dropped (see below).

RFC 1035 states both \<contiguous\> and \<quoted\> are \<character-string\>.
Meaning, items can be either \<contiguous\> or \<quoted\>. Whether a specific
item is interpreted as a \<character-string\> depends on the type of value for
that item. E.g., TTLs are decimal integers and therefore cannot be expressed
as \<quoted\> as they are not \<character-string\>s. Similarly, base64
sequences are encoded binary blobs, not \<character-string\>s and therefore
cannot be expressed as such. Escape sequences are valid only in
\<character-string\>s.

* Mnemonics are NOT character strings.

  > BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
  > RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
  > header fields all result in a syntax error too if quoted.

* Some integer fields allow for using mnemonics too. E.g., the algorithm
  field in RRSIG records.

* RFC 1035 states: A freestanding @ denotes the current origin.
  There has been discussion about the locations in which @ is interpreted as
  the origin, e.g. how should a freestanding @ be interpreted in the RDATA
  section of a TXT RR? Note that there is no mention of text expansion in the
  original text. A freestanding @ denotes the origin. As such, it stands to
  reason that its use is limited to locations where domain names are
  expressed, which also happens to be the most practical way to implement the
  functionality.

  > This also seems to be the behavior that other name servers implement (at
  > least BIND and PowerDNS). The BIND manual states: "When used in the label
  > (or name) field, the asperand or at-sign (@) symbol represents the current
  > origin. At the start of the zone file, it is the \<zone\_name\>, followed
  > by a trailing dot (.)."

  > It may also make sense to interpret a quoted freestanding @ differently
  > than a non-quoted one. At least, BIND throws an error if a quoted
  > freestanding @ is encountered in the RDATA sections for CNAME and NS RRs.
  > However, a quoted freestanding @ is accepted and interpreted as origin
  > if specified as the OWNER.

  > There are mentions of what happens when a zone that uses freestanding @ in
  > RDATA is written to disk. Of course, this particular scenario rarely occurs
  > as it does not need to be written to disk when loaded on a primary and no
  > file exists if received over AXFR/IXFR. However, it may make sense to
  > implement optimistic compression of this form, and make it configurable.
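
Limiting `@` to domain name fields can be illustrated with a small resolver. This is a sketch under assumptions: `is_absolute` and `absolute_name` are hypothetical helpers, and the parser is assumed to call `absolute_name` only for fields that the RR type declares to be domain names, so a freestanding `@` in TXT RDATA never reaches it and stays a literal string.

```python
def is_absolute(name: str) -> bool:
    """True if the name ends in an unescaped dot."""
    if not name.endswith("."):
        return False
    body = name[:-1]
    # An odd number of backslashes directly before the dot escapes it.
    backslashes = len(body) - len(body.rstrip("\\"))
    return backslashes % 2 == 0


def absolute_name(field: str, origin: str) -> str:
    """Resolve a domain name field against the current origin."""
    if not is_absolute(origin):
        raise ValueError("origin must be an absolute domain name")
    if field == "@":                 # freestanding @ denotes the origin
        return origin
    if is_absolute(field):
        return field
    # Relative name: append the origin (the root origin is just ".").
    return field + "." if origin == "." else field + "." + origin
```

For example, `absolute_name("www", "example.com.")` yields `www.example.com.`, while `absolute_name("@", "example.com.")` yields the origin itself.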

* Class and type names are mutually exclusive in practice.
  RFC 1035 states: The RR begins with optional TTL and class fields, ...
  Therefore, if a type name matches a class name, the parser cannot distinguish
  between the two in text representation and must resort to generic notation
  (RFC 3597) or, depending on the RDATA format for the record type, a
  look-ahead may be sufficient. Realistically, it is highly likely that because
  of this, no type name will ever match a class name.

  > This means both can reside in the same table.
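
A shared lookup table for classes and types might look like the sketch below. The names `SYMBOLS`, `classify` and `parse_header` are hypothetical, only a handful of mnemonics are listed, and TTLs are assumed to be plain decimal integers (unit suffixes are not handled). It also shows the context dependence noted earlier: once the type is found, remaining fields are RDATA and are not classified.

```python
import re

# One table works because no type mnemonic collides with a class mnemonic.
SYMBOLS = {
    "IN": ("class", 1), "CH": ("class", 3), "HS": ("class", 4),
    "A": ("type", 1), "NS": ("type", 2), "SOA": ("type", 6),
    "TXT": ("type", 16), "AAAA": ("type", 28),
}
GENERIC = re.compile(r"(TYPE|CLASS)(\d+)\Z")


def classify(field: str, quoted: bool = False):
    """Return ("ttl" | "class" | "type", value) for one header field."""
    if quoted:
        raise SyntaxError("TTL, class and type fields cannot be quoted")
    if field.isascii() and field.isdigit():
        return ("ttl", int(field))
    key = field.upper()
    if key in SYMBOLS:
        return SYMBOLS[key]
    m = GENERIC.match(key)
    if m:                                   # RFC 3597 generic notation
        return (m.group(1).lower(), int(m.group(2)))
    raise SyntaxError(f"unknown class or type: {field!r}")


def parse_header(fields):
    """Consume optional TTL and class (in either order) up to the type.

    Returns (ttl, class, type, rdata_fields); ttl and class are None
    when omitted.
    """
    ttl = cls = None
    for i, field in enumerate(fields):
        kind, value = classify(field)
        if kind == "type":
            return ttl, cls, value, fields[i + 1:]
        if kind == "ttl" and ttl is None:
            ttl = value
        elif kind == "class" and cls is None:
            cls = value
        else:
            raise SyntaxError(f"duplicate {kind} field")
    raise SyntaxError("missing type")
```

Called with the fields `3600 IN NS A`, the trailing `A` is returned untouched as RDATA rather than being looked up as a type.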

* The encoding is non-ASCII. Some characters have special meaning, but users
  are technically allowed to put in non-printable octets outside the ASCII
  range without custom encoding. Of course, this rarely occurs in practice
  and users are encouraged to use the `\DDD` encoding for "special" octets.

* Parentheses may not be nested.

* $ORIGIN must be an absolute domain.

* Escape sequences must NOT be unescaped in the scanner as is common with
  programming languages like C that have a preprocessor. Instead, the
  original text is necessary in the parsing stage to distinguish between
  label separators (dots) and escaped dots that are part of a label.
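
To illustrate why the raw text must reach the parser, the sketch below splits labels and unescapes in a single pass. `name_to_wire` is a hypothetical helper that assumes an absolute name given as raw octets; it is not simdzone's code and omits the 255-octet limit on the full name.

```python
def name_to_wire(name: bytes) -> bytes:
    """Encode an absolute domain name, unescaping while splitting labels."""
    if name == b".":
        return b"\x00"
    wire, label, i = bytearray(), bytearray(), 0
    while i < len(name):
        c = name[i:i + 1]
        if c == b"\\":
            digits = name[i + 1:i + 4]
            if len(digits) == 3 and digits.isdigit():    # \DDD
                value = int(digits)
                if value > 255:
                    raise SyntaxError("\\DDD escape out of range")
                label.append(value)
                i += 4
            elif i + 1 < len(name):                      # \X
                label += name[i + 1:i + 2]
                i += 2
            else:
                raise SyntaxError("dangling backslash")
        elif c == b".":                                  # unescaped separator
            if not 1 <= len(label) <= 63:
                raise SyntaxError("empty or oversized label")
            wire.append(len(label))
            wire += label
            label.clear()
            i += 1
        else:
            label += c
            i += 1
    if label:
        raise SyntaxError("name is not absolute")
    wire.append(0)                                       # root label
    return bytes(wire)
```

Had the scanner already turned `a\.b.example.` into `a.b.example.`, the parser would produce three labels instead of the intended two.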

* RFC 1035 specifies that the current origin should be restored after an
  $INCLUDE, but it is silent on whether the current domain name should also be
  restored. BIND 9 restores both of them. This could be construed as a
  deviation from RFC 1035, a feature, or both.

* RFC 1035 states: and text literals can contain CRLF within the text.
  BIND, however, does not allow newlines in text (escaped or not). For
  performance reasons, we may adopt the same behavior as that would remove
  the need to keep track of possibly embedded newlines.

* From: http://www.zytrax.com/books/dns/ch8/include.html (mentioned in chat)
  > Source states: The RFC is silent on the topic of embedded `$INCLUDE`s in
  > `$INCLUDE`d files - BIND 9 documentation is similarly silent. Assume they
  > are not permitted.

  All implementations, including BIND, allow for embedded `$INCLUDE`s.
  The current implementation is such that (embedded) includes are allowed by
  default. However, `$INCLUDE` directives can be disabled, which is useful
  when parsing from an untrusted source. There is also protection against
  cyclic includes.

  > There is no maximum on the number of embedded includes (yet). NSD limits
  > the number of includes to 10 by default (compile option). For security, it
  > must be possible to set a hard limit.
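
One way to combine the disable switch, cycle protection and a hard nesting limit is a stack of canonical paths. The sketch below is illustrative only: `IncludeStack`, `IncludeError` and the `max_depth` default of 10 (mirroring the NSD default mentioned above) are assumptions, not simdzone's API.

```python
import os


class IncludeError(Exception):
    pass


class IncludeStack:
    """Track open zone files to reject cycles and overly deep nesting."""

    def __init__(self, allow_includes: bool = True, max_depth: int = 10):
        self.allow_includes = allow_includes
        self.max_depth = max_depth      # counts $INCLUDEd files only
        self.paths = []                 # canonical paths, outermost first

    def push(self, path: str) -> None:
        """Enter a file; the first push is the top-level zone file."""
        real = os.path.realpath(path)   # so aliases of one file compare equal
        if self.paths:                  # this is an $INCLUDE
            if not self.allow_includes:
                raise IncludeError("$INCLUDE is disabled")
            if len(self.paths) > self.max_depth:
                raise IncludeError("too many nested $INCLUDEs")
            if real in self.paths:
                raise IncludeError(f"cyclic $INCLUDE of {path}")
        self.paths.append(real)

    def pop(self) -> None:
        """Leave the current file when its end is reached."""
        self.paths.pop()
```

Only files that are currently open count as a cycle, so including the same file twice in sequence remains allowed.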

* Default values for TTLs can be quite complicated.

  A [commit to ldns](https://github.com/NLnetLabs/ldns/commit/cb101c9) by
  @wtoorop nicely sums it up in code.

  RFC 1035 section 5.1:
  > Omitted class and TTL values are default to the last explicitly stated
  > values.

  This behavior is updated by RFC 2308 section 4:
  > All resource records appearing after the directive, and which do not
  > explicitly include a TTL value, have their TTL set to the TTL given
  > in the $TTL directive.  SIG records without a explicit TTL get their
  > TTL from the "original TTL" of the SIG record [RFC 2065 Section 4.5].

  The TTL rules for `SIG` RRs stated in RFC 2065 Section 4.5:
  > If the original TTL, which applies to the type signed, is the same as
  > the TTL of the SIG RR itself, it may be omitted.  The date field
  > which follows it is larger than the maximum possible TTL so there is
  > no ambiguity.

  The same applies to `RRSIG` RRs, although not stated as explicitly
  in RFC 4034 Section 3:
  > The TTL value of an RRSIG RR MUST match the TTL value of the RRset it
  > covers.  This is an exception to the [RFC2181] rules for TTL values
  > of individual RRs within a RRset: individual RRSIG RRs with the same
  > owner name will have different TTL values if the RRsets they cover
  > have different TTL values.

  Logic spanning RRs must not be handled during deserialization. The order in
  which RRs appear in the zone file is not relevant and keeping a possibly
  infinite backlog of RRs to handle it "automatically" is inefficient. As
  the name server retains RRs in a database already it seems most elegant to
  signal the TTL value was omitted and a default was used so that it may be
  updated in some post processing step.

  [RFC 2181 Section 8][rfc2181#8] contains additional notes on the maximum
  value for TTLs. During deserialization, any value exceeding 2147483647 is
  considered an error in primary mode, or a warning in secondary mode.
  [RFC 8767 Section 4][rfc8767#4] updates the text, but that update does not
  change handling during deserialization.

  [RFC 2181 Section 5.2][rfc2181#5.2] states the TTLs of all RRs in an RRSet
  must be the same. As with default values for `SIG` and `RRSIG` RRs, this
  must NOT be handled during deserialization. Presumably, the application
  should transparently fix TTLs (NLnetLabs/nsd#178).
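
The idea of signaling that a default was used, instead of fixing TTLs while parsing, could be sketched as below. `TTLState` and its methods are hypothetical, and the precedence shown ($TTL first, then the last explicit TTL, then an application-supplied default) is one plausible reading of RFC 1035 and RFC 2308 rather than a statement of what simdzone or ldns do.

```python
MAX_TTL = 2147483647  # 2**31 - 1, see RFC 2181 Section 8


class TTLState:
    """Pick a TTL per RR and report whether it was defaulted."""

    def __init__(self, default_ttl: int, primary: bool = True):
        self.default_ttl = default_ttl   # supplied by the application
        self.last_ttl = None             # last explicit TTL (RFC 1035)
        self.dollar_ttl = None           # most recent $TTL (RFC 2308)
        self.primary = primary
        self.warnings = []

    def check(self, ttl: int) -> int:
        """Error in primary mode, warning in secondary mode."""
        if ttl > MAX_TTL:
            if self.primary:
                raise ValueError(f"TTL {ttl} exceeds {MAX_TTL}")
            self.warnings.append(f"TTL {ttl} exceeds {MAX_TTL}")
        return ttl

    def set_dollar_ttl(self, ttl: int) -> None:
        self.dollar_ttl = self.check(ttl)

    def resolve(self, ttl):
        """Return (ttl, defaulted) for an RR; ttl is None when omitted."""
        if ttl is not None:
            self.last_ttl = self.check(ttl)
            return ttl, False
        if self.dollar_ttl is not None:
            return self.dollar_ttl, True
        if self.last_ttl is not None:
            return self.last_ttl, True
        return self.default_ttl, True
```

The `defaulted` flag is what a post-processing step would use to correct `SIG`/`RRSIG` TTLs or to align TTLs within an RRset once all records are in the database.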

* Do NOT allow for quoted labels in domain names.
  [RFC 1035 Section 5][rfc1035#5] states:
  > The labels in the domain name are expressed as character strings and
  > separated by dots.

  [RFC 1035 Section 5][rfc1035#5] states:
  > \<character-string\> is expressed in one or two ways: as a contiguous set
  > of characters without interior spaces, or as string beginning with a " and
  > ending with a ".

  However, quoted labels in domain names are very uncommon and implementations
  handle quoted names both in OWNER and RDATA very differently. The Flex+Bison
  based parser previously used in NSD was the only parser that got it right.

  * BIND
    * owner: yes, interpreted as quoted
      ```
      dig @127.0.0.1 A quoted.example.com.
      ```
      ```
      quoted.example.com.  xxx  IN  A  x.x.x.x
      ```
    * rdata: no, syntax error (even with `check-names master ignore;`)
  * Knot
    * owner: no, syntax error
    * rdata: no, syntax error
  * PowerDNS
    * owner: no, not interpreted as quoted
      ```
      pdnsutil list-zone example.com.
      ```
      ```
      "quoted".example.com  xxx  IN  A  x.x.x.x
      ```
    * rdata: no, not interpreted as quoted
      ```
      dig @127.0.0.1 NS example.com.
      ```
      ```
      example.com.  xxx  IN  NS  \"quoted.example.com.\".example.com.
      ```

  > [libzscanner](https://github.com/CZ-NIC/knot/tree/master/src/libzscanner),
  > the (standalone) zone parser used by Knot, seems the most consistent.

  Drop support for quoted labels or domain names for consistent behavior.

* Should any domain names that are not valid host names as specified by
  RFC 1123 section 2, i.e. use characters not in the preferred naming syntax
  as specified by RFC 1035 section 2.3.1, be accepted? RFC 2181 section 11 is
  very specific on this topic, but it merely states that labels may contain
  characters outside the set on the wire; it does not address what is, or is
  not, allowed in zone files.

  BIND's zone parser throws a syntax error for any name that is not a valid
  hostname unless `check-names master ignore;` is specified. Knot
  additionally accepts `-`, `_` and `/` according to
  [NOTES](https://github.com/CZ-NIC/knot/blob/master/src/libzscanner/NOTES).

  * [RFC 1035 Section 2.3.1][rfc1035#2.3.1]
  * [RFC 1123 Section 2][rfc1123#2]
  * [RFC 2181 Section 11][rfc2181#11]

* RFC 1035 specifies two control directives "$INCLUDE" and "$ORIGIN". RFC 2308
  specifies the "$TTL" directive. BIND additionally implements the "$DATE" and
  "$GENERATE" directives. Since "$" (dollar sign) is not reserved, both
  "$DATE" and "$GENERATE" (and "$TTL" before RFC 2308) are considered valid
  domain names in other implementations (based on what is accepted for domain
  names, see earlier points). It seems "$" is better considered a reserved
  character (possibly limiting its special status to the start of the
  line), to allow for reliable extensibility in the future.

  > BIND seems to already throw an error if "$" is encountered, see
  > `lib/dns/master.c`. Presumably, the "$DATE" directive is written when the
  > zone is written to disk(?) In the code it is referred to as
  > `dump_time` and later used to calculate `ttl_offset`.
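
Treating `$` as reserved only at the start of a line could look like the sketch below. `DIRECTIVES` and `directive` are hypothetical names, only the RFC 1035 and RFC 2308 directives are listed, and case-insensitive matching is an assumption.

```python
DIRECTIVES = {"$ORIGIN", "$INCLUDE", "$TTL"}


def directive(first_field: str, at_line_start: bool):
    """Return the directive name, or None if the field is ordinary data.

    A "$" is special only when the field starts in the first column, so
    "$"-prefixed text elsewhere (e.g. in RDATA) is left alone.
    """
    if not (at_line_start and first_field.startswith("$")):
        return None
    name = first_field.upper()
    if name not in DIRECTIVES:
        # Reject instead of silently parsing "$FOO" as an owner name, so a
        # directive added in the future cannot be misread as a record.
        raise SyntaxError(f"unknown directive: {first_field}")
    return name
```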

* BIND10 had a nice writeup on zone files, kindly provided by Shane Kerr.
  [Zone File Loading Requirements on Wayback Machine](https://web.archive.org/web/20140928215002/http://bind10.isc.org:80/wiki/ZoneLoadingRequirements)

* `TYPE0` is sometimes used for debugging and therefore may occur in type
  bitmaps or as unknown RR type.

* `pdns/master/regression-tests/zones/test.com` contains regression tests
  that may be useful for testing simdzone.

* Some implementations (Knot, possibly PowerDNS) will silently split up
  strings longer than 255 characters. Others (BIND, simdzone) will throw a
  syntax error.
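
The two behaviors for oversized strings can be contrasted in a few lines. `check_string` and its `split` flag are hypothetical, and the limit is applied to octets after unescaping.

```python
from typing import List

MAX_STRING = 255  # a <character-string> carries a one-octet length prefix


def check_string(data: bytes, split: bool = False) -> List[bytes]:
    """Return the <character-string>s to emit for one text field."""
    if len(data) <= MAX_STRING:
        return [data]
    if not split:                    # strict behavior: reject
        raise SyntaxError("character-string exceeds 255 octets")
    # Lenient behavior: silently split into chunks of at most 255 octets.
    return [data[i:i + MAX_STRING] for i in range(0, len(data), MAX_STRING)]
```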

* How do we handle the corner case where the first record does not have a TTL
  when the file does not define a zone? (from @shane-kerr).

  At this point in time, the application provides a default TTL value before
  parsing. Whether that is the right approach is unclear, but it is what NSD
  did before.

* Leading zeroes in integers appear to be allowed judging by the zone file
  generated for the [socket10kxfr][socket10kxfr.pre#L64] test in NSD. BIND
  and Knot parsed it without problems too.

[rfc1034#3.6.1]: https://datatracker.ietf.org/doc/html/rfc1034#section-3.6.1
[rfc1035#5]: https://datatracker.ietf.org/doc/html/rfc1035#section-5
[rfc1035#2.3.1]: https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1
[rfc1123#2]: https://datatracker.ietf.org/doc/html/rfc1123#section-2
[rfc2065#4.5]: https://datatracker.ietf.org/doc/html/rfc2065#section-4.5
[rfc2181#5.2]: https://datatracker.ietf.org/doc/html/rfc2181#section-5.2
[rfc2181#8]: https://datatracker.ietf.org/doc/html/rfc2181#section-8
[rfc2181#11]: https://datatracker.ietf.org/doc/html/rfc2181#section-11
[rfc2308#4]: https://datatracker.ietf.org/doc/html/rfc2308#section-4
[rfc3597#5]: https://datatracker.ietf.org/doc/html/rfc3597#section-5
[rfc8767#4]: https://www.rfc-editor.org/rfc/rfc8767#section-4
[rfc9460#2.1]: https://datatracker.ietf.org/doc/html/rfc9460#section-2.1

[socket10kxfr.pre#L64]: https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64