# Zone files

Zone files are text files that contain resource records (RRs) in text form.
Zones can be defined by expressing them in the form of a list of RRs.

Zone files were originally specified in RFC 1035 Section 5, but the DNS
has seen many additions since and the specification is rather ambiguous.
Consequently, various name servers implement slightly different dialects. This
document aims to clarify the format by listing (some of) the relevant
specifications and then explaining why certain design decisions were
made in simdzone.

* [RFC 1034 Section 3.6.1][rfc1034#3.6.1]
* [RFC 1035 Section 5][rfc1035#5]
* [RFC 2065 Section 4.5][rfc2065#4.5]
* [RFC 2181 Section 8][rfc2181#8]
* [RFC 2308 Section 4][rfc2308#4]
* [RFC 3597 Section 5][rfc3597#5]
* [RFC 9460 Section 2.1][rfc9460#2.1]


## Clarification (work-in-progress)

> NOTE: BIND behavior is more-or-less considered the de facto standard.

Historically, master files were edited by hand, which is reflected in the
syntax. Consider the format a tabular serialization format with provisions
for convenient editing. For example, the owner, class and TTL fields may be
omitted (provided the line starts with \<blank\> for the owner) and $INCLUDE
directives can be used for templating.

The format is NOT context-free. The field following the owner (if specified)
may represent either a type, class or TTL, and a symbolic constant, e.g. A
or NS, may have a different meaning if specified as an RDATA field.

The DNS is intentionally extensible. The specification is not explicit about
how that affects syntax, but it explains why no specific notation for
data-types can be enforced by RFC 1035. To make it easier for data-types to
be added at a later stage the syntax cannot enforce a certain notation (or
the scanner would need to be revised).
Consequently, the scanner only
identifies structural characters and items (or fields), where an item is
expressed as either a contiguous set of characters without interior spaces
or a quoted string.

The format allows for including structural characters in fields by means of
escaping the actual character or enclosing the field in quotes. The example
provided by the specification is the use of ASCII dots in domain name labels.
The dot is normally a label separator, replaced by the length of the label
on the wire. If a domain name includes an actual ASCII dot, the character
must be escaped in the textual representation (`\X` or `\DDD`).

Note that ASCII dot characters strictly speaking do not have to be escaped
in a quoted string. RFC 1035 clearly states labels in domain names are
expressed as character strings. However, behavior differs across
implementations, so support for quoted labels is best dropped (see below).

RFC 1035 states both \<contiguous\> and \<quoted\> are \<character-string\>.
Meaning, items can be either \<contiguous\> or \<quoted\>. Whether a specific
item is interpreted as a \<character-string\> depends on the type of value for
that item. E.g., TTLs are decimal integers and therefore cannot be expressed
as \<quoted\>, because a TTL is not a \<character-string\>. Similarly, base64
sequences are encoded binary blobs, not \<character-string\>s, and therefore
cannot be expressed as such. Escape sequences are valid only in
\<character-string\>s.

* Mnemonics are NOT character strings.

  > BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
  > RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
  > header fields all result in a syntax error too if quoted.

* Some integer fields allow for using mnemonics too. E.g., the algorithm
  field in RRSIG records.

* RFC 1035 states: A freestanding @ denotes the current origin.
  There has been discussion about the locations in which @ is interpreted as
  the origin, e.g. how should a freestanding @ be interpreted in the RDATA
  section of a TXT RR? Note that there is no mention of text expansion in the
  original text. A freestanding @ denotes the origin. As such, it stands to
  reason that its use is limited to locations where domain names are
  expressed, which also happens to be the most practical way to implement the
  functionality.

  > This also seems to be the behavior that other name servers implement (at
  > least BIND and PowerDNS). The BIND manual states: "When used in the label
  > (or name) field, the asperand or at-sign (@) symbol represents the current
  > origin. At the start of the zone file, it is the \<zone\_name\>, followed
  > by a trailing dot (.)."

  > It may also make sense to interpret a quoted freestanding @ differently
  > than a non-quoted one. At least, BIND throws an error if a quoted
  > freestanding @ is encountered in the RDATA sections for CNAME and NS RRs.
  > However, a quoted freestanding @ is accepted and interpreted as origin
  > if specified as the OWNER.

  > Found mentions of what happens when a zone that uses freestanding @ in
  > RDATA is written to disk. Of course, this particular scenario rarely occurs
  > as it does not need to be written to disk when loaded on a primary and no
  > file exists if received over AXFR/IXFR. However, it may make sense to
  > implement optimistic compression of this form, and make it configurable.

* Class and type names are mutually exclusive in practice.
  RFC 1035 states: The RR begins with optional TTL and class fields, ...
  Therefore, if a type name matches a class name, the parser cannot distinguish
  between the two in text representation and must resort to generic notation
  (RFC 3597) or, depending on the RDATA format for the record type, a
  look-ahead may be sufficient.
  Realistically, it is highly likely that because
  of this, no type name will ever match a class name.

  > This means both can reside in the same table.

* The encoding is not limited to ASCII. Some characters have special meaning,
  but users are technically allowed to put in non-printable octets outside the
  ASCII range without custom encoding. Of course, this rarely occurs in
  practice and users are encouraged to use the \DDD encoding for "special"
  characters.

* Parentheses may not be nested.

* $ORIGIN must be an absolute domain.

* Escape sequences must NOT be unescaped in the scanner as is common with
  programming languages like C that have a preprocessor. Instead, the
  original text is necessary in the parsing stage to distinguish label
  separators (dots) from dots that are part of a label.

* RFC 1035 specifies that the current origin should be restored after an
  $INCLUDE, but it is silent on whether the current domain name should also be
  restored. BIND 9 restores both of them. This could be construed as a
  deviation from RFC 1035, a feature, or both.

* RFC 1035 states: and text literals can contain CRLF within the text.
  BIND, however, does not allow newlines in text (escaped or not). For
  performance reasons, we may adopt the same behavior as that would remove
  the need to keep track of possibly embedded newlines.

* From: http://www.zytrax.com/books/dns/ch8/include.html (mentioned in chat)
  > Source states: The RFC is silent on the topic of embedded `$INCLUDE`s in
  > `$INCLUDE`d files - BIND 9 documentation is similarly silent. Assume they
  > are not permitted.

  All implementations, including BIND, allow for embedded `$INCLUDE`s.
  The current implementation is such that (embedded) includes are allowed by
  default. However, `$INCLUDE` directives can be disabled, which is useful
  when parsing from an untrusted source. There is also protection against
  cyclic includes.

  > There is no limit on the number of embedded includes (yet). NSD limits
  > the number of includes to 10 by default (compile option). For security, it
  > must be possible to set a hard limit.

* Default values for TTLs can be quite complicated.

  A [commit to ldns](https://github.com/NLnetLabs/ldns/commit/cb101c9) by
  @wtoorop nicely sums it up in code.

  RFC 1035 section 5.1:
  > Omitted class and TTL values are default to the last explicitly stated
  > values.

  This behavior is updated by RFC 2308 section 4:
  > All resource records appearing after the directive, and which do not
  > explicitly include a TTL value, have their TTL set to the TTL given
  > in the $TTL directive. SIG records without a explicit TTL get their
  > TTL from the "original TTL" of the SIG record [RFC 2065 Section 4.5].

  The TTL rules for `SIG` RRs stated in RFC 2065 Section 4.5:
  > If the original TTL, which applies to the type signed, is the same as
  > the TTL of the SIG RR itself, it may be omitted. The date field
  > which follows it is larger than the maximum possible TTL so there is
  > no ambiguity.

  The same applies to `RRSIG` RRs, although not stated as explicitly
  in RFC 4034 Section 3:
  > The TTL value of an RRSIG RR MUST match the TTL value of the RRset it
  > covers. This is an exception to the [RFC2181] rules for TTL values
  > of individual RRs within a RRset: individual RRSIG RRs with the same
  > owner name will have different TTL values if the RRsets they cover
  > have different TTL values.

  Logic spanning RRs must not be handled during deserialization. The order in
  which RRs appear in the zone file is not relevant and keeping a possibly
  infinite backlog of RRs to handle it "automatically" is inefficient.
  As the name server already retains RRs in a database, it seems most elegant
  to signal that the TTL value was omitted and a default was used, so that it
  may be updated in some post-processing step.

  [RFC 2181 Section 8][rfc2181#8] contains additional notes on the maximum
  value for TTLs. During deserialization, any value exceeding 2147483647 is
  considered an error in primary mode, or a warning in secondary mode.
  [RFC 8767 Section 4][rfc8767#4] updates the text, but the update does not
  change handling during deserialization.

  [RFC 2181 Section 5.2][rfc2181#5.2] states the TTLs of all RRs in an RRSet
  must be the same. As with default values for `SIG` and `RRSIG` RRs, this
  must NOT be handled during deserialization. Presumably, the application
  should transparently fix TTLs (NLnetLabs/nsd#178).

* Do NOT allow for quoted labels in domain names.
  [RFC 1035 Section 5][rfc1035#5] states:
  > The labels in the domain name are expressed as character strings and
  > separated by dots.

  [RFC 1035 Section 5][rfc1035#5] also states:
  > \<character-string\> is expressed in one or two ways: as a contiguous set
  > of characters without interior spaces, or as string beginning with a " and
  > ending with a ".

  However, quoted labels in domain names are very uncommon and implementations
  handle quoted names both in OWNER and RDATA very differently. The Flex+Bison
  based parser used in NSD before was the only parser that got it right.

  * BIND
    * owner: yes, interpreted as quoted
      ```
      dig @127.0.0.1 A quoted.example.com.
      ```
      ```
      quoted.example.com. xxx IN A x.x.x.x
      ```
    * rdata: no, syntax error (even with `check-names master ignore;`)
  * Knot
    * owner: no, syntax error
    * rdata: no, syntax error
  * PowerDNS
    * owner: no, not interpreted as quoted
      ```
      pdnsutil list-zone example.com.
      ```
      ```
      "quoted".example.com xxx IN A x.x.x.x
      ```
    * rdata: no, not interpreted as quoted
      ```
      dig @127.0.0.1 NS example.com.
      ```
      ```
      example.com. xxx IN NS \"quoted.example.com.\".example.com.
      ```

  > [libzscanner](https://github.com/CZ-NIC/knot/tree/master/src/libzscanner),
  > the (standalone) zone parser used by Knot seems most consistent.

  Drop support for quoted labels or domain names for consistent behavior.

* Should any domain names that are not valid host names as specified by
  RFC 1123 section 2, i.e. use characters not in the preferred naming syntax
  as specified by RFC 1035 section 2.3.1, be accepted? RFC 2181 section 11 is
  very specific on this topic, but it merely states that labels may contain
  characters outside the set on the wire; it does not address what is, or is
  not, allowed in zone files.

  BIND's zone parser throws a syntax error for any name that is not a valid
  hostname unless `check-names master ignore;` is specified. Knot
  additionally accepts `-`, `_` and `/` according to
  [NOTES](https://github.com/CZ-NIC/knot/blob/master/src/libzscanner/NOTES).

  * [RFC 1035 Section 2.3.1][rfc1035#2.3.1]
  * [RFC 1123 Section 2][rfc1123#2]
  * [RFC 2181 Section 11][rfc2181#11]

* RFC 1035 specifies two control directives "$INCLUDE" and "$ORIGIN". RFC 2308
  specifies the "$TTL" directive. BIND additionally implements the "$DATE" and
  "$GENERATE" directives. Since "$" (dollar sign) is not reserved, both
  "$DATE" and "$GENERATE" (and "$TTL" before RFC 2308) are considered valid
  domain names in other implementations (based on what is accepted for domain
  names, see earlier points). It seems "$" is better considered a reserved
  character (possibly limiting its special status to the start of the
  line), to allow for reliable extensibility in the future.

  > BIND seems to already throw an error if "$" is encountered, see
  > `lib/dns/master.c`. Presumably, the "$DATE" directive is written when the
  > zone is written to disk(?) In the code it is referred to as
  > __dump_time__ and later used to calculate __ttl_offset__.

* BIND10 had a nice writeup on zone files, kindly provided by Shane Kerr.
  [Zone File Loading Requirements on Wayback Machine](https://web.archive.org/web/20140928215002/http://bind10.isc.org:80/wiki/ZoneLoadingRequirements)

* `TYPE0` is sometimes used for debugging and therefore may occur in type
  bitmaps or as unknown RR type.

* `pdns/master/regression-tests/zones/test.com` contains regression tests
  that may be useful for testing simdzone.

* Some implementations (Knot, possibly PowerDNS) will silently split up
  strings longer than 255 characters. Others (BIND, simdzone) will throw a
  syntax error.

* How do we handle the corner case where the first record does not have a TTL
  when the file does not define a zone? (from @shane-kerr).

  At this point in time, the application provides a default TTL value before
  parsing. Whether that is the right approach is unclear, but it is what NSD
  did before.

* Leading zeroes in integers appear to be allowed judging by the zone file
  generated for the [socket10kxfr][socket10kxfr.pre#L64] test in NSD. BIND
  and Knot parsed it without problems too.

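
The point made earlier, that escape sequences must not be resolved in the
scanner, can be illustrated with a minimal Python sketch. It is not taken from
simdzone: `split_labels` is a hypothetical helper that receives the raw octets
of a name exactly as they appear in the zone file and resolves `\X` and `\DDD`
only while splitting on unescaped dots. It assumes the name is already
available as a single field, enforces the 63-octet label limit from RFC 1035,
and does not distinguish relative from absolute names.

```python
from typing import List


def split_labels(name: bytes) -> List[bytes]:
    """Split the raw text of a domain name into unescaped labels."""
    if name == b".":
        return []  # the root name has no labels
    labels: List[bytes] = []
    label = bytearray()
    i = 0
    while i < len(name):
        octet = name[i]
        if octet == 0x5C:  # backslash starts an escape sequence
            rest = name[i + 1:i + 4]
            if rest[:1].isdigit():
                # \DDD: exactly three decimal digits with a value of 0-255
                if len(rest) != 3 or not rest.isdigit() or int(rest) > 255:
                    raise ValueError("bad \\DDD escape")
                label.append(int(rest))
                i += 4
            elif rest:
                label.append(rest[0])  # \X: X is taken literally
                i += 2
            else:
                raise ValueError("dangling backslash")
        elif octet == 0x2E:  # unescaped dot: label separator
            if not label:
                raise ValueError("empty label")
            labels.append(bytes(label))
            label.clear()
            i += 1
        else:
            label.append(octet)
            i += 1
        if len(label) > 63:
            raise ValueError("label exceeds 63 octets")
    if label:
        labels.append(bytes(label))
    return labels


# Both spellings keep the dot inside the first label instead of splitting on it.
print(split_labels(rb"a\.b.example.com."))
print(split_labels(rb"a\046b.example.com."))
```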

[rfc1034#3.6.1]: https://datatracker.ietf.org/doc/html/rfc1034#section-3.6.1
[rfc1035#5]: https://datatracker.ietf.org/doc/html/rfc1035#section-5
[rfc1035#2.3.1]: https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1
[rfc1123#2]: https://datatracker.ietf.org/doc/html/rfc1123#section-2
[rfc2065#4.5]: https://datatracker.ietf.org/doc/html/rfc2065#section-4.5
[rfc2181#5.2]: https://datatracker.ietf.org/doc/html/rfc2181#section-5.2
[rfc2181#8]: https://datatracker.ietf.org/doc/html/rfc2181#section-8
[rfc2181#11]: https://datatracker.ietf.org/doc/html/rfc2181#section-11
[rfc2308#4]: https://datatracker.ietf.org/doc/html/rfc2308#section-4
[rfc3597#5]: https://datatracker.ietf.org/doc/html/rfc3597#section-5
[rfc8767#4]: https://www.rfc-editor.org/rfc/rfc8767#section-4
[rfc9460#2.1]: https://datatracker.ietf.org/doc/html/rfc9460#section-2.1

[socket10kxfr.pre#L64]: https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64
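
The TTL defaulting rules discussed in the list above (an explicit TTL, a
`$TTL` directive per RFC 2308, the last explicitly stated TTL per RFC 1035,
and finally a default supplied by the application) can be sketched as follows.
This is a hedged Python illustration, not simdzone's implementation:
`TTLState` and `resolve_ttl` are hypothetical names, giving `$TTL` precedence
over the last explicit TTL is one reading of RFC 2308, and out-of-range values
are always rejected here rather than downgraded to a warning in secondary
mode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_TTL = 2147483647  # RFC 2181 Section 8: 2^31 - 1


@dataclass
class TTLState:
    default_ttl: int                  # supplied by the application before parsing
    dollar_ttl: Optional[int] = None  # set when a $TTL directive is seen
    last_ttl: Optional[int] = None    # last explicitly stated TTL


def resolve_ttl(state: TTLState, explicit: Optional[int]) -> Tuple[int, bool]:
    """Return (ttl, defaulted) for one RR; `explicit` is None if omitted."""
    if explicit is not None:
        if not 0 <= explicit <= MAX_TTL:
            raise ValueError("TTL out of range")
        state.last_ttl = explicit
        return explicit, False
    if state.dollar_ttl is not None:   # RFC 2308 Section 4
        return state.dollar_ttl, True
    if state.last_ttl is not None:     # RFC 1035 Section 5.1
        return state.last_ttl, True
    return state.default_ttl, True     # nothing stated in the file so far


state = TTLState(default_ttl=3600)
print(resolve_ttl(state, None))  # no TTL seen yet: application default
print(resolve_ttl(state, 300))   # explicit value, remembered for later RRs
print(resolve_ttl(state, None))  # falls back to the last explicit value
```

The boolean in the result marks a TTL that was filled in rather than stated,
which leaves room for a later step to adjust it, for example for `RRSIG` RRs
or to make the TTLs within an RRSet consistent.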