.. Copyright (C) Internet Systems Consortium, Inc. ("ISC")
..
.. SPDX-License-Identifier: MPL-2.0
..
.. This Source Code Form is subject to the terms of the Mozilla Public
.. License, v. 2.0.  If a copy of the MPL was not distributed with this
.. file, you can obtain one at https://mozilla.org/MPL/2.0/.
..
.. See the COPYRIGHT file distributed with this work for additional
.. information regarding copyright ownership.

.. _dnssec_troubleshooting:

Basic DNSSEC Troubleshooting
----------------------------

In this chapter, we cover some basic troubleshooting
techniques, some common DNSSEC symptoms, and their causes and solutions. This
is not a comprehensive "how to troubleshoot any DNS or DNSSEC problem"
guide, because that could easily be an entire book by itself.

.. _troubleshooting_query_path:

Query Path
~~~~~~~~~~

The first step in troubleshooting any DNS or DNSSEC issue is to
determine the exact query path, because knowing which servers handled
the query makes it possible to identify the origin of the problem.

End clients, such as laptop computers or mobile phones, are configured
to talk to a recursive name server, and the recursive name server may in
turn forward requests on to other recursive name servers before arriving at the
authoritative name server. The giveaway is the presence of the
Authoritative Answer (``aa``) flag in a query response: when present, we know we are talking
to the authoritative server; when missing, we are talking to a recursive
server.
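
The flag check described above is easy to script. The sketch below is
illustrative only and is not part of BIND: the helper names
``parse_flags`` and ``describe_responder`` are hypothetical, and it
assumes the usual ``;; flags: ...;`` header line printed by :iscman:`dig`.

```python
import re


def parse_flags(dig_output):
    """Return the set of header flags found on dig's ';; flags:' line."""
    # Anchor on ';; flags:' so the EDNS line ('; EDNS: ... flags:;') is skipped.
    match = re.search(r"^\s*;; flags:([^;]*);", dig_output, re.MULTILINE)
    if match is None:
        raise ValueError("no ';; flags:' line found in dig output")
    return set(match.group(1).split())


def describe_responder(flags):
    """Apply the rule of thumb from the text: 'aa' means authoritative."""
    if "aa" in flags:
        return "authoritative"
    if "ra" in flags:
        return "recursive"
    return "unknown"


if __name__ == "__main__":
    header = ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
    print(describe_responder(parse_flags(header)))
```

Given the header line in the ``__main__`` block, which carries ``ra``
but not ``aa``, the sketch reports a recursive server.
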
The example below shows an answer to a query for
``www.example.com`` without the Authoritative Answer flag:

::

   $ dig @10.53.0.3 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62714
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: c823fe302625db5b010000005e722b504d81bb01c2227259 (good)
   ;; QUESTION SECTION:
   ;www.example.com.       IN  A

   ;; ANSWER SECTION:
   www.example.com.    60  IN  A   10.1.0.1

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 14:08:16 GMT 2020
   ;; MSG SIZE  rcvd: 88

Not only do we not see the ``aa`` flag, we see an ``ra``
flag, which indicates Recursion Available. This indicates that the
server we are talking to (10.53.0.3 in this example) is a recursive name
server: although we were able to get an answer for
``www.example.com``, we know that the answer came from somewhere else.

If we query the authoritative server directly, we get:

::

   $ dig @10.53.0.2 www.example.com A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.2 www.example.com a
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39542
   ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
   ;; WARNING: recursion requested but not available
   ...

The ``aa`` flag tells us that we are now talking to the
authoritative name server for ``www.example.com``, and that this is not a
cached answer it obtained from some other name server; it served this
answer to us right from its own database.
In fact,
the Recursion Available (``ra``) flag is not present, which means this
name server is not configured to perform recursion (at least not for
this client), so it could not have queried another name server to get
cached results.

.. _troubleshooting_visible_symptoms:

Visible DNSSEC Validation Symptoms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After determining the query path, it is necessary to
determine whether the problem is actually related to DNSSEC
validation. You can use the :option:`dig +cd` flag to disable
validation, as described in
:ref:`how_do_i_know_validation_problem`.

When there is indeed a DNSSEC validation problem, the visible symptoms,
unfortunately, are very limited. With DNSSEC validation enabled, if a
DNS response is not fully validated, it results in a generic
SERVFAIL message, as shown below when querying against a recursive name
server at 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28947
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: d1301968aca086ad010000005e723a7113603c01916d136b (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

   ;; Query time: 3 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Wed Mar 18 15:12:49 GMT 2020
   ;; MSG SIZE  rcvd: 72

With :iscman:`delv`, a "resolution failed" message is output instead:

::

   $ delv @10.53.0.3 www.example.org. A +rtrace
   ;; fetch: www.example.org/A
   ;; resolution failed: SERVFAIL

BIND 9 logging features may be useful when trying to identify
DNSSEC errors.

.. _troubleshooting_logging:

Basic Logging
~~~~~~~~~~~~~

DNSSEC validation error messages show up in :any:`syslog` as a
query error by default. Here is an example of what it may look like:

::

   validating www.example.org/A: no valid signature found
   RRSIG failed to verify resolving 'www.example.org/A/IN': 10.53.0.2#53

Usually, this level of error logging is sufficient.
Debug logging, described in
:ref:`troubleshooting_logging_debug`, provides more detail about
why DNSSEC validation may have failed.

.. _troubleshooting_logging_debug:

BIND DNSSEC Debug Logging
~~~~~~~~~~~~~~~~~~~~~~~~~

A word of caution: before you enable debug logging, be aware that this
may dramatically increase the load on your name servers. Enabling debug
logging is thus not recommended for production servers.

With that said, sometimes it may become necessary to temporarily enable
BIND debug logging to see more details of how and whether DNSSEC is
validating. DNSSEC-related messages are not recorded in :any:`syslog` by default,
even if query logging is enabled; only DNSSEC errors show up in :any:`syslog`.

The example below shows how to enable debug level 3 (to see full DNSSEC
validation messages) in BIND 9 and have it sent to :any:`syslog`:

::

   logging {
       channel dnssec_log {
           syslog daemon;
           severity debug 3;
           print-category yes;
       };
       category dnssec { dnssec_log; };
   };

The example below shows how to log DNSSEC messages to their own file
(here, ``/var/log/dnssec.log``):

::

   logging {
       channel dnssec_log {
           file "/var/log/dnssec.log";
           severity debug 3;
       };
       category dnssec { dnssec_log; };
   };

After turning on debug logging and restarting BIND, a large
number of log messages appear in
:any:`syslog`.
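
Because debug output is verbose, it can help to condense it to one
status per record set. The following is a minimal sketch, not part of
BIND: it assumes validator messages have the form
``validating NAME/TYPE: text`` and treats only the phrases quoted in this
chapter (``marking as secure``, ``no valid signature found``,
``verify failed``) as decisive. The function name ``summarize`` is
hypothetical.

```python
import re

# Matches e.g. "validating ftp.isc.org/A: marking as secure, ..." anywhere
# in a line, so a syslog prefix such as "named[1234]: " is tolerated.
LINE_RE = re.compile(r"validating (?P<name>\S+)/(?P<rtype>[A-Z0-9]+): (?P<msg>.*)$")


def summarize(lines):
    """Map each 'NAME/TYPE' to 'secure', 'failed', or 'pending'.

    The last decisive message seen for a record set wins.
    """
    status = {}
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a validator message
        key = "{}/{}".format(match.group("name"), match.group("rtype"))
        msg = match.group("msg")
        if msg.startswith("marking as secure"):
            status[key] = "secure"
        elif "no valid signature found" in msg or "verify failed" in msg:
            status[key] = "failed"
        else:
            status.setdefault(key, "pending")
    return status


if __name__ == "__main__":
    sample = [
        "validating ./DNSKEY: starting",
        "validating ./DNSKEY: marking as secure (DS)",
        "validating www.example.org/A: no valid signature found",
    ]
    for key, state in summarize(sample).items():
        print(key, state)
```

Any record set left as ``pending`` or ``failed`` is a reasonable place to
start reading the full log.
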
The example below shows the log messages as a result of
successfully looking up and validating the domain name ``ftp.isc.org``.

::

   validating ./NS: starting
   validating ./NS: attempting positive response validation
   validating ./DNSKEY: starting
   validating ./DNSKEY: attempting positive response validation
   validating ./DNSKEY: verify rdataset (keyid=20326): success
   validating ./DNSKEY: marking as secure (DS)
   validating ./NS: in validator_callback_dnskey
   validating ./NS: keyset with trust secure
   validating ./NS: resuming validate
   validating ./NS: verify rdataset (keyid=33853): success
   validating ./NS: marking as secure, noqname proof not needed
   validating ftp.isc.org/A: starting
   validating ftp.isc.org/A: attempting positive response validation
   validating isc.org/DNSKEY: starting
   validating isc.org/DNSKEY: attempting positive response validation
   validating isc.org/DS: starting
   validating isc.org/DS: attempting positive response validation
   validating org/DNSKEY: starting
   validating org/DNSKEY: attempting positive response validation
   validating org/DS: starting
   validating org/DS: attempting positive response validation
   validating org/DS: keyset with trust secure
   validating org/DS: verify rdataset (keyid=33853): success
   validating org/DS: marking as secure, noqname proof not needed
   validating org/DNSKEY: in validator_callback_ds
   validating org/DNSKEY: dsset with trust secure
   validating org/DNSKEY: verify rdataset (keyid=9795): success
   validating org/DNSKEY: marking as secure (DS)
   validating isc.org/DS: in fetch_callback_dnskey
   validating isc.org/DS: keyset with trust secure
   validating isc.org/DS: resuming validate
   validating isc.org/DS: verify rdataset (keyid=33209): success
   validating isc.org/DS: marking as secure, noqname proof not needed
   validating isc.org/DNSKEY: in validator_callback_ds
   validating isc.org/DNSKEY: dsset with trust secure
   validating isc.org/DNSKEY: verify rdataset (keyid=7250): success
   validating isc.org/DNSKEY: marking as secure (DS)
   validating ftp.isc.org/A: in fetch_callback_dnskey
   validating ftp.isc.org/A: keyset with trust secure
   validating ftp.isc.org/A: resuming validate
   validating ftp.isc.org/A: verify rdataset (keyid=27566): success
   validating ftp.isc.org/A: marking as secure, noqname proof not needed

Note that these log messages indicate that the chain of trust has been
established and ``ftp.isc.org`` has been successfully validated.

If validation had failed, you would see log messages indicating errors.
We cover some of the most common validation problems in the next section.

.. _troubleshooting_common_problems:

Common Problems
~~~~~~~~~~~~~~~

.. _troubleshooting_security_lameness:

Security Lameness
^^^^^^^^^^^^^^^^^

Similar to lame delegation in traditional DNS, security lameness refers to the
condition in which the parent zone holds a set of DS records that point to
something that does not exist in the child zone. As a result,
the entire child zone may "disappear," having been marked as bogus by
validating resolvers.

Below is an example attempting to resolve the A record for a test domain
name ``www.example.net``. From the user's perspective, as described in
:ref:`how_do_i_know_validation_problem`, only a SERVFAIL
message is returned. On the validating resolver, we see the
following messages in :any:`syslog`:

::

   named[126063]: validating example.net/DNSKEY: no valid signature found (DS)
   named[126063]: no valid RRSIG resolving 'example.net/DNSKEY/IN': 10.53.0.2#53
   named[126063]: broken trust chain resolving 'www.example.net/A/IN': 10.53.0.2#53

This gives us a hint that it is a broken trust chain issue.
Let's take a
look at the DS records that are published for the zone (with the keys
shortened for ease of display):

::

   $ dig @10.53.0.3 example.net. DS

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DS
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59602
   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags:; udp: 4096
   ; COOKIE: 7026d8f7c6e77e2a010000005e735d7c9d038d061b2d24da (good)
   ;; QUESTION SECTION:
   ;example.net.           IN  DS

   ;; ANSWER SECTION:
   example.net.        256 IN  DS  14956 8 2 9F3CACD...D3E3A396

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 11:54:36 GMT 2020
   ;; MSG SIZE  rcvd: 116

Next, we query for the DNSKEY and RRSIG of ``example.net`` to see if
there's anything wrong. Since we are having trouble validating, we
can use the :option:`dig +cd` option to temporarily disable checking and return
results, even though they do not pass the validation tests. The
:option:`dig +multiline` option causes :iscman:`dig` to print the type, algorithm type,
and key id for DNSKEY records. Again,
some long strings are shortened for ease of display:

::

   $ dig @10.53.0.3 example.net. DNSKEY +dnssec +cd +multiline

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DNSKEY +cd +multiline +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42980
   ;; flags: qr rd ra cd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: 4b5e7c88b3680c35010000005e73722057551f9f8be1990e (good)
   ;; QUESTION SECTION:
   ;example.net.       IN DNSKEY

   ;; ANSWER SECTION:
   example.net.        287 IN DNSKEY 256 3 8 (
                   AwEAAbu3NX...ADU/D7xjFFDu+8WRIn
                   ) ; ZSK; alg = RSASHA256 ; key id = 35328
   example.net.        287 IN DNSKEY 257 3 8 (
                   AwEAAbKtU1...PPP4aQZTybk75ZW+uL
                   6OJMAF63NO0s1nAZM2EWAVasbnn/X+J4N2rLuhk=
                   ) ; KSK; alg = RSASHA256 ; key id = 27247
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 27247 example.net.
                   Fz1sjClIoF...YEjzpAWuAj9peQ== )
   example.net.        287 IN RRSIG DNSKEY 8 2 300 (
                   20811123173143 20180101000000 35328 example.net.
                   seKtUeJ4/l...YtDc1rcXTVlWIOw= )

   ;; Query time: 0 msec
   ;; SERVER: 10.53.0.3#53(10.53.0.3)
   ;; WHEN: Thu Mar 19 13:22:40 GMT 2020
   ;; MSG SIZE  rcvd: 962

Here is the problem: the parent zone is telling the world that
``example.net`` is using the key 14956, but the authoritative server
indicates that it is using keys 27247 and 35328. There are several
potential causes for this mismatch: one possibility is that a malicious
attacker has compromised one side and changed the data. A more likely
scenario is that the DNS administrator for the child zone did not upload
the correct key information to the parent zone.

.. _troubleshooting_incorrect_time:

Incorrect Time
^^^^^^^^^^^^^^

In DNSSEC, every record comes with at least one RRSIG, and each RRSIG
contains two timestamps: one indicating when it becomes valid, and
one when it expires. If the validating resolver's current system time does
not fall within the two RRSIG timestamps, error messages
appear in the BIND debug log.

The example below shows a log message when the RRSIG appears to have
expired. This could mean the validating resolver system time is
incorrectly set too far in the future, or the zone administrator has not
kept up with RRSIG maintenance.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=19036): RRSIG has expired

The log below shows that the RRSIG validity period has not yet begun. This could mean
the validating resolver's system time is incorrectly set too far in the past, or
the zone administrator has incorrectly generated signatures for this
domain name.

::

   validating example.com/DNSKEY: verify failed due to bad signature (keyid=4521): RRSIG validity period has not begun

.. _troubleshooting_unable_to_load_keys:

Unable to Load Keys
^^^^^^^^^^^^^^^^^^^

This is a simple yet common issue. If the key files are present but
unreadable by :iscman:`named` for some reason, :any:`syslog` shows clear error
messages, as shown below:

::

   named[32447]: zone example.com/IN (signed): reconfiguring zone keys
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+06817.private: permission denied
   named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+17694.private: permission denied
   named[32447]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:04:36.521

However, if no keys are found, the error is not as obvious.
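
Since both failure modes come down to what is in the key directory, a
quick check of that directory can save time before reading logs. The
sketch below is only an illustration: ``check_zone_keys`` is a
hypothetical helper, it assumes private key files follow the
``K<zone>.+<alg>+<id>.private`` naming seen in the messages above, and
it tests readability for the user running the script, which is not
necessarily the user :iscman:`named` runs as.

```python
import glob
import os


def check_zone_keys(key_dir, zone):
    """Return (status, paths) for the private key files of `zone`.

    status is 'missing' (no key files), 'unreadable' (paths lists the
    files the current user cannot read), or 'ok' (paths lists all keys).
    """
    # Escape the fixed part so odd characters in paths are taken literally.
    prefix = glob.escape(os.path.join(key_dir, "K" + zone.rstrip(".") + ".+"))
    key_files = sorted(glob.glob(prefix + "*.private"))
    if not key_files:
        return "missing", []
    unreadable = [path for path in key_files if not os.access(path, os.R_OK)]
    if unreadable:
        return "unreadable", unreadable
    return "ok", key_files


if __name__ == "__main__":
    # "/etc/bind/keys" is a placeholder; use the zone's actual key directory.
    print(check_zone_keys("/etc/bind/keys", "example.com"))
```

Running the check as the same user as :iscman:`named` gives the most
meaningful result for the ``unreadable`` case.
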
The example below shows
the :any:`syslog` messages after executing ``rndc
reload`` with the key files missing from the key directory:

::

   named[32516]: received control channel command 'reload'
   named[32516]: loading configuration from '/etc/bind/named.conf'
   named[32516]: using default UDP/IPv4 port range: [1024, 65535]
   named[32516]: using default UDP/IPv6 port range: [1024, 65535]
   named[32516]: sizing zone task pool based on 6 zones
   named[32516]: the working directory is not writable
   named[32516]: reloading configuration succeeded
   named[32516]: reloading zones succeeded
   named[32516]: all zones loaded
   named[32516]: running
   named[32516]: zone example.com/IN (signed): reconfiguring zone keys
   named[32516]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:07:09.292

This happens to look exactly the same as if the keys were present and
readable, and appears to indicate that :iscman:`named` loaded the keys and signed the zone. It
even generates the internal (raw) files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed

If :iscman:`named` really loaded the keys and signed the zone, you should see
the following files:

::

   # cd /etc/bind/db
   # ls
   example.com.db  example.com.db.jbk  example.com.db.signed  example.com.db.signed.jnl

So, unless you see the ``*.signed.jnl`` file, your zone has not been
signed.

.. _troubleshooting_invalid_trust_anchors:

Invalid Trust Anchors
^^^^^^^^^^^^^^^^^^^^^

In most cases, you never need to explicitly configure trust
anchors. :iscman:`named` supplies the current root trust anchor and,
with the default setting of :any:`dnssec-validation`, updates it on the
infrequent occasions when it is changed.

However, in some circumstances you may need to explicitly configure
your own trust anchor.
As we saw in the :ref:`trust_anchors_description`
section, whenever a DNSKEY is received by the validating resolver, it is
compared to the list of keys the resolver explicitly trusts to see if
further action is needed. If the two keys match, the validating resolver
stops performing further verification and returns the answer(s) as
validated.

But what if the key file on the validating resolver is misconfigured or
missing? Below we show some examples of log messages when things are not
working properly.

First of all, if the key you copied is malformed, BIND does not even
start and you will likely find this error message in :any:`syslog`:

::

   named[18235]: /etc/bind/named.conf.options:29: bad base64 encoding
   named[18235]: loading configuration: failure

If the key is a valid base64 string but the key algorithm is incorrect,
or if the wrong key is installed, the first thing you will notice is
that virtually all of your DNS lookups result in SERVFAIL, even when
you are looking up domain names that have not been DNSSEC-enabled. The
example below shows a query to the recursive server 10.53.0.3:

::

   $ dig @10.53.0.3 www.example.org. A +dnssec

   ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A +dnssec
   ; (1 server found)
   ;; global options: +cmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29586
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

   ;; OPT PSEUDOSECTION:
   ; EDNS: version: 0, flags: do; udp: 4096
   ; COOKIE: ee078fc321fa1367010000005e73a58bf5f205ca47e04bed (good)
   ;; QUESTION SECTION:
   ;www.example.org.       IN  A

:iscman:`delv` shows a similar result:

::

   $ delv @10.53.0.3 www.example.com. +rtrace
   ;; fetch: www.example.com/A
   ;; resolution failed: SERVFAIL

The next symptom you see is in the DNSSEC log messages:

::

   managed-keys-zone: DNSKEY set for zone '.' could not be verified with current keys
   validating ./DNSKEY: starting
   validating ./DNSKEY: attempting positive response validation
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no DNSKEY matching DS
   validating ./DNSKEY: no valid signature found (DS)

These errors are indications that there are problems with the trust
anchor.

.. _troubleshooting_nta:

Negative Trust Anchors
~~~~~~~~~~~~~~~~~~~~~~

BIND 9.11 introduced Negative Trust Anchors (NTAs) as a means to
*temporarily* disable DNSSEC validation for a zone when you know that
the zone's DNSSEC is misconfigured.

NTAs are added using the :iscman:`rndc` command, e.g.:

::

   $ rndc nta example.com
    Negative trust anchor added: example.com/_default, expires 19-Mar-2020 19:57:42.000


The list of currently configured NTAs can also be examined using
:iscman:`rndc`, e.g.:

::

   $ rndc nta -dump
    example.com/_default: expiry 19-Mar-2020 19:57:42.000


The default lifetime of an NTA is one hour. However, by default BIND
polls the zone every five minutes to see whether it validates
correctly; once it does, the NTA automatically expires. Both the
default lifetime and the polling interval may be configured via
:iscman:`named.conf`, and the lifetime can be overridden on a per-zone basis
using the ``-lifetime duration`` parameter to ``rndc nta``. Both timer
values have a permitted maximum value of one week.

.. _troubleshooting_nsec3:

NSEC3 Troubleshooting
~~~~~~~~~~~~~~~~~~~~~

BIND includes a tool called :iscman:`nsec3hash` that runs through the same
steps as a validating resolver, to generate the correct hashed name
based on NSEC3PARAM parameters. The command takes the following
parameters in order: salt, algorithm, iterations, and domain. For
example, if the salt is 1234567890ABCEDF, the hash algorithm is 1, and
the iteration count is 10, to get the NSEC3-hashed name for
``www.example.com`` we would execute a command like this:

::

   $ nsec3hash 1234567890ABCEDF 1 10 www.example.com
   RN7I9ME6E1I6BDKIP91B9TCE4FHJ7LKF (salt=1234567890ABCEDF, hash=1, iterations=10)

Zero-length salt can be specified as ``-``.

While it is unlikely you would construct a rainbow table of your own
zone data, this tool may be useful when troubleshooting NSEC3 problems.
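
To make the steps that :iscman:`nsec3hash` performs more concrete, the
sketch below reimplements the hash for algorithm 1 (SHA-1) as specified
in RFC 5155. It is an illustration rather than a replacement for the
BIND tool: the function names are hypothetical, only ASCII names
are handled, and the argument order differs from the command line.

```python
import base64
import hashlib

# base64.b32encode uses the standard alphabet; NSEC3 owner names use the
# "base32hex" alphabet, so translate one index-aligned alphabet to the other.
_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
_B32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
_TO_B32HEX = str.maketrans(_B32, _B32HEX)


def name_to_wire(name):
    """Encode a domain name in lowercase DNS wire format."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"  # the root label terminates the name


def nsec3_hash(name, salt_hex, iterations):
    """Return the NSEC3 hash (algorithm 1, SHA-1) as an uppercase string."""
    salt = b"" if salt_hex == "-" else bytes.fromhex(salt_hex)
    # One initial hash, then `iterations` additional rounds.
    digest = hashlib.sha1(name_to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).decode("ascii").translate(_TO_B32HEX)


if __name__ == "__main__":
    print(nsec3_hash("www.example.com", "1234567890ABCEDF", 10))
```

The printed value can be compared with the :iscman:`nsec3hash` output
shown above; the test vectors in RFC 5155, Appendix A offer another
sanity check.
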