TEST PLAN for NSD.

By W.C.A. Wijngaards, July 2006, NLnetLabs.


1. Introduction
---------------
NSD 3 contains far more features than a typical point release. These
features need to be tested and checked to make sure they work well.
This document describes a plan to test all the features that have
been added to NSD.

Regression testing is also very important. The old features must
remain working. We have a set of tpkg packages to help with that,
and also root-trace speed tests to regression test NSD.

The feature tests are to be automated, using tpkg packages where
possible.

2. Minor Features
-----------------
Some minor features for the test:

2.1. DNAME
----------
DNAME support - there are already extensive DNAME tests.
(closed).

2.2. NSEC3
----------
NSEC3 support
	- use the perl automated nsec3 test
		- port to tpkg perhaps.

Note: the NSEC3 hash length byte is still to be implemented; test it
against other implementations. Test interoperability of that with a
simple zone transfer with BIND.
(experimental, no need to test any more).

2.3. NSID
---------
Would make a nice nsid.tpkg package.

NSID support - run NSD with different NSIDs and queries.
	a- test NSID with zero length, query with NSID.
	b- very long length, query with NSID.
	c- 012345678 and query for different things.
		1- query OK things.
		- query error
			2- nxdomain,
			3- loop,
			4- nodata.
		5- query error (bad queries, wrong zone)
			- is NSID present?
	d- NSID and TSIG.
		1- query has OK TSIG
		2- query has BAD TSIG
		3- query for nxdomain
		4- bad query, wrong zone
	e- configure NSID from config file?
		- is this possible?

	f- test if NSID is in NOTIFY responses. (should it be?)
		ldns-notify and parse result packet for nsid.
	g- test if NSID is in AXFR responses. (should it be?)
		drill axfr <zone> and see if nsid in result packets.
(experimental, so low priority).
(need a way to send NSID enabled queries - no test).
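
The missing piece noted above is a way to send NSID enabled queries.
As a minimal sketch, a test could build such a query by hand. It assumes
the EDNS0 option code 3 that RFC 5001 assigns to NSID (the experimental
code point in use in 2006 may have differed); the function name, query id
and UDP size are made up for this example.

```python
import struct

NSID_OPTION_CODE = 3   # assumption: code point from RFC 5001
TYPE_OPT = 41          # EDNS0 OPT pseudo-RR type
CLASS_IN = 1


def build_nsid_query(qname, qtype=1, query_id=0x1234, udp_size=4096):
    """Return the wire format of a query that carries an empty NSID option."""
    # Header: id, flags (0 = standard query), QDCOUNT=1, ANCOUNT=0,
    # NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)

    # Question section: length-prefixed labels, root label, type, class.
    labels = [label for label in qname.rstrip(".").split(".") if label]
    name = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    question = name + b"\x00" + struct.pack("!HH", qtype, CLASS_IN)

    # NSID option with zero-length payload: this is the request form.
    option = struct.pack("!HH", NSID_OPTION_CODE, 0)

    # OPT RR: root owner name, TYPE=OPT, CLASS=UDP payload size,
    # TTL=0 (extended rcode, version, flags), RDLENGTH, RDATA.
    opt_rr = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, 0, len(option))
    return header + question + opt_rr + option
```

The resulting bytes can be sent over a UDP socket to the server under
test; parsing the reply for the NSID option is left out of this sketch.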

3. Transfers
------------
For the transfers the tests are to be done using
- NSD as a master (AXFR) in 3.1, or an ldns-ixfr miniserver (IXFR)
  as a master in 3.2 that serves pre-made ixfr answers.

The zone transfer tests can be put in one tpkg by using servers
at different ports. The allow- lines are then for localhost, all
ports (since the sending process uses ephemeral ports, all must
be allowed). The request- lines contain the correct port numbers
to send to.
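
A rough sketch of that port layout: the following generates nsd.conf text
for a master and a secondary that both run on localhost. The port numbers,
zone name, zonefile names and the use of NOKEY are assumptions for
illustration only; the option names (port, provide-xfr, notify,
allow-notify, request-xfr) are the ones from nsd.conf.

```python
MASTER_PORT = 10053     # hypothetical port for the master
SECONDARY_PORT = 10054  # hypothetical port for the secondary
ZONE = "example.com"    # hypothetical zone name


def master_conf():
    """nsd.conf text for the master: serve transfers, notify the secondary."""
    return "\n".join([
        "server:",
        f"\tport: {MASTER_PORT}",
        "zone:",
        f"\tname: {ZONE}",
        f"\tzonefile: {ZONE}.master.zone",
        "\t# access list: localhost, all ports",
        "\tprovide-xfr: 127.0.0.1 NOKEY",
        "\t# outgoing notify: names the port the secondary listens on",
        f"\tnotify: 127.0.0.1@{SECONDARY_PORT} NOKEY",
    ]) + "\n"


def secondary_conf():
    """nsd.conf text for the secondary: accept notifies, request transfers."""
    return "\n".join([
        "server:",
        f"\tport: {SECONDARY_PORT}",
        "zone:",
        f"\tname: {ZONE}",
        f"\tzonefile: {ZONE}.secondary.zone",
        "\t# access list: localhost, all ports (sender uses ephemeral ports)",
        "\tallow-notify: 127.0.0.1 NOKEY",
        "\t# outgoing request: names the port the master listens on",
        f"\trequest-xfr: 127.0.0.1@{MASTER_PORT} NOKEY",
    ]) + "\n"
```

Writing master_conf() and secondary_conf() to two files gives one config
per server inside a single tpkg.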

3.1. AXFR
---------
3.1.1. AXFR features
--------------------
Setup is a secondary zone which requests from a master.
The master zone is updated. Then, the secondary should be
informed with the notify: statements.
Then test if the secondary got the same zone as the master,
by doing axfr from both servers and checking that the contents and
the serial number are the same.

Tests 3.1.1 can be one tpkg.
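
The comparison step could look roughly like this. It is a sketch that
assumes both transfers have already been dumped to text, one RR per line
in presentation format, with comment lines starting with ';'. The function
names are made up for this example. Records are compared as a multiset,
so a different RR order on the two servers does not count as a mismatch.

```python
from collections import Counter


def normalize(zone_text):
    """Turn a zone dump into a multiset of whitespace-normalized RR lines."""
    records = Counter()
    for line in zone_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        records[" ".join(line.split())] += 1
    return records


def soa_serial(zone_text):
    """Return the serial of the first SOA record found in the dump."""
    for line in zone_text.splitlines():
        fields = line.split()
        if "SOA" in fields:
            # SOA rdata: mname rname serial refresh retry expire minimum
            return int(fields[fields.index("SOA") + 3])
    raise ValueError("no SOA record found")


def same_zone(master_text, secondary_text):
    """True if both dumps hold the same records, ignoring their order."""
    return normalize(master_text) == normalize(secondary_text)
```

Since the SOA line is part of the compared records, equal contents imply
equal serials; soa_serial() is only there to report which serial each
side has when the comparison fails.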

(with serial numbers for SOA, to perform serial rollover).
- secondary starts with a zone without content (soa=1)
  so the zone is only mentioned in the config, the zonefile is empty or
  nonexistent on the slave. Master has a soa and three text records.
- axfr an empty zone - only the SOA 	(soa=2)
- axfr a zone with only a little data. 	(soa=3)
  some NS, MX, A, AAAA records.
[NOTE: apparently, due to the linked list management in domains (of rrset*)
 the ordering of rrtypes for a domain is reversed after a zone transfer
 for NSD, i.e. for query type=any. Ordering within an rrset is preserved.
 Created a fix for the ordering, but it is slow for many rr types... ]
- old zone unsigned, new zone signed. 	(soa=4)
  sign with two KSKs and one ZSK. And a prepublished ZSK in the zone.
	ZSK1: Kexample.com.+005+44537
	ZSK2: Kexample.com.+005+03824 (prepublish)
	KSK1: Kexample.com.+005+53988
	KSK2: Kexample.com.+005+25320 (presign)
- old zone signed, new zone unsigned. 	(soa=5)
  different zone contents: some names are still there unchanged,
  some names have changed RRs, some names have different RRtypes,
  some names are removed and some names are added.
	www: unchanged (including nsec,rrsig).
	webmail: mail prio changed.
	printer: name removed.
	terms: different RR types, now A type.
	mail: type TXT added.
	apex: type DNSKEY, nsec, rrsig removed.
	newservice, ooo: new names
- new zone with nsec3.			(soa=6)
	iter=33 salt=AA44FF11
	slave detects NSEC3 settings.
- new parameters for nsec3.		(soa=7)
	iter=1078 salt=00998877665544332211AADDCCFF
	slave detects NSEC3 settings.
- new zone no longer uses nsec3.	(soa=8)
	uses nsec.
- axfr an empty zone (only the SOA)	(soa=9 + a lot for serial wraparound)
	2**31 = 2147483648
	9 + 2**31 : 2147483657
  also tested
	9 + 2**31-3	-- works. notify and transfer, zone updated.
	9 + 2**31-2	-- works. notify and transfer, zone updated.
	9 + 2**31-1	-- notify works, but at transfer time 'serial old'.
				fixed: works, zone updated.
	9 + 2**31 	-- notify is ignored, 'serial old'.
	9 + 2**31+1	-- notify is ignored, 'serial old'.
- axfr wraparound zone, couple of txts.	(soa=2)
	serial=2 works after serial=9 + 2**31-2 before.
These can be done in order.

Test done for AXFR. RRset-type ordering preservation fixed. Serial rollover fixed.
Serial printed as unsigned.
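
The wraparound cases above depend on comparing serials modulo 2**32.
Below is a minimal sketch of that comparison following RFC 1982 serial
number arithmetic. The function name is made up, and this is not taken
from the NSD source. A distance of exactly 2**31 is undefined in RFC 1982;
this sketch treats it as "not newer", which may differ from what NSD does.

```python
HALF = 2**31
MOD = 2**32


def serial_newer(new, old):
    """True if `new` is later than `old` in RFC 1982 serial arithmetic."""
    distance = (new - old) % MOD
    # Newer only if the forward distance is nonzero and below half the
    # serial space; exactly half (undefined in the RFC) counts as not newer.
    return 0 < distance < HALF
```

For example, serial_newer(2, 9 + 2**31 - 2) is true: going from
2147483655 to 2 is a short step forward across the 2**32 boundary, so
the wrapped zone counts as an update.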

3.1.2. Huge xfr - see test tpkg for this.
It works already.

3.1.3. AXFR and TSIG
Like 3.1.1, but with TSIG enabled.
Tested, it works.

3.2. IXFR
---------
3.2.1. IXFR Features
--------------------
Setup is a secondary with the ldns-mini-ixfr server as a master
(ldns/examples/nsd-test/ldns-testns.c).
The mini ixfr server responds with canned replies to an IXFR query.

- secondary loaded with only the soa = 1. notified with 10.
  ixfr server responds with soa=1. (i.e. no update available).
- same, ixfr server responds with soa=2, and TC (udp).
  on the tcp connection, it responds with a simple difference package.
  SOA2 SOA1 SOA2 newTXTA newTXTB SOA2
  adds a couple of records.
- to soa=3, and this one removes the TXT records,
  makes a new TXTB record and a TXTC record.
  test that the TXTA domain does not exist (NXDOMAIN).
  test that TXTB is updated.
  test that TXTC exists.
- from 3 to 5, with 4 in between.
  make an ixfr packet that is 3 .. 4 and 4 .. 5 concatenated without
  compression, so it means more work for processing.
  So, in the ixfr packet TXTB is removed from 3, added in 4, removed from 4,
  added in 5.
  Also in version 4 a txt5 record is added, which stays around.
- from 5 to 7, but this time the redundant work is removed from the ixfr packet.
- 7 to 8, have one domain name where two RR types exist, A and DNAME.
  remove one RR type in IXFR then make sure the other type still exists.
- 8 to 9, have a name with many different A records. Remove one A record
  from it. Add another A record to it. Test if the rest is there.
[Note: when you delete one RR from an RRset, the ordering of the RRset
changes, the contents of the rrset get shuffled (last put in empty slot).]
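
To make the layout of a difference package such as
"SOA2 SOA1 SOA2 newTXTA newTXTB SOA2" concrete, here is a small sketch
that applies an IXFR answer in the RFC 1995 layout to a set of records.
RRs are modelled as plain tuples and a SOA as ("SOA", serial). This only
illustrates the message layout; it is not how NSD implements it.

```python
def is_soa(rr):
    return rr[0] == "SOA"


def apply_ixfr(zone, serial, answer):
    """Apply the RRs of an IXFR answer to `zone` (a set of non-SOA RRs).

    Layout: newSOA, then repeated [oldSOA, deleted RRs, newSOA, added RRs],
    then newSOA again. Returns (new_zone, new_serial).
    """
    if not answer or not is_soa(answer[0]):
        raise ValueError("answer must start with a SOA")
    final = answer[0][1]
    if len(answer) == 1:
        # A lone SOA: no update available, or retry over TCP is needed.
        return set(zone), serial
    if answer[-1] != answer[0]:
        raise ValueError("answer must end with the SOA it started with")
    body = answer[1:-1]
    if body and not is_soa(body[0]):
        # AXFR-style contents inside an IXFR reply: replace the whole zone.
        return set(body), final

    zone = set(zone)
    deleting = False
    for rr in body:
        if is_soa(rr):
            if not deleting and rr[1] != serial:
                raise ValueError("difference does not start at our serial")
            if deleting:
                serial = rr[1]      # SOA that opens the additions part
            deleting = not deleting
        elif deleting:
            zone.discard(rr)
        else:
            zone.add(rr)
    if deleting or serial != final:
        raise ValueError("incomplete difference sequence")
    return zone, serial
```

A concatenated 3 .. 4 and 4 .. 5 packet simply runs through the loop
twice, which is the extra processing work that test case is meant to
exercise.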

3.2.2. a huge ixfr.
-------------------
Create a test for it. Should be several MBs worth of data removed, MBs of
data that stays the same, and MBs that are added.
This is to make sure that the code can handle multiple packet IXFRs, and the
state memory between IXFR packets.
- created testns version for multiple packet reply. Small, multiple packets.
Test from 3.1. but using multiple packets: one RR per packet.
This test also falls back from udp to tcp for ixfr.
- This works. The MBs in size are tested in the huge axfr test already.

3.2.3. Test remove domain
-------------------------
- is_existing = 0 is used to remove a domain. Check and test carefully.
	- test delete middle name
		i.e. you have a zone with:
			c.example.com TXT "x"
			b.c.example.com TXT "x"
			a.b.c.example.com TXT "x"
		and you delete the b.c. record. The b.c becomes an empty
		nonterminal. If you then delete a.b.c. TXT, the b.c becomes
		NXDOMAIN.
		[- fixed delete with IXFR for empty nonterminals.]
	- test delete/add a domain and NXDOMAIN/exist replies.
		- tested in 3.2.1 already, works.
	- test delete domain and wildcard replies.
- Tested, it works, with a fix for IXFR that makes empty nonterminals.
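
The middle-name case can be stated as a small classification rule. The
sketch below works on a set of owner names that hold at least one RR; the
function name and return strings are made up, and wildcards and zone cuts
are deliberately ignored.

```python
def classify(owner_names, qname):
    """Classify `qname` against the owner names that have data."""
    names = {n.rstrip(".").lower() for n in owner_names}
    qname = qname.rstrip(".").lower()
    if qname in names:
        return "exists"
    # No data at the name itself, but some name below it still has data.
    suffix = "." + qname
    if any(n.endswith(suffix) for n in names):
        return "empty nonterminal"
    return "nxdomain"


zone = {"c.example.com", "b.c.example.com", "a.b.c.example.com"}
zone.discard("b.c.example.com")
after_first_delete = classify(zone, "b.c.example.com")
zone.discard("a.b.c.example.com")
after_second_delete = classify(zone, "b.c.example.com")
```

Here after_first_delete is "empty nonterminal" and after_second_delete
is "nxdomain", matching the two steps described above.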

3.3. Timeouts
-------------
Get zone to expire. Check it does not answer.
	Start only a secondary server, no master. Set expire timeout short.
	Timers set as refresh=1 retry=1 expire=10 minimum=10
Provide an update. Check it does answer again.
	Start up the master server after a while. Transfer should happen
	within the retry interval.
Wait for zone to expire again. Check that.
Provide old zone on the master, after expire the slave must transfer it.
The above works, old zone is transferred and served.

Test that the master says that serial number is OK, in 3.2.1 tests.
This test also includes IXFR reply from the master that contains AXFR contents.
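
The expected behaviour around expiry can be written down as a tiny model.
This is a sketch of the rules the test checks, not of xfrd itself; the
class and method names are made up, and times are plain seconds.

```python
def serial_newer(new, old):
    """RFC 1982 style comparison; exactly half the range counts as not newer."""
    return 0 < (new - old) % 2**32 < 2**31


class SecondaryZone:
    def __init__(self, serial, acquired_at, expire):
        self.serial = serial            # serial of the zone we hold
        self.acquired_at = acquired_at  # time of last successful refresh
        self.expire = expire            # SOA expire timer in seconds

    def is_expired(self, now):
        return now - self.acquired_at >= self.expire

    def answers_queries(self, now):
        # An expired zone must not be served.
        return not self.is_expired(now)

    def should_transfer(self, offered_serial, now):
        # After expiry any zone the master offers is taken, even an older
        # one; before expiry only a newer serial triggers a transfer.
        if self.is_expired(now):
            return True
        return serial_newer(offered_serial, self.serial)
```

With expire=10 as in the timers above, a zone acquired at time 0 ignores
an older serial at time 5 but takes it at time 11.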

3.4. TSIG zone transfers
------------------------
There are already TSIG tpkg tests, with transfers TSIG protected, so that is ok.
TSIG notifies - test it, create test for it.
	- notify accepted, nsd->nsd notify
	  by starting master and slave server with tsig keys
	  for a zone, update zone at master.
	  Done in 3.1.3.
	- notify refused. nsd->nsd notify
	  same but use different secret at one server.
	  Test done.

4. IPC
------

4.1. deadlocks
--------------
Have 100,000 zones, all with short SOA timeouts, expire=1 sec. refresh=10.
They expire very quickly. This gives many messages from xfrd to the server.
Send notifies to the server in a loop from a shell script. This gives lots of
messages the other way around.
Provide a master server that will serve all the zones (and say they are ok).

Then proceed to send queries for the zones to the server and see if you
get answers. Wait for an hour and try again.

Result: the IPC works okay, but xfrd uses much memory, 16 KB for TSIG regions
per zone. With the 2.5 KB in xfrd that is almost 20 KB per zone, so about 2 GB
for 100,000 zones. A bit much memory for the largely unused tsig regions.
Fixed: tsig for xfrd uses no preallocated worst case memory, but only
a small footprint. During use this may grow; about 1 KB per zone perhaps.

About 2.5 KB per secondary zone in xfrd, below 1 KB for a master zone;
that works out for 100,000 secondary zones as 250 MB for xfrd.
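
The memory figures above follow from simple multiplication. A quick check,
using decimal units (1 MB = 1000 KB) to match the rough estimates; the
per-zone numbers are the ones quoted above and the helper name is made up.

```python
ZONES = 100_000


def total_mb(per_zone_kb, zones=ZONES):
    """Total memory in MB for `zones` zones at `per_zone_kb` KB each."""
    return per_zone_kb * zones / 1000


before_fix = total_mb(16 + 2.5)     # preallocated TSIG regions plus xfrd state
after_fix = total_mb(2.5)           # xfrd state only
with_tsig_in_use = total_mb(2.5 + 1)  # if TSIG grows to about 1 KB per zone
```

This gives 1850 MB before the fix (the "about 2 GB" above), 250 MB after
it, and 350 MB if every zone's TSIG state grows to about 1 KB.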

Perhaps also do this with 100 child servers for the NSD, to see if it can keep
up and what the result is if it cannot keep up sending to child servers.
Since it has to send a message for each zone to each child, this will
take more resources.
Tested, it cannot keep up. Child servers operate using old zone status
of expired/ok, also the machine load is 100%.
Also fixed tsig.other_size to be checked when reading TSIG from network.

Due to the length and size, this is more an incidental test, but it can be tpkg-ed.

4.2. IPC FORKS
--------------
Infinite loop of reloads on a server. Has 10 child servers. Wait.
See if it runs out of sockets, file descriptors, etc.
Incidental test.
Tested, with adjusted source that repeats reloads. This puts strain on the
reload ipc handshake code and the ipc socket code. It works fine.

5. Random mess test
-------------------
Setup 7 servers, in a master->intermed->slave chain,
with multiple master(2), intermed(3) and slave(2) servers.
TSIGs (different) for everyone.
Perhaps also include never respond entries (fake address) in acls.

- Load random SOA + random data in servers.
  Backup the setup so it is repeatable.
  Let them work out what version to run.
- Provide updated zone for a master.
  See what happens.
- Send notifies to the slave servers.
- Send notifies to the intermed servers.
- Send notifies to the master servers.
- Kill some server. Start it again.
- Kill some server & delete some file (ixfr.db or xfrd.state).
	- delete ixfr.db
	- delete xfrd.state
	- delete ixfr and xfrd files.
- run nsdc patch on a server.
- pretend an intermediary was offline for a long time
  with old zone files and old ixfr.db and xfrd.state(!!) files.
  and see what happens :-)
  It should refresh/expire and so on, based on timers in xfrd.state.

Tested:
- nsd returned formerr on IXFR queries because of data in the NS section.
  But that data is correct in an IXFR query; fixed NSD, so it is no longer
  formerr, but refused / not authorised instead. (or whatever we put in axfr.c).
- depending on which server they are asking, servers will use one of
  the master zones (after expiry time exceeded). If a master is updated,
  intermediaries, then slaves update themselves too.
- NSD would not start with a corrupt diff file. Now it logs an error and
  ignores (fixes) the diff file.

6. Portability test
-------------------
Port NSD to as many platforms as possible
- local: sparc5(ok), alpha(ok), amd64/OpenBSD(Jelte, at home),
	open=FreeBSD(ok), linuxes(ok), MacOSX(ok), SunOS4(ok).
- sf compilefarm for more.
	- x86-linux2 has ip6 disabled. tests don't work with that.
- minix3 if we can get it working (the minix3 setup fails somehow).

- would be good to have a test set of tpkg (and tools required) to
  run after a port-test. A very portable set of tpkgs.
	OSTYPE: (g)make. autoreconf. (g)indent.
		-> defaults for * systems.
	dig 8.3 too old (format of output). Need 9+.
		however dig/bind is not portable enough.
	ldns: pcat, pcat-diff, pcat-print.  xfr1,2:nsd-ldnsd. pcat-grep.pl
	manual: md5sum/md5. hping(sudo).
	long: ldns-testns.
Made tests more portable, ran tests on Linux, FreeBSD, Solaris.
Full testset run on SPARC/SunOS2.5, and fixed two unaligned memory accesses,
all tests succeed now. Full testset runs on PowerPC/MacOSX.

7. CODE REVIEW
--------------
Code has already had 1x review by Wouter, some review by Miek.
More review (again), Jelte, Wouter.
- Do some spots of interest.
- perhaps a full review as well.

8. todo-tests ideas
-------------------
These would be nice as tpkgs, but perhaps manual tests are needed.

8.1. test combinations of configure options and shells
------------------------------------------------------
"
run tests with different shells, aka ==-bug
	bash 1.1. is too old for [[ in tpkg and tests.
	Some hosts have awk that puts a space before .pre files in tpkg.
	Some hosts have bash in /usr/local/bin so tpkg fails on that.
run tests with different configure options and combinations
    of them.
	Many tests fail with disable-ipv6.
implement this in a xen-like environment so that different OSs can
be checked.
run this daily or only when subversion changes
for each test, run our "test-suite"
"

8.2. patch file remove
----------------------
rm patch file, check xfrd's behavior. It refetches zones.
Checked in section transfer_axfr.

8.3. 64 bit
-----------
GB-sized transfers with 64 bit file sizes. On alpha, so nastiest
alignment on a 64bit machine. Do a transfer of a > 4 GB zone.
Needs lots of memory (swap space) and disk space.
Not done; no host for test.

8.4. Valgrind
-------------
run with valgrind - on two nsds.
then do the test with nsd-nsd and notify the master to get axfr
to happen, with tsig enabled as well.
Done, found one uninit variable.

8.5. Chroot
-----------
test chroot and the new files/directories.
(And the file/dir not in chroot problem, and if all is OK that it works).
Done, default locations for ixfr.db and xfrd.state have full pathnames.

8.6. nsdc
---------
In temporary test setup above, test nsdc tool.
works.

Make sure that if nsdc patch breaks a zone transfer in progress it is
reattempted later on.
hard to test.

8.7. nsd-patch
--------------
nsd-patch - run nsd patch and compare zone files, like AXFR/IXFR tests.
Done in the test axfr run, or test-mess.

8.8. gcov
---------
gcov to look at code coverage of the tests. Tests added to improve coverage.