<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "https://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Bottleneck Analysis</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' type='text/css' href='postfix-doc.css'>

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Bottleneck Analysis</h1>

<hr>

<h2>Purpose of this document </h2>

<p> This document is an introduction to Postfix queue congestion analysis.
It explains how the <a href="qshape.1.html">qshape(1)</a> program can help to track down the
reason for queue congestion. <a href="qshape.1.html">qshape(1)</a> is bundled with Postfix
2.1 and later source code, under the "auxiliary" directory. This
document describes <a href="qshape.1.html">qshape(1)</a> as bundled with Postfix 2.4. </p>

<p> This document covers the following topics: </p>

<ul>

<li><a href="#qshape">Introducing the qshape tool</a>

<li><a href="#trouble_shooting">Trouble shooting with qshape</a>

<li><a href="#healthy">Example 1: Healthy queue</a>

<li><a href="#dictionary_bounce">Example 2: Deferred queue full of
dictionary attack bounces</a></li>

<li><a href="#active_congestion">Example 3: Congestion in the active
queue</a></li>

<li><a href="#backlog">Example 4: High volume destination backlog</a>

<li><a href="#queues">Postfix queue directories</a>

<ul>

<li> <a href="#maildrop_queue"> The "maildrop" queue </a>

<li> <a href="#hold_queue"> The "hold" queue </a>

<li> <a href="#incoming_queue"> The "incoming" queue </a>

<li> <a href="#active_queue"> The "active" queue </a>

<li> <a href="#deferred_queue"> The "deferred" queue </a>

</ul>

<li><a href="#credits">Credits</a>

</ul>

<h2><a name="qshape">Introducing the qshape tool</a></h2>

<p> When mail is
draining slowly or the queue is unexpectedly large,
run <a href="qshape.1.html">qshape(1)</a> as the super-user (root) to help zero in on the problem.
The <a href="qshape.1.html">qshape(1)</a> program displays a tabular view of the Postfix queue
contents. </p>

<ul>

<li> <p> On the horizontal axis, it displays the queue age with
fine granularity for recent messages and (geometrically) less fine
granularity for older messages. </p>

<li> <p> The vertical axis displays the destination domain (or, with the
"-s" switch, the sender domain). Domains with the most messages are
listed first. </p>

</ul>

<p> For example, in the output below we see the top 10 lines of
the (mostly forged) sender domain distribution for captured spam
in the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a>: </p>

<blockquote>
<pre>
$ qshape -s hold | head
                         T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL 486  0  0  1  0  0   2   4  20   40   419
             yahoo.com  14  0  0  1  0  0   0   0   1    0    12
  extremepricecuts.net  13  0  0  0  0  0   0   0   2    0    11
        ms35.hinet.net  12  0  0  0  0  0   0   0   0    1    11
      winnersdaily.net  12  0  0  0  0  0   0   0   2    0    10
           hotmail.com  11  0  0  0  0  0   0   0   0    1    10
           worldnet.fr   6  0  0  0  0  0   0   0   0    0     6
        ms41.hinet.net   6  0  0  0  0  0   0   0   0    0     6
                osn.de   5  0  0  0  0  0   1   0   0    0     4
</pre>
</blockquote>

<ul>

<li> <p> The "T" column shows the total (in this case sender) count
for each domain. Each column headed by a number shows the count of
messages younger than that many minutes, but not younger than the
age limit of the previous column. The row labeled "TOTAL"
shows the total count for all domains. </p>

<li> <p> In this example, there are 14 messages allegedly from
yahoo.com: 1 between 10 and 20 minutes old, 1 between 320 and 640
minutes old, and 12 older than 1280 minutes (there are 1440 minutes in a day).
</p>

</ul>

<p> When the output is a terminal, intermediate results showing the top 20
domains (-n option) are displayed after every 1000 messages (-N option),
and the final output also shows only the top 20 domains. This makes
qshape useful even when the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> is very large and reading the
entire "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> would otherwise take prohibitively long. </p>

<p> By default, qshape shows statistics for the union of the
"<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#active_queue">active" queues</a>, which are the most relevant queues to
look at when analyzing performance. </p>

<p> One can request an alternate list of queues: </p>

<blockquote>
<pre>
$ qshape deferred
$ qshape incoming active deferred
</pre>
</blockquote>

<p> These commands show the age distribution of the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>, and of
the union of the "<a href="QSHAPE_README.html#incoming_queue">incoming"</a>, "<a href="QSHAPE_README.html#active_queue">active"</a> and "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a>, respectively. </p>

<p> Command line options control the number of display "buckets",
the age limit for the smallest bucket, display of parent domain
counts and so on. The "-h" option outputs a summary of the available
switches. </p>

<h2><a name="trouble_shooting">Trouble shooting with qshape</a>
</h2>

<p> Large numbers in the qshape output represent a large number of
messages that are destined to (or alleged to come from) a particular
domain. It should be possible to tell at a glance which domains
dominate the queue sender or recipient counts, approximately when
a burst of mail started, and when it stopped.
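</p>

<p> As a rough illustration of reading the age buckets, the following
sketch takes one data row of qshape output (domain, "T" column, then
the ten bucket counts) and reports the age ranges of its oldest and
newest non-empty buckets. These two ranges bracket when a burst started
and when it stopped. The sketch assumes the default bucket limits of
5 10 20 40 80 160 320 640 1280 minutes plus a final "1280+" bucket.
The function name qshape_age_span is hypothetical and not part of Postfix. </p>

```shell
#!/bin/sh
# qshape_age_span ROW: hypothetical helper, not part of Postfix.
# ROW is one data line of default qshape output, e.g.
#   "yahoo.com 14 0 0 1 0 0 0 0 1 0 12"
# Assumed bucket upper limits (minutes): 5 10 20 40 80 160 320 640 1280,
# followed by a final "1280+" bucket for everything older.
qshape_age_span() {
    printf '%s\n' "$1" | awk '
    # Age range in minutes covered by bucket number i (1-based).
    function span(i,    lo) {
        if (i == n + 1) return limit[n] "+"
        lo = (i == 1) ? 0 : limit[i - 1]
        return lo "-" limit[i]
    }
    BEGIN { n = split("5 10 20 40 80 160 320 640 1280", limit, " ") }
    {
        # $1 = domain, $2 = T column, $3 onwards = age buckets.
        newest = 0; oldest = 0
        for (i = 1; i <= n + 1; i++) {
            if ($(i + 2) == 0) continue
            if (newest == 0) newest = i   # first non-empty: youngest mail
            oldest = i                    # last non-empty: oldest mail
        }
        if (newest == 0) { print $1 ": no messages"; next }
        print $1 ": oldest mail " span(oldest) " min, newest mail " span(newest) " min"
    }'
}

qshape_age_span "yahoo.com 14 0 0 1 0 0 0 0 1 0 12"
qshape_age_span "osn.de 5 0 0 0 0 0 1 0 0 0 4"
```

<p> For the yahoo.com row in the "hold" queue example above, the oldest
mail is in the "1280+" bucket and the newest is in the 10-20 minute bucket.
If you change the bucket layout with qshape's command line options,
you must change the limits in the BEGIN block to match.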
</p>

<p> The problem destinations or sender domains appear near the top-left
corner of the output table. Remember that the "<a href="QSHAPE_README.html#active_queue">active" queue</a>
can accommodate up to 20000 ($<a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>) messages.
To check whether this limit has been reached, use: </p>

<blockquote>
<pre>
$ qshape -s active <i>(show sender statistics)</i>
</pre>
</blockquote>

<p> If the total sender count is below 20000, the "<a href="QSHAPE_README.html#active_queue">active" queue</a> is
not yet saturated, and any high volume sender domains show near the
top of the output. </p>

<p> With <a href="qmgr.8.html">oqmgr(8)</a>, the "<a href="QSHAPE_README.html#active_queue">active" queue</a> is also limited to at most 20000
recipient addresses ($<a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>). To check for
exhaustion of this limit, use: </p>

<blockquote>
<pre>
$ qshape active <i>(show recipient statistics)</i>
</pre>
</blockquote>

<p> Having found the high volume domains, it is often useful to
search the logs for recent messages pertaining to the domains in
question. </p>

<blockquote>
<pre>
# Find deliveries to example.com
#
$ tail -n 10000 /var/log/maillog |
        grep -E -i ': to=&lt;.*@example\.com&gt;,' |
        less

# Find messages from example.com
#
$ tail -n 10000 /var/log/maillog |
        grep -E -i ': from=&lt;.*@example\.com&gt;,' |
        less
</pre>
</blockquote>

<p> You may want to drill down into some specific queue IDs: </p>

<blockquote>
<pre>
# Find all log entries for a specific queue ID.
#
$ tail -n 10000 /var/log/maillog | grep -E ': 2B2173FF68: '
</pre>
</blockquote>

<p> Also look for queue manager warning messages in the log.
These warnings can suggest strategies to reduce congestion. </p>

<blockquote>
<pre>
$ grep -E 'qmgr.*(panic|fatal|error|warning):' /var/log/maillog
</pre>
</blockquote>

<p> When all else fails, try the Postfix mailing list for help, but
please don't forget to include the top 10 or 20 lines of <a href="qshape.1.html">qshape(1)</a>
output. </p>

<h2><a name="healthy">Example 1: Healthy queue</a></h2>

<p> Under normal conditions (no congestion), the "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#active_queue">active" queues</a>
are nearly empty. Mail leaves the system almost as quickly as it
comes in, or is deferred without congestion in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>.
</p>

<blockquote>
<pre>
$ qshape <i>(show "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#active_queue">active" queue</a> status)</i>

                         T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL   5  0  0  0  1  0   0   0   1    1     2
         meri.uwasa.fi   5  0  0  0  1  0   0   0   1    1     2
</pre>
</blockquote>

<p> If one looks at the two queues separately, the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
is empty or perhaps briefly has one or two messages, while the
"<a href="QSHAPE_README.html#active_queue">active" queue</a> holds more messages and for a somewhat longer time:
</p>

<blockquote>
<pre>
$ qshape incoming

                         T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL   0  0  0  0  0  0   0   0   0    0     0

$ qshape active

                         T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL   5  0  0  0  1  0   0   0   1    1     2
         meri.uwasa.fi   5  0  0  0  1  0   0   0   1    1     2
</pre>
</blockquote>

<h2><a name="dictionary_bounce">Example 2: Deferred
queue full of
dictionary attack bounces</a></h2>

<p> This is from a server where recipient validation is not yet
available for some of the <a href="VIRTUAL_README.html#canonical">hosted domains</a>. Dictionary attacks on
the unvalidated domains result in bounce backscatter. The bounces
dominate the queue, but with proper tuning they do not saturate the
"<a href="QSHAPE_README.html#incoming_queue">incoming"</a> or "<a href="QSHAPE_README.html#active_queue">active" queues</a>. The high volume of deferred mail is not
a direct cause for alarm. </p>

<blockquote>
<pre>
$ qshape deferred | head

                          T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL 2234  4  2  5  9 31  57 108 201  464  1353
   heyhihellothere.com  207  0  0  1  1  6   6   8  25   68    92
   pleazerzoneprod.com  105  0  0  0  0  0   0   0   5   44    56
        groups.msn.com   63  2  1  2  4  4  14  14  14    8     0
     orion.toppoint.de   49  0  0  0  1  0   2   4   3   16    23
           kali.com.cn   46  0  0  0  0  1   0   2   6   12    25
         meri.uwasa.fi   44  0  0  0  0  1   0   2   8   11    22
     gjr.paknet.com.pk   43  1  0  0  1  1   3   3   6   12    16
  aristotle.algonet.se   41  0  0  0  0  0   1   2  11   12    15
</pre>
</blockquote>

<p> The domains shown are mostly bulk-mailers, and most of the volume
is in the tail end of the time distribution, showing that short-term
arrival rates are moderate. Larger numbers and lower message ages
are more indicative of current trouble. Old mail still going nowhere
is largely harmless so long as the "<a href="QSHAPE_README.html#active_queue">active"</a> and "<a href="QSHAPE_README.html#incoming_queue">incoming" queues</a> are
short. We can also see that the groups.msn.com undeliverables are a
low-rate steady stream rather than a concentrated dictionary attack
that is now over.
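</p>

<p> One way to quantify "lower message ages" is to compute the share of a
domain's messages that sit in the younger buckets. The sketch below
sums the first six buckets of a qshape row. With the default bucket
layout, these are the messages younger than 160 minutes. The 160-minute
cut-off and the function name recent_share are illustrative assumptions,
not qshape features. </p>

```shell
#!/bin/sh
# recent_share ROW: hypothetical helper, not part of Postfix.
# ROW is one data line of default qshape output: domain, T, then the
# ten age buckets (5 10 20 40 80 160 320 640 1280 1280+ minutes).
# Prints the integer percentage of messages younger than 160 minutes.
recent_share() {
    printf '%s\n' "$1" | awk '{
        if ($2 == 0) { printf "%s: no messages\n", $1; next }
        recent = 0
        for (i = 3; i <= 8; i++) recent += $i    # buckets 5 .. 160
        printf "%s: %d%% younger than 160 min\n", $1, recent * 100 / $2
    }'
}

recent_share "groups.msn.com 63 2 1 2 4 4 14 14 14 8 0"
recent_share "heyhihellothere.com 207 0 0 1 1 6 6 8 25 68 92"
```

<p> Applied to the "deferred" queue output above, this gives 42% for
groups.msn.com but only 6% for heyhihellothere.com. That matches the
observation that groups.msn.com is a steady stream, while the other
domains are dominated by old mail.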
</p>

<blockquote>
<pre>
$ qshape -s deferred | head

                          T  5 10 20 40 80 160 320 640 1280 1280+
                 TOTAL 2193  4  4  5  8 33  56 104 205  465  1309
         MAILER-DAEMON 1709  4  4  5  8 33  55 101 198  452   849
           example.com  263  0  0  0  0  0   0   0   0    2   261
           example.org  209  0  0  0  0  0   1   3   6   11   188
           example.net    6  0  0  0  0  0   0   0   0    0     6
           example.edu    3  0  0  0  0  0   0   0   0    0     3
           example.gov    2  0  0  0  0  0   0   0   1    0     1
           example.mil    1  0  0  0  0  0   0   0   0    0     1
</pre>
</blockquote>

<p> Looking at the sender distribution, we see that, as expected,
most of the messages are bounces (null sender, listed as MAILER-DAEMON). </p>

<h2><a name="active_congestion">Example 3: Congestion in the active
queue</a></h2>

<p> This example is taken from a February 2004 discussion on the Postfix
Users list. Congestion was reported with the
"<a href="QSHAPE_README.html#active_queue">active"</a> and "<a href="QSHAPE_README.html#incoming_queue">incoming" queues</a>
large and not shrinking, despite very large delivery agent
process limits.
The thread is archived at:
<a href="https://web.archive.org/web/20120227170207/http://archives.neohapsis.com/archives/postfix/2004-02/thread.html#1371">https://web.archive.org/web/20120227170207/http://archives.neohapsis.com/archives/postfix/2004-02/thread.html#1371</a>
</p>

<p> Using an older version of <a href="qshape.1.html">qshape(1)</a>, it was quickly determined
that all the messages were for just a few destinations: </p>

<blockquote>
<pre>
$ qshape <i>(show "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#active_queue">active" queue</a> status)</i>

                           T     A  5 10 20 40 80 160 320 320+
                 TOTAL 11775  9996  0  0  1  1 42  94 221 1420
  user.sourceforge.net  7678  7678  0  0  0  0  0   0   0    0
 lists.sourceforge.net  2313  2313  0  0  0  0  0   0   0    0
        gzd.gotdns.com   102     0  0  0  0  0  0   0   2  100
</pre>
</blockquote>

<p> The "A" column showed the count of messages in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>,
and the numbered columns showed totals for the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>. With
nearly 10000 messages (the Postfix 1.x "<a href="QSHAPE_README.html#active_queue">active" queue</a> size limit), the "<a href="QSHAPE_README.html#active_queue">active" queue</a>
was effectively full. The "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a> was growing rapidly. </p>

<p> With the trouble destinations clearly identified, the administrator
quickly found and fixed the problem. It is substantially harder to
glean the same information from the logs. While a careful reading
of <a href="mailq.1.html">mailq(1)</a> output should yield similar results, it is much harder
to gauge the magnitude of the problem by looking at the queue
one message at a time.
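</p>

<p> To gauge the magnitude of a problem from the table itself, you can
divide each domain's "T" count by the TOTAL row. The sketch below does
this with awk on qshape output read from standard input. It assumes
that the TOTAL row precedes the domain rows, as in the examples above.
The function name domain_share is hypothetical. </p>

```shell
#!/bin/sh
# domain_share: hypothetical helper, not part of Postfix.
# Reads qshape output on stdin and prints each domain's share of the
# TOTAL count, using the "T" column (field 2) of every row.
domain_share() {
    awk '
    NF == 0 { next }                     # ignore blank lines
    $1 == "TOTAL" { total = $2; next }   # remember the grand total
    total + 0 == 0 { next }              # skip header lines before TOTAL
    { printf "%s %.1f%%\n", $1, $2 * 100 / total }'
}

printf '%s\n' \
    "T A 5 10 20 40 80 160 320 320+" \
    "TOTAL 11775 9996 0 0 1 1 42 94 221 1420" \
    "user.sourceforge.net 7678 7678 0 0 0 0 0 0 0 0" \
    "lists.sourceforge.net 2313 2313 0 0 0 0 0 0 0 0" \
    "gzd.gotdns.com 102 0 0 0 0 0 0 0 2 100" | domain_share
```

<p> For the data above, the two sourceforge.net destinations account
for about 65.2% and 19.6% of the queue. A single summary like this
makes the scale of the problem obvious at once.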
</p>

<h2><a name="backlog">Example 4: High volume destination backlog</a></h2>

<p> When a site you send a lot of email to is down or slow, mail
messages will rapidly build up in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>, or worse, in
the "<a href="QSHAPE_README.html#active_queue">active" queue</a>. The qshape output will show large numbers for
the destination domain in all age buckets from the present back to
the time the problem started: </p>

<blockquote>
<pre>
$ qshape deferred | head

                          T    5   10   20   40   80  160  320  640 1280 1280+
                 TOTAL 5000  200  200  400  800 1600 1000  200  200  200   200
        highvolume.com 4000  160  160  320  640 1280 1440    0    0    0     0
                   ...
</pre>
</blockquote>

<p> Here the "highvolume.com" destination is continuing to accumulate
deferred mail. The "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#active_queue">active" queues</a> are fine, but the
"<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> started growing some time between 80 and 160 minutes ago
(the oldest non-empty bucket) and continues to grow. </p>

<p> If the high volume destination is not down, but is instead
slow, one might see similar congestion in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>.
"<a href="QSHAPE_README.html#active_queue">Active" queue</a> congestion is a greater cause for alarm; one might need to
take measures to ensure that the mail is deferred instead, or even
add an <a href="access.5.html">access(5)</a> rule asking the sender to try again later. </p>

<p> If a high volume destination exhibits frequent bursts of consecutive
connections refused by all MX hosts or "421 Server busy" errors, it
is possible for the queue manager to mark the destination as "dead"
despite the transient nature of the errors.
The destination will be
retried after the $<a href="postconf.5.html#minimal_backoff_time">minimal_backoff_time</a> timer expires.
If the error bursts are frequent enough, only a small
quantity of email may be delivered before the destination is again marked
"dead". In some cases, enabling static (not on-demand) connection
caching by listing the appropriate nexthop domain in a table included in
"<a href="postconf.5.html#smtp_connection_cache_destinations">smtp_connection_cache_destinations</a>" may help to reduce the error rate,
because most messages will reuse existing connections. </p>

<p> The MTA that has been observed most frequently to exhibit such
bursts of errors is Microsoft Exchange, which refuses connections
under load. Some proxy virus scanners in front of the Exchange
server propagate the refused connection to the client as a "421"
error. </p>

<p> Note that it is now possible to configure Postfix to exhibit similarly
erratic behavior by misconfiguring the <a href="anvil.8.html">anvil(8)</a> service. Do not use
<a href="anvil.8.html">anvil(8)</a> for steady-state rate limiting; its purpose is (unintentional)
DoS prevention, and the rate limits set should be very generous! </p>

<p> If one finds oneself needing to deliver a high volume of mail to a
destination that exhibits frequent brief bursts of errors, and connection
caching does not solve the problem, there is a subtle workaround. </p>

<ul>

<li> <p> Postfix version 2.5 and later: </p>

<ul>

<li> <p> In <a href="master.5.html">master.cf</a>, set up a dedicated clone of the "smtp" transport
for the destination in question. In the example below we will call
it "fragile". </p>

<li> <p> In <a href="master.5.html">master.cf</a>, configure a reasonable process limit for the
cloned smtp transport (a number in the 10-20 range is typical).
</p>

<li> <p> IMPORTANT!!! In <a href="postconf.5.html">main.cf</a>, configure a large per-destination
pseudo-cohort failure limit for the cloned smtp transport. </p>

<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#transport_maps">transport_maps</a> = <a href="DATABASE_README.html#types">hash</a>:/etc/postfix/transport
    fragile_destination_concurrency_failed_cohort_limit = 100
    fragile_destination_concurrency_limit = 20

/etc/postfix/transport:
    example.com  fragile:

/etc/postfix/<a href="master.5.html">master.cf</a>:
    # service type  private unpriv  chroot  wakeup  maxproc command
    fragile   unix     -      -       n       -       20    smtp
</pre>

<p> See also the documentation for
<a href="postconf.5.html#default_destination_concurrency_failed_cohort_limit">default_destination_concurrency_failed_cohort_limit</a> and
<a href="postconf.5.html#default_destination_concurrency_limit">default_destination_concurrency_limit</a>. </p>

</ul>

<li> <p> Earlier Postfix versions: </p>

<ul>

<li> <p> In <a href="master.5.html">master.cf</a>, set up a dedicated clone of the "smtp"
transport for the destination in question. In the example below
we will call it "fragile". </p>

<li> <p> In <a href="master.5.html">master.cf</a>, configure a reasonable process limit for the
transport (a number in the 10-20 range is typical). </p>

<li> <p> IMPORTANT!!! In <a href="postconf.5.html">main.cf</a>, configure a very large initial
and destination concurrency limit for this transport (say 2000).
</p>

<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#transport_maps">transport_maps</a> = <a href="DATABASE_README.html#types">hash</a>:/etc/postfix/transport
    <a href="postconf.5.html#initial_destination_concurrency">initial_destination_concurrency</a> = 2000
    fragile_destination_concurrency_limit = 2000

/etc/postfix/transport:
    example.com  fragile:

/etc/postfix/<a href="master.5.html">master.cf</a>:
    # service type  private unpriv  chroot  wakeup  maxproc command
    fragile   unix     -      -       n       -       20    smtp
</pre>

<p> See also the documentation for <a href="postconf.5.html#default_destination_concurrency_limit">default_destination_concurrency_limit</a>.
</p>

</ul>

</ul>

<p> The effect of these configurations is that long runs of consecutive
errors are tolerated without marking the destination dead. With Postfix
2.5 and later the limit is 100 failed pseudo-cohorts; with earlier
versions it is up to 2000 consecutive errors. Meanwhile, the total
concurrency remains reasonable (10-20 processes). This trick is only
for a very specialized situation: high volume delivery into a channel
with multi-error bursts that is capable of high throughput, but is
repeatedly throttled by the bursts of errors. </p>

<p> When a destination is unable to handle the load even after the
Postfix process limit is reduced to 1, a desperate measure is to
insert brief delays between delivery attempts. </p>

<ul>

<li> <p> Postfix version 2.5 and later: </p>

<ul>

<li> <p> In <a href="master.5.html">master.cf</a>, set up a dedicated clone of the "smtp" transport
for the problem destination. In the example below we call it "slow".
</p>

<li> <p> In <a href="postconf.5.html">main.cf</a>, configure a short delay between deliveries to
the same destination.
</p>

<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#transport_maps">transport_maps</a> = <a href="DATABASE_README.html#types">hash</a>:/etc/postfix/transport
    slow_destination_rate_delay = 1
    slow_destination_concurrency_failed_cohort_limit = 100

/etc/postfix/transport:
    example.com  slow:

/etc/postfix/<a href="master.5.html">master.cf</a>:
    # service type  private unpriv  chroot  wakeup  maxproc command
    slow      unix     -      -       n       -        -    smtp
</pre>

</ul>

<p> See also the documentation for <a href="postconf.5.html#default_destination_rate_delay">default_destination_rate_delay</a>. </p>

<p> This solution forces the Postfix <a href="smtp.8.html">smtp(8)</a> client to wait for
$slow_destination_rate_delay seconds between deliveries to the same
destination. </p>

<p> IMPORTANT!! The large slow_destination_concurrency_failed_cohort_limit
value is needed. It prevents Postfix from deferring all mail for
the same destination after only one connection or handshake error,
because a non-zero slow_destination_rate_delay forces a per-destination
concurrency of 1. </p>

<li> <p> Earlier Postfix versions: </p>

<ul>

<li> <p> In the transport map entry for the problem destination,
specify a dead host as the primary nexthop. </p>

<li> <p> In the <a href="master.5.html">master.cf</a> entry for the transport, specify the
problem destination as the <a href="postconf.5.html#fallback_relay">fallback_relay</a> and specify a small
<a href="postconf.5.html#smtp_connect_timeout">smtp_connect_timeout</a> value.
</p>

<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#transport_maps">transport_maps</a> = <a href="DATABASE_README.html#types">hash</a>:/etc/postfix/transport

/etc/postfix/transport:
    example.com  slow:[dead.host]

/etc/postfix/<a href="master.5.html">master.cf</a>:
    # service type  private unpriv  chroot  wakeup  maxproc command
    slow      unix     -      -       n       -        1    smtp
        -o <a href="postconf.5.html#fallback_relay">fallback_relay</a>=problem.example.com
        -o <a href="postconf.5.html#smtp_connect_timeout">smtp_connect_timeout</a>=1
        -o <a href="postconf.5.html#smtp_connection_cache_on_demand">smtp_connection_cache_on_demand</a>=no
</pre>

</ul>

<p> This solution forces the Postfix <a href="smtp.8.html">smtp(8)</a> client to wait for
$<a href="postconf.5.html#smtp_connect_timeout">smtp_connect_timeout</a> seconds between deliveries: each delivery first
times out trying to connect to the dead primary nexthop, and only then
falls back to the problem destination. The connection
caching feature is disabled to prevent the client from skipping
over the dead host. </p>

</ul>

<h2><a name="queues">Postfix queue directories</a></h2>

<p> The following sections describe Postfix queues: their purpose,
what normal behavior looks like, and how to diagnose abnormal
behavior. </p>

<h3> <a name="maildrop_queue"> The "maildrop" queue </a> </h3>

<p> Messages that have been submitted via the Postfix <a href="sendmail.1.html">sendmail(1)</a>
command, but not yet brought into the main Postfix queue by the
<a href="pickup.8.html">pickup(8)</a> service, await processing in the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a>. Messages
can be added to the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a> even when the Postfix system
is not running. They will begin to be processed once Postfix is
started.
</p>

<p> The "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a> is drained by the single-threaded <a href="pickup.8.html">pickup(8)</a>
service. This service scans the queue directory periodically, or when the
<a href="postdrop.1.html">postdrop(1)</a> program notifies it of new message arrival. The <a href="postdrop.1.html">postdrop(1)</a>
program is a setgid helper that allows the unprivileged Postfix
<a href="sendmail.1.html">sendmail(1)</a> program to inject mail into the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a> and
to notify the <a href="pickup.8.html">pickup(8)</a> service of its arrival. </p>

<p> All mail that enters the main Postfix queue does so via the
<a href="cleanup.8.html">cleanup(8)</a> service. The cleanup service is responsible for envelope
and header rewriting, header and body regular expression checks,
automatic bcc recipient processing, milter content processing, and
reliable insertion of the message into the Postfix "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>. </p>

<p> In the absence of excessive CPU consumption in <a href="cleanup.8.html">cleanup(8)</a> header
or body regular expression checks, or other software consuming all
available CPU resources, Postfix performance is disk I/O bound.
The rate at which the <a href="pickup.8.html">pickup(8)</a> service can inject messages into
the queue is largely determined by disk access times, since the
<a href="cleanup.8.html">cleanup(8)</a> service must commit the message to stable storage before
returning success. The same is true of the <a href="postdrop.1.html">postdrop(1)</a> program
writing the message to the "maildrop" directory. </p>

<p> As the pickup service is single-threaded, it can only process
one message at a time. Its rate cannot exceed the reciprocal of the
cleanup service's per-message disk I/O latency (plus CPU time, if not negligible).
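</p>

<p> The reciprocal relationship is easy to evaluate by hand. The
sketch below computes the resulting upper bound on messages per second
from a per-message commit latency given in milliseconds. The latency
values used are made-up examples, not measurements, and the function
name max_serial_rate is hypothetical. </p>

```shell
#!/bin/sh
# max_serial_rate MS: hypothetical helper, not part of Postfix.
# A single-threaded stage that waits for each message to reach stable
# storage before starting the next one cannot exceed 1 / latency:
#   rate (msg/s) = 1000 / latency (ms)
max_serial_rate() {
    awk -v ms="$1" 'BEGIN { printf "%.0f\n", 1000 / ms }'
}

max_serial_rate 5     # 5 ms per message: 200 msg/s at most
max_serial_rate 20    # 20 ms per message: 50 msg/s at most
```

<p> Halving the per-message latency doubles this ceiling, which is why
disk I/O dominates local injection performance.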
</p>

<p> Congestion in this queue indicates an excessive local message
submission rate. It may also indicate excessive CPU consumption in the
<a href="cleanup.8.html">cleanup(8)</a> service, caused by excessive <a href="postconf.5.html#body_checks">body_checks</a> or
(Postfix ≥ 2.3) high-latency milters. </p>

<p> Note that once the "<a href="QSHAPE_README.html#active_queue">active" queue</a> is full, the cleanup service
will attempt to slow down message injection by pausing $<a href="postconf.5.html#in_flow_delay">in_flow_delay</a>
for each message. In this case, "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a> congestion may be
a consequence of congestion downstream, rather than a problem in
its own right. </p>

<p> Note that you should not attempt to deliver large volumes of mail via
the <a href="pickup.8.html">pickup(8)</a> service. High volume sites should avoid using "simple"
content filters that re-inject scanned mail via Postfix <a href="sendmail.1.html">sendmail(1)</a>
and <a href="postdrop.1.html">postdrop(1)</a>. </p>

<p> A high arrival rate of locally submitted mail may be an indication
of an uncaught forwarding loop, or a runaway notification program.
Try to keep the volume of local mail injection to a moderate level.
</p>

<p> The "postsuper -r" command can place selected messages into
the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a> for reprocessing. This is most useful for
resetting any stale <a href="postconf.5.html#content_filter">content_filter</a> settings. Requeuing a large number
of messages using "postsuper -r" will naturally cause a spike in the
size of the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a>.
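</p>

<p> To spot such a spike, it can help to count queue files per queue
directory. The sketch below assumes the usual layout, in which each
queue is a subdirectory of $queue_directory (whose value
"postconf -h queue_directory" prints). It also assumes that each message
is one regular file, possibly inside hashed subdirectories (see
hash_queue_names). Reading the queue normally requires root privileges.
The function name count_queue_files is hypothetical. </p>

```shell
#!/bin/sh
# count_queue_files ROOT QUEUE...: hypothetical helper, not part of Postfix.
# Prints "QUEUE COUNT" for each named queue directory under ROOT.
# find(1) descends into hashed subdirectories, so nested files count too.
count_queue_files() {
    root=$1
    shift
    for q in "$@"; do
        if [ -d "$root/$q" ]; then
            n=$(find "$root/$q" -type f | wc -l | tr -d ' ')
            printf '%s %s\n' "$q" "$n"
        else
            printf '%s missing\n' "$q"
        fi
    done
}

# Example on a live system (as root):
#   count_queue_files "$(postconf -h queue_directory)" maildrop incoming active deferred
```

<p> Running this periodically, for example before and after a large
"postsuper -r", shows whether the "maildrop" count is draining or
still growing.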
</p>

<h3> <a name="hold_queue"> The "hold" queue </a> </h3>

<p> The administrator can define "smtpd" <a href="access.5.html">access(5)</a> policies or
<a href="cleanup.8.html">cleanup(8)</a> header/body checks that cause messages to be automatically
diverted from normal processing and placed indefinitely in the
"<a href="QSHAPE_README.html#hold_queue">hold" queue</a>. Messages placed in the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a> stay there until
the administrator intervenes. No periodic delivery attempts are
made for messages in the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a>. The <a href="postsuper.1.html">postsuper(1)</a> command
can be used to manually release messages into the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>.
</p>

<p> Messages can potentially stay in the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a> longer than
$<a href="postconf.5.html#maximal_queue_lifetime">maximal_queue_lifetime</a>. If such "old" messages need to be released from
the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a>, they should typically be moved into the "<a href="QSHAPE_README.html#maildrop_queue">maildrop" queue</a>
using "postsuper -r", so that each message gets a new timestamp and
is given more than one opportunity to be delivered. Messages that are
"young" can be moved directly into the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> using
"postsuper -H". </p>

<p> The "<a href="QSHAPE_README.html#hold_queue">hold" queue</a> plays little role in Postfix performance, and
monitoring of the "<a href="QSHAPE_README.html#hold_queue">hold" queue</a> is typically motivated more
by tracking spam and malware than by performance issues.
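</p>

<p> The choice between "postsuper -r" and "postsuper -H" can be sketched
as a small decision helper. This is a hypothetical illustration, not a
Postfix tool. The to_seconds function converts a Postfix time value
such as $maximal_queue_lifetime to seconds. It accepts the units s, m,
h, d and w, and treats a bare number as days, the default unit for that
parameter. The release_op function then suggests -r for a message whose
age has reached that lifetime, and -H otherwise. Using the full lifetime
as the cut-off is an assumption; a site may prefer a safety margin. </p>

```shell
#!/bin/sh
# Hypothetical helpers, not part of Postfix.

# to_seconds VALUE: convert a Postfix time value (5d, 12h, 90m, 3600s,
# 1w) to seconds. A bare number is treated as days, the default unit
# of maximal_queue_lifetime. Exits non-zero on an unknown unit.
to_seconds() {
    awk -v v="$1" 'BEGIN {
        mult["s"] = 1; mult["m"] = 60; mult["h"] = 3600
        mult["d"] = 86400; mult["w"] = 604800
        unit = substr(v, length(v))
        if (unit ~ /[0-9]/) unit = "d"
        if (!(unit in mult)) exit 1
        secs = (v + 0) * mult[unit]
        print secs
    }'
}

# release_op AGE_SECONDS LIFETIME: print the postsuper option suggested
# for a held message of the given age.
release_op() {
    limit=$(to_seconds "$2") || return 1
    if [ "$1" -ge "$limit" ]; then
        printf '%s\n' "-r"   # old: requeue via maildrop, new timestamp
    else
        printf '%s\n' "-H"   # young: release straight to "deferred"
    fi
}

release_op 604800 5d   # held for a week
release_op 3600 5d     # held for an hour
```

<p> With the default lifetime of 5 days, for example, a message held
for a week would be requeued with "postsuper -r queue_id", while one
held for an hour could be released with "postsuper -H queue_id".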
</p>

<h3> <a name="incoming_queue"> The "incoming" queue </a> </h3>

<p> All new mail entering the Postfix queue is written by the
<a href="cleanup.8.html">cleanup(8)</a> service into the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>. New queue files are
owned by the "postfix" user and are created with an access bitmask (or
mode) of 0600. Once a queue file is ready for further processing,
the <a href="cleanup.8.html">cleanup(8)</a> service changes the queue file mode to 0700 and
notifies the queue manager of new mail arrival. The queue manager
ignores incomplete queue files whose mode is 0600, as these are
still being written by cleanup. </p>

<p> The queue manager scans the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>, bringing any new
mail into the "<a href="QSHAPE_README.html#active_queue">active" queue</a> if the "<a href="QSHAPE_README.html#active_queue">active" queue</a> resource limits
have not been exceeded. By default, the "<a href="QSHAPE_README.html#active_queue">active" queue</a> accommodates
at most 20000 messages. Once the "<a href="QSHAPE_README.html#active_queue">active" queue</a> message limit is
reached, the queue manager stops scanning the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
(and the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>, see below). </p>

<p> Under normal conditions, the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a> is nearly empty (it has
only mode 0600 files), with the queue manager able to import new
messages into the "<a href="QSHAPE_README.html#active_queue">active" queue</a> as soon as they become available.
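</p>

<p> The mode convention can be observed from the shell. The sketch
below counts regular files in a given directory whose permission bits
are exactly 0600 (still being written by cleanup) or exactly 0700
(complete, waiting for the queue manager). It uses the exact-match form
of find -perm. The function name incoming_modes is hypothetical. On a
live system, point it at the "incoming" subdirectory of
$queue_directory and run it as root. </p>

```shell
#!/bin/sh
# incoming_modes DIR: hypothetical helper, not part of Postfix.
# Counts regular files under DIR by their exact permission bits:
#   0600 = still being written by cleanup(8)
#   0700 = complete, ready for the queue manager
incoming_modes() {
    writing=$(find "$1" -type f -perm 0600 | wc -l | tr -d ' ')
    ready=$(find "$1" -type f -perm 0700 | wc -l | tr -d ' ')
    printf 'writing %s ready %s\n' "$writing" "$ready"
}
```

<p> Under normal conditions the "ready" count should stay close to zero,
because the queue manager imports complete files almost immediately.
A persistently growing "ready" count means the queue manager is falling behind.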
</p>

<p> The "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a> grows when the message input rate spikes
above the rate at which the queue manager can import messages into
the "<a href="QSHAPE_README.html#active_queue">active" queue</a>. The main factors slowing down the queue manager
are disk I/O and lookup queries to the trivial-rewrite service. If the queue
manager is routinely not keeping up, consider avoiding "slow"
lookup services (MySQL, LDAP, ...) for transport lookups, or speeding
up the hosts that provide the lookup service. If the problem is I/O
starvation, consider striping the queue over more disks, using faster
controllers with a battery-backed write cache, or making other hardware
improvements. At the very least, make sure that the queue directory is
mounted with the "noatime" option if the underlying filesystem supports it. </p>

<p> The <a href="postconf.5.html#in_flow_delay">in_flow_delay</a> parameter is used to clamp the input rate
when the queue manager starts to fall behind. The <a href="cleanup.8.html">cleanup(8)</a> service
will pause for $<a href="postconf.5.html#in_flow_delay">in_flow_delay</a> seconds before creating a new queue
file if it cannot obtain a "token" from the queue manager. </p>

<p> Since the number of <a href="cleanup.8.html">cleanup(8)</a> processes is limited in most
cases by the SMTP server concurrency, the input rate can exceed
the output rate by at most "SMTP connection count" / $<a href="postconf.5.html#in_flow_delay">in_flow_delay</a>
messages per second. </p>

<p> With the default process limit of 100 and an <a href="postconf.5.html#in_flow_delay">in_flow_delay</a> of
1s, the coupling is strong enough to limit a single runaway injector
to 1 message per second, but is not strong enough to deflect an
excessive input rate from many sources at the same time (up to
100 messages per second beyond the output rate).
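</p>

<p> The bound above is simple arithmetic. The Python sketch below
evaluates "SMTP connection count" / $<a href="postconf.5.html#in_flow_delay">in_flow_delay</a> for a single
injector, for 100 concurrent connections at the 1s default, and for
the 10s setting discussed next. The function name is illustrative only;
it models the stated bound, not Postfix's token accounting. </p>

```python
def max_excess_input_rate(smtp_connections: int, in_flow_delay: float) -> float:
    """Upper bound, in messages per second, by which input can outrun output.

    Each cleanup(8) process that cannot get a token waits in_flow_delay
    seconds before creating a queue file, so each SMTP connection adds at
    most 1 / in_flow_delay messages per second beyond the output rate.
    """
    if in_flow_delay <= 0:
        # in_flow_delay = 0 disables the throttle, so there is no bound.
        raise ValueError("in_flow_delay must be positive")
    return smtp_connections / in_flow_delay


for conns, delay in [(1, 1.0), (100, 1.0), (100, 10.0)]:
    rate = max_excess_input_rate(conns, delay)
    print(f"{conns:3d} connection(s), in_flow_delay={delay:g}s: "
          f"at most {rate:g} msg/s excess")
# prints:
#   1 connection(s), in_flow_delay=1s: at most 1 msg/s excess
# 100 connection(s), in_flow_delay=1s: at most 100 msg/s excess
# 100 connection(s), in_flow_delay=10s: at most 10 msg/s excess
```

<p> Raising in_flow_delay from 1s to 10s thus cuts the worst-case excess
tenfold, at the cost of longer pauses for clients whose cleanup(8)
process cannot obtain a token.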
</p>

<p> If a server is being hammered from multiple directions, consider
raising <a href="postconf.5.html#in_flow_delay">in_flow_delay</a> to 10 seconds, but only if the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
is growing even while the "<a href="QSHAPE_README.html#active_queue">active" queue</a> is not full and the
trivial-rewrite service is using a fast transport lookup mechanism.
</p>

<h3> <a name="active_queue"> The "active" queue </a> </h3>

<p> The queue manager is a delivery agent scheduler; it works to
ensure fast and fair delivery of mail to all destinations within
designated resource limits. </p>

<p> The "<a href="QSHAPE_README.html#active_queue">active" queue</a> is somewhat analogous to an operating system's
process run queue. Messages in the "<a href="QSHAPE_README.html#active_queue">active" queue</a> are ready to be
sent (runnable), but are not necessarily in the process of being
sent (running). </p>

<p> While most Postfix administrators think of the "<a href="QSHAPE_README.html#active_queue">active" queue</a>
as a directory on disk, the real "<a href="QSHAPE_README.html#active_queue">active" queue</a> is a set of data
structures in the memory of the queue manager process. </p>

<p> Messages in the "<a href="QSHAPE_README.html#maildrop_queue">maildrop"</a>, "<a href="QSHAPE_README.html#hold_queue">hold"</a>, "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> and "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a>
do not occupy queue manager memory; they are safely stored on
disk waiting for their turn to be processed.
The envelope information
for messages in the "<a href="QSHAPE_README.html#active_queue">active" queue</a> is managed in memory, allowing
the queue manager to do global scheduling and to allocate available
delivery agent processes to appropriate messages in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>. </p>

<p> Within the "<a href="QSHAPE_README.html#active_queue">active" queue</a>, (multi-recipient) messages are broken
up into groups of recipients that share the same transport/nexthop
combination; the group size is capped by the transport's recipient
concurrency limit. </p>

<p> Multiple recipient groups (from one or more messages) are queued
for delivery grouped by transport/nexthop combination. The
<b>destination</b> concurrency limit for each transport caps the number
of simultaneous delivery attempts for each nexthop. Transports with
a <b>recipient</b> concurrency limit of 1 are special: these are grouped
by the actual recipient address rather than the nexthop, yielding
per-recipient concurrency limits rather than per-domain
concurrency limits. Per-recipient limits are appropriate when
performing final delivery to mailboxes rather than when relaying
to a remote server. </p>

<p> Congestion occurs in the "<a href="QSHAPE_README.html#active_queue">active" queue</a> when one or more destinations
drain more slowly than mail for them arrives. </p>

<p> Input into the "<a href="QSHAPE_README.html#active_queue">active" queue</a> comes both from new mail in the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
and from retries of mail in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>. Should the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
get really large, retries of old mail can outpace the arrival
rate of new mail.
Systems with more CPU, faster disks and more network
bandwidth can deal with larger "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a>, but as a rule of thumb
the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> scales to somewhere between 100,000 and 1,000,000
messages, with good performance unlikely above that "limit". Systems with
queues this large should typically stop accepting new mail, or put the
backlog "on hold" until the underlying issue is fixed (provided that
there is enough capacity to handle just the new mail). </p>

<p> When a destination is down for some time, the queue manager will
mark it dead, and immediately defer all mail for the destination without
trying to assign it to a delivery agent. In this case the messages
will quickly leave the "<a href="QSHAPE_README.html#active_queue">active" queue</a> and end up in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
(with Postfix &lt; 2.4, this is done directly by the queue manager;
with Postfix ≥ 2.4, it is done via the "retry" delivery agent). </p>

<p> When the destination is instead simply slow, or there is a problem
causing an excessive arrival rate, the "<a href="QSHAPE_README.html#active_queue">active" queue</a> will grow and will
become dominated by mail to the congested destination. </p>

<p> The only way to reduce congestion is to either reduce the input
rate or increase the throughput. Since throughput is roughly the number
of concurrent deliveries divided by the average delivery latency,
increasing the throughput requires either increasing the concurrency
or reducing the latency of deliveries. </p>

<p> For high-volume sites a key tuning parameter is the number of
"smtp" delivery agents allocated to the "smtp" and "relay" transports.
High-volume sites tend to send to many different destinations, many
of which may be down or slow, so a good fraction of the available
delivery agents will be blocked waiting for slow sites.
Also, mail
destined for servers across the globe incurs large SMTP command-response
latencies, so high message throughput can only be achieved with
more concurrent delivery agents. </p>

<p> The default "smtp" process limit of 100 is good enough for most
sites, and may even need to be lowered for sites with low-bandwidth
connections (there is no use increasing concurrency once the network pipe
is full). When the queue is growing on an "idle"
system (CPU, disk I/O and network not exhausted), the remaining
reason for congestion is insufficient concurrency in the face of
a high average latency. If the number of outbound SMTP connections
(either ESTABLISHED or SYN_SENT) reaches the process limit, mail
is draining slowly, and the system and network are not loaded, raise
the "smtp" and/or "relay" process limits! </p>

<p> When a high-volume destination is served by multiple MX hosts with
typically low delivery latency, performance can suffer dramatically when
one of the MX hosts is unresponsive and SMTP connections to that host
time out. For example, if there are 2 equal-preference MX hosts, the SMTP
connection timeout is 30 seconds and one of the MX hosts is down, the
average SMTP connection will take approximately 15 seconds to complete.
With a default per-destination concurrency limit of 20 connections,
throughput falls to just over 1 message per second (20 connections / 15
seconds ≈ 1.3 messages per second). </p>

<p> The best way to avoid bottlenecks when one or more MX hosts are
non-responsive is to use connection caching. Connection caching was
introduced with Postfix 2.2 and is by default enabled on demand for
destinations with a backlog of mail in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>.
When connection
caching is in effect for a particular destination, established connections
are re-used to send additional messages; this reduces the number of
connections made per message delivery and maintains good throughput even
in the face of partial unavailability of the destination's MX hosts. </p>

<p> If connection caching is not available (Postfix &lt; 2.2) or does
not provide a sufficient latency reduction, especially for the "relay"
transport used to forward mail to "your own" domains, consider setting
lower-than-default SMTP connection timeouts (1-5 seconds) and
higher-than-default destination concurrency limits. This will further
reduce latency and provide more concurrency to maintain throughput should
latency rise. </p>

<p> Setting high concurrency limits for domains that are not your own may
be viewed as hostile by the receiving system, and steps may be taken
to prevent you from monopolizing the destination system's resources.
The defensive measures may substantially reduce your throughput or block
access entirely. Do not set aggressive concurrency limits for remote
domains without coordinating with the administrators of the target
domain. </p>

<p> If necessary, dedicate and tune custom transports for selected
high-volume destinations. The "relay" transport is provided for forwarding mail
to domains for which your server is a primary or backup MX host. These can
make up a substantial fraction of your email traffic. Use the "relay" and
not the "smtp" transport to send email to these domains. Using the "relay"
transport allocates a separate delivery agent pool to these destinations
and allows separate tuning of timeouts and concurrency limits. </p>

<p> Another common cause of congestion is unwarranted flushing of the
entire "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>.
The "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> holds messages that are likely
to fail to be delivered and are also likely to be slow to fail delivery
(time out). As a result, the most common reaction to a large "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
(flush it!) is more than likely counter-productive, and typically makes
the congestion worse. Do not flush the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> unless you expect
that most of its content has recently become deliverable (e.g. the <a href="postconf.5.html#relayhost">relayhost</a>
is back up after an outage)! </p>

<p> Note that whenever the queue manager is restarted, there may
already be messages in the "<a href="QSHAPE_README.html#active_queue">active" queue</a> directory, but the "real"
"<a href="QSHAPE_README.html#active_queue">active" queue</a> in memory is empty. In order to recover the in-memory
state, the queue manager moves all the "<a href="QSHAPE_README.html#active_queue">active" queue</a> messages
back into the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>, and then uses its normal "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
scan to refill the "<a href="QSHAPE_README.html#active_queue">active" queue</a>. The process of moving all
the messages back and forth, redoing transport table (<a href="trivial-rewrite.8.html">trivial-rewrite(8)</a>
resolve service) lookups, and re-importing the messages into
memory is expensive. At all costs, avoid frequent restarts of the
queue manager (e.g. via frequent execution of "postfix reload").
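</p>

<p> The multiple-MX example earlier in this section (2 equal-preference
MX hosts, one of them down, a 30 second connection timeout and 20
concurrent connections) can be reproduced with a little arithmetic.
The Python sketch below is a simplified model, not anything Postfix
computes, and its function names are hypothetical. It assumes that MX
hosts of equal preference are tried in random order, that every
unreachable host costs a full connection timeout, that healthy deliveries
take a fixed time, that each connection carries one message (no connection
caching), and that throughput is concurrency divided by mean delivery time.
It also ignores the limits Postfix places on how many MX addresses are
tried per delivery. </p>

```python
def mean_delivery_seconds(mx_up: int, mx_down: int, connect_timeout: float,
                          healthy_latency: float = 0.0) -> float:
    """Mean time per delivery when equal-preference MX hosts are tried in
    random order and every unreachable host costs a full connect timeout."""
    if mx_up < 1:
        raise ValueError("model needs at least one reachable MX host")
    # In a random ordering, each down host comes before all up hosts with
    # probability 1 / (mx_up + 1); summing gives the expected timeout count.
    expected_timeouts = mx_down / (mx_up + 1)
    return expected_timeouts * connect_timeout + healthy_latency


def throughput(concurrency: int, mean_seconds: float) -> float:
    """Messages per second with one message per connection (no caching)."""
    if mean_seconds <= 0:
        raise ValueError("mean delivery time must be positive")
    return concurrency / mean_seconds


for timeout in (30.0, 5.0):
    mean = mean_delivery_seconds(mx_up=1, mx_down=1, connect_timeout=timeout)
    print(f"timeout {timeout:g}s: ~{mean:g}s per delivery, "
          f"{throughput(20, mean):.2f} msg/s at concurrency 20")
# timeout 30s: ~15s per delivery, 1.33 msg/s at concurrency 20
# timeout 5s: ~2.5s per delivery, 8.00 msg/s at concurrency 20
```

<p> The 5 second case corresponds to the lower connection timeouts
suggested above for the "relay" transport. Connection caching, which
this model leaves out, avoids paying the timeout for every message.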
</p>

<h3> <a name="deferred_queue"> The "deferred" queue </a> </h3>

<p> When all the deliverable recipients for a message are delivered,
and for some recipients delivery failed for a transient reason (it
might succeed later), the message is placed in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>.
</p>

<p> The queue manager scans the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> periodically. The scan
interval is controlled by the <a href="postconf.5.html#queue_run_delay">queue_run_delay</a> parameter. While a "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
scan is in progress, if an "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a> scan is also in progress
(ideally these are brief since the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a> should be short), the
queue manager alternates between looking for messages in the "<a href="QSHAPE_README.html#incoming_queue">incoming" queue</a>
and in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>. This "round-robin" strategy prevents
starvation of either the "<a href="QSHAPE_README.html#incoming_queue">incoming"</a> or the "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a>. </p>

<p> Each "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> scan only brings a fraction of the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
back into the "<a href="QSHAPE_README.html#active_queue">active" queue</a> for a retry. This is because each
message in the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> is assigned a "cool-off" time when
it is deferred. This is done by time-warping the modification
time of the queue file into the future. The queue file is not
eligible for a retry until its modification time has been reached.
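</p>

<p> The time-warp mechanism can be modeled with ordinary file timestamps.
The Python sketch below is an illustration, not Postfix's implementation:
the hypothetical defer() sets a file's modification time to "now +
cool-off", and eligible_for_retry() applies the test just described. For
the cool-off it assumes one reading of the rule given in the next
paragraph: a cool-off equal to the message's current age, clamped to the
Postfix ≥ 2.4 defaults of 300s ($<a href="postconf.5.html#minimal_backoff_time">minimal_backoff_time</a>) and 4000s
($<a href="postconf.5.html#maximal_backoff_time">maximal_backoff_time</a>), so that an unclamped retry falls due when
the message's age has doubled. </p>

```python
import os
import time
from typing import Optional

MINIMAL_BACKOFF_TIME = 300.0   # seconds; Postfix >= 2.4 default
MAXIMAL_BACKOFF_TIME = 4000.0  # seconds; Postfix default


def cool_off(age: float, lo: float = MINIMAL_BACKOFF_TIME,
             hi: float = MAXIMAL_BACKOFF_TIME) -> float:
    """Assumed rule: cool-off equals the message age, clamped to [lo, hi]."""
    return min(max(age, lo), hi)


def defer(path: str, arrival: float, now: float) -> float:
    """Time-warp the file's mtime into the future; return the retry time."""
    retry_at = now + cool_off(now - arrival)
    os.utime(path, (now, retry_at))  # (atime, mtime)
    return retry_at


def eligible_for_retry(path: str, now: Optional[float] = None) -> bool:
    """A deferred file may be retried once its mtime is no longer in the future."""
    if now is None:
        now = time.time()
    return os.stat(path).st_mtime <= now


if __name__ == "__main__":
    import tempfile
    fd, path = tempfile.mkstemp()
    os.close(fd)
    t0 = float(int(time.time()))  # whole seconds keep the mtime exact
    retry_at = defer(path, arrival=t0 - 1000, now=t0)  # 1000s old: 1000s cool-off
    print(eligible_for_retry(path, now=t0))            # False
    print(eligible_for_retry(path, now=retry_at))      # True
    os.remove(path)
```

<p> However old a message gets, the clamp keeps the gap between its
retries at no more than $<a href="postconf.5.html#maximal_backoff_time">maximal_backoff_time</a>.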
</p>

<p> The "cool-off" time is at least $<a href="postconf.5.html#minimal_backoff_time">minimal_backoff_time</a> and at
most $<a href="postconf.5.html#maximal_backoff_time">maximal_backoff_time</a>. The next retry time is set by doubling
the message's age in the queue, and adjusting up or down to lie
within the limits. This means that young messages are initially
retried more often than old messages. </p>

<p> If a high-volume site routinely has large "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a>, it
may be useful to adjust the <a href="postconf.5.html#queue_run_delay">queue_run_delay</a>, <a href="postconf.5.html#minimal_backoff_time">minimal_backoff_time</a> and
<a href="postconf.5.html#maximal_backoff_time">maximal_backoff_time</a> to provide short enough delays on first failure
(Postfix ≥ 2.4 has a sensibly low minimal backoff time of 300s by default),
with perhaps longer delays after multiple failures, to reduce the
retransmission rate of old messages and thereby reduce the quantity
of previously deferred mail in the "<a href="QSHAPE_README.html#active_queue">active" queue</a>. If you want a really
low <a href="postconf.5.html#minimal_backoff_time">minimal_backoff_time</a>, you may also want to lower <a href="postconf.5.html#queue_run_delay">queue_run_delay</a>,
but understand that more frequent scans will increase the demand for
disk I/O. </p>

<p> One common cause of large "<a href="QSHAPE_README.html#deferred_queue">deferred" queues</a> is failure to validate
recipients at the SMTP input stage. Since spammers routinely launch
dictionary attacks from unrepliable sender addresses, the bounces
for invalid recipient addresses clog the "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> (and at high
volumes proportionally clog the "<a href="QSHAPE_README.html#active_queue">active" queue</a>).
Recipient validation
is strongly recommended through use of the <a href="postconf.5.html#local_recipient_maps">local_recipient_maps</a> and
<a href="postconf.5.html#relay_recipient_maps">relay_recipient_maps</a> parameters. Even when bounces drain quickly, they
inundate innocent victims of forgery with unwanted email. To avoid
this, do not accept mail for invalid recipients. </p>

<p> When a host with lots of deferred mail is down for some time,
it is possible for the entire "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a> to reach its retry
time simultaneously. This can lead to a very full "<a href="QSHAPE_README.html#active_queue">active" queue</a> once
the host comes back up. The phenomenon can repeat approximately
every <a href="postconf.5.html#maximal_backoff_time">maximal_backoff_time</a> seconds if the messages are again deferred
after a brief burst of congestion. Perhaps a future Postfix release
will add a random offset to the retry time (or use a combination
of strategies) to reduce the odds of repeated complete "<a href="QSHAPE_README.html#deferred_queue">deferred" queue</a>
flushes. </p>

<h2><a name="credits">Credits</a></h2>

<p> The <a href="qshape.1.html">qshape(1)</a> program was developed by Victor Duchovni of Morgan
Stanley, who also wrote the initial version of this document. </p>

</body>

</html>