Home | History | Annotate | Line # | Download | only in proto
      1 <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
      2 	"https://www.w3.org/TR/html4/loose.dtd">
      3 
      4 <html>
      5 
      6 <head>
      7 
      8 <title>Postfix Queue Scheduler</title>
      9 
     10 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
     11 <link rel='stylesheet' type='text/css' href='postfix-doc.css'>
     12 
     13 </head>
     14 
     15 <body>
     16 
     17 <h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
     18 Queue Scheduler</h1>
     19 
     20 <hr>
     21 
     22 <h2> Disclaimer </h2>
     23 
     24 <p> Many of the <i>transport</i>-specific configuration parameters
     25 discussed in this document will not show up in "postconf" command
     26 output before Postfix version 2.9. This limitation applies to many
     27 parameters whose name is a combination of a master.cf service name
     28 such as "relay" and a built-in suffix such as
     29 "_destination_concurrency_limit". </p>
     30 
     31 <h2> Overview </h2>
     32 
     33 <p> The queue manager is by far the most complex part of the Postfix
     34 mail system. It schedules delivery of new mail, retries failed
     35 deliveries at specific times, and removes mail from the queue after
     36 the last delivery attempt.  There are two major classes of mechanisms
     37 that control the operation of the queue manager. </p>
     38 
     39 <p> Topics covered by this document: </p>
     40 
     41 <ul>
     42 
     43 <li> <a href="#concurrency"> Concurrency scheduling</a>, concerned
     44 with the number of concurrent deliveries to a specific destination,
     45 including decisions on when to suspend deliveries after persistent
     46 failures.
     47 
     48 <li> <a href="#jobs"> Preemptive scheduling</a>, concerned with
     49 the selection of email messages and recipients for a given destination.
     50 
     51 <li> <a href="#credits"> Credits</a>, something this document would not be
     52 complete without.
     53 
     54 </ul>
     55 
     56 <!--
     57 
     58 <p> Once started, the qmgr(8) process runs until "postfix reload"
     59 or "postfix stop".  As a persistent process, the queue manager has
     60 to meet strict requirements with respect to code correctness and
     61 robustness. Unlike non-persistent daemon processes, the queue manager
     62 cannot benefit from Postfix's process rejuvenation mechanism that
     63 limit the impact from resource leaks and other coding errors
     64 (translation: replacing a process after a short time covers up bugs
     65 before they can become a problem).  </p>
     66 
     67 -->
     68 
     69 <h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
     70 
     71 <p> The following sections document the Postfix 2.5 concurrency
     72 scheduler, after a discussion of the limitations of the earlier
     73 concurrency scheduler. This is followed by results of medium-concurrency
     74 experiments, and a discussion of trade-offs between performance and
     75 robustness.  </p>
     76 
     77 <p> The material is organized as follows: </p>
     78 
     79 <ul>
     80 
     81 <li> <a href="#concurrency_drawbacks"> Drawbacks of the existing
     82 concurrency scheduler </a>
     83 
     84 <li> <a href="#concurrency_summary_2_5"> Summary of the Postfix 2.5
     85 concurrency feedback algorithm </a>
     86 
     87 <li> <a href="#dead_summary_2_5"> Summary of the Postfix 2.5 "dead
     88 destination" detection algorithm </a>
     89 
     90 <li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix 2.5
     91 concurrency scheduler </a>
     92 
     93 <li> <a href="#concurrency_results"> Results for delivery to
     94 concurrency limited servers </a>
     95 
     96 <li> <a href="#concurrency_discussion"> Discussion of concurrency
     97 limited server results </a>
     98 
     99 <li> <a href="#concurrency_limitations"> Limitations of less-than-1
    100 per delivery feedback </a>
    101 
    102 <li> <a href="#concurrency_config"> Concurrency configuration
    103 parameters </a>
    104 
    105 </ul>
    106 
    107 <h3> <a name="concurrency_drawbacks"> Drawbacks of the existing
    108 concurrency scheduler </a> </h3>
    109 
    110 <p> From the start, Postfix has used a simple but robust algorithm
    111 where the per-destination delivery concurrency is decremented by 1
    112 after delivery failed due to connection or handshake failure, and
    113 incremented by 1 otherwise.  Of course the concurrency is never
    114 allowed to exceed the maximum per-destination concurrency limit.
    115 And when a destination's concurrency level drops to zero, the
    116 destination is declared "dead" and delivery is suspended.  </p>
    117 
    118 <p> Drawbacks of +/-1 concurrency feedback per delivery are: <p>
    119 
    120 <ul>
    121 
    122 <li> <p> Overshoot due to exponential delivery concurrency growth
    123 with each pseudo-cohort(*). This can be an issue with high-concurrency
    124 channels. For example, with the default initial concurrency of 5,
    125 concurrency would proceed over time as (5-10-20).  </p>
    126 
    127 <li> <p> Throttling down to zero concurrency after a single
    128 pseudo-cohort(*) failure. This was especially an issue with
    129 low-concurrency channels where a single failure could be sufficient
    130 to mark a destination as "dead", causing the suspension of further
    131 deliveries to the affected destination. </p>
    132 
    133 </ul>
    134 
    135 <p> (*) A pseudo-cohort is a number of delivery requests equal to
    136 a destination's delivery concurrency. </p>
    137 
    138 <p> The revised concurrency scheduler has a highly modular structure.
    139 It uses separate mechanisms for per-destination concurrency control
    140 and for "dead destination" detection.  The concurrency control in
    141 turn is built from two separate mechanisms: it supports less-than-1
    142 feedback per delivery to allow for more gradual concurrency
    143 adjustments, and it uses feedback hysteresis to suppress concurrency
    144 oscillations.  And instead of waiting for delivery concurrency to
    145 throttle down to zero, a destination is declared "dead" after a
    146 configurable number of pseudo-cohorts reports connection or handshake
    147 failure.  </p>
    148 
    149 <h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5
    150 concurrency feedback algorithm </a> </h3>
    151 
    152 <p> We want to increment a destination's delivery concurrency when
    153 some (not necessarily consecutive) number of deliveries complete
    154 without connection or handshake failure.  This is implemented with
    155 positive feedback g(N) where N is the destination's delivery
    156 concurrency.  With g(N)=1 feedback per delivery, concurrency increases
    157 by 1 after each positive feedback event; this gives us the old
    158 scheduler's exponential growth in time. With g(N)=1/N feedback per
    159 delivery, concurrency increases by 1 after an entire pseudo-cohort
    160 N of positive feedback reports; this gives us linear growth in time.
    161 Less-than-1 feedback per delivery and integer truncation naturally
    162 give us hysteresis, so that transitions to larger concurrency happen
    163 every 1/g(N) positive feedback events.  </p>
    164 
    165 <p> We want to decrement a destination's delivery concurrency when
    166 some (not necessarily consecutive) number of deliveries complete
    167 after connection or handshake failure.  This is implemented with
    168 negative feedback f(N) where N is the destination's delivery
    169 concurrency.  With f(N)=1 feedback per delivery, concurrency decreases
    170 by 1 after each negative feedback event; this gives us the old
    171 scheduler's behavior where concurrency is throttled down dramatically
    172 after a single pseudo-cohort failure.  With f(N)=1/N feedback per
    173 delivery, concurrency backs off more gently.  Again, less-than-1
    174 feedback per delivery and integer truncation naturally give us
    175 hysteresis, so that transitions to lower concurrency happen every
    176 1/f(N) negative feedback events.  </p>
    177 
    178 <p> However, with negative feedback we introduce a subtle twist.
    179 We "reverse" the negative hysteresis cycle so that the transition
    180 to lower concurrency happens at the <b>beginning</b> of a sequence
    181 of 1/f(N) negative feedback events.  Otherwise, a correction for
    182 overload would be made too late.  This makes the choice of f(N)
    183 relatively unimportant, as borne out by measurements later in this
    184 document.  </p>
    185 
    186 <p> In summary, the main ingredients for the Postfix 2.5 concurrency
    187 feedback algorithm are a) the option of less-than-1 positive feedback
    188 per delivery to avoid overwhelming servers, b) the option of
    189 less-than-1 negative feedback per delivery to avoid giving up too
    190 fast, c) feedback hysteresis to avoid rapid oscillation, and d) a
    191 "reverse" hysteresis cycle for negative feedback, so that it can
    192 correct for overload quickly.  </p>
    193 
    194 <h3> <a name="dead_summary_2_5"> Summary of the Postfix 2.5 "dead destination" detection algorithm </a> </h3>
    195 
    196 <p> We want to suspend deliveries to a specific destination after
    197 some number of deliveries suffers connection or handshake failure.
    198 The old scheduler declares a destination "dead" when negative (-1)
    199 feedback throttles the delivery concurrency down to zero. With
    200 less-than-1 feedback per delivery, this throttling down would
    201 obviously take too long.  We therefore have to separate "dead
    202 destination" detection from concurrency feedback.  This is implemented
    203 by introducing the concept of pseudo-cohort failure. The Postfix
    204 2.5 concurrency scheduler declares a destination "dead" after a
    205 configurable number of pseudo-cohorts suffers from connection or
    206 handshake failures. The old scheduler corresponds to the special
    207 case where the pseudo-cohort failure limit is equal to 1.  </p>
    208 
    209 <h3> <a name="pseudo_code_2_5"> Pseudocode for the Postfix 2.5 concurrency scheduler </a> </h3>
    210 
    211 <p> The pseudo code shows how the ideas behind new concurrency
    212 scheduler are implemented as of November 2007.  The actual code can
    213 be found in the module qmgr/qmgr_queue.c.  </p>
    214 
    215 <pre>
    216 Types:
    217         Each destination has one set of the following variables
    218         int concurrency
    219         double success
    220         double failure
    221         double fail_cohorts
    222 
    223 Feedback functions:
    224         N is concurrency; x, y are arbitrary numbers in [0..1] inclusive
    225         positive feedback: g(N) = x/N | x/sqrt(N) | x
    226         negative feedback: f(N) = y/N | y/sqrt(N) | y
    227 
    228 Initialization:
    229         concurrency = initial_concurrency
    230         success = 0
    231         failure = 0
    232         fail_cohorts = 0
    233 
    234 After success:
    235         fail_cohorts = 0
    236         Be prepared for feedback &gt; hysteresis, or rounding error
    237         success += g(concurrency)
    238         while (success >= 1)            Hysteresis 1
    239             concurrency += 1            Hysteresis 1
    240             failure = 0
    241             success -= 1                Hysteresis 1
    242         Be prepared for overshoot
    243         if (concurrency &gt; concurrency limit)
    244             concurrency = concurrency limit
    245 
    246 Safety:
    247         Don't apply positive feedback unless
    248             concurrency &lt; busy_refcount + init_dest_concurrency
    249         otherwise negative feedback effect could be delayed
    250 
    251 After failure:
    252         if (concurrency &gt; 0)
    253             fail_cohorts += 1.0 / concurrency
    254             if (fail_cohorts &gt; cohort_failure_limit)
    255                 concurrency = 0
    256         if (concurrency &gt; 0)
    257             Be prepared for feedback &gt; hysteresis, rounding errors
    258             failure -= f(concurrency)
    259             while (failure &lt; 0)
    260                 concurrency -= 1        Hysteresis 1
    261                 failure += 1            Hysteresis 1
    262                 success = 0
    263             Be prepared for overshoot
    264             if (concurrency &lt; 1)
    265                 concurrency = 1
    266 </pre>
    267 
    268 <h3> <a name="concurrency_results"> Results for delivery to concurrency-limited servers </a> </h3>
    269 
    270 <p> Discussions about the concurrency scheduler redesign started
    271 early 2004, when the primary goal was to find alternatives that did
    272 not exhibit exponential growth or rapid concurrency throttling.  No
    273 code was implemented until late 2007, when the primary concern had
    274 shifted towards better handling of server concurrency limits. For
    275 this reason we measure how well the new scheduler does this
    276 job.  The table below compares mail delivery performance of the old
    277 +/-1 feedback per delivery with several less-than-1 feedback
    278 functions, for different limited-concurrency server scenarios.
    279 Measurements were done with a FreeBSD 6.2 client and with FreeBSD
    280 6.2 and various Linux servers.  </p>
    281 
    282 <p> Server configuration: </p>
    283 
    284 <ul> <li> The mail flow was slowed down with 1 second latency per
    285 recipient ("smtpd_client_restrictions = sleep 1"). The purpose was
    286 to make results less dependent on hardware details, by avoiding
    287 slow-downs by queue file I/O, logging I/O, and network I/O.
    288 
    289 <li> Concurrency was limited by the server process limit
    290 ("default_process_limit = 5" and "smtpd_client_event_limit_exceptions
    291 = static:all"). Postfix was stopped and started after changing the
    292 process limit, because the same number is also used as the backlog
    293 argument to the listen(2) system call, and "postfix reload" does
    294 not re-issue this call.
    295 
    296 <li> Mail was discarded with "local_recipient_maps = static:all" and
    297 "local_transport = discard". The discard action in access maps or
    298 header/body checks
    299 could not be used as it fails to update the in_flow_delay counters.
    300 
    301 </ul>
    302 
    303 <p> Client configuration: </p>
    304 
    305 <ul>
    306 
    307 <li> Queue file overhead was minimized by sending one message to a
    308 virtual alias that expanded into 2000 different remote recipients.
    309 All recipients were accounted for according to the maillog file.
    310 The virtual_alias_expansion_limit setting was increased to avoid
    311 complaints from the cleanup(8) server.
    312 
    313 <li> The number of deliveries was maximized with
    314 "smtp_destination_recipient_limit = 2". A smaller limit would cause
    315 Postfix to schedule the concurrency per recipient instead of domain,
    316 which is not what we want.
    317 
    318 <li> Maximum concurrency was limited with
    319 "smtp_destination_concurrency_limit = 20", and
    320 initial_destination_concurrency was set to the same value.
    321 
    322 <li> The positive and negative concurrency feedback hysteresis was
    323 1.  Concurrency was incremented by 1 at the END of 1/feedback steps
    324 of positive feedback, and was decremented by 1 at the START of
    325 1/feedback steps of negative feedback.
    326 
    327 <li> The SMTP client used the default 30s SMTP connect timeout and
    328 300s SMTP greeting timeout.
    329 
    330 </ul>
    331 
    332 <h4> Impact of the 30s SMTP connect timeout </h4>
    333 
    334 <p> The first results are for a FreeBSD 6.2 server, where our
    335 artificially low listen(2) backlog results in a very short kernel
    336 queue for established connections. The table shows that all deferred
    337 deliveries failed due to a 30s connection timeout, and none failed
    338 due to a server greeting timeout.  This measurement simulates what
    339 happens when the server's connection queue is completely full under
    340 load, and the TCP engine drops new connections.  </p>
    341 
    342 <blockquote>
    343 
    344 <table>
    345 
    346 <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    347 style</th> <th>connection<br> caching</th> <th>percentage<br>
    348 deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    349 <th colspan=2>timed-out in<br> connect/greeting </th> </tr>
    350 
    351 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    352 
    353 <tr><td align="center">20</td> <td align="center">5</td> <td
    354 align="center">1/N</td> <td align="center">no</td> <td
    355 align="center">9.9</td> <td align="center">19.4</td> <td
    356 align="center">0.49</td> <td align="center">198</td> <td
    357 align="center">-</td> </tr>
    358 
    359 <tr><td align="center">20</td> <td align="center">5</td> <td
    360 align="center">1/N</td> <td align="center">yes</td> <td
    361 align="center">10.3</td> <td align="center">19.4</td> <td
    362 align="center">0.49</td> <td align="center">206</td> <td
    363 align="center">-</td> </tr>
    364 
    365 <tr><td align="center">20</td> <td align="center">5</td> <td
    366 align="center">1/sqrt(N)</td> <td align="center">no</td>
    367 <td align="center">10.4</td> <td align="center">19.6</td> <td
    368 align="center">0.59</td> <td align="center">208</td> <td
    369 align="center">-</td> </tr>
    370 
    371 <tr><td align="center">20</td> <td align="center">5</td> <td
    372 align="center">1/sqrt(N)</td> <td align="center">yes</td>
    373 <td align="center">10.6</td> <td align="center">19.6</td> <td
    374 align="center">0.61</td> <td align="center">212</td> <td
    375 align="center">-</td> </tr>
    376 
    377 <tr><td align="center">20</td> <td align="center">5</td> <td
    378 align="center">1</td> <td align="center">no</td> <td
    379 align="center">10.1</td> <td align="center">19.5</td> <td
    380 align="center">1.29</td> <td align="center">202</td> <td
    381 align="center">-</td> </tr>
    382 
    383 <tr><td align="center">20</td> <td align="center">5</td> <td
    384 align="center">1</td> <td align="center">yes</td> <td
    385 align="center">10.8</td> <td align="center">19.3</td> <td
    386 align="center">1.57</td> <td align="center">216</td> <td
    387 align="center">-</td> </tr>
    388 
    389 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    390 
    391 </table>
    392 
    393 <p> A busy server with a completely full connection queue.  N is
    394 the client delivery concurrency.  Failed deliveries time out after
    395 30s without completing the TCP handshake. See text for a discussion
    396 of results. </p>
    397 
    398 </blockquote>
    399 
    400 <h4> Impact of the 300s SMTP greeting timeout </h4>
    401 
    402 <p> The next table shows results for a Fedora Core 8 server (results
    403 for RedHat 7.3 are identical). In this case, the artificially small
    404 listen(2) backlog argument does not impact our measurement.  The
    405 table shows that practically all deferred deliveries fail after the
    406 300s SMTP greeting timeout. As these timeouts were 10x longer than
    407 with the first measurement, we increased the recipient count (and
    408 thus the running time) by a factor of 10 to keep the results
    409 comparable. The deferred mail percentages are a factor 10 lower
    410 than with the first measurement, because the 1s per-recipient delay
    411 was 1/300th of the greeting timeout instead of 1/30th of the
    412 connection timeout.  </p>
    413 
    414 <blockquote>
    415 
    416 <table>
    417 
    418 <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    419 style</th> <th>connection<br> caching</th> <th>percentage<br>
    420 deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    421 <th colspan=2>timed-out in<br> connect/greeting </th> </tr>
    422 
    423 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    424 
    425 <tr> <td align="center">20</td> <td align="center">5</td> <td
    426 align="center">1/N</td> <td align="center">no</td> <td
    427 align="center">1.16</td> <td align="center">19.8</td> <td
    428 align="center">0.37</td> <td align="center">-</td> <td
    429 align="center">230</td> </tr>
    430 
    431 <tr> <td align="center">20</td> <td align="center">5</td> <td
    432 align="center">1/N</td> <td align="center">yes</td> <td
    433 align="center">1.36</td> <td align="center">19.8</td> <td
    434 align="center">0.36</td> <td align="center">-</td> <td
    435 align="center">272</td> </tr>
    436 
    437 <tr> <td align="center">20</td> <td align="center">5</td> <td
    438 align="center">1/sqrt(N)</td> <td align="center">no</td>
    439 <td align="center">1.21</td> <td align="center">19.9</td> <td
    440 align="center">0.23</td> <td align="center">4</td> <td
    441 align="center">238</td> </tr>
    442 
    443 <tr> <td align="center">20</td> <td align="center">5</td> <td
    444 align="center">1/sqrt(N)</td> <td align="center">yes</td>
    445 <td align="center">1.36</td> <td align="center">20.0</td> <td
    446 align="center">0.23</td> <td align="center">-</td> <td
    447 align="center">272</td> </tr>
    448 
    449 <tr> <td align="center">20</td> <td align="center">5</td> <td
    450 align="center">1</td> <td align="center">no</td> <td
    451 align="center">1.18</td> <td align="center">20.0</td> <td
    452 align="center">0.16</td> <td align="center">-</td> <td
    453 align="center">236</td> </tr>
    454 
    455 <tr> <td align="center">20</td> <td align="center">5</td> <td
    456 align="center">1</td> <td align="center">yes</td> <td
    457 align="center">1.39</td> <td align="center">20.0</td> <td
    458 align="center">0.16</td> <td align="center">-</td> <td
    459 align="center">278</td> </tr>
    460 
    461 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    462 
    463 </table>
    464 
    465 <p> A busy server with a non-full connection queue.  N is the client
    466 delivery concurrency. Failed deliveries complete at the TCP level,
    467 but time out after 300s while waiting for the SMTP greeting.  See
    468 text for a discussion of results.  </p>
    469 
    470 </blockquote>
    471 
    472 <h4> Impact of active server concurrency limiter </h4>
    473 
    474 <p> The final concurrency-limited result shows what happens when
    475 SMTP connections don't time out, but are rejected immediately with
    476 the Postfix server's smtpd_client_connection_count_limit feature
    477 (the server replies with a 421 status and disconnects immediately).
    478 Similar results can be expected with concurrency limiting features
    479 built into other MTAs or firewalls.  For this measurement we specified
    480 a server concurrency limit and a client initial destination concurrency
    481 of 5, and a server process limit of 10; all other conditions were
    482 the same as with the first measurement. The same result would be
    483 obtained with a FreeBSD or Linux server, because the "pushing back"
    484 is done entirely by the receiving side. </p>
    485 
    486 <blockquote>
    487 
    488 <table>
    489 
    490 <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    491 style</th> <th>connection<br> caching</th> <th>percentage<br>
    492 deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    493 <th>theoretical<br>defer rate</th> </tr>
    494 
    495 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    496 
    497 <tr> <td align="center">20</td> <td align="center">5</td> <td
    498 align="center">1/N</td> <td align="center">no</td> <td
    499 align="center">16.5</td> <td align="center">5.17</td> <td
    500 align="center">0.38</td> <td align="center">1/6</td> </tr>
    501 
    502 <tr> <td align="center">20</td> <td align="center">5</td> <td
    503 align="center">1/N</td> <td align="center">yes</td> <td
    504 align="center">16.5</td> <td align="center">5.17</td> <td
    505 align="center">0.38</td> <td align="center">1/6</td> </tr>
    506 
    507 <tr> <td align="center">20</td> <td align="center">5</td> <td
    508 align="center">1/sqrt(N)</td> <td align="center">no</td>
    509 <td align="center">24.5</td> <td align="center">5.28</td> <td
    510 align="center">0.45</td> <td align="center">1/4</td> </tr>
    511 
    512 <tr> <td align="center">20</td> <td align="center">5</td> <td
    513 align="center">1/sqrt(N)</td> <td align="center">yes</td>
    514 <td align="center">24.3</td> <td align="center">5.28</td> <td
    515 align="center">0.46</td> <td align="center">1/4</td> </tr>
    516 
    517 <tr> <td align="center">20</td> <td align="center">5</td> <td
    518 align="center">1</td> <td align="center">no</td> <td
    519 align="center">49.7</td> <td align="center">5.63</td> <td
    520 align="center">0.67</td> <td align="center">1/2</td> </tr>
    521 
    522 <tr> <td align="center">20</td> <td align="center">5</td> <td
    523 align="center">1</td> <td align="center">yes</td> <td
    524 align="center">49.7</td> <td align="center">5.68</td> <td
    525 align="center">0.70</td> <td align="center">1/2</td> </tr>
    526 
    527 <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    528 
    529 </table>
    530 
    531 <p> A server with active per-client concurrency limiter that replies
    532 with 421 and disconnects.  N is the client delivery concurrency.
    533 The theoretical defer rate is 1/(1+roundup(1/feedback)).  This is
    534 always 1/2 with the fixed +/-1 feedback per delivery; with the
    535 concurrency-dependent feedback variants, the defer rate decreases
    536 with increasing concurrency. See text for a discussion of results.
    537 </p>
    538 
    539 </blockquote>
    540 
    541 <h3> <a name="concurrency_discussion"> Discussion of concurrency-limited server results </a> </h3>
    542 
    543 <p> All results in the previous sections are based on the first
    544 delivery runs only; they do not include any second etc. delivery
    545 attempts. It's also worth noting that the measurements look at
    546 steady-state behavior only. They don't show what happens when the
    547 client starts sending at a much higher or lower concurrency.
    548 </p>
    549 
    550 <p> The first two examples show that the effect of feedback
    551 is negligible when concurrency is limited due to congestion. This
    552 is because the initial concurrency is already at the client's
    553 concurrency maximum, and because there is 10-100 times more positive
    554 than negative feedback.  Under these conditions, it is no surprise
    555 that the contribution from SMTP connection caching is also negligible.
    556 </p>
    557 
    558 <p> In the last example, the old +/-1 feedback per delivery will
    559 defer 50% of the mail when confronted with an active (anvil-style)
    560 server concurrency limit, where the server hangs up immediately
    561 with a 421 status (a TCP-level RST would have the same result).
    562 Less aggressive feedback mechanisms fare better than more aggressive
    563 ones.  Concurrency-dependent feedback fares even better at higher
    564 concurrencies than shown here, but has limitations as discussed in
    565 the next section.  </p>
    566 
    567 <h3> <a name="concurrency_limitations"> Limitations of less-than-1 per delivery feedback </a> </h3>
    568 
    569 <p> Less-than-1 feedback is of interest primarily when sending large
    570 amounts of mail to destinations with active concurrency limiters
    571 (servers that reply with 421, or firewalls that send RST).  When
    572 sending small amounts of mail per destination, less-than-1 per-delivery
    573 feedback won't have a noticeable effect on the per-destination
    574 concurrency, because the number of deliveries to the same destination
    575 is too small. You might just as well use zero per-delivery feedback
    576 and stay with the initial per-destination concurrency. And when
    577 mail deliveries fail due to congestion instead of active concurrency
    578 limiters, the measurements above show that per-delivery feedback
    579 has no effect.  With large amounts of mail you might just as well
    580 use zero per-delivery feedback and start with the maximal per-destination
    581 concurrency.  </p>
    582 
    583 <p> The scheduler with less-than-1 concurrency
    584 feedback per delivery solves a problem with servers that have active
    585 concurrency limiters.  This works only because feedback is handled
    586 in a peculiar manner: positive feedback will increment the concurrency
    587 by 1 at the <b>end</b> of a sequence of events of length 1/feedback,
    588 while negative feedback will decrement concurrency by 1 at the
    589 <b>beginning</b> of such a sequence.  This is how Postfix adjusts
    590 quickly for overshoot without causing lots of mail to be deferred.
    591 Without this difference in feedback treatment, less-than-1 feedback
    592 per delivery would defer 50% of the mail, and would be no better
    593 in this respect than the old +/-1 feedback per delivery.  </p>
    594 
    595 <p> Unfortunately, the same feature that corrects quickly for
    596 concurrency overshoot also makes the scheduler more sensitive for
    597 noisy negative feedback.  The reason is that one lonely negative
    598 feedback event has the same effect as a complete sequence of length
    599 1/feedback: in both cases delivery concurrency is dropped by 1
    600 immediately.  As a worst-case scenario, consider multiple servers
    601 behind a load balancer on a single IP address, and no backup MX
    602 address.  When 1 out of K servers fails to complete the SMTP handshake
    603 or drops the connection, a scheduler with 1/N (N = concurrency)
    604 feedback stops increasing its concurrency once it reaches a concurrency
    605 level of about K,  even though the good servers behind the load
    606 balancer are perfectly capable of handling more traffic. </p>
    607 
    608 <p> This noise problem gets worse as the amount of positive feedback
    609 per delivery gets smaller.  A compromise is to use fixed less-than-1
    610 positive feedback values instead of concurrency-dependent positive
    611 feedback.  For example, to tolerate 1 of 4 bad servers in the above
    612 load balancer scenario, use positive feedback of 1/4 per "good"
    613 delivery (no connect or handshake error), and use an equal or smaller
    614 amount of negative feedback per "bad" delivery.  The downside of
    615 using concurrency-independent feedback is that some of the old +/-1
    616 feedback problems will return at large concurrencies.  Sites that
    617 must deliver mail at non-trivial per-destination concurrencies will
    618 require special configuration.  </p>
    619 
    620 <h3> <a name="concurrency_config"> Concurrency configuration parameters </a> </h3>
    621 
    622 <p> The Postfix 2.5 concurrency scheduler is controlled with the
    623 following configuration parameters, where "<i>transport</i>_foo"
    624 provides a transport-specific parameter override.  All parameter
    625 default settings are compatible with earlier Postfix versions. </p>
    626 
    627 <blockquote>
    628 
    629 <table border="0">
    630 
    631 <tr> <th> Parameter name </th> <th> Postfix version </th> <th>
    632 Description </th> </tr>
    633 
    634 <tr> <td colspan="3"> <hr> </td> </tr>
    635 
    636 <tr> <td> initial_destination_concurrency<br>
    637 <i>transport</i>_initial_destination_concurrency </td> <td
    638 align="center"> all<br> 2.5 </td> <td> Initial per-destination
    639 delivery concurrency </td> </tr>
    640 
    641 <tr> <td> default_destination_concurrency_limit<br>
    642 <i>transport</i>_destination_concurrency_limit </td> <td align="center">
    643 all<br> all </td> <td> Maximum per-destination delivery concurrency
    644 </td> </tr>
    645 
    646 <tr> <td> default_destination_concurrency_positive_feedback<br>
    647 <i>transport</i>_destination_concurrency_positive_feedback </td>
    648 <td align="center"> 2.5<br> 2.5 </td> <td> Per-destination positive
    649 feedback amount, per delivery that does not fail with connection
    650 or handshake failure </td> </tr>
    651 
    652 <tr> <td> default_destination_concurrency_negative_feedback<br>
    653 <i>transport</i>_destination_concurrency_negative_feedback </td>
    654 <td align="center"> 2.5<br> 2.5 </td> <td> Per-destination negative
    655 feedback amount, per delivery that fails with connection or handshake
    656 failure </td> </tr>
    657 
    658 <tr> <td> default_destination_concurrency_failed_cohort_limit<br>
    659 <i>transport</i>_destination_concurrency_failed_cohort_limit </td>
    660 <td align="center"> 2.5<br> 2.5 </td> <td> Number of failed
    661 pseudo-cohorts after which a destination is declared "dead" and
    662 delivery is suspended </td> </tr>
    663 
    664 <tr> <td> destination_concurrency_feedback_debug</td> <td align="center">
    665 2.5 </td> <td> Enable verbose logging of concurrency scheduler
    666 activity </td> </tr>
    667 
    668 <tr> <td colspan="3"> <hr> </td> </tr>
    669 
    670 </table>
    671 
    672 </blockquote>
    673 
    674 <h2> <a name="jobs"> Preemptive scheduling </a> </h2>
    675 
    676 <p>
    677 
    678 The following sections describe the new queue manager and its
    679 preemptive scheduler algorithm. Note that the document was originally
    680 written to describe the changes between the new queue manager (in
    681 this text referred to as <tt>nqmgr</tt>, the name it was known by
    682 before it became the default queue manager) and the old queue manager
    683 (referred to as <tt>oqmgr</tt>). This is why it refers to <tt>oqmgr</tt>
    684 every so often.
    685 
    686 </p>
    687 
    688 <p>
    689 
    690 This document is divided into sections as follows:
    691 
    692 </p>
    693 
    694 <ul>
    695 
    696 <li> <a href="#<tt>nqmgr</tt>_structures"> The structures used by
    697 nqmgr </a>
    698 
    699 <li> <a href="#<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
    700 up the message </a> - how it is assigned to transports, jobs, peers,
    701 entries
    702 
    703 <li> <a href="#<tt>nqmgr</tt>_selection"> How the entry selection
    704 works </a>
    705 
    706 <li> <a href="#<tt>nqmgr</tt>_preemption"> How the preemption
    707 works </a> - what messages may be preempted and how and what messages
    708 are chosen to preempt them
    709 
    710 <li> <a href="#<tt>nqmgr</tt>_concurrency"> How destination concurrency
    711 limits affect the scheduling algorithm </a>
    712 
    713 <li> <a href="#<tt>nqmgr</tt>_memory"> Dealing with memory resource
    714 limits </a>
    715 
    716 </ul>
    717 
    718 <h3> <a name="<tt>nqmgr</tt>_structures"> The structures used by
    719 nqmgr </a> </h3>
    720 
    721 <p>
    722 
    723 Let's start by recapitulating the structures and terms used when
    724 referring to the queue manager and how it operates. Many of these are
    725 partially described elsewhere, but it is nice to have a coherent
    726 overview in one place:
    727 
    728 </p>
    729 
    730 <ul>
    731 
    732 <li> <p> Each message structure represents one mail message which
    733 Postfix is to deliver. The message recipients specify to what
    734 destinations is the message to be delivered and what transports are
    735 going to be used for the delivery. </p>
    736 
    737 <li> <p> Each recipient entry groups a batch of recipients of one
    738 message which are all going to be delivered to the same destination
    739 (and over the same transport).
    740 </p>
    741 
    742 <li> <p> Each transport structure groups everything what is going
    743 to be delivered by delivery agents dedicated for that transport.
    744 Each transport maintains a set of queues (describing the destinations
    745 it shall talk to) and jobs (referencing the messages it shall
    746 deliver). </p>
    747 
    748 <li> <p> Each transport queue (not to be confused with the on-disk
    749 "active" queue or "incoming" queue) groups everything what is going be
    750 delivered to given destination (aka nexthop) by its transport.  Each
    751 queue belongs to one transport, so each destination may be referred
    752 to by several queues, one for each transport.  Each queue maintains
    753 a list of all recipient entries (batches of message recipients)
    754 which shall be delivered to given destination (the todo list), and
    755 a list of recipient entries already being delivered by the delivery
    756 agents (the busy list). </p>
    757 
    758 <li> <p> Each queue corresponds to multiple peer structures.  Each
    759 peer structure is like the queue structure, belonging to one transport
    760 and referencing one destination. The difference is that it lists
    761 only the recipient entries which all originate from the same message,
    762 unlike the queue structure, whose entries may originate from various
    763 messages. For messages with few recipients, there is usually just
    764 one recipient entry for each destination, resulting in one recipient
    765 entry per peer. But for large mailing list messages the recipients
    766 may need to be split to multiple recipient entries, in which case
    767 the peer structure may list many entries for single destination.
    768 </p>
    769 
    770 <li> <p> Each transport job groups everything it takes to deliver
    771 one message via its transport. Each job represents one message
    772 within the context of the transport. The job belongs to one transport
    773 and message, so each message may have multiple jobs, one for each
    774 transport. The job groups all the peer structures, which describe
    775 the destinations the job's message has to be delivered to. </p>
    776 
    777 </ul>
    778 
    779 <p>
    780 
    781 The first four structures are common to both <tt>nqmgr</tt> and
    782 <tt>oqmgr</tt>, the latter two were introduced by <tt>nqmgr</tt>.
    783 
    784 </p>
    785 
    786 <p>
    787 
    788 These terms are used extensively in the text below, feel free to
    789 look up the description above anytime you'll feel you have lost a
    790 sense what is what.
    791 
    792 </p>
    793 
    794 <h3> <a name="<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
    795 up the message </a> </h3>
    796 
    797 <p>
    798 
    799 Whenever <tt>nqmgr</tt> moves a queue file into the "active" queue,
    800 the following happens: It reads all necessary information from the
    801 queue file as <tt>oqmgr</tt> does, and also reads as many recipients
    802 as possible - more on that later, for now let's just pretend it
    803 always reads all recipients.
    804 
    805 </p>
    806 
    807 <p>
    808 
    809 Then it resolves the recipients as <tt>oqmgr</tt> does, which
    810 means obtaining (address, nexthop, transport) triple for each
    811 recipient. For each triple, it finds the transport; if it does not
    812 exist yet, it instantiates it (unless it's dead). Within the
    813 transport, it finds the destination queue for the given nexthop; if it
    814 does not exist yet, it instantiates it (unless it's dead). The
    815 triple is then bound to given destination queue. This happens in
    816 qmgr_resolve() and is basically the same as in <tt>oqmgr</tt>.
    817 
    818 </p>
    819 
    820 <p>
    821 
    822 Then for each triple which was bound to some queue (and thus
    823 transport), the program finds the job which represents the message
    824 within that transport's context; if it does not exist yet, it
    825 instantiates it. Within the job, it finds the peer which represents
    826 the bound destination queue within this jobs context; if it does
    827 not exist yet, it instantiates it.  Finally, it stores the address
    828 from the resolved triple to the recipient entry which is appended
    829 to both the queue entry list and the peer entry list. The addresses
    830 for the same nexthop are batched in the entries up to the
    831 <i>transport</i>_destination_recipient_limit for that transport.
    832 This happens in qmgr_message_assign(), and apart
    833 from that it operates with job and peer structures, it is basically the
    834 same as in <tt>oqmgr</tt>.
    835 
    836 </p>
    837 
    838 <p>
    839 
    840 When the job is instantiated, it is enqueued on the transport's job
    841 list based on the time its message was picked up by <tt>nqmgr</tt>.
    842 For first batch of recipients this means it is appended to the end
    843 of the job list, but the ordering of the job list by the enqueue
    844 time is important as we will see shortly.
    845 
    846 </p>
    847 
    848 <p>
    849 
    850 [Now you should have a pretty good idea what the state of the
    851 <tt>nqmgr</tt> is after a couple of messages were picked up, and what the
    852 relation is between all those job, peer, queue and entry structures.]
    853 
    854 </p>
    855 
    856 <h3> <a name="<tt>nqmgr</tt>_selection"> How the entry selection
    857 works </a> </h3>
    858 
    859 <p>
    860 
    861 Having prepared all those above mentioned structures, the task of
    862 the <tt>nqmgr</tt>'s scheduler is to choose the recipient entries
    863 one at a time and pass them to the delivery agent for corresponding
    864 transport. Now how does this work?
    865 
    866 </p>
    867 
    868 <p>
    869 
    870 The first approximation of the new scheduling algorithm is like this:
    871 
    872 </p>
    873 
    874 <blockquote>
    875 <pre>
    876 foreach transport (round-robin-by-transport)
    877 do
    878     if transport busy continue
    879     if transport process limit reached continue
    880     foreach transport's job (in the order of the transport's job list)
    881     do
    882 	foreach job's peer (round-robin-by-destination)
    883 	     if peer-&gt;queue-&gt;concurrency &lt; peer-&gt;queue-&gt;window
    884 		 return next peer entry.
    885 	done
    886     done
    887 done
    888 </pre>
    889 </blockquote>
    890 
    891 <p>
    892 
    893 Now what is the "order of the transport's job list"? As we know
    894 already, the job list is by default kept in the order the message
    895 was picked up by the <tt>nqmgr</tt>. So by default we get the
    896 top-level round-robin transport, and within each transport we get
    897 the FIFO message delivery. The round-robin of the peers by the
    898 destination is perhaps of little importance in most real-life cases
    899 (unless the <i>transport</i>_destination_recipient_limit is reached,
    900 in one job there
    901 is only one peer structure for each destination), but theoretically
    902 it makes sure that even within single jobs, destinations are treated
    903 fairly.
    904 
    905 </p>
    906 
    907 <p>
    908 
    909 [By now you should have a feeling you really know how the scheduler
    910 works, except for the preemption, under ideal conditions - that is,
    911 no recipient resource limits and no destination concurrency problems.]
    912 
    913 </p>
    914 
    915 <h3> <a name="<tt>nqmgr</tt>_preemption"> How the preemption
    916 works </a> </h3>
    917 
    918 <p>
    919 
    920 As you might perhaps expect by now, the transport's job list does
    921 not remain sorted by the job's message enqueue time all the time.
    922 The most cool thing about <tt>nqmgr</tt> is not the simple FIFO
    923 delivery, but that it is able to slip mail with little recipients
    924 past the mailing-list bulk mail.  This is what the job preemption
    925 is about - shuffling the jobs on the transport's job list to get
    926 the best message delivery rates. Now how is it achieved?
    927 
    928 </p>
    929 
    930 <p>
    931 
    932 First I have to tell you that there are in fact two job lists in
    933 each transport. One is the scheduler's job list, which the scheduler
    934 is free to play with, while the other one keeps the jobs always
    935 listed in the order of the enqueue time and is used for recipient
    936 pool management we will discuss later. For now, we will deal with
    937 the scheduler's job list only.
    938 
    939 </p>
    940 
    941 <p>
    942 
    943 So, we have the job list, which is first ordered by the time the
    944 jobs' messages were enqueued, oldest messages first, the most recently
    945 picked one at the end. For now, let's assume that there are no
    946 destination concurrency problems. Without preemption, we pick some
    947 entry of the first (oldest) job on the queue, assign it to delivery
    948 agent, pick another one from the same job, assign it again, and so
    949 on, until all the entries are used and the job is delivered. We
    950 would then move onto the next job and so on and on. Now how do we
    951 manage to sneak in some entries from the recently added jobs when
    952 the first job on the job list belongs to a message going to the
    953 mailing-list and has thousands of recipient entries?
    954 
    955 </p>
    956 
    957 <p>
    958 
    959 The <tt>nqmgr</tt>'s answer is that we can artificially "inflate"
    960 the delivery time of that first job by some constant for free - it
    961 is basically the same trick you might remember as "accumulation of
    962 potential" from the amortized complexity lessons. For example,
    963 instead of delivering the entries of the first job on the job list
    964 every time a delivery agent becomes available, we can do it only
    965 every second time. If you view the moments the delivery agent becomes
    966 available on a timeline as "delivery slots", then instead of using
    967 every delivery slot for the first job, we can use only every other
    968 slot, and still the overall delivery efficiency of the first job
    969 remains the same. So the delivery <tt>11112222</tt> becomes
    970 <tt>1.1.1.1.2.2.2.2</tt> (1 and 2 are the imaginary job numbers, .
    971 denotes the free slot). Now what do we do with free slots?
    972 
    973 </p>
    974 
    975 <p>
    976 
    977 As you might have guessed, we will use them for sneaking the mail
    978 with little recipients in. For example, if we have one four-recipient
    979 mail followed by four one recipients mail, the delivery sequence
    980 (that is, the sequence in which the jobs are assigned to the
    981 delivery slots) might look like this: <tt>12131415</tt>. Hmm, fine
    982 for sneaking in the single recipient mail, but how do we sneak in
    983 the mail with more than one recipient? Say if we have one four-recipient
    984 mail followed by two two-recipient mails?
    985 
    986 </p>
    987 
    988 <p>
    989 
    990 The simple answer would be to use delivery sequence <tt>12121313</tt>.
    991 But the problem is that this does not scale well. Imagine you have
    992 mail with a thousand recipients followed by mail with a hundred recipients.
    993 It is tempting to suggest the  delivery sequence like <tt>121212....</tt>,
    994 but alas! Imagine there arrives another mail with say ten recipients.
    995 But there are no free slots anymore, so it can't slip by, not even
    996 if it had only one recipient.  It will be stuck until the
    997 hundred-recipient mail is delivered, which really sucks.
    998 
    999 </p>
   1000 
   1001 <p>
   1002 
   1003 So, it becomes obvious that while inflating the message to get
   1004 free slots is a great idea, one has to be really careful of how the
   1005 free slots are assigned, otherwise one might corner himself. So,
   1006 how does <tt>nqmgr</tt> really use the free slots?
   1007 
   1008 </p>
   1009 
   1010 <p>
   1011 
   1012 The key idea is that one does not have to generate the free slots
   1013 in a uniform way. The delivery sequence <tt>111...1</tt> is no
   1014 worse than <tt>1.1.1.1</tt>, in fact, it is even better as some
   1015 entries are in the first case selected earlier than in the second
   1016 case, and none is selected later! So it is possible to first
   1017 "accumulate" the free delivery slots and then use them all at once.
   1018 It is even possible to accumulate some, then use them, then accumulate
   1019 some more and use them again, as in <tt>11..1.1</tt> .
   1020 
   1021 </p>
   1022 
   1023 <p>
   1024 
   1025 Let's get back to the one hundred recipient example. We now know
   1026 that we could first accumulate one hundred free slots, and only
   1027 after then to preempt the first job and sneak the one hundred
   1028 recipient mail in. Applying the algorithm recursively, we see the
   1029 hundred recipient job can accumulate ten free delivery slots, and
   1030 then we could preempt it and sneak in the ten-recipient mail...
   1031 Wait wait wait! Could we? Aren't we overinflating the original one
   1032 thousand recipient mail?
   1033 
   1034 </p>
   1035 
   1036 <p>
   1037 
   1038 Well, despite the fact that it looks so at the first glance, another trick will
   1039 allow us to answer "no, we are not!". If we had said that we will
   1040 inflate the delivery time twice at maximum, and then we consider
   1041 every other slot as a free slot, then we would overinflate in case
   1042 of the recursive preemption. BUT! The trick is that if we use only
   1043 every n-th slot as a free slot for n&gt;2, there is always some worst
   1044 inflation factor which we can guarantee not to be breached, even
   1045 if we apply the algorithm recursively. To be precise, if for every
   1046 k&gt;1 normally used slots we accumulate one free delivery slot, than
   1047 the inflation factor is not worse than k/(k-1) no matter how many
   1048 recursive preemptions happen. And it's not worse than (k+1)/k if
   1049 only non-recursive preemption happens. Now, having got through the
   1050 theory and the related math, let's see how <tt>nqmgr</tt> implements
   1051 this.
   1052 
   1053 </p>
   1054 
   1055 <p>
   1056 
   1057 Each job has so called "available delivery slot" counter. Each
   1058 transport has a <i>transport</i>_delivery_slot_cost parameter, which
   1059 defaults to default_delivery_slot_cost parameter which is set to 5
   1060 by default. This is the k from the paragraph above. Each time k
   1061 entries of the job are selected for delivery, this counter is
   1062 incremented by one. Once there are some slots accumulated, a job which
   1063 requires no more than that number of slots to be fully delivered
   1064 can preempt this job.
   1065 
   1066 </p>
   1067 
   1068 <p>
   1069 
   1070 [Well, the truth is, the counter is incremented every time an entry
   1071 is selected and it is divided by k when it is used.
   1072 But to understand, it's good enough to use
   1073 the above approximation of the truth.]
   1074 
   1075 </p>
   1076 
   1077 <p>
   1078 
   1079 OK, so now we know the conditions which must be satisfied so one
   1080 job can preempt another one. But what job gets preempted, how do
   1081 we choose what job preempts it if there are several valid candidates,
   1082 and when does all this exactly happen?
   1083 
   1084 </p>
   1085 
   1086 <p>
   1087 
   1088 The answer for the first part is simple. The job whose entry was
   1089 selected the last time is the so called current job. Normally, it is
   1090 the first job on the scheduler's job list, but destination concurrency
   1091 limits may change this as we will see later. It is always only the
   1092 current job which may get preempted.
   1093 
   1094 </p>
   1095 
   1096 <p>
   1097 
   1098 Now for the second part. The current job has a certain amount of
   1099 recipient entries, and as such may accumulate at maximum some amount
   1100 of available delivery slots. It might have already accumulated some,
   1101 and perhaps even already used some when it was preempted before
   1102 (remember a job can be preempted several times). In either case,
   1103 we know how many are accumulated and how many are left to deliver,
   1104 so we know how many it may yet accumulate at maximum. Every other
   1105 job which may be delivered by less than that number of slots is a
   1106 valid candidate for preemption. How do we choose among them?
   1107 
   1108 </p>
   1109 
   1110 <p>
   1111 
   1112 The answer is - the one with maximum enqueue_time/recipient_entry_count.
   1113 That is, the older the job is, the more we should try to deliver
   1114 it in order to get best message delivery rates. These rates are of
   1115 course subject to how many recipients the message has, therefore
   1116 the division by the recipient (entry) count. No one shall be surprised
   1117 that a message with n recipients takes n times longer to deliver than
   1118 a message with one recipient.
   1119 
   1120 </p>
   1121 
   1122 <p>
   1123 
   1124 Now let's recap the previous two paragraphs. Isn't it too complicated?
   1125 Why don't the candidates come only among the jobs which can be
   1126 delivered within the number of slots the current job already
   1127 accumulated? Why do we need to estimate how much it has yet to
   1128 accumulate? If you found out the answer, congratulate yourself. If
   1129 we did it this simple way, we would always choose the candidate
   1130 with the fewest recipient entries. If there were enough single recipient
   1131 mails coming in, they would always slip by the bulk mail as soon
   1132 as possible, and the two or more recipients mail would never get
   1133 a chance, no matter how long they have been sitting around in the
   1134 job list.
   1135 
   1136 </p>
   1137 
   1138 <p>
   1139 
   1140 This candidate selection has an interesting implication - that when
   1141 we choose the best candidate for preemption (this is done in
   1142 qmgr_choose_candidate()), it may happen that we may not use it for
   1143 preemption immediately. This leads to an answer to the last part
   1144 of the original question - when does the preemption happen?
   1145 
   1146 </p>
   1147 
   1148 <p>
   1149 
   1150 The preemption attempt happens every time next transport's recipient
   1151 entry is to be chosen for delivery. To avoid needless overhead, the
   1152 preemption is not attempted if the current job could never accumulate
   1153 more than <i>transport</i>_minimum_delivery_slots (defaults to
   1154 default_minimum_delivery_slots which defaults to 3). If there are
   1155 already enough accumulated slots to preempt the current job by the
   1156 chosen best candidate, it is done immediately. This basically means
   1157 that the candidate is moved in front of the current job on the
   1158 scheduler's job list and decreasing the accumulated slot counter
   1159 by the amount used by the candidate. If there are not enough slots...
   1160 well, I could say that nothing happens and the another preemption
   1161 is attempted the next time. But that's not the complete truth.
   1162 
   1163 </p>
   1164 
   1165 <p>
   1166 
   1167 The truth is that it turns out that it is not really necessary to
   1168 wait until the jobs counter accumulates all the delivery slots in
   1169 advance. Say we have ten-recipient mail followed by two two-recipient
   1170 mails. If the preemption happened when enough delivery slots accumulate
   1171 (assuming slot cost 2), the delivery sequence becomes
   1172 <tt>11112211113311</tt>. Now what would we get if we would wait
   1173 only for 50% of the necessary slots to accumulate and we promise
   1174 we would wait for the remaining 50% later, after we get back
   1175 to the preempted job? If we use such a slot loan, the delivery sequence
   1176 becomes <tt>11221111331111</tt>. As we can see, it makes it not
   1177 considerably worse for the delivery of the ten-recipient mail, but
   1178 it allows the small messages to be delivered sooner.
   1179 
   1180 </p>
   1181 
   1182 <p>
   1183 
   1184 The concept of these slot loans is where the
   1185 <i>transport</i>_delivery_slot_discount and
   1186 <i>transport</i>_delivery_slot_loan come from (they default to
   1187 default_delivery_slot_discount and default_delivery_slot_loan, whose
   1188 values are by default 50 and 3, respectively). The discount (resp.
   1189 loan) specifies how many percent (resp. how many slots) one "gets
   1190 in advance", when the number of slots required to deliver the best
   1191 candidate is compared with the number of slots the current slot had
   1192 accumulated so far.
   1193 
   1194 </p>
   1195 
   1196 <p>
   1197 
   1198 And that pretty much concludes this chapter.
   1199 
   1200 </p>
   1201 
   1202 <p>
   1203 
   1204 [Now you should have a feeling that you pretty much understand the
   1205 scheduler and the preemption, or at least that you will have
   1206 after you read the last chapter a couple more times. You shall clearly
   1207 see the job list and the preemption happening at its head, in ideal
   1208 delivery conditions. The feeling of understanding shall last until
   1209 you start wondering what happens if some of the jobs are blocked,
   1210 which you might eventually figure out correctly from what had been
   1211 said already. But I would be surprised if your mental image of the
   1212 scheduler's functionality is not completely shattered once you
   1213 start wondering how it works when not all recipients may be read
   1214 in-core.  More on that later.]
   1215 
   1216 </p>
   1217 
   1218 <h3> <a name="<tt>nqmgr</tt>_concurrency"> How destination concurrency
   1219 limits affect the scheduling algorithm </a> </h3>
   1220 
   1221 <p>
   1222 
   1223 The <tt>nqmgr</tt> uses the same algorithm for destination concurrency
   1224 control as <tt>oqmgr</tt>. Now what happens when the destination
   1225 limits are reached and no more entries for that destination may be
   1226 selected by the scheduler?
   1227 
   1228 </p>
   1229 
   1230 <p>
   1231 
   1232 From the user's point of view it is all simple. If some of the peers
   1233 of a job can't be selected, those peers are simply skipped by the
   1234 entry selection algorithm (the pseudo-code described before) and
   1235 only the selectable ones are used. If none of the peers may be
   1236 selected, the job is declared a "blocker job". Blocker jobs are
   1237 skipped by the entry selection algorithm and they are also excluded
   1238 from the candidates for preemption of the current job. Thus the scheduler
   1239 effectively behaves as if the blocker jobs didn't exist on the job
   1240 list at all. As soon as at least one of the peers of a blocker job
   1241 becomes unblocked (that is, the delivery agent handling the delivery
   1242 of the recipient entry for the given destination successfully finishes),
   1243 the job's blocker status is removed and the job again participates
   1244 in all further scheduler actions normally.
   1245 
   1246 </p>
   1247 
   1248 <p>
   1249 
   1250 So the summary is that the users don't really have to be concerned
   1251 about the interaction of the destination limits and scheduling
   1252 algorithm. It works well on its own and there are no knobs they
   1253 would need to control it.
   1254 
   1255 </p>
   1256 
   1257 <p>
   1258 
   1259 From a programmer's point of view, the blocker jobs complicate the
   1260 scheduler quite a lot. Without them, the jobs on the job list would
   1261 be normally delivered in strict FIFO order. If the current job is
   1262 preempted, the job preempting it is completely delivered unless it
   1263 is preempted itself. Without blockers, the current job is thus
   1264 always either the first job on the job list, or the top of the stack
   1265 of jobs preempting the first job on the job list.
   1266 
   1267 </p>
   1268 
   1269 <p>
   1270 
   1271 The visualization of the job list and the preemption stack without
   1272 blockers would be like this:
   1273 
   1274 </p>
   1275 
   1276 <blockquote>
   1277 <pre>
   1278 first job-&gt;    1--2--3--5--6--8--...    &lt;- job list
   1279 on job list    |
   1280                4    &lt;- preemption stack
   1281                |
   1282 current job-&gt;  7
   1283 </pre>
   1284 </blockquote>
   1285 
   1286 <p>
   1287 
   1288 In the example above we see that job 1 was preempted by job 4 and
   1289 then job 4 was preempted by job 7. After job 7 is completed, remaining
   1290 entries of job 4 are selected, and once they are all selected, job
   1291 1 continues.
   1292 
   1293 </p>
   1294 
   1295 <p>
   1296 
   1297 As we see, it's all very clean and straightforward. Now how does
   1298 this change because of blockers?
   1299 
   1300 </p>
   1301 
   1302 <p>
   1303 
   1304 The answer is: a lot. Any job may become a blocker job at any time,
   1305 and also become a normal job again at any time. This has several
   1306 important implications:
   1307 
   1308 </p>
   1309 
   1310 <ol>
   1311 
   1312 <li> <p>
   1313 
   1314 The jobs may be completed in arbitrary order. For example, in the
   1315 example above, if the current job 7 becomes blocked, the next job
   1316 4 may complete before the job 7 becomes unblocked again. Or if both
   1317 7 and 4 are blocked, then 1 is completed, then 7 becomes unblocked
   1318 and is completed, then 2 is completed and only after that 4 becomes
   1319 unblocked and is completed... You get the idea.
   1320 
   1321 </p>
   1322 
   1323 <p>
   1324 
   1325 [Interesting side note: even when jobs are delivered out of order,
   1326 from a single destination's point of view the jobs are still delivered
   1327 in the expected order (that is, FIFO unless there was some preemption
   1328 involved). This is because whenever a destination queue becomes
   1329 unblocked (the destination limit allows selection of more recipient
   1330 entries for that destination), all jobs which have peers for that
   1331 destination are unblocked at once.]
   1332 
   1333 </p>
   1334 
   1335 <li> <p>
   1336 
   1337 The idea of the preemption stack at the head of the job list is
   1338 gone.  That is, it must be possible to preempt any job on the job
   1339 list. For example, if the jobs 7, 4, 1 and 2 in the example above
   1340 become all blocked, job 3 becomes the current job. And of course
   1341 we do not want the preemption to be affected by the fact that there
   1342 are some blocked jobs or not. Therefore, if it turns out that job
   1343 3 might be preempted by job 6, the implementation shall make it
   1344 possible.
   1345 
   1346 </p>
   1347 
   1348 <li> <p>
   1349 
   1350 The idea of the linear preemption stack itself is gone. It's no
   1351 longer true that one job is always preempted by only one job at one
   1352 time (that is directly preempted, not counting the recursively
   1353 nested jobs). For example, in the example above, job 1 is directly
   1354 preempted by only job 4, and job 4 by job 7. Now assume job 7 becomes
   1355 blocked, and job 4 is being delivered. If it accumulates enough
   1356 delivery slots, it is natural that it might be preempted for example
   1357 by job 8. Now job 4 is preempted by both job 7 AND job 8 at the
   1358 same time.
   1359 
   1360 </p>
   1361 
   1362 </ol>
   1363 
   1364 <p>
   1365 
   1366 Now combine the points 2) and 3) with point 1) again and you realize
   1367 that the relations on the once linear job list became pretty
   1368 complicated. If we extend the point 3) example: jobs 7 and 8 preempt
   1369 job 4, now job 8 becomes blocked too, then job 4 completes. Tricky,
   1370 huh?
   1371 
   1372 </p>
   1373 
   1374 <p>
   1375 
   1376 If I illustrate the relations after the above mentioned examples
   1377 (but those in point 1), the situation would look like this:
   1378 
   1379 </p>
   1380 
   1381 <blockquote>
   1382 <pre>
   1383                             v- parent
   1384 
   1385 adoptive parent -&gt;    1--2--3--5--...      &lt;- "stack" level 0
   1386                       |     |
   1387 parent gone -&gt;        ?     6              &lt;- "stack" level 1
   1388                      / \
   1389 children -&gt;         7   8   ^- child       &lt;- "stack" level 2
   1390 
   1391                       ^- siblings
   1392 </pre>
   1393 </blockquote>
   1394 
   1395 <p>
   1396 
   1397 Now how does <tt>nqmgr</tt> deal with all these complicated relations?
   1398 
   1399 </p>
   1400 
   1401 <p>
   1402 
   1403 Well, it maintains them all as described, but fortunately, all these
   1404 relations are necessary only for the purpose of proper counting of
   1405 available delivery slots. For the purpose of ordering the jobs for
   1406 entry selection, the original rule still applies: "the job preempting
   1407 the current job is moved in front of the current job on the job
   1408 list". So for entry selection purposes, the job relations remain
   1409 as simple as this:
   1410 
   1411 </p>
   1412 
   1413 <blockquote>
   1414 <pre>
   1415 7--8--1--2--6--3--5--..   &lt;- scheduler's job list order
   1416 </pre>
   1417 </blockquote>
   1418 
   1419 <p>
   1420 
   1421 The job list order and the preemption parent/child/siblings relations
   1422 are maintained separately. And because the selection works only
   1423 with the job list, you can happily forget about those complicated
   1424 relations unless you want to study the <tt>nqmgr</tt> sources. In
   1425 that case the text above might provide some helpful introduction
   1426 to the problem domain. Otherwise I suggest you just forget about
   1427 all this and stick with the user's point of view: the blocker jobs
   1428 are simply ignored.
   1429 
   1430 </p>
   1431 
   1432 <p>
   1433 
   1434 [By now, you should have a feeling that there are more things going
   1435 on under the hood than you ever wanted to know. You decide that
   1436 forgetting about this chapter is the best you can do for the sake
   1437 of your mind's health and you basically stick with the idea how the
   1438 scheduler works in ideal conditions, when there are no blockers,
   1439 which is good enough.]
   1440 
   1441 </p>
   1442 
   1443 <h3> <a name="<tt>nqmgr</tt>_memory"> Dealing with memory resource
   1444 limits </a> </h3>
   1445 
   1446 <p>
   1447 
   1448 When discussing the <tt>nqmgr</tt> scheduler, we have so far assumed
   1449 that all recipients of all messages in the "active" queue are completely
   1450 read into memory. This is simply not true. There is an upper
   1451 bound on the amount of memory the <tt>nqmgr</tt> may use, and
   1452 therefore it must impose some limits on the information it may store
   1453 in memory at any given time.
   1454 
   1455 </p>
   1456 
   1457 <p>
   1458 
   1459 First of all, not all messages may be read in-core at once. At any
   1460 time, only qmgr_message_active_limit messages may be read in-core
   1461 at maximum. When read into memory, the messages are picked from the
   1462 "incoming" and "deferred" queues and moved to the "active" queue
   1463 (incoming having priority), so if there are more than
   1464 qmgr_message_active_limit messages queued in the "active" queue, the
   1465 rest will have to wait until (some of) the messages in the "active" queue
   1466 are completely delivered (or deferred).
   1467 
   1468 </p>
   1469 
   1470 <p>
   1471 
   1472 Even with the limited amount of in-core messages, there is another
   1473 limit which must be imposed in order to avoid memory exhaustion.
   1474 Each message may contain a huge number of recipients (tens or hundreds
   1475 of thousands are not uncommon), so if <tt>nqmgr</tt> read all
   1476 recipients of all messages in the "active" queue, it may easily run
   1477 out of memory. Therefore there must be some upper bound on the
   1478 amount of message recipients which are read into memory at the
   1479 same time.
   1480 
   1481 </p>
   1482 
   1483 <p>
   1484 
   1485 Before discussing how exactly <tt>nqmgr</tt> implements the recipient
   1486 limits, let's see how the sole existence of the limits themselves
   1487 affects the <tt>nqmgr</tt> and its scheduler.
   1488 
   1489 </p>
   1490 
   1491 <p>
   1492 
   1493 The message limit is straightforward - it just limits the size of
   1494 the
   1495 lookahead the <tt>nqmgr</tt>'s scheduler has when choosing which
   1496 message can preempt the current one. Messages not in the "active" queue
   1497 are simply not considered at all.
   1498 
   1499 </p>
   1500 
   1501 <p>
   1502 
   1503 The recipient limit complicates more things. First of all, the
   1504 message reading code must support reading the recipients in batches,
   1505 which among other things means accessing the queue file several
   1506 times and continuing where the last recipient batch ended. This is
   1507 invoked by the scheduler whenever the current job has space for more
   1508 recipients, subject to transport's refill_limit and refill_delay parameters.
   1509 It is also done any time when all
   1510 in-core recipients of the message are dealt with (which may also
   1511 mean they were deferred) but there are still more in the queue file.
   1512 
   1513 </p>
   1514 
   1515 <p>
   1516 
   1517 The second complication is that with some recipients left unread
   1518 in the queue file, the scheduler can't operate with exact counts
   1519 of recipient entries. With unread recipients, it is not clear how
   1520 many recipient entries there will be, as they are subject to
   1521 per-destination grouping. It is not even clear to what transports
   1522 (and thus jobs) the recipients will be assigned. And with messages
   1523 coming from the "deferred" queue, it is not even clear how many unread
   1524 recipients are still to be delivered. This all means that the
   1525 scheduler must use only estimates of how many recipients entries
   1526 there will be.  Fortunately, it is possible to estimate the minimum
   1527 and maximum correctly, so the scheduler can always err on the safe
   1528 side.  Obviously, the better the estimates, the better the results, so
   1529 it is best when we are able to read all recipients in-core and turn
   1530 the estimates into exact counts, or at least try to read as many
   1531 as possible to make the estimates as accurate as possible.
   1532 
   1533 </p>
   1534 
   1535 <p>
   1536 
   1537 The third complication is that it is no longer true that the scheduler
   1538 is done with a job once all of its in-core recipients are delivered.
   1539 It is possible that the job will be revived later, when another
   1540 batch of recipients is read in core. It is also possible that some
   1541 jobs will be created for the first time long after the first batch
   1542 of recipients was read in core. The <tt>nqmgr</tt> code must be
   1543 ready to handle all such situations.
   1544 
   1545 </p>
   1546 
   1547 <p>
   1548 
   1549 And finally, the fourth complication is that the <tt>nqmgr</tt>
   1550 code must somehow impose the recipient limit itself. Now how does
   1551 it achieve it?
   1552 
   1553 </p>
   1554 
   1555 <p>
   1556 
   1557 Perhaps the easiest solution would be to say that each message may
   1558 have at maximum X recipients stored in-core, but such a solution would
   1559 be poor for several reasons. With reasonable qmgr_message_active_limit
   1560 values, the X would have to be quite low to maintain a reasonable
   1561 memory footprint. And with low X lots of things would not work well.
   1562 The <tt>nqmgr</tt> would have problems to use the
   1563 <i>transport</i>_destination_recipient_limit efficiently. The
   1564 scheduler's preemption would be suboptimal as the recipient count
   1565 estimates would be inaccurate. The message queue file would have
   1566 to be accessed many times to read in more recipients again and
   1567 again.
   1568 
   1569 </p>
   1570 
   1571 <p>
   1572 
   1573 Therefore it seems reasonable to have a solution which does not use
   1574 a limit imposed on a per-message basis, but which maintains a pool
   1575 of available recipient slots, which can be shared among all messages
   1576 in the most efficient manner. And as we do not want separate
   1577 transports to compete for resources whenever possible, it seems
   1578 appropriate to maintain such a recipient pool for each transport
   1579 separately. This is the general idea, now how does it work in
   1580 practice?
   1581 
   1582 </p>
   1583 
   1584 <p>
   1585 
   1586 First we have to solve a little chicken-and-egg problem. If we want
   1587 to use the per-transport recipient pools, we first need to know to
   1588 what transport(s) the message is assigned. But we will find that
   1589 out only after we first read in the recipients. So it is obvious
   1590 that we first have to read in some recipients, use them to find out
   1591 to what transports the message is to be assigned, and only after
   1592 that can we use the per-transport recipient pools.
   1593 
   1594 </p>
   1595 
   1596 <p>
   1597 
   1598 Now how many recipients shall we read for the first time? This is
   1599 what qmgr_message_recipient_minimum and qmgr_message_recipient_limit
   1600 values control. The qmgr_message_recipient_minimum value specifies
   1601 how many recipients of each message we will read the first time,
   1602 no matter what.  It is necessary to read at least one recipient
   1603 before we can assign the message to a transport and create the first
   1604 job. However, reading only qmgr_message_recipient_minimum recipients
   1605 even if there are only few messages with few recipients in-core would
   1606 be wasteful. Therefore if there are fewer than qmgr_message_recipient_limit
   1607 recipients in-core so far, the first batch of recipients may be
   1608 larger than qmgr_message_recipient_minimum - as large as is required
   1609 to reach the qmgr_message_recipient_limit limit.
   1610 
   1611 </p>
   1612 
   1613 <p>
   1614 
   1615 Once the first batch of recipients was read in core and the message
   1616 jobs were created, the size of the subsequent recipient batches (if
   1617 any - of course it's best when all recipients are read in one batch)
   1618 is based solely on the position of the message jobs on their
   1619 corresponding transports' job lists. Each transport has a pool of
   1620 <i>transport</i>_recipient_limit recipient slots which it can
   1621 distribute among its jobs (how this is done is described later).
   1622 The subsequent recipient batch may be as large as the sum of all
   1623 recipient slots of all jobs of the message permits (plus the
   1624 qmgr_message_recipient_minimum amount which always applies).
   1625 
   1626 </p>
   1627 
   1628 <p>
   1629 
   1630 For example, if a message has three jobs, the first with 1 recipient
   1631 still in-core and 4 recipient slots, the second with 5 recipients in-core
   1632 and 5 recipient slots, and the third with 2 recipients in-core and 0
   1633 recipient slots, it has 1+5+2=8 recipients in-core and 4+5+0=9 jobs'
   1634 recipients slots in total. This means that we could immediately
   1635 read 2+qmgr_message_recipient_minimum more recipients of that message
   1636 in core.
   1637 
   1638 </p>
   1639 
   1640 <p>
   1641 
   1642 The above example illustrates several things which might be worth
   1643 mentioning explicitly: first, note that although the per-transport
   1644 slots are assigned to particular jobs, we can't guarantee that once
   1645 the next batch of recipients is read in core, that the corresponding
   1646 amounts of recipients will be assigned to those jobs. The jobs lend
   1647 its slots to the message as a whole, so it is possible that some
   1648 jobs end up sponsoring other jobs of their message. For example,
   1649 if in the example above the 2 newly read recipients were assigned
   1650 to the second job, the first job sponsored the second job with 2
   1651 slots. The second notable thing is the third job, which has more
   1652 recipients in-core than it has slots. Apart from sponsoring by other
   1653 job we just saw it can be result of the first recipient batch, which
   1654 is sponsored from global recipient pool of qmgr_message_recipient_limit
   1655 recipients. It can be also sponsored from the message recipient
   1656 pool of qmgr_message_recipient_minimum recipients.
   1657 
   1658 </p>
   1659 
   1660 <p>
   1661 
   1662 Now how does each transport distribute the recipient slots among
   1663 its jobs?  The strategy is quite simple. As most scheduler activity
   1664 happens on the head of the job list, it is our intention to make
   1665 sure that the scheduler has the best estimates of the recipient
   1666 counts for those jobs. As we mentioned above, this means that we
   1667 want to try to make sure that the messages of those jobs have all
   1668 recipients read in-core. Therefore the transport distributes the
   1669 slots "along" the job list from start to end. In this case the job
   1670 list sorted by message enqueue time is used, because it doesn't
   1671 change over time as the scheduler's job list does.
   1672 
   1673 </p>
   1674 
   1675 <p>
   1676 
   1677 More specifically, each time a job is created and appended to the
   1678 job list, it gets all unused recipient slots from its transport's
   1679 pool. It keeps them until all recipients of its message are read.
   1680 When this happens, all unused recipient slots are transferred to
   1681 the next job (which is now in fact the first such job) on the job
   1682 list which still has some recipients unread, or eventually back to
   1683 the transport pool if there is no such job. Such a transfer then also
   1684 happens whenever a recipient entry of that job is delivered.
   1685 
   1686 </p>
   1687 
   1688 <p>
   1689 
   1690 There is also a scenario when a job is not appended to the end of
   1691 the job list (for example it was created as a result of a second or
   1692 later recipient batch). Then it works exactly as above, except that
   1693 if it was put in front of the first unread job (that is, the job
   1694 of a message which still has some unread recipients in the queue file),
   1695 that job is first forced to return all of its unused recipient slots
   1696 to the transport pool.
   1697 
   1698 </p>
   1699 
   1700 <p>
   1701 
   1702 The algorithm just described leads to the following state: The first
   1703 unread job on the job list always gets all the remaining recipient
   1704 slots of that transport (if there are any). The jobs queued before
   1705 this job are completely read (that is, all recipients of their
   1706 message were already read in core) and have at maximum as many slots
   1707 as they still have recipients in-core (the maximum is there because
   1708 of the sponsoring mentioned before) and the jobs after this job get
   1709 nothing from the transport recipient pool (unless they got something
   1710 before and then the first unread job was created and enqueued in
   1711 front of them later - in such a case, they also get at maximum as many
   1712 slots as they have recipients in-core).
   1713 
   1714 </p>
   1715 
   1716 <p>
   1717 
   1718 Things work fine in such a state for most of the time, because the
   1719 current job is either completely read in-core or has as many recipient
   1720 slots as there are, but there is one situation which we still have
   1721 to take care of specially.  Imagine if the current job is preempted
   1722 by some unread job from the job list and there are no more recipient
   1723 slots available, so this new current job could read only batches
   1724 of qmgr_message_recipient_minimum recipients at a time. This would
   1725 really degrade performance. For this reason, each transport has an
   1726 extra pool of <i>transport</i>_extra_recipient_limit recipient
   1727 slots, dedicated exactly for this situation. Each time an unread
   1728 job preempts the current job, it gets half of the remaining recipient
   1729 slots from the normal pool and this extra pool.
   1730 
   1731 </p>
   1732 
   1733 <p>
   1734 
   1735 And that's it. It sure does sound pretty complicated, but fortunately
   1736 most people don't really have to care exactly how it works as long
   1737 as it works.  Perhaps the only important things to know for most
   1738 people are the following upper bound formulas:
   1739 
   1740 </p>
   1741 
   1742 <p>
   1743 
   1744 Each transport has at maximum
   1745 
   1746 </p>
   1747 
   1748 <blockquote>
   1749 <pre>
   1750 max(
   1751 qmgr_message_recipient_minimum * qmgr_message_active_limit
   1752 + *_recipient_limit + *_extra_recipient_limit,
   1753 qmgr_message_recipient_limit
   1754 )
   1755 </pre>
   1756 </blockquote>
   1757 
   1758 <p>
   1759 
   1760 recipients in core.
   1761 
   1762 </p>
   1763 
   1764 <p>
   1765 
   1766 The total amount of recipients in core is
   1767 
   1768 </p>
   1769 
   1770 <blockquote>
   1771 <pre>
   1772 max(
   1773 qmgr_message_recipient_minimum * qmgr_message_active_limit
   1774 + sum( *_recipient_limit + *_extra_recipient_limit ),
   1775 qmgr_message_recipient_limit
   1776 )
   1777 </pre>
   1778 </blockquote>
   1779 
   1780 <p>
   1781 
   1782 where the sum is over all used transports.
   1783 
   1784 </p>
   1785 
   1786 <p>
   1787 
   1788 And this terribly complicated chapter concludes the documentation
   1789 of the <tt>nqmgr</tt> scheduler.
   1790 
   1791 </p>
   1792 
   1793 <p>
   1794 
   1795 [By now you should theoretically know the <tt>nqmgr</tt> scheduler
   1796 inside out. In practice, you still hope that you will never have
   1797 to really understand the last or last two chapters completely, and
   1798 fortunately most people really won't. Understanding how the scheduler
   1799 works in ideal conditions is more than good enough for the vast majority
   1800 of users.]
   1801 
   1802 </p>
   1803 
   1804 <h2> <a name="credits"> Credits </a> </h2>
   1805 
   1806 <ul>
   1807 
   1808 <li> Wietse Venema designed and implemented the initial queue manager
   1809 with per-domain FIFO scheduling, and per-delivery +/-1 concurrency
   1810 feedback.
   1811 
   1812 <li> Patrik Rak designed and implemented preemption where mail with
   1813 fewer recipients can slip past mail with more recipients in a
   1814 controlled manner, and wrote up its documentation.
   1815 
   1816 <li> Wietse Venema initiated a discussion with Patrik Rak and Victor
   1817 Duchovni on alternatives for the +/-1 feedback scheduler's aggressive
   1818 behavior. This is when K/N feedback was reviewed (N = concurrency).
   1819 The discussion ended without a good solution for both negative
   1820 feedback and dead site detection.
   1821 
   1822 <li> Victor Duchovni resumed work on concurrency feedback in the
   1823 context of concurrency-limited servers.
   1824 
   1825 <li> Wietse Venema then re-designed the concurrency scheduler in
   1826 terms of the simplest possible concepts: less-than-1 concurrency
   1827 feedback per delivery, forward and reverse concurrency feedback
   1828 hysteresis, and pseudo-cohort failure. At this same time, concurrency
   1829 feedback was separated from dead site detection.
   1830 
   1831 <li> These simplifications, and their modular implementation, helped
   1832 to develop further insights into the different roles that positive
   1833 and negative concurrency feedback play, and helped to identify some
   1834 worst-case scenarios.
   1835 
   1836 </ul>
   1837 
   1838 </body>
   1839 
   1840 </html>
   1841