Home | History | Annotate | Line # | Download | only in proto
      1      1.1      tron <!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
      2  1.1.1.6  christos 	"https://www.w3.org/TR/html4/loose.dtd">
      3      1.1      tron 
      4      1.1      tron <html>
      5      1.1      tron 
      6      1.1      tron <head>
      7      1.1      tron 
      8      1.1      tron <title>Postfix Queue Scheduler</title>
      9      1.1      tron 
     10  1.1.1.4  christos <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
     11  1.1.1.5  christos <link rel='stylesheet' type='text/css' href='postfix-doc.css'>
     12      1.1      tron 
     13      1.1      tron </head>
     14      1.1      tron 
     15      1.1      tron <body>
     16      1.1      tron 
     17      1.1      tron <h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
     18      1.1      tron Queue Scheduler</h1>
     19      1.1      tron 
     20      1.1      tron <hr>
     21      1.1      tron 
     22  1.1.1.2      tron <h2> Disclaimer </h2>
     23  1.1.1.2      tron 
     24  1.1.1.2      tron <p> Many of the <i>transport</i>-specific configuration parameters
     25  1.1.1.2      tron discussed in this document will not show up in "postconf" command
     26  1.1.1.2      tron output before Postfix version 2.9. This limitation applies to many
     27  1.1.1.2      tron parameters whose name is a combination of a master.cf service name
     28  1.1.1.2      tron such as "relay" and a built-in suffix such as
     29  1.1.1.2      tron "_destination_concurrency_limit". </p>
     30  1.1.1.2      tron 
     31      1.1      tron <h2> Overview </h2>
     32      1.1      tron 
     33      1.1      tron <p> The queue manager is by far the most complex part of the Postfix
     34      1.1      tron mail system. It schedules delivery of new mail, retries failed
     35      1.1      tron deliveries at specific times, and removes mail from the queue after
     36      1.1      tron the last delivery attempt.  There are two major classes of mechanisms
     37      1.1      tron that control the operation of the queue manager. </p>
     38      1.1      tron 
     39      1.1      tron <p> Topics covered by this document: </p>
     40      1.1      tron 
     41      1.1      tron <ul>
     42      1.1      tron 
     43      1.1      tron <li> <a href="#concurrency"> Concurrency scheduling</a>, concerned
     44      1.1      tron with the number of concurrent deliveries to a specific destination,
     45      1.1      tron including decisions on when to suspend deliveries after persistent
     46      1.1      tron failures.
     47      1.1      tron 
     48      1.1      tron <li> <a href="#jobs"> Preemptive scheduling</a>, concerned with
     49      1.1      tron the selection of email messages and recipients for a given destination.
     50      1.1      tron 
     51      1.1      tron <li> <a href="#credits"> Credits</a>, something this document would not be
     52      1.1      tron complete without.
     53      1.1      tron 
     54      1.1      tron </ul>
     55      1.1      tron 
     56      1.1      tron <!--
     57      1.1      tron 
     58      1.1      tron <p> Once started, the qmgr(8) process runs until "postfix reload"
     59      1.1      tron or "postfix stop".  As a persistent process, the queue manager has
     60      1.1      tron to meet strict requirements with respect to code correctness and
     61      1.1      tron robustness. Unlike non-persistent daemon processes, the queue manager
     62      1.1      tron cannot benefit from Postfix's process rejuvenation mechanism that
     63      1.1      tron limit the impact from resource leaks and other coding errors
     64      1.1      tron (translation: replacing a process after a short time covers up bugs
     65      1.1      tron before they can become a problem).  </p>
     66      1.1      tron 
     67      1.1      tron -->
     68      1.1      tron 
     69      1.1      tron <h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
     70      1.1      tron 
     71      1.1      tron <p> The following sections document the Postfix 2.5 concurrency
     72  1.1.1.3      tron scheduler, after a discussion of the limitations of the earlier
     73      1.1      tron concurrency scheduler. This is followed by results of medium-concurrency
     74      1.1      tron experiments, and a discussion of trade-offs between performance and
     75      1.1      tron robustness.  </p>
     76      1.1      tron 
     77      1.1      tron <p> The material is organized as follows: </p>
     78      1.1      tron 
     79      1.1      tron <ul>
     80      1.1      tron 
     81      1.1      tron <li> <a href="#concurrency_drawbacks"> Drawbacks of the existing
     82      1.1      tron concurrency scheduler </a>
     83      1.1      tron 
     84      1.1      tron <li> <a href="#concurrency_summary_2_5"> Summary of the Postfix 2.5
     85      1.1      tron concurrency feedback algorithm </a>
     86      1.1      tron 
     87      1.1      tron <li> <a href="#dead_summary_2_5"> Summary of the Postfix 2.5 "dead
     88      1.1      tron destination" detection algorithm </a>
     89      1.1      tron 
     90      1.1      tron <li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix 2.5
     91      1.1      tron concurrency scheduler </a>
     92      1.1      tron 
     93      1.1      tron <li> <a href="#concurrency_results"> Results for delivery to
     94      1.1      tron concurrency limited servers </a>
     95      1.1      tron 
     96      1.1      tron <li> <a href="#concurrency_discussion"> Discussion of concurrency
     97      1.1      tron limited server results </a>
     98      1.1      tron 
     99      1.1      tron <li> <a href="#concurrency_limitations"> Limitations of less-than-1
    100      1.1      tron per delivery feedback </a>
    101      1.1      tron 
    102      1.1      tron <li> <a href="#concurrency_config"> Concurrency configuration
    103      1.1      tron parameters </a>
    104      1.1      tron 
    105      1.1      tron </ul>
    106      1.1      tron 
    107      1.1      tron <h3> <a name="concurrency_drawbacks"> Drawbacks of the existing
    108      1.1      tron concurrency scheduler </a> </h3>
    109      1.1      tron 
    110      1.1      tron <p> From the start, Postfix has used a simple but robust algorithm
    111      1.1      tron where the per-destination delivery concurrency is decremented by 1
    112      1.1      tron after delivery failed due to connection or handshake failure, and
    113      1.1      tron incremented by 1 otherwise.  Of course the concurrency is never
    114      1.1      tron allowed to exceed the maximum per-destination concurrency limit.
    115      1.1      tron And when a destination's concurrency level drops to zero, the
    116      1.1      tron destination is declared "dead" and delivery is suspended.  </p>
    117      1.1      tron 
    118      1.1      tron <p> Drawbacks of +/-1 concurrency feedback per delivery are: <p>
    119      1.1      tron 
    120      1.1      tron <ul>
    121      1.1      tron 
    122      1.1      tron <li> <p> Overshoot due to exponential delivery concurrency growth
    123      1.1      tron with each pseudo-cohort(*). This can be an issue with high-concurrency
    124      1.1      tron channels. For example, with the default initial concurrency of 5,
    125      1.1      tron concurrency would proceed over time as (5-10-20).  </p>
    126      1.1      tron 
    127      1.1      tron <li> <p> Throttling down to zero concurrency after a single
    128      1.1      tron pseudo-cohort(*) failure. This was especially an issue with
    129      1.1      tron low-concurrency channels where a single failure could be sufficient
    130      1.1      tron to mark a destination as "dead", causing the suspension of further
    131      1.1      tron deliveries to the affected destination. </p>
    132      1.1      tron 
    133      1.1      tron </ul>
    134      1.1      tron 
    135      1.1      tron <p> (*) A pseudo-cohort is a number of delivery requests equal to
    136      1.1      tron a destination's delivery concurrency. </p>
    137      1.1      tron 
    138      1.1      tron <p> The revised concurrency scheduler has a highly modular structure.
    139      1.1      tron It uses separate mechanisms for per-destination concurrency control
    140      1.1      tron and for "dead destination" detection.  The concurrency control in
    141      1.1      tron turn is built from two separate mechanisms: it supports less-than-1
    142      1.1      tron feedback per delivery to allow for more gradual concurrency
    143      1.1      tron adjustments, and it uses feedback hysteresis to suppress concurrency
    144      1.1      tron oscillations.  And instead of waiting for delivery concurrency to
    145      1.1      tron throttle down to zero, a destination is declared "dead" after a
    146      1.1      tron configurable number of pseudo-cohorts reports connection or handshake
    147      1.1      tron failure.  </p>
    148      1.1      tron 
    149      1.1      tron <h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5
    150      1.1      tron concurrency feedback algorithm </a> </h3>
    151      1.1      tron 
    152      1.1      tron <p> We want to increment a destination's delivery concurrency when
    153      1.1      tron some (not necessarily consecutive) number of deliveries complete
    154      1.1      tron without connection or handshake failure.  This is implemented with
    155      1.1      tron positive feedback g(N) where N is the destination's delivery
    156      1.1      tron concurrency.  With g(N)=1 feedback per delivery, concurrency increases
    157      1.1      tron by 1 after each positive feedback event; this gives us the old
    158      1.1      tron scheduler's exponential growth in time. With g(N)=1/N feedback per
    159      1.1      tron delivery, concurrency increases by 1 after an entire pseudo-cohort
    160      1.1      tron N of positive feedback reports; this gives us linear growth in time.
    161      1.1      tron Less-than-1 feedback per delivery and integer truncation naturally
    162      1.1      tron give us hysteresis, so that transitions to larger concurrency happen
    163      1.1      tron every 1/g(N) positive feedback events.  </p>
    164      1.1      tron 
    165      1.1      tron <p> We want to decrement a destination's delivery concurrency when
    166      1.1      tron some (not necessarily consecutive) number of deliveries complete
    167      1.1      tron after connection or handshake failure.  This is implemented with
    168      1.1      tron negative feedback f(N) where N is the destination's delivery
    169      1.1      tron concurrency.  With f(N)=1 feedback per delivery, concurrency decreases
    170      1.1      tron by 1 after each negative feedback event; this gives us the old
    171      1.1      tron scheduler's behavior where concurrency is throttled down dramatically
    172      1.1      tron after a single pseudo-cohort failure.  With f(N)=1/N feedback per
    173      1.1      tron delivery, concurrency backs off more gently.  Again, less-than-1
    174      1.1      tron feedback per delivery and integer truncation naturally give us
    175      1.1      tron hysteresis, so that transitions to lower concurrency happen every
    176      1.1      tron 1/f(N) negative feedback events.  </p>
    177      1.1      tron 
    178      1.1      tron <p> However, with negative feedback we introduce a subtle twist.
    179      1.1      tron We "reverse" the negative hysteresis cycle so that the transition
    180      1.1      tron to lower concurrency happens at the <b>beginning</b> of a sequence
    181      1.1      tron of 1/f(N) negative feedback events.  Otherwise, a correction for
    182      1.1      tron overload would be made too late.  This makes the choice of f(N)
    183      1.1      tron relatively unimportant, as borne out by measurements later in this
    184      1.1      tron document.  </p>
    185      1.1      tron 
    186      1.1      tron <p> In summary, the main ingredients for the Postfix 2.5 concurrency
    187      1.1      tron feedback algorithm are a) the option of less-than-1 positive feedback
    188      1.1      tron per delivery to avoid overwhelming servers, b) the option of
    189      1.1      tron less-than-1 negative feedback per delivery to avoid giving up too
    190      1.1      tron fast, c) feedback hysteresis to avoid rapid oscillation, and d) a
    191      1.1      tron "reverse" hysteresis cycle for negative feedback, so that it can
    192      1.1      tron correct for overload quickly.  </p>
    193      1.1      tron 
    194      1.1      tron <h3> <a name="dead_summary_2_5"> Summary of the Postfix 2.5 "dead destination" detection algorithm </a> </h3>
    195      1.1      tron 
    196      1.1      tron <p> We want to suspend deliveries to a specific destination after
    197      1.1      tron some number of deliveries suffers connection or handshake failure.
    198      1.1      tron The old scheduler declares a destination "dead" when negative (-1)
    199      1.1      tron feedback throttles the delivery concurrency down to zero. With
    200      1.1      tron less-than-1 feedback per delivery, this throttling down would
    201      1.1      tron obviously take too long.  We therefore have to separate "dead
    202      1.1      tron destination" detection from concurrency feedback.  This is implemented
    203      1.1      tron by introducing the concept of pseudo-cohort failure. The Postfix
    204      1.1      tron 2.5 concurrency scheduler declares a destination "dead" after a
    205      1.1      tron configurable number of pseudo-cohorts suffers from connection or
    206      1.1      tron handshake failures. The old scheduler corresponds to the special
    207      1.1      tron case where the pseudo-cohort failure limit is equal to 1.  </p>
    208      1.1      tron 
    209      1.1      tron <h3> <a name="pseudo_code_2_5"> Pseudocode for the Postfix 2.5 concurrency scheduler </a> </h3>
    210      1.1      tron 
    211      1.1      tron <p> The pseudo code shows how the ideas behind new concurrency
    212      1.1      tron scheduler are implemented as of November 2007.  The actual code can
    213      1.1      tron be found in the module qmgr/qmgr_queue.c.  </p>
    214      1.1      tron 
    215      1.1      tron <pre>
    216      1.1      tron Types:
    217      1.1      tron         Each destination has one set of the following variables
    218      1.1      tron         int concurrency
    219      1.1      tron         double success
    220      1.1      tron         double failure
    221      1.1      tron         double fail_cohorts
    222      1.1      tron 
    223      1.1      tron Feedback functions:
    224      1.1      tron         N is concurrency; x, y are arbitrary numbers in [0..1] inclusive
    225      1.1      tron         positive feedback: g(N) = x/N | x/sqrt(N) | x
    226      1.1      tron         negative feedback: f(N) = y/N | y/sqrt(N) | y
    227      1.1      tron 
    228      1.1      tron Initialization:
    229      1.1      tron         concurrency = initial_concurrency
    230      1.1      tron         success = 0
    231      1.1      tron         failure = 0
    232      1.1      tron         fail_cohorts = 0
    233      1.1      tron 
    234      1.1      tron After success:
    235      1.1      tron         fail_cohorts = 0
    236      1.1      tron         Be prepared for feedback &gt; hysteresis, or rounding error
    237      1.1      tron         success += g(concurrency)
    238      1.1      tron         while (success >= 1)            Hysteresis 1
    239      1.1      tron             concurrency += 1            Hysteresis 1
    240      1.1      tron             failure = 0
    241      1.1      tron             success -= 1                Hysteresis 1
    242      1.1      tron         Be prepared for overshoot
    243      1.1      tron         if (concurrency &gt; concurrency limit)
    244      1.1      tron             concurrency = concurrency limit
    245      1.1      tron 
    246      1.1      tron Safety:
    247      1.1      tron         Don't apply positive feedback unless
    248      1.1      tron             concurrency &lt; busy_refcount + init_dest_concurrency
    249      1.1      tron         otherwise negative feedback effect could be delayed
    250      1.1      tron 
    251      1.1      tron After failure:
    252      1.1      tron         if (concurrency &gt; 0)
    253      1.1      tron             fail_cohorts += 1.0 / concurrency
    254      1.1      tron             if (fail_cohorts &gt; cohort_failure_limit)
    255      1.1      tron                 concurrency = 0
    256      1.1      tron         if (concurrency &gt; 0)
    257      1.1      tron             Be prepared for feedback &gt; hysteresis, rounding errors
    258      1.1      tron             failure -= f(concurrency)
    259      1.1      tron             while (failure &lt; 0)
    260      1.1      tron                 concurrency -= 1        Hysteresis 1
    261      1.1      tron                 failure += 1            Hysteresis 1
    262      1.1      tron                 success = 0
    263      1.1      tron             Be prepared for overshoot
    264      1.1      tron             if (concurrency &lt; 1)
    265      1.1      tron                 concurrency = 1
    266      1.1      tron </pre>
    267      1.1      tron 
    268      1.1      tron <h3> <a name="concurrency_results"> Results for delivery to concurrency-limited servers </a> </h3>
    269      1.1      tron 
    270      1.1      tron <p> Discussions about the concurrency scheduler redesign started
    271      1.1      tron early 2004, when the primary goal was to find alternatives that did
    272      1.1      tron not exhibit exponential growth or rapid concurrency throttling.  No
    273      1.1      tron code was implemented until late 2007, when the primary concern had
    274      1.1      tron shifted towards better handling of server concurrency limits. For
    275      1.1      tron this reason we measure how well the new scheduler does this
    276      1.1      tron job.  The table below compares mail delivery performance of the old
    277      1.1      tron +/-1 feedback per delivery with several less-than-1 feedback
    278      1.1      tron functions, for different limited-concurrency server scenarios.
    279      1.1      tron Measurements were done with a FreeBSD 6.2 client and with FreeBSD
    280      1.1      tron 6.2 and various Linux servers.  </p>
    281      1.1      tron 
    282      1.1      tron <p> Server configuration: </p>
    283      1.1      tron 
    284      1.1      tron <ul> <li> The mail flow was slowed down with 1 second latency per
    285      1.1      tron recipient ("smtpd_client_restrictions = sleep 1"). The purpose was
    286      1.1      tron to make results less dependent on hardware details, by avoiding
    287      1.1      tron slow-downs by queue file I/O, logging I/O, and network I/O.
    288      1.1      tron 
    289      1.1      tron <li> Concurrency was limited by the server process limit
    290      1.1      tron ("default_process_limit = 5" and "smtpd_client_event_limit_exceptions
    291      1.1      tron = static:all"). Postfix was stopped and started after changing the
    292      1.1      tron process limit, because the same number is also used as the backlog
    293      1.1      tron argument to the listen(2) system call, and "postfix reload" does
    294      1.1      tron not re-issue this call.
    295      1.1      tron 
    296      1.1      tron <li> Mail was discarded with "local_recipient_maps = static:all" and
    297      1.1      tron "local_transport = discard". The discard action in access maps or
    298      1.1      tron header/body checks
    299      1.1      tron could not be used as it fails to update the in_flow_delay counters.
    300      1.1      tron 
    301      1.1      tron </ul>
    302      1.1      tron 
    303      1.1      tron <p> Client configuration: </p>
    304      1.1      tron 
    305      1.1      tron <ul>
    306      1.1      tron 
    307      1.1      tron <li> Queue file overhead was minimized by sending one message to a
    308      1.1      tron virtual alias that expanded into 2000 different remote recipients.
    309      1.1      tron All recipients were accounted for according to the maillog file.
    310      1.1      tron The virtual_alias_expansion_limit setting was increased to avoid
    311      1.1      tron complaints from the cleanup(8) server.
    312      1.1      tron 
    313      1.1      tron <li> The number of deliveries was maximized with
    314      1.1      tron "smtp_destination_recipient_limit = 2". A smaller limit would cause
    315      1.1      tron Postfix to schedule the concurrency per recipient instead of domain,
    316      1.1      tron which is not what we want.
    317      1.1      tron 
    318      1.1      tron <li> Maximum concurrency was limited with
    319      1.1      tron "smtp_destination_concurrency_limit = 20", and
    320      1.1      tron initial_destination_concurrency was set to the same value.
    321      1.1      tron 
    322      1.1      tron <li> The positive and negative concurrency feedback hysteresis was
    323      1.1      tron 1.  Concurrency was incremented by 1 at the END of 1/feedback steps
    324      1.1      tron of positive feedback, and was decremented by 1 at the START of
    325      1.1      tron 1/feedback steps of negative feedback.
    326      1.1      tron 
    327      1.1      tron <li> The SMTP client used the default 30s SMTP connect timeout and
    328      1.1      tron 300s SMTP greeting timeout.
    329      1.1      tron 
    330      1.1      tron </ul>
    331      1.1      tron 
    332      1.1      tron <h4> Impact of the 30s SMTP connect timeout </h4>
    333      1.1      tron 
    334      1.1      tron <p> The first results are for a FreeBSD 6.2 server, where our
    335      1.1      tron artificially low listen(2) backlog results in a very short kernel
    336      1.1      tron queue for established connections. The table shows that all deferred
    337      1.1      tron deliveries failed due to a 30s connection timeout, and none failed
    338      1.1      tron due to a server greeting timeout.  This measurement simulates what
    339      1.1      tron happens when the server's connection queue is completely full under
    340      1.1      tron load, and the TCP engine drops new connections.  </p>
    341      1.1      tron 
    342      1.1      tron <blockquote>
    343      1.1      tron 
    344      1.1      tron <table>
    345      1.1      tron 
    346      1.1      tron <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    347      1.1      tron style</th> <th>connection<br> caching</th> <th>percentage<br>
    348      1.1      tron deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    349      1.1      tron <th colspan=2>timed-out in<br> connect/greeting </th> </tr>
    350      1.1      tron 
    351      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    352      1.1      tron 
    353      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    354      1.1      tron align="center">1/N</td> <td align="center">no</td> <td
    355      1.1      tron align="center">9.9</td> <td align="center">19.4</td> <td
    356      1.1      tron align="center">0.49</td> <td align="center">198</td> <td
    357      1.1      tron align="center">-</td> </tr>
    358      1.1      tron 
    359      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    360      1.1      tron align="center">1/N</td> <td align="center">yes</td> <td
    361      1.1      tron align="center">10.3</td> <td align="center">19.4</td> <td
    362      1.1      tron align="center">0.49</td> <td align="center">206</td> <td
    363      1.1      tron align="center">-</td> </tr>
    364      1.1      tron 
    365      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    366      1.1      tron align="center">1/sqrt(N)</td> <td align="center">no</td>
    367      1.1      tron <td align="center">10.4</td> <td align="center">19.6</td> <td
    368      1.1      tron align="center">0.59</td> <td align="center">208</td> <td
    369      1.1      tron align="center">-</td> </tr>
    370      1.1      tron 
    371      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    372      1.1      tron align="center">1/sqrt(N)</td> <td align="center">yes</td>
    373      1.1      tron <td align="center">10.6</td> <td align="center">19.6</td> <td
    374      1.1      tron align="center">0.61</td> <td align="center">212</td> <td
    375      1.1      tron align="center">-</td> </tr>
    376      1.1      tron 
    377      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    378      1.1      tron align="center">1</td> <td align="center">no</td> <td
    379      1.1      tron align="center">10.1</td> <td align="center">19.5</td> <td
    380      1.1      tron align="center">1.29</td> <td align="center">202</td> <td
    381      1.1      tron align="center">-</td> </tr>
    382      1.1      tron 
    383      1.1      tron <tr><td align="center">20</td> <td align="center">5</td> <td
    384      1.1      tron align="center">1</td> <td align="center">yes</td> <td
    385      1.1      tron align="center">10.8</td> <td align="center">19.3</td> <td
    386      1.1      tron align="center">1.57</td> <td align="center">216</td> <td
    387      1.1      tron align="center">-</td> </tr>
    388      1.1      tron 
    389      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    390      1.1      tron 
    391      1.1      tron </table>
    392      1.1      tron 
    393      1.1      tron <p> A busy server with a completely full connection queue.  N is
    394      1.1      tron the client delivery concurrency.  Failed deliveries time out after
    395      1.1      tron 30s without completing the TCP handshake. See text for a discussion
    396      1.1      tron of results. </p>
    397      1.1      tron 
    398      1.1      tron </blockquote>
    399      1.1      tron 
    400      1.1      tron <h4> Impact of the 300s SMTP greeting timeout </h4>
    401      1.1      tron 
    402      1.1      tron <p> The next table shows results for a Fedora Core 8 server (results
    403      1.1      tron for RedHat 7.3 are identical). In this case, the artificially small
    404      1.1      tron listen(2) backlog argument does not impact our measurement.  The
    405      1.1      tron table shows that practically all deferred deliveries fail after the
    406      1.1      tron 300s SMTP greeting timeout. As these timeouts were 10x longer than
    407      1.1      tron with the first measurement, we increased the recipient count (and
    408      1.1      tron thus the running time) by a factor of 10 to keep the results
    409      1.1      tron comparable. The deferred mail percentages are a factor 10 lower
    410      1.1      tron than with the first measurement, because the 1s per-recipient delay
    411      1.1      tron was 1/300th of the greeting timeout instead of 1/30th of the
    412      1.1      tron connection timeout.  </p>
    413      1.1      tron 
    414      1.1      tron <blockquote>
    415      1.1      tron 
    416      1.1      tron <table>
    417      1.1      tron 
    418      1.1      tron <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    419      1.1      tron style</th> <th>connection<br> caching</th> <th>percentage<br>
    420      1.1      tron deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    421      1.1      tron <th colspan=2>timed-out in<br> connect/greeting </th> </tr>
    422      1.1      tron 
    423      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    424      1.1      tron 
    425      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    426      1.1      tron align="center">1/N</td> <td align="center">no</td> <td
    427      1.1      tron align="center">1.16</td> <td align="center">19.8</td> <td
    428      1.1      tron align="center">0.37</td> <td align="center">-</td> <td
    429      1.1      tron align="center">230</td> </tr>
    430      1.1      tron 
    431      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    432      1.1      tron align="center">1/N</td> <td align="center">yes</td> <td
    433      1.1      tron align="center">1.36</td> <td align="center">19.8</td> <td
    434      1.1      tron align="center">0.36</td> <td align="center">-</td> <td
    435      1.1      tron align="center">272</td> </tr>
    436      1.1      tron 
    437      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    438      1.1      tron align="center">1/sqrt(N)</td> <td align="center">no</td>
    439      1.1      tron <td align="center">1.21</td> <td align="center">19.9</td> <td
    440      1.1      tron align="center">0.23</td> <td align="center">4</td> <td
    441      1.1      tron align="center">238</td> </tr>
    442      1.1      tron 
    443      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    444      1.1      tron align="center">1/sqrt(N)</td> <td align="center">yes</td>
    445      1.1      tron <td align="center">1.36</td> <td align="center">20.0</td> <td
    446      1.1      tron align="center">0.23</td> <td align="center">-</td> <td
    447      1.1      tron align="center">272</td> </tr>
    448      1.1      tron 
    449      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    450      1.1      tron align="center">1</td> <td align="center">no</td> <td
    451      1.1      tron align="center">1.18</td> <td align="center">20.0</td> <td
    452      1.1      tron align="center">0.16</td> <td align="center">-</td> <td
    453      1.1      tron align="center">236</td> </tr>
    454      1.1      tron 
    455      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    456      1.1      tron align="center">1</td> <td align="center">yes</td> <td
    457      1.1      tron align="center">1.39</td> <td align="center">20.0</td> <td
    458      1.1      tron align="center">0.16</td> <td align="center">-</td> <td
    459      1.1      tron align="center">278</td> </tr>
    460      1.1      tron 
    461      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    462      1.1      tron 
    463      1.1      tron </table>
    464      1.1      tron 
    465      1.1      tron <p> A busy server with a non-full connection queue.  N is the client
    466      1.1      tron delivery concurrency. Failed deliveries complete at the TCP level,
    467      1.1      tron but time out after 300s while waiting for the SMTP greeting.  See
    468      1.1      tron text for a discussion of results.  </p>
    469      1.1      tron 
    470      1.1      tron </blockquote>
    471      1.1      tron 
    472      1.1      tron <h4> Impact of active server concurrency limiter </h4>
    473      1.1      tron 
    474      1.1      tron <p> The final concurrency-limited result shows what happens when
    475      1.1      tron SMTP connections don't time out, but are rejected immediately with
    476      1.1      tron the Postfix server's smtpd_client_connection_count_limit feature
    477      1.1      tron (the server replies with a 421 status and disconnects immediately).
    478      1.1      tron Similar results can be expected with concurrency limiting features
    479      1.1      tron built into other MTAs or firewalls.  For this measurement we specified
    480      1.1      tron a server concurrency limit and a client initial destination concurrency
    481      1.1      tron of 5, and a server process limit of 10; all other conditions were
    482      1.1      tron the same as with the first measurement. The same result would be
    483      1.1      tron obtained with a FreeBSD or Linux server, because the "pushing back"
    484      1.1      tron is done entirely by the receiving side. </p>
    485      1.1      tron 
    486      1.1      tron <blockquote>
    487      1.1      tron 
    488      1.1      tron <table>
    489      1.1      tron 
    490      1.1      tron <tr> <th>client<br> limit</th> <th>server<br> limit</th> <th>feedback<br>
    491      1.1      tron style</th> <th>connection<br> caching</th> <th>percentage<br>
    492      1.1      tron deferred</th> <th colspan="2">client concurrency<br> average/stddev</th>
    493      1.1      tron <th>theoretical<br>defer rate</th> </tr>
    494      1.1      tron 
    495      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    496      1.1      tron 
    497      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    498      1.1      tron align="center">1/N</td> <td align="center">no</td> <td
    499      1.1      tron align="center">16.5</td> <td align="center">5.17</td> <td
    500      1.1      tron align="center">0.38</td> <td align="center">1/6</td> </tr>
    501      1.1      tron 
    502      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    503      1.1      tron align="center">1/N</td> <td align="center">yes</td> <td
    504      1.1      tron align="center">16.5</td> <td align="center">5.17</td> <td
    505      1.1      tron align="center">0.38</td> <td align="center">1/6</td> </tr>
    506      1.1      tron 
    507      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    508      1.1      tron align="center">1/sqrt(N)</td> <td align="center">no</td>
    509      1.1      tron <td align="center">24.5</td> <td align="center">5.28</td> <td
    510      1.1      tron align="center">0.45</td> <td align="center">1/4</td> </tr>
    511      1.1      tron 
    512      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    513      1.1      tron align="center">1/sqrt(N)</td> <td align="center">yes</td>
    514      1.1      tron <td align="center">24.3</td> <td align="center">5.28</td> <td
    515      1.1      tron align="center">0.46</td> <td align="center">1/4</td> </tr>
    516      1.1      tron 
    517      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    518      1.1      tron align="center">1</td> <td align="center">no</td> <td
    519      1.1      tron align="center">49.7</td> <td align="center">5.63</td> <td
    520      1.1      tron align="center">0.67</td> <td align="center">1/2</td> </tr>
    521      1.1      tron 
    522      1.1      tron <tr> <td align="center">20</td> <td align="center">5</td> <td
    523      1.1      tron align="center">1</td> <td align="center">yes</td> <td
    524      1.1      tron align="center">49.7</td> <td align="center">5.68</td> <td
    525      1.1      tron align="center">0.70</td> <td align="center">1/2</td> </tr>
    526      1.1      tron 
    527      1.1      tron <tr> <td align="center" colspan="9"> <hr> </td> </tr>
    528      1.1      tron 
    529      1.1      tron </table>
    530      1.1      tron 
    531      1.1      tron <p> A server with active per-client concurrency limiter that replies
    532      1.1      tron with 421 and disconnects.  N is the client delivery concurrency.
    533      1.1      tron The theoretical defer rate is 1/(1+roundup(1/feedback)).  This is
    534      1.1      tron always 1/2 with the fixed +/-1 feedback per delivery; with the
    535      1.1      tron concurrency-dependent feedback variants, the defer rate decreases
    536      1.1      tron with increasing concurrency. See text for a discussion of results.
    537      1.1      tron </p>
    538      1.1      tron 
    539      1.1      tron </blockquote>
    540      1.1      tron 
    541      1.1      tron <h3> <a name="concurrency_discussion"> Discussion of concurrency-limited server results </a> </h3>
    542      1.1      tron 
    543      1.1      tron <p> All results in the previous sections are based on the first
    544      1.1      tron delivery runs only; they do not include any second etc. delivery
    545      1.1      tron attempts. It's also worth noting that the measurements look at
    546      1.1      tron steady-state behavior only. They don't show what happens when the
    547      1.1      tron client starts sending at a much higher or lower concurrency.
    548      1.1      tron </p>
    549      1.1      tron 
    550      1.1      tron <p> The first two examples show that the effect of feedback
    551      1.1      tron is negligible when concurrency is limited due to congestion. This
    552      1.1      tron is because the initial concurrency is already at the client's
    553      1.1      tron concurrency maximum, and because there is 10-100 times more positive
    554      1.1      tron than negative feedback.  Under these conditions, it is no surprise
    555      1.1      tron that the contribution from SMTP connection caching is also negligible.
    556      1.1      tron </p>
    557      1.1      tron 
    558      1.1      tron <p> In the last example, the old +/-1 feedback per delivery will
    559      1.1      tron defer 50% of the mail when confronted with an active (anvil-style)
    560      1.1      tron server concurrency limit, where the server hangs up immediately
    561      1.1      tron with a 421 status (a TCP-level RST would have the same result).
    562      1.1      tron Less aggressive feedback mechanisms fare better than more aggressive
    563      1.1      tron ones.  Concurrency-dependent feedback fares even better at higher
    564      1.1      tron concurrencies than shown here, but has limitations as discussed in
    565      1.1      tron the next section.  </p>
    566      1.1      tron 
    567      1.1      tron <h3> <a name="concurrency_limitations"> Limitations of less-than-1 per delivery feedback </a> </h3>
    568      1.1      tron 
    569      1.1      tron <p> Less-than-1 feedback is of interest primarily when sending large
    570      1.1      tron amounts of mail to destinations with active concurrency limiters
    571      1.1      tron (servers that reply with 421, or firewalls that send RST).  When
    572      1.1      tron sending small amounts of mail per destination, less-than-1 per-delivery
    573      1.1      tron feedback won't have a noticeable effect on the per-destination
    574      1.1      tron concurrency, because the number of deliveries to the same destination
    575      1.1      tron is too small. You might just as well use zero per-delivery feedback
    576      1.1      tron and stay with the initial per-destination concurrency. And when
    577      1.1      tron mail deliveries fail due to congestion instead of active concurrency
    578      1.1      tron limiters, the measurements above show that per-delivery feedback
    579      1.1      tron has no effect.  With large amounts of mail you might just as well
    580      1.1      tron use zero per-delivery feedback and start with the maximal per-destination
    581      1.1      tron concurrency.  </p>
    582      1.1      tron 
    583      1.1      tron <p> The scheduler with less-than-1 concurrency
    584      1.1      tron feedback per delivery solves a problem with servers that have active
    585      1.1      tron concurrency limiters.  This works only because feedback is handled
    586      1.1      tron in a peculiar manner: positive feedback will increment the concurrency
    587      1.1      tron by 1 at the <b>end</b> of a sequence of events of length 1/feedback,
    588      1.1      tron while negative feedback will decrement concurrency by 1 at the
    589      1.1      tron <b>beginning</b> of such a sequence.  This is how Postfix adjusts
    590      1.1      tron quickly for overshoot without causing lots of mail to be deferred.
    591      1.1      tron Without this difference in feedback treatment, less-than-1 feedback
    592      1.1      tron per delivery would defer 50% of the mail, and would be no better
    593      1.1      tron in this respect than the old +/-1 feedback per delivery.  </p>
    594      1.1      tron 
    595      1.1      tron <p> Unfortunately, the same feature that corrects quickly for
    596      1.1      tron concurrency overshoot also makes the scheduler more sensitive for
    597      1.1      tron noisy negative feedback.  The reason is that one lonely negative
    598      1.1      tron feedback event has the same effect as a complete sequence of length
    599      1.1      tron 1/feedback: in both cases delivery concurrency is dropped by 1
    600      1.1      tron immediately.  As a worst-case scenario, consider multiple servers
    601      1.1      tron behind a load balancer on a single IP address, and no backup MX
    602      1.1      tron address.  When 1 out of K servers fails to complete the SMTP handshake
    603      1.1      tron or drops the connection, a scheduler with 1/N (N = concurrency)
    604      1.1      tron feedback stops increasing its concurrency once it reaches a concurrency
    605      1.1      tron level of about K,  even though the good servers behind the load
    606      1.1      tron balancer are perfectly capable of handling more traffic. </p>
    607      1.1      tron 
    608      1.1      tron <p> This noise problem gets worse as the amount of positive feedback
    609      1.1      tron per delivery gets smaller.  A compromise is to use fixed less-than-1
    610      1.1      tron positive feedback values instead of concurrency-dependent positive
    611      1.1      tron feedback.  For example, to tolerate 1 of 4 bad servers in the above
    612      1.1      tron load balancer scenario, use positive feedback of 1/4 per "good"
    613      1.1      tron delivery (no connect or handshake error), and use an equal or smaller
    614      1.1      tron amount of negative feedback per "bad" delivery.  The downside of
    615      1.1      tron using concurrency-independent feedback is that some of the old +/-1
    616      1.1      tron feedback problems will return at large concurrencies.  Sites that
    617      1.1      tron must deliver mail at non-trivial per-destination concurrencies will
    618      1.1      tron require special configuration.  </p>
    619      1.1      tron 
    620      1.1      tron <h3> <a name="concurrency_config"> Concurrency configuration parameters </a> </h3>
    621      1.1      tron 
    622      1.1      tron <p> The Postfix 2.5 concurrency scheduler is controlled with the
    623      1.1      tron following configuration parameters, where "<i>transport</i>_foo"
    624      1.1      tron provides a transport-specific parameter override.  All parameter
    625      1.1      tron default settings are compatible with earlier Postfix versions. </p>
    626      1.1      tron 
    627      1.1      tron <blockquote>
    628      1.1      tron 
    629      1.1      tron <table border="0">
    630      1.1      tron 
    631      1.1      tron <tr> <th> Parameter name </th> <th> Postfix version </th> <th>
    632      1.1      tron Description </th> </tr>
    633      1.1      tron 
    634      1.1      tron <tr> <td colspan="3"> <hr> </td> </tr>
    635      1.1      tron 
    636      1.1      tron <tr> <td> initial_destination_concurrency<br>
    637      1.1      tron <i>transport</i>_initial_destination_concurrency </td> <td
    638      1.1      tron align="center"> all<br> 2.5 </td> <td> Initial per-destination
    639      1.1      tron delivery concurrency </td> </tr>
    640      1.1      tron 
    641      1.1      tron <tr> <td> default_destination_concurrency_limit<br>
    642      1.1      tron <i>transport</i>_destination_concurrency_limit </td> <td align="center">
    643      1.1      tron all<br> all </td> <td> Maximum per-destination delivery concurrency
    644      1.1      tron </td> </tr>
    645      1.1      tron 
    646      1.1      tron <tr> <td> default_destination_concurrency_positive_feedback<br>
    647      1.1      tron <i>transport</i>_destination_concurrency_positive_feedback </td>
    648      1.1      tron <td align="center"> 2.5<br> 2.5 </td> <td> Per-destination positive
    649      1.1      tron feedback amount, per delivery that does not fail with connection
    650      1.1      tron or handshake failure </td> </tr>
    651      1.1      tron 
    652      1.1      tron <tr> <td> default_destination_concurrency_negative_feedback<br>
    653      1.1      tron <i>transport</i>_destination_concurrency_negative_feedback </td>
    654      1.1      tron <td align="center"> 2.5<br> 2.5 </td> <td> Per-destination negative
    655      1.1      tron feedback amount, per delivery that fails with connection or handshake
    656      1.1      tron failure </td> </tr>
    657      1.1      tron 
    658      1.1      tron <tr> <td> default_destination_concurrency_failed_cohort_limit<br>
    659      1.1      tron <i>transport</i>_destination_concurrency_failed_cohort_limit </td>
    660      1.1      tron <td align="center"> 2.5<br> 2.5 </td> <td> Number of failed
    661      1.1      tron pseudo-cohorts after which a destination is declared "dead" and
    662      1.1      tron delivery is suspended </td> </tr>
    663      1.1      tron 
    664      1.1      tron <tr> <td> destination_concurrency_feedback_debug</td> <td align="center">
    665      1.1      tron 2.5 </td> <td> Enable verbose logging of concurrency scheduler
    666      1.1      tron activity </td> </tr>
    667      1.1      tron 
    668      1.1      tron <tr> <td colspan="3"> <hr> </td> </tr>
    669      1.1      tron 
    670      1.1      tron </table>
    671      1.1      tron 
    672      1.1      tron </blockquote>
    673      1.1      tron 
    674      1.1      tron <h2> <a name="jobs"> Preemptive scheduling </a> </h2>
    675      1.1      tron 
    676      1.1      tron <p>
    677      1.1      tron 
    678      1.1      tron The following sections describe the new queue manager and its
    679      1.1      tron preemptive scheduler algorithm. Note that the document was originally
    680      1.1      tron written to describe the changes between the new queue manager (in
    681      1.1      tron this text referred to as <tt>nqmgr</tt>, the name it was known by
    682      1.1      tron before it became the default queue manager) and the old queue manager
    683      1.1      tron (referred to as <tt>oqmgr</tt>). This is why it refers to <tt>oqmgr</tt>
    684      1.1      tron every so often.
    685      1.1      tron 
    686      1.1      tron </p>
    687      1.1      tron 
    688      1.1      tron <p>
    689      1.1      tron 
    690      1.1      tron This document is divided into sections as follows:
    691      1.1      tron 
    692      1.1      tron </p>
    693      1.1      tron 
    694      1.1      tron <ul>
    695      1.1      tron 
    696      1.1      tron <li> <a href="#<tt>nqmgr</tt>_structures"> The structures used by
    697      1.1      tron nqmgr </a>
    698      1.1      tron 
    699      1.1      tron <li> <a href="#<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
    700      1.1      tron up the message </a> - how it is assigned to transports, jobs, peers,
    701      1.1      tron entries
    702      1.1      tron 
    703      1.1      tron <li> <a href="#<tt>nqmgr</tt>_selection"> How the entry selection
    704      1.1      tron works </a>
    705      1.1      tron 
    706      1.1      tron <li> <a href="#<tt>nqmgr</tt>_preemption"> How the preemption
    707      1.1      tron works </a> - what messages may be preempted and how and what messages
    708      1.1      tron are chosen to preempt them
    709      1.1      tron 
    710      1.1      tron <li> <a href="#<tt>nqmgr</tt>_concurrency"> How destination concurrency
    711      1.1      tron limits affect the scheduling algorithm </a>
    712      1.1      tron 
    713      1.1      tron <li> <a href="#<tt>nqmgr</tt>_memory"> Dealing with memory resource
    714      1.1      tron limits </a>
    715      1.1      tron 
    716      1.1      tron </ul>
    717      1.1      tron 
    718      1.1      tron <h3> <a name="<tt>nqmgr</tt>_structures"> The structures used by
    719      1.1      tron nqmgr </a> </h3>
    720      1.1      tron 
    721      1.1      tron <p>
    722      1.1      tron 
    723      1.1      tron Let's start by recapitulating the structures and terms used when
    724  1.1.1.4  christos referring to the queue manager and how it operates. Many of these are
    725      1.1      tron partially described elsewhere, but it is nice to have a coherent
    726      1.1      tron overview in one place:
    727      1.1      tron 
    728      1.1      tron </p>
    729      1.1      tron 
    730      1.1      tron <ul>
    731      1.1      tron 
    732      1.1      tron <li> <p> Each message structure represents one mail message which
    733      1.1      tron Postfix is to deliver. The message recipients specify to what
    734      1.1      tron destinations is the message to be delivered and what transports are
    735      1.1      tron going to be used for the delivery. </p>
    736      1.1      tron 
    737      1.1      tron <li> <p> Each recipient entry groups a batch of recipients of one
    738  1.1.1.3      tron message which are all going to be delivered to the same destination
    739  1.1.1.3      tron (and over the same transport).
    740      1.1      tron </p>
    741      1.1      tron 
    742      1.1      tron <li> <p> Each transport structure groups everything what is going
    743      1.1      tron to be delivered by delivery agents dedicated for that transport.
    744      1.1      tron Each transport maintains a set of queues (describing the destinations
    745      1.1      tron it shall talk to) and jobs (referencing the messages it shall
    746      1.1      tron deliver). </p>
    747      1.1      tron 
    748      1.1      tron <li> <p> Each transport queue (not to be confused with the on-disk
    749  1.1.1.4  christos "active" queue or "incoming" queue) groups everything what is going be
    750      1.1      tron delivered to given destination (aka nexthop) by its transport.  Each
    751      1.1      tron queue belongs to one transport, so each destination may be referred
    752      1.1      tron to by several queues, one for each transport.  Each queue maintains
    753      1.1      tron a list of all recipient entries (batches of message recipients)
    754      1.1      tron which shall be delivered to given destination (the todo list), and
    755      1.1      tron a list of recipient entries already being delivered by the delivery
    756      1.1      tron agents (the busy list). </p>
    757      1.1      tron 
    758      1.1      tron <li> <p> Each queue corresponds to multiple peer structures.  Each
    759      1.1      tron peer structure is like the queue structure, belonging to one transport
    760      1.1      tron and referencing one destination. The difference is that it lists
    761      1.1      tron only the recipient entries which all originate from the same message,
    762      1.1      tron unlike the queue structure, whose entries may originate from various
    763      1.1      tron messages. For messages with few recipients, there is usually just
    764      1.1      tron one recipient entry for each destination, resulting in one recipient
    765      1.1      tron entry per peer. But for large mailing list messages the recipients
    766      1.1      tron may need to be split to multiple recipient entries, in which case
    767      1.1      tron the peer structure may list many entries for single destination.
    768      1.1      tron </p>
    769      1.1      tron 
    770      1.1      tron <li> <p> Each transport job groups everything it takes to deliver
    771      1.1      tron one message via its transport. Each job represents one message
    772      1.1      tron within the context of the transport. The job belongs to one transport
    773      1.1      tron and message, so each message may have multiple jobs, one for each
    774      1.1      tron transport. The job groups all the peer structures, which describe
    775      1.1      tron the destinations the job's message has to be delivered to. </p>
    776      1.1      tron 
    777      1.1      tron </ul>
    778      1.1      tron 
    779      1.1      tron <p>
    780      1.1      tron 
    781      1.1      tron The first four structures are common to both <tt>nqmgr</tt> and
    782      1.1      tron <tt>oqmgr</tt>, the latter two were introduced by <tt>nqmgr</tt>.
    783      1.1      tron 
    784      1.1      tron </p>
    785      1.1      tron 
    786      1.1      tron <p>
    787      1.1      tron 
    788      1.1      tron These terms are used extensively in the text below, feel free to
    789      1.1      tron look up the description above anytime you'll feel you have lost a
    790      1.1      tron sense what is what.
    791      1.1      tron 
    792      1.1      tron </p>
    793      1.1      tron 
    794      1.1      tron <h3> <a name="<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
    795      1.1      tron up the message </a> </h3>
    796      1.1      tron 
    797      1.1      tron <p>
    798      1.1      tron 
    799  1.1.1.4  christos Whenever <tt>nqmgr</tt> moves a queue file into the "active" queue,
    800      1.1      tron the following happens: It reads all necessary information from the
    801      1.1      tron queue file as <tt>oqmgr</tt> does, and also reads as many recipients
    802      1.1      tron as possible - more on that later, for now let's just pretend it
    803      1.1      tron always reads all recipients.
    804      1.1      tron 
    805      1.1      tron </p>
    806      1.1      tron 
    807      1.1      tron <p>
    808      1.1      tron 
    809      1.1      tron Then it resolves the recipients as <tt>oqmgr</tt> does, which
    810      1.1      tron means obtaining (address, nexthop, transport) triple for each
    811      1.1      tron recipient. For each triple, it finds the transport; if it does not
    812      1.1      tron exist yet, it instantiates it (unless it's dead). Within the
    813  1.1.1.4  christos transport, it finds the destination queue for the given nexthop; if it
    814      1.1      tron does not exist yet, it instantiates it (unless it's dead). The
    815      1.1      tron triple is then bound to given destination queue. This happens in
    816      1.1      tron qmgr_resolve() and is basically the same as in <tt>oqmgr</tt>.
    817      1.1      tron 
    818      1.1      tron </p>
    819      1.1      tron 
    820      1.1      tron <p>
    821      1.1      tron 
    822      1.1      tron Then for each triple which was bound to some queue (and thus
    823      1.1      tron transport), the program finds the job which represents the message
    824      1.1      tron within that transport's context; if it does not exist yet, it
    825      1.1      tron instantiates it. Within the job, it finds the peer which represents
    826      1.1      tron the bound destination queue within this jobs context; if it does
    827      1.1      tron not exist yet, it instantiates it.  Finally, it stores the address
    828      1.1      tron from the resolved triple to the recipient entry which is appended
    829      1.1      tron to both the queue entry list and the peer entry list. The addresses
    830  1.1.1.4  christos for the same nexthop are batched in the entries up to the
    831  1.1.1.4  christos <i>transport</i>_destination_recipient_limit for that transport.
    832  1.1.1.4  christos This happens in qmgr_message_assign(), and apart
    833  1.1.1.4  christos from that it operates with job and peer structures, it is basically the
    834      1.1      tron same as in <tt>oqmgr</tt>.
    835      1.1      tron 
    836      1.1      tron </p>
    837      1.1      tron 
    838      1.1      tron <p>
    839      1.1      tron 
    840      1.1      tron When the job is instantiated, it is enqueued on the transport's job
    841      1.1      tron list based on the time its message was picked up by <tt>nqmgr</tt>.
    842      1.1      tron For first batch of recipients this means it is appended to the end
    843      1.1      tron of the job list, but the ordering of the job list by the enqueue
    844      1.1      tron time is important as we will see shortly.
    845      1.1      tron 
    846      1.1      tron </p>
    847      1.1      tron 
    848      1.1      tron <p>
    849      1.1      tron 
    850  1.1.1.4  christos [Now you should have a pretty good idea what the state of the
    851  1.1.1.4  christos <tt>nqmgr</tt> is after a couple of messages were picked up, and what the
    852  1.1.1.4  christos relation is between all those job, peer, queue and entry structures.]
    853      1.1      tron 
    854      1.1      tron </p>
    855      1.1      tron 
    856      1.1      tron <h3> <a name="<tt>nqmgr</tt>_selection"> How the entry selection
    857      1.1      tron works </a> </h3>
    858      1.1      tron 
    859      1.1      tron <p>
    860      1.1      tron 
    861      1.1      tron Having prepared all those above mentioned structures, the task of
    862      1.1      tron the <tt>nqmgr</tt>'s scheduler is to choose the recipient entries
    863      1.1      tron one at a time and pass them to the delivery agent for corresponding
    864      1.1      tron transport. Now how does this work?
    865      1.1      tron 
    866      1.1      tron </p>
    867      1.1      tron 
    868      1.1      tron <p>
    869      1.1      tron 
    870      1.1      tron The first approximation of the new scheduling algorithm is like this:
    871      1.1      tron 
    872      1.1      tron </p>
    873      1.1      tron 
    874      1.1      tron <blockquote>
    875      1.1      tron <pre>
    876      1.1      tron foreach transport (round-robin-by-transport)
    877      1.1      tron do
    878      1.1      tron     if transport busy continue
    879      1.1      tron     if transport process limit reached continue
    880      1.1      tron     foreach transport's job (in the order of the transport's job list)
    881      1.1      tron     do
    882      1.1      tron 	foreach job's peer (round-robin-by-destination)
    883      1.1      tron 	     if peer-&gt;queue-&gt;concurrency &lt; peer-&gt;queue-&gt;window
    884      1.1      tron 		 return next peer entry.
    885      1.1      tron 	done
    886      1.1      tron     done
    887      1.1      tron done
    888      1.1      tron </pre>
    889      1.1      tron </blockquote>
    890      1.1      tron 
    891      1.1      tron <p>
    892      1.1      tron 
    893      1.1      tron Now what is the "order of the transport's job list"? As we know
    894      1.1      tron already, the job list is by default kept in the order the message
    895      1.1      tron was picked up by the <tt>nqmgr</tt>. So by default we get the
    896      1.1      tron top-level round-robin transport, and within each transport we get
    897      1.1      tron the FIFO message delivery. The round-robin of the peers by the
    898      1.1      tron destination is perhaps of little importance in most real-life cases
    899  1.1.1.4  christos (unless the <i>transport</i>_destination_recipient_limit is reached,
    900  1.1.1.4  christos in one job there
    901      1.1      tron is only one peer structure for each destination), but theoretically
    902      1.1      tron it makes sure that even within single jobs, destinations are treated
    903      1.1      tron fairly.
    904      1.1      tron 
    905      1.1      tron </p>
    906      1.1      tron 
    907      1.1      tron <p>
    908      1.1      tron 
    909      1.1      tron [By now you should have a feeling you really know how the scheduler
    910      1.1      tron works, except for the preemption, under ideal conditions - that is,
    911      1.1      tron no recipient resource limits and no destination concurrency problems.]
    912      1.1      tron 
    913      1.1      tron </p>
    914      1.1      tron 
    915      1.1      tron <h3> <a name="<tt>nqmgr</tt>_preemption"> How the preemption
    916      1.1      tron works </a> </h3>
    917      1.1      tron 
    918      1.1      tron <p>
    919      1.1      tron 
    920      1.1      tron As you might perhaps expect by now, the transport's job list does
    921      1.1      tron not remain sorted by the job's message enqueue time all the time.
    922      1.1      tron The most cool thing about <tt>nqmgr</tt> is not the simple FIFO
    923      1.1      tron delivery, but that it is able to slip mail with little recipients
    924      1.1      tron past the mailing-list bulk mail.  This is what the job preemption
    925      1.1      tron is about - shuffling the jobs on the transport's job list to get
    926      1.1      tron the best message delivery rates. Now how is it achieved?
    927      1.1      tron 
    928      1.1      tron </p>
    929      1.1      tron 
    930      1.1      tron <p>
    931      1.1      tron 
    932      1.1      tron First I have to tell you that there are in fact two job lists in
    933      1.1      tron each transport. One is the scheduler's job list, which the scheduler
    934      1.1      tron is free to play with, while the other one keeps the jobs always
    935      1.1      tron listed in the order of the enqueue time and is used for recipient
    936      1.1      tron pool management we will discuss later. For now, we will deal with
    937      1.1      tron the scheduler's job list only.
    938      1.1      tron 
    939      1.1      tron </p>
    940      1.1      tron 
    941      1.1      tron <p>
    942      1.1      tron 
    943      1.1      tron So, we have the job list, which is first ordered by the time the
    944      1.1      tron jobs' messages were enqueued, oldest messages first, the most recently
    945      1.1      tron picked one at the end. For now, let's assume that there are no
    946      1.1      tron destination concurrency problems. Without preemption, we pick some
    947      1.1      tron entry of the first (oldest) job on the queue, assign it to delivery
    948      1.1      tron agent, pick another one from the same job, assign it again, and so
    949      1.1      tron on, until all the entries are used and the job is delivered. We
    950      1.1      tron would then move onto the next job and so on and on. Now how do we
    951      1.1      tron manage to sneak in some entries from the recently added jobs when
    952      1.1      tron the first job on the job list belongs to a message going to the
    953      1.1      tron mailing-list and has thousands of recipient entries?
    954      1.1      tron 
    955      1.1      tron </p>
    956      1.1      tron 
    957      1.1      tron <p>
    958      1.1      tron 
    959      1.1      tron The <tt>nqmgr</tt>'s answer is that we can artificially "inflate"
    960      1.1      tron the delivery time of that first job by some constant for free - it
    961      1.1      tron is basically the same trick you might remember as "accumulation of
    962      1.1      tron potential" from the amortized complexity lessons. For example,
    963      1.1      tron instead of delivering the entries of the first job on the job list
    964      1.1      tron every time a delivery agent becomes available, we can do it only
    965      1.1      tron every second time. If you view the moments the delivery agent becomes
    966      1.1      tron available on a timeline as "delivery slots", then instead of using
    967      1.1      tron every delivery slot for the first job, we can use only every other
    968      1.1      tron slot, and still the overall delivery efficiency of the first job
    969      1.1      tron remains the same. So the delivery <tt>11112222</tt> becomes
    970      1.1      tron <tt>1.1.1.1.2.2.2.2</tt> (1 and 2 are the imaginary job numbers, .
    971      1.1      tron denotes the free slot). Now what do we do with free slots?
    972      1.1      tron 
    973      1.1      tron </p>
    974      1.1      tron 
    975      1.1      tron <p>
    976      1.1      tron 
    977      1.1      tron As you might have guessed, we will use them for sneaking the mail
    978      1.1      tron with little recipients in. For example, if we have one four-recipient
    979      1.1      tron mail followed by four one recipients mail, the delivery sequence
    980      1.1      tron (that is, the sequence in which the jobs are assigned to the
    981      1.1      tron delivery slots) might look like this: <tt>12131415</tt>. Hmm, fine
    982      1.1      tron for sneaking in the single recipient mail, but how do we sneak in
    983      1.1      tron the mail with more than one recipient? Say if we have one four-recipient
    984      1.1      tron mail followed by two two-recipient mails?
    985      1.1      tron 
    986      1.1      tron </p>
    987      1.1      tron 
    988      1.1      tron <p>
    989      1.1      tron 
    990      1.1      tron The simple answer would be to use delivery sequence <tt>12121313</tt>.
    991      1.1      tron But the problem is that this does not scale well. Imagine you have
    992  1.1.1.4  christos mail with a thousand recipients followed by mail with a hundred recipients.
    993      1.1      tron It is tempting to suggest the  delivery sequence like <tt>121212....</tt>,
    994      1.1      tron but alas! Imagine there arrives another mail with say ten recipients.
    995      1.1      tron But there are no free slots anymore, so it can't slip by, not even
    996  1.1.1.4  christos if it had only one recipient.  It will be stuck until the
    997      1.1      tron hundred-recipient mail is delivered, which really sucks.
    998      1.1      tron 
    999      1.1      tron </p>
   1000      1.1      tron 
   1001      1.1      tron <p>
   1002      1.1      tron 
   1003      1.1      tron So, it becomes obvious that while inflating the message to get
   1004  1.1.1.4  christos free slots is a great idea, one has to be really careful of how the
   1005      1.1      tron free slots are assigned, otherwise one might corner himself. So,
   1006      1.1      tron how does <tt>nqmgr</tt> really use the free slots?
   1007      1.1      tron 
   1008      1.1      tron </p>
   1009      1.1      tron 
   1010      1.1      tron <p>
   1011      1.1      tron 
   1012      1.1      tron The key idea is that one does not have to generate the free slots
   1013      1.1      tron in a uniform way. The delivery sequence <tt>111...1</tt> is no
   1014      1.1      tron worse than <tt>1.1.1.1</tt>, in fact, it is even better as some
   1015      1.1      tron entries are in the first case selected earlier than in the second
   1016      1.1      tron case, and none is selected later! So it is possible to first
   1017      1.1      tron "accumulate" the free delivery slots and then use them all at once.
   1018      1.1      tron It is even possible to accumulate some, then use them, then accumulate
   1019      1.1      tron some more and use them again, as in <tt>11..1.1</tt> .
   1020      1.1      tron 
   1021      1.1      tron </p>
   1022      1.1      tron 
   1023      1.1      tron <p>
   1024      1.1      tron 
   1025      1.1      tron Let's get back to the one hundred recipient example. We now know
   1026      1.1      tron that we could first accumulate one hundred free slots, and only
   1027      1.1      tron after then to preempt the first job and sneak the one hundred
   1028      1.1      tron recipient mail in. Applying the algorithm recursively, we see the
   1029      1.1      tron hundred recipient job can accumulate ten free delivery slots, and
   1030      1.1      tron then we could preempt it and sneak in the ten-recipient mail...
   1031      1.1      tron Wait wait wait! Could we? Aren't we overinflating the original one
   1032      1.1      tron thousand recipient mail?
   1033      1.1      tron 
   1034      1.1      tron </p>
   1035      1.1      tron 
   1036      1.1      tron <p>
   1037      1.1      tron 
   1038  1.1.1.4  christos Well, despite the fact that it looks so at the first glance, another trick will
   1039      1.1      tron allow us to answer "no, we are not!". If we had said that we will
   1040      1.1      tron inflate the delivery time twice at maximum, and then we consider
   1041      1.1      tron every other slot as a free slot, then we would overinflate in case
   1042      1.1      tron of the recursive preemption. BUT! The trick is that if we use only
   1043      1.1      tron every n-th slot as a free slot for n&gt;2, there is always some worst
   1044      1.1      tron inflation factor which we can guarantee not to be breached, even
   1045      1.1      tron if we apply the algorithm recursively. To be precise, if for every
   1046      1.1      tron k&gt;1 normally used slots we accumulate one free delivery slot, than
   1047      1.1      tron the inflation factor is not worse than k/(k-1) no matter how many
   1048      1.1      tron recursive preemptions happen. And it's not worse than (k+1)/k if
   1049      1.1      tron only non-recursive preemption happens. Now, having got through the
   1050      1.1      tron theory and the related math, let's see how <tt>nqmgr</tt> implements
   1051      1.1      tron this.
   1052      1.1      tron 
   1053      1.1      tron </p>
   1054      1.1      tron 
   1055      1.1      tron <p>
   1056      1.1      tron 
   1057      1.1      tron Each job has so called "available delivery slot" counter. Each
   1058      1.1      tron transport has a <i>transport</i>_delivery_slot_cost parameter, which
   1059      1.1      tron defaults to default_delivery_slot_cost parameter which is set to 5
   1060      1.1      tron by default. This is the k from the paragraph above. Each time k
   1061      1.1      tron entries of the job are selected for delivery, this counter is
   1062  1.1.1.4  christos incremented by one. Once there are some slots accumulated, a job which
   1063      1.1      tron requires no more than that number of slots to be fully delivered
   1064      1.1      tron can preempt this job.
   1065      1.1      tron 
   1066      1.1      tron </p>
   1067      1.1      tron 
   1068      1.1      tron <p>
   1069      1.1      tron 
   1070      1.1      tron [Well, the truth is, the counter is incremented every time an entry
   1071  1.1.1.3      tron is selected and it is divided by k when it is used.
   1072  1.1.1.4  christos But to understand, it's good enough to use
   1073      1.1      tron the above approximation of the truth.]
   1074      1.1      tron 
   1075      1.1      tron </p>
   1076      1.1      tron 
   1077      1.1      tron <p>
   1078      1.1      tron 
   1079      1.1      tron OK, so now we know the conditions which must be satisfied so one
   1080      1.1      tron job can preempt another one. But what job gets preempted, how do
   1081      1.1      tron we choose what job preempts it if there are several valid candidates,
   1082      1.1      tron and when does all this exactly happen?
   1083      1.1      tron 
   1084      1.1      tron </p>
   1085      1.1      tron 
   1086      1.1      tron <p>
   1087      1.1      tron 
   1088      1.1      tron The answer for the first part is simple. The job whose entry was
   1089  1.1.1.4  christos selected the last time is the so called current job. Normally, it is
   1090      1.1      tron the first job on the scheduler's job list, but destination concurrency
   1091      1.1      tron limits may change this as we will see later. It is always only the
   1092      1.1      tron current job which may get preempted.
   1093      1.1      tron 
   1094      1.1      tron </p>
   1095      1.1      tron 
   1096      1.1      tron <p>
   1097      1.1      tron 
   1098  1.1.1.4  christos Now for the second part. The current job has a certain amount of
   1099      1.1      tron recipient entries, and as such may accumulate at maximum some amount
   1100      1.1      tron of available delivery slots. It might have already accumulated some,
   1101      1.1      tron and perhaps even already used some when it was preempted before
   1102      1.1      tron (remember a job can be preempted several times). In either case,
   1103      1.1      tron we know how many are accumulated and how many are left to deliver,
   1104      1.1      tron so we know how many it may yet accumulate at maximum. Every other
   1105      1.1      tron job which may be delivered by less than that number of slots is a
   1106      1.1      tron valid candidate for preemption. How do we choose among them?
   1107      1.1      tron 
   1108      1.1      tron </p>
   1109      1.1      tron 
   1110      1.1      tron <p>
   1111      1.1      tron 
   1112      1.1      tron The answer is - the one with maximum enqueue_time/recipient_entry_count.
   1113      1.1      tron That is, the older the job is, the more we should try to deliver
   1114      1.1      tron it in order to get best message delivery rates. These rates are of
   1115      1.1      tron course subject to how many recipients the message has, therefore
   1116      1.1      tron the division by the recipient (entry) count. No one shall be surprised
   1117  1.1.1.4  christos that a message with n recipients takes n times longer to deliver than
   1118  1.1.1.4  christos a message with one recipient.
   1119      1.1      tron 
   1120      1.1      tron </p>
   1121      1.1      tron 
   1122      1.1      tron <p>
   1123      1.1      tron 
   1124      1.1      tron Now let's recap the previous two paragraphs. Isn't it too complicated?
   1125      1.1      tron Why don't the candidates come only among the jobs which can be
   1126      1.1      tron delivered within the number of slots the current job already
   1127      1.1      tron accumulated? Why do we need to estimate how much it has yet to
   1128      1.1      tron accumulate? If you found out the answer, congratulate yourself. If
   1129      1.1      tron we did it this simple way, we would always choose the candidate
   1130  1.1.1.4  christos with the fewest recipient entries. If there were enough single recipient
   1131      1.1      tron mails coming in, they would always slip by the bulk mail as soon
   1132  1.1.1.4  christos as possible, and the two or more recipients mail would never get
   1133      1.1      tron a chance, no matter how long they have been sitting around in the
   1134      1.1      tron job list.
   1135      1.1      tron 
   1136      1.1      tron </p>
   1137      1.1      tron 
   1138      1.1      tron <p>
   1139      1.1      tron 
   1140  1.1.1.4  christos This candidate selection has an interesting implication - that when
   1141      1.1      tron we choose the best candidate for preemption (this is done in
   1142      1.1      tron qmgr_choose_candidate()), it may happen that we may not use it for
   1143      1.1      tron preemption immediately. This leads to an answer to the last part
   1144      1.1      tron of the original question - when does the preemption happen?
   1145      1.1      tron 
   1146      1.1      tron </p>
   1147      1.1      tron 
   1148      1.1      tron <p>
   1149      1.1      tron 
   1150      1.1      tron The preemption attempt happens every time next transport's recipient
   1151      1.1      tron entry is to be chosen for delivery. To avoid needless overhead, the
   1152      1.1      tron preemption is not attempted if the current job could never accumulate
   1153      1.1      tron more than <i>transport</i>_minimum_delivery_slots (defaults to
   1154  1.1.1.4  christos default_minimum_delivery_slots which defaults to 3). If there are
   1155      1.1      tron already enough accumulated slots to preempt the current job by the
   1156      1.1      tron chosen best candidate, it is done immediately. This basically means
   1157      1.1      tron that the candidate is moved in front of the current job on the
   1158      1.1      tron scheduler's job list and decreasing the accumulated slot counter
   1159  1.1.1.4  christos by the amount used by the candidate. If there are not enough slots...
   1160      1.1      tron well, I could say that nothing happens and the another preemption
   1161      1.1      tron is attempted the next time. But that's not the complete truth.
   1162      1.1      tron 
   1163      1.1      tron </p>
   1164      1.1      tron 
   1165      1.1      tron <p>
   1166      1.1      tron 
   1167      1.1      tron The truth is that it turns out that it is not really necessary to
   1168      1.1      tron wait until the jobs counter accumulates all the delivery slots in
   1169      1.1      tron advance. Say we have ten-recipient mail followed by two two-recipient
   1170  1.1.1.4  christos mails. If the preemption happened when enough delivery slots accumulate
   1171      1.1      tron (assuming slot cost 2), the delivery sequence becomes
   1172      1.1      tron <tt>11112211113311</tt>. Now what would we get if we would wait
   1173      1.1      tron only for 50% of the necessary slots to accumulate and we promise
   1174      1.1      tron we would wait for the remaining 50% later, after we get back
   1175  1.1.1.4  christos to the preempted job? If we use such a slot loan, the delivery sequence
   1176  1.1.1.4  christos becomes <tt>11221111331111</tt>. As we can see, it makes it not
   1177      1.1      tron considerably worse for the delivery of the ten-recipient mail, but
   1178      1.1      tron it allows the small messages to be delivered sooner.
   1179      1.1      tron 
   1180      1.1      tron </p>
   1181      1.1      tron 
   1182      1.1      tron <p>
   1183      1.1      tron 
   1184      1.1      tron The concept of these slot loans is where the
   1185      1.1      tron <i>transport</i>_delivery_slot_discount and
   1186      1.1      tron <i>transport</i>_delivery_slot_loan come from (they default to
   1187      1.1      tron default_delivery_slot_discount and default_delivery_slot_loan, whose
   1188      1.1      tron values are by default 50 and 3, respectively). The discount (resp.
   1189      1.1      tron loan) specifies how many percent (resp. how many slots) one "gets
   1190      1.1      tron in advance", when the number of slots required to deliver the best
   1191      1.1      tron candidate is compared with the number of slots the current slot had
   1192      1.1      tron accumulated so far.
   1193      1.1      tron 
   1194      1.1      tron </p>
   1195      1.1      tron 
   1196      1.1      tron <p>
   1197      1.1      tron 
   1198  1.1.1.4  christos And that pretty much concludes this chapter.
   1199      1.1      tron 
   1200      1.1      tron </p>
   1201      1.1      tron 
   1202      1.1      tron <p>
   1203      1.1      tron 
   1204      1.1      tron [Now you should have a feeling that you pretty much understand the
   1205  1.1.1.4  christos scheduler and the preemption, or at least that you will have
   1206  1.1.1.4  christos after you read the last chapter a couple more times. You shall clearly
   1207      1.1      tron see the job list and the preemption happening at its head, in ideal
   1208      1.1      tron delivery conditions. The feeling of understanding shall last until
   1209      1.1      tron you start wondering what happens if some of the jobs are blocked,
   1210      1.1      tron which you might eventually figure out correctly from what had been
   1211      1.1      tron said already. But I would be surprised if your mental image of the
   1212      1.1      tron scheduler's functionality is not completely shattered once you
   1213      1.1      tron start wondering how it works when not all recipients may be read
   1214      1.1      tron in-core.  More on that later.]
   1215      1.1      tron 
   1216      1.1      tron </p>
   1217      1.1      tron 
   1218      1.1      tron <h3> <a name="<tt>nqmgr</tt>_concurrency"> How destination concurrency
   1219      1.1      tron limits affect the scheduling algorithm </a> </h3>
   1220      1.1      tron 
   1221      1.1      tron <p>
   1222      1.1      tron 
   1223      1.1      tron The <tt>nqmgr</tt> uses the same algorithm for destination concurrency
   1224      1.1      tron control as <tt>oqmgr</tt>. Now what happens when the destination
   1225      1.1      tron limits are reached and no more entries for that destination may be
   1226      1.1      tron selected by the scheduler?
   1227      1.1      tron 
   1228      1.1      tron </p>
   1229      1.1      tron 
   1230      1.1      tron <p>
   1231      1.1      tron 
   1232  1.1.1.4  christos From the user's point of view it is all simple. If some of the peers
   1233      1.1      tron of a job can't be selected, those peers are simply skipped by the
   1234      1.1      tron entry selection algorithm (the pseudo-code described before) and
   1235      1.1      tron only the selectable ones are used. If none of the peers may be
   1236      1.1      tron selected, the job is declared a "blocker job". Blocker jobs are
   1237      1.1      tron skipped by the entry selection algorithm and they are also excluded
   1238  1.1.1.4  christos from the candidates for preemption of the current job. Thus the scheduler
   1239      1.1      tron effectively behaves as if the blocker jobs didn't exist on the job
   1240      1.1      tron list at all. As soon as at least one of the peers of a blocker job
   1241      1.1      tron becomes unblocked (that is, the delivery agent handling the delivery
   1242  1.1.1.4  christos of the recipient entry for the given destination successfully finishes),
   1243      1.1      tron the job's blocker status is removed and the job again participates
   1244      1.1      tron in all further scheduler actions normally.
   1245      1.1      tron 
   1246      1.1      tron </p>
   1247      1.1      tron 
   1248      1.1      tron <p>
   1249      1.1      tron 
   1250      1.1      tron So the summary is that the users don't really have to be concerned
   1251      1.1      tron about the interaction of the destination limits and scheduling
   1252      1.1      tron algorithm. It works well on its own and there are no knobs they
   1253      1.1      tron would need to control it.
   1254      1.1      tron 
   1255      1.1      tron </p>
   1256      1.1      tron 
   1257      1.1      tron <p>
   1258      1.1      tron 
   1259      1.1      tron From a programmer's point of view, the blocker jobs complicate the
   1260      1.1      tron scheduler quite a lot. Without them, the jobs on the job list would
   1261      1.1      tron be normally delivered in strict FIFO order. If the current job is
   1262      1.1      tron preempted, the job preempting it is completely delivered unless it
   1263      1.1      tron is preempted itself. Without blockers, the current job is thus
   1264      1.1      tron always either the first job on the job list, or the top of the stack
   1265      1.1      tron of jobs preempting the first job on the job list.
   1266      1.1      tron 
   1267      1.1      tron </p>
   1268      1.1      tron 
   1269      1.1      tron <p>
   1270      1.1      tron 
   1271      1.1      tron The visualization of the job list and the preemption stack without
   1272      1.1      tron blockers would be like this:
   1273      1.1      tron 
   1274      1.1      tron </p>
   1275      1.1      tron 
   1276      1.1      tron <blockquote>
   1277      1.1      tron <pre>
   1278      1.1      tron first job-&gt;    1--2--3--5--6--8--...    &lt;- job list
   1279      1.1      tron on job list    |
   1280      1.1      tron                4    &lt;- preemption stack
   1281      1.1      tron                |
   1282      1.1      tron current job-&gt;  7
   1283      1.1      tron </pre>
   1284      1.1      tron </blockquote>
   1285      1.1      tron 
   1286      1.1      tron <p>
   1287      1.1      tron 
   1288      1.1      tron In the example above we see that job 1 was preempted by job 4 and
   1289      1.1      tron then job 4 was preempted by job 7. After job 7 is completed, remaining
   1290      1.1      tron entries of job 4 are selected, and once they are all selected, job
   1291      1.1      tron 1 continues.
   1292      1.1      tron 
   1293      1.1      tron </p>
   1294      1.1      tron 
   1295      1.1      tron <p>
   1296      1.1      tron 
   1297      1.1      tron As we see, it's all very clean and straightforward. Now how does
   1298      1.1      tron this change because of blockers?
   1299      1.1      tron 
   1300      1.1      tron </p>
   1301      1.1      tron 
   1302      1.1      tron <p>
   1303      1.1      tron 
   1304  1.1.1.4  christos The answer is: a lot. Any job may become a blocker job at any time,
   1305  1.1.1.4  christos and also become a normal job again at any time. This has several
   1306      1.1      tron important implications:
   1307      1.1      tron 
   1308      1.1      tron </p>
   1309      1.1      tron 
   1310      1.1      tron <ol>
   1311      1.1      tron 
   1312      1.1      tron <li> <p>
   1313      1.1      tron 
   1314      1.1      tron The jobs may be completed in arbitrary order. For example, in the
   1315      1.1      tron example above, if the current job 7 becomes blocked, the next job
   1316      1.1      tron 4 may complete before the job 7 becomes unblocked again. Or if both
   1317      1.1      tron 7 and 4 are blocked, then 1 is completed, then 7 becomes unblocked
   1318      1.1      tron and is completed, then 2 is completed and only after that 4 becomes
   1319      1.1      tron unblocked and is completed... You get the idea.
   1320      1.1      tron 
   1321      1.1      tron </p>
   1322      1.1      tron 
   1323      1.1      tron <p>
   1324      1.1      tron 
   1325      1.1      tron [Interesting side note: even when jobs are delivered out of order,
   1326  1.1.1.4  christos from a single destination's point of view the jobs are still delivered
   1327      1.1      tron in the expected order (that is, FIFO unless there was some preemption
   1328      1.1      tron involved). This is because whenever a destination queue becomes
   1329      1.1      tron unblocked (the destination limit allows selection of more recipient
   1330      1.1      tron entries for that destination), all jobs which have peers for that
   1331      1.1      tron destination are unblocked at once.]
   1332      1.1      tron 
   1333      1.1      tron </p>
   1334      1.1      tron 
   1335      1.1      tron <li> <p>
   1336      1.1      tron 
   1337      1.1      tron The idea of the preemption stack at the head of the job list is
   1338      1.1      tron gone.  That is, it must be possible to preempt any job on the job
   1339      1.1      tron list. For example, if the jobs 7, 4, 1 and 2 in the example above
   1340      1.1      tron become all blocked, job 3 becomes the current job. And of course
   1341      1.1      tron we do not want the preemption to be affected by the fact that there
   1342      1.1      tron are some blocked jobs or not. Therefore, if it turns out that job
   1343      1.1      tron 3 might be preempted by job 6, the implementation shall make it
   1344      1.1      tron possible.
   1345      1.1      tron 
   1346      1.1      tron </p>
   1347      1.1      tron 
   1348      1.1      tron <li> <p>
   1349      1.1      tron 
   1350      1.1      tron The idea of the linear preemption stack itself is gone. It's no
   1351      1.1      tron longer true that one job is always preempted by only one job at one
   1352      1.1      tron time (that is directly preempted, not counting the recursively
   1353      1.1      tron nested jobs). For example, in the example above, job 1 is directly
   1354      1.1      tron preempted by only job 4, and job 4 by job 7. Now assume job 7 becomes
   1355      1.1      tron blocked, and job 4 is being delivered. If it accumulates enough
   1356      1.1      tron delivery slots, it is natural that it might be preempted for example
   1357      1.1      tron by job 8. Now job 4 is preempted by both job 7 AND job 8 at the
   1358      1.1      tron same time.
   1359      1.1      tron 
   1360      1.1      tron </p>
   1361      1.1      tron 
   1362      1.1      tron </ol>
   1363      1.1      tron 
   1364      1.1      tron <p>
   1365      1.1      tron 
   1366      1.1      tron Now combine the points 2) and 3) with point 1) again and you realize
   1367      1.1      tron that the relations on the once linear job list became pretty
   1368      1.1      tron complicated. If we extend the point 3) example: jobs 7 and 8 preempt
   1369      1.1      tron job 4, now job 8 becomes blocked too, then job 4 completes. Tricky,
   1370      1.1      tron huh?
   1371      1.1      tron 
   1372      1.1      tron </p>
   1373      1.1      tron 
   1374      1.1      tron <p>
   1375      1.1      tron 
   1376      1.1      tron If I illustrate the relations after the above mentioned examples
   1377  1.1.1.4  christos (but those in point 1), the situation would look like this:
   1378      1.1      tron 
   1379      1.1      tron </p>
   1380      1.1      tron 
   1381      1.1      tron <blockquote>
   1382      1.1      tron <pre>
   1383      1.1      tron                             v- parent
   1384      1.1      tron 
   1385      1.1      tron adoptive parent -&gt;    1--2--3--5--...      &lt;- "stack" level 0
   1386      1.1      tron                       |     |
   1387      1.1      tron parent gone -&gt;        ?     6              &lt;- "stack" level 1
   1388      1.1      tron                      / \
   1389      1.1      tron children -&gt;         7   8   ^- child       &lt;- "stack" level 2
   1390      1.1      tron 
   1391      1.1      tron                       ^- siblings
   1392      1.1      tron </pre>
   1393      1.1      tron </blockquote>
   1394      1.1      tron 
   1395      1.1      tron <p>
   1396      1.1      tron 
   1397      1.1      tron Now how does <tt>nqmgr</tt> deal with all these complicated relations?
   1398      1.1      tron 
   1399      1.1      tron </p>
   1400      1.1      tron 
   1401      1.1      tron <p>
   1402      1.1      tron 
   1403      1.1      tron Well, it maintains them all as described, but fortunately, all these
   1404  1.1.1.4  christos relations are necessary only for the purpose of proper counting of
   1405  1.1.1.4  christos available delivery slots. For the purpose of ordering the jobs for
   1406      1.1      tron entry selection, the original rule still applies: "the job preempting
   1407      1.1      tron the current job is moved in front of the current job on the job
   1408      1.1      tron list". So for entry selection purposes, the job relations remain
   1409      1.1      tron as simple as this:
   1410      1.1      tron 
   1411      1.1      tron </p>
   1412      1.1      tron 
   1413      1.1      tron <blockquote>
   1414      1.1      tron <pre>
   1415      1.1      tron 7--8--1--2--6--3--5--..   &lt;- scheduler's job list order
   1416      1.1      tron </pre>
   1417      1.1      tron </blockquote>
   1418      1.1      tron 
   1419      1.1      tron <p>
   1420      1.1      tron 
   1421      1.1      tron The job list order and the preemption parent/child/siblings relations
   1422      1.1      tron are maintained separately. And because the selection works only
   1423      1.1      tron with the job list, you can happily forget about those complicated
   1424      1.1      tron relations unless you want to study the <tt>nqmgr</tt> sources. In
   1425      1.1      tron that case the text above might provide some helpful introduction
   1426      1.1      tron to the problem domain. Otherwise I suggest you just forget about
   1427      1.1      tron all this and stick with the user's point of view: the blocker jobs
   1428      1.1      tron are simply ignored.
   1429      1.1      tron 
   1430      1.1      tron </p>
   1431      1.1      tron 
   1432      1.1      tron <p>
   1433      1.1      tron 
   1434  1.1.1.4  christos [By now, you should have a feeling that there are more things going
   1435  1.1.1.4  christos on under the hood than you ever wanted to know. You decide that
   1436      1.1      tron forgetting about this chapter is the best you can do for the sake
   1437      1.1      tron of your mind's health and you basically stick with the idea how the
   1438      1.1      tron scheduler works in ideal conditions, when there are no blockers,
   1439      1.1      tron which is good enough.]
   1440      1.1      tron 
   1441      1.1      tron </p>
   1442      1.1      tron 
   1443      1.1      tron <h3> <a name="<tt>nqmgr</tt>_memory"> Dealing with memory resource
   1444      1.1      tron limits </a> </h3>
   1445      1.1      tron 
   1446      1.1      tron <p>
   1447      1.1      tron 
   1448      1.1      tron When discussing the <tt>nqmgr</tt> scheduler, we have so far assumed
   1449  1.1.1.4  christos that all recipients of all messages in the "active" queue are completely
   1450  1.1.1.4  christos read into memory. This is simply not true. There is an upper
   1451      1.1      tron bound on the amount of memory the <tt>nqmgr</tt> may use, and
   1452      1.1      tron therefore it must impose some limits on the information it may store
   1453  1.1.1.4  christos in memory at any given time.
   1454      1.1      tron 
   1455      1.1      tron </p>
   1456      1.1      tron 
   1457      1.1      tron <p>
   1458      1.1      tron 
   1459      1.1      tron First of all, not all messages may be read in-core at once. At any
   1460      1.1      tron time, only qmgr_message_active_limit messages may be read in-core
   1461      1.1      tron at maximum. When read into memory, the messages are picked from the
   1462  1.1.1.4  christos "incoming" and "deferred" queues and moved to the "active" queue
   1463  1.1.1.4  christos (incoming having priority), so if there are more than
   1464  1.1.1.4  christos qmgr_message_active_limit messages queued in the "active" queue, the
   1465  1.1.1.4  christos rest will have to wait until (some of) the messages in the "active" queue
   1466  1.1.1.4  christos are completely delivered (or deferred).
   1467      1.1      tron 
   1468      1.1      tron </p>
   1469      1.1      tron 
   1470      1.1      tron <p>
   1471      1.1      tron 
   1472      1.1      tron Even with the limited amount of in-core messages, there is another
   1473      1.1      tron limit which must be imposed in order to avoid memory exhaustion.
   1474  1.1.1.4  christos Each message may contain a huge number of recipients (tens or hundreds
   1475      1.1      tron of thousands are not uncommon), so if <tt>nqmgr</tt> read all
   1476  1.1.1.4  christos recipients of all messages in the "active" queue, it may easily run
   1477      1.1      tron out of memory. Therefore there must be some upper bound on the
   1478  1.1.1.4  christos amount of message recipients which are read into memory at the
   1479      1.1      tron same time.
   1480      1.1      tron 
   1481      1.1      tron </p>
   1482      1.1      tron 
   1483      1.1      tron <p>
   1484      1.1      tron 
   1485      1.1      tron Before discussing how exactly <tt>nqmgr</tt> implements the recipient
   1486      1.1      tron limits, let's see how the sole existence of the limits themselves
   1487      1.1      tron affects the <tt>nqmgr</tt> and its scheduler.
   1488      1.1      tron 
   1489      1.1      tron </p>
   1490      1.1      tron 
   1491      1.1      tron <p>
   1492      1.1      tron 
   1493      1.1      tron The message limit is straightforward - it just limits the size of
   1494      1.1      tron the
   1495      1.1      tron lookahead the <tt>nqmgr</tt>'s scheduler has when choosing which
   1496  1.1.1.4  christos message can preempt the current one. Messages not in the "active" queue
   1497  1.1.1.4  christos are simply not considered at all.
   1498      1.1      tron 
   1499      1.1      tron </p>
   1500      1.1      tron 
   1501      1.1      tron <p>
   1502      1.1      tron 
   1503      1.1      tron The recipient limit complicates more things. First of all, the
   1504      1.1      tron message reading code must support reading the recipients in batches,
   1505      1.1      tron which among other things means accessing the queue file several
   1506      1.1      tron times and continuing where the last recipient batch ended. This is
   1507      1.1      tron invoked by the scheduler whenever the current job has space for more
   1508      1.1      tron recipients, subject to transport's refill_limit and refill_delay parameters.
   1509      1.1      tron It is also done any time when all
   1510      1.1      tron in-core recipients of the message are dealt with (which may also
   1511      1.1      tron mean they were deferred) but there are still more in the queue file.
   1512      1.1      tron 
   1513      1.1      tron </p>
   1514      1.1      tron 
   1515      1.1      tron <p>
   1516      1.1      tron 
   1517      1.1      tron The second complication is that with some recipients left unread
   1518      1.1      tron in the queue file, the scheduler can't operate with exact counts
   1519      1.1      tron of recipient entries. With unread recipients, it is not clear how
   1520      1.1      tron many recipient entries there will be, as they are subject to
   1521      1.1      tron per-destination grouping. It is not even clear to what transports
   1522      1.1      tron (and thus jobs) the recipients will be assigned. And with messages
   1523  1.1.1.4  christos coming from the "deferred" queue, it is not even clear how many unread
   1524      1.1      tron recipients are still to be delivered. This all means that the
   1525      1.1      tron scheduler must use only estimates of how many recipients entries
   1526      1.1      tron there will be.  Fortunately, it is possible to estimate the minimum
   1527      1.1      tron and maximum correctly, so the scheduler can always err on the safe
   1528  1.1.1.4  christos side.  Obviously, the better the estimates, the better the results, so
   1529      1.1      tron it is best when we are able to read all recipients in-core and turn
   1530      1.1      tron the estimates into exact counts, or at least try to read as many
   1531      1.1      tron as possible to make the estimates as accurate as possible.
   1532      1.1      tron 
   1533      1.1      tron </p>
   1534      1.1      tron 
   1535      1.1      tron <p>
   1536      1.1      tron 
   1537      1.1      tron The third complication is that it is no longer true that the scheduler
   1538      1.1      tron is done with a job once all of its in-core recipients are delivered.
   1539      1.1      tron It is possible that the job will be revived later, when another
   1540      1.1      tron batch of recipients is read in core. It is also possible that some
   1541      1.1      tron jobs will be created for the first time long after the first batch
   1542      1.1      tron of recipients was read in core. The <tt>nqmgr</tt> code must be
   1543      1.1      tron ready to handle all such situations.
   1544      1.1      tron 
   1545      1.1      tron </p>
   1546      1.1      tron 
   1547      1.1      tron <p>
   1548      1.1      tron 
   1549      1.1      tron And finally, the fourth complication is that the <tt>nqmgr</tt>
   1550      1.1      tron code must somehow impose the recipient limit itself. Now how does
   1551      1.1      tron it achieve it?
   1552      1.1      tron 
   1553      1.1      tron </p>
   1554      1.1      tron 
   1555      1.1      tron <p>
   1556      1.1      tron 
   1557      1.1      tron Perhaps the easiest solution would be to say that each message may
   1558  1.1.1.4  christos have at maximum X recipients stored in-core, but such a solution would
   1559      1.1      tron be poor for several reasons. With reasonable qmgr_message_active_limit
   1560  1.1.1.4  christos values, the X would have to be quite low to maintain a reasonable
   1561      1.1      tron memory footprint. And with low X lots of things would not work well.
   1562      1.1      tron The <tt>nqmgr</tt> would have problems to use the
   1563      1.1      tron <i>transport</i>_destination_recipient_limit efficiently. The
   1564      1.1      tron scheduler's preemption would be suboptimal as the recipient count
   1565      1.1      tron estimates would be inaccurate. The message queue file would have
   1566      1.1      tron to be accessed many times to read in more recipients again and
   1567      1.1      tron again.
   1568      1.1      tron 
   1569      1.1      tron </p>
   1570      1.1      tron 
   1571      1.1      tron <p>
   1572      1.1      tron 
   1573      1.1      tron Therefore it seems reasonable to have a solution which does not use
   1574  1.1.1.4  christos a limit imposed on a per-message basis, but which maintains a pool
   1575      1.1      tron of available recipient slots, which can be shared among all messages
   1576      1.1      tron in the most efficient manner. And as we do not want separate
   1577      1.1      tron transports to compete for resources whenever possible, it seems
   1578  1.1.1.4  christos appropriate to maintain such a recipient pool for each transport
   1579      1.1      tron separately. This is the general idea, now how does it work in
   1580      1.1      tron practice?
   1581      1.1      tron 
   1582      1.1      tron </p>
   1583      1.1      tron 
   1584      1.1      tron <p>
   1585      1.1      tron 
   1586  1.1.1.4  christos First we have to solve a little chicken-and-egg problem. If we want
   1587      1.1      tron to use the per-transport recipient pools, we first need to know to
   1588  1.1.1.4  christos what transport(s) the message is assigned. But we will find that
   1589  1.1.1.4  christos out only after we first read in the recipients. So it is obvious
   1590      1.1      tron that we first have to read in some recipients, use them to find out
   1591  1.1.1.4  christos to what transports the message is to be assigned, and only after
   1592  1.1.1.4  christos that can we use the per-transport recipient pools.
   1593      1.1      tron 
   1594      1.1      tron </p>
   1595      1.1      tron 
   1596      1.1      tron <p>
   1597      1.1      tron 
   1598      1.1      tron Now how many recipients shall we read for the first time? This is
   1599      1.1      tron what qmgr_message_recipient_minimum and qmgr_message_recipient_limit
   1600      1.1      tron values control. The qmgr_message_recipient_minimum value specifies
   1601  1.1.1.4  christos how many recipients of each message we will read the first time,
   1602      1.1      tron no matter what.  It is necessary to read at least one recipient
   1603      1.1      tron before we can assign the message to a transport and create the first
   1604      1.1      tron job. However, reading only qmgr_message_recipient_minimum recipients
   1605      1.1      tron even if there are only few messages with few recipients in-core would
   1606  1.1.1.4  christos be wasteful. Therefore if there are fewer than qmgr_message_recipient_limit
   1607      1.1      tron recipients in-core so far, the first batch of recipients may be
   1608      1.1      tron larger than qmgr_message_recipient_minimum - as large as is required
   1609      1.1      tron to reach the qmgr_message_recipient_limit limit.
   1610      1.1      tron 
   1611      1.1      tron </p>
   1612      1.1      tron 
   1613      1.1      tron <p>
   1614      1.1      tron 
   1615      1.1      tron Once the first batch of recipients was read in core and the message
   1616      1.1      tron jobs were created, the size of the subsequent recipient batches (if
   1617      1.1      tron any - of course it's best when all recipients are read in one batch)
   1618      1.1      tron is based solely on the position of the message jobs on their
   1619      1.1      tron corresponding transports' job lists. Each transport has a pool of
   1620      1.1      tron <i>transport</i>_recipient_limit recipient slots which it can
   1621      1.1      tron distribute among its jobs (how this is done is described later).
   1622      1.1      tron The subsequent recipient batch may be as large as the sum of all
   1623      1.1      tron recipient slots of all jobs of the message permits (plus the
   1624      1.1      tron qmgr_message_recipient_minimum amount which always applies).
   1625      1.1      tron 
   1626      1.1      tron </p>
   1627      1.1      tron 
   1628      1.1      tron <p>
   1629      1.1      tron 
   1630  1.1.1.4  christos For example, if a message has three jobs, the first with 1 recipient
   1631  1.1.1.4  christos still in-core and 4 recipient slots, the second with 5 recipients in-core
   1632  1.1.1.4  christos and 5 recipient slots, and the third with 2 recipients in-core and 0
   1633  1.1.1.4  christos recipient slots, it has 1+5+2=8 recipients in-core and 4+5+0=9 jobs'
   1634      1.1      tron recipients slots in total. This means that we could immediately
   1635      1.1      tron read 2+qmgr_message_recipient_minimum more recipients of that message
   1636      1.1      tron in core.
   1637      1.1      tron 
   1638      1.1      tron </p>
   1639      1.1      tron 
   1640      1.1      tron <p>
   1641      1.1      tron 
   1642      1.1      tron The above example illustrates several things which might be worth
   1643      1.1      tron mentioning explicitly: first, note that although the per-transport
   1644      1.1      tron slots are assigned to particular jobs, we can't guarantee that once
   1645      1.1      tron the next batch of recipients is read in core, that the corresponding
   1646      1.1      tron amounts of recipients will be assigned to those jobs. The jobs lend
   1647      1.1      tron its slots to the message as a whole, so it is possible that some
   1648      1.1      tron jobs end up sponsoring other jobs of their message. For example,
   1649      1.1      tron if in the example above the 2 newly read recipients were assigned
   1650      1.1      tron to the second job, the first job sponsored the second job with 2
   1651      1.1      tron slots. The second notable thing is the third job, which has more
   1652      1.1      tron recipients in-core than it has slots. Apart from sponsoring by other
   1653      1.1      tron job we just saw it can be result of the first recipient batch, which
   1654      1.1      tron is sponsored from global recipient pool of qmgr_message_recipient_limit
   1655      1.1      tron recipients. It can be also sponsored from the message recipient
   1656      1.1      tron pool of qmgr_message_recipient_minimum recipients.
   1657      1.1      tron 
   1658      1.1      tron </p>
   1659      1.1      tron 
   1660      1.1      tron <p>
   1661      1.1      tron 
   1662      1.1      tron Now how does each transport distribute the recipient slots among
   1663      1.1      tron its jobs?  The strategy is quite simple. As most scheduler activity
   1664      1.1      tron happens on the head of the job list, it is our intention to make
   1665      1.1      tron sure that the scheduler has the best estimates of the recipient
   1666      1.1      tron counts for those jobs. As we mentioned above, this means that we
   1667      1.1      tron want to try to make sure that the messages of those jobs have all
   1668      1.1      tron recipients read in-core. Therefore the transport distributes the
   1669      1.1      tron slots "along" the job list from start to end. In this case the job
   1670      1.1      tron list sorted by message enqueue time is used, because it doesn't
   1671      1.1      tron change over time as the scheduler's job list does.
   1672      1.1      tron 
   1673      1.1      tron </p>
   1674      1.1      tron 
   1675      1.1      tron <p>
   1676      1.1      tron 
   1677      1.1      tron More specifically, each time a job is created and appended to the
   1678      1.1      tron job list, it gets all unused recipient slots from its transport's
   1679      1.1      tron pool. It keeps them until all recipients of its message are read.
   1680      1.1      tron When this happens, all unused recipient slots are transferred to
   1681  1.1.1.4  christos the next job (which is now in fact the first such job) on the job
   1682      1.1      tron list which still has some recipients unread, or eventually back to
   1683  1.1.1.4  christos the transport pool if there is no such job. Such a transfer then also
   1684      1.1      tron happens whenever a recipient entry of that job is delivered.
   1685      1.1      tron 
   1686      1.1      tron </p>
   1687      1.1      tron 
   1688      1.1      tron <p>
   1689      1.1      tron 
   1690      1.1      tron There is also a scenario when a job is not appended to the end of
   1691  1.1.1.4  christos the job list (for example it was created as a result of a second or
   1692      1.1      tron later recipient batch). Then it works exactly as above, except that
   1693      1.1      tron if it was put in front of the first unread job (that is, the job
   1694  1.1.1.4  christos of a message which still has some unread recipients in the queue file),
   1695      1.1      tron that job is first forced to return all of its unused recipient slots
   1696      1.1      tron to the transport pool.
   1697      1.1      tron 
   1698      1.1      tron </p>
   1699      1.1      tron 
   1700      1.1      tron <p>
   1701      1.1      tron 
   1702      1.1      tron The algorithm just described leads to the following state: The first
   1703      1.1      tron unread job on the job list always gets all the remaining recipient
   1704      1.1      tron slots of that transport (if there are any). The jobs queued before
   1705      1.1      tron this job are completely read (that is, all recipients of their
   1706      1.1      tron message were already read in core) and have at maximum as many slots
   1707      1.1      tron as they still have recipients in-core (the maximum is there because
   1708      1.1      tron of the sponsoring mentioned before) and the jobs after this job get
   1709      1.1      tron nothing from the transport recipient pool (unless they got something
   1710      1.1      tron before and then the first unread job was created and enqueued in
   1711  1.1.1.4  christos front of them later - in such a case, they also get at maximum as many
   1712      1.1      tron slots as they have recipients in-core).
   1713      1.1      tron 
   1714      1.1      tron </p>
   1715      1.1      tron 
   1716      1.1      tron <p>
   1717      1.1      tron 
   1718  1.1.1.4  christos Things work fine in such a state for most of the time, because the
   1719  1.1.1.4  christos current job is either completely read in-core or has as many recipient
   1720      1.1      tron slots as there are, but there is one situation which we still have
   1721      1.1      tron to take care of specially.  Imagine if the current job is preempted
   1722      1.1      tron by some unread job from the job list and there are no more recipient
   1723      1.1      tron slots available, so this new current job could read only batches
   1724      1.1      tron of qmgr_message_recipient_minimum recipients at a time. This would
   1725  1.1.1.4  christos really degrade performance. For this reason, each transport has an
   1726      1.1      tron extra pool of <i>transport</i>_extra_recipient_limit recipient
   1727      1.1      tron slots, dedicated exactly for this situation. Each time an unread
   1728      1.1      tron job preempts the current job, it gets half of the remaining recipient
   1729      1.1      tron slots from the normal pool and this extra pool.
   1730      1.1      tron 
   1731      1.1      tron </p>
   1732      1.1      tron 
   1733      1.1      tron <p>
   1734      1.1      tron 
   1735      1.1      tron And that's it. It sure does sound pretty complicated, but fortunately
   1736  1.1.1.4  christos most people don't really have to care exactly how it works as long
   1737      1.1      tron as it works.  Perhaps the only important things to know for most
   1738      1.1      tron people are the following upper bound formulas:
   1739      1.1      tron 
   1740      1.1      tron </p>
   1741      1.1      tron 
   1742      1.1      tron <p>
   1743      1.1      tron 
   1744      1.1      tron Each transport has at maximum
   1745      1.1      tron 
   1746      1.1      tron </p>
   1747      1.1      tron 
   1748      1.1      tron <blockquote>
   1749      1.1      tron <pre>
   1750      1.1      tron max(
   1751      1.1      tron qmgr_message_recipient_minimum * qmgr_message_active_limit
   1752      1.1      tron + *_recipient_limit + *_extra_recipient_limit,
   1753      1.1      tron qmgr_message_recipient_limit
   1754      1.1      tron )
   1755      1.1      tron </pre>
   1756      1.1      tron </blockquote>
   1757      1.1      tron 
   1758      1.1      tron <p>
   1759      1.1      tron 
   1760      1.1      tron recipients in core.
   1761      1.1      tron 
   1762      1.1      tron </p>
   1763      1.1      tron 
   1764      1.1      tron <p>
   1765      1.1      tron 
   1766      1.1      tron The total amount of recipients in core is
   1767      1.1      tron 
   1768      1.1      tron </p>
   1769      1.1      tron 
   1770      1.1      tron <blockquote>
   1771      1.1      tron <pre>
   1772      1.1      tron max(
   1773      1.1      tron qmgr_message_recipient_minimum * qmgr_message_active_limit
   1774      1.1      tron + sum( *_recipient_limit + *_extra_recipient_limit ),
   1775      1.1      tron qmgr_message_recipient_limit
   1776      1.1      tron )
   1777      1.1      tron </pre>
   1778      1.1      tron </blockquote>
   1779      1.1      tron 
   1780      1.1      tron <p>
   1781      1.1      tron 
   1782      1.1      tron where the sum is over all used transports.
   1783      1.1      tron 
   1784      1.1      tron </p>
   1785      1.1      tron 
   1786      1.1      tron <p>
   1787      1.1      tron 
   1788      1.1      tron And this terribly complicated chapter concludes the documentation
   1789  1.1.1.4  christos of the <tt>nqmgr</tt> scheduler.
   1790      1.1      tron 
   1791      1.1      tron </p>
   1792      1.1      tron 
   1793      1.1      tron <p>
   1794      1.1      tron 
   1795      1.1      tron [By now you should theoretically know the <tt>nqmgr</tt> scheduler
   1796      1.1      tron inside out. In practice, you still hope that you will never have
   1797      1.1      tron to really understand the last or last two chapters completely, and
   1798      1.1      tron fortunately most people really won't. Understanding how the scheduler
   1799  1.1.1.4  christos works in ideal conditions is more than good enough for the vast majority
   1800      1.1      tron of users.]
   1801      1.1      tron 
   1802      1.1      tron </p>
   1803      1.1      tron 
   1804      1.1      tron <h2> <a name="credits"> Credits </a> </h2>
   1805      1.1      tron 
   1806      1.1      tron <ul>
   1807      1.1      tron 
   1808      1.1      tron <li> Wietse Venema designed and implemented the initial queue manager
   1809      1.1      tron with per-domain FIFO scheduling, and per-delivery +/-1 concurrency
   1810      1.1      tron feedback.
   1811      1.1      tron 
   1812      1.1      tron <li> Patrik Rak designed and implemented preemption where mail with
   1813      1.1      tron fewer recipients can slip past mail with more recipients in a
   1814      1.1      tron controlled manner, and wrote up its documentation.
   1815      1.1      tron 
   1816      1.1      tron <li> Wietse Venema initiated a discussion with Patrik Rak and Victor
   1817      1.1      tron Duchovni on alternatives for the +/-1 feedback scheduler's aggressive
   1818      1.1      tron behavior. This is when K/N feedback was reviewed (N = concurrency).
   1819      1.1      tron The discussion ended without a good solution for both negative
   1820      1.1      tron feedback and dead site detection.
   1821      1.1      tron 
   1822      1.1      tron <li> Victor Duchovni resumed work on concurrency feedback in the
   1823      1.1      tron context of concurrency-limited servers.
   1824      1.1      tron 
   1825      1.1      tron <li> Wietse Venema then re-designed the concurrency scheduler in
   1826      1.1      tron terms of the simplest possible concepts: less-than-1 concurrency
   1827      1.1      tron feedback per delivery, forward and reverse concurrency feedback
   1828      1.1      tron hysteresis, and pseudo-cohort failure. At this same time, concurrency
   1829      1.1      tron feedback was separated from dead site detection.
   1830      1.1      tron 
   1831      1.1      tron <li> These simplifications, and their modular implementation, helped
   1832      1.1      tron to develop further insights into the different roles that positive
   1833      1.1      tron and negative concurrency feedback play, and helped to identify some
   1834      1.1      tron worst-case scenarios.
   1835      1.1      tron 
   1836      1.1      tron </ul>
   1837      1.1      tron 
   1838      1.1      tron </body>
   1839      1.1      tron 
   1840      1.1      tron </html>
   1841