Home | History | Annotate | Line # | Download | only in doc
      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
      2 <html>
      3 <head>
      4   <title>GMP Development Projects</title>
      5   <link rel="shortcut icon" href="favicon.ico">
      6   <link rel="stylesheet" href="gmp.css">
      7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      8 </head>
      9 
     10 <center>
     11   <h1>
     12     GMP Development Projects
     13   </h1>
     14 </center>
     15 
     16 <font size=-1>
     17 <pre>
     18 Copyright 2000-2006, 2008-2011 Free Software Foundation, Inc.
     19 
     20 This file is part of the GNU MP Library.
     21 
     22 The GNU MP Library is free software; you can redistribute it and/or modify
     23 it under the terms of either:
     24 
     25   * the GNU Lesser General Public License as published by the Free
     26     Software Foundation; either version 3 of the License, or (at your
     27     option) any later version.
     28 
     29 or
     30 
     31   * the GNU General Public License as published by the Free Software
     32     Foundation; either version 2 of the License, or (at your option) any
     33     later version.
     34 
     35 or both in parallel, as here.
     36 
     37 The GNU MP Library is distributed in the hope that it will be useful, but
     38 WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
     39 or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
     40 for more details.
     41 
     42 You should have received copies of the GNU General Public License and the
     43 GNU Lesser General Public License along with the GNU MP Library.  If not,
     44 see https://www.gnu.org/licenses/.
     45 </pre>
     46 </font>
     47 
     48 <hr>
     49 <!-- NB. timestamp updated automatically by emacs -->
     50   This file current as of 29 Jan 2014.  An up-to-date version is available at
     51   <a href="https://gmplib.org/projects.html">https://gmplib.org/projects.html</a>.
     52   Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
     53 
     54 <p> This file lists projects suitable for volunteers.  Please see the
     55     <a href="tasks.html">tasks file</a> for smaller tasks.
     56 
     57 <p> If you want to work on any of the projects below, please let
     58     gmp-devel<font>@</font>gmplib.org know.  If you want to help with a project
     59     that already somebody else is working on, you will get in touch through
     60     gmp-devel<font>@</font>gmplib.org.  (There are no email addresses of
     61     volunteers below, due to spamming problems.)
     62 
     63 <ul>
     64 <li> <strong>Faster multiplication</strong>
     65 
     66   <ol>
     67 
     68     <li> Work on the algorithm selection code for unbalanced multiplication.
     69 
     70     <li> Implement an FFT variant computing the coefficients mod m different
     71 	 limb size primes of the form l*2^k+1. i.e., compute m separate FFTs.
     72 	 The wanted coefficients will at the end be found by lifting with CRT
     73 	 (Chinese Remainder Theorem).  If we let m = 3, i.e., use 3 primes, we
     74 	 can split the operands into coefficients at limb boundaries, and if
     75 	 our machine uses b-bit limbs, we can multiply numbers with close to
     76 	 2^b limbs without coefficient overflow.  For smaller multiplication,
     77 	 we might perhaps let m = 1, and instead of splitting our operands at
     78 	 limb boundaries, split them in much smaller pieces.  We might also use
     79 	 4 or more primes, and split operands into bigger than b-bit chunks.
     80 	 By using more primes, the gain in shorter transform length, but lose
     81 	 in having to do more FFTs, but that is a slight total save.  We then
     82 	 lose in more expensive CRT. <br><br>
     83 
     84 	 <p> [We now have two implementations of this algorithm, one by Tommy
     85 	 Frnqvist and one by Niels Mller.]
     86 
     87     <li> Work on short products.  Our mullo and mulmid are probably K, but we
     88          lack mulhi.
     89 
     90   </ol>
     91 
     92   <p> Another possibility would be an optimized cube.  In the basecase that
     93       should definitely be able to save cross-products in a similar fashion to
     94       squaring, but some investigation might be needed for how best to adapt
     95       the higher-order algorithms.  Not sure whether cubing or further small
     96       powers have any particularly important uses though.
     97 
     98 
     99 <li> <strong>Assembly routines</strong>
    100 
    101   <p> Write new and improve existing assembly routines.  The tests/devel
    102       programs and the tune/speed.c and tune/many.pl programs are useful for
    103       testing and timing the routines you write.  See the README files in those
    104       directories for more information.
    105 
    106   <p> Please make sure your new routines are fast for these three situations:
    107       <ol>
    108 	<li> Small operands of less than, say, 10 limbs.
    109 	<li> Medium size operands, that fit into the cache.
    110 	<li> Huge operands that does not fit into the cache.
    111       </ol>
    112 
    113   <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
    114       mpn_sqr_basecase.  The latter two don't exist for all machines, while
    115       mpn_addmul_1 exists for almost all machines.
    116 
    117   <p> Standard techniques for these routines are unrolling, software
    118       pipelining, and specialization for common operand values.  For machines
    119       with poor integer multiplication, it is sometimes possible to remedy the
    120       situation using floating-point operations or SIMD operations such as MMX
    121       (x86) (x86), SSE (x86), VMX (PowerPC), VIS (Sparc).
    122 
    123   <p> Using floating-point operations is interesting but somewhat tricky.
    124       Since IEEE double has 53 bit of mantissa, one has to split the operands
    125       in small pieces, so that no intermediates are greater than 2^53.  For
    126       32-bit computers, splitting one operand into 16-bit pieces works.  For
    127       64-bit machines, one operand can be split into 21-bit pieces and the
    128       other into 32-bit pieces.  (A 64-bit operand can be split into just three
    129       21-bit pieces if one allows the split operands to be negative!)
    130 
    131 
    132 <li> <strong>Faster sqrt</strong>
    133 
    134   <p> The current code uses divisions, which are reasonably fast, but it'd be
    135       possible to use only multiplications by computing 1/sqrt(A) using this
    136       iteration:
    137       <pre>
    138 				    2
    139 		   x   = x  (3 &minus; A x )/2
    140 		    i+1	  i	    i  </pre>
    141       The square root can then be computed like this:
    142       <pre>
    143 		     sqrt(A) = A x
    144 				  n  </pre>
    145   <p> That final multiply might be the full size of the input (though it might
    146       only need the high half of that), so there may or may not be any speedup
    147       overall.
    148 
    149   <p> We should probably allow a special exponent-like parameter, to speed
    150       computations of a precise square root of a small number in mpf and mpfr.
    151 
    152 
    153 <li> <strong>Nth root</strong>
    154 
    155   <p> Improve mpn_rootrem.  The current code is not too bad, but its time
    156       complexity is a function of the input, while it is possible to make
    157       the <i>average</i> complexity a function of the output.
    158 
    159 
    160 <li> <strong>Fat binaries</strong>
    161 
    162   <p> Add more functions to the set of fat functions.
    163 
    164   <p> The speed of multiplication is today highly dependent on combination
    165   functions like <code>addlsh1_n</code>.  A fat binary will never use any such
    166   functions, since they are classified as optional.  Ideally, we should use
    167   them, but making the current compile-time selections of optional functions
    168   become run-time selections for fat binaries.
    169 
    170   <p> If we make fat binaries work really well, we should move away frm tehe
    171   current configure scheme (at least by default) and instead include all code
    172   always.
    173 
    174 
    175 <li> <strong>Exceptions</strong>
    176 
    177   <p> Some sort of scheme for exceptions handling would be desirable.
    178       Presently the only thing documented is that divide by zero in GMP
    179       functions provokes a deliberate machine divide by zero (on those systems
    180       where such a thing exists at least).  The global <code>gmp_errno</code>
    181       is not actually documented, except for the old <code>gmp_randinit</code>
    182       function.  Being currently just a plain global means it's not
    183       thread-safe.
    184 
    185   <p> The basic choices for exceptions are returning an error code or having a
    186       handler function to be called.  The disadvantage of error returns is they
    187       have to be checked, leading to tedious and rarely executed code, and
    188       strictly speaking such a scheme wouldn't be source or binary compatible.
    189       The disadvantage of a handler function is that a <code>longjmp</code> or
    190       similar recovery from it may be difficult.  A combination would be
    191       possible, for instance by allowing the handler to return an error code.
    192 
    193   <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
    194       normally be detected at the start of functions, so exception handling
    195       would have a clean state.  What's worth considering though is that the
    196       GMP function detecting the exception may have been called via some third
    197       party library or self contained application module, and hence have
    198       various bits of state to be cleaned up above it.  It'd be highly
    199       desirable for an exceptions scheme to allow for such cleanups.
    200 
    201   <p> The C++ destructor mechanism could help with cleanups both internally and
    202       externally, but being a plain C library we don't want to depend on that.
    203 
    204   <p> A C++ <code>throw</code> might be a good optional extra exceptions
    205       mechanism, perhaps under a build option.  For
    206       GCC <code>-fexceptions</code> will add the necessary frame information to
    207       plain C code, or GMP could be compiled as C++.
    208 
    209   <p> Out-of-memory exceptions are expected to be handled by the
    210       <code>mp_set_memory_functions</code> routines, rather than being a
    211       prospective part of divide-by-zero etc.  Some similar considerations
    212       apply but what differs is that out-of-memory can arise deep within GMP
    213       internals.  Even fundamental routines like <code>mpn_add_n</code> and
    214       <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
    215       vector systems).  Allowing for an error code return would require an
    216       awful lot of checking internally.  Perhaps it'd still be worthwhile, but
    217       it'd be a lot of changes and the extra code would probably be rather
    218       rarely executed in normal usages.
    219 
    220   <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
    221       general, lead to memory leaks and may leave GMP variables operated on in
    222       inconsistent states.  Maybe it'd be possible to record recovery
    223       information for use by the relevant allocate or reallocate function, but
    224       that too would be a lot of changes.
    225 
    226   <p> One scheme for out-of-memory would be to note that all GMP allocations go
    227       through the <code>mp_set_memory_functions</code> routines.  So if the
    228       application has an intended <code>setjmp</code> recovery point it can
    229       record memory activity by GMP and abandon space allocated and variables
    230       initialized after that point.  This might be as simple as directing the
    231       allocation functions to a separate pool, but in general would have the
    232       disadvantage of needing application-level bookkeeping on top of the
    233       normal system <code>malloc</code>.  An advantage however is that it needs
    234       nothing from GMP itself and on that basis doesn't burden applications not
    235       needing recovery.  Note that there's probably some details to be worked
    236       out here about reallocs of existing variables, and perhaps about copying
    237       or swapping between "permanent" and "temporary" variables.
    238 
    239   <p> Applications desiring a fine-grained error control, for instance a
    240       language interpreter, would very possibly not be well served by a scheme
    241       requiring <code>longjmp</code>.  Wrapping every GMP function call with a
    242       <code>setjmp</code> would be very inconvenient.
    243 
    244   <p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
    245       a special value indicating an out-of-memory or other failure.  This would
    246       be similar to NaNs in mpfr.  Unfortunately such a scheme could only be
    247       used by programs prepared to handle such special values, since for
    248       instance a program waiting for some condition to be satisfied could
    249       become an infinite loop if it wasn't also watching for NaNs.  The work to
    250       implement this would be significant too, lots of checking of inputs and
    251       intermediate results.  And if <code>mpn</code> routines were to
    252       participate in this (which they would have to internally) a lot of new
    253       return values would need to be added, since of course there's no
    254       <code>mpz_t</code> etc structure for them to indicate failure in.
    255 
    256   <p> Stack overflow is another possible exception, but perhaps not one that
    257       can be easily detected in general.  On i386 GNU/Linux for instance GCC
    258       normally doesn't generate stack probes for an <code>alloca</code>, but
    259       merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
    260       miss the stack redzone and hit arbitrary data.  GMP stack usage is
    261       normally a function of operand size, which might be enough for some
    262       applications to know they'll be safe.  Otherwise a fixed maximum usage
    263       can probably be obtained by building with
    264       <code>--enable-alloca=malloc-reentrant</code> (or
    265       <code>notreentrant</code>).  Arranging the default to be
    266       <code>alloca</code> only on blocks up to a certain size and
    267       <code>malloc</code> thereafter might be a better approach and would have
    268       the advantage of not having calculations limited by available stack.
    269 
    270   <p> Actually recovering from stack overflow is of course another problem.  It
    271       might be possible to catch a <code>SIGSEGV</code> in the stack redzone
    272       and do something in a <code>sigaltstack</code>, on systems which have
    273       that, but recovery might otherwise not be possible.  This is worth
    274       bearing in mind because there's no point worrying about tight and careful
    275       out-of-memory recovery if an out-of-stack is fatal.
    276 
    277   <p> Operand overflow is another exception to be addressed.  It's easy for
    278       instance to ask <code>mpz_pow_ui</code> for a result bigger than an
    279       <code>mpz_t</code> can possibly represent.  Currently overflows in limb
    280       or byte count calculations will go undetected.  Often they'll still end
    281       up asking the memory functions for blocks bigger than available memory,
    282       but that's by no means certain and results are unpredictable in general.
    283       It'd be desirable to tighten up such size calculations.  Probably only
    284       selected routines would need checks, if it's assumed say that no input
    285       will be more than half of all memory and hence size additions like say
    286       <code>mpz_mul</code> won't overflow.
    287 
    288 
    289 <li> <strong>Performance Tool</strong>
    290 
    291   <p> It'd be nice to have some sort of tool for getting an overview of
    292       performance.  Clearly a great many things could be done, but some primary
    293       uses would be,
    294 
    295       <ol>
    296 	<li> Checking speed variations between compilers.
    297 	<li> Checking relative performance between systems or CPUs.
    298       </ol>
    299 
    300   <p> A combination of measuring some fundamental routines and some
    301       representative application routines might satisfy these.
    302 
    303   <p> The tune/time.c routines would be the easiest way to get good accurate
    304       measurements on lots of different systems.  The high level
    305       <code>speed_measure</code> may or may not suit, but the basic
    306       <code>speed_starttime</code> and <code>speed_endtime</code> would cover
    307       lots of portability and accuracy questions.
    308 
    309 
    310 <li> <strong>Using <code>restrict</code></strong>
    311 
    312   <p> There might be some value in judicious use of C99 style
    313       <code>restrict</code> on various pointers, but this would need some
    314       careful thought about what it implies for the various operand overlaps
    315       permitted in GMP.
    316 
    317   <p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
    318       expressing tighter (or perhaps looser) requirements.  Might be worth
    319       investigating that before using <code>restrict</code> unconditionally.
    320 
    321   <p> Loops are presumably where the greatest benefit would be had, by allowing
    322       the compiler to advance reads ahead of writes, perhaps as part of loop
    323       unrolling.  However critical loops are generally coded in assembler, so
    324       there might not be very much to gain.  And on Cray systems the explicit
    325       use of <code>_Pragma</code> gives an equivalent effect.
    326 
    327   <p> One thing to note is that Microsoft C headers (on ia64 at least) contain
    328       <code>__declspec(restrict)</code>, so a <code>#define</code> of
    329       <code>restrict</code> should be avoided.  It might be wisest to setup a
    330       <code>gmp_restrict</code>.
    331 
    332 
    333 <li> <strong>Prime Testing</strong>
    334 
    335   <p> GMP is not really a number theory library and probably shouldn't have
    336       large amounts of code dedicated to sophisticated prime testing
    337       algorithms, but basic things well-implemented would suit.  Tests offering
    338       certainty are probably all too big or too slow (or both!) to justify
    339       inclusion in the main library.  Demo programs showing some possibilities
    340       would be good though.
    341 
    342   <p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
    343       rather specific to the Miller-Rabin tests of the current implementation.
    344       Better would be some sort of parameter asking perhaps for a maximum
    345       chance 1/2^x of a probable prime in fact being composite.  If
    346       applications follow the advice that the present reps gives 1/4^reps
    347       chance then perhaps such a change is unnecessary, but an explicitly
    348       described 1/2^x would allow for changes in the implementation or even for
    349       new proofs about the theory.
    350 
    351   <p> <code>mpz_probab_prime_p</code> always initializes a new
    352       <code>gmp_randstate_t</code> for randomized tests, which unfortunately
    353       means it's not really very random and in particular always runs the same
    354       tests for a given input.  Perhaps a new interface could accept an rstate
    355       to use, so successive tests could increase confidence in the result.
    356 
    357   <p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
    358       trial divisions.  And since the various prime factors are constants, the
    359       remainder can be tested with something like
    360 <pre>
    361 #define MP_LIMB_DIVISIBLE_7_P(n) \
    362   ((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
    363 </pre>
    364       Which would help compilers that don't know how to optimize divisions by
    365       constants, and is even an improvement on current gcc 3.2 code.  This
    366       technique works for any modulus, see Granlund and Montgomery "Division by
    367       Invariant Integers" section 9.
    368 
    369   <p> The trial divisions are done with primes generated and grouped at
    370       runtime.  This could instead be a table of data, with pre-calculated
    371       inverses too.  Storing deltas, ie. amounts to add, rather than actual
    372       primes would save space.  <code>udiv_qrnnd_preinv</code> style inverses
    373       can be made to exist by adding dummy factors of 2 if necessary.  Some
    374       thought needs to be given as to how big such a table should be, based on
    375       how much dividing would be profitable for what sort of size inputs.  The
    376       data could be shared by the perfect power testing.
    377 
    378   <p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
    379       of N must be == 1 mod 4, saving half the work in trial dividing.  (If
    380       x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
    381       jacobi symbol (-1/p)=1.  But also (-1/p)=(-1)^((p-1)/2), hence must have
    382       p==1 mod 4.)  But knowing whether sqrt(-1) mod N exists is not too easy.
    383       A strong pseudoprime test can reveal one, so perhaps such a test could be
    384       inserted part way though the dividing.
    385 
    386   <p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
    387       quadratic pseudoprime test taking about 3x longer than a plain test, but
    388       with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
    389       would offer only (1/4)^3 == 1/64).  Such a test needs completely random
    390       parameters to satisfy the theory, though single-limb values would run
    391       faster.  It's probably best to do at least one plain Miller-Rabin before
    392       any quadratic tests, since that can identify composites in less total
    393       time.
    394 
    395   <p> Some thought needs to be given to the structure of which tests (trial
    396       division, Miller-Rabin, quadratic) and how many are done, based on what
    397       sort of inputs we expect, with a view to minimizing average time.
    398 
    399   <p> It might be a good idea to break out subroutines for the various tests,
    400       so that an application can combine them in ways it prefers, if sensible
    401       defaults in <code>mpz_probab_prime_p</code> don't suit.  In particular
    402       this would let applications skip tests it knew would be unprofitable,
    403       like trial dividing when an input is already known to have no small
    404       factors.
    405 
    406   <p> For small inputs, combinations of theory and explicit search make it
    407       relatively easy to offer certainty.  For instance numbers up to 2^32
    408       could be handled with a strong pseudoprime test and table lookup.  But
    409       it's rather doubtful whether a smallnum prime test belongs in a bignum
    410       library.  Perhaps if it had other internal uses.
    411 
    412   <p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
    413       impractical for anything but small n.
    414 
    415 
    416 <li> <strong>Intra-Library Calls</strong>
    417 
    418   <p> On various systems, calls within libgmp still go through the PLT, TOC or
    419       other mechanism, which makes the code bigger and slower than it needs to
    420       be.
    421 
    422   <p> The theory would be to have all GMP intra-library calls resolved directly
    423       to the routines in the library.  An application wouldn't be able to
    424       replace a routine, the way it can normally, but there seems no good
    425       reason to do that, in normal circumstances.
    426 
    427   <p> The <code>visibility</code> attribute in recent gcc is good for this,
    428       because it lets gcc omit unnecessary GOT pointer setups or whatever if it
    429       finds all calls are local and there's no global data references.
    430       Documented entrypoints would be <code>protected</code>, and purely
    431       internal things not wanted by test programs or anything can be
    432       <code>internal</code>.
    433 
    434   <p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
    435       text segment relocations within libgmp.so, meaning the library code can't
    436       be shared between processes, defeating the purpose of a shared library.
    437       Perhaps this is just a gremlin in binutils (debian packaged
    438       2.13.90.0.16-1).
    439 
    440   <p> The linker can be told directly (with a link script, or options) to do
    441       the same sort of thing.  This doesn't change the code emitted by gcc of
    442       course, but it does mean calls are resolved directly to their targets,
    443       avoiding a PLT entry.
    444 
    445   <p> Keeping symbols private to libgmp.so is probably a good thing in general
    446       too, to stop anyone even attempting to access them.  But some
    447       undocumented things will need or want to be kept visible, for use by
    448       mpfr, or the test and tune programs.  Libtool has a standard option for
    449       selecting public symbols (used now for libmp).
    450 
    451 
    452 <li> <strong>Math functions for the mpf layer</strong>
    453 
    454   <p> Implement the functions of math.h for the GMP mpf layer!	Check the book
    455       "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
    456       functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
    457       cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
    458 
    459   <p> Note that the <a href="http://mpfr.org">mpfr</a> functions already
    460   provide these functions, and that we usually recommend new programs to use
    461   mpfr instead of mpf.
    462 </ul>
    463 <hr>
    464 
    465 </body>
    466 </html>
    467 
    468 <!--
    469 Local variables:
    470 eval: (add-hook 'write-file-hooks 'time-stamp)
    471 time-stamp-start: "This file current as of "
    472 time-stamp-format: "%:d %3b %:y"
    473 time-stamp-end: "\\."
    474 time-stamp-line-limit: 50
    475 End:
    476 -->
    477