      1 <h1>TRE API reference manual</h1>
      2 
      3 <h2>The <tt>regcomp()</tt> functions</h2>
      4 <a name="regcomp"></a>
      5 
      6 <div class="code">
      7 <code>
      8 #include &lt;tre/regex.h&gt;
      9 <br>
     10 <br>
     11 <font class="type">int</font>
     12 <font class="func">regcomp</font>(<font
     13 class="type">regex_t</font> *<font class="arg">preg</font>,
     14 <font class="qual">const</font> <font class="type">char</font>
     15 *<font class="arg">regex</font>, <font class="type">int</font>
     16 <font class="arg">cflags</font>);
     17 <br>
     18 <font class="type">int</font> <font
     19 class="func">regncomp</font>(<font class="type">regex_t</font>
     20 *<font class="arg">preg</font>, <font class="qual">const</font>
     21 <font class="type">char</font> *<font class="arg">regex</font>,
     22 <font class="type">size_t</font> <font class="arg">len</font>,
     23 <font class="type">int</font> <font class="arg">cflags</font>);
     24 <br>
     25 <font class="type">int</font> <font
     26 class="func">regwcomp</font>(<font class="type">regex_t</font>
     27 *<font class="arg">preg</font>, <font class="qual">const</font>
     28 <font class="type">wchar_t</font> *<font
     29 class="arg">regex</font>, <font class="type">int</font> <font
     30 class="arg">cflags</font>);
     31 <br>
     32 <font class="type">int</font> <font
     33 class="func">regwncomp</font>(<font class="type">regex_t</font>
     34 *<font class="arg">preg</font>, <font class="qual">const</font>
     35 <font class="type">wchar_t</font> *<font
     36 class="arg">regex</font>, <font class="type">size_t</font>
     37 <font class="arg">len</font>, <font class="type">int</font>
     38 <font class="arg">cflags</font>);
     39 <br>
     40 <font class="type">void</font> <font
     41 class="func">regfree</font>(<font class="type">regex_t</font>
     42 *<font class="arg">preg</font>);
     43 <br>
     44 </code>
     45 </div>
     46 
     47 <p>
     48 The <tt><font class="func">regcomp</font>()</tt> function compiles
     49 the regex string pointed to by <tt><font
     50 class="arg">regex</font></tt> to an internal representation and
     51 stores the result in the pattern buffer structure pointed to by
     52 <tt><font class="arg">preg</font></tt>.  The <tt><font
     53 class="func">regncomp</font>()</tt> function is like <tt><font
     54 class="func">regcomp</font>()</tt>, but <tt><font
     55 class="arg">regex</font></tt> is not terminated with the null
     56 byte.  Instead, the <tt><font class="arg">len</font></tt> argument
     57 is used to give the length of the string, and the string may contain
     58 null bytes.  The <tt><font class="func">regwcomp</font>()</tt> and
     59 <tt><font class="func">regwncomp</font>()</tt> functions work like
     60 <tt><font class="func">regcomp</font>()</tt> and <tt><font
     61 class="func">regncomp</font>()</tt>, respectively, but take a
     62 wide-character (<tt><font class="type">wchar_t</font></tt>) string
     63 instead of a byte string.
     64 </p>
     65 
     66 <p>
     67 The <tt><font class="arg">cflags</font></tt> argument is the
     68 bitwise inclusive OR of zero or more of the following flags (defined
     69 in the header <tt>&lt;tre/regex.h&gt;</tt>):
     70 </p>
     71 
     72 <blockquote>
     73 <dl>
     74 <dt><tt>REG_EXTENDED</tt></dt>
     75 <dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
     76 compiling <tt><font class="arg">regex</font></tt>.  The default
     77 syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is
     78 considered obsolete.</dd>
     79 
     80 <dt><tt>REG_ICASE</tt></dt>
     81 <dd>Ignore case.  Subsequent searches with the <a
     82 href="#regexec"><tt>regexec</tt></a> family of functions using this
     83 pattern buffer will be case insensitive.</dd>
     84 
     85 <dt><tt>REG_NOSUB</tt></dt>
     86 <dd>Do not report submatches.  Subsequent searches with the <a
     87 href="#regexec"><tt>regexec</tt></a> family of functions will only
     88 report whether a match was found or not and will not fill the submatch
     89 array.</dd>
     90 
     91 <dt><tt>REG_NEWLINE</tt></dt>
     92 <dd>Normally the newline character is treated as an ordinary
     93 character.  When this flag is used, the newline character
     94 (<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
     95 <ol>
     96 <li>The match-any-character operator (dot <tt>"."</tt> outside a
     97 bracket expression) does not match a newline.</li>
     98 <li>A non-matching list (<tt>[^...]</tt>) not containing a newline
     99 does not match a newline.</li>
    100 <li>The match-beginning-of-line operator <tt>^</tt> matches the empty
    101 string immediately after a newline as well as the empty string at the
    102 beginning of the string (but see the <code>REG_NOTBOL</code>
    103 <code>regexec()</code> flag below).</li>
    104 <li>The match-end-of-line operator <tt>$</tt> matches the empty
    105 string immediately before a newline as well as the empty string at the
    106 end of the string (but see the <code>REG_NOTEOL</code>
    107 <code>regexec()</code> flag below).</li>
    108 </ol>
    109 </dd>
    110 
    111 <dt><tt>REG_LITERAL</tt></dt>
    112 <dd>Interpret the entire <tt><font class="arg">regex</font></tt>
    113 argument as a literal string, that is, all characters will be
    114 considered ordinary.  This is a nonstandard extension, compatible with
    115 but not specified by POSIX.</dd>
    116 
    117 <dt><tt>REG_NOSPEC</tt></dt>
    118 <dd>Same as <tt>REG_LITERAL</tt>.  This flag is provided for
    119 compatibility with BSD.</dd>
    120 
    121 <dt><tt>REG_RIGHT_ASSOC</tt></dt>
    122 <dd>By default, concatenation is left associative in TRE, as per
    123 the grammar given in the <a
    124 href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
    125 specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
    126 This flag flips associativity of concatenation to right associative.
    127 Associativity can have an effect on how a match is divided into
    128 submatches, but does not change what is matched by the entire regexp.
    129 </dd>
    130 
    131 <dt><tt>REG_UNGREEDY</tt></dt>
    132 <dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
    133 can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
    134 by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
    135 </dl>
    136 </blockquote>
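<p>
The effect of the compilation flags above can be sketched with a small
helper.  This is an illustration only: the helper name
<tt>matches_with_flags</tt> is hypothetical, and the sketch includes the
system's POSIX <tt>&lt;regex.h&gt;</tt> so that it builds without TRE
installed.  With TRE, <tt>&lt;tre/regex.h&gt;</tt> would be included
instead; the calls used here are the ones documented in this manual.
</p>

```c
#include <regex.h>   /* assumption: POSIX header; with TRE use <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern`, compiled with `cflags`,
   matches somewhere in `text`, 0 if it does not, and -1 if the
   pattern fails to compile. */
static int matches_with_flags(const char *pattern, int cflags,
                              const char *text)
{
    regex_t preg;
    int rc;

    /* REG_NOSUB: only the yes/no answer is needed, no submatch array. */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, text, 0, NULL, 0);
    regfree(&preg);              /* release the pattern buffer */
    return rc == 0 ? 1 : 0;
}
```

<p>
For example, <tt>"^world$"</tt> should match <tt>"hello\nworld"</tt> only
when <tt>REG_NEWLINE</tt> is part of <tt>cflags</tt>, and
<tt>"HELLO"</tt> should match <tt>"hello"</tt> only with
<tt>REG_ICASE</tt>.
</p>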
    137 
    138 <p>
    139 After a successful call to <tt><font class="func">regcomp</font></tt> it is
    140 possible to use the <tt><font class="arg">preg</font></tt> pattern buffer for
    141 searching for matches in strings (see below).  Once the pattern buffer is no
    142 longer needed, it should be freed with <tt><font
    143 class="func">regfree</font></tt> to free the memory allocated for it.
    144 </p>
    145 
    146 
    147 <p>
    148 The <tt><font class="type">regex_t</font></tt> structure has the
    149 following fields that the application can read:
    150 </p>
    151 <blockquote>
    152 <dl>
    153 <dt><tt><font class="type">size_t</font> <font
    154 class="arg">re_nsub</font></tt></dt>
    155 <dd>Number of parenthesized subexpressions in <tt><font
    156 class="arg">regex</font></tt>.
    157 </dd>
    158 </dl>
    159 </blockquote>
    160 
    161 <p>
    162 The <tt><font class="func">regcomp</font></tt> function returns
    163 zero if the compilation was successful, or one of the following error
    164 codes if there was an error:
    165 </p>
    166 <blockquote>
    167 <dl>
    168 <dt><tt>REG_BADPAT</tt></dt>
    169 <dd>Invalid regexp.  TRE returns this only if a multibyte character
    170 set is used in the current locale, and <tt><font
    171 class="arg">regex</font></tt> contained an invalid multibyte
    172 sequence.</dd>
    173 <dt><tt>REG_ECOLLATE</tt></dt>
    174 <dd>Invalid collating element referenced.  TRE returns this whenever
    175 equivalence classes or multicharacter collating elements are used in
    176 bracket expressions (they are not supported yet).</dd>
    177 <dt><tt>REG_ECTYPE</tt></dt>
    178 <dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
    179 <dt><tt>REG_EESCAPE</tt></dt>
    180 <dd>The last character of <tt><font class="arg">regex</font></tt>
    181 was a backslash (<tt>\</tt>).</dd>
    182 <dt><tt>REG_ESUBREG</tt></dt>
    183 <dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
    184 invalid.</dd>
    185 <dt><tt>REG_EBRACK</tt></dt>
    186 <dd><tt>[]</tt> imbalance.</dd>
    187 <dt><tt>REG_EPAREN</tt></dt>
    188 <dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
    189 <dt><tt>REG_EBRACE</tt></dt>
    190 <dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
    191 <dt><tt>REG_BADBR</tt></dt>
    192 <dd><tt>{}</tt> content invalid: not a number, more than two numbers,
    193 first larger than second, or number too large.</dd>
    194 <dt><tt>REG_ERANGE</tt></dt>
    195 <dd>Invalid character range, e.g. ending point is earlier in the
    196 collating order than the starting point.</dd>
    197 <dt><tt>REG_ESPACE</tt></dt>
    198 <dd>Out of memory, or an internal limit exceeded.</dd>
    199 <dt><tt>REG_BADRPT</tt></dt>
    200 <dd>Invalid use of repetition operators: two or more repetition operators have
    201 been chained in an undefined way.</dd>
    202 </dl>
    203 </blockquote>
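<p>
A minimal sketch of the compile, inspect, and free cycle described above
follows.  The helper name <tt>count_subexpressions</tt> is hypothetical.
The sketch also calls <tt>regerror()</tt>, which is part of the POSIX
regex interface but is not described in this section, and it includes
the POSIX <tt>&lt;regex.h&gt;</tt> header as a stand-in for
<tt>&lt;tre/regex.h&gt;</tt>; both are assumptions of the example.
</p>

```c
#include <regex.h>   /* assumption: POSIX header; with TRE use <tre/regex.h> */
#include <stdio.h>

/* Hypothetical helper: compiles `pattern` as an ERE and returns the
   number of parenthesized subexpressions, or -1 if compilation fails. */
static long count_subexpressions(const char *pattern)
{
    regex_t preg;
    long nsub;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc != 0) {
        /* A nonzero return is one of the REG_* error codes. */
        char msg[128];
        regerror(rc, &preg, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;               /* nothing to free after a failure */
    }

    nsub = (long)preg.re_nsub;   /* readable field of regex_t */
    regfree(&preg);              /* free memory held by the buffer */
    return nsub;
}
```

<p>
A pattern such as <tt>"(a)(b(c))"</tt> has three parenthesized
subexpressions, while an unbalanced pattern such as <tt>"(a"</tt> is
rejected at compile time.
</p>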
    204 
    205 
    206 <h2>The <tt>regexec()</tt> functions</h2>
    207 <a name="regexec"></a>
    208 
    209 <div class="code">
    210 <code>
    211 #include &lt;tre/regex.h&gt;
    212 <br>
    213 <br>
    214 <font class="type">int</font> <font
    215 class="func">regexec</font>(<font class="qual">const</font>
    216 <font class="type">regex_t</font> *<font
    217 class="arg">preg</font>, <font class="qual">const</font> <font
    218 class="type">char</font> *<font class="arg">string</font>,
    219 <font class="type">size_t</font> <font
    220 class="arg">nmatch</font>,
    221 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    222 <font class="type">regmatch_t</font> <font
    223 class="arg">pmatch</font>[], <font class="type">int</font>
    224 <font class="arg">eflags</font>);
    225 <br>
    226 <font class="type">int</font> <font
    227 class="func">regnexec</font>(<font class="qual">const</font>
    228 <font class="type">regex_t</font> *<font
    229 class="arg">preg</font>, <font class="qual">const</font> <font
    230 class="type">char</font> *<font class="arg">string</font>,
    231 <font class="type">size_t</font> <font class="arg">len</font>,
    232 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    233 <font class="type">size_t</font> <font
    234 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
    235 <font class="arg">pmatch</font>[], <font
    236 class="type">int</font> <font class="arg">eflags</font>);
    237 <br>
    238 <font class="type">int</font> <font
    239 class="func">regwexec</font>(<font class="qual">const</font>
    240 <font class="type">regex_t</font> *<font
    241 class="arg">preg</font>, <font class="qual">const</font> <font
    242 class="type">wchar_t</font> *<font class="arg">string</font>,
    243 <font class="type">size_t</font> <font
    244 class="arg">nmatch</font>,
    245 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    246 <font class="type">regmatch_t</font> <font
    247 class="arg">pmatch</font>[], <font class="type">int</font>
    248 <font class="arg">eflags</font>);
    249 <br>
    250 <font class="type">int</font> <font
    251 class="func">regwnexec</font>(<font class="qual">const</font>
    252 <font class="type">regex_t</font> *<font
    253 class="arg">preg</font>, <font class="qual">const</font> <font
    254 class="type">wchar_t</font> *<font class="arg">string</font>,
    255 <font class="type">size_t</font> <font class="arg">len</font>,
    256 <br>
    257 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    258 <font class="type">size_t</font> <font
    259 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
    260 <font class="arg">pmatch</font>[], <font
    261 class="type">int</font> <font class="arg">eflags</font>);
    262 </code>
    263 </div>
    264 
    265 <p>
    266 The <tt><font class="func">regexec</font>()</tt> function matches
    267 the null-terminated string against the compiled regexp <tt><font
    268 class="arg">preg</font></tt>, initialized by a previous call to
    269 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.  The
    270 <tt><font class="func">regnexec</font>()</tt> function is like
    271 <tt><font class="func">regexec</font>()</tt>, but <tt><font
    272 class="arg">string</font></tt> is not terminated with a null byte.
    273 Instead, the <tt><font class="arg">len</font></tt> argument is used
    274 to give the length of the string, and the string may contain null
    275 bytes.  The <tt><font class="func">regwexec</font>()</tt> and
    276 <tt><font class="func">regwnexec</font>()</tt> functions work like
    277 <tt><font class="func">regexec</font>()</tt> and <tt><font
    278 class="func">regnexec</font>()</tt>, respectively, but take a wide
    279 character (<tt><font class="type">wchar_t</font></tt>) string
    280 instead of a byte string. The <tt><font
    281 class="arg">eflags</font></tt> argument is a bitwise OR of zero or
    282 more of the following flags:
    283 </p>
    284 <blockquote>
    285 <dl>
    286 <dt><code>REG_NOTBOL</code></dt>
    287 <dd>
    288 <p>
    289 When this flag is used, the match-beginning-of-line operator
    290 <tt>^</tt> does not match the empty string at the beginning of
    291 <tt><font class="arg">string</font></tt>.  If
    292 <code>REG_NEWLINE</code> was used when compiling
    293 <tt><font class="arg">preg</font></tt>, the empty string
    294 immediately after a newline character will still be matched.
    295 </p>
    296 </dd>
    297 
    298 <dt><code>REG_NOTEOL</code></dt>
    299 <dd>
    300 <p>
    301 When this flag is used, the match-end-of-line operator
    302 <tt>$</tt> does not match the empty string at the end of
    303 <tt><font class="arg">string</font></tt>.  If
    304 <code>REG_NEWLINE</code> was used when compiling
    305 <tt><font class="arg">preg</font></tt>, the empty string
    306 immediately before a newline character will still be matched.
    307 </p>
    308 </dd>
    309 </dl>
    310 
    311 <p>
    312 These flags are useful when different portions of a string are passed
    313 to <code>regexec</code> and the beginning or end of the partial string
    314 should not be interpreted as the beginning or end of a line.
    315 </p>
    316 
    317 </blockquote>
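<p>
The following sketch shows how <tt>REG_NOTBOL</tt> and
<tt>REG_NOTEOL</tt> change the result when only part of a string is
passed to <tt>regexec()</tt>.  The helper name <tt>search_portion</tt>
is hypothetical, and the POSIX <tt>&lt;regex.h&gt;</tt> header is used
as a stand-in for <tt>&lt;tre/regex.h&gt;</tt> so that the example
builds without TRE.
</p>

```c
#include <regex.h>   /* assumption: POSIX header; with TRE use <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: returns 1 on a match, 0 on no match, and -1 if
   the pattern does not compile.  `eflags` is passed to regexec(). */
static int search_portion(const char *pattern, int cflags,
                          const char *portion, int eflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    rc = regexec(&preg, portion, 0, NULL, eflags);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

<p>
Given the line <tt>"abc"</tt>, passing the portion starting at
<tt>'b'</tt> makes <tt>"^b"</tt> match unless <tt>REG_NOTBOL</tt> is
set, because <tt>regexec()</tt> otherwise treats the start of the
portion as the start of a line.
</p>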
    318 
    319 <p>
    320 If <code>REG_NOSUB</code> was used when compiling <tt><font
    321 class="arg">preg</font></tt>, <tt><font
    322 class="arg">nmatch</font></tt> is zero, or <tt><font
    323 class="arg">pmatch</font></tt> is <code>NULL</code>, then the
    324 <tt><font class="arg">pmatch</font></tt> argument is ignored.
    325 Otherwise, the submatches corresponding to the parenthesized
    326 subexpressions are filled in the elements of <tt><font
    327 class="arg">pmatch</font></tt>, which must be dimensioned to have
    328 at least <tt><font class="arg">nmatch</font></tt> elements.
    329 </p>
    330 
    331 <p>
    332 The <tt><font class="type">regmatch_t</font></tt> structure contains
    333 at least the following fields:
    334 </p>
    335 <blockquote>
    336 <dl>
    337 <dt><tt><font class="type">regoff_t</font> <font
    338 class="arg">rm_so</font></tt></dt>
    339 <dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
    340 substring.  </dd>
    341 <dt><tt><font class="type">regoff_t</font> <font
    342 class="arg">rm_eo</font></tt></dt>
    343 <dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
    344 character after the substring.  </dd>
    345 </dl>
    346 </blockquote>
    347 
    348 <p>
    349 The length of a submatch can be computed by subtracting <code>rm_so</code> from
    350 <code>rm_eo</code>.  If a parenthesized subexpression did not participate in a
    351 match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
    352 corresponding <code>pmatch</code> element are set to <code>-1</code>.  Note
    353 that when a multibyte character set is in effect, the submatch offsets are
    354 given as byte offsets, not character offsets.
    355 </p>
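<p>
As a sketch of how the <tt>pmatch</tt> array is typically consumed, the
hypothetical helper <tt>copy_submatch</tt> below extracts one submatch
into a caller-supplied buffer.  The fixed array size of 10 and the use
of the POSIX <tt>&lt;regex.h&gt;</tt> header in place of
<tt>&lt;tre/regex.h&gt;</tt> are assumptions of the example.
</p>

```c
#include <regex.h>   /* assumption: POSIX header; with TRE use <tre/regex.h> */
#include <string.h>

#define DEMO_NMATCH 10   /* illustrative limit, not a TRE constant */

/* Hypothetical helper: copies submatch `idx` (0 = whole match) of
   `pattern` in `text` into `out`.  Returns the submatch length, or -1
   if there is no match, the subexpression did not participate, or
   `out` is too small. */
static int copy_submatch(const char *pattern, const char *text,
                         size_t idx, char *out, size_t outsz)
{
    regex_t preg;
    regmatch_t pmatch[DEMO_NMATCH];
    int len = -1;

    if (idx >= DEMO_NMATCH || regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, text, DEMO_NMATCH, pmatch, 0) == 0
        && pmatch[idx].rm_so != -1) {
        /* Length is rm_eo - rm_so, measured in bytes. */
        size_t n = (size_t)(pmatch[idx].rm_eo - pmatch[idx].rm_so);
        if (n < outsz) {
            memcpy(out, text + pmatch[idx].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }
    regfree(&preg);
    return len;
}
```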
    356 
    357 <p>
    358 The <code>regexec()</code> functions return zero if a match was found,
    359 otherwise they return <code>REG_NOMATCH</code> to indicate no match,
    360 or <code>REG_ESPACE</code> to indicate that enough temporary memory
    361 could not be allocated to complete the matching operation.
    362 </p>
    363 
    364 
    365 
    366 <h3>reguexec()</h3>
    367 
    368 <div class="code">
    369 <code>
    370 #include &lt;tre/regex.h&gt;
    371 <br>
    372 <br>
    373 <font class="qual">typedef struct</font> {
    374 <br>
    375 &nbsp;&nbsp;<font class="type">int</font> (*get_next_char)(<font
    376 class="type">tre_char_t</font> *<font class="arg">c</font>, <font
    377 class="type">unsigned int</font> *<font class="arg">pos_add</font>,
    378 <font class="type">void</font> *<font class="arg">context</font>);
    379 <br>
    380 &nbsp;&nbsp;<font class="type">void</font> (*rewind)(<font
    381 class="type">size_t</font> <font class="arg">pos</font>, <font
    382 class="type">void</font> *<font class="arg">context</font>);
    383 <br>
    384 &nbsp;&nbsp;<font class="type">int</font> (*compare)(<font
    385 class="type">size_t</font> <font class="arg">pos1</font>, <font
    386 class="type">size_t</font> <font class="arg">pos2</font>, <font
    387 class="type">size_t</font> <font class="arg">len</font>, <font
    388 class="type">void</font> *<font class="arg">context</font>);
    389 <br>
    390 &nbsp;&nbsp;<font class="type">void</font> *<font
    391 class="arg">context</font>;
    392 <br>
    393 } <font class="type">tre_str_source</font>;
    394 <br>
    395 <br>
    396 <font class="type">int</font> <font
    397 class="func">reguexec</font>(<font class="qual">const</font>
    398 <font class="type">regex_t</font> *<font
    399 class="arg">preg</font>, <font class="qual">const</font> <font
    400 class="type">tre_str_source</font> *<font class="arg">string</font>,
    401 <font class="type">size_t</font> <font class="arg">nmatch</font>,
    402 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    403 <font class="type">regmatch_t</font> <font
    404 class="arg">pmatch</font>[], <font class="type">int</font>
    405 <font class="arg">eflags</font>);
    406 </code>
    407 </div>
    408 
    409 <p>
    410 The <tt><font class="func">reguexec</font>()</tt> function works just
    411 like the other <tt>regexec()</tt> functions, except that the input
    412 string is read from user-specified callback functions instead of a
    413 character array.  This makes it possible, for example, to match
    414 regexps over arbitrary user-specified data structures.
    415 </p>
    416 
    417 <p>
    418 The <tt><font class="type">tre_str_source</font></tt> structure
    419 contains the following fields:
    420 </p>
    421 <blockquote>
    422 <dl>
    423 <dt><tt>get_next_char</tt></dt>
    424 <dd>This function must retrieve the next available character.  If a
    425 character is not available, the space pointed to by
    426 <tt><font class="arg">c</font></tt> must be set to zero and it must return
    427 a nonzero value.  If a character is available, it must be stored
    428 to the space pointed to by
    429 <tt><font class="arg">c</font></tt>, and the integer pointed to by
    430 <tt><font class="arg">pos_add</font></tt> must be set to the
    431 number of units advanced in the input (the value must be
    432 <tt>&gt;=1</tt>), and zero must be returned.</dd>
    433 
    434 <dt><tt>rewind</tt></dt>
    435 <dd>This function must rewind the input stream to the position
    436 specified by <tt><font class="arg">pos</font></tt>.  Unless the regexp
    437 uses back references, <tt>rewind</tt> is not needed and can be set to
    438 <tt>NULL</tt>.</dd>
    439 
    440 <dt><tt>compare</tt></dt>
    441 <dd>This function compares two substrings in the input stream
    442 starting at the positions specified by <tt><font
    443 class="arg">pos1</font></tt> and <tt><font
    444 class="arg">pos2</font></tt> of length <tt><font
    445 class="arg">len</font></tt>.  If the substrings are equal,
    446 <tt>compare</tt> must return zero, otherwise a nonzero value must be
    447 returned.  Unless the regexp uses back references, <tt>compare</tt> is
    448 not needed and can be set to <tt>NULL</tt>.</dd>
    449 
    450 <dt><tt>context</tt></dt>
    451 <dd>This is a context variable, passed as the last argument to
    452 all of the above functions for keeping track of the internal state of
    453 the user's code.</dd>
    454 
    455 </dl>
    456 </blockquote>
    457 
    458 <p>
    459 The position in the input stream is measured in <tt><font
    460 class="type">size_t</font></tt> units.  The current position is the
    461 sum of the increments obtained from <tt><font
    462 class="arg">pos_add</font></tt> (plus the position of the last
    463 <tt>rewind</tt>, if any).  The starting position is zero.  Submatch
    464 positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
    465 array are, of course, given using positions computed in this way.
    466 </p>
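<p>
The callback contract and the position accounting described above can
be sketched without linking against TRE.  Everything prefixed with
<tt>demo_</tt> below is hypothetical: <tt>demo_char_t</tt> stands in
for <tt>tre_char_t</tt> (whose real definition depends on how TRE was
built), <tt>demo_buf</tt> is an example context object, and
<tt>demo_drain</tt> is a small driver that plays the role of the
matcher by summing <tt>pos_add</tt>.  In a real program the three
callbacks and the context pointer would be stored in the
<tt>get_next_char</tt>, <tt>rewind</tt>, <tt>compare</tt>, and
<tt>context</tt> fields of a <tt>tre_str_source</tt>.
</p>

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

typedef wchar_t demo_char_t;   /* assumption: stand-in for tre_char_t */

/* Hypothetical context: a byte buffer with a read cursor. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} demo_buf;

/* Stores the next character and advances by one unit; returns nonzero
   and stores zero when the input is exhausted. */
static int demo_get_next_char(demo_char_t *c, unsigned int *pos_add,
                              void *context)
{
    demo_buf *b = context;
    if (b->pos >= b->len) {
        *c = 0;
        return 1;
    }
    *c = (unsigned char)b->data[b->pos++];
    *pos_add = 1;
    return 0;
}

/* Moves the cursor back (or forward) to an absolute position. */
static void demo_rewind(size_t pos, void *context)
{
    ((demo_buf *)context)->pos = pos;
}

/* Returns zero if the two substrings of length `len` are equal. */
static int demo_compare(size_t pos1, size_t pos2, size_t len, void *context)
{
    demo_buf *b = context;
    return memcmp(b->data + pos1, b->data + pos2, len) != 0;
}

/* Driver standing in for the matcher: reads to the end of input and
   returns the number of units advanced, i.e. the sum of pos_add. */
static size_t demo_drain(demo_buf *b)
{
    demo_char_t c;
    unsigned int add;
    size_t advanced = 0;
    while (demo_get_next_char(&c, &add, b) == 0)
        advanced += add;
    return advanced;
}
```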
    467 
    468 <p>
    469 For an example of how to use <tt>reguexec()</tt>, see the
    470 <tt>tests/test-str-source.c</tt> file in the TRE source code
    471 distribution.
    472 </p>
    473 
    474 <h2>The approximate matching functions</h2>
    475 <a name="regaexec"></a>
    476 
    477 <div class="code">
    478 <code>
    479 #include &lt;tre/regex.h&gt;
    480 <br>
    481 <br>
    482 <font class="qual">typedef struct</font> {<br>
    483 &nbsp;&nbsp;<font class="type">int</font>
    484 <font class="arg">cost_ins</font>;<br>
    485 &nbsp;&nbsp;<font class="type">int</font>
    486 <font class="arg">cost_del</font>;<br>
    487 &nbsp;&nbsp;<font class="type">int</font>
    488 <font class="arg">cost_subst</font>;<br>
    489 &nbsp;&nbsp;<font class="type">int</font>
    490 <font class="arg">max_cost</font>;<br><br>
    491 &nbsp;&nbsp;<font class="type">int</font>
    492 <font class="arg">max_ins</font>;<br>
    493 &nbsp;&nbsp;<font class="type">int</font>
    494 <font class="arg">max_del</font>;<br>
    495 &nbsp;&nbsp;<font class="type">int</font>
    496 <font class="arg">max_subst</font>;<br>
    497 &nbsp;&nbsp;<font class="type">int</font>
    498 <font class="arg">max_err</font>;<br>
    499 } <font class="type">regaparams_t</font>;<br>
    500 <br>
    501 <font class="qual">typedef struct</font> {<br>
    502 &nbsp;&nbsp;<font class="type">size_t</font>
    503 <font class="arg">nmatch</font>;<br>
    504 &nbsp;&nbsp;<font class="type">regmatch_t</font>
    505 *<font class="arg">pmatch</font>;<br>
    506 &nbsp;&nbsp;<font class="type">int</font>
    507 <font class="arg">cost</font>;<br>
    508 &nbsp;&nbsp;<font class="type">int</font>
    509 <font class="arg">num_ins</font>;<br>
    510 &nbsp;&nbsp;<font class="type">int</font>
    511 <font class="arg">num_del</font>;<br>
    512 &nbsp;&nbsp;<font class="type">int</font>
    513 <font class="arg">num_subst</font>;<br>
    514 } <font class="type">regamatch_t</font>;<br>
    515 <br>
    516 <font class="type">int</font> <font
    517 class="func">regaexec</font>(<font class="qual">const</font>
    518 <font class="type">regex_t</font> *<font
    519 class="arg">preg</font>, <font class="qual">const</font> <font
    520 class="type">char</font> *<font class="arg">string</font>,<br>
    521 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    522 <font class="type">regamatch_t</font>
    523 *<font class="arg">match</font>,
    524 <font class="type">regaparams_t</font>
    525 <font class="arg">params</font>,
    526 <font class="type">int</font>
    527 <font class="arg">eflags</font>);
    528 <br>
    529 <font class="type">int</font> <font
    530 class="func">reganexec</font>(<font class="qual">const</font>
    531 <font class="type">regex_t</font> *<font
    532 class="arg">preg</font>, <font class="qual">const</font> <font
    533 class="type">char</font> *<font class="arg">string</font>,
    534 <font class="type">size_t</font> <font class="arg">len</font>,<br>
    535 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    536 <font class="type">regamatch_t</font>
    537 *<font class="arg">match</font>,
    538 <font class="type">regaparams_t</font>
    539 <font class="arg">params</font>,
    540 <font class="type">int</font> <font class="arg">eflags</font>);
    541 <br>
    542 <font class="type">int</font> <font
    543 class="func">regawexec</font>(<font class="qual">const</font>
    544 <font class="type">regex_t</font> *<font
    545 class="arg">preg</font>, <font class="qual">const</font> <font
    546 class="type">wchar_t</font> *<font class="arg">string</font>,<br>
    547 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    548 <font class="type">regamatch_t</font>
    549 *<font class="arg">match</font>,
    550 <font class="type">regaparams_t</font>
    551 <font class="arg">params</font>,
    552 <font class="type">int</font>
    553 <font class="arg">eflags</font>);
    554 <br>
    555 <font class="type">int</font>
    556 <font class="func">regawnexec</font>(
    557 <font class="qual">const</font>
    558 <font class="type">regex_t</font>
    559 *<font class="arg">preg</font>,
    560 <font class="qual">const</font>
    561 <font class="type">wchar_t</font>
    562 *<font class="arg">string</font>,
    563 <font class="type">size_t</font>
    564 <font class="arg">len</font>,<br>
    565 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
    566 <font class="type">regamatch_t</font>
    567 *<font class="arg">match</font>,
    568 <font class="type">regaparams_t</font>
    569 <font class="arg">params</font>,
    570 <font class="type">int</font>
    571 <font class="arg">eflags</font>);
    572 <br>
    573 </code>
    574 </div>
    575 
    576 <p>
    577 The <tt><font class="func">regaexec</font>()</tt> function searches for
    578 the best match in <tt><font class="arg">string</font></tt>
    579 against the compiled regexp <tt><font
    580 class="arg">preg</font></tt>, initialized by a previous call to
    581 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
    582 </p>
    583 
    584 <p>
    585 The <tt><font class="func">reganexec</font>()</tt> function is like
    586 <tt><font class="func">regaexec</font>()</tt>, but <tt><font
    587 class="arg">string</font></tt> is not terminated by a null byte.
    588 Instead, the <tt><font class="arg">len</font></tt> argument is used to
    589 tell the length of the string, and the string may contain null
    590 bytes. The <tt><font class="func">regawexec</font>()</tt> and
    591 <tt><font class="func">regawnexec</font>()</tt> functions work like
    592 <tt><font class="func">regaexec</font>()</tt> and <tt><font
    593 class="func">reganexec</font>()</tt>, respectively, but take a wide
    594 character (<tt><font class="type">wchar_t</font></tt>) string instead
    595 of a byte string.
    596 </p>
    597 
    598 <p>
    599 The <tt><font class="arg">eflags</font></tt> argument has the same meaning as for
    600 the <tt>regexec()</tt> functions.
    601 </p>
    602 
    603 <p>
    604 The <tt><font class="arg">params</font></tt> struct controls the
    605 approximate matching parameters:</p>
    606 <blockquote>
    607 <dl>
    608   <dt><tt><font class="type">int</font></tt>
    609       <tt><font class="arg">cost_ins</font></tt></dt>
    610   <dd>The default cost of an inserted character, that is, an extra
    611       character in <tt><font class="arg">string</font></tt>.</dd>
    612 
    613   <dt><tt><font class="type">int</font></tt>
    614       <tt><font class="arg">cost_del</font></tt></dt>
    615   <dd>The default cost of a deleted character, that is, a character
    616       missing from <tt><font class="arg">string</font></tt>.</dd>
    617 
    618   <dt><tt><font class="type">int</font></tt>
    619       <tt><font class="arg">cost_subst</font></tt></dt>
    620   <dd>The default cost of a substituted character.</dd>
    621 
    622   <dt><tt><font class="type">int</font></tt>
    623       <tt><font class="arg">max_cost</font></tt></dt>
    624   <dd>The maximum allowed cost of a match.  If this is set to zero,
    625       an exact match is searched for, and results equivalent to
    626       those returned by the <tt>regexec()</tt> functions are
    627       returned.</dd>
    628 
    629   <dt><tt><font class="type">int</font></tt>
    630       <tt><font class="arg">max_ins</font></tt></dt>
    631   <dd>Maximum allowed number of inserted characters.</dd>
    632 
    633   <dt><tt><font class="type">int</font></tt>
    634       <tt><font class="arg">max_del</font></tt></dt>
    635   <dd>Maximum allowed number of deleted characters.</dd>
    636 
    637   <dt><tt><font class="type">int</font></tt>
    638       <tt><font class="arg">max_subst</font></tt></dt>
    639   <dd>Maximum allowed number of substituted characters.</dd>
    640 
    641   <dt><tt><font class="type">int</font></tt>
    642       <tt><font class="arg">max_err</font></tt></dt>
    643   <dd>Maximum allowed number of errors (inserts + deletes +
    644       substitutes).</dd>
    645 </dl>
    646 </blockquote>
    647 
    648 <p>
    649 The <tt><font class="arg">match</font></tt> argument points to a
    650 <tt><font class="type">regamatch_t</font></tt> structure.  The
    651 <tt><font class="arg">nmatch</font></tt> and <tt><font
    652 class="arg">pmatch</font></tt> fields must be filled by the caller.  If
    653 <code>REG_NOSUB</code> was used when compiling the regexp, or
    654 <code>match-&gt;nmatch</code> is zero, or
    655 <code>match-&gt;pmatch</code> is <code>NULL</code>, the
    656 <code>match-&gt;pmatch</code> argument is ignored.  Otherwise, the
    657 submatches corresponding to the parenthesized subexpressions are
    658 filled in the elements of <code>match-&gt;pmatch</code>, which must be
    659 dimensioned to have at least <code>match-&gt;nmatch</code> elements.
    660 The <code>match-&gt;cost</code> field is set to the cost of the match
    661 found, and the <code>match-&gt;num_ins</code>,
    662 <code>match-&gt;num_del</code>, and <code>match-&gt;num_subst</code>
    663 fields are set to the number of inserts, deletes, and substitutes in
    664 the match, respectively.
    665 </p>
    666 
    667 <p>
    668 The <tt>regaexec()</tt> functions return zero if a match with cost
    669 no greater than <code>params.max_cost</code> was found, otherwise
    670 they return <code>REG_NOMATCH</code> to indicate no match, or
    671 <code>REG_ESPACE</code> to indicate that enough temporary memory could
    672 not be allocated to complete the matching operation.
    673 </p>
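<p>
To make the interaction of the cost and limit parameters concrete, the
sketch below checks a candidate set of edits against a parameter set.
It does not call TRE: the <tt>demo_</tt> types only mirror the relevant
fields of <tt>regaparams_t</tt> and <tt>regamatch_t</tt>, and the
sketch assumes that the total cost is the sum of each edit count
multiplied by its default cost, which this section implies but does not
state explicitly.
</p>

```c
/* Hypothetical mirror of the regaparams_t fields described above. */
typedef struct {
    int cost_ins, cost_del, cost_subst, max_cost;
    int max_ins, max_del, max_subst, max_err;
} demo_params;

/* Hypothetical mirror of the edit counters in regamatch_t. */
typedef struct {
    int num_ins, num_del, num_subst;
} demo_edits;

/* Assumed cost model: weighted sum of the three kinds of edits. */
static int demo_cost(const demo_params *p, const demo_edits *e)
{
    return p->cost_ins * e->num_ins
         + p->cost_del * e->num_del
         + p->cost_subst * e->num_subst;
}

/* Returns 1 if the edits respect every per-kind limit, the total
   error limit, and the maximum cost; 0 otherwise. */
static int demo_within_limits(const demo_params *p, const demo_edits *e)
{
    int errs = e->num_ins + e->num_del + e->num_subst;
    return e->num_ins <= p->max_ins
        && e->num_del <= p->max_del
        && e->num_subst <= p->max_subst
        && errs <= p->max_err
        && demo_cost(p, e) <= p->max_cost;
}
```

<p>
With all costs set to 1 and <tt>max_cost</tt> set to 2, one insertion
plus one substitution is accepted, while one edit of each kind (cost 3)
is rejected.  Setting <tt>max_cost</tt> to zero admits only candidates
with no edits, which corresponds to exact matching.
</p>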
    674 
    675 <h2>Miscellaneous</h2>
    676 
    677 <div class="code">
    678 <code>
    679 #include &lt;tre/regex.h&gt;
    680 <br>
    681 <br>
    682 <font class="type">int</font> <font
    683 class="func">tre_have_backrefs</font>(<font class="qual">const</font>
    684 <font class="type">regex_t</font> *<font class="arg">preg</font>);
    685 <br>
    686 <font class="type">int</font> <font
    687 class="func">tre_have_approx</font>(<font class="qual">const</font>
    688 <font class="type">regex_t</font> *<font class="arg">preg</font>);
    689 <br>
    690 </code>
    691 </div>
    692 
    693 <p>
    694 The <tt><font class="func">tre_have_backrefs</font>()</tt> and
    695 <tt><font class="func">tre_have_approx</font>()</tt> functions return
    696 1 if the compiled pattern has back references or uses approximate
    697 matching, respectively, and 0 if not.
    698 </p>
    699 
    700 
    701 <h2>Checking build time options</h2>
    702 
    703 <a name="tre_config"></a>
    704 <div class="code">
    705 <code>
    706 #include &lt;tre/regex.h&gt;
    707 <br>
    708 <br>
    709 <font class="type">char</font> *<font
    710 class="func">tre_version</font>(<font class="type">void</font>);
    711 <br>
    712 <font class="type">int</font> <font
    713 class="func">tre_config</font>(<font class="type">int</font> <font
    714 class="arg">query</font>, <font class="type">void</font> *<font
    715 class="arg">result</font>);
    716 <br>
    717 </code>
    718 </div>
    719 
    720 <p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to find out which optional features have been compiled into the
TRE library, and to retrieve other parameters that may change between
releases.
    725 </p>
    726 
    727 <p>
The <tt><font class="arg">query</font></tt> argument is an integer
specifying which piece of information is requested.  The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned.  The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, and
<code>REG_NOMATCH</code> otherwise.
    734 </p>
    735 
    736 <p>
    737 The following values are recognized for <tt><font
    738 class="arg">query</font></tt>:
    739 
    740 <blockquote>
    741 <dl>
    742 <dt><tt>TRE_CONFIG_APPROX</tt></dt>
    743 <dd>The result is an integer that is set to one if approximate
    744 matching support is available, zero if not.</dd>
    745 <dt><tt>TRE_CONFIG_WCHAR</tt></dt>
    746 <dd>The result is an integer that is set to one if wide character
    747 support is available, zero if not.</dd>
    748 <dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
    749 <dd>The result is an integer that is set to one if multibyte character
    750 set support is available, zero if not.</dd>
    751 <dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
    752 <dd>The result is an integer that is set to one if TRE has been
    753 compiled to be compatible with the system regex ABI, zero if not.</dd>
    754 <dt><tt>TRE_CONFIG_VERSION</tt></dt>
    755 <dd>The result is a pointer to a static character string that gives
    756 the version of the TRE library.</dd>
    757 </dl>
    758 </blockquote>
    759 
    760 
    761 <p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human-readable character string that gives the software name,
version, and license.
</p>
    765 
    766 <h2>Preprocessor definitions</h2>
    767 
<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines the
C preprocessor symbols described below.</p>
    770 
    771 <h3>Version information</h3>
    772 
<p>The following definitions may be useful for checking whether a new
enough version is being used.  In Autoconf scripts, the
<tt>pkg-config</tt> tool is the recommended way to perform version and
other checks.</p>
    777 
    778 <blockquote>
    779 <dl>
    780 <dt><tt>TRE_VERSION</tt></dt>
    781 <dd>The version string. </dd>
    782 
    783 <dt><tt>TRE_VERSION_1</tt></dt>
    784 <dd>The major version number (first part of version string).</dd>
    785 
    786 <dt><tt>TRE_VERSION_2</tt></dt>
    787 <dd>The minor version number (second part of version string).</dd>
    788 
    789 <dt><tt>TRE_VERSION_3</tt></dt>
    790 <dd>The micro version number (third part of version string).</dd>
    791 
    792 </dl>
    793 </blockquote>
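<p>
One possible way to use these definitions is a comparison macro such
as the one sketched below.  The macro name
<tt>VERSION_AT_LEAST</tt> and the required version 0.8.0 in the usage
comment are hypothetical, and the sketch assumes that the three
numeric symbols expand to plain integer constants.
</p>

```c
/* Hypothetical macro: true if version (maj, min, mic) is at least
   (want_maj, want_min, want_mic), compared part by part. */
#define VERSION_AT_LEAST(maj, min, mic, want_maj, want_min, want_mic) \
    ((maj) > (want_maj) ||                                            \
     ((maj) == (want_maj) &&                                          \
      ((min) > (want_min) ||                                          \
       ((min) == (want_min) && (mic) >= (want_mic)))))

/* Possible use after including <tre/regex.h>, assuming that the
   TRE_VERSION_n symbols are integer constants:

   #if !VERSION_AT_LEAST(TRE_VERSION_1, TRE_VERSION_2, TRE_VERSION_3, 0, 8, 0)
   #error "a newer TRE is required"
   #endif
 */
```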
    794 
    795 <h3>Features</h3>
    796 
<p>The following definitions may be useful for checking whether all
necessary features are enabled.  Use these only if compile-time
checking suffices (that is, when linking statically with TRE).  When
linking dynamically, <a href="#tre_config"><tt>tre_config()</tt></a>
should be used instead.</p>
    802 
    803 <blockquote>
    804 <dl>
    805 <dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled.  The
prototypes for approximate matching functions are declared only if
<tt>TRE_APPROX</tt> is defined.</dd>
    809 
    810 <dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide character support is enabled.  The
prototypes for wide character matching functions are declared only if
<tt>TRE_WCHAR</tt> is defined.</dd>
    814 
    815 <dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If it is not defined, any locale settings are ignored, and the default
locale is used when parsing regexps and matching strings.</dd>
    819 
    820 </dl>
    821 </blockquote>
    822