<h1>TRE API reference manual</h1>

<h2>The <tt>regcomp()</tt> functions</h2>
<a name="regcomp"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font>
<font class="func">regcomp</font>(<font
class="type">regex_t</font> *<font class="arg">preg</font>,
<font class="qual">const</font> <font class="type">char</font>
*<font class="arg">regex</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">char</font> *<font class="arg">regex</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<font class="type">int</font> <font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwcomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">int</font> <font
class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">size_t</font>
<font class="arg">len</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">void</font> <font
class="func">regfree</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regcomp</font>()</tt> function compiles
the regex string pointed to by <tt><font
class="arg">regex</font></tt> to an internal representation and
stores the result in the pattern buffer structure
pointed to by
<tt><font class="arg">preg</font></tt>. The <tt><font
class="func">regncomp</font>()</tt> function is like <tt><font
class="func">regcomp</font>()</tt>, but <tt><font
class="arg">regex</font></tt> is not terminated with a null
byte. Instead, the <tt><font class="arg">len</font></tt> argument
is used to give the length of the string, and the string may contain
null bytes. The <tt><font class="func">regwcomp</font>()</tt> and
<tt><font class="func">regwncomp</font>()</tt> functions work like
<tt><font class="func">regcomp</font>()</tt> and <tt><font
class="func">regncomp</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string.
</p>

<p>
The <tt><font class="arg">cflags</font></tt> argument is the
bitwise inclusive OR of zero or more of the following flags (defined
in the header <tt>&lt;tre/regex.h&gt;</tt>):
</p>

<blockquote>
<dl>
<dt><tt>REG_EXTENDED</tt></dt>
<dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
compiling <tt><font class="arg">regex</font></tt>. The default
syntax is the POSIX Basic Regular Expression (BRE) syntax, which is
considered obsolete.</dd>

<dt><tt>REG_ICASE</tt></dt>
<dd>Ignore case. Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions using this
pattern buffer will be case insensitive.</dd>

<dt><tt>REG_NOSUB</tt></dt>
<dd>Do not report submatches. Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions will only
report whether a match was found or not and will not fill the submatch
array.</dd>

<dt><tt>REG_NEWLINE</tt></dt>
<dd>Normally the newline character is treated as an ordinary
character.
When this flag is used, the newline character
(<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
<ol>
<li>The match-any-character operator (dot <tt>"."</tt> outside a
bracket expression) does not match a newline.</li>
<li>A non-matching list (<tt>[^...]</tt>) not containing a newline
does not match a newline.</li>
<li>The match-beginning-of-line operator <tt>^</tt> matches the empty
string immediately after a newline as well as the empty string at the
beginning of the string (but see the <code>REG_NOTBOL</code>
<code>regexec()</code> flag below).</li>
<li>The match-end-of-line operator <tt>$</tt> matches the empty
string immediately before a newline as well as the empty string at the
end of the string (but see the <code>REG_NOTEOL</code>
<code>regexec()</code> flag below).</li>
</ol>
</dd>

<dt><tt>REG_LITERAL</tt></dt>
<dd>Interpret the entire <tt><font class="arg">regex</font></tt>
argument as a literal string, that is, all characters will be
considered ordinary. This is a nonstandard extension, compatible with
but not specified by POSIX.</dd>

<dt><tt>REG_NOSPEC</tt></dt>
<dd>Same as <tt>REG_LITERAL</tt>. This flag is provided for
compatibility with BSD.</dd>

<dt><tt>REG_RIGHT_ASSOC</tt></dt>
<dd>By default, concatenation is left associative in TRE, as per
the grammar given in the <a
href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
This flag flips associativity of concatenation to right associative.
Associativity can have an effect on how a match is divided into
submatches, but does not change what is matched by the entire regexp.
</dd>

<dt><tt>REG_UNGREEDY</tt></dt>
<dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
</dl>
</blockquote>

<p>
After a successful call to <tt><font class="func">regcomp</font></tt> it is
possible to use the <tt><font class="arg">preg</font></tt> pattern buffer for
searching for matches in strings (see below). Once the pattern buffer is no
longer needed, it should be passed to <tt><font
class="func">regfree</font></tt> to release the memory allocated for it.
</p>


<p>
The <tt><font class="type">regex_t</font></tt> structure has the
following field that the application can read:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">size_t</font> <font
class="arg">re_nsub</font></tt></dt>
<dd>Number of parenthesized subexpressions in <tt><font
class="arg">regex</font></tt>.
</dd>
</dl>
</blockquote>

<p>
The <tt><font class="func">regcomp</font></tt> function returns
zero if the compilation was successful, or one of the following error
codes if there was an error:
</p>
<blockquote>
<dl>
<dt><tt>REG_BADPAT</tt></dt>
<dd>Invalid regexp. TRE returns this only if a multibyte character
set is used in the current locale, and <tt><font
class="arg">regex</font></tt> contained an invalid multibyte
sequence.</dd>
<dt><tt>REG_ECOLLATE</tt></dt>
<dd>Invalid collating element referenced.
TRE returns this whenever
equivalence classes or multicharacter collating elements are used in
bracket expressions (they are not supported yet).</dd>
<dt><tt>REG_ECTYPE</tt></dt>
<dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
<dt><tt>REG_EESCAPE</tt></dt>
<dd>The last character of <tt><font class="arg">regex</font></tt>
was a backslash (<tt>\</tt>).</dd>
<dt><tt>REG_ESUBREG</tt></dt>
<dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
invalid.</dd>
<dt><tt>REG_EBRACK</tt></dt>
<dd><tt>[]</tt> imbalance.</dd>
<dt><tt>REG_EPAREN</tt></dt>
<dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
<dt><tt>REG_EBRACE</tt></dt>
<dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
<dt><tt>REG_BADBR</tt></dt>
<dd><tt>{}</tt> content invalid: not a number, more than two numbers,
first larger than second, or number too large.</dd>
<dt><tt>REG_ERANGE</tt></dt>
<dd>Invalid character range, e.g. ending point is earlier in the
collating order than the starting point.</dd>
<dt><tt>REG_ESPACE</tt></dt>
<dd>Out of memory, or an internal limit exceeded.</dd>
<dt><tt>REG_BADRPT</tt></dt>
<dd>Invalid use of repetition operators: two or more repetition operators have
been chained in an undefined way.</dd>
</dl>
</blockquote>


<h2>The <tt>regexec()</tt> functions</h2>
<a name="regexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">regexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font
class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>

<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
</code>
</div>

<p>
The <tt><font class="func">regexec</font>()</tt> function matches
the null-terminated string against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a
href="#regcomp"><tt>regcomp</tt></a> functions. The
<tt><font class="func">regnexec</font>()</tt> function is like
<tt><font class="func">regexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated with a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used
to give the length of the string, and the string may contain null
bytes. The <tt><font class="func">regwexec</font>()</tt> and
<tt><font class="func">regwnexec</font>()</tt> functions work like
<tt><font class="func">regexec</font>()</tt> and <tt><font
class="func">regnexec</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string. The <tt><font
class="arg">eflags</font></tt> argument is a bitwise OR of zero or
more of the following flags:
</p>
<blockquote>
<dl>
<dt><code>REG_NOTBOL</code></dt>
<dd>
<p>
When this flag is used, the match-beginning-of-line operator
<tt>^</tt> does not match the empty string at the beginning of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately after a newline character will still be matched.
</p>
</dd>

<dt><code>REG_NOTEOL</code></dt>
<dd>
<p>
When this flag is used, the match-end-of-line operator
<tt>$</tt> does not match the empty string at the end of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately before a newline character will still be matched.
</p>
</dd>

</dl>

<p>
These flags are useful when different portions of a string are passed
to <code>regexec</code> and the beginning or end of the partial string
should not be interpreted as the beginning or end of a line.
</p>

</blockquote>

<p>
If <code>REG_NOSUB</code> was used when compiling <tt><font
class="arg">preg</font></tt>, <tt><font
class="arg">nmatch</font></tt> is zero, or <tt><font
class="arg">pmatch</font></tt> is <code>NULL</code>, then the
<tt><font class="arg">pmatch</font></tt> argument is ignored.
Otherwise, the submatches corresponding to the parenthesized
subexpressions are stored in the elements of <tt><font
class="arg">pmatch</font></tt>, which must be dimensioned to have
at least <tt><font class="arg">nmatch</font></tt> elements.
</p>

<p>
The <tt><font class="type">regmatch_t</font></tt> structure contains
at least the following fields:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_so</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
substring.</dd>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_eo</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
character after the substring.</dd>
</dl>
</blockquote>

<p>
The length of a submatch can be computed by subtracting <code>rm_so</code> from
<code>rm_eo</code>. If a parenthesized subexpression did not participate in a
match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
corresponding <code>pmatch</code> element are set to <code>-1</code>. Note
that when a multibyte character set is in effect, the submatch offsets are
given as byte offsets, not character offsets.
</p>

<p>
The <code>regexec()</code> functions return zero if a match was found,
otherwise they return <code>REG_NOMATCH</code> to indicate no match,
or <code>REG_ESPACE</code> to indicate that enough temporary memory
could not be allocated to complete the matching operation.
</p>



<h3>reguexec()</h3>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {
<br>
<font class="type">int</font> (*get_next_char)(<font
class="type">tre_char_t</font> *<font class="arg">c</font>, <font
class="type">unsigned int</font> *<font class="arg">pos_add</font>,
<font class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> (*rewind)(<font
class="type">size_t</font> <font class="arg">pos</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">int</font> (*compare)(<font
class="type">size_t</font> <font class="arg">pos1</font>, <font
class="type">size_t</font> <font class="arg">pos2</font>, <font
class="type">size_t</font> <font class="arg">len</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> *<font
class="arg">context</font>;
<br>
} <font class="type">tre_str_source</font>;
<br>
<br>
<font class="type">int</font> <font
class="func">reguexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">tre_str_source</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
</code>
</div>

<p>
The
<tt><font class="func">reguexec</font>()</tt> function works just
like the other <tt>regexec()</tt> functions, except that the input
string is read from user-specified callback functions instead of a
character array. This makes it possible, for example, to match
regexps over arbitrary user-specified data structures.
</p>

<p>
The <tt><font class="type">tre_str_source</font></tt> structure
contains the following fields:
</p>
<blockquote>
<dl>
<dt><tt>get_next_char</tt></dt>
<dd>This function must retrieve the next available character. If a
character is not available, the space pointed to by
<tt><font class="arg">c</font></tt> must be set to zero and it must return
a nonzero value. If a character is available, it must be stored
to the space pointed to by
<tt><font class="arg">c</font></tt>, and the integer pointed to by
<tt><font class="arg">pos_add</font></tt> must be set to the
number of units advanced in the input (the value must be
<tt>&gt;=1</tt>), and zero must be returned.</dd>

<dt><tt>rewind</tt></dt>
<dd>This function must rewind the input stream to the position
specified by <tt><font class="arg">pos</font></tt>. Unless the regexp
uses back references, <tt>rewind</tt> is not needed and can be set to
<tt>NULL</tt>.</dd>

<dt><tt>compare</tt></dt>
<dd>This function compares two substrings in the input stream
starting at the positions specified by <tt><font
class="arg">pos1</font></tt> and <tt><font
class="arg">pos2</font></tt> of length <tt><font
class="arg">len</font></tt>. If the substrings are equal,
<tt>compare</tt> must return zero, otherwise a nonzero value must be
returned.
Unless the regexp uses back references, <tt>compare</tt> is
not needed and can be set to <tt>NULL</tt>.</dd>

<dt><tt>context</tt></dt>
<dd>This is a context variable, passed as the last argument to
all of the above functions for keeping track of the internal state of
the user's code.</dd>

</dl>
</blockquote>

<p>
The position in the input stream is measured in <tt><font
class="type">size_t</font></tt> units. The current position is the
sum of the increments reported through <tt><font
class="arg">pos_add</font></tt> (plus the position of the last
<tt>rewind</tt>, if any). The starting position is zero. Submatch
positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
array are given using positions computed in this way.
</p>

<p>
For an example of how to use <tt>reguexec()</tt>, see the
<tt>tests/test-str-source.c</tt> file in the TRE source code
distribution.
</p>

<h2>The approximate matching functions</h2>
<a name="regaexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">int</font>
<font class="arg">cost_ins</font>;<br>
<font class="type">int</font>
<font class="arg">cost_del</font>;<br>
<font class="type">int</font>
<font class="arg">cost_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_cost</font>;<br><br>
<font class="type">int</font>
<font class="arg">max_ins</font>;<br>
<font class="type">int</font>
<font class="arg">max_del</font>;<br>
<font class="type">int</font>
<font class="arg">max_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_err</font>;<br>
} <font class="type">regaparams_t</font>;<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">size_t</font>
<font
class="arg">nmatch</font>;<br>
<font class="type">regmatch_t</font>
*<font class="arg">pmatch</font>;<br>
<font class="type">int</font>
<font class="arg">cost</font>;<br>
<font class="type">int</font>
<font class="arg">num_ins</font>;<br>
<font class="type">int</font>
<font class="arg">num_del</font>;<br>
<font class="type">int</font>
<font class="arg">num_subst</font>;<br>
} <font class="type">regamatch_t</font>;<br>
<br>
<font class="type">int</font> <font
class="func">regaexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,<br>

<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">reganexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,<br>

<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regawexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,<br>

<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font
class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font>
<font class="func">regawnexec</font>(
<font class="qual">const</font>
<font class="type">regex_t</font>
*<font class="arg">preg</font>,
<font class="qual">const</font>
<font class="type">wchar_t</font>
*<font class="arg">string</font>,
<font class="type">size_t</font>
<font class="arg">len</font>,<br>

<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regaexec</font>()</tt> function searches for
the best match in <tt><font class="arg">string</font></tt>
against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
</p>

<p>
The <tt><font class="func">reganexec</font>()</tt> function is like
<tt><font class="func">regaexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated by a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used to
give the length of the string, and the string may contain null
bytes. The <tt><font class="func">regawexec</font>()</tt> and
<tt><font class="func">regawnexec</font>()</tt> functions work like
<tt><font class="func">regaexec</font>()</tt> and <tt><font
class="func">reganexec</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string instead
of a byte string.
</p>

<p>
The <tt><font class="arg">eflags</font></tt> argument has the same
meaning as for the <tt>regexec()</tt> functions.
</p>

<p>
The <tt><font class="arg">params</font></tt> struct controls the
approximate matching parameters:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_ins</font></tt></dt>
<dd>The default cost of an inserted character, that is, an extra
character in <tt><font class="arg">string</font></tt>.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_del</font></tt></dt>
<dd>The default cost of a deleted character, that is, a character
missing from <tt><font class="arg">string</font></tt>.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_subst</font></tt></dt>
<dd>The default cost of a substituted character.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_cost</font></tt></dt>
<dd>The maximum allowed cost of a match. If this is set to zero,
an exact match is searched for, and results equivalent to
those returned by the <tt>regexec()</tt> functions are
returned.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_ins</font></tt></dt>
<dd>Maximum allowed number of inserted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_del</font></tt></dt>
<dd>Maximum allowed number of deleted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_subst</font></tt></dt>
<dd>Maximum allowed number of substituted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_err</font></tt></dt>
<dd>Maximum allowed number of errors (inserts + deletes +
substitutes).</dd>
</dl>
</blockquote>

<p>
The <tt><font class="arg">match</font></tt> argument points to a
<tt><font class="type">regamatch_t</font></tt> structure.
The
<tt><font class="arg">nmatch</font></tt> and <tt><font
class="arg">pmatch</font></tt> fields must be filled by the caller. If
<code>REG_NOSUB</code> was used when compiling the regexp, or
<code>match->nmatch</code> is zero, or
<code>match->pmatch</code> is <code>NULL</code>, the
<code>match->pmatch</code> field is ignored. Otherwise, the
submatches corresponding to the parenthesized subexpressions are
stored in the elements of <code>match->pmatch</code>, which must be
dimensioned to have at least <code>match->nmatch</code> elements.
The <code>match->cost</code> field is set to the cost of the match
found, and the <code>match->num_ins</code>,
<code>match->num_del</code>, and <code>match->num_subst</code>
fields are set to the number of inserts, deletes, and substitutes in
the match, respectively.
</p>

<p>
The <tt>regaexec()</tt> functions return zero if a match with a cost
not exceeding <code>params.max_cost</code> was found, otherwise
they return <code>REG_NOMATCH</code> to indicate no match, or
<code>REG_ESPACE</code> to indicate that enough temporary memory could
not be allocated to complete the matching operation.
</p>

<h2>Miscellaneous</h2>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">tre_have_backrefs</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
<font class="type">int</font> <font
class="func">tre_have_approx</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_have_backrefs</font>()</tt> and
<tt><font class="func">tre_have_approx</font>()</tt> functions return
1 if the compiled pattern has back references or uses approximate
matching, respectively, and 0 if not.
</p>


<h2>Checking build time options</h2>

<a name="tre_config"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">char</font> *<font
class="func">tre_version</font>(<font class="type">void</font>);
<br>
<font class="type">int</font> <font
class="func">tre_config</font>(<font class="type">int</font> <font
class="arg">query</font>, <font class="type">void</font> *<font
class="arg">result</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to find out which optional features have been
compiled into the TRE library, and to retrieve other parameters that
may change between releases.
</p>

<p>
The <tt><font class="arg">query</font></tt> argument is an integer
that selects which information is requested. The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned.
The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, <tt>REG_NOMATCH</tt> otherwise.
</p>

<p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>

<blockquote>
<dl>
<dt><tt>TRE_CONFIG_APPROX</tt></dt>
<dd>The result is an integer that is set to one if approximate
matching support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_WCHAR</tt></dt>
<dd>The result is an integer that is set to one if wide character
support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
<dd>The result is an integer that is set to one if multibyte character
set support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
<dd>The result is an integer that is set to one if TRE has been
compiled to be compatible with the system regex ABI, zero if not.</dd>
<dt><tt>TRE_CONFIG_VERSION</tt></dt>
<dd>The result is a pointer to a static character string that gives
the version of the TRE library.</dd>
</dl>
</blockquote>


<p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human-readable character string which shows the software name,
version, and license.
</p>

<h2>Preprocessor definitions</h2>

<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>

<h3>Version information</h3>

<p>The following definitions may be useful for checking whether a new
enough version is being used. Note that it is recommended to use the
<tt>pkg-config</tt> tool for version and other checks in Autoconf
scripts.</p>

<blockquote>
<dl>
<dt><tt>TRE_VERSION</tt></dt>
<dd>The version string.
</dd>

<dt><tt>TRE_VERSION_1</tt></dt>
<dd>The major version number (first part of version string).</dd>

<dt><tt>TRE_VERSION_2</tt></dt>
<dd>The minor version number (second part of version string).</dd>

<dt><tt>TRE_VERSION_3</tt></dt>
<dd>The micro version number (third part of version string).</dd>

</dl>
</blockquote>

<h3>Features</h3>

<p>The following definitions may be useful for checking whether all
necessary features are enabled. Use these only if compile time
checking suffices (linking statically with TRE). When linking
dynamically, <a href="#tre_config"><tt>tre_config()</tt></a> should be used
instead.</p>

<blockquote>
<dl>
<dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled. The
prototypes for approximate matching functions are defined only if
<tt>TRE_APPROX</tt> is defined.</dd>

<dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide character support is enabled. The
prototypes for wide character matching functions are defined only if
<tt>TRE_WCHAR</tt> is defined.</dd>

<dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not defined, any locale settings are ignored, and the default
locale is used when parsing regexps and matching strings.</dd>

</dl>
</blockquote>
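<p>The compile-time checks described above can be sketched roughly as
follows. This is only an illustration and not part of TRE: the
<tt>USE_TRE</tt> build switch, the <tt>FEAT_*</tt> bit values, and the
two function names are hypothetical and exist only in this sketch. Only
<tt>TRE_APPROX</tt>, <tt>TRE_WCHAR</tt>, and <tt>TRE_MULTIBYTE</tt>
come from the header. When <tt>USE_TRE</tt> is not defined the header
is not included, so no feature is reported.</p>

```c
#include <assert.h>
#include <stdio.h>

/* USE_TRE is a hypothetical build switch for this sketch (not a TRE macro).
   Define it, e.g. with -DUSE_TRE, when the TRE headers are installed. */
#ifdef USE_TRE
#include <tre/regex.h>
#endif

/* Bit values local to this sketch; TRE itself does not define them. */
enum { FEAT_APPROX = 1, FEAT_WCHAR = 2, FEAT_MULTIBYTE = 4 };

/* Collect the feature macros visible at compile time into a bitmask. */
unsigned compile_time_features(void)
{
    unsigned mask = 0;
#ifdef TRE_APPROX
    mask |= FEAT_APPROX;
#endif
#ifdef TRE_WCHAR
    mask |= FEAT_WCHAR;
#endif
#ifdef TRE_MULTIBYTE
    mask |= FEAT_MULTIBYTE;
#endif
    return mask;
}

/* Format a mask as e.g. "+approx -wchar +multibyte" ('+' means enabled).
   Returns the snprintf() result, i.e. the length of the full text. */
int describe_features(unsigned mask, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%capprox %cwchar %cmultibyte",
                    (mask & FEAT_APPROX) ? '+' : '-',
                    (mask & FEAT_WCHAR) ? '+' : '-',
                    (mask & FEAT_MULTIBYTE) ? '+' : '-');
}
```

<p>A caller might pass the result of <tt>compile_time_features()</tt>
to <tt>describe_features()</tt> and print the buffer. Because the macros
are fixed when the program is compiled, this approach only reflects the
header that was used for the build; a dynamically linked program would
query <a href="#tre_config"><tt>tre_config()</tt></a> at run time
instead.</p>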