<h1>TRE API reference manual</h1>
2
3 <h2>The <tt>regcomp()</tt> functions</h2>
4 <a name="regcomp"></a>
5
6 <div class="code">
7 <code>
#include &lt;tre/regex.h&gt;
9 <br>
10 <br>
11 <font class="type">int</font>
12 <font class="func">regcomp</font>(<font
13 class="type">regex_t</font> *<font class="arg">preg</font>,
14 <font class="qual">const</font> <font class="type">char</font>
15 *<font class="arg">regex</font>, <font class="type">int</font>
16 <font class="arg">cflags</font>);
17 <br>
18 <font class="type">int</font> <font
19 class="func">regncomp</font>(<font class="type">regex_t</font>
20 *<font class="arg">preg</font>, <font class="qual">const</font>
21 <font class="type">char</font> *<font class="arg">regex</font>,
22 <font class="type">size_t</font> <font class="arg">len</font>,
23 <font class="type">int</font> <font class="arg">cflags</font>);
24 <br>
25 <font class="type">int</font> <font
26 class="func">regwcomp</font>(<font class="type">regex_t</font>
27 *<font class="arg">preg</font>, <font class="qual">const</font>
28 <font class="type">wchar_t</font> *<font
29 class="arg">regex</font>, <font class="type">int</font> <font
30 class="arg">cflags</font>);
31 <br>
32 <font class="type">int</font> <font
33 class="func">regwncomp</font>(<font class="type">regex_t</font>
34 *<font class="arg">preg</font>, <font class="qual">const</font>
35 <font class="type">wchar_t</font> *<font
36 class="arg">regex</font>, <font class="type">size_t</font>
37 <font class="arg">len</font>, <font class="type">int</font>
38 <font class="arg">cflags</font>);
39 <br>
40 <font class="type">void</font> <font
41 class="func">regfree</font>(<font class="type">regex_t</font>
42 *<font class="arg">preg</font>);
43 <br>
44 </code>
45 </div>
46
47 <p>
48 The <tt><font class="func">regcomp</font>()</tt> function compiles
49 the regex string pointed to by <tt><font
50 class="arg">regex</font></tt> to an internal representation and
51 stores the result in the pattern buffer structure pointed to by
52 <tt><font class="arg">preg</font></tt>. The <tt><font
53 class="func">regncomp</font>()</tt> function is like <tt><font
54 class="func">regcomp</font>()</tt>, but <tt><font
55 class="arg">regex</font></tt> is not terminated with the null
56 byte. Instead, the <tt><font class="arg">len</font></tt> argument
57 is used to give the length of the string, and the string may contain
58 null bytes. The <tt><font class="func">regwcomp</font>()</tt> and
59 <tt><font class="func">regwncomp</font>()</tt> functions work like
60 <tt><font class="func">regcomp</font>()</tt> and <tt><font
61 class="func">regncomp</font>()</tt>, respectively, but take a wide
62 character (<tt><font class="type">wchar_t</font></tt>) string
63 instead of a byte string.
64 </p>
65
66 <p>
The <tt><font class="arg">cflags</font></tt> argument is the
bitwise inclusive OR of zero or more of the following flags (defined
in the header <tt>&lt;tre/regex.h&gt;</tt>):
70 </p>
71
72 <blockquote>
73 <dl>
74 <dt><tt>REG_EXTENDED</tt></dt>
<dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
compiling <tt><font class="arg">regex</font></tt>. The default
syntax is the POSIX Basic Regular Expression (BRE) syntax, which is
considered obsolete.</dd>
79
80 <dt><tt>REG_ICASE</tt></dt>
81 <dd>Ignore case. Subsequent searches with the <a
82 href="#regexec"><tt>regexec</tt></a> family of functions using this
83 pattern buffer will be case insensitive.</dd>
84
85 <dt><tt>REG_NOSUB</tt></dt>
86 <dd>Do not report submatches. Subsequent searches with the <a
87 href="#regexec"><tt>regexec</tt></a> family of functions will only
88 report whether a match was found or not and will not fill the submatch
89 array.</dd>
90
91 <dt><tt>REG_NEWLINE</tt></dt>
92 <dd>Normally the newline character is treated as an ordinary
93 character. When this flag is used, the newline character
94 (<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
95 <ol>
96 <li>The match-any-character operator (dot <tt>"."</tt> outside a
97 bracket expression) does not match a newline.</li>
98 <li>A non-matching list (<tt>[^...]</tt>) not containing a newline
99 does not match a newline.</li>
<li>The match-beginning-of-line operator <tt>^</tt> matches the empty
string immediately after a newline as well as the empty string at the
beginning of the string (but see the <code>REG_NOTBOL</code>
<code>regexec()</code> flag below).</li>
<li>The match-end-of-line operator <tt>$</tt> matches the empty
string immediately before a newline as well as the empty string at the
end of the string (but see the <code>REG_NOTEOL</code>
<code>regexec()</code> flag below).</li>
108 </ol>
109 </dd>
110
111 <dt><tt>REG_LITERAL</tt></dt>
112 <dd>Interpret the entire <tt><font class="arg">regex</font></tt>
113 argument as a literal string, that is, all characters will be
114 considered ordinary. This is a nonstandard extension, compatible with
115 but not specified by POSIX.</dd>
116
117 <dt><tt>REG_NOSPEC</tt></dt>
118 <dd>Same as <tt>REG_LITERAL</tt>. This flag is provided for
119 compatibility with BSD.</dd>
120
121 <dt><tt>REG_RIGHT_ASSOC</tt></dt>
122 <dd>By default, concatenation is left associative in TRE, as per
123 the grammar given in the <a
124 href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
125 specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
126 This flag flips associativity of concatenation to right associative.
127 Associativity can have an effect on how a match is divided into
128 submatches, but does not change what is matched by the entire regexp.
129 </dd>
130
131 <dt><tt>REG_UNGREEDY</tt></dt>
132 <dd>By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and
133 can be forced to be non-greedy by appending a <tt>?</tt> character. This flag reverses this behavior
134 by making the operators non-greedy by default and greedy when a <tt>?</tt> is specified.</dd>
135 </dl>
136 </blockquote>
137
138 <p>
After a successful call to <tt><font class="func">regcomp</font>()</tt>, the
<tt><font class="arg">preg</font></tt> pattern buffer can be used to
search for matches in strings (see below). Once the pattern buffer is no
longer needed, it should be passed to <tt><font
class="func">regfree</font>()</tt> to release the memory allocated for it.
144 </p>
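<p>
As a minimal sketch of the compile, search, and free cycle, the
hypothetical helper <tt>pattern_matches()</tt> below compiles a pattern
with the given <tt>cflags</tt>, runs one search, and releases the
pattern buffer. It uses only the POSIX subset of the API and, as an
assumption made so that it builds without TRE installed, includes the
system <tt>&lt;regex.h&gt;</tt>; with TRE the header would be
<tt>&lt;tre/regex.h&gt;</tt>.
</p>

```c
#include <regex.h>   /* assumption: system header; use <tre/regex.h> with TRE */
#include <stddef.h>

/* Hypothetical helper: returns 1 on match, 0 on no match, -1 on error. */
static int pattern_matches(const char *pattern, const char *text, int cflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;                  /* compilation failed */

    /* nmatch is 0 and pmatch is NULL, so no submatches are reported. */
    rc = regexec(&preg, text, 0, NULL, 0);

    regfree(&preg);                 /* release the compiled pattern */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

<p>
For instance, <tt>pattern_matches("^world$", "hello\nworld", REG_EXTENDED |
REG_NEWLINE)</tt> reports a match, while the same call without
<tt>REG_NEWLINE</tt> does not, because the anchors then apply only to
the beginning and end of the whole string.
</p>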
145
146
147 <p>
148 The <tt><font class="type">regex_t</font></tt> structure has the
149 following fields that the application can read:
150 </p>
151 <blockquote>
152 <dl>
153 <dt><tt><font class="type">size_t</font> <font
154 class="arg">re_nsub</font></tt></dt>
155 <dd>Number of parenthesized subexpressions in <tt><font
156 class="arg">regex</font></tt>.
157 </dd>
158 </dl>
159 </blockquote>
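<p>
One common use of <tt>re_nsub</tt> is sizing the submatch array. The
sketch below assumes the POSIX convention that element 0 of the array
describes the whole match, so <tt>re_nsub + 1</tt> elements capture
everything. The helper name <tt>pmatch_slots_needed()</tt> is
hypothetical, and the system <tt>&lt;regex.h&gt;</tt> stands in for
<tt>&lt;tre/regex.h&gt;</tt>.
</p>

```c
#include <regex.h>   /* assumption: system header; use <tre/regex.h> with TRE */
#include <stddef.h>

/* Hypothetical helper: number of regmatch_t elements needed to hold the
   whole match plus every parenthesized subexpression; 0 on compile error. */
static size_t pmatch_slots_needed(const char *pattern)
{
    regex_t preg;
    size_t slots;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return 0;

    slots = preg.re_nsub + 1;   /* element 0 is the whole match */
    regfree(&preg);
    return slots;
}
```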
160
161 <p>
162 The <tt><font class="func">regcomp</font></tt> function returns
163 zero if the compilation was successful, or one of the following error
164 codes if there was an error:
165 </p>
166 <blockquote>
167 <dl>
168 <dt><tt>REG_BADPAT</tt></dt>
169 <dd>Invalid regexp. TRE returns this only if a multibyte character
170 set is used in the current locale, and <tt><font
171 class="arg">regex</font></tt> contained an invalid multibyte
172 sequence.</dd>
173 <dt><tt>REG_ECOLLATE</tt></dt>
174 <dd>Invalid collating element referenced. TRE returns this whenever
175 equivalence classes or multicharacter collating elements are used in
176 bracket expressions (they are not supported yet).</dd>
177 <dt><tt>REG_ECTYPE</tt></dt>
178 <dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
179 <dt><tt>REG_EESCAPE</tt></dt>
180 <dd>The last character of <tt><font class="arg">regex</font></tt>
181 was a backslash (<tt>\</tt>).</dd>
182 <dt><tt>REG_ESUBREG</tt></dt>
183 <dd>Invalid back reference; number in <tt>\<i>digit</i></tt>
184 invalid.</dd>
185 <dt><tt>REG_EBRACK</tt></dt>
186 <dd><tt>[]</tt> imbalance.</dd>
187 <dt><tt>REG_EPAREN</tt></dt>
188 <dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
189 <dt><tt>REG_EBRACE</tt></dt>
190 <dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
191 <dt><tt>REG_BADBR</tt></dt>
<dd><tt>{}</tt> content invalid: not a number, more than two numbers,
first larger than second, or number too large.</dd>
194 <dt><tt>REG_ERANGE</tt></dt>
195 <dd>Invalid character range, e.g. ending point is earlier in the
196 collating order than the starting point.</dd>
197 <dt><tt>REG_ESPACE</tt></dt>
198 <dd>Out of memory, or an internal limit exceeded.</dd>
199 <dt><tt>REG_BADRPT</tt></dt>
200 <dd>Invalid use of repetition operators: two or more repetition operators have
201 been chained in an undefined way.</dd>
202 </dl>
203 </blockquote>
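<p>
The return value can be checked against the codes listed above. The
following sketch uses two hypothetical helpers, <tt>try_compile()</tt>
and <tt>compile_error_name()</tt>, and includes the system
<tt>&lt;regex.h&gt;</tt> as a stand-in for <tt>&lt;tre/regex.h&gt;</tt>.
Which code a particular malformed pattern produces may differ between
implementations.
</p>

```c
#include <regex.h>   /* assumption: system header; use <tre/regex.h> with TRE */

/* Hypothetical helper: compile as ERE and return the regcomp() status. */
static int try_compile(const char *pattern)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc == 0)
        regfree(&preg);         /* only a successful compile needs freeing */
    return rc;
}

/* Hypothetical helper: map a regcomp() status to its symbolic name. */
static const char *compile_error_name(int code)
{
    switch (code) {
    case 0:            return "ok";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_BADBR:    return "REG_BADBR";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_BADRPT:   return "REG_BADRPT";
    default:           return "unknown";
    }
}
```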
204
205
206 <h2>The <tt>regexec()</tt> functions</h2>
207 <a name="regexec"></a>
208
209 <div class="code">
210 <code>
#include &lt;tre/regex.h&gt;
212 <br>
213 <br>
214 <font class="type">int</font> <font
215 class="func">regexec</font>(<font class="qual">const</font>
216 <font class="type">regex_t</font> *<font
217 class="arg">preg</font>, <font class="qual">const</font> <font
218 class="type">char</font> *<font class="arg">string</font>,
219 <font class="type">size_t</font> <font
220 class="arg">nmatch</font>,
221 <br>
222 <font class="type">regmatch_t</font> <font
223 class="arg">pmatch</font>[], <font class="type">int</font>
224 <font class="arg">eflags</font>);
225 <br>
226 <font class="type">int</font> <font
227 class="func">regnexec</font>(<font class="qual">const</font>
228 <font class="type">regex_t</font> *<font
229 class="arg">preg</font>, <font class="qual">const</font> <font
230 class="type">char</font> *<font class="arg">string</font>,
231 <font class="type">size_t</font> <font class="arg">len</font>,
232 <br>
233 <font class="type">size_t</font> <font
234 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
235 <font class="arg">pmatch</font>[], <font
236 class="type">int</font> <font class="arg">eflags</font>);
237 <br>
238 <font class="type">int</font> <font
239 class="func">regwexec</font>(<font class="qual">const</font>
240 <font class="type">regex_t</font> *<font
241 class="arg">preg</font>, <font class="qual">const</font> <font
242 class="type">wchar_t</font> *<font class="arg">string</font>,
243 <font class="type">size_t</font> <font
244 class="arg">nmatch</font>,
245 <br>
246 <font class="type">regmatch_t</font> <font
247 class="arg">pmatch</font>[], <font class="type">int</font>
248 <font class="arg">eflags</font>);
249 <br>
250 <font class="type">int</font> <font
251 class="func">regwnexec</font>(<font class="qual">const</font>
252 <font class="type">regex_t</font> *<font
253 class="arg">preg</font>, <font class="qual">const</font> <font
254 class="type">wchar_t</font> *<font class="arg">string</font>,
255 <font class="type">size_t</font> <font class="arg">len</font>,
256 <br>
257
258 <font class="type">size_t</font> <font
259 class="arg">nmatch</font>, <font class="type">regmatch_t</font>
260 <font class="arg">pmatch</font>[], <font
261 class="type">int</font> <font class="arg">eflags</font>);
262 </code>
263 </div>
264
265 <p>
266 The <tt><font class="func">regexec</font>()</tt> function matches
267 the null-terminated string against the compiled regexp <tt><font
268 class="arg">preg</font></tt>, initialized by a previous call to
269 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions. The
270 <tt><font class="func">regnexec</font>()</tt> function is like
271 <tt><font class="func">regexec</font>()</tt>, but <tt><font
272 class="arg">string</font></tt> is not terminated with a null byte.
273 Instead, the <tt><font class="arg">len</font></tt> argument is used
274 to give the length of the string, and the string may contain null
275 bytes. The <tt><font class="func">regwexec</font>()</tt> and
276 <tt><font class="func">regwnexec</font>()</tt> functions work like
277 <tt><font class="func">regexec</font>()</tt> and <tt><font
278 class="func">regnexec</font>()</tt>, respectively, but take a wide
279 character (<tt><font class="type">wchar_t</font></tt>) string
280 instead of a byte string. The <tt><font
281 class="arg">eflags</font></tt> argument is a bitwise OR of zero or
282 more of the following flags:
283 </p>
284 <blockquote>
285 <dl>
286 <dt><code>REG_NOTBOL</code></dt>
287 <dd>
288 <p>
289 When this flag is used, the match-beginning-of-line operator
290 <tt>^</tt> does not match the empty string at the beginning of
291 <tt><font class="arg">string</font></tt>. If
292 <code>REG_NEWLINE</code> was used when compiling
293 <tt><font class="arg">preg</font></tt> the empty string
294 immediately after a newline character will still be matched.
295 </p>
296 </dd>
297
298 <dt><code>REG_NOTEOL</code></dt>
299 <dd>
300 <p>
301 When this flag is used, the match-end-of-line operator
302 <tt>$</tt> does not match the empty string at the end of
303 <tt><font class="arg">string</font></tt>. If
304 <code>REG_NEWLINE</code> was used when compiling
305 <tt><font class="arg">preg</font></tt> the empty string
306 immediately before a newline character will still be matched.
</p>
</dd>
308
309 </dl>
310
311 <p>
312 These flags are useful when different portions of a string are passed
313 to <code>regexec</code> and the beginning or end of the partial string
314 should not be interpreted as the beginning or end of a line.
315 </p>
316
317 </blockquote>
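<p>
A typical case is scanning a string for successive matches by resuming
the search after each match. The sketch below passes
<tt>REG_NOTBOL</tt> for every portion after the first, so that
<tt>^</tt> does not match at each resume point. The helper
<tt>count_matches()</tt> is hypothetical, stops at the first empty match
to keep the loop simple, and includes the system
<tt>&lt;regex.h&gt;</tt> as a stand-in for <tt>&lt;tre/regex.h&gt;</tt>.
</p>

```c
#include <regex.h>   /* assumption: system header; use <tre/regex.h> with TRE */
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of an ERE in text.
   Returns -1 if the pattern does not compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t preg;
    regmatch_t m;
    int count = 0;
    int eflags = 0;             /* the first portion starts a real line */

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&preg, text, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo == 0)
            break;              /* empty match: stop rather than loop forever */
        text += m.rm_eo;        /* resume right after this match */
        eflags = REG_NOTBOL;    /* later portions do not start a line */
    }

    regfree(&preg);
    return count;
}
```

<p>
With this helper, <tt>count_matches("^a", "aaa")</tt> yields 1, whereas
leaving out <tt>REG_NOTBOL</tt> would let <tt>^a</tt> match again at
every resume point.
</p>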
318
319 <p>
320 If <code>REG_NOSUB</code> was used when compiling <tt><font
321 class="arg">preg</font></tt>, <tt><font
322 class="arg">nmatch</font></tt> is zero, or <tt><font
323 class="arg">pmatch</font></tt> is <code>NULL</code>, then the
324 <tt><font class="arg">pmatch</font></tt> argument is ignored.
325 Otherwise, the submatches corresponding to the parenthesized
326 subexpressions are filled in the elements of <tt><font
327 class="arg">pmatch</font></tt>, which must be dimensioned to have
328 at least <tt><font class="arg">nmatch</font></tt> elements.
329 </p>
330
331 <p>
332 The <tt><font class="type">regmatch_t</font></tt> structure contains
333 at least the following fields:
334 </p>
335 <blockquote>
336 <dl>
337 <dt><tt><font class="type">regoff_t</font> <font
338 class="arg">rm_so</font></tt></dt>
339 <dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
340 substring. </dd>
341 <dt><tt><font class="type">regoff_t</font> <font
342 class="arg">rm_eo</font></tt></dt>
343 <dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
344 character after the substring. </dd>
345 </dl>
346 </blockquote>
347
348 <p>
The length of a submatch can be computed by subtracting <code>rm_so</code> from
<code>rm_eo</code>. If a parenthesized subexpression did not participate in a
match, the <code>rm_so</code> and <code>rm_eo</code> fields of the
corresponding <code>pmatch</code> element are set to <code>-1</code>. Note
that when a multibyte character set is in effect, the submatch offsets are
given as byte offsets, not character offsets.
355 </p>
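<p>
The offsets can be used to copy a submatch out of the searched string.
The following sketch assumes a byte string and the POSIX convention that
<tt>pmatch[0]</tt> holds the whole match. The helper
<tt>copy_submatch()</tt> and its fixed limit of ten elements are
hypothetical, and the system <tt>&lt;regex.h&gt;</tt> stands in for
<tt>&lt;tre/regex.h&gt;</tt>.
</p>

```c
#include <regex.h>   /* assumption: system header; use <tre/regex.h> with TRE */
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 10 /* hypothetical limit on reported submatches */

/* Hypothetical helper: copy submatch idx of the first match into buf.
   Returns its length, or -1 if there is no such submatch, no match,
   a compile error, or buf is too small. */
static int copy_submatch(const char *pattern, const char *text,
                         size_t idx, char *buf, size_t bufsize)
{
    regex_t preg;
    regmatch_t pmatch[MAX_SLOTS];
    int len = -1;

    if (idx >= MAX_SLOTS || regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, text, MAX_SLOTS, pmatch, 0) == 0
        && pmatch[idx].rm_so != -1) {
        /* Length is rm_eo minus rm_so; both are byte offsets into text. */
        size_t n = (size_t)(pmatch[idx].rm_eo - pmatch[idx].rm_so);
        if (n < bufsize) {
            memcpy(buf, text + pmatch[idx].rm_so, n);
            buf[n] = '\0';
            len = (int)n;
        }
    }

    regfree(&preg);
    return len;
}
```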
356
357 <p>
358 The <code>regexec()</code> functions return zero if a match was found,
359 otherwise they return <code>REG_NOMATCH</code> to indicate no match,
360 or <code>REG_ESPACE</code> to indicate that enough temporary memory
361 could not be allocated to complete the matching operation.
362 </p>
363
364
365
366 <h3>reguexec()</h3>
367
368 <div class="code">
369 <code>
#include &lt;tre/regex.h&gt;
371 <br>
372 <br>
373 <font class="qual">typedef struct</font> {
374 <br>
375 <font class="type">int</font> (*get_next_char)(<font
376 class="type">tre_char_t</font> *<font class="arg">c</font>, <font
377 class="type">unsigned int</font> *<font class="arg">pos_add</font>,
378 <font class="type">void</font> *<font class="arg">context</font>);
379 <br>
380 <font class="type">void</font> (*rewind)(<font
381 class="type">size_t</font> <font class="arg">pos</font>, <font
382 class="type">void</font> *<font class="arg">context</font>);
383 <br>
384 <font class="type">int</font> (*compare)(<font
385 class="type">size_t</font> <font class="arg">pos1</font>, <font
386 class="type">size_t</font> <font class="arg">pos2</font>, <font
387 class="type">size_t</font> <font class="arg">len</font>, <font
388 class="type">void</font> *<font class="arg">context</font>);
389 <br>
390 <font class="type">void</font> *<font
391 class="arg">context</font>;
392 <br>
393 } <font class="type">tre_str_source</font>;
394 <br>
395 <br>
396 <font class="type">int</font> <font
397 class="func">reguexec</font>(<font class="qual">const</font>
398 <font class="type">regex_t</font> *<font
399 class="arg">preg</font>, <font class="qual">const</font> <font
400 class="type">tre_str_source</font> *<font class="arg">string</font>,
401 <font class="type">size_t</font> <font class="arg">nmatch</font>,
402 <br>
403 <font class="type">regmatch_t</font> <font
404 class="arg">pmatch</font>[], <font class="type">int</font>
405 <font class="arg">eflags</font>);
406 </code>
407 </div>
408
409 <p>
410 The <tt><font class="func">reguexec</font>()</tt> function works just
411 like the other <tt>regexec()</tt> functions, except that the input
412 string is read from user specified callback functions instead of a
413 character array. This makes it possible, for example, to match
414 regexps over arbitrary user specified data structures.
415 </p>
416
417 <p>
418 The <tt><font class="type">tre_str_source</font></tt> structure
419 contains the following fields:
420 </p>
421 <blockquote>
422 <dl>
423 <dt><tt>get_next_char</tt></dt>
<dd>This function must retrieve the next available character. If a
character is not available, the space pointed to by
<tt><font class="arg">c</font></tt> must be set to zero and the function
must return a nonzero value. If a character is available, it must be stored
in the space pointed to by
<tt><font class="arg">c</font></tt>, the integer pointed to by
<tt><font class="arg">pos_add</font></tt> must be set to the
number of units advanced in the input (the value must be
<tt>&gt;= 1</tt>), and zero must be returned.</dd>
433
434 <dt><tt>rewind</tt></dt>
435 <dd>This function must rewind the input stream to the position
436 specified by <tt><font class="arg">pos</font></tt>. Unless the regexp
437 uses back references, <tt>rewind</tt> is not needed and can be set to
438 <tt>NULL</tt>.</dd>
439
440 <dt><tt>compare</tt></dt>
<dd>This function compares two substrings of length <tt><font
class="arg">len</font></tt> in the input stream, starting at the
positions specified by <tt><font class="arg">pos1</font></tt> and
<tt><font class="arg">pos2</font></tt>. If the substrings are equal,
446 <tt>compare</tt> must return zero, otherwise a nonzero value must be
447 returned. Unless the regexp uses back references, <tt>compare</tt> is
448 not needed and can be set to <tt>NULL</tt>.</dd>
449
450 <dt><tt>context</tt></dt>
<dd>This is a context variable, passed as the last argument to
all of the above functions for keeping track of the internal state of
the user's code.</dd>
454
455 </dl>
456 </blockquote>
457
458 <p>
The position in the input stream is measured in <tt><font
class="type">size_t</font></tt> units. The current position is the
sum of the increments reported through <tt><font
class="arg">pos_add</font></tt> (plus the position of the last
<tt>rewind</tt>, if any). The starting position is zero. Submatch
positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
array are given using positions computed in this way.
466 </p>
467
468 <p>
469 For an example of how to use <tt>reguexec()</tt>, see the
470 <tt>tests/test-str-source.c</tt> file in the TRE source code
471 distribution.
472 </p>
473
474 <h2>The approximate matching functions</h2>
475 <a name="regaexec"></a>
476
477 <div class="code">
478 <code>
#include &lt;tre/regex.h&gt;
480 <br>
481 <br>
482 <font class="qual">typedef struct</font> {<br>
483 <font class="type">int</font>
484 <font class="arg">cost_ins</font>;<br>
485 <font class="type">int</font>
486 <font class="arg">cost_del</font>;<br>
487 <font class="type">int</font>
488 <font class="arg">cost_subst</font>;<br>
489 <font class="type">int</font>
490 <font class="arg">max_cost</font>;<br><br>
491 <font class="type">int</font>
492 <font class="arg">max_ins</font>;<br>
493 <font class="type">int</font>
494 <font class="arg">max_del</font>;<br>
495 <font class="type">int</font>
496 <font class="arg">max_subst</font>;<br>
497 <font class="type">int</font>
498 <font class="arg">max_err</font>;<br>
499 } <font class="type">regaparams_t</font>;<br>
500 <br>
501 <font class="qual">typedef struct</font> {<br>
502 <font class="type">size_t</font>
503 <font class="arg">nmatch</font>;<br>
504 <font class="type">regmatch_t</font>
505 *<font class="arg">pmatch</font>;<br>
506 <font class="type">int</font>
507 <font class="arg">cost</font>;<br>
508 <font class="type">int</font>
509 <font class="arg">num_ins</font>;<br>
510 <font class="type">int</font>
511 <font class="arg">num_del</font>;<br>
512 <font class="type">int</font>
513 <font class="arg">num_subst</font>;<br>
514 } <font class="type">regamatch_t</font>;<br>
515 <br>
516 <font class="type">int</font> <font
517 class="func">regaexec</font>(<font class="qual">const</font>
518 <font class="type">regex_t</font> *<font
519 class="arg">preg</font>, <font class="qual">const</font> <font
520 class="type">char</font> *<font class="arg">string</font>,<br>
521
522 <font class="type">regamatch_t</font>
523 *<font class="arg">match</font>,
524 <font class="type">regaparams_t</font>
525 <font class="arg">params</font>,
526 <font class="type">int</font>
527 <font class="arg">eflags</font>);
528 <br>
529 <font class="type">int</font> <font
530 class="func">reganexec</font>(<font class="qual">const</font>
531 <font class="type">regex_t</font> *<font
532 class="arg">preg</font>, <font class="qual">const</font> <font
533 class="type">char</font> *<font class="arg">string</font>,
534 <font class="type">size_t</font> <font class="arg">len</font>,<br>
535
536 <font class="type">regamatch_t</font>
537 *<font class="arg">match</font>,
538 <font class="type">regaparams_t</font>
539 <font class="arg">params</font>,
540 <font class="type">int</font> <font class="arg">eflags</font>);
541 <br>
542 <font class="type">int</font> <font
543 class="func">regawexec</font>(<font class="qual">const</font>
544 <font class="type">regex_t</font> *<font
545 class="arg">preg</font>, <font class="qual">const</font> <font
546 class="type">wchar_t</font> *<font class="arg">string</font>,<br>
547
548 <font class="type">regamatch_t</font>
549 *<font class="arg">match</font>,
550 <font class="type">regaparams_t</font>
551 <font class="arg">params</font>,
552 <font class="type">int</font>
553 <font class="arg">eflags</font>);
554 <br>
555 <font class="type">int</font>
556 <font class="func">regawnexec</font>(
557 <font class="qual">const</font>
558 <font class="type">regex_t</font>
559 *<font class="arg">preg</font>,
560 <font class="qual">const</font>
561 <font class="type">wchar_t</font>
562 *<font class="arg">string</font>,
563 <font class="type">size_t</font>
564 <font class="arg">len</font>,<br>
565
566 <font class="type">regamatch_t</font>
567 *<font class="arg">match</font>,
568 <font class="type">regaparams_t</font>
569 <font class="arg">params</font>,
570 <font class="type">int</font>
571 <font class="arg">eflags</font>);
572 <br>
573 </code>
574 </div>
575
576 <p>
577 The <tt><font class="func">regaexec</font>()</tt> function searches for
578 the best match in <tt><font class="arg">string</font></tt>
579 against the compiled regexp <tt><font
580 class="arg">preg</font></tt>, initialized by a previous call to
581 any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
582 </p>
583
584 <p>
585 The <tt><font class="func">reganexec</font>()</tt> function is like
586 <tt><font class="func">regaexec</font>()</tt>, but <tt><font
587 class="arg">string</font></tt> is not terminated by a null byte.
588 Instead, the <tt><font class="arg">len</font></tt> argument is used to
589 tell the length of the string, and the string may contain null
590 bytes. The <tt><font class="func">regawexec</font>()</tt> and
591 <tt><font class="func">regawnexec</font>()</tt> functions work like
592 <tt><font class="func">regaexec</font>()</tt> and <tt><font
593 class="func">reganexec</font>()</tt>, respectively, but take a wide
594 character (<tt><font class="type">wchar_t</font></tt>) string instead
595 of a byte string.
596 </p>
597
598 <p>
The <tt><font class="arg">eflags</font></tt> argument has the same
meaning as for the <a href="#regexec"><tt>regexec()</tt></a> functions.
601 </p>
602
603 <p>
The <tt><font class="arg">params</font></tt> struct controls the
approximate matching parameters:
</p>
606 <blockquote>
607 <dl>
608 <dt><tt><font class="type">int</font></tt>
609 <tt><font class="arg">cost_ins</font></tt></dt>
610 <dd>The default cost of an inserted character, that is, an extra
611 character in <tt><font class="arg">string</font></tt>.</dd>
612
613 <dt><tt><font class="type">int</font></tt>
614 <tt><font class="arg">cost_del</font></tt></dt>
615 <dd>The default cost of a deleted character, that is, a character
616 missing from <tt><font class="arg">string</font></tt>.</dd>
617
618 <dt><tt><font class="type">int</font></tt>
619 <tt><font class="arg">cost_subst</font></tt></dt>
620 <dd>The default cost of a substituted character.</dd>
621
622 <dt><tt><font class="type">int</font></tt>
623 <tt><font class="arg">max_cost</font></tt></dt>
<dd>The maximum allowed cost of a match. If this is set to zero,
an exact match is searched for, and the results are equivalent to
those returned by the <tt>regexec()</tt> functions.</dd>
628
629 <dt><tt><font class="type">int</font></tt>
630 <tt><font class="arg">max_ins</font></tt></dt>
631 <dd>Maximum allowed number of inserted characters.</dd>
632
633 <dt><tt><font class="type">int</font></tt>
634 <tt><font class="arg">max_del</font></tt></dt>
635 <dd>Maximum allowed number of deleted characters.</dd>
636
637 <dt><tt><font class="type">int</font></tt>
638 <tt><font class="arg">max_subst</font></tt></dt>
639 <dd>Maximum allowed number of substituted characters.</dd>
640
641 <dt><tt><font class="type">int</font></tt>
642 <tt><font class="arg">max_err</font></tt></dt>
643 <dd>Maximum allowed number of errors (inserts + deletes +
644 substitutes).</dd>
645 </dl>
646 </blockquote>
647
648 <p>
The <tt><font class="arg">match</font></tt> argument points to a
<tt><font class="type">regamatch_t</font></tt> structure. The
<tt><font class="arg">nmatch</font></tt> and <tt><font
class="arg">pmatch</font></tt> fields must be filled in by the caller. If
<code>REG_NOSUB</code> was used when compiling the regexp, or
<code>match-&gt;nmatch</code> is zero, or
<code>match-&gt;pmatch</code> is <code>NULL</code>, the
<code>match-&gt;pmatch</code> field is ignored. Otherwise, the
submatches corresponding to the parenthesized subexpressions are
filled in the elements of <code>match-&gt;pmatch</code>, which must be
dimensioned to have at least <code>match-&gt;nmatch</code> elements.
The <code>match-&gt;cost</code> field is set to the cost of the match
found, and the <code>match-&gt;num_ins</code>,
<code>match-&gt;num_del</code>, and <code>match-&gt;num_subst</code>
fields are set to the number of inserts, deletes, and substitutes in
the match, respectively.
665 </p>
666
667 <p>
The <tt>regaexec()</tt> functions return zero if a match with cost
no greater than <code>params.max_cost</code> was found, otherwise
they return <code>REG_NOMATCH</code> to indicate no match, or
<code>REG_ESPACE</code> to indicate that enough temporary memory could
not be allocated to complete the matching operation.
673 </p>
674
675 <h2>Miscellaneous</h2>
676
677 <div class="code">
678 <code>
#include &lt;tre/regex.h&gt;
680 <br>
681 <br>
682 <font class="type">int</font> <font
683 class="func">tre_have_backrefs</font>(<font class="qual">const</font>
684 <font class="type">regex_t</font> *<font class="arg">preg</font>);
685 <br>
686 <font class="type">int</font> <font
687 class="func">tre_have_approx</font>(<font class="qual">const</font>
688 <font class="type">regex_t</font> *<font class="arg">preg</font>);
689 <br>
690 </code>
691 </div>
692
693 <p>
694 The <tt><font class="func">tre_have_backrefs</font>()</tt> and
695 <tt><font class="func">tre_have_approx</font>()</tt> functions return
696 1 if the compiled pattern has back references or uses approximate
697 matching, respectively, and 0 if not.
698 </p>
699
700
701 <h2>Checking build time options</h2>
702
703 <a name="tre_config"></a>
704 <div class="code">
705 <code>
#include &lt;tre/regex.h&gt;
707 <br>
708 <br>
709 <font class="type">char</font> *<font
710 class="func">tre_version</font>(<font class="type">void</font>);
711 <br>
712 <font class="type">int</font> <font
713 class="func">tre_config</font>(<font class="type">int</font> <font
714 class="arg">query</font>, <font class="type">void</font> *<font
715 class="arg">result</font>);
716 <br>
717 </code>
718 </div>
719
720 <p>
The <tt><font class="func">tre_config</font>()</tt> function can be
used to retrieve information about which optional features have been
compiled into the TRE library, and about other parameters that
may change between releases.
725 </p>
726
727 <p>
The <tt><font class="arg">query</font></tt> argument is an integer
that selects the information requested. The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned. The return value of a call to
<tt><font class="func">tre_config</font>()</tt> is zero if <tt><font
class="arg">query</font></tt> was recognized, <code>REG_NOMATCH</code> otherwise.
734 </p>
735
736 <p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>
739
740 <blockquote>
741 <dl>
742 <dt><tt>TRE_CONFIG_APPROX</tt></dt>
743 <dd>The result is an integer that is set to one if approximate
744 matching support is available, zero if not.</dd>
745 <dt><tt>TRE_CONFIG_WCHAR</tt></dt>
746 <dd>The result is an integer that is set to one if wide character
747 support is available, zero if not.</dd>
748 <dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
749 <dd>The result is an integer that is set to one if multibyte character
750 set support is available, zero if not.</dd>
751 <dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
752 <dd>The result is an integer that is set to one if TRE has been
753 compiled to be compatible with the system regex ABI, zero if not.</dd>
754 <dt><tt>TRE_CONFIG_VERSION</tt></dt>
755 <dd>The result is a pointer to a static character string that gives
756 the version of the TRE library.</dd>
757 </dl>
758 </blockquote>
759
760
761 <p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short human-readable character string that shows the software name,
version, and license.
</p>
765
766 <h2>Preprocessor definitions</h2>
767
<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>
770
771 <h3>Version information</h3>
772
773 <p>The following definitions may be useful for checking whether a new
774 enough version is being used. Note that it is recommended to use the
775 <tt>pkg-config</tt> tool for version and other checks in Autoconf
776 scripts.</p>
777
778 <blockquote>
779 <dl>
780 <dt><tt>TRE_VERSION</tt></dt>
781 <dd>The version string. </dd>
782
783 <dt><tt>TRE_VERSION_1</tt></dt>
784 <dd>The major version number (first part of version string).</dd>
785
786 <dt><tt>TRE_VERSION_2</tt></dt>
787 <dd>The minor version number (second part of version string).</dd>
788
789 <dt><tt>TRE_VERSION_3</tt></dt>
790 <dd>The micro version number (third part of version string).</dd>
791
792 </dl>
793 </blockquote>
794
795 <h3>Features</h3>
796
<p>The following definitions may be useful for checking whether all
necessary features are enabled. Use these only if compile time
checking suffices (linking statically with TRE). When linking
dynamically, <a href="#tre_config"><tt>tre_config()</tt></a> should be used
instead.</p>
802
803 <blockquote>
804 <dl>
805 <dt><tt>TRE_APPROX</tt></dt>
806 <dd>This is defined if approximate matching support is enabled. The
807 prototypes for approximate matching functions are defined only if
808 <tt>TRE_APPROX</tt> is defined.</dd>
809
810 <dt><tt>TRE_WCHAR</tt></dt>
811 <dd>This is defined if wide character support is enabled. The
812 prototypes for wide character matching functions are defined only if
813 <tt>TRE_WCHAR</tt> is defined.</dd>
814
815 <dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not defined, any locale settings are ignored, and the default
locale is used when parsing regexps and matching strings.</dd>
819
820 </dl>
821 </blockquote>
822