<h1>TRE API reference manual</h1>

<h2>The <tt>regcomp()</tt> functions</h2>
<a name="regcomp"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font>
<font class="func">regcomp</font>(<font
class="type">regex_t</font> *<font class="arg">preg</font>,
<font class="qual">const</font> <font class="type">char</font>
*<font class="arg">regex</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">char</font> *<font class="arg">regex</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<font class="type">int</font> <font class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwcomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">int</font> <font
class="arg">cflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwncomp</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>, <font class="qual">const</font>
<font class="type">wchar_t</font> *<font
class="arg">regex</font>, <font class="type">size_t</font>
<font class="arg">len</font>, <font class="type">int</font>
<font class="arg">cflags</font>);
<br>
<font class="type">void</font> <font
class="func">regfree</font>(<font class="type">regex_t</font>
*<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regcomp</font>()</tt> function compiles
the regex string pointed to by <tt><font
class="arg">regex</font></tt> to an internal representation and
stores the result in the pattern buffer structure pointed to by
<tt><font class="arg">preg</font></tt>. The <tt><font
class="func">regncomp</font>()</tt> function is like <tt><font
class="func">regcomp</font>()</tt>, but <tt><font
class="arg">regex</font></tt> is not terminated by a null
byte. Instead, the <tt><font class="arg">len</font></tt> argument
is used to give the length of the string, and the string may contain
null bytes. The <tt><font class="func">regwcomp</font>()</tt> and
<tt><font class="func">regwncomp</font>()</tt> functions work like
<tt><font class="func">regcomp</font>()</tt> and <tt><font
class="func">regncomp</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string.
</p>

<p>
The <tt><font class="arg">cflags</font></tt> argument is the
bitwise inclusive OR of zero or more of the following flags (defined
in the header <tt>&lt;tre/regex.h&gt;</tt>):
</p>

<blockquote>
<dl>
<dt><tt>REG_EXTENDED</tt></dt>
<dd>Use POSIX Extended Regular Expression (ERE) compatible syntax when
compiling <tt><font class="arg">regex</font></tt>. The default
syntax is the POSIX Basic Regular Expression (BRE) syntax, but it is
considered obsolete.</dd>

<dt><tt>REG_ICASE</tt></dt>
<dd>Ignore case.
Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions using this
pattern buffer will be case insensitive.</dd>

<dt><tt>REG_NOSUB</tt></dt>
<dd>Do not report submatches. Subsequent searches with the <a
href="#regexec"><tt>regexec</tt></a> family of functions will only
report whether a match was found or not and will not fill the submatch
array.</dd>

<dt><tt>REG_NEWLINE</tt></dt>
<dd>Normally the newline character is treated as an ordinary
character. When this flag is used, the newline character
(<tt>'\n'</tt>, ASCII code 10) is treated specially as follows:
<ol>
<li>The match-any-character operator (dot <tt>"."</tt> outside a
bracket expression) does not match a newline.</li>
<li>A non-matching list (<tt>[^...]</tt>) not containing a newline
does not match a newline.</li>
<li>The match-beginning-of-line operator <tt>^</tt> matches the empty
string immediately after a newline as well as the empty string at the
beginning of the string (but see the <code>REG_NOTBOL</code>
<code>regexec()</code> flag below).</li>
<li>The match-end-of-line operator <tt>$</tt> matches the empty
string immediately before a newline as well as the empty string at the
end of the string (but see the <code>REG_NOTEOL</code>
<code>regexec()</code> flag below).</li>
</ol>
</dd>

<dt><tt>REG_LITERAL</tt></dt>
<dd>Interpret the entire <tt><font class="arg">regex</font></tt>
argument as a literal string, that is, all characters will be
considered ordinary.
This is a nonstandard extension, compatible with
but not specified by POSIX.</dd>

<dt><tt>REG_NOSPEC</tt></dt>
<dd>Same as <tt>REG_LITERAL</tt>. This flag is provided for
compatibility with BSD.</dd>

<dt><tt>REG_RIGHT_ASSOC</tt></dt>
<dd>By default, concatenation is left associative in TRE, as per
the grammar given in the <a
href="http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html">base
specifications on regular expressions</a> of Std 1003.1-2001 (POSIX).
This flag flips associativity of concatenation to right associative.
Associativity can have an effect on how a match is divided into
submatches, but does not change what is matched by the entire regexp.
</dd>

<dt><tt>REG_UNGREEDY</tt></dt>
<dd>By default, repetition operators are greedy in TRE, as required by
Std 1003.1-2001 (POSIX), and can be made non-greedy by appending a
<tt>?</tt> character. This flag reverses this behavior by making the
operators non-greedy by default and greedy when a <tt>?</tt> is
appended.</dd>
</dl>
</blockquote>

<p>
After a successful call to <tt><font class="func">regcomp</font>()</tt>,
the <tt><font class="arg">preg</font></tt> pattern buffer can be used to
search for matches in strings (see below). Once the pattern buffer is no
longer needed, it should be passed to <tt><font
class="func">regfree</font>()</tt> to free the memory allocated for it.
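</p>

<p>
The following sketch illustrates the compile/free life cycle described above. It
uses only the POSIX subset of the interface and includes the system
<tt>&lt;regex.h&gt;</tt> so that it builds without TRE. With TRE, you would
include <tt>&lt;tre/regex.h&gt;</tt> instead. The sketch also calls
<tt>regerror()</tt>, a standard POSIX function not described in this
section, to turn an error code into a message. The helper name
<tt>compile_pattern()</tt> is hypothetical.
</p>

```c
#include <regex.h>    /* with TRE: #include <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: compile PATTERN into *PREG using CFLAGS.
   Returns 0 on success and sets ERRBUF to an empty string. On failure,
   it returns the regcomp() error code and writes a readable message
   into ERRBUF. After a successful call, the caller owns *PREG and
   must eventually release it with regfree(). */
int compile_pattern(regex_t *preg, const char *pattern, int cflags,
                    char *errbuf, size_t errbuf_size)
{
    int rc = regcomp(preg, pattern, cflags);

    if (rc != 0) {
        /* The contents of *preg are unspecified after a failed
           regcomp(), so regfree() is not called here. regerror()
           truncates the message to fit in errbuf_size bytes. */
        regerror(rc, preg, errbuf, errbuf_size);
        return rc;
    }
    if (errbuf_size > 0)
        errbuf[0] = '\0';
    return 0;
}
```

<p>
For example, compiling <tt>"(a)(b)?"</tt> with <tt>REG_EXTENDED</tt>
should succeed and set <tt>re_nsub</tt> to 2. An unbalanced pattern such
as <tt>"(a"</tt> should instead return a nonzero error code together
with a message in the buffer.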
</p>


<p>
The <tt><font class="type">regex_t</font></tt> structure has the
following fields that the application can read:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">size_t</font> <font
class="arg">re_nsub</font></tt></dt>
<dd>Number of parenthesized subexpressions in <tt><font
class="arg">regex</font></tt>.
</dd>
</dl>
</blockquote>

<p>
The <tt><font class="func">regcomp</font></tt> function returns
zero if the compilation was successful, or one of the following error
codes if there was an error:
</p>
<blockquote>
<dl>
<dt><tt>REG_BADPAT</tt></dt>
<dd>Invalid regexp. TRE returns this only if a multibyte character
set is used in the current locale, and <tt><font
class="arg">regex</font></tt> contained an invalid multibyte
sequence.</dd>
<dt><tt>REG_ECOLLATE</tt></dt>
<dd>Invalid collating element referenced.
TRE returns this whenever
equivalence classes or multicharacter collating elements are used in
bracket expressions (they are not supported yet).</dd>
<dt><tt>REG_ECTYPE</tt></dt>
<dd>Unknown character class name in <tt>[[:<i>name</i>:]]</tt>.</dd>
<dt><tt>REG_EESCAPE</tt></dt>
<dd>The last character of <tt><font class="arg">regex</font></tt>
was a backslash (<tt>\</tt>).</dd>
<dt><tt>REG_ESUBREG</tt></dt>
<dd>Invalid back reference: the number in <tt>\<i>digit</i></tt>
is invalid.</dd>
<dt><tt>REG_EBRACK</tt></dt>
<dd><tt>[]</tt> imbalance.</dd>
<dt><tt>REG_EPAREN</tt></dt>
<dd><tt>\(\)</tt> or <tt>()</tt> imbalance.</dd>
<dt><tt>REG_EBRACE</tt></dt>
<dd><tt>\{\}</tt> or <tt>{}</tt> imbalance.</dd>
<dt><tt>REG_BADBR</tt></dt>
<dd><tt>{}</tt> content invalid: not a number, more than two numbers,
first larger than second, or number too large.</dd>
<dt><tt>REG_ERANGE</tt></dt>
<dd>Invalid character range, e.g.
the ending point is earlier in the
collating order than the starting point.</dd>
<dt><tt>REG_ESPACE</tt></dt>
<dd>Out of memory, or an internal limit exceeded.</dd>
<dt><tt>REG_BADRPT</tt></dt>
<dd>Invalid use of repetition operators: two or more repetition operators have
been chained in an undefined way.</dd>
</dl>
</blockquote>


<h2>The <tt>regexec()</tt> functions</h2>
<a name="regexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">regexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font
class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regwnexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,
<br>
<font class="type">size_t</font> <font
class="arg">nmatch</font>, <font class="type">regmatch_t</font>
<font class="arg">pmatch</font>[], <font
class="type">int</font> <font class="arg">eflags</font>);
</code>
</div>

<p>
The <tt><font class="func">regexec</font>()</tt> function matches
the null-terminated string <tt><font class="arg">string</font></tt>
against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions. The
<tt><font class="func">regnexec</font>()</tt> function is like
<tt><font class="func">regexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated by a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used
to give the length of the string, and the string may contain null
bytes. The <tt><font class="func">regwexec</font>()</tt> and
<tt><font class="func">regwnexec</font>()</tt> functions work like
<tt><font class="func">regexec</font>()</tt> and <tt><font
class="func">regnexec</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string
instead of a byte string. The <tt><font
class="arg">eflags</font></tt> argument is a bitwise OR of zero or
more of the following flags:
</p>
<blockquote>
<dl>
<dt><code>REG_NOTBOL</code></dt>
<dd>
<p>
When this flag is used, the match-beginning-of-line operator
<tt>^</tt> does not match the empty string at the beginning of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately after a newline character will still be matched.
</p>
</dd>

<dt><code>REG_NOTEOL</code></dt>
<dd>
<p>
When this flag is used, the match-end-of-line operator
<tt>$</tt> does not match the empty string at the end of
<tt><font class="arg">string</font></tt>. If
<code>REG_NEWLINE</code> was used when compiling
<tt><font class="arg">preg</font></tt>, the empty string
immediately before a newline character will still be matched.
</p>
</dd>

</dl>

<p>
These flags are useful when different portions of a string are passed
to <code>regexec</code> and the beginning or end of the partial string
should not be interpreted as the beginning or end of a line.
</p>

</blockquote>

<p>
If <code>REG_NOSUB</code> was used when compiling <tt><font
class="arg">preg</font></tt>, <tt><font
class="arg">nmatch</font></tt> is zero, or <tt><font
class="arg">pmatch</font></tt> is <code>NULL</code>, then the
<tt><font class="arg">pmatch</font></tt> argument is ignored.
Otherwise, element 0 of <tt><font class="arg">pmatch</font></tt>
describes the entire match, and the submatches corresponding to the
parenthesized subexpressions are stored in the following elements.
The <tt><font class="arg">pmatch</font></tt> array must have room for
at least <tt><font class="arg">nmatch</font></tt> elements.
</p>

<p>
The <tt><font class="type">regmatch_t</font></tt> structure contains
at least the following fields:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_so</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to start of
substring. </dd>
<dt><tt><font class="type">regoff_t</font> <font
class="arg">rm_eo</font></tt></dt>
<dd>Offset from start of <tt><font class="arg">string</font></tt> to the first
character after the substring. </dd>
</dl>
</blockquote>

<p>
The length of a submatch can be computed by subtracting <code>rm_so</code>
from <code>rm_eo</code>.
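</p>

<p>
The loop below is a sketch of that arithmetic, combined with the
<code>REG_NOTBOL</code> use case described earlier. It finds successive
matches in a single string and passes <code>REG_NOTBOL</code> for every search
that starts in the middle of the string. It uses only the POSIX subset of
the interface: it includes the system <tt>&lt;regex.h&gt;</tt>, and with TRE
you would include <tt>&lt;tre/regex.h&gt;</tt> instead. The helper name
<tt>count_matches()</tt> is hypothetical. The pattern must be compiled
without <code>REG_NOSUB</code> so that element 0 (the whole match) is
filled in.
</p>

```c
#include <regex.h>    /* with TRE: #include <tre/regex.h> */
#include <stddef.h>

/* Hypothetical helper: count the successive matches of PREG in STR
   and store the sum of their lengths in *TOTAL_LEN. PREG must have
   been compiled without REG_NOSUB. */
int count_matches(const regex_t *preg, const char *str, size_t *total_len)
{
    regmatch_t m[1];          /* m[0] receives the whole match */
    const char *p = str;      /* start of the part still to search */
    int eflags = 0;           /* the first search starts at a real BOL */
    int count = 0;

    *total_len = 0;
    while (regexec(preg, p, 1, m, eflags) == 0) {
        /* Offsets are relative to p, the string given to regexec(). */
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);

        *total_len += len;
        count++;

        if (len > 0) {
            p += m[0].rm_eo;          /* continue after the match */
        } else if (p[m[0].rm_eo] != '\0') {
            p += m[0].rm_eo + 1;      /* empty match: skip one byte */
        } else {
            break;                    /* empty match at end of string */
        }
        /* p now points into the middle of str, so '^' must not
           match at its start. */
        eflags = REG_NOTBOL;
    }
    return count;
}
```

<p>
For instance, with <tt>"[0-9]+"</tt> compiled using <tt>REG_EXTENDED</tt>,
the string <tt>"a12 b345 c6"</tt> should yield three matches with a total
length of 6. By contrast, <tt>"^a"</tt> matches <tt>"aaa"</tt> only once,
because the later searches pass <code>REG_NOTBOL</code>.
</p>

<p>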
If a parenthesized subexpression did not participate in a
match, the <code>rm_so</code> and <code>rm_eo</code> fields for the
corresponding <code>pmatch</code> element are set to <code>-1</code>. Note
that when a multibyte character set is in effect, the submatch offsets are
given as byte offsets, not character offsets.
</p>

<p>
The <code>regexec()</code> functions return zero if a match was found,
otherwise they return <code>REG_NOMATCH</code> to indicate no match,
or <code>REG_ESPACE</code> to indicate that enough temporary memory
could not be allocated to complete the matching operation.
</p>



<h3>reguexec()</h3>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {
<br>
<font class="type">int</font> (*get_next_char)(<font
class="type">tre_char_t</font> *<font class="arg">c</font>, <font
class="type">unsigned int</font> *<font class="arg">pos_add</font>,
<font class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> (*rewind)(<font
class="type">size_t</font> <font class="arg">pos</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">int</font> (*compare)(<font
class="type">size_t</font> <font class="arg">pos1</font>, <font
class="type">size_t</font> <font class="arg">pos2</font>, <font
class="type">size_t</font> <font class="arg">len</font>, <font
class="type">void</font> *<font class="arg">context</font>);
<br>
<font class="type">void</font> *<font
class="arg">context</font>;
<br>
} <font class="type">tre_str_source</font>;
<br>
<br>
<font class="type">int</font> <font
class="func">reguexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">tre_str_source</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">nmatch</font>,
<br>
<font class="type">regmatch_t</font> <font
class="arg">pmatch</font>[], <font class="type">int</font>
<font class="arg">eflags</font>);
</code>
</div>

<p>
The <tt><font class="func">reguexec</font>()</tt> function works just
like the other <tt>regexec()</tt> functions, except that the input
string is read from user-specified callback functions instead of a
character array. This makes it possible, for example, to match
regexps over arbitrary user-specified data structures.
</p>

<p>
The <tt><font class="type">tre_str_source</font></tt> structure
contains the following fields:
</p>
<blockquote>
<dl>
<dt><tt>get_next_char</tt></dt>
<dd>This function must retrieve the next available character. If a
character is not available, the space pointed to by
<tt><font class="arg">c</font></tt> must be set to zero and the function
must return a nonzero value.
If a character is available, it must be stored
in the space pointed to by
<tt><font class="arg">c</font></tt>, the integer pointed to by
<tt><font class="arg">pos_add</font></tt> must be set to the
number of units advanced in the input (the value must be
<tt>&gt;=1</tt>), and zero must be returned.</dd>

<dt><tt>rewind</tt></dt>
<dd>This function must rewind the input stream to the position
specified by <tt><font class="arg">pos</font></tt>. Unless the regexp
uses back references, <tt>rewind</tt> is not needed and can be set to
<tt>NULL</tt>.</dd>

<dt><tt>compare</tt></dt>
<dd>This function compares two substrings of length <tt><font
class="arg">len</font></tt> in the input stream, starting at the
positions specified by <tt><font
class="arg">pos1</font></tt> and <tt><font
class="arg">pos2</font></tt>. If the substrings are equal,
<tt>compare</tt> must return zero; otherwise it must return a nonzero
value. Unless the regexp uses back references, <tt>compare</tt> is
not needed and can be set to <tt>NULL</tt>.</dd>

<dt><tt>context</tt></dt>
<dd>This is a context variable, passed as the last argument to
all of the above functions for keeping track of the internal state of
the user's code.</dd>

</dl>
</blockquote>

<p>
The position in the input stream is measured in <tt><font
class="type">size_t</font></tt> units. The current position is the
sum of the increments obtained from <tt><font
class="arg">pos_add</font></tt> (plus the position of the last
<tt>rewind</tt>, if any). The starting position is zero.
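</p>

<p>
As a sketch of the callback contract above, the functions below serve
characters from an in-memory byte buffer. The context structure
<tt>buf_source</tt> and the function names are hypothetical. The sketch
assumes a wide-character TRE build in which <tt>tre_char_t</tt> is
<tt>wchar_t</tt>, and it treats each byte as one character, so it is only
suitable for ASCII data. It always reports a <tt>pos_add</tt> of 1, which
makes positions equal to byte indices. Because it does not include the TRE
headers, the final wiring into <tt>tre_str_source</tt> appears only as a
comment.
</p>

```c
#include <stddef.h>   /* size_t, wchar_t */
#include <string.h>   /* memcmp */

/* Hypothetical context: an in-memory byte buffer read one byte at a
   time. Because pos_add is always 1, a position equals a byte index. */
struct buf_source {
    const char *data;
    size_t len;
    size_t pos;               /* index of the next byte to hand out */
};

/* get_next_char callback. Assumes tre_char_t is wchar_t; each byte is
   widened as-is, which is only guaranteed correct for ASCII data. */
int buf_next_char(wchar_t *c, unsigned int *pos_add, void *context)
{
    struct buf_source *src = context;

    if (src->pos >= src->len) {
        *c = 0;               /* no character available */
        return 1;
    }
    *c = (wchar_t)(unsigned char)src->data[src->pos];
    *pos_add = 1;             /* one unit consumed */
    src->pos++;
    return 0;
}

/* rewind callback: only needed for regexps with back references. */
void buf_rewind(size_t pos, void *context)
{
    struct buf_source *src = context;
    src->pos = pos;
}

/* compare callback: returns zero if the LEN units starting at POS1
   and POS2 are equal. It returns nonzero otherwise, including for
   out-of-range requests. Only needed for regexps with back
   references. */
int buf_compare(size_t pos1, size_t pos2, size_t len, void *context)
{
    const struct buf_source *src = context;

    if (pos1 > src->len || len > src->len - pos1 ||
        pos2 > src->len || len > src->len - pos2)
        return 1;
    return memcmp(src->data + pos1, src->data + pos2, len) != 0;
}

/* With TRE available, the callbacks would be wired up roughly like
   this (field order as in the tre_str_source declaration above):

       struct buf_source ctx = { text, strlen(text), 0 };
       tre_str_source s = { buf_next_char, buf_rewind, buf_compare, &ctx };
       int rc = reguexec(&preg, &s, nmatch, pmatch, 0);
*/
```

<p>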
Submatch
positions filled in the <tt><font class="arg">pmatch</font>[]</tt>
array are, of course, given using positions computed in this way.
</p>

<p>
For an example of how to use <tt>reguexec()</tt>, see the
<tt>tests/test-str-source.c</tt> file in the TRE source code
distribution.
</p>

<h2>The approximate matching functions</h2>
<a name="regaexec"></a>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">int</font>
<font class="arg">cost_ins</font>;<br>
<font class="type">int</font>
<font class="arg">cost_del</font>;<br>
<font class="type">int</font>
<font class="arg">cost_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_cost</font>;<br><br>
<font class="type">int</font>
<font class="arg">max_ins</font>;<br>
<font class="type">int</font>
<font class="arg">max_del</font>;<br>
<font class="type">int</font>
<font class="arg">max_subst</font>;<br>
<font class="type">int</font>
<font class="arg">max_err</font>;<br>
} <font class="type">regaparams_t</font>;<br>
<br>
<font class="qual">typedef struct</font> {<br>
<font class="type">size_t</font>
<font class="arg">nmatch</font>;<br>
<font class="type">regmatch_t</font>
*<font class="arg">pmatch</font>;<br>
<font class="type">int</font>
<font class="arg">cost</font>;<br>
<font class="type">int</font>
<font class="arg">num_ins</font>;<br>
<font class="type">int</font>
<font class="arg">num_del</font>;<br>
<font class="type">int</font>
<font class="arg">num_subst</font>;<br>
} <font class="type">regamatch_t</font>;<br>
<br>
<font class="type">int</font> <font
class="func">regaexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">reganexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">char</font> *<font class="arg">string</font>,
<font class="type">size_t</font> <font class="arg">len</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font> <font class="arg">eflags</font>);
<br>
<font class="type">int</font> <font
class="func">regawexec</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font
class="arg">preg</font>, <font class="qual">const</font> <font
class="type">wchar_t</font> *<font class="arg">string</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
<font class="type">int</font>
<font class="func">regawnexec</font>(
<font class="qual">const</font>
<font class="type">regex_t</font>
*<font class="arg">preg</font>,
<font class="qual">const</font>
<font class="type">wchar_t</font>
*<font class="arg">string</font>,
<font class="type">size_t</font>
<font class="arg">len</font>,<br>
<font class="type">regamatch_t</font>
*<font class="arg">match</font>,
<font class="type">regaparams_t</font>
<font class="arg">params</font>,
<font class="type">int</font>
<font class="arg">eflags</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">regaexec</font>()</tt> function searches for
the best match in <tt><font class="arg">string</font></tt>
against the compiled regexp <tt><font
class="arg">preg</font></tt>, initialized by a previous call to
any one of the <a href="#regcomp"><tt>regcomp</tt></a> functions.
</p>

<p>
The <tt><font class="func">reganexec</font>()</tt> function is like
<tt><font class="func">regaexec</font>()</tt>, but <tt><font
class="arg">string</font></tt> is not terminated by a null byte.
Instead, the <tt><font class="arg">len</font></tt> argument is used to
tell the length of the string, and the string may contain null
bytes.
The <tt><font class="func">regawexec</font>()</tt> and
<tt><font class="func">regawnexec</font>()</tt> functions work like
<tt><font class="func">regaexec</font>()</tt> and <tt><font
class="func">reganexec</font>()</tt>, respectively, but take a
wide-character (<tt><font class="type">wchar_t</font></tt>) string instead
of a byte string.
</p>

<p>
The <tt><font class="arg">eflags</font></tt> argument is the same as for
the <a href="#regexec"><tt>regexec()</tt></a> functions.
</p>

<p>
The <tt><font class="arg">params</font></tt> structure controls the
approximate matching parameters:
</p>
<blockquote>
<dl>
<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_ins</font></tt></dt>
<dd>The default cost of an inserted character, that is, an extra
character in <tt><font class="arg">string</font></tt>.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_del</font></tt></dt>
<dd>The default cost of a deleted character, that is, a character
missing from <tt><font class="arg">string</font></tt>.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">cost_subst</font></tt></dt>
<dd>The default cost of a substituted character.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_cost</font></tt></dt>
<dd>The maximum allowed cost of a match.
If this is set to zero,
only exact matches are searched for, and the results are equivalent to
those returned by the <tt>regexec()</tt> functions.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_ins</font></tt></dt>
<dd>Maximum allowed number of inserted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_del</font></tt></dt>
<dd>Maximum allowed number of deleted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_subst</font></tt></dt>
<dd>Maximum allowed number of substituted characters.</dd>

<dt><tt><font class="type">int</font></tt>
<tt><font class="arg">max_err</font></tt></dt>
<dd>Maximum allowed number of errors (inserts + deletes +
substitutes).</dd>
</dl>
</blockquote>

<p>
The <tt><font class="arg">match</font></tt> argument points to a
<tt><font class="type">regamatch_t</font></tt> structure. The
<tt><font class="arg">nmatch</font></tt> and <tt><font
class="arg">pmatch</font></tt> fields must be filled in by the caller. If
<code>REG_NOSUB</code> was used when compiling the regexp, or
<code>match-&gt;nmatch</code> is zero, or
<code>match-&gt;pmatch</code> is <code>NULL</code>, the
<code>match-&gt;pmatch</code> field is ignored. Otherwise, the
submatches corresponding to the parenthesized subexpressions are
stored in the elements of <code>match-&gt;pmatch</code>, which must have
room for at least <code>match-&gt;nmatch</code> elements.
The <code>match-&gt;cost</code> field is set to the cost of the match
found, and the <code>match-&gt;num_ins</code>,
<code>match-&gt;num_del</code>, and <code>match-&gt;num_subst</code>
fields are set to the number of inserts, deletes, and substitutes in
the match, respectively.
</p>

<p>
The <tt>regaexec()</tt> functions return zero if a match with a cost
of at most <code>params-&gt;max_cost</code> was found. Otherwise,
they return <code>REG_NOMATCH</code> if no match was found, or
<code>REG_ESPACE</code> if not enough temporary memory could be
allocated to complete the matching operation.
</p>

<h2>Miscellaneous</h2>

<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">int</font> <font
class="func">tre_have_backrefs</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
<font class="type">int</font> <font
class="func">tre_have_approx</font>(<font class="qual">const</font>
<font class="type">regex_t</font> *<font class="arg">preg</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_have_backrefs</font>()</tt> and
<tt><font class="func">tre_have_approx</font>()</tt> functions return
1 if the compiled pattern has back references or uses approximate
matching, respectively, and 0 if not.
</p>


<h2>Checking build-time options</h2>

<a name="tre_config"></a>
<div class="code">
<code>
#include &lt;tre/regex.h&gt;
<br>
<br>
<font class="type">char</font> *<font
class="func">tre_version</font>(<font class="type">void</font>);
<br>
<font class="type">int</font> <font
class="func">tre_config</font>(<font class="type">int</font> <font
class="arg">query</font>, <font class="type">void</font> *<font
class="arg">result</font>);
<br>
</code>
</div>

<p>
The <tt><font class="func">tre_config</font>()</tt> function
retrieves information about which optional features have been
compiled into the TRE library, and about other parameters that may
change between releases.
</p>

<p>
The <tt><font class="arg">query</font></tt> argument is an integer
that specifies which information is requested. The <tt><font
class="arg">result</font></tt> argument is a pointer to a variable
where the information is returned. A call to
<tt><font class="func">tre_config</font>()</tt> returns zero if <tt><font
class="arg">query</font></tt> was recognized, and <code>REG_NOMATCH</code>
otherwise.
</p>

<p>
The following values are recognized for <tt><font
class="arg">query</font></tt>:
</p>

<blockquote>
<dl>
<dt><tt>TRE_CONFIG_APPROX</tt></dt>
<dd>The result is an integer that is set to one if approximate
matching support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_WCHAR</tt></dt>
<dd>The result is an integer that is set to one if wide-character
support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_MULTIBYTE</tt></dt>
<dd>The result is an integer that is set to one if multibyte character
set support is available, zero if not.</dd>
<dt><tt>TRE_CONFIG_SYSTEM_ABI</tt></dt>
<dd>The result is an integer that is set to one if TRE has been
compiled to be compatible with the system regex ABI, zero if not.</dd>
<dt><tt>TRE_CONFIG_VERSION</tt></dt>
<dd>The result is a pointer to a static character string that gives
the version of the TRE library.</dd>
</dl>
</blockquote>


<p>
The <tt><font class="func">tre_version</font>()</tt> function returns
a short, human-readable character string that shows the software name,
version, and license.
</p>

<h2>Preprocessor definitions</h2>

<p>The header <tt>&lt;tre/regex.h&gt;</tt> defines certain
C preprocessor symbols.</p>

<h3>Version information</h3>

<p>The following definitions may be useful for checking whether a
sufficiently recent version is being used.
In Autoconf scripts, however, the
<tt>pkg-config</tt> tool is the recommended way to perform version and
other checks.</p>

<blockquote>
<dl>
<dt><tt>TRE_VERSION</tt></dt>
<dd>The version string.</dd>

<dt><tt>TRE_VERSION_1</tt></dt>
<dd>The major version number (first part of the version string).</dd>

<dt><tt>TRE_VERSION_2</tt></dt>
<dd>The minor version number (second part of the version string).</dd>

<dt><tt>TRE_VERSION_3</tt></dt>
<dd>The micro version number (third part of the version string).</dd>

</dl>
</blockquote>

<h3>Features</h3>

<p>The following definitions may be useful for checking whether all
necessary features are enabled. Use these only if compile-time
checking suffices, that is, when linking statically with TRE. When
linking dynamically, use <a href="#tre_config"><tt>tre_config()</tt></a>
instead.</p>

<blockquote>
<dl>
<dt><tt>TRE_APPROX</tt></dt>
<dd>This is defined if approximate matching support is enabled. The
prototypes for the approximate matching functions are declared only if
<tt>TRE_APPROX</tt> is defined.</dd>

<dt><tt>TRE_WCHAR</tt></dt>
<dd>This is defined if wide-character support is enabled. The
prototypes for the wide-character matching functions are declared only if
<tt>TRE_WCHAR</tt> is defined.</dd>

<dt><tt>TRE_MULTIBYTE</tt></dt>
<dd>This is defined if multibyte character set support is enabled.
If this is not defined, locale settings are ignored and the default
locale is used when parsing regexps and matching strings.</dd>

</dl>
</blockquote>
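<p>The <tt>TRE_VERSION_1</tt>, <tt>TRE_VERSION_2</tt>, and
<tt>TRE_VERSION_3</tt> definitions described above can be compared part
by part to decide whether a sufficiently recent version is available.
The following is a minimal sketch, not part of the TRE API: the helper
<tt>version_at_least()</tt> is hypothetical, and it assumes the three
definitions expand to integer constants.</p>

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the TRE API.
 * Returns true if version have_major.have_minor.have_micro is the same as
 * or newer than need_major.need_minor.need_micro. The parts are compared
 * in order, so a higher major number wins regardless of the minor and
 * micro numbers, and so on.
 */
bool version_at_least(int have_major, int have_minor, int have_micro,
                      int need_major, int need_minor, int need_micro)
{
    if (have_major != need_major)
        return have_major > need_major;
    if (have_minor != need_minor)
        return have_minor > need_minor;
    return have_micro >= need_micro;
}
```

<p>In a program that includes <tt>&lt;tre/regex.h&gt;</tt>, a call such
as <tt>version_at_least(TRE_VERSION_1, TRE_VERSION_2, TRE_VERSION_3, 0, 8, 0)</tt>
would check against a (hypothetical) minimum of 0.8.0. Like the feature
definitions, this reflects only the header used at compile time, not the
library that is loaded at run time.</p>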