$XFree86: xc/programs/xedit/lisp/re/README,v 1.3 2002/09/23 01:25:41 paulo Exp $

LAST UPDATED: $Date: 2008/07/30 04:16:29 $

	This is a small regex library for quickly matching tokens in text. It
was written for xedit and its syntax highlighting code. It is not compliant
with IEEE Std 1003.2. It is intended for cases where very fast matching is
required and exotic patterns are not used.

	To see the kind of patterns this library is expected to handle, look at
the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and at the
samples in the file tests.txt. The samples include comments on patterns
that will not work or may give incorrect results.

	The library is not built upon Henry Spencer's standard regex library.
It was written completely from scratch, although its syntax is heavily
based on that library. The only reason this one exists is that,
unfortunately, the standard version does not meet xedit's requirements.
Anyway, I would like to thank Henry for his regex library; it is a really
useful tool.

	A short description of the supported tokens:

                        M A T C H I N G
------------------------------------------------------------------------
.               Any character (won't match newline if compiled with RE_NEWLINE)
\w              Any word character (shortcut for [a-zA-Z0-9_])
\W              Not a word character (shortcut for [^a-zA-Z0-9_])
\d              A decimal digit
\D              Not a decimal digit
\s              A space
\S              Not a space
\l              A lower case letter
\u              An upper case letter
\c              A control character, currently the range 1-32 (minus tab)
\C              Not a control character
\o              An octal digit
\O              Not an octal digit
\x              A hexadecimal digit
\X              Not a hexadecimal digit
\<              Beginning of a word (matches an empty string)
\>              End of a word (matches an empty string)
^               Beginning of a line (matches an empty string)
$               End of a line (matches an empty string)
[...]
                Matches one of the characters inside the brackets.
                Ranges are specified by separating two characters
                with "-". If the first character is "^", the set
                matches only characters that are not in it. To
                include a "]", make it the first character; to
                include a "-", make it the last.
\1 to \9        Backreference: matches the text that was matched by
                a group, that is, by the pattern inside "(" and ")".


                      O P E R A T O R S
------------------------------------------------------------------------
()              The text matched by the pattern inside is saved for
                backreferences; parentheses are also used to group
                patterns.
|               Alternation: chooses between different possibilities,
                like a character range, but the alternatives may be
                patterns of different lengths.


                    R E P E T I T I O N
------------------------------------------------------------------------
<re>*           <re> may occur any number of times, including zero
<re>+           <re> must occur at least once
<re>?           <re> is optional
<re>{<e>}       <re> must occur exactly <e> times
<re>{<n>,}      <re> must occur at least <n> times
<re>{,<m>}      <re> must occur at most <m> times
<re>{<n>,<m>}   <re> must occur at least <n> times, but no more than <m>


	Note that "." is a special character. When it is used with a
repetition operator, its meaning changes completely. For example, ".*"
matches anything up to the end of the input string. If the pattern was
compiled with RE_NEWLINE, it matches anything except a newline.


	Limitations:

o Only minimal matches are supported. The engine has only one level of
  "backtracking". For that reason it only does minimal matches, so that
  backreferences work properly and so that matching does not fail
  depending on the input.

o Only one level of "grouping". For example, with the pattern:
        (a(b)c)
  if "abc" is anywhere in the input, it will be in "\1", but there will be
  no "\2" for "b".

o Some "special repetitions" were not implemented. These are:
        .{<e>}
        .{<n>,}
        .{,<m>}
        .{<n>,<m>}

o Some patterns will never match, for example:
        \w*\d
  "\w*" already includes every possible match of "\d", so "\d" will only
  be tested where "\w*" fails to match. There are no plans to make such
  patterns work.


	Some of these limitations may be addressed in future versions of the
library. However, that is not what the library is meant to do. Handling
these cases correctly would probably make the library slower, which would
defeat the reason it exists in the first place.

	If you need "true" regex, then this library is not for you. If all you
need is support for very quickly finding simple patterns, then this
library can be a very powerful tool. On some patterns it can run more than
200 times faster than "true" regex implementations! That is the reason it
was written.



	Send comments and code to me (paulo@XFree86.Org) or to the XFree86
mailing/patch lists.

--
Paulo
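	The C sketch below roughly illustrates the "\w*\d" limitation. It
contrasts two toy matchers for a pattern of the form <class>*<class>. This
is not code from this library. The char_class type and the functions
possessive_match, backtracking_match, is_word, is_digit and is_lower are
hypothetical. The first matcher only assumes the behavior described in the
Limitations section: a starred class keeps everything it consumes, and the
following token is tried only where the repetition stopped.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the \w, \d and \l tokens (not the library's API). */
typedef bool (*char_class)(int c);

static bool is_word(int c)  { return isalnum(c) || c == '_'; }  /* [a-zA-Z0-9_] */
static bool is_digit(int c) { return isdigit(c) != 0; }          /* [0-9] */
static bool is_lower(int c) { return islower(c) != 0; }          /* [a-z] */

/* Toy model of the described limitation: "<star>*" takes every character
 * it can and never gives any back, so "<next>" is only tried where the
 * repetition stopped. Returns true if "<star>*<next>" matches anywhere. */
static bool possessive_match(const char *s, char_class star, char_class next)
{
    size_t n = strlen(s);
    for (size_t start = 0; start <= n; start++) {
        size_t i = start;
        while (i < n && star((unsigned char)s[i]))
            i++;
        if (i < n && next((unsigned char)s[i]))
            return true;
    }
    return false;
}

/* For comparison, a conventional backtracking matcher: after taking the
 * longest run, "<star>*" gives characters back one at a time until
 * "<next>" fits. */
static bool backtracking_match(const char *s, char_class star, char_class next)
{
    size_t n = strlen(s);
    for (size_t start = 0; start <= n; start++) {
        size_t end = start;
        while (end < n && star((unsigned char)s[end]))
            end++;
        for (size_t i = end + 1; i-- > start; )  /* i = end, end-1, ..., start */
            if (i < n && next((unsigned char)s[i]))
                return true;
    }
    return false;
}
```

	Consider the input "abc1". possessive_match(s, is_word, is_digit)
returns false, because the digit was already consumed by "\w*".
backtracking_match with the same classes returns true. If is_lower (the
"\l" class) is the starred class, the possessive version also matches,
because lower case letters and digits do not overlap. This suggests that
patterns work best with this kind of engine when the repeated class cannot
match the token that follows it.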