$XFree86: xc/programs/xedit/lisp/re/README,v 1.3 2002/09/23 01:25:41 paulo Exp $

LAST UPDATED: $Date: 2008/07/30 04:16:29 $

	This is a small regex library for fast matching of tokens in text. It was
built to be used by xedit and its syntax highlighting code. It is not
compliant with IEEE Std 1003.2. It is meant to be used where very fast
matching is required and exotic patterns will not be used.

	To understand what kind of patterns this library is expected to be used
with, see the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and
the samples in the file tests.txt, which include comments on patterns that
will not work or may give incorrect results.

	The library is not built upon the standard regex library by Henry
Spencer. It was written completely from scratch, although its syntax is
heavily based on that library. The only reason for it to exist is that,
unfortunately, the standard version does not fit the requirements of xedit.
Anyway, I would like to thank Henry for his regex library; it is a really
useful tool.

	Small description of understood tokens:

		M A T C H I N G
------------------------------------------------------------------------
.               Any character (won't match newline if compiled with RE_NEWLINE)
\w              Any word letter (shortcut to [a-zA-Z0-9_])
\W              Not a word letter (shortcut to [^a-zA-Z0-9_])
\d              A decimal digit
\D              Not a decimal digit
\s              A space
\S              Not a space
\l              A lower case letter
\u              An upper case letter
\c              A control character, currently the range 1-32 (minus tab)
\C              Not a control character
\o              An octal digit
\O              Not an octal digit
\x              A hexadecimal digit
\X              Not a hexadecimal digit
\<              Beginning of a word (matches an empty string)
\>              End of a word (matches an empty string)
^               Beginning of a line (matches an empty string)
$               End of a line (matches an empty string)
[...]           Matches one of the characters inside the brackets.
                Ranges are specified by separating two characters with "-".
                If the first character is "^", matches only if the
                character is not in the set. To add a "]", make it
                the first character, and to add a "-", make it the last.
\1 to \9        Backreference, matches the text that was matched by a group,
                that is, the text that was matched by the pattern inside
                "(" and ")".


		O P E R A T O R S
------------------------------------------------------------------------
()              The text matched by the pattern inside is saved for use
                by a backreference; the parentheses also group patterns.
|               Alternation, allows choosing between different possibilities,
                like character ranges, but allows patterns of different lengths.


		R E P E T I T I O N
------------------------------------------------------------------------
<re>*           <re> may occur any number of times, including zero
<re>+           <re> must occur at least once
<re>?           <re> is optional
<re>{<e>}       <re> must occur exactly <e> times
<re>{<n>,}      <re> must occur at least <n> times
<re>{,<m>}      <re> must not occur more than <m> times
<re>{<n>,<m>}   <re> must occur at least <n> times, but no more than <m>


	Note that "." is a special character, and when used with a repetition
operator it completely changes its meaning. For example, ".*" matches
anything up to the end of the input string (unless the pattern was compiled
with RE_NEWLINE, in which case it matches anything but a newline).


	Limitations:

o Only minimal matches are supported. The engine has only one level of
  "backtracking", so it only does minimal matches, to allow backreferences
  to work properly and to avoid failing to match depending on the input.

o Only one level of "grouping". For example, with the pattern:
	(a(b)c)
  If "abc" is anywhere in the input, it will be in "\1", but there will
  be no "\2" for "b".

o Some "special repetitions" are not implemented. These are:
	.{<e>}
	.{<n>,}
	.{,<m>}
	.{<n>,<m>}

o Some patterns will never match, for example:
	\w*\d
  Since "\w*" already matches everything "\d" can match, "\d" is only
  tested at the point where "\w*" stops matching, and the character there
  can never be a digit. There are no plans to make such patterns work.


	Some of these limitations may be addressed in future versions of the
library, but handling them is not what the library is meant to do, and
adding support for correct handling of these cases would probably make the
library slower, which would defeat the reason it exists in the first place.

	If you need a "true" regex library, then this library is not for you,
but if all you need is a way to find simple patterns very quickly, then this
library can be a very powerful tool: on some patterns it can run more than
200 times faster than "true" regex implementations! That is the reason it
was written.



	Send comments and code to me (paulo@XFree86.Org) or to the XFree86
mailing/patch lists.

--
Paulo