$XFree86: xc/programs/xedit/lisp/re/README,v 1.3 2002/09/23 01:25:41 paulo Exp $

LAST UPDATED:	$Date: 2008/07/30 04:16:29 $

  This is a small regex library for fast matching of tokens in text. It was
built to be used by xedit and its syntax highlighting code. It is not
compliant with IEEE Std 1003.2, but is intended for use where very fast
matching is required and exotic patterns will not be used.

  To see the kind of patterns this library is meant to be used with, look
at the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and at the
samples in the file tests.txt, which include comments on patterns that will
not work or may give incorrect results.

  The library is not built upon Henry Spencer's standard regex library; it
was written completely from scratch, although its syntax is heavily based on
that library. The only reason it exists is that, unfortunately, the standard
version does not meet the requirements of xedit. Anyway, I would like to
thank Henry for his regex library; it is a very useful tool.

  A short description of the recognized tokens:

		M A T C H I N G
------------------------------------------------------------------------
.		Any character (won't match newline if compiled with RE_NEWLINE)
\w		Any word character (shortcut for [a-zA-Z0-9_])
\W		Not a word character (shortcut for [^a-zA-Z0-9_])
\d		A decimal digit
\D		Not a decimal digit
\s		A space
\S		Not a space
\l		A lower case letter
\u		An upper case letter
\c		A control character, currently the range 1-32 (minus tab)
\C		Not a control character
\o		An octal digit
\O		Not an octal digit
\x		A hexadecimal digit
\X		Not a hexadecimal digit
\<		Beginning of a word (matches an empty string)
\>		End of a word (matches an empty string)
^		Beginning of a line (matches an empty string)
$		End of a line (matches an empty string)
[...]		Matches one of the characters inside the brackets.
		Ranges are specified by separating two characters with "-".
		If the first character is "^", matches only if the
		character is not in the set. To include a "]", make it
		the first character; to include a "-", make it the last.
\1 to \9	Backreference; matches the text that was matched by a
		group, that is, the text matched by the pattern inside
		"(" and ")".
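
  As an illustration only, the C sketch below spells out a few of the
escape classes and the [...] rules from the table above. It is not taken
from the library's source: the function names (is_word, is_hex,
bracket_match, ...) are hypothetical, \c follows the table literally
(1-32, minus tab), and \s, \l and \u are left out because the table does
not define them precisely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicates for some escape classes (not the library's
 * code).  Callers pass a character value as an unsigned char. */
static bool is_word(int c)                      /* \w: [a-zA-Z0-9_] */
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}
static bool is_digit(int c) { return c >= '0' && c <= '9'; }    /* \d */
static bool is_octal(int c) { return c >= '0' && c <= '7'; }    /* \o */
static bool is_hex(int c)                                       /* \x */
{
    return is_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}
/* \c, taken literally from the table: 1-32 except tab. */
static bool is_ctrl(int c) { return c >= 1 && c <= 32 && c != '\t'; }

/* [...]: `p` points just past the opening '['.  Returns 1 if `c` is
 * accepted, 0 if not, -1 if the closing ']' is missing.  On success,
 * *end points just past the closing ']'. */
static int bracket_match(const char *p, int c, const char **end)
{
    bool negate = false, found = false;

    assert(p != NULL && end != NULL);
    if (*p == '^') {                  /* leading '^' negates the set */
        negate = true;
        p++;
    }
    if (*p == ']') {                  /* a ']' in first position is literal */
        found = (c == ']');
        p++;
    }
    while (*p != '\0' && *p != ']') {
        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            /* range such as "a-z" */
            if (c >= (unsigned char)p[0] && c <= (unsigned char)p[2])
                found = true;
            p += 3;
        } else {
            /* single character; a trailing '-' ends up here */
            if (c == (unsigned char)p[0])
                found = true;
            p++;
        }
    }
    if (*p != ']')
        return -1;
    *end = p + 1;
    return found != negate;
}
```

  For example, bracket_match("a-z-]", '-', &end) accepts the "-" because it
is the last character of the set, not the middle of a range.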


		O P E R A T O R S
------------------------------------------------------------------------
()		The text matched by the pattern inside is saved for use by
		a backreference; also used to group patterns.
|		Alternation; chooses between different possibilities, like
		a character range, but the alternatives may have different
		lengths.
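
  A hypothetical sketch of how "(...)" and "|" work together, restricted
to literal alternatives: it matches a pattern like "(ab|a|xyz)\1" at the
start of a string and records what the group captured, so that "\1" can
require the same text again. The function name, and the choice to try the
next alternative when "\1" fails, are assumptions rather than the
library's documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: match "(A|B|...)\1" at the start of `s`, where
 * every alternative is a literal string.  Alternatives may have
 * different lengths, which a [...] range cannot express, and are tried
 * in the order given.  On success, *group_len is the length of the
 * text captured by "(...)" and *match_len the length of the whole
 * match (the group plus its "\1" repetition). */
static bool match_alt_backref(const char *s, const char *const *alts,
                              size_t nalts, size_t *group_len,
                              size_t *match_len)
{
    assert(s != NULL && alts != NULL);
    assert(group_len != NULL && match_len != NULL);
    for (size_t i = 0; i < nalts; i++) {
        size_t len = strlen(alts[i]);

        if (strncmp(s, alts[i], len) != 0)
            continue;             /* alternative i does not match here */
        /* "\1" must repeat exactly the text the group captured. */
        if (strncmp(s + len, s, len) == 0) {
            *group_len = len;
            *match_len = 2 * len;
            return true;
        }
        /* Assumption: fall through and try the next alternative. */
    }
    return false;
}
```

  With the alternatives "ab", "a" and "xyz", the input "aab" matches with a
one-character group, while "abac" does not match at all.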


		R E P E T I T I O N
------------------------------------------------------------------------
<re>*		<re> may occur any number of times, including zero
<re>+		<re> must occur at least once
<re>?		<re> is optional
<re>{<e>}	<re> must occur exactly <e> times
<re>{<n>,}	<re> must occur at least <n> times
<re>{,<m>}	<re> must not occur more than <m> times
<re>{<n>,<m>}	<re> must occur at least <n> times, but no more than <m> times
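
  Every repetition operator in the table reduces to a minimum and a maximum
count. The sketch below parses the operators into that form. It is
hypothetical (parse_rep, parse_count, REP_INF and the size limit are made
up for the example) and says nothing about how many characters the library
actually consumes when matching.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define REP_INF INT_MAX   /* hypothetical "no upper bound" marker */

/* Read a decimal count at *q and advance *q past it.  Returns -1 if
 * there are no digits and -2 if the count exceeds a made-up limit. */
static int parse_count(const char **q)
{
    int v = -1;

    while (isdigit((unsigned char)**q)) {
        if (v < 0)
            v = 0;
        if (v > 9999)
            return -2;            /* hypothetical sanity limit */
        v = v * 10 + (**q - '0');
        (*q)++;
    }
    return v;
}

/* Parse a repetition operator at `p` into [*min, *max].  Returns the
 * number of characters consumed, or 0 if `p` holds no valid operator. */
static int parse_rep(const char *p, int *min, int *max)
{
    const char *q;
    int n, m;

    assert(p != NULL && min != NULL && max != NULL);
    switch (*p) {
    case '*': *min = 0; *max = REP_INF; return 1;
    case '+': *min = 1; *max = REP_INF; return 1;
    case '?': *min = 0; *max = 1;       return 1;
    case '{': break;
    default:  return 0;
    }
    q = p + 1;
    n = parse_count(&q);
    if (n == -2)
        return 0;
    if (*q == '}') {              /* {e}: exactly e times */
        if (n < 0)
            return 0;             /* "{}" is not valid */
        *min = *max = n;
        return (int)(q - p) + 1;
    }
    if (*q != ',')
        return 0;
    q++;
    m = parse_count(&q);
    if (m == -2 || *q != '}' || (n < 0 && m < 0))
        return 0;                 /* bad count, unterminated, or "{,}" */
    *min = n < 0 ? 0 : n;         /* {,m}: no lower bound */
    *max = m < 0 ? REP_INF : m;   /* {n,}: no upper bound */
    if (*min > *max)
        return 0;
    return (int)(q - p) + 1;
}
```

  Once an operator is in this form, a run of <count> occurrences of <re>
satisfies it exactly when min <= count <= max.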


  Note that "." is a special character, and when combined with a repetition
operator its meaning changes completely. For example, ".*" matches anything
up to the end of the input string (unless the pattern was compiled with
RE_NEWLINE, in which case it matches anything except a newline).
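
  A minimal sketch of the ".*" rule just described, with a boolean standing
in for RE_NEWLINE. The helper names are hypothetical; only the stopping
condition (end of input, or a newline when the flag is set) comes from the
text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Does "." accept character `c`?  `newline_flag` is a hypothetical
 * stand-in for compiling the pattern with RE_NEWLINE. */
static bool dot_matches(int c, bool newline_flag)
{
    if (c == '\0')
        return false;             /* end of the input string */
    return !(newline_flag && c == '\n');
}

/* Number of characters ".*" covers starting at `s`: everything up to
 * the end of the input, or up to the first newline with the flag set. */
static size_t dot_star_span(const char *s, bool newline_flag)
{
    size_t n = 0;

    assert(s != NULL);
    while (dot_matches((unsigned char)s[n], newline_flag))
        n++;
    return n;
}
```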


  Limitations:

o Only minimal matches are supported. The engine has only one level of
  "backtracking", so it also does only minimal matches, to allow
  backreferences to work properly and to avoid failing to match depending
  on the input.

o Only one level of "grouping". For example, with the pattern:
	(a(b)c)
  if "abc" is anywhere in the input, it will be in "\1", but there will be
  no "\2" for "b".

o Some "special repetitions" are not implemented. These are:
	.{<e>}
	.{<n>,}
	.{,<m>}
	.{<n>,<m>}

o Some patterns will never match, for example:
	\w*\d
  Since "\w*" already matches every character "\d" can match, "\d" is only
  tested where "\w*" stopped, at a position where no digit can occur. There
  are no plans to make such patterns work.
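
  The "\w*\d" case can be reproduced with a few lines of C. The sketch below
is modelled on the explanation above, not on the library's code: one
function lets "\w*" keep every word character and only then tries "\d",
while the other gives characters back one at a time like a backtracking
engine. Both match at the start of the string only, and all names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool is_word(int c)                      /* \w: [a-zA-Z0-9_] */
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}
static bool is_digit(int c) { return c >= '0' && c <= '9'; }    /* \d */

/* "\w*\d" as described above: "\w*" keeps every word character it
 * sees and "\d" is tried only where "\w*" stopped.  Since every digit
 * is also a word character, this can never succeed. */
static bool wstar_d_no_giveback(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (is_word((unsigned char)s[i]))
        i++;
    return is_digit((unsigned char)s[i]);
}

/* The same pattern with full backtracking: after "\w*" stops, give
 * back one character at a time until "\d" matches or nothing is left. */
static bool wstar_d_backtracking(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (is_word((unsigned char)s[i]))
        i++;
    for (;;) {
        if (is_digit((unsigned char)s[i]))
            return true;
        if (i == 0)
            return false;
        i--;
    }
}
```

  With the input "abc1" the first function fails and the second succeeds,
because only the second can hand the "1" back to "\d".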


  Some of these limitations may be addressed in future versions of the
library, but handling them is not what the library is meant to do, and
adding correct support for them would probably make the library slower,
which would defeat the reason it exists in the first place.

  If you need "true" regex, then this library is not for you. But if all
you need is support for very quickly finding simple patterns, then this
library can be a very powerful tool: on some patterns it can run more than
200 times faster than "true" regex implementations! That is the reason it
was written.



  Send comments and code to me (paulo@XFree86.Org) or to the XFree86
mailing/patch lists.

--
Paulo