README revision 1.1
This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

    Copyright (c) 1986 by University of Toronto.
    Written by Henry Spencer. Not derived from licensed software.

    Permission is granted to anyone to use this software for any
    purpose on any computer system, and to redistribute it freely,
    subject to the following restrictions:

    1. The author is not responsible for the consequences of use of
       this software, no matter how awful, even if they arise
       from defects in it.

    2. The origin of this software must not be misrepresented, either
       by explicit claim or by omission.

    3. Altered versions must be plainly marked as such, and must not
       be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software. Even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway. This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

This stuff should be pretty portable, given appropriate option settings.
If your chars have less than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.