# $NetBSD: README,v 1.2 1998/01/09 04:12:00 perry Exp $

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

    Copyright (c) 1986 by University of Toronto.
    Written by Henry Spencer. Not derived from licensed software.

    Permission is granted to anyone to use this software for any
    purpose on any computer system, and to redistribute it freely,
    subject to the following restrictions:

    1. The author is not responsible for the consequences of use of
        this software, no matter how awful, even if they arise
        from defects in it.

    2. The origin of this software must not be misrepresented, either
        by explicit claim or by omission.

    3. Altered versions must be plainly marked as such, and must not
        be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8. It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software. Even though U of T is a V8 licensee. This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points. I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically: put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r". This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests. If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile        instructions to make everything
regexp.3        manual page
regexp.h        header file, for /usr/include
regexp.c        source for regcomp() and regexec()
regsub.c        source for regsub()
regerror.c      source for default regerror()
regmagic.h      internal header file
try.c           source for test program
timer.c         source for timing program
tests           test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them. In theory, anyway.
This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly. In general,
if you want blazing speed you're in the wrong place. Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work. But if you want to use regular
expressions a little bit in something else, you're in luck. Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.

This stuff should be pretty portable, given appropriate option settings.
If your chars have less than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized. There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line. Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.