README revision 1.1
This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).
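
The binary-compatibility remark rests on a general property of C struct
layout: two declarations that begin with the same members, in the same order,
put those members at the same offsets on ordinary ABIs.  A minimal sketch
follows.  The two struct names and their private members are hypothetical,
and the startp/endp arrays and NSUBEXP are assumed from the usual V8-style
regexp.h rather than quoted from this README; check your own copy.

```c
#include <assert.h>
#include <stddef.h>

#define NSUBEXP 10	/* assumed number of subexpression slots */

/* Hypothetical layout A: public members first, then private state. */
struct regexp_a {
	char *startp[NSUBEXP];	/* public: start of each matched substring */
	char *endp[NSUBEXP];	/* public: end of each matched substring */
	char private_a[32];	/* hypothetical internals */
};

/* Hypothetical layout B: same public prefix, different internals. */
struct regexp_b {
	char *startp[NSUBEXP];
	char *endp[NSUBEXP];
	int private_b[4];	/* hypothetical internals */
	char program[1];	/* hypothetical trailing storage */
};

/* Returns 1 if the public members sit at the same offsets in both layouts. */
int public_prefix_matches(void)
{
	/* The first member of any struct is at offset 0; C guarantees that. */
	assert(offsetof(struct regexp_a, startp) == 0);
	assert(offsetof(struct regexp_b, startp) == 0);
	/* Equal offsets for the second member hold on ordinary ABIs. */
	return offsetof(struct regexp_a, endp) == offsetof(struct regexp_b, endp);
}
```

Code that touches only the leading members does not care what follows them,
which is presumably the sense of "sort of" above.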

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS so that the compiler can find the header
in the source directory.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
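
To make the trade-off concrete: one common way to run a nondeterministic
matcher is by backtracking, trying one possibility and coming back to try
the next if the rest of the pattern fails.  The sketch below is NOT this
package's code and handles far less than egrep syntax (only literal
characters, ".", "*", "^" and "$").  It follows the well-known tiny matcher
style from Kernighan and Pike, and the name tiny_match is made up for
illustration.  It shows why such matchers are small and need almost no
compilation step, and also where the execution cost comes from: the loops
that retry from each text position and for each repetition count.

```c
#include <assert.h>
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Try "c*" followed by re at text, shortest repetition first. */
static int match_star(int c, const char *re, const char *text)
{
	do {
		/* Backtracking point: does the rest match with this many c's? */
		if (match_here(re, text))
			return 1;
	} while (*text != '\0' && (*text++ == c || c == '.'));
	return 0;
}

/* Match re at the beginning of text. */
static int match_here(const char *re, const char *text)
{
	if (re[0] == '\0')
		return 1;
	if (re[1] == '*')
		return match_star(re[0], re + 2, text);
	if (re[0] == '$' && re[1] == '\0')
		return *text == '\0';
	if (*text != '\0' && (re[0] == '.' || re[0] == *text))
		return match_here(re + 1, text + 1);
	return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int tiny_match(const char *re, const char *text)
{
	assert(re != NULL && text != NULL);
	if (re[0] == '^')
		return match_here(re + 1, text);
	do {
		/* Retry the whole pattern at each starting position. */
		if (match_here(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}
```

For example, tiny_match("ab*c", "xxabbbcyy") succeeds after retrying at
successive start positions and repetition counts, while
tiny_match("a*b", "aaac") fails only after exhausting them all.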

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.