#	$NetBSD: README,v 1.2 1998/01/09 04:12:00 perry Exp $

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
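
To give a feel for why the nondeterministic approach is small and simple
but can be slow to execute, here is a minimal sketch of a backtracking
matcher.  It is NOT the code in regexp.c and does not use its compiled
program format; the names match(), match_here(), and match_star() are
hypothetical, and only a tiny subset of the syntax is handled (literal
characters, ".", "*", "^", and "$").  The point it illustrates is that
no automaton is built ahead of time; instead, a choice that fails is
undone and the next alternative is tried.

```c
/* Hypothetical helper names; none of these exist in regexp.c. */
static int match_here(const char *re, const char *text);

/*
 * Match "c*" followed by the rest of the pattern.  Consume as many
 * c's as possible, then back off one character at a time until the
 * remainder of the pattern matches or there is nothing left to undo.
 */
static int
match_star(int c, const char *re, const char *text)
{
	const char *t = text;

	while (*t != '\0' && (c == '.' || *t == c))
		t++;
	for (;;) {
		if (match_here(re, t))
			return 1;
		if (t == text)
			return 0;
		t--;		/* backtrack: give one character back */
	}
}

/* Match the pattern re at exactly the start of text. */
static int
match_here(const char *re, const char *text)
{
	if (re[0] == '\0')
		return 1;
	if (re[1] == '*')
		return match_star(re[0], re + 2, text);
	if (re[0] == '$' && re[1] == '\0')
		return *text == '\0';
	if (*text != '\0' && (re[0] == '.' || re[0] == *text))
		return match_here(re + 1, text + 1);
	return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int
match(const char *re, const char *text)
{
	if (re[0] == '^')
		return match_here(re + 1, text);
	do {
		if (match_here(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}
```

With this sketch, match("^ab*c$", "abbbc") returns 1 and match("^b", "abc")
returns 0.  The retry loop in match_star() is where the execution cost
comes from: each "*" may be retried at every length it could cover.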

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.