This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).
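To illustrate the layout point, here is a small sketch in C.  It is not the
contents of regexp.h: the struct names, the private fields, and both helper
functions are hypothetical, and the startp/endp field names and the count of
10 slots are assumptions that should be checked against the real header.  It
only shows why a caller that touches nothing but the leading fields does not
care what the implementation stores after them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSUBEXP 10	/* assumed number of subexpression slots */

/* Hypothetical full layout: public fields first, private ones after. */
struct toy_regexp {
	char *startp[TOY_NSUBEXP];	/* assumed: start of each submatch */
	char *endp[TOY_NSUBEXP];	/* assumed: one past its end */
	/* Everything below is made up and stands in for private state. */
	char private_flag;
	int private_len;
	char program[1];
};

/* The only part of the layout that a caller is entitled to rely on. */
struct toy_public_view {
	char *startp[TOY_NSUBEXP];
	char *endp[TOY_NSUBEXP];
};

/* Returns 1 if the public fields sit at the same offsets in both structs. */
int toy_layout_compatible(void)
{
	return offsetof(struct toy_regexp, startp)
			== offsetof(struct toy_public_view, startp)
	    && offsetof(struct toy_regexp, endp)
			== offsetof(struct toy_public_view, endp);
}

/* Length of submatch i, reading only the public fields; -1 if unset. */
ptrdiff_t toy_sub_length(const struct toy_regexp *r, int i)
{
	assert(r != NULL && i >= 0 && i < TOY_NSUBEXP);
	if (r->startp[i] == NULL || r->endp[i] == NULL)
		return -1;
	return r->endp[i] - r->startp[i];
}
```

Because the public fields come first, adding or reordering private fields
after them leaves toy_sub_length() untouched, which is the "sort of" in
binary-compatible.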

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software, even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
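For a feel of the tradeoff, here is a toy backtracking matcher in C.  It is
not this package's code and does not use its pattern syntax: it closely
follows the well-known small matcher from Kernighan & Pike's "The Practice of
Programming", handles only literal characters, ".", "*", "^" and "$", and the
name toy_match() is made up for this sketch.  What it shows is the general
shape of a nondeterministic matcher: nothing is precomputed, so there is
almost no compile cost, but the matcher may retry the rest of the pattern at
many positions in the text before giving up.

```c
#include <assert.h>
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Match c* followed by re: try zero copies of c, then one, then two... */
static int match_star(int c, const char *re, const char *text)
{
	do {
		if (match_here(re, text))
			return 1;	/* the rest matched at this length */
	} while (*text != '\0' && (*text++ == c || c == '.'));
	return 0;			/* ran out of c's: backtrack fails */
}

/* Match re at exactly the start of text. */
static int match_here(const char *re, const char *text)
{
	if (re[0] == '\0')
		return 1;
	if (re[1] == '*')
		return match_star(re[0], re + 2, text);
	if (re[0] == '$' && re[1] == '\0')
		return *text == '\0';
	if (*text != '\0' && (re[0] == '.' || re[0] == *text))
		return match_here(re + 1, text + 1);
	return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int toy_match(const char *re, const char *text)
{
	assert(re != NULL && text != NULL);
	if (re[0] == '^')
		return match_here(re + 1, text);
	do {				/* try every starting position */
		if (match_here(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}
```

For example, toy_match("a*b", "aaab") succeeds, while toy_match("^ab$", "abc")
fails only after the matcher has walked to the "$" and found more text.  A
deterministic implementation would do more work up front building its tables
and then scan the text once.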

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.