POSIX revision 1.2
#	@(#)POSIX	8.1 (Berkeley) 6/6/93

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V- and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	e.g.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.
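
	A minimal sketch of the backslash idiom described above, assuming
	a POSIX-style sed on the PATH.  No text line starts with an
	unescaped blank, so the result should not depend on whether the
	implementation strips leading blanks.  The variable name out is
	illustrative only.

```shell
# Append three lines after the single input line.  The backslash in
# front of the blanks on the second text line protects the indentation.
out=$(printf 'x\n' | sed 'a\
foo\
\  indent\
bar')
printf '%s\n' "$out"
```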

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.
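
	The w flag and w command from items 2 to 4 can be sketched as
	follows.  This is an illustration, not a test of any particular
	historical sed: it assumes mktemp -d is available, and the file
	names flag.out and cmd.out are made up for the example.  A single
	space follows each w, which every behavior described above accepts.

```shell
# Work in a scratch directory so the example leaves no files behind.
dir=$(mktemp -d)
# The g flag must come before w: the rest of the line is the file name.
printf 'aa\nxx\n' | sed -n "s/a/b/gw $dir/flag.out"
# The w command takes its file name the same way.
printf 'aa\nxx\n' | sed -n "/x/w $dir/cmd.out"
flag_out=$(cat "$dir/flag.out")
cmd_out=$(cat "$dir/cmd.out")
printf '%s\n%s\n' "$flag_out" "$cmd_out"
rm -rf "$dir"
```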

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers, not the three digits specified by POSIX.
	POSIX is a cleanup, and is followed by this implementation.
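
	A short sketch of the POSIX l output format, assuming a sed that
	follows the POSIX rule (three-digit octal escapes and a doubled
	backslash).  The variable names ctrl and bs are illustrative.

```shell
# \001 is a non-printable byte; l displays it as a three-digit octal
# escape and marks the end of the line with $.
ctrl=$(printf '\001\n' | sed -n l)
# A literal backslash in the input is displayed doubled.
bs=$(printf 'a\\b\n' | sed -n l)
printf '%s\n%s\n' "$ctrl" "$bs"
```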

 6.	The POSIX specification for ! does not state that a single command
	following ! must not contain an address specification, whereas a
	command list may contain address specifications.  The
	specification for ! implies that "3!/hello/p" works, and it never
	has, historically.  Note,

		3!{
			/hello/p
		}

	does work.
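
	The working form can be tried as in the sketch below.  The input
	lines are invented for the example; the script is the braced form
	shown above, written without indentation.

```shell
# Print lines matching /hello/ on every line except line 3.
out=$(printf 'hello 1\nhello 2\nhello 3\n' | sed -n '3!{
/hello/p
}')
printf '%s\n' "$out"
```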

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file and quit on reading the third (with -n, the
	q command does not print the pattern space).  This is not
	specified by POSIX.  Note, the ; command separator is not allowed
	for the commands a, c, i, w, r, :, b, t, # and at the end of a w
	flag in the s command.  This implementation follows historic
	practice and implements the ; separator.
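
	The semi-colon example can be run as in the sketch below, assuming
	a sed that accepts ; as a separator.  Because -n is in effect, the
	q on line 3 exits without printing that line, so only lines 1 and
	2 appear.

```shell
# Three commands separated by semi-colons in a single -e argument.
out=$(printf '1\n2\n3\n4\n' | sed -ne '1p;2p;3q')
printf '%s\n' "$out"
```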

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.
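
	A variation on the example above, offered as a sketch: with three
	input lines and -n, the second n hits EOF, so the p that follows
	it is never run for the last line.  It assumes a sed that accepts
	; as a separator and that stops the script when n reaches EOF, as
	the historic versions did.

```shell
# Cycle 1: n replaces "a" with "b", p prints "b".
# Cycle 2: the pattern space is "c"; n finds no more input and the
# script ends before p is reached.
out=$(printf 'a\nb\nc\n' | sed -n 'n;p')
printf '%s\n' "$out"
```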

10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.
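
	The POSIX behavior for a reversed range can be sketched as below.
	It assumes a sed that follows POSIX here; an implementation with
	the historical behavior would delete line 3 without printing the
	text.

```shell
# The range 3,1 matches only line 3, which is replaced by the text.
out=$(printf '1\n2\n3\n4\n' | sed '3,1c\
text')
printf '%s\n' "$out"
```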

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, that is, on whether
	the text is output even though line 3 of the input never
	logically encounters that command.

	2,4b
	1,3c\
		text

	Historic implementations, and this implementation, do not output
	the text in the above example.  The general rule, therefore,
	is that a range whose second address is never matched extends to
	the end of the input.

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.
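
	The nonintuitive effect mentioned in item 15 can be seen with the
	sketch below, which assumes the default (POSIX) behavior and that
	mktemp -d is available.  The file name never.out is made up, and
	the -a option is deliberately not used since it is specific to
	this implementation.

```shell
dir=$(mktemp -d)
# No input line matches the address, yet the output file is created
# (and left empty) because w files are opened before input is read.
printf 'a\nb\n' | sed -n "/nomatch/w $dir/never.out"
if [ -f "$dir/never.out" ] && [ ! -s "$dir/never.out" ]; then
    created=yes
else
    created=no
fi
echo "$created"
rm -rf "$dir"
```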

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also does not state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display
	"\ayz".  As historic sed implementations always discarded the
	backslash, this implementation does as well.
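
	A sketch of the discarded backslash.  The escape \q is used here
	instead of \a as an assumption of convenience: some later sed
	implementations give \a a meaning of their own, while \q is
	believed to have none.

```shell
# The backslash in the replacement is dropped and q is used literally.
out=$(echo xyz | sed 's/./\q/')
printf '%s\n' "$out"
```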

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.
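
	The read half of item 19 can be sketched as follows.  The path is
	hypothetical and assumed not to exist; the script should continue
	and sed should exit successfully.

```shell
# r names a file that does not exist; the input still passes through.
out=$(printf 'a\nb\n' | sed 'r /nonexistent/sed-posix-example')
status=$?
printf '%s\n' "$out"
echo "exit status: $status"
```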

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
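
	A small sketch of \n in string2 of the y command, assuming a sed
	that follows the historic practice described above.

```shell
# Turn every space into a newline.
out=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$out"
```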

21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.

23.	Historic implementations handle empty REs in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this seems
	the most useful and remains consistent with historical practice.
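
	The empty RE example above can be run as in this sketch.  Only
	the simple case is shown, where static and dynamic scope give the
	same answer; the input text is invented for the example.

```shell
# The empty RE in s//XXX/ reuses /abc/ from the address; without the
# g flag only the first match on the line is replaced.
out=$(echo 'abc abc' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"
```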