# $NetBSD: POSIX,v 1.5 2014/06/06 00:13:13 christos Exp $
#	@(#)POSIX	8.1 (Berkeley) 6/6/93
# $FreeBSD: head/usr.bin/sed/POSIX 168417 2007-04-06 08:43:30Z yar $

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	e.g.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command, as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed displayed
	two-digit octal numbers, not the three digits specified by POSIX.
	POSIX is a cleanup, and is followed by this implementation.

 6.	The POSIX specification for ! does not say that a single command
	following ! must not contain an address specification, whereas
	a command list can contain address specifications.  The
	specification for ! implies that "3!/hello/p" works, and it never
	has, historically.  Note that

		3!{
			/hello/p
		}

	does work.
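
	A minimal sketch of the block form, assuming a sed that accepts a
	newline-separated script in a single -e argument.  The input lines
	and the variable name bang_out are made up for illustration.

```shell
# Print lines containing "hello", except on input line 3.
bang_out=$(printf 'hello 1\nworld 2\nhello 3\nhello 4\n' | sed -n -e '3!{
/hello/p
}')
printf '%s\n' "$bang_out"
```

	Only "hello 1" and "hello 4" are printed: line 3 is excluded by
	the 3! address, and line 2 does not match /hello/.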

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file (under -n, the q command quits at line 3
	without printing it).  This is not specified by POSIX.
	Note that the ; command separator is not allowed after the
	commands a, c, i, w, r, :, b, t and #, or at the end of a w flag
	in the s command.  This implementation follows historic practice
	and implements the ; separator.
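
	As a small illustration of the ; separator, assuming a sed that
	implements it as described above (the input lines and the
	variable name semi_out are made up for this sketch):

```shell
# Two p commands in one -e argument, separated by a semicolon.
semi_out=$(printf 'one\ntwo\nthree\nfour\n' | sed -n -e '2p;4p')
printf '%s\n' "$semi_out"
```

	This prints "two" and "four"; the same script could be written
	with a newline in place of the semicolon.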

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.

10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, i.e. whether the text
	is output even though line 3 of the input will never logically
	encounter that command.

	2,4b
	1,3c\
		text

	Historic implementations did not output the text in the above
	example.  Therefore it was believed that a range whose second
	address was never matched extended to the end of the input.
	However, the current practice adopted by this implementation,
	as well as by those from GNU and Sun, is as follows:  The text
	from the 'c' command still isn't output because the second address
	isn't actually matched; but the range is reset after all if its
	second address is a line number.  In the above example, only the
	first line of the input will be deleted.

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
	As historic sed implementations always discarded the backslash,
	this implementation does as well.

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d", "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.

21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.
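
	A short sketch of what this means in practice, assuming a sed
	that follows the historic behavior (the input string and the
	variable name class_out are made up for illustration):

```shell
# The / inside [...] belongs to the character class; it does not end the RE.
class_out=$(echo 'a/b' | sed -e 's/[/]/X/')
printf '%s\n' "$class_out"
```

	With such a sed this prints "aXb".  A sed that treated the /
	inside the brackets as the delimiter would instead reject the
	script as malformed.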

23.	Historic implementations handle empty RE's in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example, the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this
	seems the most useful and remains consistent with historical
	practice.
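
	The basic empty-RE case can be tried directly.  This is only a
	sketch: the input string and the variable name empty_out are
	made up, and it does not distinguish static from dynamic scope,
	since only one RE appears in the script.

```shell
# The empty RE in s//XXX/ reuses /abc/ from the address.
empty_out=$(echo 'xabcx' | sed -e '/abc/s//XXX/')
printf '%s\n' "$empty_out"
```

	This prints "xXXXx".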