POSIX revision 1.2
# @(#)POSIX 8.1 (Berkeley) 6/6/93

Comments on the IEEE P1003.2 Draft 12
    Part 2: Shell and Utilities
    Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.


 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
            foo\
            \ indent\
            bar

    produces:

        foo
         indent
        bar

    POSIX does not specify this behavior, as the System V versions of
    sed do not do this stripping. The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped. The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is
    allowed and ignored, and leading whitespace can still be obtained
    by entering a backslash in front of it. This implementation
    follows the BSD historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed also displayed
    octal numbers with two digits rather than the three specified by
    POSIX. The POSIX rule is a cleanup, and this implementation
    follows it.

 6. The POSIX specification for ! does not state that a single command
    following ! must not contain an address specification, whereas
    the commands of a { } list may contain address specifications.
    As written, the specification for ! implies that "3!/hello/p"
    works, but historically it never has. Note that

        3!{
            /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behavior. (It would seem logical for
    each one to reverse the behavior.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated by
    semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first three
    lines of a file. This is not specified by POSIX. Note that the ;
    command separator is not recognized after the commands a, c, i,
    w, r, :, b, t and #, or after the file name of a w flag to the s
    command, because their arguments extend to the end of the line.
    This implementation follows historic practice and implements the
    ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior does not seem to
    have any particular purpose, this implementation follows the
    POSIX behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, that is, whether the
    text is output even though line 3 of the input never logically
    reaches that command.

        2,4b
        1,3c\
        text

    Historic implementations, and this implementation, do not output
    the text in the above example. The general rule, therefore, is
    that a range whose second address is never matched extends to the
    end of the input.

13. Historical implementations allow an output-suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the
    command, and the language in the Description section states that
    the input is output, it seems reasonable that it behave like the
    cat(1) command. Historic sed implementations behave differently
    for "ls | sed", where they produce no output, and "ls | sed -e#",
    where they behave like cat. This implementation behaves like cat
    in both cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. By default this
    implementation follows historic practice and POSIX, and it
    provides the -a option, which opens the files only when they are
    needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also does not state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display
    "\ayz". As historic sed implementations always discarded the
    backslash, this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d", "1,d" and ",5d" are allowed. This is
    not true for historic implementations or for this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice, and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that an attempt to read a nonexistent
    file from within the script must not terminate the script, it
    does not specify what happens if a write command fails. Historic
    practice is to fail immediately if the file cannot be opened or
    written. This implementation follows historic practice.

20. Historic practice is that the \n construct can be used in either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty REs in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example, the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs that depend
    on scope differences, the SunOS version exhibited dynamic scope
    behavior. This implementation uses dynamic scoping, as it seems
    the most useful and it remains consistent with historical practice.