POSIX revision 1.1
#       @(#)POSIX       5.9 (Berkeley) 8/28/92

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
                foo\
                \  indent\
                bar

    produces:

        foo
          indent
        bar

    POSIX does not specify this behavior as the System V versions of
    sed do not do this stripping. The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped. The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it. This implementation follows the BSD
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two
    digit octal numbers, too, not three as specified by POSIX. POSIX
    is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification
    whereas the command list can contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. Note,

        3!{
                /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
    three lines of a file. This is not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command. This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. POSIX does not specify that the q command causes all lines that
    have been appended to be output and that the pattern space is
    printed before exiting. This implementation follows historic
    practice.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, i.e. will the text
    be output even though line 3 of the input will never logically
    encounter that command.

        2,4b
        1,3c\
                text

    Historic implementations, and this implementation, do not output
    the text in the above example. The general rule, therefore,
    is that a range whose second address is never matched extends to
    the end of the input.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#", where
    they behave like cat. This implementation behaves like cat in both
    cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed s/./\a" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. POSIX does not specify if the "Nth occurrence" of an RE in a
    substitute command is an overlapping or a non-overlapping one,
    i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa".
    Historical practice is to drop core or only do non-overlapping
    RE's. This implementation only does non-overlapping RE's.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty RE's in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour. This implementation does dynamic scoping, as this seems
    the most useful and in order to remain consistent with historical
    practice.
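
As an illustration of item 23, the two readings of "the last RE" can be
told apart with a short script. This is a minimal sketch and not part of
the original notes: the label "skip", the patterns foo and bar, the input
line and the shell variable "result" are all made up for the example, and
the output depends on which sed is being run.

```shell
# The branch jumps over /bar/, so at run time the last RE used is /foo/,
# while the last RE compiled before the empty RE is /bar/.
result=$(printf 'foo bar\n' | sed -e '
/foo/b skip
/bar/d
:skip
s//X/
')
printf '%s\n' "$result"
```

A sed with dynamic scope, as described in item 23, should print "X bar";
one with static scope would be expected to print "foo X" instead.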