# $NetBSD: POSIX,v 1.5 2014/06/06 00:13:13 christos Exp $
# @(#)POSIX 8.1 (Berkeley) 6/6/93
# $FreeBSD: head/usr.bin/sed/POSIX 168417 2007-04-06 08:43:30Z yar $

                Comments on the IEEE P1003.2 Draft 12
                     Part 2: Shell and Utilities
                  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.     32V and BSD derived implementations of sed strip the text
        arguments of the a, c and i commands of their initial blanks,
        i.e.

        #!/bin/sed -f
        a\
                foo\
                \  indent\
                bar

        produces:

        foo
          indent
        bar

        POSIX does not specify this behavior as the System V versions of
        sed do not do this stripping.  The argument against stripping is
        that it is difficult to write sed scripts that have leading blanks
        if they are stripped.  The argument for stripping is that it is
        difficult to write readable sed scripts unless indentation is allowed
        and ignored, and leading whitespace is obtainable by entering a
        backslash in front of it.  This implementation follows the BSD
        historic practice.

 2.     Historical versions of sed required that the w flag be the last
        flag to an s command as it takes an additional argument.  This
        is obvious, but not specified in POSIX.

 3.     Historical versions of sed required that whitespace follow a w
        flag to an s command.  This is not specified in POSIX.  This
        implementation permits whitespace but does not require it.

 4.     Historical versions of sed permitted any number of whitespace
        characters to follow the w command.  This is not specified in
        POSIX.  This implementation permits whitespace but does not
        require it.

 5.     The rule for the l command differs from historic practice.  Table
        2-15 includes the various ANSI C escape sequences, including \\
        for backslash.  Some historical versions of sed displayed two
        digit octal numbers, too, not three as specified by POSIX.  POSIX
        is a cleanup, and is followed by this implementation.

 6.     The POSIX specification for ! does not specify that for a single
        command the command must not contain an address specification
        whereas the command list can contain address specifications.  The
        specification for ! implies that "3!/hello/p" works, and it never
        has, historically.  Note,

        3!{
                /hello/p
        }

        does work.

 7.     POSIX does not specify what happens with consecutive ! commands
        (e.g. /foo/!!!p).  Historic implementations allow any number of
        !'s without changing the behaviour.  (It seems logical that each
        one might reverse the behaviour.)  This implementation follows
        historic practice.

 8.     Historic versions of sed permitted commands to be separated
        by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
        three lines of a file.  This is not specified by POSIX.
        Note, the ; command separator is not allowed for the commands
        a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
        command.  This implementation follows historic practice and
        implements the ; separator.

 9.     Historic versions of sed terminated the script if EOF was reached
        during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

        did not produce any output.  POSIX does not specify this behavior.
        This implementation follows historic practice.

10.     Deleted.

11.     Historical implementations do not output the change text of a c
        command in the case of an address range whose first line number
        is greater than the second (e.g. 3,1).  POSIX requires that the
        text be output.  Since the historic behavior doesn't seem to have
        any particular purpose, this implementation follows the POSIX
        behavior.

12.     POSIX does not specify whether address ranges are checked and
        reset if a command is not executed due to a jump.  The following
        program will behave in different ways depending on whether the
        'c' command is triggered at the third line, i.e. whether the text
        is output even though line 3 of the input never logically
        encounters that command.

        2,4b
        1,3c\
                text

        Historic implementations did not output the text in the above
        example.  Therefore it was believed that a range whose second
        address was never matched extended to the end of the input.
        However, the current practice adopted by this implementation,
        as well as by those from GNU and SUN, is as follows:  The text
        from the 'c' command still isn't output because the second address
        isn't actually matched; but the range is reset after all if its
        second address is a line number.  In the above example, only the
        first line of the input will be deleted.

13.     Historical implementations allow an output suppressing #n at the
        beginning of -e arguments as well as in a script file.  POSIX
        does not specify this.  This implementation follows historical
        practice.

14.     POSIX does not explicitly specify how sed behaves if no script is
        specified.  Since the sed Synopsis permits this form of the command,
        and the language in the Description section states that the input
        is output, it seems reasonable that it behave like the cat(1)
        command.  Historic sed implementations behave differently for "ls |
        sed", where they produce no output, and "ls | sed -e#", where they
        behave like cat.  This implementation behaves like cat in both cases.

15.     The POSIX requirement to open all w files at the beginning makes
        sed behave nonintuitively when the w commands are preceded by
        addresses or are within conditional blocks.  This implementation
        follows historic practice and POSIX, by default, and provides the
        -a option which opens the files only when they are needed.

16.     POSIX does not specify how escape sequences other than \n and \D
        (where D is the delimiter character) are to be treated.  This is
        reasonable; however, it also doesn't state that the backslash is
        to be discarded from the output regardless.  A strict reading of
        POSIX would be that "echo xyz | sed s/./\a" would display "\ayz".
        As historic sed implementations always discarded the backslash,
        this implementation does as well.

17.     POSIX specifies that an address can be "empty".  This implies
        that constructs like ",d" or "1,d" and ",5d" are allowed.  This
        is not true for historic implementations or this implementation
        of sed.

18.     The b, t and : commands are documented in POSIX to ignore leading
        white space, but no mention is made of trailing white space.
        Historic implementations of sed assigned different locations to
        the labels "x" and "x ".  This is not useful, and leads to subtle
        programming errors, but it is historic practice and changing it
        could theoretically break working scripts.  This implementation
        follows historic practice.

19.     Although POSIX specifies that reading from files that do not exist
        from within the script must not terminate the script, it does not
        specify what happens if a write command fails.  Historic practice
        is to fail immediately if the file cannot be opened or written.
        This implementation follows historic practice.

20.     Historic practice is that the \n construct can be used for either
        string1 or string2 of the y command.  This is not specified by
        POSIX.  This implementation follows historic practice.

21.     Deleted.

22.     Historic implementations of sed ignore the RE delimiter characters
        within character classes.  This is not specified in POSIX.  This
        implementation follows historic practice.

23.     Historic implementations handle empty RE's in a special way: the
        empty RE is interpreted as if it were the last RE encountered,
        whether in an address or elsewhere.  POSIX does not document this
        behavior.  For example the command:

        sed -e /abc/s//XXX/

        substitutes XXX for the pattern abc.  The semantics of "the last
        RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

        While many historical implementations fail on programs depending
        on scope differences, the SunOS version exhibited dynamic scope
        behaviour.  This implementation does dynamic scoping, as this seems
        the most useful and in order to remain consistent with historical
        practice.
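
The empty-RE behaviour in item 23 can be checked from the shell.  The
following is a minimal sketch, not part of the original notes: the
variable names (simple, probe) and the probe script are made up for
illustration, and the second command only reports what the installed
sed does, without assuming a particular answer.

```shell
#!/bin/sh
# Illustrative only: exercises the empty-RE behaviour described in item 23.

# Simple case from the text: the empty RE in s//XXX/ reuses /abc/.
simple=$(printf 'abc\n' | sed -e '/abc/s//XXX/')
printf 'simple: %s\n' "$simple"

# Hypothetical probe for the scoping question.  For the input line "a":
#   /a/ matches, so control jumps to the label "end";
#   /b/ is compiled before s//X/ but is never executed for this line.
# Dynamic scope: the empty RE stands for /a/, so "X" is printed.
# Static scope:  the empty RE stands for /b/, so nothing is printed.
# Separate -e arguments are used because item 8 notes that ; is not a
# portable separator after the b and : commands.
probe=$(printf 'a\n' | sed -n -e '/a/b end' -e '/b/!d' -e ':end' -e 's//X/p')
if [ "$probe" = "X" ]; then
    echo "scope: dynamic"
else
    echo "scope: static (or other)"
fi
```

An implementation that follows the dynamic scoping described above is
expected to take the first branch of the final test; anything else
suggests static scoping or a sed that rejects the construct.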