History log of /src/usr.bin/printf/printf.c
Revision	Date	Author	Comments
1.59	24-Nov-2024	kre	Improve detection and diagnosis of invalid values for conversions. (In particular, values for integer conversions may not contain spaces and must always contain at least 1 digit; '' is not valid.)
1.58	07-Aug-2024	kre	Correctly handle extracting wide chars from empty strings. Fix a (probably would have rarely been seen) bug I installed yesterday. It turns out that mbtowc() needs to include the terminating \0 in the length arg passed to it, or it errors (EILSEQ) on a zero length (instead of doing the sane thing and treating that the same as "\0" (treated as being length 1)). So, increase the length passed to mbtowc() by 1. That makes no difference in the typical case: it is an upper limit on the number of bytes to examine, and mbtowc() stops after it has converted 1 character, so in the non-"" input cases, nothing that matters changes.
The rest of this you can skip if you like; it is not directly related to this change... Note: it is not clear to me what is correct here. POSIX looks to be ambiguous, or strange anyway. In the RETURN VALUE section it says: "If s is not a null pointer, mbtowc() shall either return 0 (if s points to the null byte), or return the number of bytes [...]". Further, for the error possibilities it says: "[EILSEQ] An invalid character sequence is detected. In the POSIX locale an [EILSEQ] error cannot occur since all byte values are valid characters." On the other hand, our mbtowc(3) says: "There are special cases: n == 0: In this case, the first n bytes of the array pointed to by s never form a complete character. Thus, the mbtowc() always fails." Since EILSEQ is the only defined error for mbtowc() in POSIX, and (according to it) cannot happen in the POSIX locale, that "always fails" in our manual page looks dubious.
What actually happens in our mbtowc() in the POSIX locale is that if passed n == 0 (and *s == '\0'), mbtowc() returns 0 (that's good) but also sets errno to EILSEQ (not so good, though this is not one of the functions guaranteed not to alter errno if it doesn't fail). In other locales it returns -1 (with errno == EILSEQ) when n == 0. (Well, in some other locales anyway; I didn't go and test all of them.)
Where POSIX gets weird is that earlier it says: "At most n bytes of the array pointed to by s shall be examined." If n == 0, then no bytes can be examined, so mbtowc() cannot test whether s points to the null byte, even in the POSIX locale. So it is unclear (to me) what should be returned in that case.
1.57	06-Aug-2024	kre	Add %C format conversion and -L option to printf(1). %C does what everyone always thought %c should do, but doesn't, and operates rather like the %c conversion in printf(3) (to be more precise, like %lc). It takes a code point integer value in the current locale's LC_CTYPE and prints the character designated. -L (this printf's first, and only, option) makes the floating conversions use long double instead of double. In the manual (printf.1), document both of those, and also be more precise as to when things are affecting bytes and when they're manipulating characters (which makes no difference if LC_ALL=C).
1.56	06-Aug-2024	kre	PR bin/58534 -- printf(1) source code comment fix. Update the comment near the start of main() in printf.c so it explains what is really happening and why, rather than being a whole bunch of incorrect BS about what POSIX does or doesn't require. This changes comments only, NFC (there should be no binary change at all).
1.55	18-Jul-2024	wiz	Fix typo in comment. Reported by Emanuele Torre in PR 58439.
1.54	20-May-2021	christos	fix typo
1.53	19-May-2021	kre	Changes for POSIX conformance.
1. exit(1) with an error message on stderr if an I/O error occurs.
1a. To work properly when built into /bin/sh, sprinkle clearerr() at appropriate places.
2. Verify that when a 'X data value is used with one of the numeric conversions, nothing follows the 'X'. It used to be unclear in the standard whether this was required or not. It is clear that with numeric conversions the entire data value must be used, or an error must result, but with string conversions that isn't the case and unused parts are simply ignored. This one is a numeric conversion with a string value, so which applies? The standard used to contain an example of '+3 being converted, producing the same as '+, ignoring the '3' with no mention of any error, so that's the approach we adopted. The forthcoming version now explicitly states that an error would also be generated in that case, as the '3' was not used by the numeric conversion.
2a. We support those conversions with floating as well as integer conversions, as the standard used to suggest that was required (but it makes no sense: the values are always integers, and printing them in a floating format is dumb). The standard has been revised to make it clear that only the integer numeric conversions %d %u %x (etc) are supposed to handle the 'X form of data value. We still allow it with the floating formats as an extension, for backward compatibility, just in case someone (other than the ATF tests) is using it. It might go away.
2b. These formats are supposed to convert 'X where 'X' is a character (perhaps multibyte encoded) in the current LC_CTYPE locale category. We don't handle that; only 1-byte characters are handled currently. However, the framework is now there to allow code to (one hopes, easily) be added to handle multi-byte locales. (Note that for the purposes of #2 above, 'X' must be a single character, not a single byte.)
1.52	16-Apr-2021	christos	branches: 1.52.2; make value an int to avoid all the casts and conversion warnings.
1.51	16-Apr-2021	christos	Change octal and hex parsing to not use strtoul so that they don't handle '-'. From Martijn van Duren. Also add a warning if the conversion fails (as GNU printf does).
1.50	22-Jul-2019	kre	Amend the previous change: we can have (almost) the best of both worlds, as when the first arg (which should be the format) contains no % conversions, and there are more args, the results are unspecified (according to POSIX). We can use this so the previous usage printf -- format arg... (which is stupid, and pointless, but used to work) continues to simply ignore the -- (unspecified results mean we can do whatever feels good...) This brings back the #if 0'd block from the previous modification (so there is no longer anything that needs cleaning up later) but runs the getopt() loop it contained only when there are at least 2 args (so any 1 arg printf always uses that arg as the format string, whatever it contains, including just "--") and also only when the first (format) arg contains no '%' characters (which guarantees no % conversions without needing to actually parse the arg). This is the (or a) "unspecified results" case from POSIX, so we are free to do anything we like - including assuming that we might have options (we don't) and pretending to process them.
1.49	21-Jul-2019	kre	Stop assuming that printf handles options in any way at all (it doesn't - that is, shouldn't), which includes processing -- as an "end of options". The first arg is (always) the format string. Remove the call to getopt() (but still make the associated changes to argc/argv). Note: for now this is #if 0'd out instead of being deleted; the old code should be fully removed sometime soon. Problem pointed out on tech-userlevel by Thierry Laronde.
1.48	27-Jan-2019	kre	Revert previous, it was based upon a misreading of the POSIX spec. POSIX requires "as if by calling strtod()" which we did already ... by calling strtod(). Go back to doing that.
1.47	26-Jan-2019	kre	Always convert input numbers (from the command line) in the C locale, not as set in the environment. Conforms with POSIX spec.
1.46	10-Sep-2018	kre	A truly ancient bug found by Edgar Fuss: when printf is running as a builtin in a sh, global vars aren't reset to 0 between invocations. This affects "rval", which remembers state from a previous %b \c and thereafter always exits after the first format conversion, until we get a conversion that generates an error (which resets the flag almost by accident):
    $ printf %b abc\\c
    abc  (no \n)
    $ printf %s%s hello world
    hello  (no \n, of course, no world ...)
    $ printf %s%s hello world
    hello
    $ printf %s%s hello world
    hello
    $ printf %d hello
    printf: hello: expected numeric value
    0  (no \n)
    $ printf %s%s hello world
    helloworld  (no \n, and we are back!)
This affects both /bin/sh and /bin/csh (and has for a very long time). XXX pullup -8
1.45	04-Sep-2018	kre	Printf's that support \e for escape all seem to also support \E. Except us. Now we do as well.
1.44	03-Sep-2018	kre	Tighten syntax a little (no more %4.2d nonsense). Include the format collected so far in "missing format char" err message. Minor KNF and whitespace.
1.43	31-Aug-2018	kre	PR standards/53563 POSIX requires that signed numbers (strings preceded by '+' or '-') be allowed as inputs to all of the integer format conversions, including those which treat the data as unsigned. Hence we do not need a variant function whose only difference from its companion is to reject strings starting with '-' - instead we use the primary function (getintmax()) for everything and remove getuintmax(). Minor update to the man page to indicate that the arg to all of the integer conversions (diouxX) must be an integer constant (with an optional sign) and to make it blatantly clear that %o is octal and %u is unsigned decimal (for some reason those weren't explicitly stated unlike d i x and X). Delete "respectively", it is not needed (and does not really apply). XXX pullup -8
1.42	25-Jul-2018	kre	NFC: More KNF (remove () around returned constants).
1.41	25-Jul-2018	kre	NFC: whitespace & KNF.
1.40	24-Jul-2018	kre	Add support for F a and A formats (which go with the eEfgG formats already supported.)
1.39	03-Jul-2018	kre	Avoid printing error messages twice when an invalid escape sequence (\ sequence) is present in an arg to a %b conversion.
1.38	03-Jul-2018	kre	From leot@ on tech-userlevel: Avoid running off into oblivion when a format string, or arg to a %b conversion ends in an unescaped backslash. Patch from Leo slightly modified by me.
1.37	16-Jun-2015	christos	branches: 1.37.8; 1.37.14; 1.37.16; fix some error handling.
1.36	16-Jul-2013	christos	branches: 1.36.6; 1.36.8; 1.36.12; WARNS=6
1.35	15-Mar-2011	christos	branches: 1.35.4; 1.35.10; support grouping format.
1.34	13-Oct-2009	christos	Avoid segv on "printf '%*********s' 666", from Maksymilian Arciemowicz
1.33	21-Jul-2008	lukem	branches: 1.33.4; 1.33.8; 1.33.10; Remove the \n and tabs from the __COPYRIGHT() strings. Tweak to use a consistent format.
1.32	28-Mar-2008	christos	branches: 1.32.4; detect more errors from printf/malloc.
1.31	22-Mar-2005	dsl	Remember to consume input bytes when processing '\0nnn' for %b formats
1.30	30-Oct-2004	christos	branches: 1.30.2; - KNF, WARNS=3, pass lint. - Simplify octal parsing code.
1.29	07-Aug-2003	agc	Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22365, verified by myself.
1.28	25-Jun-2003	dsl	Revert previous. 'None' means that the "Utility Syntax Guidelines" apply.
1.27	25-Jun-2003	dsl	Remove getopt() loop; IEEE 1003.1 doesn't say that printf(1) should conform to the "Utility Syntax Guidelines". Fixes PR 21970.
1.26	24-Feb-2003	dsl	Fix the output of NUL bytes within %b formats. (Approved by Christos)
1.25	24-Nov-2002	christos	Fixes from David Laight:
  - ansification
  - format of output of jobs command (etc)
  - job identifiers %+, %- etc
  - $? and $(...)
  - correct quoting of output of set, export -p and readonly -p
  - differentiation between normal and 'posix special' builtins
  - correct behaviour (posix) for errors on builtins and special builtins
  - builtin printf and kill
  - set -o debug (if compiled with DEBUG)
  - cd src obj (as ksh - too useful to do without)
  - unset -e name, remove non-readonly variable from export list. (so I could unset -e PS1 before running the test shell...)
1.24	14-Jun-2002	tron	Complete declaration of progprintf() to fix build problem in csh(1).
1.23	14-Jun-2002	wiz	Remove #ifdef __STDC__. De-__P() and ANSIfy. Fix a prototype mismatch uncovered by this.
1.22	05-May-2001	kleink	Change to use {u,}intmax_t internally (was: (unsigned) long).
1.21	19-Dec-1998	christos	brace pollution, and char -> unsigned char
1.20	14-Oct-1998	wsanchez	include unistd
1.19	03-Feb-1998	perry	add <unistd.h> to fix compiler warning
1.18	19-Oct-1997	lukem	s/index/strchr
1.17	18-Oct-1997	mrg	"merge" lite-2. our printf is already kinda different...minor changes only.
1.16	04-Jul-1997	christos	Fix compiler warnings.
1.15	14-Jan-1997	cgd	lint and KNF changes. (mostly casting returns to void to quiet lint.)
1.14	09-Jan-1997	tls	RCS ID police
1.13	03-Feb-1994	jtc	Simplify conversion of "quoted" numeric arguments.
1.12	03-Feb-1994	jtc	The code to check the result of a conversion (by strtol(), strtoul(), or strtod()) was identical in each case, so I moved it into its own function.
1.11	03-Feb-1994	jtc	Add and use getulong() to handle %u, %o, %x & %X formatting directives. It was using getlong(), which caused values larger than LONG_MAX to be truncated to LONG_MAX. As recommended by 1003.2, print warning messages when argument cannot be converted to value or is out of range.
1.10	31-Dec-1993	jtc	Handle format strings error correctly.
1.9	25-Nov-1993	jtc	Error in hextobin() macro messed up hex escape constants.
1.8	19-Nov-1993	jtc	Oops! get rid of the free(), mklong()'s buffer no longer malloc()'d.
1.7	19-Nov-1993	jtc	Return from main() if a \c escape is encountered in a %b string (was an exit()). Use macro constants for "skip1" and "skip2" instead of assigning them each loop iteration. Reformat the multi-case entries in the "big switch" so the lines don't wrap.
1.6	19-Nov-1993	jtc	Move all the code from do_printf() into do-while loop in main(). I need to be able to return from main() when a "\c" in a %b string is encountered.
1.5	19-Nov-1993	jtc	Merged in most of the changes from 4.4 necessary to make printf a sh and csh builtin --- still need to handle the one remaining exit() in the SysV escape string handling code.
1.4	05-Nov-1993	jtc	Changes required to make printf utility POSIX.2 compliant: * Escape characters in the string needed to be processed as they were encountered, otherwise a "\000" octal constant would prematurely terminate the formatting string. * Implemented the %b, SysV echo(1) compatibility, formatting directive.
1.3	01-Aug-1993	mycroft	Add RCS identifiers.
1.2	19-Apr-1993	mycroft	Cleanup for GCC 2.
1.1	21-Mar-1993	cgd	branches: 1.1.1; Initial revision
1.1.1.2	22-Mar-1995	mrg	4.4BSD-Lite2
1.1.1.1	21-Mar-1993	cgd	initial import of 386bsd-0.1 sources
1.30.2.1	27-Mar-2005	tron	Pull up revision 1.31 (requested by dsl in ticket #57): Remember to consume input bytes when processing '\0nnn' for %b formats
1.32.4.1	18-Sep-2008	wrstuden	Sync with wrstuden-revivesa-base-2.
1.33.10.1	21-Apr-2010	matt	sync to netbsd-5
1.33.8.1	14-Oct-2009	sborrill	Pull up the following revisions(s) (requested by christos in ticket #1091): usr.bin/printf/printf.c: revision 1.34 Avoid segv on "printf '%*********s' 666".
1.33.4.1	14-Oct-2009	sborrill	Pull up the following revisions(s) (requested by christos in ticket #1091): usr.bin/printf/printf.c: revision 1.34 Avoid segv on "printf '%*********s' 666".
1.35.10.1	20-Aug-2014	tls	Rebase to HEAD as of a few days ago.
1.35.4.1	22-May-2014	yamt	sync with head. For reference, the tree before this commit was tagged as yamt-pagecache-tag8. This commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
1.36.12.1	12-Jul-2018	martin	Pull up following revision(s) (requested by kre in ticket #1619): usr.bin/printf/printf.c: revision 1.37-1.39 fix some error handling. From leot@ on tech-userlevel: Avoid running off into oblivion when a format string, or arg to a %b conversion ends in an unescaped backslash. Patch from Leo slightly modified by me. Avoid printing error messages twice when an invalid escape sequence (\ sequence) is present in an arg to a %b conversion.
1.36.8.1	12-Jul-2018	martin	Pull up following revision(s) (requested by kre in ticket #1619): usr.bin/printf/printf.c: revision 1.37-1.39 fix some error handling. From leot@ on tech-userlevel: Avoid running off into oblivion when a format string, or arg to a %b conversion ends in an unescaped backslash. Patch from Leo slightly modified by me. Avoid printing error messages twice when an invalid escape sequence (\ sequence) is present in an arg to a %b conversion.
1.36.6.1	12-Jul-2018	martin	Pull up following revision(s) (requested by kre in ticket #1619): usr.bin/printf/printf.c: revision 1.37-1.39 fix some error handling. From leot@ on tech-userlevel: Avoid running off into oblivion when a format string, or arg to a %b conversion ends in an unescaped backslash. Patch from Leo slightly modified by me. Avoid printing error messages twice when an invalid escape sequence (\ sequence) is present in an arg to a %b conversion.
1.37.16.2	13-Apr-2020	martin	Mostly merge changes from HEAD upto 20200411
1.37.16.1	10-Jun-2019	christos	Sync with HEAD
1.37.14.4	26-Jan-2019	pgoyette	Sync with HEAD
1.37.14.3	30-Sep-2018	pgoyette	Sync with HEAD
1.37.14.2	06-Sep-2018	pgoyette	Sync with HEAD Resolve a couple of conflicts (result of the uimin/uimax changes)
1.37.14.1	28-Jul-2018	pgoyette	Sync with HEAD
1.37.8.3	23-Sep-2018	martin	Pull up following revision(s) (requested by kre in ticket #1020): usr.bin/printf/printf.c: revision 1.46 A truly ancient bug found by Edgar Fuss: when printf is running as a builtin in a sh, global vars aren't reset to 0 between invocations. This affects "rval", which remembers state from a previous %b \c and thereafter always exits after the first format conversion, until we get a conversion that generates an error (which resets the flag almost by accident):
    $ printf %b abc\\c
    abc  (no \n)
    $ printf %s%s hello world
    hello  (no \n, of course, no world ...)
    $ printf %s%s hello world
    hello
    $ printf %s%s hello world
    hello
    $ printf %d hello
    printf: hello: expected numeric value
    0  (no \n)
    $ printf %s%s hello world
    helloworld  (no \n, and we are back!)
This affects both /bin/sh and /bin/csh (and has for a very long time). XXX pullup -8
1.37.8.2	01-Sep-2018	martin	Pull up following revision(s) (requested by kre in ticket #1002): usr.bin/printf/printf.1: revision 1.31 (via patch) usr.bin/printf/printf.c: revision 1.43 PR standards/53563 POSIX requires that signed numbers (strings preceded by '+' or '-') be allowed as inputs to all of the integer format conversions, including those which treat the data as unsigned. Hence we do not need a variant function whose only difference from its companion is to reject strings starting with '-' - instead we use the primary function (getintmax()) for everything and remove getuintmax(). Minor update to the man page to indicate that the arg to all of the integer conversions (diouxX) must be an integer constant (with an optional sign) and to make it blatantly clear that %o is octal and %u is unsigned decimal (for some reason those weren't explicitly stated unlike d i x and X). Delete "respectively", it is not needed (and does not really apply). XXX pullup -8
1.37.8.1	13-Jul-2018	martin	Pull up following revision(s) (requested by kre in ticket #914): usr.bin/printf/printf.c: revision 1.38,1.39 From leot@ on tech-userlevel: Avoid running off into oblivion when a format string, or arg to a %b conversion ends in an unescaped backslash. Patch from Leo slightly modified by me. Avoid printing error messages twice when an invalid escape sequence (\ sequence) is present in an arg to a %b conversion.
1.52.2.1	31-May-2021	cjep	sync with head
