History log of /src/usr.bin/printf/printf.c
Revision  Date  Author  Comments
 1.59  24-Nov-2024  kre Improve detection and diagnosis of invalid values for conversions.
(In particular, integer conversions contain no spaces, and must always
contain at least 1 digit, '' is not valid).
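
A minimal sketch of the kind of check described above, not the code from printf.c. The helper name valid_integer_arg() is hypothetical; it rejects empty strings, leading white space, a sign with no digits, and trailing junk, leaving the actual conversion to strtoimax().

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>

/*
 * Hypothetical helper (assumption, not from printf.c): return 1 if s
 * is an acceptable integer argument, 0 otherwise.
 */
static int
valid_integer_arg(const char *s)
{
	char *end;

	/* '' is not valid, and strtoimax() would silently skip spaces. */
	if (*s == '\0' || isspace((unsigned char)*s))
		return 0;

	errno = 0;
	(void)strtoimax(s, &end, 0);

	/* At least one digit consumed, nothing left over, in range. */
	return end != s && *end == '\0' && errno != ERANGE;
}
```

With this sketch "12" and "-0x10" are accepted, while "", " 1", "+" and "1 " are rejected.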
 1.58  07-Aug-2024  kre Correctly handle extracting wide chars from empty strings.

Fix a (probably would have rarely been seen) bug I installed yesterday.

It turns out that mbtowc() needs to include the terminating \0 in the
length arg passed to it, or it errors (EILSEQ) on a zero length (instead
of doing the sane thing and treating that the same as "\0", which is
treated as being length 1). So, increase the length passed to mbtowc() by 1.
That makes no difference in the typical case: it is an upper limit on
the number of bytes to examine, and mbtowc() stops after it has
converted 1 character, so in the non "" input cases, nothing that
matters changes.

The rest of this you can skip if you like, not directly related to
this change...

Note: it is not clear to me what is correct here, POSIX looks to be
ambiguous, or strange anyway; in the RETURN VALUE section it says:

If s is not a null pointer, mbtowc() shall either return 0 (if s points
to the null byte), or return the number of bytes [...]

Further for the error possibilities it says:

[EILSEQ] An invalid character sequence is detected. In the POSIX locale
an [EILSEQ] error cannot occur since all byte values are valid
characters.

On the other hand our mbtowc(3) says:

There are special cases:

n == 0 In this case, the first n bytes of the array pointed to by
s never form a complete character. Thus, the mbtowc()
always fails.

Since EILSEQ is the only defined error for mbtowc() in POSIX, and
cannot happen (according to it) in the POSIX locale, that "always fails"
in our manual page looks dubious.

What actually happens in our mbtowc() in the POSIX locale, is that if
passed n==0 (and *s == '\0') mbtowc() returns 0 (that's good) but
also sets errno to EILSEQ (not so good - though this is not one of
the functions guaranteed to not alter errno if it doesn't fail).

In other locales it returns -1 (with errno == EILSEQ) when n == 0.
(Well, in some other locales anyway, I didn't go and test all of them).

Where POSIX gets weird, is that earlier it says:

At most n bytes of the array pointed to by s shall be examined.

If n == 0, then no bytes can be examined. In that case mbtowc()
cannot test whether s points to the null byte, even in the POSIX locale.

So it is unclear (to me) what should be returned in that case.
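
The fix described at the top of this entry can be sketched as follows. The helper first_wchar() is a hypothetical name, not the function used in printf.c; the point is only that the limit passed to mbtowc() is strlen(s) + 1, so n is never 0 even for an empty string.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (assumption, not from printf.c): decode the
 * first character of s into *wc and return mbtowc()'s result.
 * The terminating '\0' is counted in the length, so an empty string
 * is seen as the 1-byte sequence "\0" rather than as n == 0.
 */
static int
first_wchar(const char *s, wchar_t *wc)
{
	/* Reset any shift state left over from earlier conversions. */
	(void)mbtowc(NULL, NULL, 0);

	return mbtowc(wc, s, strlen(s) + 1);
}
```

In the default "C" locale, first_wchar("", &wc) returns 0 and stores the null wide character, and first_wchar("A", &wc) returns 1.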
 1.57  06-Aug-2024  kre Add %C format conversion and -L option to printf(1)

%C does what everyone always thought %c should do, but doesn't,
and operates rather like the %c conversion in printf(3) (to be
more precise, like %lc). It takes a code point integer value
in the current locale's LC_CTYPE and prints the character designated.

-L (this printf's first, and only, option) makes the floating conversions
use long double instead of double.

In the manual (printf.1) document both of those, and also be more
precise as to when things are affecting bytes, and when they're
manipulating characters (which makes no difference if LC_ALL=C).
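
As a rough illustration of what a %C style conversion has to do, the sketch below turns an integer into the bytes of the corresponding character using wctomb(). The name encode_codepoint() is hypothetical, and treating the integer directly as a wchar_t value is an assumption that depends on the platform and locale; this is not the code from printf.c.

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (assumption, not from printf.c): encode the
 * character numbered cp in the current LC_CTYPE locale into buf,
 * which must hold at least MB_LEN_MAX + 1 bytes.
 * Returns the number of bytes written, or -1 if cp is not encodable.
 */
static int
encode_codepoint(long cp, char *buf)
{
	int n;

	/* Reset the conversion state first. */
	(void)wctomb(NULL, L'\0');

	n = wctomb(buf, (wchar_t)cp);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

In the "C" locale encode_codepoint(65, buf) yields the single byte 'A'; in a multibyte locale the same call may produce several bytes for larger values.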
 1.56  06-Aug-2024  kre PR bin/58534 -- printf(1) source code comment fix

Update the comment near the start of main() in printf.c so it
explains what is really happening and why, rather than being a
whole bunch of incorrect BS about what posix does or doesn't require.

This changes comments only, NFC (should be no binary change at all).
 1.55  18-Jul-2024  wiz Fix typo in comment.

Reported by Emanuele Torre in PR 58439.
 1.54  20-May-2021  christos fix typo
 1.53  19-May-2021  kre Changes for POSIX conformance.

1. exit(1) with an error message on stderr if an I/O error occurs.
1a. To work properly when built into /bin/sh sprinkle clearerr() at
appropriate places.

2. Verify that when a 'X data value is used with one of the numeric
conversions, that nothing follows the 'X'. It used to be unclear
in the standard whether this was required or not, it is clear that
with numeric conversions the entire data value must be used, or an
error must result. But with string conversions, that isn't the case
and unused parts are simply ignored. This one is a numeric conversion
with a string value, so which applies? The standard used to contain
an example of '+3 being converted, producing the same as '+ ignoring
the '3' with no mention of any error, so that's the approach we adopted.
The forthcoming version now explicitly states that an error would also
be generated from that case, as the '3' was not used by the numeric
conversion.

2a. We support those conversions with floating as well as integer conversions,
as the standard used to suggest that was required (but it makes no sense,
the values are always integers, printing them in a floating format is
dumb). The standard has been revised to make it clear that only the
integer numeric conversions %d %u %x (etc) are supposed to handle the 'X
form of data value. We still allow it with the floating formats as an
extension, for backward compat, just in case someone (other than the ATF
tests) is using it. It might go away.

2b. These formats are supposed to convert 'X where 'X' is a character
(perhaps multibyte encoded) in the current LC_CTYPE locale category.
We don't handle that, only 1 byte characters are handled currently.
However the framework is now there to allow code to (one hopes, easily)
be added to handle multi-byte locales. (Note that for the purposes of
#2 above, 'X' must be a single character, not a single byte.)
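
Point 2 above can be sketched for the single-byte case like this. The function quoted_value() and its return codes are hypothetical, and treating a lone quote as value 0 is an assumption of this sketch; multibyte characters are not handled, matching the limitation noted in 2b.

```c
#include <stddef.h>

/*
 * Hypothetical helper (assumption, not from printf.c): handle a
 * 'X or "X data value for a numeric conversion.
 * Returns  0 and stores the byte value of X on success,
 *         -1 if arg does not start with a quote character,
 *         -2 if unused bytes follow X (value is still stored).
 */
static int
quoted_value(const char *arg, int *value)
{
	if (arg[0] != '\'' && arg[0] != '"')
		return -1;

	/* Single-byte characters only; a lone quote gives 0 here. */
	*value = (unsigned char)arg[1];

	if (arg[1] != '\0' && arg[2] != '\0')
		return -2;	/* e.g. '+3: the '3' was not used */
	return 0;
}
```

A caller would print a diagnostic and set a failing exit status on -2, while still formatting the stored value.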
 1.52  16-Apr-2021  christos branches: 1.52.2;
make value an int to avoid all the casts and conversion warnings.
 1.51  16-Apr-2021  christos Change octal and hex parsing to not use strtoul so that they don't handle
'-'. From Martijn van Duren.
Also add a warning if the conversion fails (like the gnu printf does)
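
A sketch of hex parsing done by hand rather than through strtoul(), so that a leading '-' (or '+', or white space) is never accepted. The name parse_hex() and its interface are hypothetical; the real code in printf.c may be organised differently.

```c
#include <ctype.h>

/*
 * Hypothetical helper (assumption, not from printf.c): parse up to
 * maxdigits hex digits starting at *sp.  On success *sp is advanced
 * past the digits and the value is returned; if there is no hex
 * digit at all, -1 is returned and *sp is left unchanged.
 */
static int
parse_hex(const char **sp, int maxdigits)
{
	const char *s = *sp;
	int value = 0;
	int n = 0;

	while (n < maxdigits && isxdigit((unsigned char)*s)) {
		int c = (unsigned char)*s++;
		int d = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;

		value = value * 16 + d;
		n++;
	}
	if (n == 0)
		return -1;	/* caller can warn about the failed conversion */
	*sp = s;
	return value;
}
```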
 1.50  22-Jul-2019  kre Amend the previous change: we can have (almost) the best of both
worlds, as when the first arg (which should be the format) contains
no % conversions, and there are more args, the results are unspecified
(according to POSIX).

We can use this so the previous usage
printf -- format arg...
(which is stupid, and pointless, but used to work) continues to
simply ignore the -- (unspecified results mean we can do whatever
feels good...)

This brings back the #if 0'd block from the previous modification
(so there is no longer anything that needs cleaning up later) but runs
the getopt() loop it contained only when there are at least 2 args
(so any 1 arg printf always uses that arg as the format string,
whatever it contains, including just "--") and also only when the
first (format) arg contains no '%' characters (which guarantees no %
conversions without needing to actually parse the arg). This is the
(or a) "unspecified results" case from POSIX, so we are free to do
anything we like - including assuming that we might have options
(we don't) and pretending to process them.
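
The condition described above can be sketched as a small predicate. The function name should_scan_options() is hypothetical; argv[0] is taken to be the command name and argv[1] the format.

```c
#include <string.h>

/*
 * Hypothetical helper (assumption, not from printf.c): decide whether
 * the getopt() loop should run at all.  It runs only when there are
 * at least 2 args after the command name and the first of them (the
 * format) contains no '%' character.
 */
static int
should_scan_options(int argc, char *const argv[])
{
	return argc > 2 && strchr(argv[1], '%') == NULL;
}
```

So "printf --" always prints "--", "printf '%s\n' --" never looks for options, and "printf -- format arg" still has its "--" skipped.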
 1.49  21-Jul-2019  kre Stop assuming that printf handles options in any way at all
(it doesn't - that is, shouldn't) which includes processing -- as an
"end of options". The first arg is (always) the format string.

Remove call to getopt() (but still do associated changes to argc/argv)

Note: for now this is #if 0'd out instead of being deleted, the old
code should be fully removed sometime soon.

Problem pointed out on tech-userlevel by Thierry Laronde.
 1.48  27-Jan-2019  kre Revert previous, it was based upon a misreading of the POSIX
spec. POSIX requires "as if by calling strtod()" which we
did already ... by calling strtod(). Go back to doing that.
 1.47  26-Jan-2019  kre Always convert input numbers (from the command line) in the C
locale, not as set in the environment. Conforms with POSIX spec.
 1.46  10-Sep-2018  kre A truly ancient bug found by Edgar Fuss

When printf is running builtin in a sh, global vars aren't reset to
0 between invocations. This affects "rval" which remembers state
from a previous %b \c and thereafter always exits after the first
format conversion, until we get a conversion that generates an
error (which resets the flag almost by accident)

printf %b abc\\c
abc (no \n)
printf %s%s hello world
hello (no \n, of course, no world ...)
printf %s%s hello world
hello
printf %s%s hello world
hello
printf %d hello
printf: hello: expected numeric value
0 (no \n)
printf %s%s hello world
helloworld (no \n, and we are back!)

This affects both /bin/sh and /bin/csh (and has for a very long time).

XXX pullup -8
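
The failure mode is easy to reproduce in miniature. The sketch below is a toy model, not the printf.c code: stop_flag stands in for the state that "rval" carried, an argument equal to \c stands in for a %b argument containing \c, and the reset_first parameter models whether the builtin clears its globals on entry.

```c
#include <string.h>

/* Stand-in for the global state; it survives between calls. */
static int stop_flag;

/*
 * Toy model of one builtin invocation (all names are assumptions).
 * Returns how many arguments were formatted before stopping.
 */
static int
run_builtin(const char *const *args, int nargs, int reset_first)
{
	int done = 0;

	if (reset_first)
		stop_flag = 0;	/* the fix: start every call clean */

	for (int i = 0; i < nargs; i++) {
		done++;
		if (stop_flag)
			break;	/* stale flag: stop after first conversion */
		if (strcmp(args[i], "\\c") == 0) {
			stop_flag = 1;	/* \c seen: stop output */
			break;
		}
	}
	return done;
}
```

Calling run_builtin() once with a \c argument and then again with two ordinary arguments formats only one of them unless reset_first is set.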
 1.45  04-Sep-2018  kre Printf's that support \e for escape all seem to also support \E.
Except us. Now we do as well.
 1.44  03-Sep-2018  kre Tighten syntax a little (no more %*4.*2d nonsense).
Include the format collected so far in "missing format char" err message.
Minor KNF and whitespace.
 1.43  31-Aug-2018  kre PR standards/53563

POSIX requires that signed numbers (strings preceded by '+' or '-')
be allowed as inputs to all of the integer format conversions, including
those which treat the data as unsigned.

Hence we do not need a variant function whose only difference from its
companion is to reject strings starting with '-' - instead we use
the primary function (getintmax()) for everything and remove getuintmax().

Minor update to the man page to indicate that the arg to all of the
integer conversions (diouxX) must be an integer constant (with an
optional sign) and to make it blatantly clear that %o is octal and
%u is unsigned decimal (for some reason those weren't explicitly stated
unlike d i x and X). Delete "respectively", it is not needed (and does
not really apply).

XXX pullup -8
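
A sketch of why one signed parser is enough: the value is read with strtoimax() and simply converted to uintmax_t for the unsigned conversions. The name parse_integer() is hypothetical and the error handling is simplified compared with getintmax() in printf.c.

```c
#include <errno.h>
#include <inttypes.h>

/*
 * Hypothetical helper (assumption, not from printf.c): parse an
 * integer constant with an optional sign.
 * Returns 0 and stores the value on success, -1 on error.
 */
static int
parse_integer(const char *s, intmax_t *out)
{
	char *end;

	errno = 0;
	*out = strtoimax(s, &end, 0);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1;
	return 0;
}
```

For %u, %o, %x and %X the caller would pass (uintmax_t)value to printf(3), so an input of -1 is accepted and prints as the largest unsigned value.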
 1.42  25-Jul-2018  kre NFC: More KNF (remove () around returned constants).
 1.41  25-Jul-2018  kre NFC: whitespace & KNF.
 1.40  24-Jul-2018  kre Add support for F a and A formats (which go with the eEfgG formats
already supported.)
 1.39  03-Jul-2018  kre Avoid printing error messages twice when an invalid
escape sequence (\ sequence) is present in an arg to a %b
conversion.
 1.38  03-Jul-2018  kre From leot@ on tech-userlevel:

Avoid running off into oblivion when a format string,
or arg to a %b conversion ends in an unescaped backslash.

Patch from Leo slightly modified by me.
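
The class of bug fixed here can be illustrated with a much reduced escape expander. The function expand_escapes() is hypothetical and handles only \n and \t; what matters is the check for the terminating '\0' immediately after a backslash, which keeps the loop from reading past the end of the string.

```c
#include <stddef.h>

/*
 * Hypothetical helper (assumption, not from printf.c): copy src to
 * dst, expanding \n and \t.  Returns 0 on success, or -1 if src ends
 * in an unescaped backslash or dstsize is 0.
 */
static int
expand_escapes(const char *src, char *dst, size_t dstsize)
{
	size_t o = 0;
	int rv = 0;

	if (dstsize == 0)
		return -1;

	while (*src != '\0' && o + 1 < dstsize) {
		char c = *src++;

		if (c == '\\') {
			if (*src == '\0') {
				rv = -1;	/* trailing backslash: stop */
				break;
			}
			c = *src++;
			if (c == 'n')
				c = '\n';
			else if (c == 't')
				c = '\t';
			/* anything else is copied literally here */
		}
		dst[o++] = c;
	}
	dst[o] = '\0';
	return rv;
}
```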
 1.37  16-Jun-2015  christos branches: 1.37.8; 1.37.14; 1.37.16;
fix some error handling.
 1.36  16-Jul-2013  christos branches: 1.36.6; 1.36.8; 1.36.12;
WARNS=6
 1.35  15-Mar-2011  christos branches: 1.35.4; 1.35.10;
support grouping format.
 1.34  13-Oct-2009  christos Avoid segv on "printf '%*********s' 666", from Maksymilian Arciemowicz
 1.33  21-Jul-2008  lukem branches: 1.33.4; 1.33.8; 1.33.10;
Remove the \n and tabs from the __COPYRIGHT() strings.
Tweak to use a consistent format.
 1.32  28-Mar-2008  christos branches: 1.32.4;
detect more errors from printf/malloc.
 1.31  22-Mar-2005  dsl Remember to consume input bytes when processing '\0nnn' for %b formats
 1.30  30-Oct-2004  christos branches: 1.30.2;
- KNF, WARNS=3, pass lint.
- Simplify octal parsing code.
 1.29  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.28  25-Jun-2003  dsl Revert previous. 'None' means that the "Utility Syntax Guidelines" apply.
 1.27  25-Jun-2003  dsl Remove getopt() loop, IEEE 1003.1 doesn't say that printf(1) should conform
to the "Utility Syntax Guidelines".
Fixes PR 21970.
 1.26  24-Feb-2003  dsl Fix the output of NUL bytes within %b formats.
(Approved by Christos)
 1.25  24-Nov-2002  christos Fixes from David Laight:
- ansification
- format of output of jobs command (etc)
- job identifiers %+, %- etc
- $? and $(...)
- correct quoting of output of set, export -p and readonly -p
- differentiation between normal and 'posix special' builtins
- correct behaviour (posix) for errors on builtins and special builtins
- builtin printf and kill
- set -o debug (if compiled with DEBUG)
- cd src obj (as ksh - too useful to do without)
- unset -e name, remove non-readonly variable from export list.
(so I could unset -e PS1 before running the test shell...)
 1.24  14-Jun-2002  tron Complete declaration of progprintf() to fix build problem in csh(1).
 1.23  14-Jun-2002  wiz Remove #ifdef __STDC__. De-__P() and ANSIfy. Fix a prototype mismatch
uncovered by this.
 1.22  05-May-2001  kleink Change to use {u,}intmax_t internally (was: (unsigned) long).
 1.21  19-Dec-1998  christos brace pollution, and char -> unsigned char
 1.20  14-Oct-1998  wsanchez include unistd
 1.19  03-Feb-1998  perry add <unistd.h> to fix compiler warning
 1.18  19-Oct-1997  lukem s/index/strchr
 1.17  18-Oct-1997  mrg "merge" lite-2. our printf is already kinda different...minor changes only.
 1.16  04-Jul-1997  christos Fix compiler warnings.
 1.15  14-Jan-1997  cgd lint and KNF changes. (mostly casting returns to void to quiet lint.)
 1.14  09-Jan-1997  tls RCS ID police
 1.13  03-Feb-1994  jtc Simplify conversion of "quoted" numeric arguments.
 1.12  03-Feb-1994  jtc The code to check whether a conversion (by strtol(), strtoul(), or strtod())
succeeded was identical in each case, so I moved it into its own function.
 1.11  03-Feb-1994  jtc Add and use getulong() to handle %u, %o, %x & %X formatting directives.
It was using getlong(), which caused values larger than LONG_MAX to be
truncated to LONG_MAX.
As recommended by 1003.2, print warning messages when argument cannot be
converted to value or is out of range.
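
A sketch of an unsigned getter in the spirit of getulong(), together with the out-of-range detection that the warning relies on. The name get_ulong() and its return codes are hypothetical, not the original 1994 code.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (assumption, not from printf.c): parse an
 * unsigned long.  Returns 0 on success, -1 if the argument cannot be
 * converted, -2 if it is out of range (value clamped to ULONG_MAX).
 */
static int
get_ulong(const char *s, unsigned long *out)
{
	char *end;

	errno = 0;
	*out = strtoul(s, &end, 0);
	if (end == s || *end != '\0')
		return -1;	/* not completely converted */
	if (errno == ERANGE)
		return -2;	/* out of range */
	return 0;
}
```

Passing the decimal form of ULONG_MAX to strtol() instead returns LONG_MAX with errno set to ERANGE, which is the truncation this revision describes.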
 1.10  31-Dec-1993  jtc Handle format strings error correctly.
 1.9  25-Nov-1993  jtc Error in hextobin() macro messed up hex escape constants.
 1.8  19-Nov-1993  jtc Oops! get rid of the free(), mklong()'s buffer no longer malloc()'d.
 1.7  19-Nov-1993  jtc Return from main() if a \c escape is encountered in a %b string (was an exit()).
Use macro constants for "skip1" and "skip2" instead of assigning them
each loop iteration.
Reformat the multi-case entries in the "big switch" so the lines don't wrap.
 1.6  19-Nov-1993  jtc Move all the code from do_printf() into do-while loop in main(). I need
to be able to return from main() when a "\c" in a %b string is encountered.
 1.5  19-Nov-1993  jtc Merged in most of the changes from 4.4 necessary to make printf a sh
and csh builtin --- still need to handle the one remaining exit() in
the SysV escape string handling code.
 1.4  05-Nov-1993  jtc Changes required to make printf utility POSIX.2 compliant:
* Escape characters in the string needed to be processed as they were
encountered, otherwise a "\000" octal constant would prematurely
terminate the formatting string.
* Implemented the %b, SysV echo(1) compatibility, formatting directive.
 1.3  01-Aug-1993  mycroft Add RCS identifiers.
 1.2  19-Apr-1993  mycroft Cleanup for GCC 2.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2  22-Mar-1995  mrg 4.4BSD-Lite2
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.30.2.1  27-Mar-2005  tron Pull up revision 1.31 (requested by dsl in ticket #57):
Remember to consume input bytes when processing '\0nnn' for %b formats
 1.32.4.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.33.10.1  21-Apr-2010  matt sync to netbsd-5
 1.33.8.1  14-Oct-2009  sborrill Pull up the following revision(s) (requested by christos in ticket #1091):
usr.bin/printf/printf.c: revision 1.34

Avoid segv on "printf '%*********s' 666".
 1.33.4.1  14-Oct-2009  sborrill Pull up the following revision(s) (requested by christos in ticket #1091):
usr.bin/printf/printf.c: revision 1.34

Avoid segv on "printf '%*********s' 666".
 1.35.10.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.35.4.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.36.12.1  12-Jul-2018  martin Pull up following revision(s) (requested by kre in ticket #1619):

usr.bin/printf/printf.c: revision 1.37-1.39

fix some error handling.

From leot@ on tech-userlevel:
Avoid running off into oblivion when a format string,
or arg to a %b conversion ends in an unescaped backslash.

Patch from Leo slightly modified by me.

Avoid printing error messages twice when an invalid
escape sequence (\ sequence) is present in an arg to a %b
conversion.
 1.36.8.1  12-Jul-2018  martin Pull up following revision(s) (requested by kre in ticket #1619):

usr.bin/printf/printf.c: revision 1.37-1.39

fix some error handling.

From leot@ on tech-userlevel:
Avoid running off into oblivion when a format string,
or arg to a %b conversion ends in an unescaped backslash.

Patch from Leo slightly modified by me.

Avoid printing error messages twice when an invalid
escape sequence (\ sequence) is present in an arg to a %b
conversion.
 1.36.6.1  12-Jul-2018  martin Pull up following revision(s) (requested by kre in ticket #1619):

usr.bin/printf/printf.c: revision 1.37-1.39

fix some error handling.

From leot@ on tech-userlevel:
Avoid running off into oblivion when a format string,
or arg to a %b conversion ends in an unescaped backslash.

Patch from Leo slightly modified by me.

Avoid printing error messages twice when an invalid
escape sequence (\ sequence) is present in an arg to a %b
conversion.
 1.37.16.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.37.16.1  10-Jun-2019  christos Sync with HEAD
 1.37.14.4  26-Jan-2019  pgoyette Sync with HEAD
 1.37.14.3  30-Sep-2018  pgoyette Sync with HEAD
 1.37.14.2  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.37.14.1  28-Jul-2018  pgoyette Sync with HEAD
 1.37.8.3  23-Sep-2018  martin Pull up following revision(s) (requested by kre in ticket #1020):
usr.bin/printf/printf.c: revision 1.46

A truly ancient bug found by Edgar Fuss

When printf is running builtin in a sh, global vars aren't reset to
0 between invocations. This affects "rval" which remembers state
from a previous %b \c and thereafter always exits after the first
format conversion, until we get a conversion that generates an
error (which resets the flag almost by accident)

printf %b abc\\c
abc (no \n)
printf %s%s hello world
hello (no \n, of course, no world ...)
printf %s%s hello world
hello
printf %s%s hello world
hello
printf %d hello
printf: hello: expected numeric value
0 (no \n)
printf %s%s hello world
helloworld (no \n, and we are back!)

This affects both /bin/sh and /bin/csh (and has for a very long time).

XXX pullup -8
 1.37.8.2  01-Sep-2018  martin Pull up following revision(s) (requested by kre in ticket #1002):

usr.bin/printf/printf.1: revision 1.31 (via patch)
usr.bin/printf/printf.c: revision 1.43

PR standards/53563

POSIX requires that signed numbers (strings preceded by '+' or '-')
be allowed as inputs to all of the integer format conversions, including
those which treat the data as unsigned.

Hence we do not need a variant function whose only difference from its
companion is to reject strings starting with '-' - instead we use
the primary function (getintmax()) for everything and remove getuintmax().

Minor update to the man page to indicate that the arg to all of the
integer conversions (diouxX) must be an integer constant (with an
optional sign) and to make it blatantly clear that %o is octal and
%u is unsigned decimal (for some reason those weren't explicitly stated
unlike d i x and X). Delete "respectively", it is not needed (and does
not really apply).

XXX pullup -8
 1.37.8.1  13-Jul-2018  martin Pull up following revision(s) (requested by kre in ticket #914):

usr.bin/printf/printf.c: revision 1.38,1.39

From leot@ on tech-userlevel:
Avoid running off into oblivion when a format string,
or arg to a %b conversion ends in an unescaped backslash.

Patch from Leo slightly modified by me.

Avoid printing error messages twice when an invalid
escape sequence (\ sequence) is present in an arg to a %b
conversion.
 1.52.2.1  31-May-2021  cjep sync with head
