.Pp
Each call to
.Nm :
.Bl -enum -compact
.It
examines up to
.Fa n
bytes starting at
.Fa s ,
.It
yields a Unicode scalar value (i.e., a UTF-32 code unit) if available
by storing it at
.Li * Ns Fa pc32 ,
.It
saves state at
.Fa ps ,
and
.It
returns either the number of bytes consumed if any or a special return
value.
.El
.Pp
Specifically:
.Bl -bullet
.It
If the multibyte sequence at
.Fa s
is invalid after any previous input saved at
.Fa ps ,
or if an error occurs in decoding,
.Nm
returns
.Li (size_t)-1
and sets
.Xr errno 2
to indicate the error.
.It
If the multibyte sequence at
.Fa s
is still incomplete after
.Fa n
bytes, including any previous input saved in
.Fa ps ,
.Nm
saves its state in
.Fa ps
after all the input so far and returns
.Li (size_t)-2 .
.It
If
.Nm
decodes the null multibyte character, then it stores zero at
.Li * Ns Fa pc32
and returns zero.
.It
Otherwise,
.Nm
decodes a single multibyte character, stores its Unicode scalar value at
.Li * Ns Fa pc32 ,
and returns the number of bytes consumed to decode the first multibyte
character.
.El
.Pp
If
.Fa pc32
is a null pointer, nothing is stored, but the effects on
.Fa ps
and the return value are unchanged.
.Pp
If
.Fa s
is a null pointer, the
.Nm
call is equivalent to:
.Bd -ragged -offset indent
.Fo mbrtoc32
.Li NULL ,
.Li \*q\*q ,
.Li 1 ,
.Fa ps
.Fc
.Ed
.Pp
This always returns zero, and has the effect of resetting
.Fa ps
to the initial conversion state, without writing to
.Fa pc32 ,
even if it is nonnull.
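.Pp
As a minimal sketch of this reset idiom, the helper below discards any
partial input recorded in a caller-supplied state object.
The name
.Fn reset_state
is hypothetical and not part of any library:
.Bd -literal -offset indent
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: return *ps to the initial conversion state. */
static size_t
reset_state(mbstate_t *ps)
{
	/* Equivalent to mbrtoc32(NULL, "", 1, ps); should yield 0. */
	return mbrtoc32(NULL, NULL, 0, ps);
}
.Ed
.Pp
Afterwards,
.Xr mbsinit 3
applied to the same object should report the initial state.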
.Pp
If
.Fa ps
is a null pointer,
.Nm
uses an internal
.Vt mbstate_t
object with static storage duration, distinct from all other
.Vt mbstate_t
objects
.Po
including those used by
.Xr mbrtoc8 3 ,
.Xr mbrtoc16 3 ,
.Xr c8rtomb 3 ,
.Xr c16rtomb 3 ,
and
.Xr c32rtomb 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh RETURN VALUES
The
.Nm
function returns:
.Bl -tag -width Li
.It Li 0
.Bq null
if
.Nm
decoded a null multibyte character.
.It Ar i
.Bq scalar value
where
.Li 0
\*(Le
.Ar i
\*(Le
.Fa n ,
if
.Nm
consumed
.Ar i
bytes of input to decode the next multibyte character, yielding a
Unicode scalar value.
.It Li (size_t)-2
.Bq incomplete
if
.Nm
found only an incomplete multibyte sequence after all
.Fa n
bytes of input and any previous input, and saved its state to restart
in the next call with
.Fa ps .
.It Li (size_t)-1
.Bq error
if any encoding error was detected;
.Xr errno 2
is set to reflect the error.
.El
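.Pp
Because the two special values are very large
.Vt size_t
values rather than byte counts, a caller has to test for them before
using the result as a length.
A minimal sketch of that ordering follows; the function
.Fn classify_result
and its enumeration are illustrative names, not part of any library:
.Bd -literal -offset indent
#include <stddef.h>

/* Hypothetical labels for the four kinds of return value. */
enum mb_result { RES_NUL, RES_SCALAR, RES_INCOMPLETE, RES_ERROR };

/* Classify a value returned by mbrtoc32. */
static enum mb_result
classify_result(size_t len)
{
	if (len == (size_t)-1)
		return RES_ERROR;	/* errno describes the error */
	if (len == (size_t)-2)
		return RES_INCOMPLETE;	/* state saved for the next call */
	if (len == 0)
		return RES_NUL;		/* null multibyte character */
	return RES_SCALAR;		/* len bytes were consumed */
}
.Ed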
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh EXAMPLES
.Bd -literal -offset indent
char *s = ...;
size_t n = ...;
mbstate_t mbs = {0};	/* initial conversion state */

while (n) {
	char32_t c32;
	size_t len;

	len = mbrtoc32(&c32, s, n, &mbs);
	switch (len) {
	case 0:			/* NUL terminator */
		assert(c32 == 0);
		goto out;
	default:		/* scalar value */
		printf("U+%04"PRIx32"\en", (uint32_t)c32);
		break;
	case (size_t)-2:	/* incomplete */
		printf("incomplete\en");
		goto readmore;
	case (size_t)-1:	/* error */
		printf("error: %d\en", errno);
		goto out;
	}
	s += len;
	n -= len;
}
.Ed
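.Pp
The fragment above leaves the
.Li out
and
.Li readmore
labels to the surrounding code.
One possible way to structure the restart, sketched under the
assumption that the caller keeps one
.Vt mbstate_t
object per input stream, is to decode each chunk with a helper.
The name
.Fn decode_chunk
and its return convention are hypothetical:
.Bd -literal -offset indent
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode one chunk of a longer stream, adding
 * the number of scalar values decoded to *count.
 * Returns 1 if a NUL terminator was reached, 0 if the chunk was
 * used up (possibly in the middle of a character, which *ps
 * remembers), or -1 on an encoding error.
 */
static int
decode_chunk(const char *s, size_t n, mbstate_t *ps, size_t *count)
{
	while (n > 0) {
		char32_t c32;
		size_t len = mbrtoc32(&c32, s, n, ps);

		if (len == (size_t)-1)
			return -1;	/* errno describes the error */
		if (len == (size_t)-2)
			return 0;	/* all n bytes absorbed into *ps */
		if (len == 0)
			return 1;	/* NUL terminator */
		(*count)++;
		s += len;
		n -= len;
	}
	return 0;
}
.Ed
.Pp
The caller would invoke the helper once per chunk with the same state
object, and stop at the first nonzero return.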
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
The multibyte sequence cannot be decoded in the current locale as a
Unicode scalar value.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SEE ALSO
.Xr c16rtomb 3 ,
.Xr c32rtomb 3 ,
.Xr c8rtomb 3 ,
.Xr mbrtoc16 3 ,
.Xr mbrtoc8 3 ,
.Xr uchar 3
.Rs
.%B The Unicode Standard
.%O Version 15.0 \(em Core Specification
.%Q The Unicode Consortium
.%D September 2022
.%U https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf
.Re
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh STANDARDS
The
.Nm
function conforms to
.St -isoC-2011 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh HISTORY
The
.Nm
function first appeared in
.Nx 11.0 .