.Pp
Each call to
.Nm :
.Bl -enum -compact
.It
examines up to
.Fa n
bytes starting at
.Fa s ,
.It
yields a UTF-16 code unit if available by storing it at
.Li * Ns Fa pc16 ,
.It
saves state at
.Fa ps ,
and
.It
returns either the number of bytes consumed if any or a special return
value.
.El
.Pp
Specifically:
.Bl -bullet
.It
If the multibyte sequence at
.Fa s
is invalid after any previous input saved at
.Fa ps ,
or if an error occurs in decoding,
.Nm
returns
.Li (size_t)-1
and sets
.Xr errno 2
to indicate the error.
.It
If the multibyte sequence at
.Fa s
is still incomplete after
.Fa n
bytes, including any previous input saved in
.Fa ps ,
.Nm
saves its state in
.Fa ps
after all the input so far and returns
.Li (size_t)-2 .
.It
If
.Nm
had previously decoded a multibyte character but has not yet yielded
all the code units of its UTF-16 encoding, it stores the next UTF-16
code unit at
.Li * Ns Fa pc16
and returns
.Li (size_t)-3 .
.It
If
.Nm
decodes the null multibyte character, then it stores zero at
.Li * Ns Fa pc16
and returns zero.
.It
Otherwise,
.Nm
decodes a single multibyte character, stores the first (and possibly
only) code unit in its UTF-16 encoding at
.Li * Ns Fa pc16 ,
and returns the number of bytes consumed to decode the first multibyte
character.
.El
.Pp
If
.Fa pc16
is a null pointer, nothing is stored, but the effects on
.Fa ps
and the return value are unchanged.
.Pp
If
.Fa s
is a null pointer, the
.Nm
call is equivalent to:
.Bd -ragged -offset indent
.Fo mbrtoc16
.Li NULL ,
.Li \*q\*q ,
.Li 1 ,
.Fa ps
.Fc
.Ed
.Pp
This always returns zero, and has the effect of resetting
.Fa ps
to the initial conversion state, without writing to
.Fa pc16 ,
even if it is nonnull.
.Pp
If
.Fa ps
is a null pointer,
.Nm
uses an internal
.Vt mbstate_t
object with static storage duration, distinct from all other
.Vt mbstate_t
objects
.Po
including those used by
.Xr mbrtoc8 3 ,
.Xr mbrtoc32 3 ,
.Xr c8rtomb 3 ,
.Xr c16rtomb 3 ,
and
.Xr c32rtomb 3
.Pc ,
which is initialized at program startup to the initial conversion
state.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh IMPLEMENTATION NOTES
On well-formed input, the
.Nm
function yields either a Unicode scalar value in the Basic Multilingual
Plane (BMP), i.e., a 16-bit Unicode code point that is not a surrogate
code point, or, over two successive calls, yields the high and low
surrogate code points (in that order) of a Unicode scalar value outside
the BMP.
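.Pp
A caller that needs the scalar value itself can recombine a surrogate
pair with the arithmetic described in RFC 2781.
The following is a minimal sketch only; the name
.Fn utf16_combine
is hypothetical and is not provided by any library, and the caller is
assumed to have checked that both arguments are surrogates of the
right kind:
.Bd -literal -offset indent
#include <stdint.h>
#include <uchar.h>

/* Hypothetical helper: combine a high and a low surrogate. */
uint32_t
utf16_combine(char16_t hi, char16_t lo)
{
	/* 10 payload bits from each surrogate, offset by 0x10000 */
	return 0x10000 + (((uint32_t)hi - 0xD800) << 10) +
	    ((uint32_t)lo - 0xDC00);
}
.Ed
.Pp
For example, the pair 0xD83D, 0xDE00 combines to the scalar value
0x1F600.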
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh RETURN VALUES
The
.Nm
function returns:
.Bl -tag -width Li
.It Li 0
.Bq null
if
.Nm
decoded a null multibyte character.
.It Ar i
.Bq code unit
where
.Li 0
\*(Le
.Ar i
\*(Le
.Fa n ,
if
.Nm
consumed
.Ar i
bytes of input to decode the next multibyte character, yielding a
UTF-16 code unit.
.It Li (size_t)-3
.Bq continuation
if
.Nm
consumed no new bytes of input but yielded a UTF-16 code unit that was
pending from previous input.
.It Li (size_t)-2
.Bq incomplete
if
.Nm
found only an incomplete multibyte sequence after all
.Fa n
bytes of input and any previous input, and saved its state to restart
in the next call with
.Fa ps .
.It Li (size_t)-1
.Bq error
if any encoding error was detected;
.Xr errno 2
is set to reflect the error.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh EXAMPLES
Print the UTF-16 code units of a multibyte string in hexadecimal text:
.Bd -literal -offset indent
char *s = ...;
size_t n = ...;
mbstate_t mbs = {0};	/* initial conversion state */

while (n) {
	char16_t c16;
	size_t len;

	len = mbrtoc16(&c16, s, n, &mbs);
	switch (len) {
	case 0:		/* null terminator */
		assert(c16 == u'\e0');
		goto out;
	default:	/* scalar value or high surrogate */
		printf("U+%04"PRIx16"\en", (uint16_t)c16);
		break;
	case (size_t)-3:	/* low surrogate */
		printf("continue U+%04"PRIx16"\en", (uint16_t)c16);
		continue;	/* no input consumed, do not advance s */
	case (size_t)-2:	/* incomplete */
		printf("incomplete\en");
		goto readmore;
	case (size_t)-1:	/* error */
		printf("error: %d\en", errno);
		goto out;
	}
	s += len;
	n -= len;
}
.Ed
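.Pp
The fragment above leaves buffer refilling and the exit labels to the
caller.
A self-contained variant of the same loop might look like the
following sketch, which assumes a hypothetical helper named
.Fn count_utf16
that is not part of any library.
It counts the UTF-16 code units that a buffer decodes to, and polls
once more with
.Fa n
set to zero so that a low surrogate still pending after the last byte
is not lost:
.Bd -literal -offset indent
#include <stddef.h>
#include <uchar.h>

/*
 * Hypothetical helper, for illustration only: count the UTF-16 code
 * units that the n bytes at s decode to in the current locale,
 * stopping early at a null character.  Returns (size_t)-1 if the
 * input is invalid or ends in the middle of a character.
 */
size_t
count_utf16(const char *s, size_t n)
{
	mbstate_t mbs = {0};	/* initial conversion state */
	char16_t c16;
	size_t count = 0, len;

	while (n > 0) {
		len = mbrtoc16(&c16, s, n, &mbs);
		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;
		if (len == 0)		/* null character */
			return count;
		count++;
		if (len == (size_t)-3)	/* low surrogate, no input consumed */
			continue;
		s += len;
		n -= len;
	}
	/* A low surrogate may still be pending after the last byte. */
	if (mbrtoc16(&c16, s, 0, &mbs) == (size_t)-3)
		count++;
	return count;
}
.Ed
.Pp
In a UTF-8 locale, a four-byte sequence encoding a character outside
the BMP contributes two code units to the count.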
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh ERRORS
.Bl -tag -width Bq
.It Bq Er EILSEQ
The multibyte sequence cannot be decoded in the current locale as a
Unicode scalar value.
.It Bq Er EIO
An error occurred in loading the locale's character conversions.
.El
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh SEE ALSO
.Xr c16rtomb 3 ,
.Xr c32rtomb 3 ,
.Xr mbrtoc32 3 ,
.Xr uchar 3
.Rs
.%B The Unicode Standard
.%O Version 15.0 \(em Core Specification
.%Q The Unicode Consortium
.%D September 2022
.%U https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf
.Re
.Rs
.%A P. Hoffman
.%A F. Yergeau
.%T UTF-16, an encoding of ISO 10646
.%R RFC 2781
.%D February 2000
.%I Internet Engineering Task Force
.%U https://datatracker.ietf.org/doc/html/rfc2781
.Re
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh STANDARDS
The
.Nm
function conforms to
.St -isoC-2011 .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.Sh HISTORY
The
.Nm
function first appeared in
.Nx 11.0 .