| History log of /src/tests/lib/libc/locale |
| Revision | Date | Author | Comments |
| 1.18 | 15-Aug-2024 |
riastradh | libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either:
- _NETBSD_SOURCE
- _ISOC23_SOURCE
- __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
|
| 1.17 | 15-Aug-2024 |
riastradh | libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
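A minimal sketch of the restartable interface described above, using only the standard C11 mbrtoc32 call. The helper name decode_to_utf32 and its buffer-based signature are hypothetical and not part of this commit; the sketch stops at a decoded NUL or an incomplete sequence rather than trying to resume.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the NetBSD tree): decode up to n bytes
 * of the multibyte string s, in the current locale, into UTF-32 code
 * units.  Returns the number of code units stored in out, or
 * (size_t)-1 if the input contains an invalid sequence.
 */
size_t
decode_to_utf32(const char *s, size_t n, char32_t *out, size_t outcap)
{
	mbstate_t ps;
	size_t count = 0;

	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	while (n > 0 && count < outcap) {
		char32_t c32;
		size_t len = mbrtoc32(&c32, s, n, &ps);

		if (len == (size_t)-1)	/* invalid multibyte sequence */
			return (size_t)-1;
		if (len == (size_t)-2)	/* incomplete; ps remembers the prefix */
			break;
		if (len == 0)		/* decoded NUL: stop here */
			break;
		out[count++] = c32;
		if (len == (size_t)-3)	/* output produced, no input consumed */
			continue;
		s += len;
		n -= len;
	}
	return count;
}
```

Because the state lives in ps, a caller that receives more input after an incomplete sequence could keep the same mbstate_t and call mbrtoc32 again; this sketch uses a fresh state per call for simplicity.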
|
| 1.16 | 15-Aug-2024 |
riastradh | uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later.
PR lib/52374: <uchar.h> missing
|
| 1.15 | 14-Aug-2024 |
riastradh | tests/lib/libc/locale/Makefile: Sort.
No functional change intended.
Preparation for PR lib/52374.
|
| 1.14 | 27-Nov-2023 |
christos | branches: 1.14.2; Don't use fmtcheck for strfmon format strings. It does not work. Fix a broken test.
|
| 1.13 | 28-Jul-2019 |
christos | branches: 1.13.10; PR/54414: Valery Ushakov: add a test for wcsrtombs(3) doesn't update the source argument on conversion error
|
| 1.12 | 16-Aug-2017 |
joerg | branches: 1.12.4; Add missing strfmon_l. Noticed by Bruno Haible. Add test case.
|
| 1.11 | 23-Jul-2017 |
perseant | Add missing files from last commit:
Move Unicode <-> ku/ten mapping into the individual codec modules. Mapping is based on existing iconv data for single-byte encodings, and included for several, but not all, multibyte encodings.
|
| 1.10 | 14-Jul-2017 |
perseant | branches: 1.10.2; Add a simple collation test. This test is expected to fail on HEAD since we do not yet have a working implementation of wcscoll.
|
| 1.9 | 01-Jun-2017 |
perseant | branches: 1.9.2; Add tests for btowc(3)/wctob(3) and enable compilation of the test for digittoint(3).
The digittoint(3) test is skipped since we don't provide that function yet.
One of the test cases for btowc(3) is also skipped, since it tests conversion to Unicode---whereas our wchar_t representation is locale-dependent.
|
| 1.8 | 30-May-2017 |
perseant | Add test cases for sprintf/sscanf/strto{d,l} and the is* and isw* ctype functions, for single-byte encodings
|
| 1.7 | 30-May-2017 |
perseant | Add simple test case for toupper/tolower
|
| 1.6 | 28-May-2013 |
joerg | Add mbsnrtowcs and wcsnrtombs. Approved by core.
|
| 1.5 | 28-Feb-2013 |
christos | regression tests for wide char i/o. Currently there are failures.
|
| 1.4 | 21-Nov-2011 |
joerg | branches: 1.4.6; Add test cases for strcspn, strpbrk, strspn, wcscspn, wcspbrk and wcsspn.
|
| 1.3 | 15-Jul-2011 |
jruoho | branches: 1.3.2; Rename two test files to get functional scope (and avoid confusion with ctype(3)). No functional change.
|
| 1.2 | 11-Apr-2011 |
tron | Fix build with stack smash protection enabled.
|
| 1.1 | 09-Apr-2011 |
pgoyette | atf-ify the various locale tests
|
| 1.3.2.2 | 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
| 1.3.2.1 | 17-Apr-2012 |
yamt | sync with head
|
| 1.4.6.1 | 23-Jun-2013 |
tls | resync from head
|
| 1.9.2.1 | 29-Aug-2017 |
martin | Pull up following revision(s) (requested by joerg in ticket #215): tests/lib/libc/locale/t_strfmon.c: revision 1.1 tests/lib/libc/locale/Makefile: revision 1.12 lib/libc/stdlib/strfmon.c: revision 1.11 distrib/sets/lists/debug/mi: revision 1.224 include/monetary.h: revision 1.3 distrib/sets/lists/tests/mi: revision 1.761 lib/libc/stdlib/strfmon.3: revision 1.6 lib/libc/stdlib/strfmon.3: revision 1.7 Add missing strfmon_l. Noticed by Bruno Haible. Add test case. Typo fix.
|
| 1.10.2.2 | 23-Jul-2017 |
perseant | Add Unicode copyright notice and more verbose DUCET test.
|
| 1.10.2.1 | 14-Jul-2017 |
perseant | Initial commit of a mostly-working implementation of __STDC_ISO_10646__, with collation support using the Unicode Collation Algorithm.
The conversion from men/ku/ten form to Unicode is a gross hack at present. Fixing this, and fleshing out the LC_COLLATE locale component, are next on the agenda.
|
| 1.12.4.1 | 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
| 1.13.10.1 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L (Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds No binary change. t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences. For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce: 1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208 2. the encoding of the desired character 3. a shift sequence restoring the initial state This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte. Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing - _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward) - mbrtowc with initial conversion state => gives us the single wide character representation XXX If combining characters are possible here, this may fail. - wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32). c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'. uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.14.2.1 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.2 | 23-Jul-2017 |
perseant | Add missing files from last commit:
Move Unicode <-> ku/ten mapping into the individual codec modules. Mapping is based on existing iconv data for single-byte encodings, and included for several, but not all, multibyte encodings.
|
| 1.1 | 14-Jul-2017 |
perseant | branches: 1.1.2; file ducet_test.h was initially added on branch perseant-stdc-iso10646.
|
| 1.1.2.2 | 23-Jul-2017 |
perseant | Add Unicode copyright notice and more verbose DUCET test.
|
| 1.1.2.1 | 14-Jul-2017 |
perseant | Initial commit of a mostly-working implementation of __STDC_ISO_10646__, with collation support using the Unicode Collation Algorithm.
The conversion from men/ku/ten form to Unicode is a gross hack at present. Fixing this, and fleshing out the LC_COLLATE locale component, are next on the agenda.
|
| 1.3 | 10-Aug-2017 |
perseant | Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.2 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.1 | 01-Jun-2017 |
perseant | branches: 1.1.2; Add tests for btowc(3)/wctob(3) and enable compilation of the test for digittoint(3).
The digittoint(3) test is skipped since we don't provide that function yet.
One of the test cases for btowc(3) is also skipped, since it tests conversion to Unicode---whereas our wchar_t representation is locale-dependent.
|
| 1.1.2.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3 Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.6 | 19-Aug-2024 |
riastradh | branches: 1.6.2; 1.6.6; c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
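The byte counts quoted in this message can be checked with a small arithmetic sketch. The constants (a three-byte shift sequence, one byte per shifted YEN SIGN) come from the message itself; the function names are hypothetical, and the stateful count assumes the output eventually returns to the initial state with one closing shift sequence, which the message does not spell out.

```c
#include <stddef.h>

enum {
	SHIFT_LEN = 3,	/* length of one shift sequence, per the message */
	CHAR_LEN = 1	/* bytes per YEN SIGN once shifted, per the message */
};

/* Every character is wrapped in shift-in and shift-out sequences. */
size_t
stateless_len(size_t nchars)
{
	return nchars * (SHIFT_LEN + CHAR_LEN + SHIFT_LEN);
}

/*
 * Shift once, emit all characters, then restore the initial state once
 * (assumption: a single closing shift at the end of the run).
 */
size_t
stateful_len(size_t nchars)
{
	if (nchars == 0)
		return 0;
	return SHIFT_LEN + nchars * CHAR_LEN + SHIFT_LEN;
}
```

Under these assumptions, two consecutive YEN SIGNs cost fourteen bytes with the stateless approach and eight bytes when the conversion state is carried across calls.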
|
| 1.5 | 19-Aug-2024 |
riastradh | t_c8rtomb, t_c16rtomb: Simplify comment.
ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
|
| 1.4 | 18-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
|
| 1.3 | 18-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3): Fix NUL handling.
PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
|
| 1.2 | 17-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination.
PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
|
| 1.1 | 15-Aug-2024 |
riastradh | libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
|
| 1.6.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.6.6.1 | 19-Aug-2024 |
perseant | file t_c16rtomb.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.6.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L (Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds. No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds. No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence (ESC $ @) to come flying out of this conversion, and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences. For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte. Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.6.2.1 | 19-Aug-2024 |
martin | file t_c16rtomb.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.1 | 15-Aug-2024 |
riastradh | branches: 1.1.2; 1.1.6; libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
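A minimal sketch of what "restartable" means in practice, using only the standard C11 mbrtoc32 interface. The helper name decode_chunked and its parameters are hypothetical and do not come from the commit; the sketch feeds the input in small pieces and relies on the mbstate_t object to carry a partially read character from one call to the next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: decode buf[0..len) in the current locale,
 * handing mbrtoc32 at most `chunk' bytes per call.  Stores up to
 * `max' code points in out and returns how many were stored, or
 * (size_t)-1 on an invalid sequence.  Decoding stops after a NUL.
 * A character left incomplete at the end of the input is dropped.
 */
size_t
decode_chunked(const char *buf, size_t len, size_t chunk,
    char32_t *out, size_t max)
{
	mbstate_t ps;
	size_t n = 0;

	assert(chunk > 0);
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */

	while (len > 0 && n < max) {
		size_t avail = len < chunk ? len : chunk;
		char32_t c32;
		size_t r = mbrtoc32(&c32, buf, avail, &ps);

		if (r == (size_t)-1)
			return (size_t)-1;	/* invalid sequence */
		if (r == (size_t)-2) {
			/* All avail bytes consumed; progress is kept in ps. */
			buf += avail;
			len -= avail;
			continue;
		}
		out[n++] = c32;
		if (r == 0)
			break;			/* decoded a NUL */
		if (r != (size_t)-3) {
			/* r bytes completed the character just stored. */
			buf += r;
			len -= r;
		}
	}
	return n;
}
```

Calling decode_chunked(s, strlen(s), 1, out, max) hands over one byte at a time, which exercises the (size_t)-2 path for every multibyte character in a locale such as UTF-8.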
|
| 1.1.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.1.6.1 | 15-Aug-2024 |
perseant | file t_c32rtomb.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.1.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L (Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds. No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds. No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence (ESC $ @) to come flying out of this conversion, and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences. For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte. Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.1.2.1 | 15-Aug-2024 |
martin | file t_c32rtomb.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.7 | 19-Aug-2024 |
riastradh | branches: 1.7.2; 1.7.6; c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
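From the caller's side, the benefit of this change only shows up when one mbstate_t is carried across consecutive calls. The following is a minimal sketch of that usage pattern, not the libc implementation; the function name encode_c32_string and its parameters are hypothetical. It uses an MB_LEN_MAX scratch buffer and finishes by encoding 0, which stores a NUL byte preceded by whatever shift sequence is needed to return to the initial state.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: encode the n code points at `in` into `out`
 * (capacity outsz), followed by a terminating NUL.  Returns the number
 * of bytes written including the NUL, or (size_t)-1 on failure.
 */
size_t
encode_c32_string(char *out, size_t outsz, const char32_t *in, size_t n)
{
	mbstate_t ps;
	char buf[MB_LEN_MAX];	/* fixed size, so no variable-length array */
	size_t i, len, used = 0;

	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	for (i = 0; i <= n; i++) {
		/* After the last code point, encode 0 to terminate. */
		char32_t c32 = (i < n) ? in[i] : 0;

		/* The same ps is reused, so shift state carries over. */
		len = c32rtomb(buf, c32, &ps);
		if (len == (size_t)-1)
			return (size_t)-1;	/* unencodable in this locale */
		assert(len <= sizeof(buf));
		if (len > outsz - used)
			return (size_t)-1;	/* output buffer too small */
		memcpy(out + used, buf, len);
		used += len;
	}
	return used;
}
```

In the default "C" locale, calling this with the three code points 'Y', 'e', 'n' should fill the output with the string "Yen" and report four bytes. In a stateful locale such as ISO-2022-JP the byte counts depend on the codec.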
|
| 1.6 | 19-Aug-2024 |
riastradh | t_c8rtomb, t_c16rtomb: Simplify comment.
ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
|
| 1.5 | 18-Aug-2024 |
riastradh | c8rtomb(3): Fix digit error in shift sequence test.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
|
| 1.4 | 18-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
|
| 1.3 | 18-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3): Fix NUL handling.
PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
|
| 1.2 | 17-Aug-2024 |
riastradh | c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination.
PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
|
| 1.1 | 15-Aug-2024 |
riastradh | libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either:
- _NETBSD_SOURCE
- _ISOC23_SOURCE
- __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
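Portable callers may want to guard their use of char8_t on the same language-version condition. The sketch below is an assumption about how such a guard could look; the names utf8_unit, HAVE_C23_UCHAR, and utf8_unit_count are hypothetical, and it tests only __STDC_VERSION__, not the _NETBSD_SOURCE or _ISOC23_SOURCE macros.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
typedef char8_t utf8_unit;		/* C23 type from <uchar.h> */
#define HAVE_C23_UCHAR 1
#else
typedef unsigned char utf8_unit;	/* hypothetical pre-C23 fallback */
#define HAVE_C23_UCHAR 0
#endif

/* Count the UTF-8 code units before the terminating zero code unit. */
size_t
utf8_unit_count(const utf8_unit *s)
{
	size_t n = 0;

	assert(s != NULL);
	while (s[n] != 0)
		n++;
	return n;
}
```

Since C23 defines char8_t as an unsigned character type, the fallback keeps the same representation, so code written against utf8_unit behaves the same either way.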
|
| 1.7.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.7.6.1 | 19-Aug-2024 |
perseant | file t_c8rtomb.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.7.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either:
- _NETBSD_SOURCE
- _ISOC23_SOURCE
- __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
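The null string output case mentioned in this pull-up can be illustrated from the caller's side. This is a sketch under the assumption that passing a null output pointer to c32rtomb behaves as the log describes (the code unit argument is ignored, the caller sees no output, and ps returns to the initial conversion state); the helper name reset_conversion_state is hypothetical, and the sketch deliberately does not rely on any particular non-error return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: return *ps to the initial conversion state
 * without handing any output to the caller.  Returns nonzero if *ps
 * is in the initial state afterwards.
 */
int
reset_conversion_state(mbstate_t *ps)
{
	size_t len;

	assert(ps != NULL);
	/* Null output pointer: the c32 argument (here 0) is ignored. */
	len = c32rtomb(NULL, 0, ps);
	if (len == (size_t)-1)
		return 0;
	return mbsinit(ps) != 0;
}
```

Plain 0 is used for the code unit, following the log's preference over '\0', L'\0', or U'\0'.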
|
| 1.7.2.1 | 19-Aug-2024 |
martin | file t_c8rtomb.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.2 | 15-Jul-2011 |
jruoho | Rename two test files to get functional scope (and avoid confusion with ctype(3)). No functional change.
|
| 1.1 | 09-Apr-2011 |
pgoyette | atf-ify the various locale tests
|
| 1.2 | 15-Jul-2011 |
jruoho | Rename two test files to get functional scope (and avoid confusion with ctype(3)). No functional change.
|
| 1.1 | 09-Apr-2011 |
pgoyette | atf-ify the various locale tests
|
| 1.3 | 24-May-2022 |
andvar | fix various typos in comment, documentation and log messages.
|
| 1.2 | 01-Jun-2017 |
perseant | Add tests for btowc(3)/wctob(3) and enable compilation of the test for digittoint(3).
The digittoint(3) test is skipped since we don't provide that function yet.
One of the test cases for btowc(3) is also skipped, since it tests conversion to Unicode---whereas our wchar_t representation is locale-dependent.
|
| 1.1 | 30-May-2017 |
perseant | Add test cases for sprintf/sscanf/strto{d,l} and the is* and isw* ctype functions, for single-byte encodings
|
| 1.2 | 23-Jul-2017 |
perseant | Add missing files from last commit:
Move Unicode <-> ku/ten mapping into the individual codec modules. Mapping is based on existing iconv data for single-byte encodings, and included for several, but not all, multibyte encodings.
|
| 1.1 | 14-Jul-2017 |
perseant | branches: 1.1.2; file t_ducet.c was initially added on branch perseant-stdc-iso10646.
|
| 1.1.2.2 | 23-Jul-2017 |
perseant | Add Unicode copyright notice and more verbose DUCET test.
|
| 1.1.2.1 | 14-Jul-2017 |
perseant | Initial commit of a mostly-working implementation of __STDC_ISO_10646__, with collation support using the Unicode Collation Algorithm.
The conversion from men/ku/ten form to Unicode is a gross hack at present. Fixing this, and fleshing out the LC_COLLATE locale component, are next on the agenda.
|
| 1.5 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.4 | 21-Jan-2014 |
yamt | branches: 1.4.4; 1.4.20; fix comment typos pointed out by uebayasi
|
| 1.3 | 20-Jan-2014 |
yamt | - fix funopen usage
- some more checks
- remove a bogus test case (bad_eucJP_getwc) PR/47660 (Julio Merino)
- add XXX comments
|
| 1.2 | 17-Mar-2013 |
jmmv | branches: 1.2.4; Mark two routinely-broken tests as expected failures referencing PR lib/47660.
|
| 1.1 | 28-Feb-2013 |
christos | regression tests for wide char i/o. Currently there are failures.
|
| 1.2.4.3 | 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
| 1.2.4.2 | 23-Jun-2013 |
tls | resync from head
|
| 1.2.4.1 | 17-Mar-2013 |
tls | file t_io.c was added on branch tls-maxphys on 2013-06-23 06:28:56 +0000
|
| 1.4.20.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3 Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.4.4.2 | 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
| 1.4.4.1 | 21-Jan-2014 |
yamt | file t_io.c was added on branch yamt-pagecache on 2014-05-22 11:42:20 +0000
|
| 1.3 | 20-Aug-2024 |
riastradh | branches: 1.3.2; 1.3.6; mbrtoc32(3): Use conversion state to handle shift sequences.
PR lib/58618: mbrtocN(3) fails to keep shift state
|
| 1.2 | 19-Aug-2024 |
riastradh | mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state.
This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it.
PR lib/58618: mbrtocN(3) fails to keep shift state
|
| 1.1 | 15-Aug-2024 |
riastradh | libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
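What "restartable" means in practice can be sketched by feeding mbrtoc32 one byte at a time and letting the mbstate_t remember partial characters and shift state between calls. This is an illustrative sketch, not code from the tests; the function name decode_bytewise and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the n bytes at `s` into at most `max`
 * code points, passing a single byte per call.  Returns the number of
 * code points stored, or (size_t)-1 on an invalid sequence.
 */
size_t
decode_bytewise(char32_t *out, size_t max, const char *s, size_t n)
{
	mbstate_t ps;
	size_t i = 0, count = 0;

	assert(out != NULL && s != NULL);
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	while (i < n && count < max) {
		char32_t c32;
		size_t len = mbrtoc32(&c32, &s[i], 1, &ps);

		if (len == (size_t)-1)
			return (size_t)-1;	/* invalid multibyte sequence */
		if (len == (size_t)-2) {
			i++;	/* byte absorbed into ps, no code point yet */
			continue;
		}
		if (len != (size_t)-3)
			i++;	/* the byte was consumed (len is 0 or 1) */
		out[count++] = c32;
	}
	return count;
}
```

In the default "C" locale every ASCII byte is a complete character, so decoding "hi" yields two code points. In a multibyte or stateful locale the (size_t)-2 branch is taken until enough bytes have accumulated.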
|
| 1.3.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.3.6.1 | 20-Aug-2024 |
perseant | file t_mbrtoc16.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.3.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
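The surrogate state mentioned above exists because a code point outside the Basic Multilingual Plane needs two UTF-16 code units, so mbrtoc16 has to remember the second one between calls. The arithmetic behind that pair can be sketched as follows. This is only an illustration of the UTF-16 encoding rule; the helper name c32_to_surrogates is hypothetical and is not taken from the libc sources.

```c
#include <assert.h>
#include <uchar.h>

/*
 * Hypothetical helper, for illustration only: split a Unicode scalar
 * value above U+FFFF into its UTF-16 high and low surrogate code units.
 */
void
c32_to_surrogates(char32_t c32, char16_t out[2])
{
	/* Only supplementary-plane code points need a surrogate pair. */
	assert(c32 >= 0x10000 && c32 <= 0x10ffff);

	c32 -= 0x10000;				/* 20 bits remain */
	out[0] = (char16_t)(0xd800 | (c32 >> 10));	/* high: top 10 bits */
	out[1] = (char16_t)(0xdc00 | (c32 & 0x3ff));	/* low: bottom 10 bits */
}
```

For U+1F600 this yields the code units 0xD83D and 0xDE00.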
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
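To make the point about surrogates concrete: a single char8_t can never hold a value in [0xd800,0xdfff], but three bytes shaped like a UTF-8 sequence can carry those bits. The sketch below decodes the payload of a three-byte sequence without any validation and checks whether it lands in the surrogate range. Both function names are hypothetical and this is not how c8rtomb itself is written; it assumes the caller has already checked that the bytes have the 1110xxxx 10xxxxxx 10xxxxxx shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the 16 payload bits of a three-byte
 * UTF-8-shaped sequence, with no validity checks at all.
 */
uint32_t
utf8_decode3_raw(const unsigned char b[3])
{
	assert(b != NULL);
	return ((uint32_t)(b[0] & 0x0f) << 12) |
	    ((uint32_t)(b[1] & 0x3f) << 6) |
	    (uint32_t)(b[2] & 0x3f);
}

/* True if the sequence encodes a surrogate code point, which is invalid. */
bool
utf8_3byte_is_surrogate(const unsigned char b[3])
{
	uint32_t cp = utf8_decode3_raw(b);

	return cp >= 0xd800 && cp <= 0xdfff;
}
```

The bytes ED A0 80 decode to 0xD800 and are flagged; E2 82 AC (U+20AC) are not.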
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds. No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds. No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
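A minimal sketch of the MB_LEN_MAX pattern, combined with a single conversion state carried across calls so that a stateful encoding only needs a shift sequence when the state actually changes. This is not the man page example; the helper name encode_c32_string and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>	/* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: encode nsrc UTF-32 code units into dst in the
 * current locale, NUL-terminated.  Returns the number of bytes written
 * before the NUL, or (size_t)-1 on encoding error or lack of space.
 */
size_t
encode_c32_string(const char32_t *src, size_t nsrc, char *dst, size_t dstlen)
{
	char buf[MB_LEN_MAX];	/* fixed size, so no VLA of MB_CUR_MAX bytes */
	mbstate_t ps;
	size_t i, len, n = 0;

	assert(dst != NULL);
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */

	for (i = 0; i < nsrc; i++) {
		len = c32rtomb(buf, src[i], &ps);	/* state carried over */
		if (len == (size_t)-1 || len > dstlen - n)
			return (size_t)-1;
		memcpy(dst + n, buf, len);
		n += len;
	}

	/* Encoding 0 writes any shift back to the initial state, then NUL. */
	len = c32rtomb(buf, 0, &ps);
	if (len == (size_t)-1 || len > dstlen - n)
		return (size_t)-1;
	memcpy(dst + n, buf, len);
	return n + len - 1;
}
```

In the default "C" locale, encoding the two code units 'h' and 'i' produces the string "hi"; in a stateful locale the same loop would let the library decide when shift sequences are needed.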
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works both for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.3.2.1 | 20-Aug-2024 |
martin | file t_mbrtoc16.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.1 | 15-Aug-2024 |
riastradh | branches: 1.1.2; 1.1.6; libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
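As an illustration of what restartable means in practice, the following sketch feeds mbrtoc32 one byte at a time and relies on the mbstate_t to carry partial characters between calls. The helper name decode_bytewise is hypothetical; the return-value handling follows the C11 description of mbrtoc32, where (size_t)-2 means the bytes so far are an incomplete but potentially valid character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: decode len bytes of s in the current locale,
 * one byte per call, into at most outcap UTF-32 code units.
 * Returns the number of code units stored, or (size_t)-1 on error.
 */
size_t
decode_bytewise(const char *s, size_t len, char32_t *out, size_t outcap)
{
	mbstate_t ps;
	size_t i = 0, n = 0;

	assert(s != NULL && out != NULL);
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */

	while (i < len) {
		char32_t c32;
		size_t r = mbrtoc32(&c32, &s[i], 1, &ps);

		if (r == (size_t)-1)
			return (size_t)-1;	/* invalid sequence */
		if (r == (size_t)-2) {
			i++;	/* byte absorbed into ps; need more input */
			continue;
		}
		if (n == outcap)
			return (size_t)-1;	/* output buffer full */
		out[n++] = c32;
		if (r != (size_t)-3)
			i++;	/* (size_t)-3 consumes no input */
	}
	return n;
}
```

In the default "C" locale every ASCII byte completes a character immediately; in a UTF-8 or ISO-2022-JP locale the same loop would pass through the (size_t)-2 branch while a character or shift sequence is still incomplete.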
|
| 1.1.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.1.6.1 | 15-Aug-2024 |
perseant | file t_mbrtoc32.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.1.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds. No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds. No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works both for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.1.2.1 | 15-Aug-2024 |
martin | file t_mbrtoc32.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.3 | 20-Aug-2024 |
riastradh | branches: 1.3.2; 1.3.6; mbrtoc32(3): Use conversion state to handle shift sequences.
PR lib/58618: mbrtocN(3) fails to keep shift state
|
| 1.2 | 19-Aug-2024 |
riastradh | mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state.
This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it.
PR lib/58618: mbrtocN(3) fails to keep shift state
|
| 1.1 | 15-Aug-2024 |
riastradh | libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
|
| 1.3.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.3.6.1 | 20-Aug-2024 |
perseant | file t_mbrtoc8.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.3.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either: - _NETBSD_SOURCE - _ISOC23_SOURCE - __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences. For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
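The pulled-up man page changes above suggest sizing the output buffer with MB_LEN_MAX and passing 0 as the terminating code unit. A minimal sketch of that pattern follows; it is not the text of the man page example, and the helper name encode_c32 and its return convention are assumptions made here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Encode n code points from src into dst (capacity cap bytes) in the
 * current locale, followed by a terminating NUL.  Returns the number of
 * bytes written before the NUL, or (size_t)-1 on error or overflow.
 * (Hypothetical helper for illustration only.)
 */
size_t
encode_c32(const char32_t *src, size_t n, char *dst, size_t cap)
{
	char buf[MB_LEN_MAX];	/* constant size, so no variable-length array */
	mbstate_t ps;
	size_t i, len, used = 0;

	assert(src != NULL && dst != NULL);
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	for (i = 0; i < n; i++) {
		len = c32rtomb(buf, src[i], &ps);
		if (len == (size_t)-1 || len > cap - used)
			return (size_t)-1;
		memcpy(dst + used, buf, len);
		used += len;
	}
	/* Code unit 0 emits any closing shift sequence, then a NUL byte. */
	len = c32rtomb(buf, 0, &ps);
	if (len == (size_t)-1 || len > cap - used)
		return (size_t)-1;
	memcpy(dst + used, buf, len);
	return used + len - 1;
}
```

Using MB_LEN_MAX rather than MB_CUR_MAX costs a few bytes of stack but keeps the array size a compile-time constant, which is the trade-off the commit message describes.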
|
| 1.3.2.1 | 20-Aug-2024 |
martin | file t_mbrtoc8.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.2 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.1 | 15-Jul-2011 |
jruoho | branches: 1.1.34; Rename two test files to get functional scope (and avoid confusion with ctype(3)). No functional change.
|
| 1.1.34.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3
Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__. Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.2 | 06-May-2014 |
yamt | branches: 1.2.2; include string.h for memset
|
| 1.1 | 28-May-2013 |
joerg | branches: 1.1.2; 1.1.6; Add mbsnrtowcs and wcsnrtombs. Approved by core.
|
| 1.1.6.1 | 10-Aug-2014 |
tls | Rebase.
|
| 1.1.2.3 | 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
| 1.1.2.2 | 23-Jun-2013 |
tls | resync from head
|
| 1.1.2.1 | 28-May-2013 |
tls | file t_mbsnrtowcs.c was added on branch tls-maxphys on 2013-06-23 06:28:56 +0000
|
| 1.2.2.2 | 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
| 1.2.2.1 | 06-May-2014 |
yamt | file t_mbsnrtowcs.c was added on branch yamt-pagecache on 2014-05-22 11:42:20 +0000
|
| 1.3 | 21-Dec-2022 |
wiz | adapt mbstowcs_basic test for unicode table update
reformat so it's easier to find which result data belongs to which input
|
| 1.2 | 12-Jul-2017 |
perseant | branches: 1.2.16; Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.1 | 15-Jul-2011 |
jruoho | branches: 1.1.34; Rename two test files to get functional scope (and avoid confusion with ctype(3)). No functional change.
|
| 1.1.34.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3
Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__. Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.2.16.1 | 11-Sep-2023 |
martin | Pull up following revision(s) (requested by wiz in ticket #368):
share/locale/ctype/en_US.UTF-8.src: revision 1.10 share/locale/ctype/en_US.UTF-8.src: revision 1.8 share/locale/ctype/en_US.UTF-8.src: revision 1.9 share/locale/ctype/gen_ctype_utf8.pl: revision 1.1 share/locale/ctype/gen_ctype_utf8.pl: revision 1.2 tests/lib/libc/locale/t_mbstowcs.c: revision 1.3
Update unicode tables.
This version of the file, and the generator script, come from OpenBSD. The script was written by Andrew Fresh. The file covers the encodings from Unicode 13.0.0, based on the files distributed with perl 5.32.1.
Add NetBSD RCS Id header instead of OpenBSD one.
Update Unicode tables.
These tables are for Unicode 14.0.0 using the data provided with perl 5.36.0.
Update Unicode tables to 15.0.0. This is based on the tables provided by perl 5.37.7.
adapt mbstowcs_basic test for unicode table update reformat so it's easier to find which result data belongs to which input
|
| 1.3 | 30-Jun-2020 |
jruoho | After a comedy of errors, move t_mbtowc to its final resting place.
|
| 1.2 | 25-May-2017 |
perseant | Add a member to the test data structure that indicates whether the given encoding is state-dependent, and test the results of wctomb(NULL, '\0') and mbtowc(NULL, NULL, 0) against this instead of against each other.
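The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual test code: the struct layout, its field names, and the helper state_queries_match are hypothetical. It relies only on the standard behaviour that mbtowc and wctomb, when given a null string pointer, return nonzero exactly when the current encoding is state-dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical test-table entry; the field names are illustrative only. */
struct enc_case {
	const char *locale;	/* locale name to pass to setlocale(3) */
	bool stateful;		/* does the encoding use shift states? */
};

/*
 * Return true if both state-dependence queries agree with the
 * expectation recorded in tc for the locale currently in effect.
 */
bool
state_queries_match(const struct enc_case *tc)
{
	bool mb_says, wc_says;

	assert(tc != NULL);
	/* A null string pointer asks: is this encoding state-dependent? */
	mb_says = mbtowc(NULL, NULL, 0) != 0;
	wc_says = wctomb(NULL, L'\0') != 0;
	return mb_says == tc->stateful && wc_says == tc->stateful;
}
```

Comparing each result against a recorded expectation, rather than only against the other call, also catches the case where both functions are wrong in the same way.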
|
| 1.1 | 09-Apr-2011 |
pgoyette | atf-ify the various locale tests
|
| 1.8 | 02-Aug-2021 |
andvar | s/diferent/different/
|
| 1.7 | 01-Dec-2017 |
kre | Since the C standard allows for intermediate floating results to contain more precision bits than the data type expects, but (kind of obviously) does not allow such values to be stored in memory, expecting the value returned from strtod() (an intermediate result) to be identical (that is, equal) to a stored value is incorrect.
So instead go back to checking that the two numbers are very very close. See comments added to the test for more explanation.
|
| 1.6 | 28-Nov-2017 |
kre | Revert 1.4 (perhaps temporarily) and add even more diagnostics to those added in 1.3 to see if it is possible to determine why the strict equality test fails on i386, yet succeeds elsewhere.
|
| 1.5 | 24-Nov-2017 |
kre | When comparing doubles (any floating point values) which have been computed using different methods, don't expect to achieve identical results (here, one constant is perhaps converted to binary from a string by a cross compiler, the other is converted at run time). Allow them to have a small difference (for now, small is < 1e-7 - the constant is ~ 1e5, so this is 12 orders of magnitude less) before failing (and include the actual difference in the error message if it does fail.)
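A tolerance-based comparison of the kind described here might look like the sketch below. The helper names close_enough and strtod_matches are assumptions for this example, and the 1e-7 bound is simply the figure quoted in the commit message; the real test may structure the check differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* True if the two values differ by less than tol (absolute difference). */
bool
close_enough(double expected, double actual, double tol)
{
	double diff = expected - actual;

	assert(tol >= 0.0);
	if (diff < 0.0)
		diff = -diff;
	return diff < tol;
}

/*
 * Parse text with strtod and compare against a compile-time constant,
 * allowing a small difference instead of demanding exact equality.
 */
bool
strtod_matches(const char *text, double expected)
{
	char *end;
	double parsed;

	assert(text != NULL);
	parsed = strtod(text, &end);
	if (end == text || *end != '\0')
		return false;	/* nothing parsed, or trailing junk */
	return close_enough(expected, parsed, 1e-7);
}
```

An absolute bound is adequate here because the expected value has a known magnitude; for values of widely varying size a relative tolerance would usually be the safer choice.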
|
| 1.4 | 23-Nov-2017 |
kre | Add some diagnostics to the strto test, so I can see why this fails on i386 (on qemu) - will probably keep them when done.
|
| 1.3 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.2 | 07-Jun-2017 |
perseant | Change t_sprintf to an expected failure, since we don't respect the empty thousands separator of the C/POSIX locale (PR standards/52282).
|
| 1.1 | 30-May-2017 |
perseant | branches: 1.1.2; Add test cases for sprintf/sscanf/strto{d,l} and the is* and isw* ctype functions, for single-byte encodings
|
| 1.1.2.3 | 15-Mar-2018 |
bouyer | Pull up following revision(s) (requested by martin in ticket #631): tests/lib/libc/locale/t_sprintf.c: revision 1.4 tests/lib/libc/locale/t_sprintf.c: revision 1.5 tests/lib/libc/locale/t_sprintf.c: revision 1.6 tests/lib/libc/locale/t_sprintf.c: revision 1.7
Add some diagnostics to the strto test, so I can see why this fails on i386 (on qemu) - will probably keep them when done.
When comparing doubles (any floating point values) which have been computed using different methods, don't expect to achieve identical results (here, one constant is perhaps converted to binary from a string by a cross compiler, the other is converted at run time). Allow them to have a small difference (for now, small is < 1e-7 - the constant is ~ 1e5, so this is 12 orders of magnitude less) before failing (and include the actual difference in the error message if it does fail.)
Revert 1.4 (perhaps temporarily) and add even more diagnostics to those added in 1.3 to see if it is possible to determine why the strict equality test fails on i386, yet succeeds elsewhere.
Since the C standard allows for intermediate floating results to contain more precision bits than the data type expects, but (kind of obviously) does not allow such values to be stored in memory, expecting the value returned from strtod() (an intermediate result) to be identical (that is, equal) to a stored value is incorrect.
So instead go back to checking that the two numbers are very very close. See comments added to the test for more explanation.
|
| 1.1.2.2 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3
Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__. Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.1.2.1 | 14-Mar-2018 |
bouyer | Pull up following revision(s) (requested by martin in ticket #630): lib/libc/stdio/vfwprintf.c: revision 1.35 lib/libc/stdio/vfwprintf.c: revision 1.36 tests/lib/libc/locale/t_sprintf.c: revision 1.2
Change t_sprintf to an expected failure, since we don't respect the empty thousands separator of the C/POSIX locale (PR standards/52282).
Do not use thousands grouping when none is specified by the locale. Fixes PR standards/52282.
A more correct fix for PR standards/52282.
|
| 1.6 | 27-Nov-2023 |
christos | Don't use fmtcheck for strfmon format strings. It does not work. Fix a broken test.
|
| 1.5 | 14-Oct-2023 |
christos | PR/57633: Jose Luis Duran: Add strfmon tests from FreeBSD
|
| 1.4 | 28-Sep-2023 |
christos | Add testing for pad resetting (Jose Luis Duran)
|
| 1.3 | 02-Aug-2021 |
andvar | s/diferent/different/
|
| 1.2 | 07-Dec-2017 |
kre | Update this test to expect the output that is supposed to be produced by strfmon() rather than the output the old buggy implementation used to produce.
|
| 1.1 | 16-Aug-2017 |
joerg | branches: 1.1.2; Add missing strfmon_l. Noticed by Bruno Haible. Add test case.
|
| 1.1.2.2 | 29-Aug-2017 |
martin | Pull up following revision(s) (requested by joerg in ticket #215): tests/lib/libc/locale/t_strfmon.c: revision 1.1 tests/lib/libc/locale/Makefile: revision 1.12 lib/libc/stdlib/strfmon.c: revision 1.11 distrib/sets/lists/debug/mi: revision 1.224 include/monetary.h: revision 1.3 distrib/sets/lists/tests/mi: revision 1.761 lib/libc/stdlib/strfmon.3: revision 1.6 lib/libc/stdlib/strfmon.3: revision 1.7
Add missing strfmon_l. Noticed by Bruno Haible. Add test case.
Typo fix.
|
| 1.1.2.1 | 16-Aug-2017 |
martin | file t_strfmon.c was added on branch netbsd-8 on 2017-08-29 11:51:50 +0000
|
| 1.2 | 02-Aug-2021 |
andvar | s/diferent/different/
|
| 1.1 | 30-May-2017 |
perseant | branches: 1.1.4; Add simple test case for toupper/tolower
|
| 1.1.4.1 | 23-Jan-2018 |
perseant | Make the tests pass once more when __STDC_ISO_10646__ is not defined.
|
| 1.3 | 14-Oct-2024 |
rillig | branches: 1.3.2; 1.3.6; tests/t_uchar: fix copy-and-paste typo
|
| 1.2 | 13-Oct-2024 |
riastradh | tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb.
PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h
PR lib/52374: <uchar.h> missing
|
| 1.1 | 15-Aug-2024 |
riastradh | uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later.
PR lib/52374: <uchar.h> missing
|
| 1.3.6.2 | 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.3.6.1 | 14-Oct-2024 |
perseant | file t_uchar.c was added on branch perseant-exfatfs on 2025-08-02 05:58:05 +0000
|
| 1.3.2.2 | 14-Oct-2024 |
martin | Pull up following revision(s) (requested by riastradh in ticket #976):
lib/libc/locale/c32rtomb.3: revision 1.10 lib/libc/locale/c32rtomb.3: revision 1.9 lib/libc/locale/c32rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc32.c: revision 1.1 distrib/sets/lists/base/shl.mi: revision 1.988 lib/libc/include/namespace.h: revision 1.204 lib/libc/include/namespace.h: revision 1.205 lib/libc/locale/mbrtoc16.3: revision 1.1 lib/libc/locale/mbrtoc16.c: revision 1.1 lib/libc/locale/mbrtoc16.3: revision 1.2 lib/libc/locale/mbrtoc16.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.3 lib/libc/locale/mbrtoc16.c: revision 1.3 lib/libc/locale/mbrtoc32.3: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.1 tests/lib/libc/locale/t_c16rtomb.c: revision 1.1 lib/libc/locale/mbrtoc32.c: revision 1.2 lib/libc/locale/mbrtoc16.3: revision 1.4 lib/libc/locale/mbrtoc16.c: revision 1.4 lib/libc/locale/mbrtoc32.3: revision 1.2 tests/lib/libc/locale/t_c16rtomb.c: revision 1.2 lib/libc/locale/mbrtoc32.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.5 lib/libc/locale/mbrtoc16.c: revision 1.5 lib/libc/locale/mbrtoc32.3: revision 1.3 tests/lib/libc/locale/t_c16rtomb.c: revision 1.3 lib/libc/locale/mbrtoc32.c: revision 1.4 lib/libc/locale/mbrtoc16.3: revision 1.6 lib/libc/locale/mbrtoc16.c: revision 1.6 lib/libc/locale/mbrtoc32.3: revision 1.4 tests/lib/libc/locale/t_c16rtomb.c: revision 1.4 lib/libc/locale/mbrtoc32.c: revision 1.5 lib/libc/locale/mbrtoc16.3: revision 1.7 lib/libc/locale/mbrtoc16.c: revision 1.7 lib/libc/locale/mbrtoc32.3: revision 1.5 tests/lib/libc/locale/t_c16rtomb.c: revision 1.5 lib/libc/locale/mbrtoc32.c: revision 1.6 lib/libc/locale/mbrtoc16.3: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.6 tests/lib/libc/locale/t_c16rtomb.c: revision 1.6 lib/libc/locale/mbrtoc32.c: revision 1.7 lib/libc/locale/mbrtoc16.3: revision 1.9 lib/libc/locale/mbrtoc32.3: revision 1.7 lib/libc/locale/mbrtoc32.c: revision 1.8 lib/libc/locale/mbrtoc32.3: revision 1.8 lib/libc/locale/mbrtoc32.c: revision 1.9 distrib/sets/lists/comp/mi: revision 
1.2468 lib/libc/locale/mbrtoc32.3: revision 1.9 distrib/sets/lists/comp/mi: revision 1.2469 lib/libc/locale/c32rtomb.h: revision 1.1 lib/libc/locale/c32rtomb.h: revision 1.2 include/Makefile: revision 1.147 share/man/man3/uchar.3: revision 1.1 share/man/man3/uchar.3: revision 1.2 tests/lib/libc/locale/t_c32rtomb.c: revision 1.1 distrib/sets/lists/comp/mi: revision 1.2470 lib/libc/locale/c16rtomb.3: revision 1.1 lib/libc/locale/c16rtomb.c: revision 1.1 lib/libc/locale/c16rtomb.3: revision 1.2 lib/libc/locale/c16rtomb.c: revision 1.2 lib/libc/locale/c16rtomb.3: revision 1.3 lib/libc/locale/c16rtomb.c: revision 1.3 lib/libc/locale/c16rtomb.3: revision 1.4 lib/libc/locale/c16rtomb.c: revision 1.4 lib/libc/locale/c16rtomb.3: revision 1.5 lib/libc/locale/c16rtomb.c: revision 1.5 lib/libc/locale/c16rtomb.3: revision 1.6 lib/libc/locale/c16rtomb.c: revision 1.6 lib/libc/locale/c16rtomb.3: revision 1.7 lib/libc/locale/c16rtomb.c: revision 1.7 lib/libc/locale/c16rtomb.3: revision 1.8 lib/libc/locale/c16rtomb.3: revision 1.9 distrib/sets/lists/tests/mi: revision 1.1330 distrib/sets/lists/tests/mi: revision 1.1331 distrib/sets/lists/tests/mi: revision 1.1332 tests/lib/libc/locale/t_uchar.c: revision 1.1 tests/lib/libc/locale/t_uchar.c: revision 1.2 tests/lib/libc/locale/t_uchar.c: revision 1.3 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc16.c: revision 1.3 include/uchar.h: revision 1.1 include/uchar.h: revision 1.2 include/uchar.h: revision 1.3 include/uchar.h: revision 1.4 include/uchar.h: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.1 include/uchar.h: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: revision 1.2 tests/lib/libc/locale/t_c8rtomb.c: revision 1.3 tests/lib/libc/locale/t_c8rtomb.c: revision 1.4 share/man/man3/Makefile: revision 1.93 tests/lib/libc/locale/t_c8rtomb.c: revision 1.5 tests/lib/libc/locale/t_c8rtomb.c: revision 1.6 tests/lib/libc/locale/t_c8rtomb.c: 
revision 1.7 lib/libc/shlib_version: revision 1.297 lib/libc/locale/c16rtomb.3: revision 1.10 lib/libc/locale/c16rtomb.3: revision 1.11 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.1 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.2 tests/lib/libc/locale/t_mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc16.3: revision 1.10 tests/lib/libc/locale/Makefile: revision 1.15 tests/lib/libc/locale/Makefile: revision 1.16 tests/lib/libc/locale/Makefile: revision 1.17 tests/lib/libc/locale/Makefile: revision 1.18 distrib/sets/lists/debug/mi: revision 1.442 distrib/sets/lists/debug/mi: revision 1.443 distrib/sets/lists/debug/mi: revision 1.444 lib/libc/locale/c8rtomb.3: revision 1.1 lib/libc/locale/c8rtomb.c: revision 1.1 lib/libc/locale/c8rtomb.3: revision 1.2 lib/libc/locale/c8rtomb.c: revision 1.2 lib/libc/locale/c8rtomb.3: revision 1.3 lib/libc/locale/c8rtomb.c: revision 1.3 lib/libc/locale/c8rtomb.3: revision 1.4 lib/libc/locale/c8rtomb.c: revision 1.4 lib/libc/locale/c8rtomb.3: revision 1.5 lib/libc/locale/c8rtomb.c: revision 1.5 lib/libc/locale/c8rtomb.3: revision 1.6 lib/libc/locale/c8rtomb.c: revision 1.6 lib/libc/locale/c8rtomb.3: revision 1.7 lib/libc/locale/c8rtomb.3: revision 1.8 lib/libc/locale/c8rtomb.3: revision 1.9 lib/libc/locale/mbrtoc32.h: revision 1.1 lib/libc/locale/mbrtoc32.h: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.1 lib/libc/locale/mbrtoc8.3: revision 1.1 lib/libc/locale/mbrtoc8.c: revision 1.2 lib/libc/locale/mbrtoc8.3: revision 1.2 lib/libc/locale/mbrtoc8.c: revision 1.3 lib/libc/locale/mbrtoc8.3: revision 1.3 lib/libc/locale/mbrtoc8.c: revision 1.4 lib/libc/locale/mbrtoc8.3: revision 1.4 lib/libc/locale/Makefile.inc: revision 1.66 lib/libc/locale/mbrtoc8.c: revision 1.5 lib/libc/locale/mbrtoc8.3: revision 1.5 lib/libc/locale/Makefile.inc: revision 1.67 lib/libc/locale/mbrtoc8.c: revision 1.6 lib/libc/locale/mbrtoc8.3: revision 1.6 lib/libc/locale/mbrtoc8.c: revision 1.7 lib/libc/locale/mbrtoc8.3: revision 1.7 lib/libc/locale/mbrtoc8.c: 
revision 1.8 lib/libc/locale/c32rtomb.3: revision 1.1 lib/libc/locale/c32rtomb.c: revision 1.1 lib/libc/locale/c32rtomb.3: revision 1.2 lib/libc/locale/c32rtomb.c: revision 1.2 lib/libc/locale/c32rtomb.3: revision 1.3 lib/libc/locale/c32rtomb.c: revision 1.3 lib/libc/locale/c32rtomb.3: revision 1.4 lib/libc/locale/c32rtomb.c: revision 1.4 lib/libc/locale/c32rtomb.3: revision 1.5 lib/libc/locale/c32rtomb.c: revision 1.5 lib/libc/locale/c32rtomb.3: revision 1.6 lib/libc/locale/c32rtomb.c: revision 1.6 lib/libc/locale/c32rtomb.3: revision 1.7 lib/libc/locale/c32rtomb.3: revision 1.8
(all via patch)
tests/lib/libc/locale/Makefile: Sort. No functional change intended. Preparation for PR lib/52374.
uchar.h: New header file for C11 (and C++11) compliance.
Implementation of the new functions mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb to come later. Updates for C23 to come later. PR lib/52374: <uchar.h> missing
libc: New C11 functions mbrtoc16, mbrtoc32, c16rtomb, c32rtomb.
The mbrtoc16/32 functions read multibyte strings according to the current locale into UTF-16/32 code unit sequences; the c16/32rtomb functions write UTF-16/32 code unit sequences into multibyte strings according to the current locale. The `r' means restartable: they work incrementally and pick up where they left off.
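A minimal sketch of what "restartable" means in practice, assuming a C11 <uchar.h>. The helper name decode_bytewise is hypothetical and not part of the commit; it feeds mbrtoc32 one byte at a time and lets the mbstate_t carry any partial character between calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the commit): decode `len' bytes of `s'
 * in the current locale, one byte per mbrtoc32 call.  Returns the number
 * of code points stored in `out', or (size_t)-1 on error or overflow.
 */
size_t
decode_bytewise(const char *s, size_t len, char32_t *out, size_t cap)
{
    mbstate_t ps;
    size_t i = 0, n = 0;

    assert(s != NULL && out != NULL);
    memset(&ps, 0, sizeof(ps));         /* initial conversion state */

    while (i < len) {
        char32_t c32;
        size_t r = mbrtoc32(&c32, &s[i], 1, &ps);

        if (r == (size_t)-1)            /* invalid sequence */
            return (size_t)-1;
        if (r == (size_t)-2) {          /* incomplete: byte saved in ps */
            i++;
            continue;
        }
        if (n == cap)                   /* no room for another code point */
            return (size_t)-1;
        out[n++] = c32;
        if (r != (size_t)-3)            /* -3: output only, no input used */
            i++;
    }
    return n;
}
```

The caller never has to buffer a partial multibyte character itself; a return of (size_t)-2 only means "call again with more input and the same state".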
NOTE: This bumps the libc minor version, since it adds new symbols.
PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): Fix \n in man page examples. Need to write \en to pacify roff. PR lib/52374: <uchar.h> missing
c16rtomb(3), c32rtomb(3): Fix more \n in man pages. Also, tighten an assertion: we left room for a NUL byte at the end. PR lib/52374: <uchar.h> missing
libc: Use the more idiomatic alignof from stdalign.h. No functional change intended. PR lib/52374: <uchar.h> missing
mbrtoc16(3): Simplify surrogate state test.
Turn the finer-grained test into an assertion. No semantic change intended: we are supposed to control this state, and we always arrange it this way. (But in principle this could change the behaviour of buggy programs that violate the mbstate_t abstraction.) PR lib/52374: <uchar.h> missing
libc: New functions c8rtomb(3) and mbrtoc8(3).
New in C23, for converting from UTF-8 to locale-dependent multibyte sequences (c8rtomb) or vice versa (mbrtoc8), along with the new type char8_t.
Conditional on either:
- _NETBSD_SOURCE
- _ISOC23_SOURCE
- __STDC_VERSION__ >= 202311L
(Riding the libc minor bump from this morning for the UTF-16/UTF-32 versions from C11.)
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
libc: c32rtomb and mbrtoc32 are used internally, so weak-alias them. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): Use namespace.h to get private aliases.
This way applications defining the symbols c32rtomb or mbrtoc32 won't clobber our private definitions, which are slightly more constrained about their use of mbstate_t than is obvious from the interface contract.
PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): brush up markup
Split long .Fn lines into Fo/Fa/Fc. Don't indent the list of return values. Don't use artisanal -width.
Untabify code examples - indented literal displays don't have correct tab stops consistent with tab stops in the fixed font code, so the lines end up misaligned in the PostScript output.
c16rtomb(3), c32rtomb(3): brush up markup
mbrtoc16(3), mbrtoc32(3): Simplify return value language. Also expand BMP only once. PR lib/52374: <uchar.h> missing
mbrtoc16(3), mbrtoc32(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc32(3): Clarify control flow. No need for another goto here; let's keep it clearly structured with a single `out' label. No functional change intended. PR lib/52374: <uchar.h> missing
c8rtomb(3), mbrtoc8(3): brush up markup
mbrtoc8(3): Simplify return value language. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): Specify what happens if ps is null. PR lib/52374: <uchar.h> missing
c8rtomb(3): Specify what happens when ps is null. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3): No state overlap with mbrtoc8 or c8rtomb. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Work on deturgidifying prose. Still maybe not great but at least there's less jargon in most of the text, without really losing any content. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Work on deturgidifying prose. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3), mbrtoc32(3): Restore word accidentally removed. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Restore word accidentally removed. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c8rtomb(3): Fix possible error descriptions. The argument c8 can't be a surrogate code point itself (they're in the range [0xd800,0xdfff], beyond 8-bit values), but the bits of a surrogate code point could be forced into the UTF-8 format, which is also invalid. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
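To illustrate the distinction drawn above: a single char8_t can never hold a surrogate, but three UTF-8 code units can carry the bits of one. The following is a sketch only; utf8_3byte_is_surrogate is a hypothetical helper, not code from c8rtomb.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the three bytes have the shape of a
 * 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) whose payload bits
 * land in the surrogate range [0xd800,0xdfff], else 0.
 */
int
utf8_3byte_is_surrogate(const unsigned char b[3])
{
    unsigned long cp;

    assert(b != NULL);
    if ((b[0] & 0xf0) != 0xe0 ||
        (b[1] & 0xc0) != 0x80 ||
        (b[2] & 0xc0) != 0x80)
        return 0;                       /* not a 3-byte sequence */

    cp = ((unsigned long)(b[0] & 0x0f) << 12) |
         ((unsigned long)(b[1] & 0x3f) << 6) |
         (unsigned long)(b[2] & 0x3f);
    return cp >= 0xd800 && cp <= 0xdfff;
}
```

For example, the bytes ED A0 80 decode bitwise to 0xd800, so a conforming converter has to reject them even though each byte on its own is a plausible UTF-8 code unit.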
c16rtomb(3), c32rtomb(3): Attempt a deturgidification pass. Limit the jargon around surrogates. PR lib/52374: <uchar.h> missing
c8rtomb(3): Clarify prose and fix example in caveat. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
c16rtomb(3), c32rtomb(3), mbrtoc16(3), mbrtoc32(3): xref c8 versions. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc16(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR lib/52374: <uchar.h> missing
mbrtoc8(3): Clarify how many bytes are consumed in special cases. Fix overlap in RETURN VALUES section. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
pass lint, XXX see lint bug.
libc: Add _l variants of the cNrtomb and mbrtocN functions. These accept an explicit locale parameter, rather than using the current locale. Visible under _NETBSD_SOURCE, not exposed otherwise. NOTE: This adds libc symbols. Riding the libc minor bump for the non-_l variants of these from two days ago -- hope that's not pushing it too far. PR lib/58613: c*rtomb, mbrtoc* should have locale-parametric _l variants
c8rtomb(3), c16rtomb(3): Add tests for incomplete NUL termination. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3): Fix NUL handling. PR lib/58615: incomplete c8rtomb, c16rtomb handles NUL termination wrong
c8rtomb(3), c16rtomb(3), c32rtomb(3): Test stateful shift sequences. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Fix digit error in shift sequence test. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3): Nix __CTASSERT after case label. I put this in to make it (machine-verifiably) clear that zeroing the state is the same as returning to the initial conversion state, as the standard requires, but this is causing build trouble (and will likely cause more trouble if pulled up) because some definitions of __CTASSERT make a declaration which is forbidden after a label, so let's remove it. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8(3): Fix pasto in comment at top. No functional change intended. PR standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb
mbrtoc8: remove lint-specific workarounds. No binary change.
mbrtoc8: fix comments
mbrtoc16, mbrtoc32: fix comments, remove lint-specific workarounds. No binary change.
t_c8rtomb, t_c16rtomb: Simplify comment. ESC $ B is technically rather the JIS X 0208-1983 shift sequence, but since I don't see any way to provoke the JIS X 0208-1978 shift sequence to come flying out of this conversion (ESC $ @), and I'm not sure there's any difference in the interpretation, let's just say JIS X 0208. PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c32rtomb(3): Use conversion state to handle shift sequences.
For conversion of Unicode scalar values to coding systems requiring shift sequences, such as ISO-2022-JP, _citrus_iconv_convert will always produce:
1. a shift sequence from the initial state to some nondefault state, like from US-ASCII to JIS X 0208
2. the encoding of the desired character
3. a shift sequence restoring the initial state
This is unnecessary if the output is already in the state needed to encode the desired character. For example, this method produces seven bytes to encode each YEN SIGN in ISO-2022-JP -- and fourteen, to encode two consecutive ones -- even though the shift sequence is only three bytes long and once shifted YEN SIGN takes only one byte.
Instead, convert the Unicode scalar value to a locale-dependent wide character and encode that, by composing
- _citrus_iconv_convert => gives us a multibyte encoding of the character from the initial state (and restoring the initial state afterward)
- mbrtowc with initial conversion state => gives us the single wide character representation
  XXX If combining characters are possible here, this may fail.
- wcrtomb with caller's conversion state => gives us a state-dependent multibyte encoding of the character
XXX Is there a cheaper way to convert from Unicode scalar value to locale-dependent wide character? It is not obvious to me from the largely undocumented Citrus machinery, but it would obviously be better than this somewhat circuitous Rube Goldberg contraption of chained multibyte APIs.
PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
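The byte counts quoted for YEN SIGN can be reproduced with a little arithmetic. This is an illustrative sketch, not code from the commit: the function names are hypothetical, and it assumes the shift back to the initial state is the same length as the shift in (three bytes each in the example).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes emitted when every character gets its own shift-in and shift-out. */
size_t
bytes_stateless(size_t nchars, size_t shift_len, size_t char_len)
{
    assert(shift_len > 0 && char_len > 0);
    return nchars * (shift_len + char_len + shift_len);
}

/*
 * Bytes emitted when the shift state is kept across characters:
 * one shift-in, the characters, then one shift-out at the end.
 */
size_t
bytes_stateful(size_t nchars, size_t shift_len, size_t char_len)
{
    assert(shift_len > 0 && char_len > 0);
    if (nchars == 0)
        return 0;
    return shift_len + nchars * char_len + shift_len;
}
```

With shift_len = 3 and char_len = 1, bytes_stateless gives the seven and fourteen bytes mentioned above for one and two characters, while under the stated assumption bytes_stateful needs the same seven bytes for one character and grows by only one byte for each further character in the same shift state.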
mbrtoc8(3), mbrtoc16(3): Test consuming shift sequences with state. This has the side effect of testing mbrtoc32(3) because they are both defined in terms of it. PR lib/58618: mbrtocN(3) fails to keep shift state
c8rtomb(3), c16rtomb(3), c32rtomb(3): Suggest MB_LEN_MAX in example. This way it avoids variable-length arrays, by always allocating the maximum space that could be occupied by MB_CUR_MAX.
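A sketch of the buffer-sizing pattern this refers to, assuming a C11 <uchar.h>. The helper name encode_one is hypothetical and the actual man page example may differ. MB_LEN_MAX is a compile-time constant from <limits.h>, so the array is an ordinary fixed-size array, whereas MB_CUR_MAX depends on the locale at run time.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: encode one UTF-32 code unit into `out', which
 * must have room for MB_LEN_MAX bytes.  Returns the number of bytes
 * written, or (size_t)-1 if c32 has no encoding in the current locale.
 */
size_t
encode_one(char32_t c32, char out[MB_LEN_MAX])
{
    char buf[MB_LEN_MAX];               /* constant size, never a VLA */
    mbstate_t ps;
    size_t n;

    assert(out != NULL);
    memset(&ps, 0, sizeof(ps));         /* initial conversion state */
    n = c32rtomb(buf, c32, &ps);
    if (n == (size_t)-1)
        return n;
    memcpy(out, buf, n);                /* n <= MB_CUR_MAX <= MB_LEN_MAX */
    return n;
}
```

Because the buffer does not depend on MB_CUR_MAX, it also stays large enough if the locale changes between declaring the buffer and using it.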
mbrtoc32(3): Use conversion state to handle shift sequences. PR lib/58618: mbrtocN(3) fails to keep shift state
mbrtoc32(3): Fix name and type of mbrtowc_l return value. This was from `int mbtowc_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to mbrtowc_l. Caught by lint. `mb_len' avoids (harmless) clash with standard C function mblen(3). PR lib/58618: mbrtocN(3) fails to keep shift state
c32rtomb(3): Fix type of wcrtomb_l return value. This was from `int wctomb_l(...)' in an earlier draft and I didn't update it to size_t when I changed the draft to wcrtomb_l. Caught by lint. `wc_len' mirrors `mb_len' in the complementary code in mbrtoc32(3) to avoid clash with standard C function mblen(3). PR lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
c8rtomb(3), c16rtomb(3), c32rtomb(3): Attempt to simplify language.
c8rtomb(3), c16rtomb(3), c32rtomb(3): Fix null string output case. This ignores c8/c16/c32, produces no output anywhere, and just resets ps to the initial conversion state. Also just use 0 in the example, not '\0' or L'\0'. This works for C11, which prefers '\0' and L'\0', and for C23, which introduced the new u8'\0', u'\0' (UTF-16), and U'\0' (UTF-32).
c16rtomb, c32rtomb, mbrtoc8: fix page numbers in comments
mbrtoc8(3), mbrtoc16(3), mbrtoc32(3): Say 0 for zero code unit. Rather than deal with differences between C11 and C23 in notation, '\0' vs L'\0' vs u8'\0' vs u'\0' vs U'\0'.
uchar.h: Include <sys/featuretest.h> before testing _*_SOURCE. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
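The null string output case for c32rtomb can be sketched as follows, assuming a C11 <uchar.h>. The wrapper name reset_conversion_state is hypothetical. It deliberately does not check the exact byte count returned, since that value is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical wrapper: with a null output pointer, c32rtomb ignores
 * the code unit argument and only returns ps to the initial conversion
 * state.  Returns 0 on success, -1 on failure.
 */
int
reset_conversion_state(mbstate_t *ps)
{
    assert(ps != NULL);
    if (c32rtomb(NULL, 0, ps) == (size_t)-1)
        return -1;
    return mbsinit(ps) ? 0 : -1;        /* initial state reached? */
}
```

Passing plain 0 for the code unit follows the notation chosen in the commit and avoids picking between '\0', L'\0', and U'\0'.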
uchar.h: Need <sys/cdefs.h> for __restrict. PR lib/52374: <uchar.h> missing
uchar.h: Simplify __cpp_char8_t and __cplusplus conditionals. No functional change intended. PR lib/52374: <uchar.h> missing
tests/lib/libc/locale/t_uchar: Test for char8_t, mbrtoc8, c8rtomb. PR lib/58752: various header files test _*_SOURCE macros but don't include sys/featuretest.h PR lib/52374: <uchar.h> missing
tests/t_uchar: fix copy-and-paste typo
|
| 1.3.2.1 | 14-Oct-2024 |
martin | file t_uchar.c was added on branch netbsd-10 on 2024-10-14 17:20:19 +0000
|
| 1.1 | 14-Jul-2017 |
perseant | branches: 1.1.2; Add a simple collation test. This test is expected to fail on HEAD since we do not yet have a working implementation of wcscoll.
|
| 1.1.2.1 | 14-Jul-2017 |
perseant | Initial commit of a mostly-working implementation of __STDC_ISO_10646__, with collation support using the Unicode Collation Algorithm.
The conversion from men/ku/ten form to Unicode is a gross hack at present. Fixing this, and fleshing out the LC_COLLATE locale component, are next on the agenda.
|
| 1.1 | 21-Nov-2011 |
joerg | branches: 1.1.4; Add test cases for strcspn, strpbrk, strspn, wcscspn, wcspbrk and wcsspn.
|
| 1.1.4.2 | 17-Apr-2012 |
yamt | sync with head
|
| 1.1.4.1 | 21-Nov-2011 |
yamt | file t_wcscspn.c was added on branch yamt-pagecache on 2012-04-17 00:09:11 +0000
|
| 1.1 | 21-Nov-2011 |
joerg | branches: 1.1.4; Add test cases for strcspn, strpbrk, strspn, wcscspn, wcspbrk and wcsspn.
|
| 1.1.4.2 | 17-Apr-2012 |
yamt | sync with head
|
| 1.1.4.1 | 21-Nov-2011 |
yamt | file t_wcspbrk.c was added on branch yamt-pagecache on 2012-04-17 00:09:11 +0000
|
| 1.1 | 28-Jul-2019 |
christos | branches: 1.1.6; PR/54414: Valery Ushakov: add a test for wcsrtombs(3) doesn't update the source argument on conversion error
|
| 1.1.6.2 | 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
| 1.1.6.1 | 28-Jul-2019 |
martin | file t_wcsrtombs.c was added on branch phil-wifi on 2020-04-13 08:05:26 +0000
|
| 1.1 | 21-Nov-2011 |
joerg | branches: 1.1.4; Add test cases for strcspn, strpbrk, strspn, wcscspn, wcspbrk and wcsspn.
|
| 1.1.4.2 | 17-Apr-2012 |
yamt | sync with head
|
| 1.1.4.1 | 21-Nov-2011 |
yamt | file t_wcsspn.c was added on branch yamt-pagecache on 2012-04-17 00:09:11 +0000
|
| 1.5 | 14-Jul-2017 |
joerg | VAX doesn't have the test cases, so stub the body as well.
|
| 1.4 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.3 | 01-Oct-2011 |
christos | branches: 1.3.34; Undo previous, Checking for vax is more appropriate.
|
| 1.2 | 01-Oct-2011 |
christos | no more ifdef vax
|
| 1.1 | 09-Apr-2011 |
pgoyette | atf-ify the various locale tests
|
| 1.3.34.2 | 18-Mar-2018 |
martin | Additionally pull up r1.5 for ticket #608:
VAX doesn't have the test cases, so stub the body as well.
|
| 1.3.34.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3 Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.5 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.4 | 25-May-2017 |
perseant | branches: 1.4.2; Add a member to the test data structure that indicates whether the given encoding is state-dependent, and test the results of wctomb(NULL, '\0') and mbtowc(NULL, NULL, 0) against this instead of against each other.
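A sketch of the check described here, not the actual t_wctomb.c code: the function name and the flag parameter are hypothetical. With a null string argument, both wctomb and mbtowc report whether the current encoding is state-dependent (nonzero) or not (zero), so each result can be compared against the expected flag from the test data.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical check: `expect_stateful' plays the role of the new
 * member in the test data structure.  Returns 1 if both probes agree
 * with it, 0 otherwise.
 */
int
stateful_flag_matches(int expect_stateful)
{
    int from_wctomb, from_mbtowc;

    assert(expect_stateful == 0 || expect_stateful == 1);
    from_wctomb = wctomb(NULL, L'\0') != 0;
    from_mbtowc = mbtowc(NULL, NULL, 0) != 0;
    return from_wctomb == expect_stateful &&
        from_mbtowc == expect_stateful;
}
```

Comparing each probe against a known flag catches the case where both functions are wrong in the same way, which comparing them only against each other would miss.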
|
| 1.3 | 25-Mar-2013 |
gson | Don't size an array using MB_CUR_MAX while one locale is in effect and then use it with another locale having a larger MB_CUR_MAX. This should fix the t_wctomb:wcrtomb_state test failures seen on i386.
|
| 1.2 | 11-Jun-2011 |
christos | branches: 1.2.2; 1.2.8; Turn warns on for all tests and fix all the bugs.
|
| 1.1 | 09-Apr-2011 |
pgoyette | branches: 1.1.2; atf-ify the various locale tests
|
| 1.1.2.1 | 23-Jun-2011 |
cherry | Catchup with rmind-uvmplock merge.
|
| 1.2.8.1 | 23-Jun-2013 |
tls | resync from head
|
| 1.2.2.1 | 22-May-2014 |
yamt | sync with head.
for reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
| 1.4.2.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3 Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|
| 1.3 | 24-May-2022 |
andvar | fix various typos in comment, documentation and log messages.
|
| 1.2 | 12-Jul-2017 |
perseant | Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
|
| 1.1 | 30-May-2017 |
perseant | branches: 1.1.2; Add test cases for sprintf/sscanf/strto{d,l} and the is* and isw* ctype functions, for single-byte encodings
|
| 1.1.2.1 | 15-Mar-2018 |
martin | Pull up following revision(s) (requested by maya in ticket #608): tests/lib/libc/locale/t_sprintf.c: revision 1.3 tests/lib/libc/locale/t_wctomb.c: revision 1.5 tests/lib/libc/locale/t_io.c: revision 1.5 tests/lib/libc/locale/t_wcstod.c: revision 1.4 tests/lib/libc/locale/t_mbstowcs.c: revision 1.2 tests/lib/libc/locale/t_wctype.c: revision 1.2 tests/lib/libc/locale/t_mbrtowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.2 tests/lib/libc/locale/t_btowc.c: revision 1.3 Add ISO10646 versions of these tests, conditional on __STDC_ISO_10646__ . Also make the tests a bit more verbose, to aid debugging when they fail.
Separate the C/POSIX locale test from the rest; make it more thorough and more correct. This fixes a problem reported by martin@ when the test is compiled with -funsigned-char.
|