man/man7/nls.7

Language Name Code Language Family
ABKHAZIAN AB IBERO-CAUCASIAN
AFAN (OROMO) OM HAMITIC
AFAR AA HAMITIC
AFRIKAANS AF GERMANIC
ALBANIAN SQ INDO-EUROPEAN (OTHER)
AMHARIC AM SEMITIC
ARABIC AR SEMITIC
ARMENIAN HY INDO-EUROPEAN (OTHER)
ASSAMESE AS INDIAN
AYMARA AY AMERINDIAN
AZERBAIJANI AZ TURKIC/ALTAIC
BASHKIR BA TURKIC/ALTAIC
BASQUE EU BASQUE
BENGALI BN INDIAN
BHUTANI DZ ASIAN
BIHARI BH INDIAN
BISLAMA BI
BRETON BR CELTIC
BULGARIAN BG SLAVIC
BURMESE MY ASIAN
BYELORUSSIAN BE SLAVIC
CAMBODIAN KM ASIAN
CATALAN CA ROMANCE
CHINESE ZH ASIAN
CORSICAN CO ROMANCE
CROATIAN HR SLAVIC
CZECH CS SLAVIC
DANISH DA GERMANIC
DUTCH NL GERMANIC
ENGLISH EN GERMANIC
ESPERANTO EO INTERNATIONAL AUX.
ESTONIAN ET FINNO-UGRIC
FAROESE FO GERMANIC
FIJI FJ OCEANIC/INDONESIAN
FINNISH FI FINNO-UGRIC
FRENCH FR ROMANCE
FRISIAN FY GERMANIC
GALICIAN GL ROMANCE
GEORGIAN KA IBERO-CAUCASIAN
GERMAN DE GERMANIC
GREEK EL LATIN/GREEK
GREENLANDIC KL ESKIMO
GUARANI GN AMERINDIAN
GUJARATI GU INDIAN
HAUSA HA NEGRO-AFRICAN
HEBREW HE SEMITIC
HINDI HI INDIAN
HUNGARIAN HU FINNO-UGRIC
ICELANDIC IS GERMANIC
INDONESIAN ID OCEANIC/INDONESIAN
INTERLINGUA IA INTERNATIONAL AUX.
INTERLINGUE IE INTERNATIONAL AUX.
INUKTITUT IU
INUPIAK IK ESKIMO
IRISH GA CELTIC
ITALIAN IT ROMANCE
JAPANESE JA ASIAN
JAVANESE JV OCEANIC/INDONESIAN
KANNADA KN DRAVIDIAN
KASHMIRI KS INDIAN
KAZAKH KK TURKIC/ALTAIC
KINYARWANDA RW NEGRO-AFRICAN
KIRGHIZ KY TURKIC/ALTAIC
KURUNDI RN NEGRO-AFRICAN
KOREAN KO ASIAN
KURDISH KU IRANIAN
LAOTHIAN LO ASIAN
LATIN LA LATIN/GREEK
LATVIAN LV BALTIC
LINGALA LN NEGRO-AFRICAN
LITHUANIAN LT BALTIC
MACEDONIAN MK SLAVIC
MALAGASY MG OCEANIC/INDONESIAN
MALAY MS OCEANIC/INDONESIAN
MALAYALAM ML DRAVIDIAN
MALTESE MT SEMITIC
MAORI MI OCEANIC/INDONESIAN
MARATHI MR INDIAN
MOLDAVIAN MO ROMANCE
MONGOLIAN MN
NAURU NA
NEPALI NE INDIAN
NORWEGIAN NO GERMANIC
OCCITAN OC ROMANCE
ORIYA OR INDIAN
PASHTO PS IRANIAN
PERSIAN (farsi) FA IRANIAN
POLISH PL SLAVIC
PORTUGUESE PT ROMANCE
PUNJABI PA INDIAN
QUECHUA QU AMERINDIAN
RHAETO-ROMANCE RM ROMANCE
ROMANIAN RO ROMANCE
RUSSIAN RU SLAVIC
SAMOAN SM OCEANIC/INDONESIAN
SANGHO SG NEGRO-AFRICAN
SANSKRIT SA INDIAN
SCOTS GAELIC GD CELTIC
SERBIAN SR SLAVIC
SERBO-CROATIAN SH SLAVIC
SESOTHO ST NEGRO-AFRICAN
SETSWANA TN NEGRO-AFRICAN
SHONA SN NEGRO-AFRICAN
SINDHI SD INDIAN
SINGHALESE SI INDIAN
SISWATI SS NEGRO-AFRICAN
SLOVAK SK SLAVIC
SLOVENIAN SL SLAVIC
SOMALI SO HAMITIC
SPANISH ES ROMANCE
SUNDANESE SU OCEANIC/INDONESIAN
SWAHILI SW NEGRO-AFRICAN
SWEDISH SV GERMANIC
TAGALOG TL OCEANIC/INDONESIAN
TAJIK TG IRANIAN
TAMIL TA DRAVIDIAN
TATAR TT TURKIC/ALTAIC
TELUGU TE DRAVIDIAN
THAI TH ASIAN
TIBETAN BO ASIAN
TIGRINYA TI SEMITIC
TONGA TO OCEANIC/INDONESIAN
TSONGA TS NEGRO-AFRICAN
TURKISH TR TURKIC/ALTAIC
TURKMEN TK TURKIC/ALTAIC
TWI TW NEGRO-AFRICAN
UIGUR UG
UKRAINIAN UK SLAVIC
URDU UR INDIAN
UZBEK UZ TURKIC/ALTAIC
VIETNAMESE VI ASIAN
VOLAPUK VO INTERNATIONAL AUX.
WELSH CY CELTIC
WOLOF WO NEGRO-AFRICAN
XHOSA XH NEGRO-AFRICAN
YIDDISH YI GERMANIC
YORUBA YO NEGRO-AFRICAN
ZHUANG ZA
ZULU ZU NEGRO-AFRICAN
p
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
p
The environment variable settings are queried by their priority level
in the following manner:
p
l -bullet t If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
t If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
t If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since it makes it
most easy for the user to override the system default using the individual
.Ev LC_*
variables.
t If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language make up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
p
The following character sets are supported in
.Nx
l -tag -width ISO8859_family t ISO8859 family Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
t Unicode A Unicode environment based on the UTF-8 character set is supported for all
supported language/territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
p
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
p
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
p
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
p
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet, however it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
p
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates like
'char' type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
p
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctran 3
for details.
p
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
l -bullet -offset indent t All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
t Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used for formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and wide character identifier of %lc, %C, %ls, %S for conventional
formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.