.\"     $NetBSD: nls.7,v 1.1 2003/02/12 02:42:44 gmcgarry Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 12, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base. An internationalized system has no
built-in assumptions or dependencies on language-specific or
cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Code sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
"Internationalization" (often abbreviated "i18n") refers to the
process by which system software is developed to support multiple
cultural-specific and language-specific conventions. This is a
generalization process by which the system is freed from assuming
English strings or other English-specific conventions. "Localization"
(often abbreviated "l10n") refers to the process by which the user
environment is customized to handle its input and output appropriately
for specific language and cultural conventions. This is a
specialization process, by which generic methods already implemented
in an internationalized system are used in specific ways. The formal
description of cultural conventions for some country, together with
all associated translations targeted to the native language, is called
the "locale".
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories. A category is a group of
language-specific and culture-specific conventions as outlined in the
list above. ISO C specifies five of the following six standard
categories, and POSIX adds LC_MESSAGES; all six are supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MESSAGES
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used. The
following environment variables are used: LANG, LC_ALL, LC_COLLATE,
LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME, and NLSPATH.
The NLSPATH environment variable specifies a colon-separated list of
directory names where the message catalog files of the NLS database
are located. The LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC,
LC_TIME, and LC_MESSAGES environment variables determine the current
values for their respective categories. The LC_ALL and LANG
environment variables also determine the current locale.
.Pp
The value of each locale environment variable (all of the above except
NLSPATH) is a string of the form:
.Pp
.Bd -literal
language[_territory][.codeset][@modifier]
.Ed
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 code set is da_DK.ISO8859-1. The da stands for
the Danish language and the DK stands for Denmark. The short form of
da_DK is sufficient to indicate this locale.
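.Pp
The name format above can be taken apart with a few string operations.
The following is a minimal sketch, not the parser used by the C library;
the structure
locale_parts
and the function
split_locale_name
are hypothetical names introduced for illustration, and the 32-byte field
size is an arbitrary assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the parts of a locale name. */
struct locale_parts {
	char language[32];
	char territory[32];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src into dst and terminate it. */
static void
copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its parts.
 * Optional parts that are absent are left as empty strings.
 */
void
split_locale_name(const char *name, struct locale_parts *out)
{
	size_t n;

	assert(name != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* The language runs up to the first separator, if any. */
	n = strcspn(name, "_.@");
	copy_part(out->language, sizeof(out->language), name, n);
	name += n;

	if (*name == '_') {
		name++;
		n = strcspn(name, ".@");
		copy_part(out->territory, sizeof(out->territory), name, n);
		name += n;
	}
	if (*name == '.') {
		name++;
		n = strcspn(name, "@");
		copy_part(out->codeset, sizeof(out->codeset), name, n);
		name += n;
	}
	if (*name == '@') {
		name++;
		copy_part(out->modifier, sizeof(out->modifier), name,
		    strlen(name));
	}
}
```

Called with da_DK.ISO8859-1, the sketch fills in the language da, the
territory DK, and the code set ISO8859-1, and leaves the modifier empty.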
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the LC_ALL environment variable is set, all six categories use the
locale it specifies. For example, if the LC_ALL environment variable
is set to en_US and the LANG environment variable is set to fr_FR,
each of the six categories is defined as the en_US locale.
.It
If the LC_ALL environment variable is not set, each individual
category uses the locale specified by its corresponding environment
variable. For example, if the LC_ALL environment variable is not set,
the LC_COLLATE environment variable is set to de_DE, and the LC_TIME
environment variable is set to fr_CA, then the LC_COLLATE category is
defined as de_DE and the LC_TIME category is defined as fr_CA.
Neither environment variable has precedence over the other in this
situation.
.It
If the LC_ALL environment variable is not set, and a value for a
particular LC_* environment variable is not set, the value of the LANG
environment variable determines the definition for that specific
category. For example, if the LC_ALL environment variable is not set,
the LC_CTYPE environment variable is set to en_US, the LC_NUMERIC
environment variable is not set, and the LANG environment variable is
set to is_IS, then the LC_CTYPE category is defined as en_US and the
LC_NUMERIC category is defined as is_IS. The LANG environment
variable specifies the locale for only those categories not previously
determined by an LC_* environment variable.
.It
If the LC_ALL environment variable is not set, a value for a
particular LC_* environment variable is not set, and the value of the
LANG environment variable is not set, the locale for that specific
category defaults to the C locale. The C or POSIX locale assumes the
7-bit ASCII character set and defines information for the six
categories. For example, if the LC_ALL environment variable is not
set, the LC_MONETARY environment variable is set to sv_SE, the LC_TIME
environment variable is not set, and the LANG environment variable is
not set, then the LC_MONETARY category is defined as sv_SE and the
LC_TIME category is defined as C.
.El
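.Pp
The priority rules above reduce to a short chain of tests.
The sketch below only illustrates that order and is not how
.Xr setlocale 3
is implemented; the function names
pick_locale
and
effective_locale
are hypothetical, and treating an empty value as "not set" is an
assumption borrowed from POSIX rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a missing or empty value counts as "not set". */
static int
is_set(const char *value)
{
	return value != NULL && value[0] != '\0';
}

/*
 * Pick the locale name for one category from already-fetched values:
 * LC_ALL first, then the category's own variable, then LANG, then "C".
 */
const char *
pick_locale(const char *lc_all, const char *lc_category, const char *lang)
{
	if (is_set(lc_all))
		return lc_all;
	if (is_set(lc_category))
		return lc_category;
	if (is_set(lang))
		return lang;
	return "C";
}

/*
 * Convenience wrapper that reads the process environment.
 * category_var is a variable name such as "LC_TIME".
 */
const char *
effective_locale(const char *category_var)
{
	assert(category_var != NULL);
	return pick_locale(getenv("LC_ALL"), getenv(category_var),
	    getenv("LANG"));
}
```

Keeping the decision in
pick_locale
separate from the environment lookup makes each example in the list
above easy to check by hand.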
.Ss Code Sets
A character is any symbol used for the organization, control, or
representation of data. A group of such symbols used to describe a
particular language makes up a character set. A code set contains the
encoding values (conversion from bits to displayed characters) for a
character set. It is the encoding values in a code set that provide
the interface between the system and its input and output devices.
.Pp
The following code sets are supported in
.Nx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard code sets are provided by means of the ISO8859
family of code sets, which provide a range of single-byte code set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish. The eucJP code set is the industry-standard code
set used to support the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 code set is supported for all
supported language/territory combinations. UTF-8 provides character
support for most of the major languages of the world and can be used in
environments where multiple languages must be processed
simultaneously.
.El
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces. See their respective man pages for further information.
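.Pp
A typical use of these two interfaces is to adopt a locale and then ask
which code set it uses.
The following is a minimal sketch under that assumption; the function name
report_locale
is hypothetical, while
setlocale
with
LC_ALL
and
nl_langinfo
with
CODESET
are the standard calls.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Adopt the named locale for all categories and report its code set.
 * Passing "" asks setlocale() to consult LC_ALL, LC_*, and LANG.
 * Returns 0 on success, or -1 if the requested locale is not available,
 * in which case the current locale is left unchanged.
 */
int
report_locale(const char *locale, FILE *out)
{
	const char *name;

	assert(locale != NULL && out != NULL);

	name = setlocale(LC_ALL, locale);
	if (name == NULL) {
		fprintf(out, "locale \"%s\" is not available\n", locale);
		return -1;
	}

	fprintf(out, "locale:   %s\n", name);
	/* CODESET names the character encoding of the current locale. */
	fprintf(out, "code set: %s\n", nl_langinfo(CODESET));
	return 0;
}
```

A program would normally call
report_locale
with an empty string once at startup, before producing any localized
output.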
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs. These catalogs are
used by the application to retrieve and display messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface. The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported. Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult. The implementation also
does not support different code sets. The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems. It also provides many additional
tools which make programming and catalog maintenance much easier.
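.Pp
As an illustration of the X/Open interface, the sketch below retrieves one
message and falls back to a built-in string when no catalog is available.
The catalog name "example", the set and message numbers, and the function
names
catalog_message
and
print_greeting
are hypothetical; only
catopen,
catgets,
and
catclose
are part of the interface itself.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/*
 * Return message msg_id of set set_id from an open catalog, or the
 * fallback text if the catalog could not be opened or the message is
 * missing.  The returned pointer is valid only while the catalog
 * remains open.
 */
const char *
catalog_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	assert(fallback != NULL);

	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}

/* Print a greeting from the hypothetical "example" catalog. */
int
print_greeting(FILE *out)
{
	nl_catd catd;
	const char *text;
	int rc;

	assert(out != NULL);

	/* NL_CAT_LOCALE selects the catalog using the LC_MESSAGES locale. */
	catd = catopen("example", NL_CAT_LOCALE);
	text = catalog_message(catd, 1, 1, "Hello, world");
	rc = fprintf(out, "%s\n", text) < 0 ? -1 : 0;

	/* Close only after the message text is no longer needed. */
	if (catd != (nl_catd)-1)
		(void)catclose(catd);
	return rc;
}
```

Passing the English text as the default argument keeps the program usable
when no translated catalog has been installed.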
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.