@node uniconv.h @chapter Conversions between Unicode and encodings @code{<uniconv.h>} This include file declares functions for converting between Unicode strings and @code{char *} strings in locale encoding or in other specified encodings. @cindex locale encoding The following function returns the locale encoding. @deftypefun {const char *} locale_charset () Determines the current locale's character encoding, and canonicalizes it into one of the canonical names listed in @file{localcharset.h}. If the canonical name cannot be determined, the result is a non-canonical name. The result must not be freed; it is statically allocated. The result of this function can be used as an argument to the @code{iconv_open} function in GNU libc, in GNU libiconv, or in the gnulib provided wrapper around the native @code{iconv_open} function. It may not work as an argument to the native @code{iconv_open} function directly. @end deftypefun The handling of unconvertible characters during the conversions can be parametrized through the following enumeration type: @deftp Type {enum iconv_ilseq_handler} This type specifies how unconvertible characters in the input are handled. @end deftp @deftypevr Constant {enum iconv_ilseq_handler} iconveh_error This handler causes the function to return with @code{errno} set to @code{EILSEQ}. @end deftypevr @deftypevr Constant {enum iconv_ilseq_handler} iconveh_question_mark This handler produces one question mark @samp{?} per unconvertible character. @end deftypevr @deftypevr Constant {enum iconv_ilseq_handler} iconveh_question_replacement_character This handler produces one U+FFFD per unconvertible character if that fits in the target encoding, otherwise one question mark @samp{?} per unconvertible character. @end deftypevr @deftypevr Constant {enum iconv_ilseq_handler} iconveh_escape_sequence This handler produces an escape sequence @code{\u@var{xxxx}} or @code{\U@var{xxxxxxxx}} for each unconvertible character. @end deftypevr @cindex converting The following functions convert between strings in a specified encoding and Unicode strings. @deftypefun {uint8_t *} u8_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint8_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) @deftypefunx {uint16_t *} u16_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint16_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) @deftypefunx {uint32_t *} u32_conv_from_encoding (const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}char@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, uint32_t@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) Converts an entire string, possibly including NUL bytes, from one encoding to UTF-8 encoding. Converts a memory region given in encoding @var{fromcode}. @var{fromcode} is as for the @code{iconv_open} function. The input is in the memory region between @var{src} (inclusive) and @code{@var{src} + @var{srclen}} (exclusive). If @var{offsets} is not NULL, it should point to an array of @var{srclen} integers; this array is filled with offsets into the result, i.e@. the character starting at @code{@var{src}[i]} corresponds to the character starting at @code{@var{result}[@var{offsets}[i]]}, and other offsets are set to @code{(size_t)(-1)}. @code{@var{resultbuf}} and @code{*@var{lengthp}} should be a scratch buffer and its size, or @code{@var{resultbuf}} can be NULL. May erase the contents of the memory at @code{@var{resultbuf}}. If successful: The resulting Unicode string (non-NULL) is returned and its length stored in @code{*@var{lengthp}}. The resulting string is @code{@var{resultbuf}} if no dynamic memory allocation was necessary, or a freshly allocated memory block otherwise. In case of error: NULL is returned and @code{errno} is set. Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}. @end deftypefun @deftypefun {char *} u8_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint8_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) @deftypefunx {char *} u16_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint16_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) @deftypefunx {char *} u32_conv_to_encoding (const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}, const@tie{}uint32_t@tie{}*@var{src}, size_t@tie{}@var{srclen}, size_t@tie{}*@var{offsets}, char@tie{}*@var{resultbuf}, size_t@tie{}*@var{lengthp}) Converts an entire Unicode string, possibly including NUL units, from UTF-8 encoding to a given encoding. Converts a memory region to encoding @var{tocode}. @var{tocode} is as for the @code{iconv_open} function. The input is in the memory region between @var{src} (inclusive) and @code{@var{src} + @var{srclen}} (exclusive). If @var{offsets} is not NULL, it should point to an array of @var{srclen} integers; this array is filled with offsets into the result, i.e@. the character starting at @code{@var{src}[i]} corresponds to the character starting at @code{@var{result}[@var{offsets}[i]]}, and other offsets are set to @code{(size_t)(-1)}. @code{@var{resultbuf}} and @code{*@var{lengthp}} should be a scratch buffer and its size, or @code{@var{resultbuf}} can be NULL. May erase the contents of the memory at @code{@var{resultbuf}}. If successful: The resulting Unicode string (non-NULL) is returned and its length stored in @code{*@var{lengthp}}. The resulting string is @code{@var{resultbuf}} if no dynamic memory allocation was necessary, or a freshly allocated memory block otherwise. In case of error: NULL is returned and @code{errno} is set. Particular @code{errno} values: @code{EINVAL}, @code{EILSEQ}, @code{ENOMEM}. @end deftypefun The following functions convert between NUL terminated strings in a specified encoding and NUL terminated Unicode strings. @deftypefun {uint8_t *} u8_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) @deftypefunx {uint16_t *} u16_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) @deftypefunx {uint32_t *} u32_strconv_from_encoding (const@tie{}char@tie{}*@var{string}, const@tie{}char@tie{}*@var{fromcode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) Converts a NUL terminated string from a given encoding. The result is @code{malloc} allocated, or NULL (with @var{errno} set) in case of error. Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}. @end deftypefun @deftypefun {char *} u8_strconv_to_encoding (const@tie{}uint8_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) @deftypefunx {char *} u16_strconv_to_encoding (const@tie{}uint16_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) @deftypefunx {char *} u32_strconv_to_encoding (const@tie{}uint32_t@tie{}*@var{string}, const@tie{}char@tie{}*@var{tocode}, enum@tie{}iconv_ilseq_handler@tie{}@var{handler}) Converts a NUL terminated string to a given encoding. The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error. Particular @code{errno} values: @code{EILSEQ}, @code{ENOMEM}. @end deftypefun The following functions are shorthands that convert between NUL terminated strings in locale encoding and NUL terminated Unicode strings. @deftypefun {uint8_t *} u8_strconv_from_locale (const@tie{}char@tie{}*@var{string}) @deftypefunx {uint16_t *} u16_strconv_from_locale (const@tie{}char@tie{}*@var{string}) @deftypefunx {uint32_t *} u32_strconv_from_locale (const@tie{}char@tie{}*@var{string}) Converts a NUL terminated string from the locale encoding. The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error. Particular @code{errno} values: @code{ENOMEM}. @end deftypefun @deftypefun {char *} u8_strconv_to_locale (const@tie{}uint8_t@tie{}*@var{string}) @deftypefunx {char *} u16_strconv_to_locale (const@tie{}uint16_t@tie{}*@var{string}) @deftypefunx {char *} u32_strconv_to_locale (const@tie{}uint32_t@tie{}*@var{string}) Converts a NUL terminated string to the locale encoding. The result is @code{malloc} allocated, or NULL (with @code{errno} set) in case of error. Particular @code{errno} values: @code{ENOMEM}. @end deftypefun