From 3590c846d4c2febbc05b4ad6b14a06edc549e453 Mon Sep 17 00:00:00 2001
From: "Manuel A. Fernandez Montecelo"
Date: Fri, 27 May 2016 14:35:16 +0100
Subject: Imported Upstream version 0.9.6+really0.9.6

---
 doc/libunistring_13.html | 660 ++++++++++++++++++++---------------------------
 1 file changed, 278 insertions(+), 382 deletions(-)

diff --git a/doc/libunistring_13.html b/doc/libunistring_13.html
index 8b779109..ca81cf8a 100644
--- a/doc/libunistring_13.html
+++ b/doc/libunistring_13.html
@@ -1,6 +1,6 @@
-
+
-GNU libunistring: 13. Case mappings <unicase.h>
+GNU libunistring: 13. Normalization forms (composition and decomposition) <uninorm.h>
-
-
+
+
@@ -42,7 +42,7 @@ ul.toc {list-style: none}
-
+
@@ -51,446 +51,369 @@ ul.toc {list-style: none}
-
+

- + -

13. Case mappings <unicase.h>

+

13. Normalization forms (composition and decomposition) <uninorm.h>

-

This include file defines functions for case mapping for Unicode strings and -case insensitive comparison of Unicode strings and C strings. -

-

These string functions fix the problems that were mentioned in -‘char *’ strings, namely, they handle the Croatian -LETTER DZ WITH CARON, the German LATIN SMALL LETTER SHARP S, the -Greek sigma and the Lithuanian i correctly. +

This include file defines functions for transforming Unicode strings to one +of the four normal forms, known as NFC, NFD, NFKC, NFKD. These +transformations involve decomposition and (for NFC and NFKC) composition +of Unicode characters.


- + -

13.1 Case mappings of characters

+

13.1 Decomposition of Unicode characters

-

The following functions implement case mappings on Unicode characters, -for those cases only where the result of the mapping is again a single +

The following enumerated values are the possible types of decomposition of a Unicode character.

-

These mappings are locale and context independent. -

-
-

WARNING! These functions are not sufficient for languages such as -German, Greek and Lithuanian. Prefer the functions below, which treat an -entire string at once and are language aware. -

-
-
Function: ucs4_t uc_toupper (ucs4_t uc) - +
Constant: int UC_DECOMP_CANONICAL +
-

Returns the uppercase mapping of the Unicode character uc. +

Denotes canonical decomposition.

-
Function: ucs4_t uc_tolower (ucs4_t uc) - +
Constant: int UC_DECOMP_FONT +
-

Returns the lowercase mapping of the Unicode character uc. +

UCD marker: <font>. Denotes a font variant (e.g. a blackletter form).

-
Function: ucs4_t uc_totitle (ucs4_t uc) - +
Constant: int UC_DECOMP_NOBREAK +
-

Returns the titlecase mapping of the Unicode character uc. -

-

The titlecase mapping of a character is to be used when the character should -look like upper case and the following characters are lower cased. -

-

For most characters, this is the same as the uppercase mapping. There are -only a few characters where the title case variant and the upper case variant -are different. These characters occur in the Latin writing of the Croatian, -Bosnian, and Serbian languages. -

- - - - - - -

-Lower case                        Title case                                             Upper case
-LATIN SMALL LETTER LJ             LATIN CAPITAL LETTER L WITH SMALL LETTER J             LATIN CAPITAL LETTER LJ
-LATIN SMALL LETTER NJ             LATIN CAPITAL LETTER N WITH SMALL LETTER J             LATIN CAPITAL LETTER NJ
-LATIN SMALL LETTER DZ             LATIN CAPITAL LETTER D WITH SMALL LETTER Z             LATIN CAPITAL LETTER DZ
-LATIN SMALL LETTER DZ WITH CARON  LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON  LATIN CAPITAL LETTER DZ WITH CARON

-
- -
- - -

13.2 Case mappings of strings

+

UCD marker: <noBreak>. +Denotes a no-break version of a space or hyphen. +

-

Case mapping should always be performed on entire strings, not on individual -characters. The functions in this section do so. -

-

These functions can apply a normalization after the case mapping. The -reason is that if you want to treat ‘ä’ and ‘Ä’ the same, -you most often also want to treat the composed and decomposed forms of such -a character, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS and -U+0041 LATIN CAPITAL LETTER A U+0308 COMBINING DIAERESIS the same. -The nf argument designates the normalization. -

- -

These functions are locale dependent. The iso639_language argument -identifies the language (e.g. "tr" for Turkish). NULL means to use -locale independent case mappings. -

-
Function: const char * uc_locale_language () - +
Constant: int UC_DECOMP_INITIAL +
-

Returns the ISO 639 language code of the current locale. -Returns "" if it is unknown, or in the "C" locale. +

UCD marker: <initial>. +Denotes an initial presentation form (Arabic).

-
Function: uint8_t * u8_toupper (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - -
-
Function: uint16_t * u16_toupper (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - +
Constant: int UC_DECOMP_MEDIAL +
-
Function: uint32_t * u32_toupper (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - -
-

Returns the uppercase mapping of a string. -

-

The nf argument identifies the normalization form to apply after the -case-mapping. It can also be NULL, for no normalization. +

UCD marker: <medial>. +Denotes a medial presentation form (Arabic).

-
Function: uint8_t * u8_tolower (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Constant: int UC_DECOMP_FINAL +
-
Function: uint16_t * u16_tolower (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - -
-
Function: uint32_t * u32_tolower (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - -
-

Returns the lowercase mapping of a string. -

-

The nf argument identifies the normalization form to apply after the -case-mapping. It can also be NULL, for no normalization. +

UCD marker: <final>. +Denotes a final presentation form (Arabic).

-
Function: uint8_t * u8_totitle (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Constant: int UC_DECOMP_ISOLATED +
-
Function: uint16_t * u16_totitle (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - -
-
Function: uint32_t * u32_totitle (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - -
-

Returns the titlecase mapping of a string. -

-

Mapping to title case means that, in each word, the first cased character -is being mapped to title case and the remaining characters of the word -are being mapped to lower case. -

-

The nf argument identifies the normalization form to apply after the -case-mapping. It can also be NULL, for no normalization. +

UCD marker: <isolated>. +Denotes an isolated presentation form (Arabic).

-
- - -

13.3 Case mappings of substrings

- -

Case mapping of a substring cannot simply be performed by extracting the -substring and then applying the case mapping function to it. This does not -work because case mapping requires some information about the surrounding -characters. The following functions apply case mappings to -substrings of a given string, taking into account the characters that -precede it (the “prefix”) and the characters that follow it (the “suffix”). -

-
Type: casing_prefix_context_t - +
Constant: int UC_DECOMP_CIRCLE +
-

This data type denotes the case-mapping context that is given by a prefix -string. It is an immediate type that can be copied by simple assignment, -without involving memory allocation. It is not an array type. +

UCD marker: <circle>. +Denotes an encircled form.

-
Constant: casing_prefix_context_t unicase_empty_prefix_context - +
Constant: int UC_DECOMP_SUPER +
-

This constant is the case-mapping context that corresponds to an empty prefix -string. +

UCD marker: <super>. +Denotes a superscript form.

-

The following functions return casing_prefix_context_t objects: -

-
Function: casing_prefix_context_t u8_casing_prefix_context (const uint8_t *s, size_t n) - +
Constant: int UC_DECOMP_SUB +
-
Function: casing_prefix_context_t u16_casing_prefix_context (const uint16_t *s, size_t n) - +

UCD marker: <sub>. +Denotes a subscript form. +

+ +
+
Constant: int UC_DECOMP_VERTICAL +
-
Function: casing_prefix_context_t u32_casing_prefix_context (const uint32_t *s, size_t n) - +

UCD marker: <vertical>. +Denotes a vertical layout presentation form. +

+ +
+
Constant: int UC_DECOMP_WIDE +
-

Returns the case-mapping context of a given prefix string. +

UCD marker: <wide>. +Denotes a wide (or zenkaku) compatibility character.

-
Function: casing_prefix_context_t u8_casing_prefixes_context (const uint8_t *s, size_t n, casing_prefix_context_t a_context) - +
Constant: int UC_DECOMP_NARROW +
-
Function: casing_prefix_context_t u16_casing_prefixes_context (const uint16_t *s, size_t n, casing_prefix_context_t a_context) - +

UCD marker: <narrow>. +Denotes a narrow (or hankaku) compatibility character. +

+ +
+
Constant: int UC_DECOMP_SMALL +
-
Function: casing_prefix_context_t u32_casing_prefixes_context (const uint32_t *s, size_t n, casing_prefix_context_t a_context) - +

UCD marker: <small>. +Denotes a small variant form (CNS compatibility). +

+ +
+
Constant: int UC_DECOMP_SQUARE +
-

Returns the case-mapping context of the prefix concat(a, s), -given the case-mapping context of the prefix a. +

UCD marker: <square>. +Denotes a CJK squared font variant.

-
Type: casing_suffix_context_t - +
Constant: int UC_DECOMP_FRACTION +
-

This data type denotes the case-mapping context that is given by a suffix -string. It is an immediate type that can be copied by simple assignment, -without involving memory allocation. It is not an array type. +

UCD marker: <fraction>. +Denotes a vulgar fraction form.

-
Constant: casing_suffix_context_t unicase_empty_suffix_context - +
Constant: int UC_DECOMP_COMPAT +
-

This constant is the case-mapping context that corresponds to an empty suffix -string. +

UCD marker: <compat>. +Denotes an otherwise unspecified compatibility character.

-

The following functions return casing_suffix_context_t objects: +

The following constant denotes the maximum size of decomposition of a single +Unicode character.

-
Function: casing_suffix_context_t u8_casing_suffix_context (const uint8_t *s, size_t n) - -
-
Function: casing_suffix_context_t u16_casing_suffix_context (const uint16_t *s, size_t n) - +
Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH +
-
Function: casing_suffix_context_t u32_casing_suffix_context (const uint32_t *s, size_t n) - -
-

Returns the case-mapping context of a given suffix string. +

This macro expands to a constant that is the required size of the buffer passed to +the uc_decomposition and uc_canonical_decomposition functions.

+

The following functions decompose a Unicode character. +

-
Function: casing_suffix_context_t u8_casing_suffixes_context (const uint8_t *s, size_t n, casing_suffix_context_t a_context) - +
Function: int uc_decomposition (ucs4_t uc, int *decomp_tag, ucs4_t *decomposition) +
-
Function: casing_suffix_context_t u16_casing_suffixes_context (const uint16_t *s, size_t n, casing_suffix_context_t a_context) - -
-
Function: casing_suffix_context_t u32_casing_suffixes_context (const uint32_t *s, size_t n, casing_suffix_context_t a_context) - +

Returns the character decomposition mapping of the Unicode character uc. +decomposition must point to an array of at least +UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements. +

+

When a decomposition exists, decomposition[0..n-1] and +*decomp_tag are filled and n is returned. Otherwise -1 is +returned. +

+ +
+
Function: int uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition) +
-

Returns the case-mapping context of the suffix concat(s, a), -given the case-mapping context of the suffix a. +

Returns the canonical character decomposition mapping of the Unicode character +uc. decomposition must point to an array of at least +UC_DECOMPOSITION_MAX_LENGTH ucs4_t elements. +

+

When a decomposition exists, decomposition[0..n-1] is filled +and n is returned. Otherwise -1 is returned.

-

The following functions perform a case mapping, considering the -prefix context and the suffix context. +


+ + +

13.2 Composition of Unicode characters

+ +

The following function composes a Unicode character from two Unicode +characters.

-
Function: uint8_t * u8_ct_toupper (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Function: ucs4_t uc_composition (ucs4_t uc1, ucs4_t uc2) +
-
Function: uint16_t * u16_ct_toupper (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - -
-
Function: uint32_t * u32_ct_toupper (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - -
-

Returns the uppercase mapping of a string that is surrounded by a prefix -and a suffix. +

Attempts to combine the Unicode characters uc1, uc2. +uc1 is known to have canonical combining class 0. +

+

Returns the combination of uc1 and uc2, if it exists. +Returns 0 otherwise. +

+

Not all decompositions can be recombined using this function. See the Unicode +file ‘CompositionExclusions.txt’ for details.

+
+ + +

13.3 Normalization of strings

+ +

The Unicode standard defines four normalization forms for Unicode strings. +The following type is used to denote a normalization form. +

-
Function: uint8_t * u8_ct_tolower (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - -
-
Function: uint16_t * u16_ct_tolower (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - +
Type: uninorm_t +
-
Function: uint32_t * u32_ct_tolower (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - -
-

Returns the lowercase mapping of a string that is surrounded by a prefix -and a suffix. +

An object of type uninorm_t denotes a Unicode normalization form. +This is a scalar type; its values can be compared with ==.

+

The following constants denote the four normalization forms. +

-
Function: uint8_t * u8_ct_totitle (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Macro: uninorm_t UNINORM_NFD +
-
Function: uint16_t * u16_ct_totitle (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - -
-
Function: uint32_t * u32_ct_totitle (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - +

Denotes Normalization form D: canonical decomposition. +

+ +
+
Macro: uninorm_t UNINORM_NFC +
-

Returns the titlecase mapping of a string that is surrounded by a prefix -and a suffix. +

Normalization form C: canonical decomposition, then canonical composition.

-

For example, to uppercase the UTF-8 substring between s + start_index -and s + end_index of a string that extends from s to -s + u8_strlen (s), you can use the statements -

-
 
size_t result_length;
-uint8_t *result =
-  u8_ct_toupper (s + start_index, end_index - start_index,
-                 u8_casing_prefix_context (s, start_index),
-                 u8_casing_suffix_context (s + end_index,
-                                           u8_strlen (s) - end_index),
-                 iso639_language, NULL, NULL, &result_length);
-
+
+
Macro: uninorm_t UNINORM_NFKD + +
+

Normalization form KD: compatibility decomposition. +

-
- - -

13.4 Case insensitive comparison

+
+
Macro: uninorm_t UNINORM_NFKC + +
+

Normalization form KC: compatibility decomposition, then canonical composition. +

-

The following functions implement comparison that ignores differences in case -and normalization. +

The following functions operate on uninorm_t objects.

-
Function: uint8_t * u8_casefold (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Function: bool uninorm_is_compat_decomposing (uninorm_t nf) +
-
Function: uint16_t * u16_casefold (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - +

Tests whether the normalization form nf does compatibility decomposition. +

+ +
+
Function: bool uninorm_is_composing (uninorm_t nf) +
-
Function: uint32_t * u32_casefold (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - +

Tests whether the normalization form nf includes canonical composition. +

+ +
+
Function: uninorm_t uninorm_decomposing_form (uninorm_t nf) +
-

Returns the case folded string. -

-

Comparing u8_casefold (s1) and u8_casefold (s2) -with the u8_cmp2 function is equivalent to comparing s1 and -s2 with u8_casecmp. -

-

The nf argument identifies the normalization form to apply after the -case-mapping. It can also be NULL, for no normalization. +

Returns the decomposing variant of the normalization form nf. +This maps NFC,NFD → NFD and NFKC,NFKD → NFKD.

+

The following functions apply a Unicode normalization form to a Unicode string. +

-
Function: uint8_t * u8_ct_casefold (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) - +
Function: uint8_t * u8_normalize (uninorm_t nf, const uint8_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp) +
-
Function: uint16_t * u16_ct_casefold (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) - +
Function: uint16_t * u16_normalize (uninorm_t nf, const uint16_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp) +
-
Function: uint32_t * u32_ct_casefold (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) - +
Function: uint32_t * u32_normalize (uninorm_t nf, const uint32_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp) +
-

Returns the case folded string. The case folding takes into account the -case mapping contexts of the prefix and suffix strings. +

Returns the specified normalization form of a string.

+
+ + +

13.4 Normalizing comparisons

+ +

The following functions compare Unicode strings, ignoring differences in +normalization. +

-
Function: int u8_casecmp (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - -
-
Function: int u16_casecmp (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u8_normcmp (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-
Function: int u32_casecmp (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u16_normcmp (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-
Function: int ulc_casecmp (const char *s1, size_t n1, const char *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u32_normcmp (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-

Compares s1 and s2, ignoring differences in case and normalization. +

Compares s1 and s2, ignoring differences in normalization.

-

The nf argument identifies the normalization form to apply after the -case-mapping. It can also be NULL, for no normalization. +

nf must be either UNINORM_NFD or UNINORM_NFKD.

If successful, sets *resultp to -1 if s1 < s2, 0 if s1 = s2, 1 if s1 > s2, and returns 0. Upon failure, returns -1 with errno set.

- - - - -

The following functions additionally take into account the sorting rules of the -current locale. -

+ +
-
Function: char * u8_casexfrm (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) - +
Function: char * u8_normxfrm (const uint8_t *s, size_t n, uninorm_t nf, char *resultbuf, size_t *lengthp) +
-
Function: char * u16_casexfrm (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) - +
Function: char * u16_normxfrm (const uint16_t *s, size_t n, uninorm_t nf, char *resultbuf, size_t *lengthp) +
-
Function: char * u32_casexfrm (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) - -
-
Function: char * ulc_casexfrm (const char *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) - +
Function: char * u32_normxfrm (const uint32_t *s, size_t n, uninorm_t nf, char *resultbuf, size_t *lengthp) +

Converts the string s of length n to a NUL-terminated byte -sequence, in such a way that comparing u8_casexfrm (s1) and -u8_casexfrm (s2) with the gnulib function memcmp2 is -equivalent to comparing s1 and s2 with u8_casecoll. +sequence, in such a way that comparing u8_normxfrm (s1) and +u8_normxfrm (s2) with the u8_cmp2 function is equivalent to +comparing s1 and s2 with the u8_normcoll function.

-

nf must be either UNINORM_NFC, UNINORM_NFKC, or NULL for -no normalization. +

nf must be either UNINORM_NFC or UNINORM_NFKC.

-
Function: int u8_casecoll (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - -
-
Function: int u16_casecoll (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u8_normcoll (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-
Function: int u32_casecoll (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u16_normcoll (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-
Function: int ulc_casecoll (const char *s1, size_t n1, const char *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) - +
Function: int u32_normcoll (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, uninorm_t nf, int *resultp) +
-

Compares s1 and s2, ignoring differences in case and normalization, -using the collation rules of the current locale. +

Compares s1 and s2, ignoring differences in normalization, using +the collation rules of the current locale.

-

The nf argument identifies the normalization form to apply after the -case-mapping. It must be either UNINORM_NFC or UNINORM_NFKC. -It can also be NULL, for no normalization. +

nf must be either UNINORM_NFC or UNINORM_NFKC.

If successful, sets *resultp to -1 if s1 < s2, 0 if s1 = s2, 1 if s1 > s2, and returns 0.
@@ -498,93 +421,66 @@ Upon failure, returns -1 with errno set.


- + -

13.5 Case detection

+

13.5 Normalization of streams of Unicode characters

-

The following functions determine whether a Unicode string is entirely in -upper case, or entirely in lower case, or entirely in title case, or already -case-folded. +

A “stream of Unicode characters” is essentially a function that accepts a +ucs4_t argument repeatedly, optionally combined with a function that +“flushes” the stream.

-
Function: int u8_is_uppercase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u16_is_uppercase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u32_is_uppercase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) - +
Type: struct uninorm_filter +
-

Sets *resultp to true if mapping NFD(s) to upper case is -a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with -errno set. +

This is the data type of a stream of Unicode characters that normalizes its +input according to a given normalization form and passes the normalized +character sequence to the encapsulated stream of Unicode characters.

-
Function: int u8_is_lowercase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) - +
Function: struct uninorm_filter * uninorm_filter_create (uninorm_t nf, int (*stream_func) (void *stream_data, ucs4_t uc), void *stream_data) +
-
Function: int u16_is_lowercase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u32_is_lowercase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-

Sets *resultp to true if mapping NFD(s) to lower case is -a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with -errno set. +

Creates and returns a normalization filter for Unicode characters. +

+

The pair (stream_func, stream_data) is the encapsulated stream. +stream_func (stream_data, uc) receives the Unicode +character uc and returns 0 if successful, or -1 with errno set +upon failure. +

+

Returns the new filter, or NULL with errno set upon failure.

-
Function: int u8_is_titlecase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u16_is_titlecase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) - +
Function: int uninorm_filter_write (struct uninorm_filter *filter, ucs4_t uc) +
-
Function: int u32_is_titlecase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-

Sets *resultp to true if mapping NFD(s) to title case is -a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with -errno set. +

Stuffs a Unicode character into a normalizing filter. +Returns 0 if successful, or -1 with errno set upon failure.

-
Function: int u8_is_casefolded (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u16_is_casefolded (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) - +
Function: int uninorm_filter_flush (struct uninorm_filter *filter) +
-
Function: int u32_is_casefolded (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-

Sets *resultp to true if applying case folding to NFD(s) is -a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with -errno set. +

Brings data buffered in the filter to its destination, the encapsulated stream. +

+

Returns 0 if successful, or -1 with errno set upon failure. +

+

Note! If, after calling this function, additional characters are written +into the filter, the resulting character sequence in the encapsulated stream +will not necessarily be normalized.

-

The following functions determine whether case mappings have any effect on a -Unicode string. -

-
Function: int u8_is_cased (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-
Function: int u16_is_cased (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) - +
Function: int uninorm_filter_free (struct uninorm_filter *filter) +
-
Function: int u32_is_cased (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) - -
-

Sets *resultp to true if case matters for s, that is, if -mapping NFD(s) to either upper case or lower case or title case is not -a no-op. Sets *resultp to false if NFD(s) maps to itself -under the upper case mapping, under the lower case mapping, and under the title -case mapping; in other words, when NFD(s) consists entirely of caseless -characters. Upon failure, returns -1 with errno set. +

Brings data buffered in the filter to its destination, the encapsulated stream, +then closes and frees the filter. +

+

Returns 0 if successful, or -1 with errno set upon failure.


@@ -597,12 +493,12 @@ characters. Upon failure, returns -1 with errno set. - +

- This document was generated by Bruno Haible on March, 30 2010 using texi2html 1.78a.
+ This document was generated by Daiki Ueno on July, 8 2015 using texi2html 1.78a.
-- cgit v1.2.3