From fa095a4504cbe668e4244547e2c141597bea4ecf Mon Sep 17 00:00:00 2001
From: Andreas Rottmann
Date: Mon, 14 Sep 2009 12:32:44 +0200
Subject: Imported Upstream version 0.9.1

---
 doc/libunistring_10.html | 192 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)
 create mode 100644 doc/libunistring_10.html

(limited to 'doc/libunistring_10.html')

diff --git a/doc/libunistring_10.html b/doc/libunistring_10.html
new file mode 100644
index 00000000..bf22ca1b
--- /dev/null
+++ b/doc/libunistring_10.html
@@ -0,0 +1,192 @@

GNU libunistring: 10. Word breaks in strings <uniwbrk.h>


10. Word breaks in strings `<uniwbrk.h>`


This include file declares functions for determining where in a string “words” start and end. Here “words” are not necessarily the same as entities that can be looked up in dictionaries, but rather groups of consecutive characters that should not be split by text processing operations.


10.1 Word breaks in a string


The following functions determine the word breaks in a string.

Function: void u8_wordbreaks (const uint8_t *s, size_t n, char *p)

Function: void u16_wordbreaks (const uint16_t *s, size_t n, char *p)

Function: void u32_wordbreaks (const uint32_t *s, size_t n, char *p)

Function: void ulc_wordbreaks (const char *s, size_t n, char *p)

Determines the word break points in s, an array of n units, and stores the result at p[0..n-1]. The caller must provide room for n elements at p.

p[i] = 1 means that there is a word boundary between s[i-1] and s[i].
p[i] = 0 means that s[i-1] and s[i] must not be separated.

p[0] is always set to 0. If an application wants to consider a word break to be present at the beginning of the string (before s[0]) or at the end of the string (after s[0..n-1]), it has to treat these cases explicitly.
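The explicit treatment of the string start and end can be sketched as follows. This is a minimal illustration, not part of libunistring: the helper names segment_starts and print_segments are hypothetical, and the sketch only walks an already filled break array, so it has no library dependency. In a real program the array would be filled by a call such as u8_wordbreaks (s, n, p).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libunistring.
   Given the break array p[0..n-1] filled by one of the *_wordbreaks
   functions, store the start offset of each segment in starts[] (at
   most max entries) and return the total number of segments.
   Offset 0 is counted as a segment start explicitly, because p[0] is
   always 0.  */
size_t
segment_starts (const char *p, size_t n, size_t *starts, size_t max)
{
  size_t count = 0;
  size_t i;

  if (n == 0)
    return 0;
  if (count < max)
    starts[count] = 0;          /* explicit break before s[0] */
  count++;
  for (i = 1; i < n; i++)
    if (p[i] == 1)
      {
        if (count < max)
          starts[count] = i;    /* boundary between s[i-1] and s[i] */
        count++;
      }
  return count;
}

/* Hypothetical helper: print each segment of s[0..n-1] in brackets,
   one per line.  The end of the string closes the last segment
   explicitly (the i == n case).  */
void
print_segments (const char *s, const char *p, size_t n)
{
  size_t start = 0;
  size_t i;

  for (i = 1; i <= n; i++)
    if (i == n || p[i] == 1)
      {
        printf ("[%.*s]\n", (int) (i - start), s + start);
        start = i;
      }
}
```

For example, with s = "Hello, world" (n = 12) and a hand-written array p = {0,0,0,0,0,1,1,1,0,0,0,0} standing in for the library's output, print_segments prints [Hello], [,], [ ] and [world] on separate lines.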


10.2 Word break property


This is a more low-level API. The word break property is a property defined in Unicode Standard Annex #29, section “Word Boundaries”, see http://www.unicode.org/reports/tr29/#Word_Boundaries. It is used for determining the word breaks in a string.

The following are the possible values of the word break property. More values may be added in the future.

Constant: int WBP_OTHER
Constant: int WBP_CR
Constant: int WBP_LF
Constant: int WBP_NEWLINE
Constant: int WBP_EXTEND
Constant: int WBP_FORMAT
Constant: int WBP_KATAKANA
Constant: int WBP_ALETTER
Constant: int WBP_MIDNUMLET
Constant: int WBP_MIDLETTER
Constant: int WBP_MIDNUM
Constant: int WBP_NUMERIC
Constant: int WBP_EXTENDNUMLET


The following function looks up the word break property of a character.

Function: int uc_wordbreak_property (ucs4_t uc)

Returns the Word_Break property of a Unicode character.


This document was generated by Bruno Haible on July, 1 2009 using texi2html 1.78a.