From a9a31b1de5776a3b08a82101a4fa711294f0dd1d Mon Sep 17 00:00:00 2001
From: "Manuel A. Fernandez Montecelo"
Date: Fri, 27 May 2016 14:28:30 +0100
Subject: Imported Upstream version 0.9.6+really0.9.3

---
 doc/libunistring_11.html | 204 ++++++++++++++++++++++++-----------------------
 1 file changed, 106 insertions(+), 98 deletions(-)

diff --git a/doc/libunistring_11.html b/doc/libunistring_11.html
index 1e95b7ad..7fd2dc3a 100644
--- a/doc/libunistring_11.html
+++ b/doc/libunistring_11.html
@@ -1,6 +1,6 @@
-GNU libunistring: 11. Word breaks in strings <uniwbrk.h>
+GNU libunistring: 11. Line breaking <unilbrk.h>
@@ -42,8 +42,8 @@ ul.toc {list-style: none}
@@ -51,126 +51,134 @@ ul.toc {list-style: none}

-11. Word breaks in strings <uniwbrk.h>
+11. Line breaking <unilbrk.h>

This include file declares functions for determining where in a string
-“words” start and end. Here “words” are not necessarily the same as
-entities that can be looked up in dictionaries, but rather groups of
-consecutive characters that should not be split by text processing
-operations.
+line breaks could or should be introduced, in order to make the displayed
+string fit into a column of given width.

-11.1 Word breaks in a string

-The following functions determine the word breaks in a string.
+These functions are locale dependent. The encoding argument identifies
+the encoding (e.g. "ISO-8859-2" for Polish).

+The following enumerated values indicate whether, at a given position, a line
+break is possible or not. Given a string s as an array
+s[0..n-1] and a position i, the values have the
+following meanings:

-Function: void u8_wordbreaks (const uint8_t *s, size_t n, char *p)
+Constant: int UC_BREAK_MANDATORY
-Function: void u16_wordbreaks (const uint16_t *s, size_t n, char *p)
+This value indicates that s[i] is a line break character.

+Constant: int UC_BREAK_POSSIBLE
-Function: void u32_wordbreaks (const uint32_t *s, size_t n, char *p)
+This value indicates that a line break may be inserted between
+s[i-1] and s[i].

+Constant: int UC_BREAK_HYPHENATION
-Function: void ulc_wordbreaks (const char *s, size_t n, char *p)
+This value indicates that a hyphen and a line break may be inserted between
+s[i-1] and s[i]. Beware, however, of language-dependent
+hyphenation rules.

+Constant: int UC_BREAK_PROHIBITED

-Determines the word break points in s, an array of n units, and
-stores the result at p[0..n-1].
-p[i] = 1
-means that there is a word boundary between s[i-1] and
-s[i].
-p[i] = 0
-means that s[i-1] and s[i] must not be separated.
-p[0] is always set to 0. If an application wants to consider a
-word break to be present at the beginning of the string (before
-s[0]) or at the end of the string (after
-s[0..n-1]), it has to treat these cases explicitly.
+This value indicates that s[i-1] and s[i]
+must not be separated.

-11.2 Word break property

+Constant: int UC_BREAK_UNDEFINED
+This value is not used as a return value; rather, in the overriding argument of
+the u*_width_linebreaks functions, it indicates the absence of an
+override.

-This is a more low-level API. The word break property is a property defined
-in Unicode Standard Annex #29, section “Word Boundaries”, see
-http://www.unicode.org/reports/tr29/#Word_Boundaries. It is
-used for determining the word breaks in a string.

-The following are the possible values of the word break property. More values
-may be added in the future.
+The following functions determine the positions at which line breaks are
+possible.

-Constant: int WBP_OTHER
-Constant: int WBP_CR
-Constant: int WBP_LF
-Constant: int WBP_NEWLINE
-Constant: int WBP_EXTEND
+Function: void u8_possible_linebreaks (const uint8_t *s, size_t n, const char *encoding, char *p)
-Constant: int WBP_FORMAT
+Function: void u16_possible_linebreaks (const uint16_t *s, size_t n, const char *encoding, char *p)
-Constant: int WBP_KATAKANA
+Function: void u32_possible_linebreaks (const uint32_t *s, size_t n, const char *encoding, char *p)
-Constant: int WBP_ALETTER
+Function: void ulc_possible_linebreaks (const char *s, size_t n, const char *encoding, char *p)
-Constant: int WBP_MIDNUMLET
-Constant: int WBP_MIDLETTER

+Determines the line break points in s, and stores the result at
+p[0..n-1]. Every p[i] is assigned one of
+the values UC_BREAK_MANDATORY, UC_BREAK_POSSIBLE,
+UC_BREAK_HYPHENATION, UC_BREAK_PROHIBITED.

+The following functions determine where line breaks should be inserted so that
+each line fits in a given width, when output to a device that uses
+non-proportional fonts.

+Function: int u8_width_linebreaks (const uint8_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
-Constant: int WBP_MIDNUM
+Function: int u16_width_linebreaks (const uint16_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
-Constant: int WBP_NUMERIC
+Function: int u32_width_linebreaks (const uint32_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
-Constant: int WBP_EXTENDNUMLET
+Function: int ulc_width_linebreaks (const char *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
-The following function looks up the word break property of a character.
+Chooses the best line breaks, assuming that every character occupies a width
+given by the uc_width function (see Display width <uniwidth.h>).

-Function: int uc_wordbreak_property (ucs4_t uc)
-Returns the Word_Break property of a Unicode character.
+The string is s[0..n-1].

+The maximum number of columns per line is given as width.
+The starting column of the string is given as start_column.
+To make the algorithm keep room after the last piece, pass the number of
+columns to reserve as at_end_columns.

+override is an optional override; if
+override[i] != UC_BREAK_UNDEFINED,
+override[i] takes precedence over p[i]
+as returned by the u*_possible_linebreaks function.

+The given encoding is used for disambiguating widths in uc_width.

+Returns the column after the end of the string, and stores the result at
+p[0..n-1]. Every p[i] is assigned one of
+the values UC_BREAK_MANDATORY, UC_BREAK_POSSIBLE,
+UC_BREAK_HYPHENATION, UC_BREAK_PROHIBITED. Here the value
+UC_BREAK_POSSIBLE indicates that a line break should be inserted.


@@ -178,12 +186,12 @@ may be added in the future.

-This document was generated by Daiki Ueno on July, 8 2015 using texi2html 1.78a.
+This document was generated by Bruno Haible on March, 30 2010 using texi2html 1.78a.