author: Manuel A. Fernandez Montecelo <manuel.montezelo@gmail.com> 2016-05-26 16:48:15 +0100
committer: Manuel A. Fernandez Montecelo <manuel.montezelo@gmail.com> 2016-05-26 16:48:15 +0100
commit: 5f2b09982312c98863eb9a8dfe2c608b81f58259
tree: e5d38581c2f36e1cca02efedd2d85044d77f76f9 /doc/libunistring_11.html
parent: 3e0814cd9862b89c7a39672672937477bd87ddfb

Imported Upstream version 0.9.6 (upstream/0.9.6)

Diffstat (limited to 'doc/libunistring_11.html')
-rw-r--r-- doc/libunistring_11.html | 204
1 files changed, 98 insertions, 106 deletions
diff --git a/doc/libunistring_11.html b/doc/libunistring_11.html
index 7fd2dc3a..1e95b7ad 100644
--- a/doc/libunistring_11.html
+++ b/doc/libunistring_11.html
@@ -1,6 +1,6 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
<html>
-<!-- Created on March, 30 2010 by texi2html 1.78a -->
+<!-- Created on July, 8 2015 by texi2html 1.78a -->
<!--
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
Karl Berry <karl@freefriends.org>
@@ -11,10 +11,10 @@ Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
-<title>GNU libunistring: 11. Line breaking &lt;unilbrk.h&gt;</title>
+<title>GNU libunistring: 11. Word breaks in strings &lt;uniwbrk.h&gt;</title>
-<meta name="description" content="GNU libunistring: 11. Line breaking &lt;unilbrk.h&gt;">
-<meta name="keywords" content="GNU libunistring: 11. Line breaking &lt;unilbrk.h&gt;">
+<meta name="description" content="GNU libunistring: 11. Word breaks in strings &lt;uniwbrk.h&gt;">
+<meta name="keywords" content="GNU libunistring: 11. Word breaks in strings &lt;uniwbrk.h&gt;">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.78a">
@@ -42,8 +42,8 @@ ul.toc {list-style: none}
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<table cellpadding="1" cellspacing="1" border="0">
-<tr><td valign="middle" align="left">[<a href="libunistring_10.html#SEC38" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
-<td valign="middle" align="left">[<a href="libunistring_12.html#SEC42" title="Next chapter"> &gt;&gt; </a>]</td>
+<tr><td valign="middle" align="left">[<a href="libunistring_10.html#SEC41" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
+<td valign="middle" align="left">[<a href="libunistring_12.html#SEC47" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
@@ -51,134 +51,126 @@ ul.toc {list-style: none}
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
-<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td>
+<td valign="middle" align="left">[<a href="libunistring_19.html#SEC77" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<hr size="2">
-<a name="unilbrk_002eh"></a>
-<a name="SEC41"></a>
-<h1 class="chapter"> <a href="libunistring.html#TOC41">11. Line breaking <code>&lt;unilbrk.h&gt;</code></a> </h1>
+<a name="uniwbrk_002eh"></a>
+<a name="SEC44"></a>
+<h1 class="chapter"> <a href="libunistring.html#TOC44">11. Word breaks in strings <code>&lt;uniwbrk.h&gt;</code></a> </h1>
<p>This include file declares functions for determining where in a string
-line breaks could or should be introduced, in order to make the displayed
-string fit into a column of given width.
+&ldquo;words&rdquo; start and end. Here &ldquo;words&rdquo; are not necessarily the same as
+entities that can be looked up in dictionaries, but rather groups of
+consecutive characters that should not be split by text processing
+operations.
</p>
-<p>These functions are locale dependent. The <var>encoding</var> argument identifies
-the encoding (e.g. <code>&quot;ISO-8859-2&quot;</code> for Polish).
-</p>
-<p>The following enumerated values indicate whether, at a given position, a line
-break is possible or not. Given an string <var>s</var> as an array
-<code><var>s</var>[0..<var>n</var>-1]</code> and a position <var>i</var>, the values have the
-following meanings:
+
+<hr size="6">
+<a name="Word-breaks-in-a-string"></a>
+<a name="SEC45"></a>
+<h2 class="section"> <a href="libunistring.html#TOC45">11.1 Word breaks in a string</a> </h2>
+
+<p>The following functions determine the word breaks in a string.
</p>
<dl>
-<dt><u>Constant:</u> int <b>UC_BREAK_MANDATORY</b>
-<a name="IDX633"></a>
+<dt><u>Function:</u> void <b>u8_wordbreaks</b><i> (const uint8_t *<var>s</var>, size_t <var>n</var>, char *<var>p</var>)</i>
+<a name="IDX736"></a>
</dt>
-<dd><p>This value indicates that <code><var>s</var>[<var>i</var>]</code> is a line break character.
-</p></dd></dl>
-
-<dl>
-<dt><u>Constant:</u> int <b>UC_BREAK_POSSIBLE</b>
-<a name="IDX634"></a>
+<dt><u>Function:</u> void <b>u16_wordbreaks</b><i> (const uint16_t *<var>s</var>, size_t <var>n</var>, char *<var>p</var>)</i>
+<a name="IDX737"></a>
</dt>
-<dd><p>This value indicates that a line break may be inserted between
-<code><var>s</var>[<var>i</var>-1]</code> and <code><var>s</var>[<var>i</var>]</code>.
-</p></dd></dl>
-
-<dl>
-<dt><u>Constant:</u> int <b>UC_BREAK_HYPHENATION</b>
-<a name="IDX635"></a>
+<dt><u>Function:</u> void <b>u32_wordbreaks</b><i> (const uint32_t *<var>s</var>, size_t <var>n</var>, char *<var>p</var>)</i>
+<a name="IDX738"></a>
</dt>
-<dd><p>This value indicates that a hyphen and a line break may be inserted between
-<code><var>s</var>[<var>i</var>-1]</code> and <code><var>s</var>[<var>i</var>]</code>. But beware of language
-dependent hyphenation rules.
-</p></dd></dl>
-
-<dl>
-<dt><u>Constant:</u> int <b>UC_BREAK_PROHIBITED</b>
-<a name="IDX636"></a>
+<dt><u>Function:</u> void <b>ulc_wordbreaks</b><i> (const char *<var>s</var>, size_t <var>n</var>, char *<var>p</var>)</i>
+<a name="IDX739"></a>
</dt>
-<dd><p>This value indicates that <code><var>s</var>[<var>i</var>-1]</code> and <code><var>s</var>[<var>i</var>]</code>
-must not be separated.
+<dd><p>Determines the word break points in <var>s</var>, an array of <var>n</var> units, and
+stores the result at <code><var>p</var>[0..<var>n</var>-1]</code>.
+</p><dl compact="compact">
+<dt> <code><var>p</var>[i] = 1</code></dt>
+<dd><p>means that there is a word boundary between <code><var>s</var>[i-1]</code> and
+<code><var>s</var>[i]</code>.
+</p></dd>
+<dt> <code><var>p</var>[i] = 0</code></dt>
+<dd><p>means that <code><var>s</var>[i-1]</code> and <code><var>s</var>[i]</code> must not be separated.
+</p></dd>
+</dl>
+<p><code><var>p</var>[0]</code> is always set to 0. If an application wants to consider a
+word break to be present at the beginning of the string (before
+<code><var>s</var>[0]</code>) or at the end of the string (after
+<code><var>s</var>[0..<var>n</var>-1]</code>), it has to treat these cases explicitly.
</p></dd></dl>
-<dl>
-<dt><u>Constant:</u> int <b>UC_BREAK_UNDEFINED</b>
-<a name="IDX637"></a>
-</dt>
-<dd><p>This value is not used as a return value; rather, in the overriding argument of
-the <code>u*_width_linebreaks</code> functions, it indicates the absence of an
-override.
-</p></dd></dl>
+<hr size="6">
+<a name="Word-break-property"></a>
+<a name="SEC46"></a>
+<h2 class="section"> <a href="libunistring.html#TOC46">11.2 Word break property</a> </h2>
-<p>The following functions determine the positions at which line breaks are
-possible.
+<p>This is a more low-level API. The word break property is a property defined
+in Unicode Standard Annex #29, section &ldquo;Word Boundaries&rdquo;, see
+<a href="http://www.unicode.org/reports/tr29/#Word_Boundaries">http://www.unicode.org/reports/tr29/#Word_Boundaries</a>. It is
+used for determining the word breaks in a string.
+</p>
+<p>The following are the possible values of the word break property. More values
+may be added in the future.
</p>
<dl>
-<dt><u>Function:</u> void <b>u8_possible_linebreaks</b><i> (const uint8_t *<var>s</var>, size_t <var>n</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX638"></a>
+<dt><u>Constant:</u> int <b>WBP_OTHER</b>
+<a name="IDX740"></a>
</dt>
-<dt><u>Function:</u> void <b>u16_possible_linebreaks</b><i> (const uint16_t *<var>s</var>, size_t <var>n</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX639"></a>
+<dt><u>Constant:</u> int <b>WBP_CR</b>
+<a name="IDX741"></a>
</dt>
-<dt><u>Function:</u> void <b>u32_possible_linebreaks</b><i> (const uint32_t *<var>s</var>, size_t <var>n</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX640"></a>
+<dt><u>Constant:</u> int <b>WBP_LF</b>
+<a name="IDX742"></a>
</dt>
-<dt><u>Function:</u> void <b>ulc_possible_linebreaks</b><i> (const char *<var>s</var>, size_t <var>n</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX641"></a>
+<dt><u>Constant:</u> int <b>WBP_NEWLINE</b>
+<a name="IDX743"></a>
</dt>
-<dd><p>Determines the line break points in <var>s</var>, and stores the result at
-<code><var>p</var>[0..<var>n</var>-1]</code>. Every <code><var>p</var>[<var>i</var>]</code> is assigned one of
-the values <code>UC_BREAK_MANDATORY</code>, <code>UC_BREAK_POSSIBLE</code>,
-<code>UC_BREAK_HYPHENATION</code>, <code>UC_BREAK_PROHIBITED</code>.
-</p></dd></dl>
-
-<p>The following functions determine where line breaks should be inserted so that
-each line fits in a given width, when output to a device that uses
-non-proportional fonts.
-</p>
-<dl>
-<dt><u>Function:</u> int <b>u8_width_linebreaks</b><i> (const uint8_t *<var>s</var>, size_t <var>n</var>, int <var>width</var>, int <var>start_column</var>, int <var>at_end_columns</var>, const char *<var>override</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX642"></a>
+<dt><u>Constant:</u> int <b>WBP_EXTEND</b>
+<a name="IDX744"></a>
</dt>
-<dt><u>Function:</u> int <b>u16_width_linebreaks</b><i> (const uint16_t *<var>s</var>, size_t <var>n</var>, int <var>width</var>, int <var>start_column</var>, int <var>at_end_columns</var>, const char *<var>override</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX643"></a>
+<dt><u>Constant:</u> int <b>WBP_FORMAT</b>
+<a name="IDX745"></a>
</dt>
-<dt><u>Function:</u> int <b>u32_width_linebreaks</b><i> (const uint32_t *<var>s</var>, size_t <var>n</var>, int <var>width</var>, int <var>start_column</var>, int <var>at_end_columns</var>, const char *<var>override</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX644"></a>
+<dt><u>Constant:</u> int <b>WBP_KATAKANA</b>
+<a name="IDX746"></a>
</dt>
-<dt><u>Function:</u> int <b>ulc_width_linebreaks</b><i> (const char *<var>s</var>, size_t <var>n</var>, int <var>width</var>, int <var>start_column</var>, int <var>at_end_columns</var>, const char *<var>override</var>, const char *<var>encoding</var>, char *<var>p</var>)</i>
-<a name="IDX645"></a>
+<dt><u>Constant:</u> int <b>WBP_ALETTER</b>
+<a name="IDX747"></a>
</dt>
-<dd><p>Chooses the best line breaks, assuming that every character occupies a width
-given by the <code>uc_width</code> function (see <a href="libunistring_9.html#SEC37">Display width <code>&lt;uniwidth.h&gt;</code></a>).
-</p>
-<p>The string is <code><var>s</var>[0..<var>n</var>-1]</code>.
-</p>
-<p>The maximum number of columns per line is given as <var>width</var>.
-The starting column of the string is given as <var>start_column</var>.
-If the algorithm shall keep room after the last piece, this amount of room can
-be given as <var>at_end_columns</var>.
-</p>
-<p><var>override</var> is an optional override; if
-<code><var>override</var>[<var>i</var>] != UC_BREAK_UNDEFINED</code>,
-<code><var>override</var>[<var>i</var>]</code> takes precedence over <code><var>p</var>[<var>i</var>]</code>
-as returned by the <code>u*_possible_linebreaks</code> function.
-</p>
-<p>The given <var>encoding</var> is used for disambiguating widths in <code>uc_width</code>.
+<dt><u>Constant:</u> int <b>WBP_MIDNUMLET</b>
+<a name="IDX748"></a>
+</dt>
+<dt><u>Constant:</u> int <b>WBP_MIDLETTER</b>
+<a name="IDX749"></a>
+</dt>
+<dt><u>Constant:</u> int <b>WBP_MIDNUM</b>
+<a name="IDX750"></a>
+</dt>
+<dt><u>Constant:</u> int <b>WBP_NUMERIC</b>
+<a name="IDX751"></a>
+</dt>
+<dt><u>Constant:</u> int <b>WBP_EXTENDNUMLET</b>
+<a name="IDX752"></a>
+</dt>
+</dl>
+
+<p>The following function looks up the word break property of a character.
</p>
-<p>Returns the column after the end of the string, and stores the result at
-<code><var>p</var>[0..<var>n</var>-1]</code>. Every <code><var>p</var>[<var>i</var>]</code> is assigned one of
-the values <code>UC_BREAK_MANDATORY</code>, <code>UC_BREAK_POSSIBLE</code>,
-<code>UC_BREAK_HYPHENATION</code>, <code>UC_BREAK_PROHIBITED</code>. Here the value
-<code>UC_BREAK_POSSIBLE</code> indicates that a line break <em>should</em> be inserted.
+<dl>
+<dt><u>Function:</u> int <b>uc_wordbreak_property</b><i> (ucs4_t <var>uc</var>)</i>
+<a name="IDX753"></a>
+</dt>
+<dd><p>Returns the Word_Break property of a Unicode character.
</p></dd></dl>
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
-<tr><td valign="middle" align="left">[<a href="libunistring_10.html#SEC38" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
-<td valign="middle" align="left">[<a href="libunistring_12.html#SEC42" title="Next chapter"> &gt;&gt; </a>]</td>
+<tr><td valign="middle" align="left">[<a href="#SEC44" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
+<td valign="middle" align="left">[<a href="libunistring_12.html#SEC47" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
@@ -186,12 +178,12 @@ the values <code>UC_BREAK_MANDATORY</code>, <code>UC_BREAK_POSSIBLE</code>,
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
-<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td>
+<td valign="middle" align="left">[<a href="libunistring_19.html#SEC77" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
<font size="-1">
- This document was generated by <em>Bruno Haible</em> on <em>March, 30 2010</em> using <a href="http://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>.
+ This document was generated by <em>Daiki Ueno</em> on <em>July, 8 2015</em> using <a href="http://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>.
</font>
<br>
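
Note on the new section 11.1: the patched text says that p[i] = 1 marks a word boundary between s[i-1] and s[i], that p[0] is always 0, and that the start and end of the string have to be handled by the caller. The following is a minimal sketch of how a caller might consume such an array. It does not call libunistring; the helper name count_segments is hypothetical, and the break array in the usage note is hand-written for illustration rather than produced by u8_wordbreaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libunistring.
 * Counts the segments described by a break array p[0..n-1] that follows
 * the convention documented for the u*_wordbreaks functions:
 *   p[i] == 1  -> boundary between s[i-1] and s[i]
 *   p[i] == 0  -> s[i-1] and s[i] must not be separated
 * Because p[0] is always 0, the boundary at the start of the string is
 * treated explicitly here: a non-empty string has at least one segment. */
size_t count_segments(const char *p, size_t n)
{
    assert(p != NULL || n == 0);

    if (n == 0)
        return 0;               /* empty string: no segments */

    size_t count = 1;           /* implicit segment starting at s[0] */
    for (size_t i = 1; i < n; i++) {
        if (p[i])
            count++;            /* a new segment starts at s[i] */
    }
    return count;
}
```

For example, with the illustrative array {0, 0, 1, 1, 0, 0} for a six-unit string, count_segments returns 3. The segments are not all "words" in the dictionary sense: a run of white space between two words forms a segment of its own, which matches the chapter's remark that "words" here are groups of characters that should not be split.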