Diffstat (limited to 'doc/libunistring_8.html')
-rw-r--r-- | doc/libunistring_8.html | 2071 |
1 files changed, 2071 insertions, 0 deletions
diff --git a/doc/libunistring_8.html b/doc/libunistring_8.html new file mode 100644 index 00000000..def5e04a --- /dev/null +++ b/doc/libunistring_8.html @@ -0,0 +1,2071 @@ +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd"> +<html> +<!-- Created on July, 1 2009 by texi2html 1.78a --> +<!-- +Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author) + Karl Berry <karl@freefriends.org> + Olaf Bachmann <obachman@mathematik.uni-kl.de> + and many others. +Maintained by: Many creative people. +Send bugs and suggestions to <texi2html-bug@nongnu.org> + +--> +<head> +<title>GNU libunistring: 8. Unicode character classification and properties <unictype.h></title> + +<meta name="description" content="GNU libunistring: 8. Unicode character classification and properties <unictype.h>"> +<meta name="keywords" content="GNU libunistring: 8. Unicode character classification and properties <unictype.h>"> +<meta name="resource-type" content="document"> +<meta name="distribution" content="global"> +<meta name="Generator" content="texi2html 1.78a"> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> +<style type="text/css"> +<!-- +a.summary-letter {text-decoration: none} +pre.display {font-family: serif} +pre.format {font-family: serif} +pre.menu-comment {font-family: serif} +pre.menu-preformatted {font-family: serif} +pre.smalldisplay {font-family: serif; font-size: smaller} +pre.smallexample {font-size: smaller} +pre.smallformat {font-family: serif; font-size: smaller} +pre.smalllisp {font-size: smaller} +span.roman {font-family:serif; font-weight:normal;} +span.sansserif {font-family:sans-serif; font-weight:normal;} +ul.toc {list-style: none} +--> +</style> + + +</head> + +<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> + +<table cellpadding="1" cellspacing="1" border="0"> +<tr><td valign="middle" align="left">[<a href="libunistring_7.html#SEC19" 
title="Beginning of this chapter or previous chapter"> << </a>]</td> +<td valign="middle" align="left">[<a href="libunistring_9.html#SEC37" title="Next chapter"> >> </a>]</td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td> +<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td> +<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td> +<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td> +</tr></table> + +<hr size="2"> +<a name="unictype_002eh"></a> +<a name="SEC20"></a> +<h1 class="chapter"> <a href="libunistring.html#TOC20">8. Unicode character classification and properties <code><unictype.h></code></a> </h1> + +<p>This include file declares functions that classify Unicode characters +and that test whether Unicode characters have specific properties. +</p> +<p>The classification assigns a “general category” to every Unicode +character. This is similar to the classification provided by ISO C in +<code><wctype.h></code>. +</p> +<p>Properties are the data that guides various text processing algorithms +in the presence of specific Unicode characters. +</p> + +<hr size="6"> +<a name="General-category"></a> +<a name="SEC21"></a> +<h2 class="section"> <a href="libunistring.html#TOC21">8.1 General category</a> </h2> + +<p>Every Unicode character or code point has a <em>general category</em> assigned +to it. This classification is important for most algorithms that work on +Unicode text. +</p> +<p>The GNU libunistring library provides two kinds of API for working with +general categories. 
The object oriented API uses a variable to denote +every predefined general category value or combinations thereof. The +low-level API uses a bit mask instead. The advantage of the object oriented +API is that if only a few predefined general category values are used, +the data tables are relatively small. When you combine general category +values (using <code>uc_general_category_or</code>, <code>uc_general_category_and</code>, +or <code>uc_general_category_and_not</code>), or when you use the low-level +bit masks, a big table is used that holds the complete general category +information for all Unicode characters. +</p> + +<hr size="6"> +<a name="Object-oriented-API"></a> +<a name="SEC22"></a> +<h3 class="subsection"> <a href="libunistring.html#TOC22">8.1.1 The object oriented API for general category</a> </h3> + +<dl> +<dt><u>Type:</u> <b>uc_general_category_t</b> +<a name="IDX241"></a> +</dt> +<dd><p>This data type denotes a general category value. It is an immediate type that +can be copied by simple assignment, without involving memory allocation. It is +not an array type. +</p></dd></dl> + +<p>The following are the predefined general category values. Additional general +categories may be added in the future. 
+</p> +<dl> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_L</b> +<a name="IDX242"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Lu</b> +<a name="IDX243"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Ll</b> +<a name="IDX244"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Lt</b> +<a name="IDX245"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Lm</b> +<a name="IDX246"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Lo</b> +<a name="IDX247"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_M</b> +<a name="IDX248"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Mn</b> +<a name="IDX249"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Mc</b> +<a name="IDX250"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Me</b> +<a name="IDX251"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_N</b> +<a name="IDX252"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Nd</b> +<a name="IDX253"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Nl</b> +<a name="IDX254"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_No</b> +<a name="IDX255"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_P</b> +<a name="IDX256"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Pc</b> +<a name="IDX257"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Pd</b> +<a name="IDX258"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Ps</b> +<a name="IDX259"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Pe</b> +<a name="IDX260"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Pi</b> +<a name="IDX261"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Pf</b> +<a 
name="IDX262"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Po</b> +<a name="IDX263"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_S</b> +<a name="IDX264"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Sm</b> +<a name="IDX265"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Sc</b> +<a name="IDX266"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Sk</b> +<a name="IDX267"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_So</b> +<a name="IDX268"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Z</b> +<a name="IDX269"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Zs</b> +<a name="IDX270"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Zl</b> +<a name="IDX271"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Zp</b> +<a name="IDX272"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_C</b> +<a name="IDX273"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Cc</b> +<a name="IDX274"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Cf</b> +<a name="IDX275"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Cs</b> +<a name="IDX276"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Co</b> +<a name="IDX277"></a> +</dt> +<dt><u>Constant:</u> uc_general_category_t <b>UC_CATEGORY_Cn</b> +<a name="IDX278"></a> +</dt> +</dl> + +<p>The following are alias names for predefined General category values. +</p> +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_LETTER</b> +<a name="IDX279"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_L</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_UPPERCASE_LETTER</b> +<a name="IDX280"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Lu</code>. 
+</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_LOWERCASE_LETTER</b> +<a name="IDX281"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Ll</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_TITLECASE_LETTER</b> +<a name="IDX282"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Lt</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_MODIFIER_LETTER</b> +<a name="IDX283"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Lm</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OTHER_LETTER</b> +<a name="IDX284"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Lo</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_MARK</b> +<a name="IDX285"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_M</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_NON_SPACING_MARK</b> +<a name="IDX286"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Mn</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_COMBINING_SPACING_MARK</b> +<a name="IDX287"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Mc</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_ENCLOSING_MARK</b> +<a name="IDX288"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Me</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_NUMBER</b> +<a name="IDX289"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_N</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_DECIMAL_DIGIT_NUMBER</b> +<a name="IDX290"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Nd</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_LETTER_NUMBER</b> +<a name="IDX291"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Nl</code>. 
+</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OTHER_NUMBER</b> +<a name="IDX292"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_No</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_PUNCTUATION</b> +<a name="IDX293"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_P</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_CONNECTOR_PUNCTUATION</b> +<a name="IDX294"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Pc</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_DASH_PUNCTUATION</b> +<a name="IDX295"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Pd</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OPEN_PUNCTUATION</b> +<a name="IDX296"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Ps</code> (“start punctuation”). +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_CLOSE_PUNCTUATION</b> +<a name="IDX297"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Pe</code> (“end punctuation”). +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_INITIAL_QUOTE_PUNCTUATION</b> +<a name="IDX298"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Pi</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_FINAL_QUOTE_PUNCTUATION</b> +<a name="IDX299"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Pf</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OTHER_PUNCTUATION</b> +<a name="IDX300"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Po</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_SYMBOL</b> +<a name="IDX301"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_S</code>. 
+</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_MATH_SYMBOL</b> +<a name="IDX302"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Sm</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_CURRENCY_SYMBOL</b> +<a name="IDX303"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Sc</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_MODIFIER_SYMBOL</b> +<a name="IDX304"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Sk</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OTHER_SYMBOL</b> +<a name="IDX305"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_So</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_SEPARATOR</b> +<a name="IDX306"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Z</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_SPACE_SEPARATOR</b> +<a name="IDX307"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Zs</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_LINE_SEPARATOR</b> +<a name="IDX308"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Zl</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_PARAGRAPH_SEPARATOR</b> +<a name="IDX309"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Zp</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_OTHER</b> +<a name="IDX310"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_C</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_CONTROL</b> +<a name="IDX311"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Cc</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_FORMAT</b> +<a name="IDX312"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Cf</code>. 
+</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_SURROGATE</b> +<a name="IDX313"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Cs</code>. All code points in this +category are invalid characters. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_PRIVATE_USE</b> +<a name="IDX314"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Co</code>. +</p></dd></dl> + +<dl> +<dt><u>Macro:</u> uc_general_category_t <b>UC_UNASSIGNED</b> +<a name="IDX315"></a> +</dt> +<dd><p>This is another name for <code>UC_CATEGORY_Cn</code>. Some code points in this +category are invalid characters. +</p></dd></dl> + +<p>The following functions combine general categories, as in a Boolean algebra, +except that there is no ‘<samp>not</samp>’ operation. +</p> +<dl> +<dt><u>Function:</u> uc_general_category_t <b>uc_general_category_or</b><i> (uc_general_category_t <var>category1</var>, uc_general_category_t <var>category2</var>)</i> +<a name="IDX316"></a> +</dt> +<dd><p>Returns the union of two general categories. +This corresponds to the union of the two sets of characters. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> uc_general_category_t <b>uc_general_category_and</b><i> (uc_general_category_t <var>category1</var>, uc_general_category_t <var>category2</var>)</i> +<a name="IDX317"></a> +</dt> +<dd><p>Returns the intersection of two general categories as bit masks. +This <em>does not</em> correspond to the intersection of the two sets of +characters. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> uc_general_category_t <b>uc_general_category_and_not</b><i> (uc_general_category_t <var>category1</var>, uc_general_category_t <var>category2</var>)</i> +<a name="IDX318"></a> +</dt> +<dd><p>Returns the intersection of a general category with the complement of a +second general category, as bit masks. +This <em>does not</em> correspond to the intersection with complement, when +viewing the categories as sets of characters. 
+</p></dd></dl> + +<p>The following functions associate general categories with their names. +</p> +<dl> +<dt><u>Function:</u> const char * <b>uc_general_category_name</b><i> (uc_general_category_t <var>category</var>)</i> +<a name="IDX319"></a> +</dt> +<dd><p>Returns the name of a general category. +Returns NULL if the general category corresponds to a bit mask that does not +have a name. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> uc_general_category_t <b>uc_general_category_byname</b><i> (const char *<var>category_name</var>)</i> +<a name="IDX320"></a> +</dt> +<dd><p>Returns the general category given by name, e.g. <code>"Lu"</code>. +</p></dd></dl> + +<p>The following functions view general categories as sets of Unicode characters. +</p> +<dl> +<dt><u>Function:</u> uc_general_category_t <b>uc_general_category</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX321"></a> +</dt> +<dd><p>Returns the general category of a Unicode character. +</p> +<p>This function uses a big table. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_general_category</b><i> (ucs4_t <var>uc</var>, uc_general_category_t <var>category</var>)</i> +<a name="IDX322"></a> +</dt> +<dd><p>Tests whether a Unicode character belongs to a given category. +The <var>category</var> argument can be a predefined general category or the +combination of several predefined general categories. +</p></dd></dl> + +<hr size="6"> +<a name="Bit-mask-API"></a> +<a name="SEC23"></a> +<h3 class="subsection"> <a href="libunistring.html#TOC23">8.1.2 The bit mask API for general category</a> </h3> + +<p>The following are the predefined general category values as bit masks. +Additional general categories may be added in the future. 
+</p> +<dl> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_L</b> +<a name="IDX323"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Lu</b> +<a name="IDX324"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Ll</b> +<a name="IDX325"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Lt</b> +<a name="IDX326"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Lm</b> +<a name="IDX327"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Lo</b> +<a name="IDX328"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_M</b> +<a name="IDX329"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Mn</b> +<a name="IDX330"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Mc</b> +<a name="IDX331"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Me</b> +<a name="IDX332"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_N</b> +<a name="IDX333"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Nd</b> +<a name="IDX334"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Nl</b> +<a name="IDX335"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_No</b> +<a name="IDX336"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_P</b> +<a name="IDX337"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Pc</b> +<a name="IDX338"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Pd</b> +<a name="IDX339"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Ps</b> +<a name="IDX340"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Pe</b> +<a name="IDX341"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Pi</b> +<a name="IDX342"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Pf</b> +<a name="IDX343"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Po</b> +<a name="IDX344"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_S</b> +<a name="IDX345"></a> +</dt> +<dt><u>Macro:</u> uint32_t 
<b>UC_CATEGORY_MASK_Sm</b> +<a name="IDX346"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Sc</b> +<a name="IDX347"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Sk</b> +<a name="IDX348"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_So</b> +<a name="IDX349"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Z</b> +<a name="IDX350"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Zs</b> +<a name="IDX351"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Zl</b> +<a name="IDX352"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Zp</b> +<a name="IDX353"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_C</b> +<a name="IDX354"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Cc</b> +<a name="IDX355"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Cf</b> +<a name="IDX356"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Cs</b> +<a name="IDX357"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Co</b> +<a name="IDX358"></a> +</dt> +<dt><u>Macro:</u> uint32_t <b>UC_CATEGORY_MASK_Cn</b> +<a name="IDX359"></a> +</dt> +</dl> + +<p>The following function views general categories as sets of Unicode characters. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_general_category_withtable</b><i> (ucs4_t <var>uc</var>, uint32_t <var>bitmask</var>)</i> +<a name="IDX360"></a> +</dt> +<dd><p>Tests whether a Unicode character belongs to a given category. +The <var>bitmask</var> argument can be a predefined general category bitmask or the +combination of several predefined general category bitmasks. +</p> +<p>This function uses a big table comprising all general categories. 
+</p></dd></dl> + +<hr size="6"> +<a name="Canonical-combining-class"></a> +<a name="SEC24"></a> +<h2 class="section"> <a href="libunistring.html#TOC24">8.2 Canonical combining class</a> </h2> + +<p>Every Unicode character or code point has a <em>canonical combining class</em> +assigned to it. +</p> +<p>What is the meaning of the canonical combining class? Essentially, it +indicates the priority with which a combining character is attached to its +base character. The characters for which the canonical combining class is 0 +are the base characters, and the characters for which it is greater than 0 are +the combining characters. Combining characters are rendered +near/attached/around their base character, and combining characters with small +combining classes are attached "first" or "closer" to the base character. +</p> +<p>The canonical combining class of a character is a number in the range +0..255. The possible values are described in the Unicode Character Database +<a href="http://www.unicode.org/Public/UNIDATA/UCD.html">http://www.unicode.org/Public/UNIDATA/UCD.html</a>. The list here is +not definitive; more values can be added in future versions. +</p> +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_NR</b> +<a name="IDX361"></a> +</dt> +<dd><p>The canonical combining class value for “Not Reordered” characters. +The value is 0. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_OV</b> +<a name="IDX362"></a> +</dt> +<dd><p>The canonical combining class value for “Overlay” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_NK</b> +<a name="IDX363"></a> +</dt> +<dd><p>The canonical combining class value for “Nukta” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_KV</b> +<a name="IDX364"></a> +</dt> +<dd><p>The canonical combining class value for “Kana Voicing” characters. 
+</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_VR</b> +<a name="IDX365"></a> +</dt> +<dd><p>The canonical combining class value for “Virama” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_ATBL</b> +<a name="IDX366"></a> +</dt> +<dd><p>The canonical combining class value for “Attached Below Left” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_ATB</b> +<a name="IDX367"></a> +</dt> +<dd><p>The canonical combining class value for “Attached Below” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_ATAR</b> +<a name="IDX368"></a> +</dt> +<dd><p>The canonical combining class value for “Attached Above Right” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_BL</b> +<a name="IDX369"></a> +</dt> +<dd><p>The canonical combining class value for “Below Left” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_B</b> +<a name="IDX370"></a> +</dt> +<dd><p>The canonical combining class value for “Below” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_BR</b> +<a name="IDX371"></a> +</dt> +<dd><p>The canonical combining class value for “Below Right” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_L</b> +<a name="IDX372"></a> +</dt> +<dd><p>The canonical combining class value for “Left” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_R</b> +<a name="IDX373"></a> +</dt> +<dd><p>The canonical combining class value for “Right” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_AL</b> +<a name="IDX374"></a> +</dt> +<dd><p>The canonical combining class value for “Above Left” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_A</b> +<a name="IDX375"></a> +</dt> +<dd><p>The canonical combining class value for “Above” characters. 
+</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_AR</b> +<a name="IDX376"></a> +</dt> +<dd><p>The canonical combining class value for “Above Right” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_DB</b> +<a name="IDX377"></a> +</dt> +<dd><p>The canonical combining class value for “Double Below” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_DA</b> +<a name="IDX378"></a> +</dt> +<dd><p>The canonical combining class value for “Double Above” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_CCC_IS</b> +<a name="IDX379"></a> +</dt> +<dd><p>The canonical combining class value for “Iota Subscript” characters. +</p></dd></dl> + +<p>The following function looks up the canonical combining class of a character. +</p> +<dl> +<dt><u>Function:</u> int <b>uc_combining_class</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX380"></a> +</dt> +<dd><p>Returns the canonical combining class of a Unicode character. +</p></dd></dl> + +<hr size="6"> +<a name="Bidirectional-category"></a> +<a name="SEC25"></a> +<h2 class="section"> <a href="libunistring.html#TOC25">8.3 Bidirectional category</a> </h2> + +<p>Every Unicode character or code point has a <em>bidirectional category</em> +assigned to it. +</p> +<p>The bidirectional category guides the bidirectional algorithm +(<a href="http://www.unicode.org/reports/tr9/">http://www.unicode.org/reports/tr9/</a>). The possible values are +the following. +</p> +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_L</b> +<a name="IDX381"></a> +</dt> +<dd><p>The bidirectional category for “Left-to-Right” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_LRE</b> +<a name="IDX382"></a> +</dt> +<dd><p>The bidirectional category for “Left-to-Right Embedding” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_LRO</b> +<a name="IDX383"></a> +</dt> +<dd><p>The bidirectional category for “Left-to-Right Override” characters. 
+</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_R</b> +<a name="IDX384"></a> +</dt> +<dd><p>The bidirectional category for “Right-to-Left” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_AL</b> +<a name="IDX385"></a> +</dt> +<dd><p>The bidirectional category for “Right-to-Left Arabic” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_RLE</b> +<a name="IDX386"></a> +</dt> +<dd><p>The bidirectional category for “Right-to-Left Embedding” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_RLO</b> +<a name="IDX387"></a> +</dt> +<dd><p>The bidirectional category for “Right-to-Left Override” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_PDF</b> +<a name="IDX388"></a> +</dt> +<dd><p>The bidirectional category for “Pop Directional Format” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_EN</b> +<a name="IDX389"></a> +</dt> +<dd><p>The bidirectional category for “European Number” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_ES</b> +<a name="IDX390"></a> +</dt> +<dd><p>The bidirectional category for “European Number Separator” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_ET</b> +<a name="IDX391"></a> +</dt> +<dd><p>The bidirectional category for “European Number Terminator” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_AN</b> +<a name="IDX392"></a> +</dt> +<dd><p>The bidirectional category for “Arabic Number” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_CS</b> +<a name="IDX393"></a> +</dt> +<dd><p>The bidirectional category for “Common Number Separator” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_NSM</b> +<a name="IDX394"></a> +</dt> +<dd><p>The bidirectional category for “Non-Spacing Mark” characters. 
+</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_BN</b> +<a name="IDX395"></a> +</dt> +<dd><p>The bidirectional category for “Boundary Neutral” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_B</b> +<a name="IDX396"></a> +</dt> +<dd><p>The bidirectional category for “Paragraph Separator” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_S</b> +<a name="IDX397"></a> +</dt> +<dd><p>The bidirectional category for “Segment Separator” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_WS</b> +<a name="IDX398"></a> +</dt> +<dd><p>The bidirectional category for “Whitespace” characters. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_BIDI_ON</b> +<a name="IDX399"></a> +</dt> +<dd><p>The bidirectional category for “Other Neutral” characters. +</p></dd></dl> + +<p>The following functions implement the association between a bidirectional +category and its name. +</p> +<dl> +<dt><u>Function:</u> const char * <b>uc_bidi_category_name</b><i> (int <var>category</var>)</i> +<a name="IDX400"></a> +</dt> +<dd><p>Returns the name of a bidirectional category. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> int <b>uc_bidi_category_byname</b><i> (const char *<var>category_name</var>)</i> +<a name="IDX401"></a> +</dt> +<dd><p>Returns the bidirectional category given by name, e.g. <code>"LRE"</code>. +</p></dd></dl> + +<p>The following functions view bidirectional categories as sets of Unicode +characters. +</p> +<dl> +<dt><u>Function:</u> int <b>uc_bidi_category</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX402"></a> +</dt> +<dd><p>Returns the bidirectional category of a Unicode character. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_bidi_category</b><i> (ucs4_t <var>uc</var>, int <var>category</var>)</i> +<a name="IDX403"></a> +</dt> +<dd><p>Tests whether a Unicode character belongs to a given bidirectional category. 
+</p></dd></dl> + +<hr size="6"> +<a name="Decimal-digit-value"></a> +<a name="SEC26"></a> +<h2 class="section"> <a href="libunistring.html#TOC26">8.4 Decimal digit value</a> </h2> + +<p>Decimal digits (like the digits from ‘<samp>0</samp>’ to ‘<samp>9</samp>’) exist in many +scripts. The following function converts a decimal digit character to its +numerical value. +</p> +<dl> +<dt><u>Function:</u> int <b>uc_decimal_value</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX404"></a> +</dt> +<dd><p>Returns the decimal digit value of a Unicode character. +The return value is an integer in the range 0..9, or -1 for characters that +do not represent a decimal digit. +</p></dd></dl> + +<hr size="6"> +<a name="Digit-value"></a> +<a name="SEC27"></a> +<h2 class="section"> <a href="libunistring.html#TOC27">8.5 Digit value</a> </h2> + +<p>Digit characters are like decimal digit characters, possibly in special forms, +such as superscript, subscript, or circled digits. The following function converts a +digit character to its numerical value. +</p> +<dl> +<dt><u>Function:</u> int <b>uc_digit_value</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX405"></a> +</dt> +<dd><p>Returns the digit value of a Unicode character. +The return value is an integer in the range 0..9, or -1 for characters that +do not represent a digit. +</p></dd></dl> + +<hr size="6"> +<a name="Numeric-value"></a> +<a name="SEC28"></a> +<h2 class="section"> <a href="libunistring.html#TOC28">8.6 Numeric value</a> </h2> + +<p>There are also characters that represent numbers without a digit system, like +the Roman numerals, and fractional numbers, like 1/4 or 3/4. +</p> +<p>The following type represents the numeric value of a Unicode character. 
+</p><dl> +<dt><u>Type:</u> <b>uc_fraction_t</b> +<a name="IDX406"></a> +</dt> +<dd><p>This is a structure type with the following fields: +</p><table><tr><td> </td><td><pre class="smallexample">int numerator; +int denominator; +</pre></td></tr></table> +<p>An integer <var>n</var> is represented by <code>numerator = <var>n</var></code>, +<code>denominator = 1</code>. +</p></dd></dl> + +<p>The following function converts a number character to its numerical value. +</p> +<dl> +<dt><u>Function:</u> uc_fraction_t <b>uc_numeric_value</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX407"></a> +</dt> +<dd><p>Returns the numeric value of a Unicode character. +The return value is a fraction, or the pseudo-fraction <code>{ 0, 0 }</code> for +characters that do not represent a number. +</p></dd></dl> + +<hr size="6"> +<a name="Mirrored-character"></a> +<a name="SEC29"></a> +<h2 class="section"> <a href="libunistring.html#TOC29">8.7 Mirrored character</a> </h2> + +<p>Character mirroring is used to associate the closing parenthesis character +with the opening parenthesis character, the closing brace character with the +opening brace character, and so on. +</p> +<p>The following function looks up the mirrored character of a Unicode character. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_mirror_char</b><i> (ucs4_t <var>uc</var>, ucs4_t *<var>puc</var>)</i> +<a name="IDX408"></a> +</dt> +<dd><p>Stores the mirrored character of a Unicode character <var>uc</var> in +<code>*<var>puc</var></code> and returns <code>true</code>, if it exists. Otherwise it +stores <var>uc</var> unmodified in <code>*<var>puc</var></code> and returns <code>false</code>. +</p></dd></dl> + +<hr size="6"> +<a name="Properties"></a> +<a name="SEC30"></a> +<h2 class="section"> <a href="libunistring.html#TOC30">8.8 Properties</a> </h2> + +<p>This section defines boolean properties of Unicode characters. This +means that a character either has the given property or does not have it. 
+In other words, the property can be viewed as a subset of the set of +Unicode characters. +</p> +<p>The GNU libunistring library provides two kinds of API for working with +properties. The object oriented API uses a type <code>uc_property_t</code> +to designate a property. In the function-based API, which is a bit more +low level, a property is merely a function. +</p> + +<hr size="6"> +<a name="Properties-as-objects"></a> +<a name="SEC31"></a> +<h3 class="subsection"> <a href="libunistring.html#TOC31">8.8.1 Properties as objects – the object oriented API</a> </h3> + +<p>The following type designates a property on Unicode characters. +</p> +<dl> +<dt><u>Type:</u> <b>uc_property_t</b> +<a name="IDX409"></a> +</dt> +<dd><p>This data type denotes a boolean property on Unicode characters. It is an +immediate type that can be copied by simple assignment, without involving +memory allocation. It is not an array type. +</p></dd></dl> + +<p>Many Unicode properties are predefined. +</p> +<p>The following are general properties. 
+</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_WHITE_SPACE</b> +<a name="IDX410"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ALPHABETIC</b> +<a name="IDX411"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_ALPHABETIC</b> +<a name="IDX412"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_NOT_A_CHARACTER</b> +<a name="IDX413"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_DEFAULT_IGNORABLE_CODE_POINT</b> +<a name="IDX414"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_DEFAULT_IGNORABLE_CODE_POINT</b> +<a name="IDX415"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_DEPRECATED</b> +<a name="IDX416"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_LOGICAL_ORDER_EXCEPTION</b> +<a name="IDX417"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_VARIATION_SELECTOR</b> +<a name="IDX418"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PRIVATE_USE</b> +<a name="IDX419"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_UNASSIGNED_CODE_VALUE</b> +<a name="IDX420"></a> +</dt> +</dl> + +<p>The following properties are related to case folding. +</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_UPPERCASE</b> +<a name="IDX421"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_UPPERCASE</b> +<a name="IDX422"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_LOWERCASE</b> +<a name="IDX423"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_LOWERCASE</b> +<a name="IDX424"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_TITLECASE</b> +<a name="IDX425"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_SOFT_DOTTED</b> +<a name="IDX426"></a> +</dt> +</dl> + +<p>The following properties are related to identifiers. 
+</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ID_START</b> +<a name="IDX427"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_ID_START</b> +<a name="IDX428"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ID_CONTINUE</b> +<a name="IDX429"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_ID_CONTINUE</b> +<a name="IDX430"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_XID_START</b> +<a name="IDX431"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_XID_CONTINUE</b> +<a name="IDX432"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PATTERN_WHITE_SPACE</b> +<a name="IDX433"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PATTERN_SYNTAX</b> +<a name="IDX434"></a> +</dt> +</dl> + +<p>The following properties have an influence on shaping and rendering. +</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_JOIN_CONTROL</b> +<a name="IDX435"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_GRAPHEME_BASE</b> +<a name="IDX436"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_GRAPHEME_EXTEND</b> +<a name="IDX437"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_GRAPHEME_EXTEND</b> +<a name="IDX438"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_GRAPHEME_LINK</b> +<a name="IDX439"></a> +</dt> +</dl> + +<p>The following properties relate to bidirectional reordering. 
+</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_CONTROL</b> +<a name="IDX440"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_LEFT_TO_RIGHT</b> +<a name="IDX441"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_HEBREW_RIGHT_TO_LEFT</b> +<a name="IDX442"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_ARABIC_RIGHT_TO_LEFT</b> +<a name="IDX443"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_EUROPEAN_DIGIT</b> +<a name="IDX444"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_EUR_NUM_SEPARATOR</b> +<a name="IDX445"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_EUR_NUM_TERMINATOR</b> +<a name="IDX446"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_ARABIC_DIGIT</b> +<a name="IDX447"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_COMMON_SEPARATOR</b> +<a name="IDX448"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_BLOCK_SEPARATOR</b> +<a name="IDX449"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_SEGMENT_SEPARATOR</b> +<a name="IDX450"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_WHITESPACE</b> +<a name="IDX451"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_NON_SPACING_MARK</b> +<a name="IDX452"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_BOUNDARY_NEUTRAL</b> +<a name="IDX453"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_PDF</b> +<a name="IDX454"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_EMBEDDING_OR_OVERRIDE</b> +<a name="IDX455"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_BIDI_OTHER_NEUTRAL</b> +<a name="IDX456"></a> +</dt> +</dl> + +<p>The following properties deal with number representations. 
+</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_HEX_DIGIT</b> +<a name="IDX457"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ASCII_HEX_DIGIT</b> +<a name="IDX458"></a> +</dt> +</dl> + +<p>The following properties deal with CJK. +</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_IDEOGRAPHIC</b> +<a name="IDX459"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_UNIFIED_IDEOGRAPH</b> +<a name="IDX460"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_RADICAL</b> +<a name="IDX461"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_IDS_BINARY_OPERATOR</b> +<a name="IDX462"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_IDS_TRINARY_OPERATOR</b> +<a name="IDX463"></a> +</dt> +</dl> + +<p>Other miscellaneous properties are: +</p> +<dl> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ZERO_WIDTH</b> +<a name="IDX464"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_SPACE</b> +<a name="IDX465"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_NON_BREAK</b> +<a name="IDX466"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_ISO_CONTROL</b> +<a name="IDX467"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_FORMAT_CONTROL</b> +<a name="IDX468"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_DASH</b> +<a name="IDX469"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_HYPHEN</b> +<a name="IDX470"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PUNCTUATION</b> +<a name="IDX471"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_LINE_SEPARATOR</b> +<a name="IDX472"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PARAGRAPH_SEPARATOR</b> +<a name="IDX473"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_QUOTATION_MARK</b> +<a name="IDX474"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_SENTENCE_TERMINAL</b> +<a name="IDX475"></a> 
+</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_TERMINAL_PUNCTUATION</b> +<a name="IDX476"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_CURRENCY_SYMBOL</b> +<a name="IDX477"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_MATH</b> +<a name="IDX478"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_OTHER_MATH</b> +<a name="IDX479"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_PAIRED_PUNCTUATION</b> +<a name="IDX480"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_LEFT_OF_PAIR</b> +<a name="IDX481"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_COMBINING</b> +<a name="IDX482"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_COMPOSITE</b> +<a name="IDX483"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_DECIMAL_DIGIT</b> +<a name="IDX484"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_NUMERIC</b> +<a name="IDX485"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_DIACRITIC</b> +<a name="IDX486"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_EXTENDER</b> +<a name="IDX487"></a> +</dt> +<dt><u>Constant:</u> uc_property_t <b>UC_PROPERTY_IGNORABLE_CONTROL</b> +<a name="IDX488"></a> +</dt> +</dl> + +<p>The following function looks up a property by its name. +</p> +<dl> +<dt><u>Function:</u> uc_property_t <b>uc_property_byname</b><i> (const char *<var>property_name</var>)</i> +<a name="IDX489"></a> +</dt> +<dd><p>Returns the property given by name, e.g. <code>"White space"</code>. If a property +with the given name exists, the result will satisfy the +<code>uc_property_is_valid</code> predicate. Otherwise the result will not satisfy +this predicate and must not be passed to functions that expect an +<code>uc_property_t</code> argument. +</p> +<p>This function references a big table of all predefined properties. Its use +can significantly increase the size of your application. 
+</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_property_is_valid</b><i> (uc_property_t property)</i> +<a name="IDX490"></a> +</dt> +<dd><p>Returns <code>true</code> when the given property is valid, or <code>false</code> +otherwise. +</p></dd></dl> + +<p>The following function views a property as a set of Unicode characters. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property</b><i> (ucs4_t <var>uc</var>, uc_property_t <var>property</var>)</i> +<a name="IDX491"></a> +</dt> +<dd><p>Tests whether the Unicode character <var>uc</var> has the given property. +</p></dd></dl> + +<hr size="6"> +<a name="Properties-as-functions"></a> +<a name="SEC32"></a> +<h3 class="subsection"> <a href="libunistring.html#TOC32">8.8.2 Properties as functions – the functional API</a> </h3> + +<p>The following are general properties. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_white_space</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX492"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_alphabetic</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX493"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_alphabetic</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX494"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_not_a_character</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX495"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_default_ignorable_code_point</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX496"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_default_ignorable_code_point</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX497"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_deprecated</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX498"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_logical_order_exception</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX499"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_variation_selector</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX500"></a> 
+</dt> +<dt><u>Function:</u> bool <b>uc_is_property_private_use</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX501"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_unassigned_code_value</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX502"></a> +</dt> +</dl> + +<p>The following properties are related to case folding. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_uppercase</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX503"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_uppercase</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX504"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_lowercase</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX505"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_lowercase</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX506"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_titlecase</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX507"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_soft_dotted</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX508"></a> +</dt> +</dl> + +<p>The following properties are related to identifiers. 
+</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_id_start</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX509"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_id_start</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX510"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_id_continue</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX511"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_id_continue</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX512"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_xid_start</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX513"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_xid_continue</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX514"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_pattern_white_space</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX515"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_pattern_syntax</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX516"></a> +</dt> +</dl> + +<p>The following properties have an influence on shaping and rendering. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_join_control</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX517"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_grapheme_base</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX518"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_grapheme_extend</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX519"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_grapheme_extend</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX520"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_grapheme_link</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX521"></a> +</dt> +</dl> + +<p>The following properties relate to bidirectional reordering. 
+</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_control</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX522"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_left_to_right</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX523"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_hebrew_right_to_left</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX524"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_arabic_right_to_left</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX525"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_european_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX526"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_eur_num_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX527"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_eur_num_terminator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX528"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_arabic_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX529"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_common_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX530"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_block_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX531"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_segment_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX532"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_whitespace</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX533"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_non_spacing_mark</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX534"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_boundary_neutral</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX535"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_pdf</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX536"></a> +</dt> +<dt><u>Function:</u> bool 
<b>uc_is_property_bidi_embedding_or_override</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX537"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_bidi_other_neutral</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX538"></a> +</dt> +</dl> + +<p>The following properties deal with number representations. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_hex_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX539"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_ascii_hex_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX540"></a> +</dt> +</dl> + +<p>The following properties deal with CJK. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_ideographic</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX541"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_unified_ideograph</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX542"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_radical</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX543"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_ids_binary_operator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX544"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_ids_trinary_operator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX545"></a> +</dt> +</dl> + +<p>Other miscellaneous properties are: +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_property_zero_width</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX546"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_space</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX547"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_non_break</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX548"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_iso_control</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX549"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_format_control</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX550"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_dash</b><i> (ucs4_t <var>uc</var>)</i> 
+<a name="IDX551"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_hyphen</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX552"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_punctuation</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX553"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_line_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX554"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_paragraph_separator</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX555"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_quotation_mark</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX556"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_sentence_terminal</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX557"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_terminal_punctuation</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX558"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_currency_symbol</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX559"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_math</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX560"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_other_math</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX561"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_paired_punctuation</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX562"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_left_of_pair</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX563"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_combining</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX564"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_composite</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX565"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_decimal_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX566"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_numeric</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX567"></a> +</dt> 
+<dt><u>Function:</u> bool <b>uc_is_property_diacritic</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX568"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_extender</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX569"></a> +</dt> +<dt><u>Function:</u> bool <b>uc_is_property_ignorable_control</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX570"></a> +</dt> +</dl> + +<hr size="6"> +<a name="Scripts"></a> +<a name="SEC33"></a> +<h2 class="section"> <a href="libunistring.html#TOC33">8.9 Scripts</a> </h2> + +<p>The Unicode characters are subdivided into scripts. +</p> +<p>The following type is used to represent a script: +</p> +<dl> +<dt><u>Type:</u> <b>uc_script_t</b> +<a name="IDX571"></a> +</dt> +<dd><p>This data type is a structure type that refers to statically allocated +read-only data. It contains the following fields: +</p><table><tr><td> </td><td><pre class="smallexample">const char *name; +</pre></td></tr></table> + +<p>The <code>name</code> field contains the name of the script. +</p></dd></dl> + +<a name="IDX572"></a> +<p>The following functions look up a script. +</p> +<dl> +<dt><u>Function:</u> const uc_script_t * <b>uc_script</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX573"></a> +</dt> +<dd><p>Returns the script of a Unicode character. Returns NULL if <var>uc</var> does not +belong to any script. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> const uc_script_t * <b>uc_script_byname</b><i> (const char *<var>script_name</var>)</i> +<a name="IDX574"></a> +</dt> +<dd><p>Returns the script given by its name, e.g. <code>"HAN"</code>. Returns NULL if a +script with the given name does not exist. +</p></dd></dl> + +<p>The following function views a script as a set of Unicode characters. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_script</b><i> (ucs4_t <var>uc</var>, const uc_script_t *<var>script</var>)</i> +<a name="IDX575"></a> +</dt> +<dd><p>Tests whether a Unicode character belongs to a given script. 
+</p></dd></dl> + +<p>The following gives a global picture of all scripts. +</p> +<dl> +<dt><u>Function:</u> void <b>uc_all_scripts</b><i> (const uc_script_t **<var>scripts</var>, size_t *<var>count</var>)</i> +<a name="IDX576"></a> +</dt> +<dd><p>Gets the list of all scripts. Stores a pointer to an array of all scripts in +<code>*<var>scripts</var></code> and the length of this array in <code>*<var>count</var></code>. +</p></dd></dl> + +<hr size="6"> +<a name="Blocks"></a> +<a name="SEC34"></a> +<h2 class="section"> <a href="libunistring.html#TOC34">8.10 Blocks</a> </h2> + +<p>The Unicode characters are subdivided into blocks. A block is an interval of +Unicode code points. +</p> +<p>The following type is used to represent a block. +</p> +<dl> +<dt><u>Type:</u> <b>uc_block_t</b> +<a name="IDX577"></a> +</dt> +<dd><p>This data type is a structure type that refers to statically allocated data. +It contains the following fields: +</p><table><tr><td> </td><td><pre class="smallexample">ucs4_t start; +ucs4_t end; +const char *name; +</pre></td></tr></table> + +<p>The <code>start</code> field is the first Unicode code point in the block. +</p> +<p>The <code>end</code> field is the last Unicode code point in the block. +</p> +<p>The <code>name</code> field is the name of the block. +</p></dd></dl> + +<a name="IDX578"></a> +<p>The following function looks up a block. +</p> +<dl> +<dt><u>Function:</u> const uc_block_t * <b>uc_block</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX579"></a> +</dt> +<dd><p>Returns the block a character belongs to. +</p></dd></dl> + +<p>The following function views a block as a set of Unicode characters. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_block</b><i> (ucs4_t <var>uc</var>, const uc_block_t *<var>block</var>)</i> +<a name="IDX580"></a> +</dt> +<dd><p>Tests whether a Unicode character belongs to a given block. +</p></dd></dl> + +<p>The following gives a global picture of all blocks. 
+</p> +<dl> +<dt><u>Function:</u> void <b>uc_all_blocks</b><i> (const uc_block_t **<var>blocks</var>, size_t *<var>count</var>)</i> +<a name="IDX581"></a> +</dt> +<dd><p>Gets the list of all blocks. Stores a pointer to an array of all blocks in +<code>*<var>blocks</var></code> and the length of this array in <code>*<var>count</var></code>. +</p></dd></dl> + +<hr size="6"> +<a name="ISO-C-and-Java-syntax"></a> +<a name="SEC35"></a> +<h2 class="section"> <a href="libunistring.html#TOC35">8.11 ISO C and Java syntax</a> </h2> + +<p>The following properties are taken from language standards. The supported +language standards are ISO C 99 and Java. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_c_whitespace</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX582"></a> +</dt> +<dd><p>Tests whether a Unicode character is considered whitespace in ISO C 99. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_java_whitespace</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX583"></a> +</dt> +<dd><p>Tests whether a Unicode character is considered whitespace in Java. +</p></dd></dl> + +<p>The following enumerated values are the possible return values of the functions +<code>uc_c_ident_category</code> and <code>uc_java_ident_category</code>. +</p> +<dl> +<dt><u>Constant:</u> int <b>UC_IDENTIFIER_START</b> +<a name="IDX584"></a> +</dt> +<dd><p>This return value means that the given character is valid as first or +subsequent character in an identifier. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_IDENTIFIER_VALID</b> +<a name="IDX585"></a> +</dt> +<dd><p>This return value means that the given character is valid as subsequent +character only. +</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_IDENTIFIER_INVALID</b> +<a name="IDX586"></a> +</dt> +<dd><p>This return value means that the given character is not valid in an identifier. 
+</p></dd></dl> + +<dl> +<dt><u>Constant:</u> int <b>UC_IDENTIFIER_IGNORABLE</b> +<a name="IDX587"></a> +</dt> +<dd><p>This return value (only for Java) means that the given character is ignorable. +</p></dd></dl> + +<p>The following functions determine whether a given character can be a constituent +of an identifier in the given programming language. +</p> +<a name="IDX588"></a> +<dl> +<dt><u>Function:</u> int <b>uc_c_ident_category</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX589"></a> +</dt> +<dd><p>Returns the categorization of a Unicode character with respect to the ISO C 99 +identifier syntax. +</p></dd></dl> + +<a name="IDX590"></a> +<dl> +<dt><u>Function:</u> int <b>uc_java_ident_category</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX591"></a> +</dt> +<dd><p>Returns the categorization of a Unicode character with respect to the Java +identifier syntax. +</p></dd></dl> + +<hr size="6"> +<a name="Classifications-like-in-ISO-C"></a> +<a name="SEC36"></a> +<h2 class="section"> <a href="libunistring.html#TOC36">8.12 Classifications like in ISO C</a> </h2> + +<p>The following character classifications mimic those declared in the ISO C +header files <code><ctype.h></code> and <code><wctype.h></code>. These functions are +deprecated, because this set of functions was designed with ASCII in mind and +cannot reflect the more diverse reality of the Unicode character set. But +they can be a quick-and-dirty porting aid when migrating from <code>wchar_t</code> +APIs to Unicode strings. +</p> +<dl> +<dt><u>Function:</u> bool <b>uc_is_alnum</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX592"></a> +</dt> +<dd><p>Tests for any character for which <code>uc_is_alpha</code> or <code>uc_is_digit</code> is +true. 
+</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_alpha</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX593"></a> +</dt> +<dd><p>Tests for any character for which <code>uc_is_upper</code> or <code>uc_is_lower</code> is +true, or any character that is one of a locale-specific set of characters for +which none of <code>uc_is_cntrl</code>, <code>uc_is_digit</code>, <code>uc_is_punct</code>, or +<code>uc_is_space</code> is true. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_cntrl</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX594"></a> +</dt> +<dd><p>Tests for any control character. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_digit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX595"></a> +</dt> +<dd><p>Tests for any character that corresponds to a decimal-digit character. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_graph</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX596"></a> +</dt> +<dd><p>Tests for any character for which <code>uc_is_print</code> is true and +<code>uc_is_space</code> is false. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_lower</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX597"></a> +</dt> +<dd><p>Tests for any character that corresponds to a lowercase letter or is one +of a locale-specific set of characters for which none of <code>uc_is_cntrl</code>, +<code>uc_is_digit</code>, <code>uc_is_punct</code>, or <code>uc_is_space</code> is true. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_print</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX598"></a> +</dt> +<dd><p>Tests for any printing character. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_punct</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX599"></a> +</dt> +<dd><p>Tests for any printing character that is one of a locale-specific set of +characters for which neither <code>uc_is_space</code> nor <code>uc_is_alnum</code> is true. 
+</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_space</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX600"></a> +</dt> +<dd><p>Tests for any character that corresponds to a locale-specific set of characters +for which none of <code>uc_is_alnum</code>, <code>uc_is_graph</code>, or <code>uc_is_punct</code> +is true. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_upper</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX601"></a> +</dt> +<dd><p>Tests for any character that corresponds to an uppercase letter or is one +of a locale-specific set of characters for which none of <code>uc_is_cntrl</code>, +<code>uc_is_digit</code>, <code>uc_is_punct</code>, or <code>uc_is_space</code> is true. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_xdigit</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX602"></a> +</dt> +<dd><p>Tests for any character that corresponds to a hexadecimal-digit character. +</p></dd></dl> + +<dl> +<dt><u>Function:</u> bool <b>uc_is_blank</b><i> (ucs4_t <var>uc</var>)</i> +<a name="IDX603"></a> +</dt> +<dd><p>Tests for any character that corresponds to a standard blank character or +a locale-specific set of characters for which <code>uc_is_alnum</code> is false. 
+</p></dd></dl> +<hr size="6"> +<table cellpadding="1" cellspacing="1" border="0"> +<tr><td valign="middle" align="left">[<a href="#SEC20" title="Beginning of this chapter or previous chapter"> << </a>]</td> +<td valign="middle" align="left">[<a href="libunistring_9.html#SEC37" title="Next chapter"> >> </a>]</td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left"> </td> +<td valign="middle" align="left">[<a href="libunistring.html#SEC_Top" title="Cover (top) of document">Top</a>]</td> +<td valign="middle" align="left">[<a href="libunistring.html#SEC_Contents" title="Table of contents">Contents</a>]</td> +<td valign="middle" align="left">[<a href="libunistring_18.html#SEC71" title="Index">Index</a>]</td> +<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td> +</tr></table> +<p> + <font size="-1"> + This document was generated by <em>Bruno Haible</em> on <em>July, 1 2009</em> using <a href="http://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>. + </font> + <br> + +</p> +</body> +</html> |