author | Manuel A. Fernandez Montecelo <manuel.montezelo@gmail.com> | 2016-05-27 14:28:30 +0100 |
---|---|---|
committer | Manuel A. Fernandez Montecelo <manuel.montezelo@gmail.com> | 2016-05-27 14:28:30 +0100 |
commit | a9a31b1de5776a3b08a82101a4fa711294f0dd1d (patch) | |
tree | 159134a624e51509f40ed8823249f09a70d1dda3 /doc/unictype.texi | |
parent | 5f2b09982312c98863eb9a8dfe2c608b81f58259 (diff) |
Imported Upstream version 0.9.6+really0.9.3
upstream/0.9.6+really0.9.3
Diffstat (limited to 'doc/unictype.texi')
-rw-r--r-- | doc/unictype.texi | 311 |
1 files changed, 36 insertions, 275 deletions
diff --git a/doc/unictype.texi b/doc/unictype.texi
index 5f292cc0..129159c7 100644
--- a/doc/unictype.texi
+++ b/doc/unictype.texi
@@ -14,12 +14,11 @@ in the presence of specific Unicode characters.
 @menu
 * General category::
 * Canonical combining class::
-* Bidi class::
+* Bidirectional category::
 * Decimal digit value::
 * Digit value::
 * Numeric value::
 * Mirrored character::
-* Arabic shaping::
 * Properties::
 * Scripts::
 * Blocks::
@@ -66,7 +65,6 @@ The following are the predefined general category value.
 Additional general categories may be added in the future.
 
 @deftypevr Constant uc_general_category_t UC_CATEGORY_L
-@deftypevrx Constant uc_general_category_t UC_CATEGORY_LC
 @deftypevrx Constant uc_general_category_t UC_CATEGORY_Lu
 @deftypevrx Constant uc_general_category_t UC_CATEGORY_Ll
 @deftypevrx Constant uc_general_category_t UC_CATEGORY_Lt
@@ -111,10 +109,6 @@ The following are alias names for predefined General category values.
 This is another name for @code{UC_CATEGORY_L}.
 @end deftypevr
 
-@deftypevr Macro uc_general_category_t UC_CASED_LETTER
-This is another name for @code{UC_CATEGORY_LC}.
-@end deftypevr
-
 @deftypevr Macro uc_general_category_t UC_UPPERCASE_LETTER
 This is another name for @code{UC_CATEGORY_Lu}.
 @end deftypevr
@@ -287,22 +281,13 @@ viewing the categories as sets of characters.
 The following functions associate general categories with their name.
 
 @deftypefun {const char *} uc_general_category_name (uc_general_category_t @var{category})
-Returns the name of a general category, more precisely, the abbreviated name.
-Returns NULL if the general category corresponds to a bit mask that does not
-have a name.
-@end deftypefun
-
-@deftypefun {const char *} uc_general_category_long_name (uc_general_category_t @var{category})
-Returns the long name of a general category.
+Returns the name of a general category.
 Returns NULL if the general category corresponds to a bit mask that does not
 have a name.
 @end deftypefun
 
 @deftypefun uc_general_category_t uc_general_category_byname (const char *@var{category_name})
-Returns the general category given by name, e.g@. @code{"Lu"}, or by long
-name, e.g@. @code{"Uppercase Letter"}.
-This lookup ignores spaces, underscores, or hyphens as word separators and is
-case-insignificant.
+Returns the general category given by name, e.g@. @code{"Lu"}.
 @end deftypefun
 
 The following functions view general categories as sets of Unicode characters.
@@ -326,7 +311,6 @@ The following are the predefined general category value as bit masks.
 Additional general categories may be added in the future.
 
 @deftypevr Macro uint32_t UC_CATEGORY_MASK_L
-@deftypevrx Macro uint32_t UC_CATEGORY_MASK_LC
 @deftypevrx Macro uint32_t UC_CATEGORY_MASK_Lu
 @deftypevrx Macro uint32_t UC_CATEGORY_MASK_Ll
 @deftypevrx Macro uint32_t UC_CATEGORY_MASK_Lt
@@ -425,10 +409,6 @@ The canonical combining class value for ``Attached Below Left'' characters.
 The canonical combining class value for ``Attached Below'' characters.
 @end deftypevr
 
-@deftypevr Constant int UC_CCC_ATA
-The canonical combining class value for ``Attached Above'' characters.
-@end deftypevr
-
 @deftypevr Constant int UC_CCC_ATAR
 The canonical combining class value for ``Attached Above Right'' characters.
 @end deftypevr
@@ -477,155 +457,120 @@ The canonical combining class value for ``Double Above'' characters.
 The canonical combining class value for ``Iota Subscript'' characters.
 @end deftypevr
 
-The following functions associate canonical combining classes with their name.
-
-@deftypefun {const char *} uc_combining_class_name (int @var{ccc})
-Returns the name of a canonical combining class, more precisely, the
-abbreviated name.
-Returns NULL if the canonical combining class is a numeric value without a
-name.
-@end deftypefun
-
-@deftypefun {const char *} uc_combining_class_long_name (int @var{ccc})
-Returns the long name of a canonical combining class.
-Returns NULL if the canonical combining class is a numeric value without a
-name.
-@end deftypefun
-
-@deftypefun int uc_combining_class_byname (const char *@var{ccc_name})
-Returns the canonical combining class given by name, e.g@. @code{"BL"}, or by
-long name, e.g@. @code{"Below Left"}.
-This lookup ignores spaces, underscores, or hyphens as word separators and is
-case-insignificant.
-@end deftypefun
-
 The following function looks up the canonical combining class of a character.
 
 @deftypefun int uc_combining_class (ucs4_t @var{uc})
 Returns the canonical combining class of a Unicode character.
 @end deftypefun
 
-@node Bidi class
-@section Bidi class
+@node Bidirectional category
+@section Bidirectional category
 
-@cindex bidi class
 @cindex bidirectional category
-@cindex Unicode character, bidi class
 @cindex Unicode character, bidirectional category
-Every Unicode character or code point has a @emph{bidi class} assigned to it.
-Before Unicode 4.0, this concept was known as @emph{bidirectional category}.
+Every Unicode character or code point has a @emph{bidirectional category}
+assigned to it.
 
-The bidi class guides the bidirectional algorithm@texnl{}
+The bidirectional category guides the bidirectional algorithm@texnl{}
 (@url{http://www.unicode.org/reports/tr9/}). The possible values are
 the following.
 
 @deftypevr Constant int UC_BIDI_L
-The bidi class for `Left-to-Right`'' characters.
+The bidirectional category for `Left-to-Right`'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_LRE
-The bidi class for ``Left-to-Right Embedding'' characters.
+The bidirectional category for ``Left-to-Right Embedding'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_LRO
-The bidi class for ``Left-to-Right Override'' characters.
+The bidirectional category for ``Left-to-Right Override'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_R
-The bidi class for ``Right-to-Left'' characters.
+The bidirectional category for ``Right-to-Left'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_AL
-The bidi class for ``Right-to-Left Arabic'' characters.
+The bidirectional category for ``Right-to-Left Arabic'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_RLE
-The bidi class for ``Right-to-Left Embedding'' characters.
+The bidirectional category for ``Right-to-Left Embedding'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_RLO
-The bidi class for ``Right-to-Left Override'' characters.
+The bidirectional category for ``Right-to-Left Override'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_PDF
-The bidi class for ``Pop Directional Format'' characters.
+The bidirectional category for ``Pop Directional Format'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_EN
-The bidi class for ``European Number'' characters.
+The bidirectional category for ``European Number'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_ES
-The bidi class for ``European Number Separator'' characters.
+The bidirectional category for ``European Number Separator'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_ET
-The bidi class for ``European Number Terminator'' characters.
+The bidirectional category for ``European Number Terminator'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_AN
-The bidi class for ``Arabic Number'' characters.
+The bidirectional category for ``Arabic Number'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_CS
-The bidi class for ``Common Number Separator'' characters.
+The bidirectional category for ``Common Number Separator'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_NSM
-The bidi class for ``Non-Spacing Mark'' characters.
+The bidirectional category for ``Non-Spacing Mark'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_BN
-The bidi class for ``Boundary Neutral'' characters.
+The bidirectional category for ``Boundary Neutral'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_B
-The bidi class for ``Paragraph Separator'' characters.
+The bidirectional category for ``Paragraph Separator'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_S
-The bidi class for ``Segment Separator'' characters.
+The bidirectional category for ``Segment Separator'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_WS
-The bidi class for ``Whitespace'' characters.
+The bidirectional category for ``Whitespace'' characters.
 @end deftypevr
 
 @deftypevr Constant int UC_BIDI_ON
-The bidi class for ``Other Neutral'' characters.
+The bidirectional category for ``Other Neutral'' characters.
 @end deftypevr
 
 The following functions implement the association between a bidirectional
 category and its name.
 
-@deftypefun {const char *} uc_bidi_class_name (int @var{bidi_class})
-@deftypefunx {const char *} uc_bidi_category_name (int @var{category})
-Returns the name of a bidi class, more precisely, the abbreviated name.
-@end deftypefun
-
-@deftypefun {const char *} uc_bidi_class_long_name (int @var{bidi_class})
-Returns the long name of a bidi class.
+@deftypefun {const char *} uc_bidi_category_name (int @var{category})
+Returns the name of a bidirectional category.
 @end deftypefun
 
-@deftypefun int uc_bidi_class_byname (const char *@var{bidi_class_name})
-@deftypefunx int uc_bidi_category_byname (const char *@var{category_name})
-Returns the bidi class given by name, e.g@. @code{"LRE"}, or by long name,
-e.g@. @code{"Left-to-Right Embedding"}.
-This lookup ignores spaces, underscores, or hyphens as word separators and is
-case-insignificant.
+@deftypefun int uc_bidi_category_byname (const char *@var{category_name})
+Returns the bidirectional category given by name, e.g@. @code{"LRE"}.
 @end deftypefun
 
 The following functions view bidirectional categories as sets of Unicode
 characters.
 
-@deftypefun int uc_bidi_class (ucs4_t @var{uc})
-@deftypefunx int uc_bidi_category (ucs4_t @var{uc})
-Returns the bidi class of a Unicode character.
+@deftypefun int uc_bidi_category (ucs4_t @var{uc})
+Returns the bidirectional category of a Unicode character.
 @end deftypefun
 
-@deftypefun bool uc_is_bidi_class (ucs4_t @var{uc}, int @var{bidi_class})
-@deftypefunx bool uc_is_bidi_category (ucs4_t @var{uc}, int @var{category})
-Tests whether a Unicode character belongs to a given bidi class.
+@deftypefun bool uc_is_bidi_category (ucs4_t @var{uc}, int @var{category})
+Tests whether a Unicode character belongs to a given bidirectional category.
 @end deftypefun
 
 @node Decimal digit value
@@ -702,172 +647,6 @@ Stores the mirrored character of a Unicode character @var{uc} in
 stores @var{uc} unmodified in @code{*@var{puc}} and returns @code{false}.
 @end deftypefun
 
-@node Arabic shaping
-@section Arabic shaping
-
-@cindex Arabic shaping
-@cindex joining of Arabic characters
-When Arabic characters are rendered, after bidi reordering has taken
-place, the shape of the glyphs are modified so that many adjacent glyphs
-are joined. Two character properties describe how this ``Arabic shaping''
-takes place: the joining type and the joining group.
-
-@menu
-* Joining type::
-* Joining group::
-@end menu
-
-@node Joining type
-@subsection Joining type of Arabic characters
-
-@cindex joining type
-The joining type of a character describes on which of the left and right
-neighbour characters the character's shape depends, and which of the two
-neighbour characters are rendered depending on this character.
-
-The joining type has the following possible values:
-
-@deftypevr Constant int UC_JOINING_TYPE_U
-``Non joining'': Characters of this joining type prohibit joining.
-@end deftypevr
-
-@deftypevr Constant int UC_JOINING_TYPE_T
-``Transparent'': Characters of this joining type are skipped when
-considering joining.
-@end deftypevr
-
-@deftypevr Constant int UC_JOINING_TYPE_C
-``Join causing'': Characters of this joining type cause their neighbour
-characters to change their shapes but don't change their own shape.
-@end deftypevr
-
-@deftypevr Constant int UC_JOINING_TYPE_L
-``Left joining'': Characters of this joining type have two shapes,
-isolated and initial. Such characters currently don't exist.
-@end deftypevr
-
-@deftypevr Constant int UC_JOINING_TYPE_R
-``Right joining'': Characters of this joining type have two shapes,
-isolated and final.
-@end deftypevr
-
-@deftypevr Constant int UC_JOINING_TYPE_D
-``Dual joining'': Characters of this joining type have four shapes,
-initial, medial, final, and isolated.
-@end deftypevr
-
-The following functions implement the association between a joining type
-and its name.
-
-@deftypefun {const char *} uc_joining_type_name (int @var{joining_type})
-Returns the name of a joining type.
-@end deftypefun
-
-@deftypefun {const char *} uc_joining_type_long_name (int @var{joining_type})
-Returns the long name of a joining type.
-@end deftypefun
-
-@deftypefun int uc_joining_type_byname (const char *@var{joining_type_name})
-Returns the joining type given by name, e.g@. @code{"D"}, or by long name,
-e.g@. @code{"Dual Joining}.
-This lookup ignores spaces, underscores, or hyphens as word separators and is
-case-insignificant.
-@end deftypefun
-
-The following function gives the joining type of every Unicode character.
-
-@deftypefun int uc_joining_type (ucs4_t @var{uc})
-Returns the joining type of a Unicode character.
-@end deftypefun
-
-@node Joining group
-@subsection Joining group of Arabic characters
-
-@cindex joining group
-The joining group of a character describes how the character's shape
-is modified in the four contexts of dual-joining characters or in the
-two contexts of right-joining characters.
-
-The joining group has the following possible values:
-
-@deftypevr Constant int UC_JOINING_GROUP_NONE
-@deftypevrx Constant int UC_JOINING_GROUP_AIN
-@deftypevrx Constant int UC_JOINING_GROUP_ALAPH
-@deftypevrx Constant int UC_JOINING_GROUP_ALEF
-@deftypevrx Constant int UC_JOINING_GROUP_BEH
-@deftypevrx Constant int UC_JOINING_GROUP_BETH
-@deftypevrx Constant int UC_JOINING_GROUP_BURUSHASKI_YEH_BARREE
-@deftypevrx Constant int UC_JOINING_GROUP_DAL
-@deftypevrx Constant int UC_JOINING_GROUP_DALATH_RISH
-@deftypevrx Constant int UC_JOINING_GROUP_E
-@deftypevrx Constant int UC_JOINING_GROUP_FARSI_YEH
-@deftypevrx Constant int UC_JOINING_GROUP_FE
-@deftypevrx Constant int UC_JOINING_GROUP_FEH
-@deftypevrx Constant int UC_JOINING_GROUP_FINAL_SEMKATH
-@deftypevrx Constant int UC_JOINING_GROUP_GAF
-@deftypevrx Constant int UC_JOINING_GROUP_GAMAL
-@deftypevrx Constant int UC_JOINING_GROUP_HAH
-@deftypevrx Constant int UC_JOINING_GROUP_HE
-@deftypevrx Constant int UC_JOINING_GROUP_HEH
-@deftypevrx Constant int UC_JOINING_GROUP_HEH_GOAL
-@deftypevrx Constant int UC_JOINING_GROUP_HETH
-@deftypevrx Constant int UC_JOINING_GROUP_KAF
-@deftypevrx Constant int UC_JOINING_GROUP_KAPH
-@deftypevrx Constant int UC_JOINING_GROUP_KHAPH
-@deftypevrx Constant int UC_JOINING_GROUP_KNOTTED_HEH
-@deftypevrx Constant int UC_JOINING_GROUP_LAM
-@deftypevrx Constant int UC_JOINING_GROUP_LAMADH
-@deftypevrx Constant int UC_JOINING_GROUP_MEEM
-@deftypevrx Constant int UC_JOINING_GROUP_MIM
-@deftypevrx Constant int UC_JOINING_GROUP_NOON
-@deftypevrx Constant int UC_JOINING_GROUP_NUN
-@deftypevrx Constant int UC_JOINING_GROUP_NYA
-@deftypevrx Constant int UC_JOINING_GROUP_PE
-@deftypevrx Constant int UC_JOINING_GROUP_QAF
-@deftypevrx Constant int UC_JOINING_GROUP_QAPH
-@deftypevrx Constant int UC_JOINING_GROUP_REH
-@deftypevrx Constant int UC_JOINING_GROUP_REVERSED_PE
-@deftypevrx Constant int UC_JOINING_GROUP_SAD
-@deftypevrx Constant int UC_JOINING_GROUP_SADHE
-@deftypevrx Constant int UC_JOINING_GROUP_SEEN
-@deftypevrx Constant int UC_JOINING_GROUP_SEMKATH
-@deftypevrx Constant int UC_JOINING_GROUP_SHIN
-@deftypevrx Constant int UC_JOINING_GROUP_SWASH_KAF
-@deftypevrx Constant int UC_JOINING_GROUP_SYRIAC_WAW
-@deftypevrx Constant int UC_JOINING_GROUP_TAH
-@deftypevrx Constant int UC_JOINING_GROUP_TAW
-@deftypevrx Constant int UC_JOINING_GROUP_TEH_MARBUTA
-@deftypevrx Constant int UC_JOINING_GROUP_TEH_MARBUTA_GOAL
-@deftypevrx Constant int UC_JOINING_GROUP_TETH
-@deftypevrx Constant int UC_JOINING_GROUP_WAW
-@deftypevrx Constant int UC_JOINING_GROUP_YEH
-@deftypevrx Constant int UC_JOINING_GROUP_YEH_BARREE
-@deftypevrx Constant int UC_JOINING_GROUP_YEH_WITH_TAIL
-@deftypevrx Constant int UC_JOINING_GROUP_YUDH
-@deftypevrx Constant int UC_JOINING_GROUP_YUDH_HE
-@deftypevrx Constant int UC_JOINING_GROUP_ZAIN
-@deftypevrx Constant int UC_JOINING_GROUP_ZHAIN
-@end deftypevr
-
-The following functions implement the association between a joining group
-and its name.
-
-@deftypefun {const char *} uc_joining_group_name (int @var{joining_group})
-Returns the name of a joining group.
-@end deftypefun
-
-@deftypefun int uc_joining_group_byname (const char *@var{joining_group_name})
-Returns the joining group given by name, e.g@. @code{"Teh_Marbuta"}.
-This lookup ignores spaces, underscores, or hyphens as word separators and is
-case-insignificant.
-@end deftypefun
-
-The following function gives the joining group of every Unicode character.
-
-@deftypefun int uc_joining_group (ucs4_t @var{uc})
-Returns the joining group of a Unicode character.
-@end deftypefun
-
 @node Properties
 @section Properties
 
@@ -923,13 +702,6 @@ The following properties are related to case folding.
 @deftypevrx Constant uc_property_t UC_PROPERTY_LOWERCASE
 @deftypevrx Constant uc_property_t UC_PROPERTY_OTHER_LOWERCASE
 @deftypevrx Constant uc_property_t UC_PROPERTY_TITLECASE
-@deftypevrx Constant uc_property_t UC_PROPERTY_CASED
-@deftypevrx Constant uc_property_t UC_PROPERTY_CASE_IGNORABLE
-@deftypevrx Constant uc_property_t UC_PROPERTY_CHANGES_WHEN_LOWERCASED
-@deftypevrx Constant uc_property_t UC_PROPERTY_CHANGES_WHEN_UPPERCASED
-@deftypevrx Constant uc_property_t UC_PROPERTY_CHANGES_WHEN_TITLECASED
-@deftypevrx Constant uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEFOLDED
-@deftypevrx Constant uc_property_t UC_PROPERTY_CHANGES_WHEN_CASEMAPPED
 @deftypevrx Constant uc_property_t UC_PROPERTY_SOFT_DOTTED
 @end deftypevr
 
@@ -1022,16 +794,12 @@ Other miscellaneous properties are:
 The following function looks up a property by its name.
 
 @deftypefun uc_property_t uc_property_byname (const char *@var{property_name})
-Returns the property given by name, e.g@. @code{"White space"}. If a property
+Returns the property given by name, e.g. @code{"White space"}. If a property
 with the given name exists, the result will satisfy the
 @code{uc_property_is_valid} predicate. Otherwise the result will not satisfy
 this predicate and must not be passed to functions that expect an
 @code{uc_property_t} argument.
 
-This lookup ignores spaces, underscores, or hyphens as word separators, is
-case-insignificant, and supports the aliases listed in Unicode's
-@file{PropertyAliases.txt} file.
-
 This function references a big table of all predefined properties. Its use
 can significantly increase the size of your application.
 @end deftypefun
@@ -1072,13 +840,6 @@ The following properties are related to case folding.
 @deftypefunx bool uc_is_property_lowercase (ucs4_t @var{uc})
 @deftypefunx bool uc_is_property_other_lowercase (ucs4_t @var{uc})
 @deftypefunx bool uc_is_property_titlecase (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_cased (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_case_ignorable (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_changes_when_lowercased (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_changes_when_uppercased (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_changes_when_titlecased (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_changes_when_casefolded (ucs4_t @var{uc})
-@deftypefunx bool uc_is_property_changes_when_casemapped (ucs4_t @var{uc})
 @deftypefunx bool uc_is_property_soft_dotted (ucs4_t @var{uc})
 @end deftypefun
 