1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
<html>
<!-- Created on February, 24 2024 by texi2html 1.78a -->
<!--
Written by: Lionel Cons <Lionel.Cons@cern.ch> (original author)
Karl Berry <karl@freefriends.org>
Olaf Bachmann <obachman@mathematik.uni-kl.de>
and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>GNU libunistring: B. The char32_t problem</title>
<meta name="description" content="GNU libunistring: B. The char32_t problem">
<meta name="keywords" content="GNU libunistring: B. The char32_t problem">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.78a">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_18.html#SEC83" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_20.html#SEC85" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_21.html#SEC94" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<hr size="2">
<a name="The-char32_005ft-problem"></a>
<a name="SEC84"></a>
<h1 class="appendix"> <a href="libunistring_toc.html#TOC84">B. The <code>char32_t</code> problem</a> </h1>
<p>In response to the <code>wchar_t</code> mess described in the previous section,
ISO C 11 introduces two new types: <code>char32_t</code> and <code>char16_t</code>.
</p>
<p><code>char32_t</code> is a type like <code>wchar_t</code>, with the added guarantee that it
is 32 bits wide. So, it is a type that is appropriate for encoding a Unicode
character. It is meant to resolve the problems of the 16-bit wide
<code>wchar_t</code> on AIX and Windows platforms, and allow a saner programming model
for wide character strings across all platforms.
</p>
<p><code>char16_t</code> is a type like <code>wchar_t</code>, with the added guarantee that it
is 16 bits wide. It is meant to allow porting programs that use the broken wide
character strings programming model from Windows to all platforms. Of course,
no one needs this.
</p>
<p>These types are accompanied with a syntax for defining wide string literals with
these element types: <code>u"..."</code> and <code>U"..."</code>.
</p>
<p>So far, so good. What the ISO C designers forgot, is to provide standardized C
library functions that operate on these wide character strings. They
standardized only the most basic functions, <code>mbrtoc32</code> and <code>c32rtomb</code>,
which are analogous to <code>mbrtowc</code> and <code>wcrtomb</code>, respectively. For the
rest, GNU gnulib <a href="https://www.gnu.org/software/gnulib/">https://www.gnu.org/software/gnulib/</a> provides the
functions:
</p><ul>
<li>
Functions for converting an entire string: <code>mbstoc32s</code> – like
<code>mbstowcs</code>, <code>c32stombs</code> – like <code>wcstombs</code>.
</li><li>
Functions for testing the properties of a 32-bit wide character:
<code>c32isalnum</code>, <code>c32isalpha</code>, etc. – like <code>iswalnum</code>,
<code>iswalpha</code>, etc.
</li></ul>
<p>Still, this API has two problems:
</p><ul>
<li>
The <code>char32_t</code> encoding is locale dependent and undocumented. This means,
if you want to know any property of a <code>char32_t</code> character, other than the
properties defined by <code><wctype.h></code> – such as whether it's a dash, currency
symbol, paragraph separator, or similar –, you have to convert it to
<code>char *</code> encoding first, by use of the function <code>c32tomb</code>.
</li><li>
Even on platforms where <code>wchar_t</code> is 32 bits wide, the <code>char32_t</code>
encoding may be different from the <code>wchar_t</code> encoding.
</li></ul>
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libunistring_18.html#SEC83" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="libunistring_20.html#SEC85" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_21.html#SEC94" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libunistring_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
<font size="-1">
This document was generated by <em>Bruno Haible</em> on <em>February, 24 2024</em> using <a href="https://www.nongnu.org/texi2html/"><em>texi2html 1.78a</em></a>.
</font>
<br>
</p>
</body>
</html>
|