This is libunistring.info, produced by makeinfo version 4.13 from
libunistring.texi.
INFO-DIR-SECTION Software development
START-INFO-DIR-ENTRY
* GNU libunistring: (libunistring). Unicode string library.
END-INFO-DIR-ENTRY
This manual is for GNU libunistring.
File: libunistring.info, Node: Top, Next: Introduction, Up: (dir)
GNU libunistring
****************
* Menu:
* Introduction:: Who may need Unicode strings?
* Conventions:: Conventions used in this manual
* unitypes.h:: Elementary types
* unistr.h:: Elementary Unicode string functions
* uniconv.h:: Conversions between Unicode and encodings
* unistdio.h:: Output with Unicode strings
* uniname.h:: Names of Unicode characters
* unictype.h:: Unicode character classification and properties
* uniwidth.h:: Display width
* uniwbrk.h:: Word breaks in strings
* unilbrk.h:: Line breaking
* uninorm.h:: Normalization forms
* unicase.h:: Case mappings
* uniregex.h:: Regular expressions
* Using the library:: How to link with the library and use it?
* More functionality:: More advanced functionality
* Licenses:: Licenses
* Index:: General Index
--- The Detailed Node Listing ---
Introduction
* Unicode:: What is Unicode?
* Unicode and i18n:: Unicode and internationalization
* Locale encodings:: What is a locale encoding?
* In-memory representation:: How to represent strings in memory?
* char * strings:: What to keep in mind with `char *' strings
* The wchar_t mess:: Why `wchar_t *' strings are useless
* Unicode strings:: How are Unicode strings represented?
unistr.h
* Elementary string checks::
* Elementary string conversions::
* Elementary string functions::
* Elementary string functions with memory allocation::
* Elementary string functions on NUL terminated strings::
unictype.h
* General category::
* Canonical combining class::
* Bidirectional category::
* Decimal digit value::
* Digit value::
* Numeric value::
* Mirrored character::
* Properties::
* Scripts::
* Blocks::
* ISO C and Java syntax::
* Classifications like in ISO C::
General category
* Object oriented API::
* Bit mask API::
Properties
* Properties as objects::
* Properties as functions::
uniwbrk.h
* Word breaks in a string::
* Word break property::
uninorm.h
* Decomposition of characters::
* Composition of characters::
* Normalization of strings::
* Normalizing comparisons::
* Normalization of streams::
unicase.h
* Case mappings of characters::
* Case mappings of strings::
* Case mappings of substrings::
* Case insensitive comparison::
* Case detection::
Using the library
* Installation::
* Compiler options::
* Include files::
* Autoconf macro::
* Reporting problems::
Licenses
* GNU GPL:: GNU General Public License
* GNU LGPL:: GNU Lesser General Public License
* GNU FDL:: GNU Free Documentation License
File: libunistring.info, Node: Introduction, Next: Conventions, Prev: Top, Up: Top
1 Introduction
**************
This library provides functions for manipulating Unicode strings and
for manipulating C strings according to the Unicode standard.
It consists of the following parts:
`<unistr.h>'
elementary string functions
`<uniconv.h>'
conversion from/to legacy encodings
`<unistdio.h>'
formatted output to strings
`<uniname.h>'
character names
`<unictype.h>'
character classification and properties
`<uniwidth.h>'
string width when using nonproportional fonts
`<uniwbrk.h>'
word breaks
`<unilbrk.h>'
line breaking algorithm
`<uninorm.h>'
normalization (composition and decomposition)
`<unicase.h>'
case folding
`<uniregex.h>'
regular expressions (not yet implemented)
libunistring is for you if your application involves non-trivial text
processing, such as upper/lower case conversions, line breaking,
operations on words, or more advanced analysis of text. Text provided
by the user can, in general, contain characters of all kinds of
scripts. The text processing functions provided by this library handle
all scripts and all languages.
libunistring is for you if your application already uses the ISO C /
POSIX `<ctype.h>', `<wctype.h>' functions and the text it operates on is
provided by the user and can be in any language.
libunistring is also for you if your application uses Unicode
strings as internal in-memory representation.
* Menu:
* Unicode:: What is Unicode?
* Unicode and i18n:: Unicode and internationalization
* Locale encodings:: What is a locale encoding?
* In-memory representation:: How to represent strings in memory?
* char * strings:: What to keep in mind with `char *' strings
* The wchar_t mess:: Why `wchar_t *' strings are useless
* Unicode strings:: How are Unicode strings represented?
File: libunistring.info, Node: Unicode, Next: Unicode and i18n, Up: Introduction
1.1 Unicode
===========
Unicode is a standardized repertoire of characters that contains
characters from all scripts of the world, from Latin letters to Chinese
ideographs and Babylonian cuneiform glyphs. It also specifies how
these characters are to be rendered on a screen or on paper, and how
common text processing (word selection, line breaking, uppercasing of
page titles etc.) is supposed to behave on Unicode text.
Unicode also specifies three ways of storing sequences of Unicode
characters in a computer whose basic unit of data is an 8-bit byte:
UTF-8
Every character is represented as 1 to 4 bytes.
UTF-16
Every character is represented as 1 to 2 units of 16 bits.
UTF-32, a.k.a. UCS-4
Every character is represented as 1 unit of 32 bits.
For encoding Unicode text in a file, UTF-8 is usually used. For
encoding Unicode strings in memory for a program, any of the three
encoding forms can reasonably be used.
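As a rough illustration of the three encoding forms, the following
sketch computes how many storage units one character occupies in each
form. The function names are made up for this example and are not
part of libunistring; the argument is assumed to be a valid Unicode
code point.

```c
#include <stdint.h>

/* Number of 8-bit units needed for code point UC in UTF-8 (1 to 4).  */
int
sketch_utf8_units (uint32_t uc)
{
  if (uc < 0x80)
    return 1;
  if (uc < 0x800)
    return 2;
  if (uc < 0x10000)
    return 3;
  return 4;
}

/* Number of 16-bit units needed in UTF-16: 1, or 2 for a surrogate
   pair.  */
int
sketch_utf16_units (uint32_t uc)
{
  return uc < 0x10000 ? 1 : 2;
}

/* UTF-32 always uses exactly one 32-bit unit.  */
int
sketch_utf32_units (uint32_t uc)
{
  (void) uc;
  return 1;
}
```

For example, U+20AC EURO SIGN takes 3 units in UTF-8 and 1 unit in
UTF-16, while a character outside the Basic Multilingual Plane takes 4
units in UTF-8 and 2 units in UTF-16.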
Unicode is widely used on the web. Prior to the use of Unicode, web
pages were in many different encodings (ISO-8859-1 for English, French,
Spanish, ISO-8859-2 for Polish, ISO-8859-7 for Greek, KOI8-R for
Russian, GB2312 or BIG5 for Chinese, ISO-2022-JP-2 or EUC-JP or
Shift_JIS for Japanese, and many others). It was next to
impossible to create a single document that contained both Chinese and
Polish text. Due to the many encodings for Japanese, even the
processing of pure Japanese text was error prone.
References:
* The Unicode standard: `http://www.unicode.org/'
* Definition of UTF-8: `http://www.rfc-editor.org/rfc/rfc3629.txt'
* Definition of UTF-16: `http://www.rfc-editor.org/rfc/rfc2781.txt'
* Markus Kuhn's UTF-8 and Unicode FAQ:
`http://www.cl.cam.ac.uk/~mgk25/unicode.html'
File: libunistring.info, Node: Unicode and i18n, Next: Locale encodings, Prev: Unicode, Up: Introduction
1.2 Unicode and Internationalization
====================================
Internationalization is the process of changing the source code of a
program so that it can meet the expectations of users in any culture,
if culture specific data (translations, images etc.) are provided.
Use of Unicode is not strictly required for internationalization,
but it makes internationalization much easier, because operations that
need to look at specific characters (like hyphenation, spell checking,
or the automatic conversion of double-quotes to opening and closing
double-quote characters) don't need to consider multiple possible
encodings of the text.
Use of Unicode also enables multilingualization: the ability to
have text in multiple languages present in the same document or even
in the same line of text.
But use of Unicode is not everything. Internationalization usually
consists of three features:
* Use of Unicode where needed for text processing. This is what
this library is for.
* Use of message catalogs for messages shown to the user. This is
what GNU gettext is about.
* Use of locale specific conventions for date and time formats, for
numeric formatting, or for sorting of text. This can be done
adequately with the POSIX APIs and the implementation of locales
in the GNU C library.
File: libunistring.info, Node: Locale encodings, Next: In-memory representation, Prev: Unicode and i18n, Up: Introduction
1.3 Locale encodings
====================
A locale is a set of cultural conventions. According to POSIX, a
program has, at any moment, one locale designated as the
"current locale". (Actually, POSIX also supports one locale per
thread, but this feature is not yet universally implemented and not
widely used.) The locale is partitioned into several aspects, called
the "categories" of the locale. The main aspects are:
* The character encoding and the character properties. This is the
`LC_CTYPE' category.
* The sorting rules for text. This is the `LC_COLLATE' category.
* The language specific translations of messages. This is the
`LC_MESSAGES' category.
* The formatting rules for numbers, such as the decimal separator.
This is the `LC_NUMERIC' category.
* The formatting rules for amounts of money. This is the
`LC_MONETARY' category.
* The formatting of date and time. This is the `LC_TIME' category.
In particular, the `LC_CTYPE' category of the current locale
determines the character encoding. This is the encoding of `char *'
strings. We also call it the "locale encoding". GNU libunistring has
a function, `locale_charset', that returns a standardized (platform
independent) name for this encoding.
All locale encodings used on glibc systems are essentially ASCII
compatible: Most graphic ASCII characters have the same representation,
as a single byte, in that encoding as in ASCII.
Among the possible locale encodings are UTF-8 and GB18030. Both
can represent any Unicode character as a sequence of bytes. UTF-8
is used in most of the world, whereas GB18030 is used in the People's
Republic of China, because it is backward compatible with the GB2312
encoding that was used in this country earlier.
The legacy locale encodings, ISO-8859-15 (which supplanted
ISO-8859-1 in most of Europe), ISO-8859-2, KOI8-R, EUC-JP, etc., are
still in use in many places, though.
UTF-16 and UTF-32 are not used as locale encodings, because they are
not ASCII compatible.
File: libunistring.info, Node: In-memory representation, Next: char * strings, Prev: Locale encodings, Up: Introduction
1.4 Choice of in-memory representation of strings
=================================================
There are three ways of representing strings in the memory of a running
program.
* As `char *' strings. Such strings are represented in locale
encoding. This approach is employed when not much text processing
is done by the program. When some Unicode aware processing is to
be done, a string is converted to Unicode on the fly and back to
locale encoding afterwards.
* As UTF-8 or UTF-16 or UTF-32 strings. This implies that
conversion from locale encoding to Unicode is performed on input,
and in the opposite direction on output. This approach is
employed when the program does a significant amount of text
processing, or when the program has multiple threads operating on
the same data but in different locales.
* As `wchar_t *', a.k.a. "wide strings". This approach is misguided,
see *note The wchar_t mess::.
File: libunistring.info, Node: char * strings, Next: The wchar_t mess, Prev: In-memory representation, Up: Introduction
1.5 `char *' strings
====================
The classical C strings, with their C library support standardized by
ISO C and POSIX, can be used in internationalized programs with some
precautions. The problem with this API is that many of the C library
functions for strings don't work correctly on strings in locale
encodings, leading to bugs that only people in some cultures of the
world will experience.
The first problem with the C library API is the support of multibyte
locales. According to the locale encoding, in general, every character
is represented by one or more bytes (up to 4 bytes in practice -- but
use `MB_LEN_MAX' instead of the number 4 in the code). When every
character is represented by only 1 byte, we speak of an "unibyte
locale", otherwise of a "multibyte locale". It is important to realize
that the majority of Unix installations nowadays use UTF-8 or GB18030
as locale encoding; therefore, the majority of users are using
multibyte locales.
The important fact to remember is: _A `char' is a byte, not a
character._
As a consequence:
* The `<ctype.h>' API is useless in this context; it does not work in
multibyte locales.
* The `strlen' function does not return the number of characters in
a string. Nor does it return the number of screen columns occupied
by a string after it is output. It merely returns the number of
_bytes_ occupied by a string.
* Truncating a string, for example, with `strncpy', can have the
effect of truncating it in the middle of a multibyte character.
Such a string will, when output, have a garbled character at its
end, often represented by a hollow box.
* `strchr' and `strrchr' do not work with multibyte strings if the
locale encoding is GB18030 and the character to be searched is a
digit.
* `strstr' does not work with multibyte strings if the locale
encoding is different from UTF-8.
* `strcspn', `strpbrk', `strspn' cannot work correctly in multibyte
locales: they assume the second argument is a list of single-byte
characters. Even in this simple case, they do not work with
multibyte strings if the locale encoding is GB18030 and one of the
characters to be searched is a digit.
* `strsep' and `strtok_r' do not work with multibyte strings unless
all of the delimiter characters are ASCII characters < 0x30.
* The `strcasecmp', `strncasecmp', and `strcasestr' functions do not
work with multibyte strings.
The workarounds can be found in GNU gnulib
`http://www.gnu.org/software/gnulib/'.
* gnulib has modules `mbchar', `mbiter', `mbuiter' that represent
multibyte characters and make it possible to iterate across a
multibyte string with the same ease as through a unibyte string.
* gnulib has functions `mbslen' and `mbswidth' that can be used
instead of `strlen' when the number of characters or the number of
screen columns of a string is requested.
* gnulib has functions `mbschr' and `mbsrrchr' that are like
`strchr' and `strrchr', but work in multibyte locales.
* gnulib has a function `mbsstr', like `strstr', but works in
multibyte locales.
* gnulib has functions `mbscspn', `mbspbrk', `mbsspn' that are like
`strcspn', `strpbrk', `strspn', but work in multibyte locales.
* gnulib has functions `mbssep' and `mbstok_r' that are like
`strsep' and `strtok_r' but work in multibyte locales.
* gnulib has functions `mbscasecmp', `mbsncasecmp', `mbspcasecmp',
and `mbscasestr' that are like `strcasecmp', `strncasecmp', and
`strcasestr', but work in multibyte locales. Still, the function
`ulc_casecmp' is preferable to these functions; see below.
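To make the difference between bytes and characters concrete, here is
a minimal sketch for the special case of a UTF-8 locale encoding. It
is not the gnulib implementation, which handles arbitrary locale
encodings; the function names are hypothetical and the input is
assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count the characters in a NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx).  */
size_t
sketch_utf8_char_count (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      count++;
  return count;
}

/* Return the largest length <= MAX_BYTES at which S can be cut
   without splitting a multibyte character.  */
size_t
sketch_utf8_truncate_len (const char *s, size_t max_bytes)
{
  size_t len = strlen (s);

  if (len <= max_bytes)
    return len;
  len = max_bytes;
  /* Back up while the byte at the cut position is a continuation
     byte, that is, while the cut would fall inside a character.  */
  while (len > 0 && ((unsigned char) s[len] & 0xC0) == 0x80)
    len--;
  return len;
}
```

For the 6-byte string "na\xc3\xaf" "ve", `strlen' reports 6 while the
character count is 5, and a cut requested at 3 bytes is moved back to
2 bytes so that the two-byte character stays intact.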
The second problem with the C library API is that it has some
assumptions built-in that are not valid in some languages:
* It assumes that there are only two forms of every character:
uppercase and lowercase. This is not true for Croatian, where the
character LETTER DZ WITH CARON comes in three forms: LATIN CAPITAL
LETTER DZ WITH CARON (DZ), LATIN CAPITAL LETTER D WITH SMALL
LETTER Z WITH CARON (Dz), LATIN SMALL LETTER DZ WITH CARON (dz).
* It assumes that uppercasing of 1 character leads to 1 character.
This is not true for German, where the LATIN SMALL LETTER SHARP S,
when uppercased, becomes `SS'.
* It assumes that there is 1:1 mapping between uppercase and
lowercase forms. This is not true for the Greek sigma: GREEK
CAPITAL LETTER SIGMA is the uppercase of both GREEK SMALL LETTER
SIGMA and GREEK SMALL LETTER FINAL SIGMA.
* It assumes that the upper/lowercase mappings are position
independent. This is not true for the Greek sigma and the
Lithuanian i.
The correct way to deal with this problem is
1. to provide functions for titlecasing, as well as for upper- and
lowercasing,
2. to view case transformations as functions that operate on strings,
rather than on characters.
This is implemented in this library, through the functions declared
in `<unicase.h>', see *note unicase.h::.
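The following sketch illustrates why case conversion has to be a
string-to-string operation. It is a toy, not the `<unicase.h>'
implementation: the hypothetical function `sketch_upper' handles only
ASCII letters and LATIN SMALL LETTER SHARP S in UTF-8, and copies
every other byte unchanged.

```c
#include <stddef.h>

/* Uppercase the NUL-terminated UTF-8 string SRC into DEST, which has
   room for DEST_SIZE bytes including the terminating NUL.  Returns
   the length of the result in bytes, or (size_t) -1 if DEST is too
   small.  */
size_t
sketch_upper (const char *src, char *dest, size_t dest_size)
{
  size_t out = 0;

  while (*src != '\0')
    {
      unsigned char c = (unsigned char) *src;

      if (c == 0xC3 && (unsigned char) src[1] == 0x9F)
        {
          /* U+00DF: one character in, two characters out.  */
          if (out + 2 >= dest_size)
            return (size_t) -1;
          dest[out++] = 'S';
          dest[out++] = 'S';
          src += 2;
        }
      else
        {
          if (out + 1 >= dest_size)
            return (size_t) -1;
          dest[out++] = (c >= 'a' && c <= 'z'
                         ? (char) (c - 'a' + 'A') : (char) c);
          src++;
        }
    }
  if (out >= dest_size)
    return (size_t) -1;
  dest[out] = '\0';
  return out;
}
```

A character-by-character interface in the style of `toupper' cannot
express this mapping, because the result has more characters than the
input.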
File: libunistring.info, Node: The wchar_t mess, Next: Unicode strings, Prev: char * strings, Up: Introduction
1.6 The `wchar_t' mess
======================
The ISO C and POSIX standard creators made an attempt to fix the
first problem mentioned in the previous section. They introduced
* a type `wchar_t', designed to encapsulate an entire character,
* a "wide string" type `wchar_t *', and
* functions declared in `<wctype.h>' that were meant to supplant the
ones in `<ctype.h>'.
Unfortunately, this API and its implementation have numerous problems:
* On AIX and Windows platforms, `wchar_t' is a 16-bit type. This
means that it can never accommodate an entire Unicode character.
Either the `wchar_t *' strings are limited to characters in UCS-2
(the "Basic Multilingual Plane" of Unicode), or -- if `wchar_t *'
strings are encoded in UTF-16 -- a `wchar_t' represents only half
of a character in the worst case, making the `<wctype.h>' functions
pointless.
* On Solaris and FreeBSD, the `wchar_t' encoding is locale dependent
and undocumented. This means, if you want to know any property of
a `wchar_t' character, other than the properties defined by
`<wctype.h>' -- such as whether it's a dash, currency symbol,
paragraph separator, or similar --, you have to convert it to
`char *' encoding first, by use of the function `wctomb'.
* When you read a stream of wide characters, through the functions
`fgetwc' and `fgetws', and when the input stream/file is not in
the expected encoding, you have no way to determine the invalid
byte sequence and do some corrective action. If you use these
functions, your program becomes "garbage in - more garbage out" or
"garbage in - abort".
As a consequence, it is better to use multibyte strings, as
explained in the previous section. Such multibyte strings can bypass
limitations of the `wchar_t' type, if you use functions defined in
gnulib and libunistring for text processing. They can also faithfully
transport malformed characters that were present in the input, without
requiring the program to produce garbage or abort.
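The 16-bit limitation mentioned above can be seen in a small sketch of
UTF-16 encoding. The function name is hypothetical and not part of
libunistring; it shows that a character outside the Basic Multilingual
Plane needs two 16-bit units, so a single 16-bit `wchar_t' would hold
only half of it.

```c
#include <stdint.h>

/* Encode code point UC as UTF-16 into OUT.  Returns the number of
   16-bit units written (1 or 2), or -1 if UC is a surrogate code
   point or lies beyond U+10FFFF.  */
int
sketch_utf16_encode (uint32_t uc, uint16_t out[2])
{
  if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc <= 0xDFFF)
        return -1;
      out[0] = (uint16_t) uc;
      return 1;
    }
  if (uc > 0x10FFFF)
    return -1;
  uc -= 0x10000;
  out[0] = (uint16_t) (0xD800 + (uc >> 10));    /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (uc & 0x3FF));  /* low surrogate */
  return 2;
}
```

For instance, U+1F600 is encoded as the two units 0xD83D 0xDE00,
neither of which is a character on its own.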
File: libunistring.info, Node: Unicode strings, Prev: The wchar_t mess, Up: Introduction
1.7 Unicode strings
===================
libunistring supports Unicode strings in three representations:
* UTF-8 strings, through the type `uint8_t *'. The units are bytes
(`uint8_t').
* UTF-16 strings, through the type `uint16_t *'. The units are
16-bit memory words (`uint16_t').
* UTF-32 strings, through the type `uint32_t *'. The units are
32-bit memory words (`uint32_t').
As with C strings, there are two variants:
* Unicode strings with a terminating NUL character are represented as
a pointer to the first unit of the string. There is a unit
containing a 0 value at the end. It is considered part of the
string for all memory allocation purposes, but is not considered
part of the string for all other logical purposes.
* Unicode strings where embedded NUL characters are allowed. These
are represented by a pointer to the first unit and the number of
units (not bytes!) of the string. In this setting, there is no
trailing zero-valued unit used as "end marker".
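The two variants can be contrasted in a short sketch. The names
`sketch_u16_len', `struct sketch_u16_span' and `sketch_u16_span_bytes'
are invented for this illustration; libunistring itself passes the
pointer and the unit count as two separate arguments rather than as a
structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in units of a NUL-terminated UTF-16 string; the terminating
   zero unit is not counted.  */
size_t
sketch_u16_len (const uint16_t *s)
{
  size_t n = 0;

  while (s[n] != 0)
    n++;
  return n;
}

/* A string with explicit length: pointer to the first unit plus the
   number of units.  Embedded zero units are ordinary content.  */
struct sketch_u16_span
{
  const uint16_t *units;
  size_t nunits;              /* number of 16-bit units, not bytes */
};

/* Storage size of a span in bytes.  */
size_t
sketch_u16_span_bytes (struct sketch_u16_span sp)
{
  return sp.nunits * sizeof (uint16_t);
}
```

Given the units `a', 0, `b', the first function sees a string of
length 1, whereas a span with `nunits' equal to 3 covers all three
units and occupies 6 bytes.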
File: libunistring.info, Node: Conventions, Next: unitypes.h, Prev: Introduction, Up: Top
2 Conventions
*************
This chapter explains conventions valid throughout the libunistring
library.
Variables of type `char *' denote C strings in locale encoding. See
*note Locale encodings::.
Variables of type `uint8_t *' denote UTF-8 strings. Their units are
bytes.
Variables of type `uint16_t *' denote UTF-16 strings, without byte
order mark. Their units are 2-byte words.
Variables of type `uint32_t *' denote UTF-32 strings, without byte
order mark. Their units are 4-byte words.
Argument pairs `(S, N)' denote a string `S[0..N-1]' with exactly N
units.
All functions with prefix `ulc_' operate on C strings in locale
encoding.
All functions with prefix `u8_' operate on UTF-8 strings.
All functions with prefix `u16_' operate on UTF-16 strings.
All functions with prefix `u32_' operate on UTF-32 strings.
For every function with prefix `u8_', operating on UTF-8 strings,
there is also a corresponding function with prefix `u16_', operating on
UTF-16 strings, and a corresponding function with prefix `u32_',
operating on UTF-32 strings. Their description is analogous; in this
documentation we describe only the function that operates on UTF-8
strings, for brevity.
A declaration with a variable N denotes the three concrete
declarations with N = 8, N = 16, N = 32.
All parameters starting with `str' and the parameters of functions
starting with `u8_str'/`u16_str'/`u32_str' denote a NUL terminated
string.
Error values are always returned through the `errno' variable,
usually with a return value that indicates the presence of an error
(NULL for functions that return a pointer, or -1 for functions that
return an `int').
Functions returning a string result take a `(RESULTBUF, LENGTHP)'
argument pair. If RESULTBUF is not NULL and the result fits into
`*LENGTHP' units, it is put in RESULTBUF, and RESULTBUF is returned.
Otherwise, a freshly allocated string is returned. In both cases,
`*LENGTHP' is set to the length (number of units) of the returned
string. In case of error, NULL is returned and `errno' is set.
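The `(RESULTBUF, LENGTHP)' convention can be sketched with a
hypothetical function, `sketch_copy', that merely copies its input.
It is not a libunistring function; it only mimics the calling
convention described above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy the N units at S.  If RESULTBUF is not NULL and the result
   fits into *LENGTHP units, it is stored in RESULTBUF; otherwise a
   freshly allocated buffer is used.  *LENGTHP receives the length of
   the result.  On error, returns NULL with errno set.  */
uint8_t *
sketch_copy (const uint8_t *s, size_t n,
             uint8_t *resultbuf, size_t *lengthp)
{
  uint8_t *result;

  if (resultbuf != NULL && n <= *lengthp)
    result = resultbuf;               /* fits: reuse caller's buffer */
  else
    {
      result = malloc (n > 0 ? n : 1);    /* otherwise allocate */
      if (result == NULL)
        {
          errno = ENOMEM;
          return NULL;
        }
    }
  if (n > 0)
    memcpy (result, s, n);
  *lengthp = n;
  return result;
}
```

With this sketch, a caller typically passes a stack buffer and its
size, and afterwards calls `free' on the result only if it differs
from the buffer that was passed in.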
File: libunistring.info, Node: unitypes.h, Next: unistr.h, Prev: Conventions, Up: Top
3 Elementary types `<unitypes.h>'
*********************************
The include file `<unitypes.h>' provides the following basic types.
-- Type: uint8_t
-- Type: uint16_t
-- Type: uint32_t
These are the storage units of UTF-8/16/32 strings, respectively.
The definitions are taken from `<stdint.h>', on platforms where
this include file is present.
-- Type: ucs4_t
This type represents a single Unicode character, outside of a
UTF-32 string.
File: libunistring.info, Node: unistr.h, Next: uniconv.h, Prev: unitypes.h, Up: Top
4 Elementary Unicode string functions `<unistr.h>'
**************************************************
This include file declares elementary functions for Unicode strings.
It is essentially the equivalent of what `<string.h>' is for C strings.
* Menu:
* Elementary string checks::
* Elementary string conversions::
* Elementary string functions::
* Elementary string functions with memory allocation::
* Elementary string functions on NUL terminated strings::
File: libunistring.info, Node: Elementary string checks, Next: Elementary string conversions, Up: unistr.h
4.1 Elementary string checks
============================
The following function is available to verify the integrity of a
Unicode string.
-- Function: const uint8_t * u8_check (const uint8_t *S, size_t N)
-- Function: const uint16_t * u16_check (const uint16_t *S, size_t N)
-- Function: const uint32_t * u32_check (const uint32_t *S, size_t N)
This function checks whether a Unicode string is well-formed. It
returns NULL if valid, or a pointer to the first invalid unit
otherwise.
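What such a check involves for UTF-8 can be sketched in plain C. The
function below is an independent illustration, not the library's
source; the name `sketch_u8_check' is made up. It follows the rules
of RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Return NULL if the N units at S are well-formed UTF-8, otherwise a
   pointer to the first unit of the first malformed sequence.  */
const uint8_t *
sketch_u8_check (const uint8_t *s, size_t n)
{
  const uint8_t *end = s + n;

  while (s < end)
    {
      uint8_t c = s[0];
      size_t len, i;

      if (c < 0x80)
        len = 1;
      else if (c >= 0xC2 && c <= 0xDF)
        len = 2;
      else if (c >= 0xE0 && c <= 0xEF)
        len = 3;
      else if (c >= 0xF0 && c <= 0xF4)
        len = 4;
      else
        return s;             /* stray continuation or bad lead byte */
      if ((size_t) (end - s) < len)
        return s;             /* sequence is cut off */
      for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
          return s;
      /* Reject overlong forms, surrogates, and values > U+10FFFF.  */
      if ((c == 0xE0 && s[1] < 0xA0) || (c == 0xED && s[1] > 0x9F)
          || (c == 0xF0 && s[1] < 0x90) || (c == 0xF4 && s[1] > 0x8F))
        return s;
      s += len;
    }
  return NULL;
}
```

Running a check of this kind on untrusted input before further
processing lets the rest of a program assume well-formed strings.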
File: libunistring.info, Node: Elementary string conversions, Next: Elementary string functions, Prev: Elementary string checks, Up: unistr.h
4.2 Elementary string conversions
=================================
The following functions perform conversions between the different
forms of Unicode strings.
-- Function: uint16_t * u8_to_u16 (const uint8_t *S, size_t N,
uint16_t *RESULTBUF, size_t *LENGTHP)
Converts a UTF-8 string to a UTF-16 string.
-- Function: uint32_t * u8_to_u32 (const uint8_t *S, size_t N,
uint32_t *RESULTBUF, size_t *LENGTHP)
Converts a UTF-8 string to a UTF-32 string.
-- Function: uint8_t * u16_to_u8 (const uint16_t *S, size_t N, uint8_t
*RESULTBUF, size_t *LENGTHP)
Converts a UTF-16 string to a UTF-8 string.
-- Function: uint32_t * u16_to_u32 (const uint16_t *S, size_t N,
uint32_t *RESULTBUF, size_t *LENGTHP)
Converts a UTF-16 string to a UTF-32 string.
-- Function: uint8_t * u32_to_u8 (const uint32_t *S, size_t N, uint8_t
*RESULTBUF, size_t *LENGTHP)
Converts a UTF-32 string to a UTF-8 string.
-- Function: uint16_t * u32_to_u16 (const uint32_t *S, size_t N,
uint16_t *RESULTBUF, size_t *LENGTHP)
Converts a UTF-32 string to a UTF-16 string.
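As an illustration of what one of these conversions has to do, the
following sketch converts UTF-16 to UTF-32 by combining surrogate
pairs. It is a simplified stand-in with an invented name and a
simpler interface (a caller-supplied output buffer of N units); it
does not follow the `(RESULTBUF, LENGTHP)' convention of the real
functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert the N units of UTF-16 at S to UTF-32 in OUT, which must
   have room for N units.  Returns the number of UTF-32 units
   written, or (size_t) -1 if S contains an unpaired surrogate.  */
size_t
sketch_u16_to_u32 (const uint16_t *s, size_t n, uint32_t *out)
{
  size_t i = 0, o = 0;

  while (i < n)
    {
      uint32_t c = s[i++];

      if (c >= 0xD800 && c <= 0xDBFF)
        {
          /* High surrogate: a low surrogate must follow.  */
          if (i == n || s[i] < 0xDC00 || s[i] > 0xDFFF)
            return (size_t) -1;
          c = 0x10000 + ((c - 0xD800) << 10) + (s[i++] - 0xDC00);
        }
      else if (c >= 0xDC00 && c <= 0xDFFF)
        return (size_t) -1;   /* low surrogate without a high one */
      out[o++] = c;
    }
  return o;
}
```

The result never has more units than the input, which is why an
output buffer of N units is sufficient in this direction.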
File: libunistring.info, Node: Elementary string functions, Next: Elementary string functions with memory allocation, Prev: Elementary string conversions, Up: unistr.h
4.3 Elementary string functions
===============================
The following functions inspect and return details about the first
character in a Unicode string.
-- Function: int u8_mblen (const uint8_t *S, size_t N)
-- Function: int u16_mblen (const uint16_t *S, size_t N)
-- Function: int u32_mblen (const uint32_t *S, size_t N)
Returns the length (number of units) of the first character in S,
which is no longer than N. Returns 0 if it is the NUL character.
Returns -1 upon failure.
This function is similar to `mblen', except that it operates on a
Unicode string and that S must not be NULL.
-- Function: int u8_mbtouc_unsafe (ucs4_t *PUC, const uint8_t *S,
size_t N)
-- Function: int u16_mbtouc_unsafe (ucs4_t *PUC, const uint16_t *S,
size_t N)
-- Function: int u32_mbtouc_unsafe (ucs4_t *PUC, const uint32_t *S,
size_t N)
Returns the length (number of units) of the first character in S,
putting its `ucs4_t' representation in `*PUC'. Upon failure,
`*PUC' is set to `0xfffd', and an appropriate number of units is
returned.
The number of available units, N, must be > 0.
This function is similar to `mbtowc', except that it operates on a
Unicode string, PUC and S must not be NULL, N must be > 0, and the
NUL character is not treated specially.
-- Function: int u8_mbtouc (ucs4_t *PUC, const uint8_t *S, size_t N)
-- Function: int u16_mbtouc (ucs4_t *PUC, const uint16_t *S, size_t N)
-- Function: int u32_mbtouc (ucs4_t *PUC, const uint32_t *S, size_t N)
This function is like `u8_mbtouc_unsafe', except that it will
detect an invalid UTF-8 character, even if the library is compiled
without `--enable-safety'.
-- Function: int u8_mbtoucr (ucs4_t *PUC, const uint8_t *S, size_t N)
-- Function: int u16_mbtoucr (ucs4_t *PUC, const uint16_t *S, size_t N)
-- Function: int u32_mbtoucr (ucs4_t *PUC, const uint32_t *S, size_t N)
Returns the length (number of units) of the first character in S,
putting its `ucs4_t' representation in `*PUC'. Upon failure,
`*PUC' is set to `0xfffd', and -1 is returned for an invalid
sequence of units, -2 is returned for an incomplete sequence of
units.
The number of available units, N, must be > 0.
This function is similar to `u8_mbtouc', except that the return
value gives more details about the failure, similar to `mbrtowc'.
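The distinction between an invalid and an incomplete sequence can be
sketched for UTF-8 as follows. This is an independent illustration
with a made-up name, `sketch_u8_decode', that mirrors the return value
convention described above; it is not the library's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the first character of the N units at S into *PUC.
   Returns its length in units, -1 for an invalid sequence, or -2 for
   a sequence that is valid so far but cut off after N units.  On
   failure, *PUC is set to 0xfffd.  */
int
sketch_u8_decode (uint32_t *puc, const uint8_t *s, size_t n)
{
  uint8_t c = s[0];
  uint32_t uc;
  int len, i;

  assert (n > 0);
  *puc = 0xFFFD;
  if (c < 0x80)
    {
      *puc = c;
      return 1;
    }
  else if (c >= 0xC2 && c <= 0xDF)
    len = 2, uc = c & 0x1F;
  else if (c >= 0xE0 && c <= 0xEF)
    len = 3, uc = c & 0x0F;
  else if (c >= 0xF0 && c <= 0xF4)
    len = 4, uc = c & 0x07;
  else
    return -1;

  for (i = 1; i < len; i++)
    {
      if ((size_t) i >= n)
        return -2;            /* ran out of units: incomplete */
      if ((s[i] & 0xC0) != 0x80)
        return -1;
      /* The second byte decides about overlong forms, surrogates,
         and values beyond U+10FFFF.  */
      if (i == 1
          && ((c == 0xE0 && s[1] < 0xA0) || (c == 0xED && s[1] > 0x9F)
              || (c == 0xF0 && s[1] < 0x90)
              || (c == 0xF4 && s[1] > 0x8F)))
        return -1;
      uc = (uc << 6) | (s[i] & 0x3F);
    }
  *puc = uc;
  return len;
}
```

A return value of -2 is useful when reading from a stream in chunks:
the caller can fetch more input and retry, whereas -1 means that no
amount of further input will make the sequence valid.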
The following function stores a Unicode character as a Unicode
string in memory.
-- Function: int u8_uctomb (uint8_t *S, ucs4_t UC, int N)
-- Function: int u16_uctomb (uint16_t *S, ucs4_t UC, int N)
-- Function: int u32_uctomb (uint32_t *S, ucs4_t UC, int N)
Puts the multibyte character represented by UC in S, returning its
length. Returns -1 upon failure, -2 if the number of available
units, N, is too small. The latter case cannot occur if N >=
6/2/1, respectively.
This function is similar to `wctomb', except that it operates on
Unicode strings, S must not be NULL, and the argument N must be
specified.
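The reverse direction, storing a character as UTF-8, can be sketched
in the same spirit. The function `sketch_u8_encode' is hypothetical
and only mirrors the return values described above; it encodes
according to RFC 3629, so in this sketch a buffer of 4 units always
suffices.

```c
#include <stdint.h>

/* Store code point UC as UTF-8 in S, which has room for N units.
   Returns the number of units written, -1 if UC is not a valid
   character, or -2 if N is too small.  */
int
sketch_u8_encode (uint8_t *s, uint32_t uc, int n)
{
  int len;

  if (uc < 0x80)
    len = 1;
  else if (uc < 0x800)
    len = 2;
  else if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc <= 0xDFFF)
        return -1;            /* surrogates are not characters */
      len = 3;
    }
  else if (uc <= 0x10FFFF)
    len = 4;
  else
    return -1;                /* beyond the Unicode range */

  if (n < len)
    return -2;                /* not enough room in S */

  switch (len)
    {
    case 1:
      s[0] = (uint8_t) uc;
      break;
    case 2:
      s[0] = (uint8_t) (0xC0 | (uc >> 6));
      s[1] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    case 3:
      s[0] = (uint8_t) (0xE0 | (uc >> 12));
      s[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[2] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    default:
      s[0] = (uint8_t) (0xF0 | (uc >> 18));
      s[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      s[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[3] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    }
  return len;
}
```

Checking for -2 before writing anything means the output buffer is
left untouched when it is too small.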
The following functions copy Unicode strings in memory.
-- Function: uint8_t * u8_cpy (uint8_t *DEST, const uint8_t *SRC,
size_t N)
-- Function: uint16_t * u16_cpy (uint16_t *DEST, const uint16_t *SRC,
size_t N)
-- Function: uint32_t * u32_cpy (uint32_t *DEST, const uint32_t *SRC,
size_t N)
Copies N units from SRC to DEST.
This function is similar to `memcpy', except that it operates on
Unicode strings.
-- Function: uint8_t * u8_move (uint8_t *DEST, const uint8_t *SRC,
size_t N)
-- Function: uint16_t * u16_move (uint16_t *DEST, const uint16_t *SRC,
size_t N)
-- Function: uint32_t * u32_move (uint32_t *DEST, const uint32_t *SRC,
size_t N)
Copies N units from SRC to DEST, guaranteeing correct behavior for
overlapping memory areas.
This function is similar to `memmove', except that it operates on
Unicode strings.
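The main practical difference from `memcpy' and `memmove' is that N
counts units, not bytes. A minimal sketch of the UTF-16 variant,
under the assumption that it behaves like `memmove' on N units and
returns DEST (the name `sketch_u16_move' is invented here):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy N 16-bit units from SRC to DEST; the areas may overlap.  */
uint16_t *
sketch_u16_move (uint16_t *dest, const uint16_t *src, size_t n)
{
  /* N counts units, so the byte count is N * sizeof (uint16_t).  */
  memmove (dest, src, n * sizeof (uint16_t));
  return dest;
}
```

Passing a byte count where a unit count is expected, or the other way
round, is an easy mistake to make when mixing these functions with
their `<string.h>' counterparts.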
The following function fills a Unicode string.
-- Function: uint8_t * u8_set (uint8_t *S, ucs4_t UC, size_t N)
-- Function: uint16_t * u16_set (uint16_t *S, ucs4_t UC, size_t N)
-- Function: uint32_t * u32_set (uint32_t *S, ucs4_t UC, size_t N)
Sets the first N characters of S to UC. UC should be a character
that occupies only 1 unit.
This function is similar to `memset', except that it operates on
Unicode strings.
The following function compares two Unicode strings of the same
length.
-- Function: int u8_cmp (const uint8_t *S1, const uint8_t *S2, size_t
N)
-- Function: int u16_cmp (const uint16_t *S1, const uint16_t *S2,
size_t N)
-- Function: int u32_cmp (const uint32_t *S1, const uint32_t *S2,
size_t N)
Compares S1 and S2, each of length N, lexicographically. Returns
a negative value if S1 compares smaller than S2, a positive value
if S1 compares larger than S2, or 0 if they compare equal.
This function is similar to `memcmp', except that it operates on
Unicode strings.
The following function compares two Unicode strings of possibly
different lengths.
-- Function: int u8_cmp2 (const uint8_t *S1, size_t N1, const uint8_t
*S2, size_t N2)
-- Function: int u16_cmp2 (const uint16_t *S1, size_t N1, const
uint16_t *S2, size_t N2)
-- Function: int u32_cmp2 (const uint32_t *S1, size_t N1, const
uint32_t *S2, size_t N2)
Compares S1 and S2, lexicographically. Returns a negative value
if S1 compares smaller than S2, a positive value if S1 compares
larger than S2, or 0 if they compare equal.
This function is similar to the gnulib function `memcmp2', except
that it operates on Unicode strings.
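A comparison of strings with different lengths can be sketched for
UTF-8 as shown below. The function name is hypothetical. The sketch
relies on the fact that, for UTF-8, comparing bytes gives the same
order as comparing code points; this shortcut does not carry over to
UTF-16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare S1 (N1 units) and S2 (N2 units) lexicographically.  A
   string that is a proper prefix of the other compares smaller.  */
int
sketch_u8_cmp2 (const uint8_t *s1, size_t n1,
                const uint8_t *s2, size_t n2)
{
  size_t common = n1 < n2 ? n1 : n2;
  int cmp = common > 0 ? memcmp (s1, s2, common) : 0;

  if (cmp != 0)
    return cmp;
  /* The common prefix is equal, so the lengths decide.  */
  return (n1 > n2) - (n1 < n2);
}
```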
The following function searches for a given Unicode character.
-- Function: uint8_t * u8_chr (const uint8_t *S, size_t N, ucs4_t UC)
-- Function: uint16_t * u16_chr (const uint16_t *S, size_t N, ucs4_t
UC)
-- Function: uint32_t * u32_chr (const uint32_t *S, size_t N, ucs4_t
UC)
Searches the string at S for UC. Returns a pointer to the first
occurrence of UC in S, or NULL if UC does not occur in S.
This function is similar to `memchr', except that it operates on
Unicode strings.
The following function counts the number of Unicode characters.
-- Function: size_t u8_mbsnlen (const uint8_t *S, size_t N)
-- Function: size_t u16_mbsnlen (const uint16_t *S, size_t N)
-- Function: size_t u32_mbsnlen (const uint32_t *S, size_t N)
Counts and returns the number of Unicode characters in the N units
from S.
This function is similar to the gnulib function `mbsnlen', except
that it operates on Unicode strings.
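For UTF-16, counting characters amounts to counting every unit except
the second half of a surrogate pair. The sketch below uses an
invented name and assumes well-formed input; it is meant to show the
idea, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the characters in the N units of well-formed UTF-16 at S.  */
size_t
sketch_u16_count (const uint16_t *s, size_t n)
{
  size_t count = 0, i;

  for (i = 0; i < n; i++)
    if (s[i] < 0xDC00 || s[i] > 0xDFFF)   /* not a low surrogate */
      count++;
  return count;
}
```

The four units 0x0041 0xD83D 0xDE00 0x20AC therefore count as three
characters.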
File: libunistring.info, Node: Elementary string functions with memory allocation, Next: Elementary string functions on NUL terminated strings, Prev: Elementary string functions, Up: unistr.h
4.4 Elementary string functions with memory allocation
======================================================
The following function copies a Unicode string.
-- Function: uint8_t * u8_cpy_alloc (const uint8_t *S, size_t N)
-- Function: uint16_t * u16_cpy_alloc (const uint16_t *S, size_t N)
-- Function: uint32_t * u32_cpy_alloc (const uint32_t *S, size_t N)
Makes a freshly allocated copy of S, of length N.
File: libunistring.info, Node: Elementary string functions on NUL terminated strings, Prev: Elementary string functions with memory allocation, Up: unistr.h
4.5 Elementary string functions on NUL terminated strings
=========================================================
The following functions inspect and return details about the first
character in a Unicode string.
-- Function: int u8_strmblen (const uint8_t *S)
-- Function: int u16_strmblen (const uint16_t *S)
-- Function: int u32_strmblen (const uint32_t *S)
Returns the length (number of units) of the first character in S.
Returns 0 if it is the NUL character. Returns -1 upon failure.
-- Function: int u8_strmbtouc (ucs4_t *PUC, const uint8_t *S)
-- Function: int u16_strmbtouc (ucs4_t *PUC, const uint16_t *S)
-- Function: int u32_strmbtouc (ucs4_t *PUC, const uint32_t *S)
Returns the length (number of units) of the first character in S,
putting its `ucs4_t' representation in `*PUC'. Returns 0 if it is
the NUL character. Returns -1 upon failure.
-- Function: const uint8_t * u8_next (ucs4_t *PUC, const uint8_t *S)
-- Function: const uint16_t * u16_next (ucs4_t *PUC, const uint16_t *S)
-- Function: const uint32_t * u32_next (ucs4_t *PUC, const uint32_t *S)
Forward iteration step. Advances the pointer past the next
character, or returns NULL if the end of the string has been
reached. Puts the character's `ucs4_t' representation in `*PUC'.
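A forward iteration loop built on such a step function might look like
the following sketch. Both functions are hypothetical stand-ins
written in plain C; the step function assumes valid UTF-8 and leaves
`*PUC' untouched at the end of the string, which is a choice made for
this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the character at S into *PUC and return a pointer past it,
   or return NULL if S points at the terminating NUL.  */
const uint8_t *
sketch_u8_next (uint32_t *puc, const uint8_t *s)
{
  uint8_t c = *s;
  uint32_t uc;
  int extra;

  if (c == 0)
    return NULL;
  if (c < 0x80)
    uc = c, extra = 0;
  else if (c < 0xE0)
    uc = c & 0x1F, extra = 1;
  else if (c < 0xF0)
    uc = c & 0x0F, extra = 2;
  else
    uc = c & 0x07, extra = 3;
  s++;
  while (extra-- > 0)
    uc = (uc << 6) | (*s++ & 0x3F);
  *puc = uc;
  return s;
}

/* Return the largest code point in the NUL-terminated string S, or 0
   if S is empty.  Shows the typical iteration loop.  */
uint32_t
sketch_u8_max (const uint8_t *s)
{
  uint32_t uc, max = 0;

  while ((s = sketch_u8_next (&uc, s)) != NULL)
    if (uc > max)
      max = uc;
  return max;
}
```

The loop body sees whole characters as `uint32_t' values and never has
to deal with individual units.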
The following function inspects and returns details about the
previous character in a Unicode string.
-- Function: const uint8_t * u8_prev (ucs4_t *PUC, const uint8_t *S,
const uint8_t *START)
-- Function: const uint16_t * u16_prev (ucs4_t *PUC, const uint16_t
*S, const uint16_t *START)
-- Function: const uint32_t * u32_prev (ucs4_t *PUC, const uint32_t
*S, const uint32_t *START)
Backward iteration step. Moves the pointer back to the
previous character, or returns NULL if the beginning of the string
has been reached. Puts the character's `ucs4_t' representation in
`*PUC'.
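Stepping backward through UTF-8 works because continuation bytes are
recognizable on their own. The sketch below, with an invented name
and under the assumption of valid UTF-8 between START and S, walks
back to the lead byte while collecting the payload bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the character that ends just before S into *PUC and return
   a pointer to its first unit, or return NULL if S equals START.  */
const uint8_t *
sketch_u8_prev (uint32_t *puc, const uint8_t *s, const uint8_t *start)
{
  /* Payload mask of the lead byte, indexed by the number of
     continuation bytes that follow it.  */
  static const uint8_t lead_mask[4] = { 0x7F, 0x1F, 0x0F, 0x07 };
  const uint8_t *p = s;
  uint32_t uc = 0;
  int ncont = 0;

  if (p == start)
    return NULL;
  p--;
  /* Collect continuation bytes, least significant bits first.  */
  while (p > start && (*p & 0xC0) == 0x80 && ncont < 3)
    {
      uc |= (uint32_t) (*p & 0x3F) << (6 * ncont);
      ncont++;
      p--;
    }
  /* P now points at the lead byte.  */
  uc |= (uint32_t) (*p & lead_mask[ncont]) << (6 * ncont);
  *puc = uc;
  return p;
}
```

Starting with S at the terminating NUL and calling the function
repeatedly until it returns NULL visits the characters of a string in
reverse order.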
The following functions determine the length of a Unicode string.
-- Function: size_t u8_strlen (const uint8_t *S)
-- Function: size_t u16_strlen (const uint16_t *S)
-- Function: size_t u32_strlen (const uint32_t *S)
Returns the number of units in S.
This function is similar to `strlen' and `wcslen', except that it
operates on Unicode strings.
-- Function: size_t u8_strnlen (const uint8_t *S, size_t MAXLEN)
-- Function: size_t u16_strnlen (const uint16_t *S, size_t MAXLEN)
-- Function: size_t u32_strnlen (const uint32_t *S, size_t MAXLEN)
Returns the number of units in S, but at most MAXLEN.
This function is similar to `strnlen' and `wcsnlen', except that
it operates on Unicode strings.
The following functions copy portions of Unicode strings in memory.
-- Function: uint8_t * u8_strcpy (uint8_t *DEST, const uint8_t *SRC)
-- Function: uint16_t * u16_strcpy (uint16_t *DEST, const uint16_t
*SRC)
-- Function: uint32_t * u32_strcpy (uint32_t *DEST, const uint32_t
*SRC)
Copies SRC to DEST.
This function is similar to `strcpy' and `wcscpy', except that it
operates on Unicode strings.
-- Function: uint8_t * u8_stpcpy (uint8_t *DEST, const uint8_t *SRC)
-- Function: uint16_t * u16_stpcpy (uint16_t *DEST, const uint16_t
*SRC)
-- Function: uint32_t * u32_stpcpy (uint32_t *DEST, const uint32_t
*SRC)
Copies SRC to DEST, returning the address of the terminating NUL
in DEST.
This function is similar to `stpcpy', except that it operates on
Unicode strings.
-- Function: uint8_t * u8_strncpy (uint8_t *DEST, const uint8_t *SRC,
size_t N)
-- Function: uint16_t * u16_strncpy (uint16_t *DEST, const uint16_t
*SRC, size_t N)
-- Function: uint32_t * u32_strncpy (uint32_t *DEST, const uint32_t
*SRC, size_t N)
Copies no more than N units of SRC to DEST.
This function is similar to `strncpy' and `wcsncpy', except that
it operates on Unicode strings.
-- Function: uint8_t * u8_stpncpy (uint8_t *DEST, const uint8_t *SRC,
size_t N)
-- Function: uint16_t * u16_stpncpy (uint16_t *DEST, const uint16_t
*SRC, size_t N)
-- Function: uint32_t * u32_stpncpy (uint32_t *DEST, const uint32_t
*SRC, size_t N)
Copies no more than N units of SRC to DEST, returning the address
of the last unit written into DEST.
This function is similar to `stpncpy', except that it operates on
Unicode strings.
-- Function: uint8_t * u8_strcat (uint8_t *DEST, const uint8_t *SRC)
-- Function: uint16_t * u16_strcat (uint16_t *DEST, const uint16_t
*SRC)
-- Function: uint32_t * u32_strcat (uint32_t *DEST, const uint32_t
*SRC)
Appends SRC onto DEST.
This function is similar to `strcat' and `wcscat', except that it
operates on Unicode strings.
-- Function: uint8_t * u8_strncat (uint8_t *DEST, const uint8_t *SRC,
size_t N)
-- Function: uint16_t * u16_strncat (uint16_t *DEST, const uint16_t
*SRC, size_t N)
-- Function: uint32_t * u32_strncat (uint32_t *DEST, const uint32_t
*SRC, size_t N)
Appends no more than N units of SRC onto DEST.
This function is similar to `strncat' and `wcsncat', except that
it operates on Unicode strings.
The following functions compare two Unicode strings.
-- Function: int u8_strcmp (const uint8_t *S1, const uint8_t *S2)
-- Function: int u16_strcmp (const uint16_t *S1, const uint16_t *S2)
-- Function: int u32_strcmp (const uint32_t *S1, const uint32_t *S2)
Compares S1 and S2, lexicographically. Returns a negative value
if S1 compares smaller than S2, a positive value if S1 compares
larger than S2, or 0 if they compare equal.
This function is similar to `strcmp' and `wcscmp', except that it
operates on Unicode strings.
-- Function: int u8_strcoll (const uint8_t *S1, const uint8_t *S2)
-- Function: int u16_strcoll (const uint16_t *S1, const uint16_t *S2)
-- Function: int u32_strcoll (const uint32_t *S1, const uint32_t *S2)
Compares S1 and S2 using the collation rules of the current locale.
Returns -1 if S1 < S2, 0 if S1 = S2, 1 if S1 > S2. Upon failure,
sets `errno' and returns any value.
This function is similar to `strcoll' and `wcscoll', except that
it operates on Unicode strings.
Note that this function may consider different canonical
normalizations of the same string as having a large distance. It
is therefore better to use the function `u8_normcoll' instead of
this one; see *note uninorm.h::.
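The effect described in this note can be seen even with a plain
unit-by-unit comparison. The sketch below is illustrative only:
`strcmp' stands in for any comparison that does not normalize its
arguments, and all names in it are hypothetical.

```c
#include <string.h>

/* Two canonically equivalent UTF-8 spellings of the same text:
   precomposed U+00E9, and "e" followed by U+0301 COMBINING ACUTE ACCENT.  */
static const char precomposed[] = "\xC3\xA9";
static const char decomposed[]  = "e\xCC\x81";

/* Unit-by-unit comparison without any normalization step.  */
static int
sketch_compare_units (const char *a, const char *b)
{
  return strcmp (a, b);
}
```

The two strings compare as different, although a reader would see the
same character. Normalizing both arguments before comparing them
avoids this.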
-- Function: int u8_strncmp (const uint8_t *S1, const uint8_t *S2,
size_t N)
-- Function: int u16_strncmp (const uint16_t *S1, const uint16_t *S2,
size_t N)
-- Function: int u32_strncmp (const uint32_t *S1, const uint32_t *S2,
size_t N)
Compares no more than N units of S1 and S2.
This function is similar to `strncmp' and `wcsncmp', except that
it operates on Unicode strings.
The following function allocates a duplicate of a Unicode string.
-- Function: uint8_t * u8_strdup (const uint8_t *S)
-- Function: uint16_t * u16_strdup (const uint16_t *S)
-- Function: uint32_t * u32_strdup (const uint32_t *S)
Duplicates S, returning an identical malloc'd string.
This function is similar to `strdup' and `wcsdup', except that it
operates on Unicode strings.
The following functions search for a given Unicode character.
-- Function: uint8_t * u8_strchr (const uint8_t *STR, ucs4_t UC)
-- Function: uint16_t * u16_strchr (const uint16_t *STR, ucs4_t UC)
-- Function: uint32_t * u32_strchr (const uint32_t *STR, ucs4_t UC)
Finds the first occurrence of UC in STR.
This function is similar to `strchr' and `wcschr', except that it
operates on Unicode strings.
-- Function: uint8_t * u8_strrchr (const uint8_t *STR, ucs4_t UC)
-- Function: uint16_t * u16_strrchr (const uint16_t *STR, ucs4_t UC)
-- Function: uint32_t * u32_strrchr (const uint32_t *STR, ucs4_t UC)
Finds the last occurrence of UC in STR.
This function is similar to `strrchr' and `wcsrchr', except that
it operates on Unicode strings.
The following functions search for the first occurrence of some
Unicode character in or outside a given set of Unicode characters.
-- Function: size_t u8_strcspn (const uint8_t *STR, const uint8_t
*REJECT)
-- Function: size_t u16_strcspn (const uint16_t *STR, const uint16_t
*REJECT)
-- Function: size_t u32_strcspn (const uint32_t *STR, const uint32_t
*REJECT)
Returns the length of the initial segment of STR which consists
entirely of Unicode characters not in REJECT.
This function is similar to `strcspn' and `wcscspn', except that
it operates on Unicode strings.
-- Function: size_t u8_strspn (const uint8_t *STR, const uint8_t
*ACCEPT)
-- Function: size_t u16_strspn (const uint16_t *STR, const uint16_t
*ACCEPT)
-- Function: size_t u32_strspn (const uint32_t *STR, const uint32_t
*ACCEPT)
Returns the length of the initial segment of STR which consists
entirely of Unicode characters in ACCEPT.
This function is similar to `strspn' and `wcsspn', except that it
operates on Unicode strings.
-- Function: uint8_t * u8_strpbrk (const uint8_t *STR, const uint8_t
*ACCEPT)
-- Function: uint16_t * u16_strpbrk (const uint16_t *STR, const
uint16_t *ACCEPT)
-- Function: uint32_t * u32_strpbrk (const uint32_t *STR, const
uint32_t *ACCEPT)
Finds the first occurrence in STR of any character in ACCEPT.
This function is similar to `strpbrk' and `wcspbrk', except that
it operates on Unicode strings.
The following functions determine whether a given Unicode string is a
substring of another Unicode string.
-- Function: uint8_t * u8_strstr (const uint8_t *HAYSTACK, const
uint8_t *NEEDLE)
-- Function: uint16_t * u16_strstr (const uint16_t *HAYSTACK, const
uint16_t *NEEDLE)
-- Function: uint32_t * u32_strstr (const uint32_t *HAYSTACK, const
uint32_t *NEEDLE)
Finds the first occurrence of NEEDLE in HAYSTACK.
This function is similar to `strstr' and `wcsstr', except that it
operates on Unicode strings.
-- Function: bool u8_startswith (const uint8_t *STR, const uint8_t
*PREFIX)
-- Function: bool u16_startswith (const uint16_t *STR, const uint16_t
*PREFIX)
-- Function: bool u32_startswith (const uint32_t *STR, const uint32_t
*PREFIX)
Tests whether STR starts with PREFIX.
-- Function: bool u8_endswith (const uint8_t *STR, const uint8_t
*SUFFIX)
-- Function: bool u16_endswith (const uint16_t *STR, const uint16_t
*SUFFIX)
-- Function: bool u32_endswith (const uint32_t *STR, const uint32_t
*SUFFIX)
Tests whether STR ends with SUFFIX.
The following function does one step in tokenizing a Unicode string.
-- Function: uint8_t * u8_strtok (uint8_t *STR, const uint8_t *DELIM,
uint8_t **PTR)
-- Function: uint16_t * u16_strtok (uint16_t *STR, const uint16_t
*DELIM, uint16_t **PTR)
-- Function: uint32_t * u32_strtok (uint32_t *STR, const uint32_t
*DELIM, uint32_t **PTR)
Divides STR into tokens separated by characters in DELIM.
This function is similar to `strtok_r' and `wcstok', except that
it operates on Unicode strings. Its interface is actually more
similar to `wcstok' than to `strtok'.
File: libunistring.info, Node: uniconv.h, Next: unistdio.h, Prev: unistr.h, Up: Top
5 Conversions between Unicode and encodings `<uniconv.h>'
*********************************************************
This include file declares functions for converting between Unicode
strings and `char *' strings in locale encoding or in other specified
encodings.
The following function returns the locale encoding.
-- Function: const char * locale_charset ()
Determines the current locale's character encoding, and
canonicalizes it into one of the canonical names listed in
`config.charset'. If the canonical name cannot be determined, the
result is a non-canonical name.
The result must not be freed; it is statically allocated.
The result of this function can be used as an argument to the
`iconv_open' function in GNU libc, in GNU libiconv, or in the
gnulib provided wrapper around the native `iconv_open' function.
It may not work as an argument to the native `iconv_open' function
directly.
The handling of unconvertible characters during the conversions can
be parametrized through the following enumeration type:
-- Type: enum iconv_ilseq_handler
This type specifies how unconvertible characters in the input are
handled.
-- Constant: enum iconv_ilseq_handler iconveh_error
This handler causes the function to return with `errno' set to
`EILSEQ'.
-- Constant: enum iconv_ilseq_handler iconveh_question_mark
This handler produces one question mark `?' per unconvertible
character.
-- Constant: enum iconv_ilseq_handler iconveh_escape_sequence
This handler produces an escape sequence `\uXXXX' or `\UXXXXXXXX'
for each unconvertible character.
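The three handlers can be compared with a minimal sketch that renders
a single character in ASCII. The enumeration and the function below
are hypothetical mirrors written for illustration, not the library's
code; in particular, the use of uppercase hexadecimal digits in the
escape sequence is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the three handlers; not the library's enum.  */
enum sketch_handler
{
  SKETCH_ERROR,
  SKETCH_QUESTION_MARK,
  SKETCH_ESCAPE_SEQUENCE
};

/* Writes the ASCII rendering of UC into OUT (at least 11 bytes) and
   returns the number of bytes written, not counting the NUL, or -1 with
   errno set to EILSEQ.  */
static int
sketch_to_ascii (uint32_t uc, enum sketch_handler handler, char *out)
{
  if (uc < 0x80)
    {
      /* Convertible character: copy it through.  */
      out[0] = (char) uc;
      out[1] = '\0';
      return 1;
    }
  switch (handler)
    {
    case SKETCH_QUESTION_MARK:
      out[0] = '?';
      out[1] = '\0';
      return 1;
    case SKETCH_ESCAPE_SEQUENCE:
      if (uc < 0x10000)
        return sprintf (out, "\\u%04X", (unsigned int) uc);
      else
        return sprintf (out, "\\U%08X", (unsigned int) uc);
    default:
      /* SKETCH_ERROR: report the unconvertible character.  */
      errno = EILSEQ;
      return -1;
    }
}
```

With the error handler the caller learns that data would be lost; the
other two handlers trade that signal for a conversion that always
completes.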
The following functions convert between strings in a specified
encoding and Unicode strings.
-- Function: uint8_t * u8_conv_from_encoding (const char *FROMCODE,
enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
SRCLEN, size_t *OFFSETS, uint8_t *RESULTBUF, size_t *LENGTHP)
-- Function: uint16_t * u16_conv_from_encoding (const char *FROMCODE,
enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
SRCLEN, size_t *OFFSETS, uint16_t *RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_conv_from_encoding (const char *FROMCODE,
enum iconv_ilseq_handler HANDLER, const char *SRC, size_t
SRCLEN, size_t *OFFSETS, uint32_t *RESULTBUF, size_t *LENGTHP)
Converts an entire string, possibly including NUL bytes, from one
encoding to UTF-8, UTF-16, or UTF-32 encoding, respectively.
Converts a memory region given in encoding FROMCODE. FROMCODE is
as for the `iconv_open' function.
The input is in the memory region between SRC (inclusive) and `SRC
+ SRCLEN' (exclusive).
If OFFSETS is not NULL, it should point to an array of SRCLEN
integers; this array is filled with offsets into the result, i.e.
the character starting at `SRC[i]' corresponds to the character
starting at `RESULT[OFFSETS[i]]', and other offsets are set to
`(size_t)(-1)'.
`RESULTBUF' and `*LENGTHP' should be a scratch buffer and its
size, or `RESULTBUF' can be NULL.
May erase the contents of the memory at `RESULTBUF'.
If successful: The resulting Unicode string (non-NULL) is returned
and its length stored in `*LENGTHP'. The resulting string is
`RESULTBUF' if no dynamic memory allocation was necessary, or a
freshly allocated memory block otherwise.
In case of error: NULL is returned and `errno' is set. Particular
`errno' values: `EINVAL', `EILSEQ', `ENOMEM'.
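The meaning of the OFFSETS array can be illustrated with a small
sketch. It assumes a well-formed UTF-8 input and a UTF-32 result, so
that every character occupies exactly one unit of the result. The
helper is hypothetical and performs no conversion; it only computes
the offsets that the description above implies.

```c
#include <stddef.h>

/* Hypothetical helper: fills OFFSETS (SRCLEN entries) for a conversion
   from UTF-8 input to a UTF-32 result.  Assumes SRC is well-formed.  */
static void
sketch_fill_offsets (const char *src, size_t srclen, size_t *offsets)
{
  size_t result_index = 0;
  for (size_t i = 0; i < srclen; i++)
    {
      unsigned char byte = (unsigned char) src[i];
      if ((byte & 0xC0) == 0x80)
        /* Continuation byte: no character starts here.  */
        offsets[i] = (size_t) (-1);
      else
        /* A character starts here; it lands at the next result unit.  */
        offsets[i] = result_index++;
    }
}
```

For the four input bytes `a', 0xC3, 0xA9, `z' this produces the offsets
0, 1, `(size_t)(-1)', 2.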
-- Function: char * u8_conv_to_encoding (const char *TOCODE, enum
iconv_ilseq_handler HANDLER, const uint8_t *SRC, size_t
SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
-- Function: char * u16_conv_to_encoding (const char *TOCODE, enum
iconv_ilseq_handler HANDLER, const uint16_t *SRC, size_t
SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
-- Function: char * u32_conv_to_encoding (const char *TOCODE, enum
iconv_ilseq_handler HANDLER, const uint32_t *SRC, size_t
SRCLEN, size_t *OFFSETS, char *RESULTBUF, size_t *LENGTHP)
Converts an entire Unicode string, possibly including NUL units,
from UTF-8, UTF-16, or UTF-32 encoding, respectively, to a given
encoding.
Converts a memory region to encoding TOCODE. TOCODE is as for the
`iconv_open' function.
The input is in the memory region between SRC (inclusive) and `SRC
+ SRCLEN' (exclusive).
If OFFSETS is not NULL, it should point to an array of SRCLEN
integers; this array is filled with offsets into the result, i.e.
the character starting at `SRC[i]' corresponds to the character
starting at `RESULT[OFFSETS[i]]', and other offsets are set to
`(size_t)(-1)'.
`RESULTBUF' and `*LENGTHP' should be a scratch buffer and its
size, or `RESULTBUF' can be NULL.
May erase the contents of the memory at `RESULTBUF'.
If successful: The resulting string (non-NULL) is returned
and its length stored in `*LENGTHP'. The resulting string is
`RESULTBUF' if no dynamic memory allocation was necessary, or a
freshly allocated memory block otherwise.
In case of error: NULL is returned and `errno' is set. Particular
`errno' values: `EINVAL', `EILSEQ', `ENOMEM'.
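The (RESULTBUF, LENGTHP) convention implies that the caller must free
the result only when it is not the scratch buffer. The sketch below
shows both sides of that contract. `sketch_copy' is a hypothetical
function that merely copies its input while following the convention;
it is not a libunistring function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function following the (RESULTBUF, LENGTHP) convention.
   Returns RESULTBUF if the result fits, else a freshly allocated block.  */
static char *
sketch_copy (const char *src, size_t srclen, char *resultbuf,
             size_t *lengthp)
{
  char *result;
  if (resultbuf != NULL && srclen <= *lengthp)
    result = resultbuf;         /* The scratch buffer is large enough.  */
  else
    {
      result = malloc (srclen > 0 ? srclen : 1);
      if (result == NULL)
        return NULL;            /* Allocation failed.  */
    }
  memcpy (result, src, srclen);
  *lengthp = srclen;            /* Report the length of the result.  */
  return result;
}

/* Caller side: free the result only if it is not the scratch buffer.  */
static size_t
sketch_use (const char *src, size_t srclen)
{
  char scratch[8];
  size_t length = sizeof scratch;
  char *result = sketch_copy (src, srclen, scratch, &length);
  if (result == NULL)
    return (size_t) (-1);
  /* ... use result[0] through result[length - 1] here ... */
  if (result != scratch)
    free (result);
  return length;
}
```

Passing a stack buffer this way avoids a heap allocation for short
results while still handling results of any length.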
The following functions convert between NUL terminated strings in a
specified encoding and NUL terminated Unicode strings.
-- Function: uint8_t * u8_strconv_from_encoding (const char *STRING,
const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
-- Function: uint16_t * u16_strconv_from_encoding (const char *STRING,
const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
-- Function: uint32_t * u32_strconv_from_encoding (const char *STRING,
const char *FROMCODE, enum iconv_ilseq_handler HANDLER)
Converts a NUL terminated string from a given encoding.
The result is `malloc' allocated, or NULL (with `errno' set) in case
of error.
Particular `errno' values: `EILSEQ', `ENOMEM'.
-- Function: char * u8_strconv_to_encoding (const uint8_t *STRING,
const char *TOCODE, enum iconv_ilseq_handler HANDLER)
-- Function: char * u16_strconv_to_encoding (const uint16_t *STRING,
const char *TOCODE, enum iconv_ilseq_handler HANDLER)
-- Function: char * u32_strconv_to_encoding (const uint32_t *STRING,
const char *TOCODE, enum iconv_ilseq_handler HANDLER)
Converts a NUL terminated string to a given encoding.
The result is `malloc' allocated, or NULL (with `errno' set) in
case of error.
Particular `errno' values: `EILSEQ', `ENOMEM'.
The following functions are shorthands that convert between NUL
terminated strings in locale encoding and NUL terminated Unicode
strings.
-- Function: uint8_t * u8_strconv_from_locale (const char *STRING)
-- Function: uint16_t * u16_strconv_from_locale (const char *STRING)
-- Function: uint32_t * u32_strconv_from_locale (const char *STRING)
Converts a NUL terminated string from the locale encoding.
The result is `malloc' allocated, or NULL (with `errno' set) in
case of error.
Particular `errno' values: `ENOMEM'.
-- Function: char * u8_strconv_to_locale (const uint8_t *STRING)
-- Function: char * u16_strconv_to_locale (const uint16_t *STRING)
-- Function: char * u32_strconv_to_locale (const uint32_t *STRING)
Converts a NUL terminated string to the locale encoding.
The result is `malloc' allocated, or NULL (with `errno' set) in
case of error.
Particular `errno' values: `ENOMEM'.
File: libunistring.info, Node: unistdio.h, Next: uniname.h, Prev: uniconv.h, Up: Top
6 Output with Unicode strings `<unistdio.h>'
********************************************
This include file declares functions for doing formatted output with
Unicode strings. It defines a set of functions similar to `fprintf' and
`sprintf', which are declared in `<stdio.h>'.
These functions work like the `printf' function family. In the
format string:
* The format directive `U' takes a UTF-8 string (`const uint8_t *').
* The format directive `lU' takes a UTF-16 string (`const uint16_t
*').
* The format directive `llU' takes a UTF-32 string (`const uint32_t
*').
A function name with an infix `v' indicates that a `va_list' is
passed instead of multiple arguments.
The functions `*sprintf' have a BUF argument that is assumed to be
large enough. (_DANGEROUS! Overflowing the buffer will crash the
program._)
The functions `*snprintf' have a BUF argument that is assumed to be
SIZE units large. (_DANGEROUS! The resulting string might be
truncated in the middle of a multibyte character._)
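One way to cope with such truncation in UTF-8 output is to trim the
buffer back to the last complete character. The helper below is a
hypothetical sketch, not part of libunistring, and assumes that the
untruncated string was well-formed UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the first LEN units of a well-formed UTF-8
   string that may have been cut off in the middle of a character,
   returns the length of the longest prefix that ends on a character
   boundary.  */
static size_t
sketch_u8_complete_prefix (const uint8_t *buf, size_t len)
{
  size_t i = len;
  /* Back up over continuation bytes (10xxxxxx) to the last lead byte.  */
  while (i > 0 && (buf[i - 1] & 0xC0) == 0x80)
    i--;
  if (i == 0)
    return 0;
  uint8_t lead = buf[i - 1];
  /* Number of units that the last character needs in total.  */
  size_t need = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
  /* Keep the last character only if all of its units are present.  */
  return (i - 1 + need <= len) ? len : i - 1;
}
```

After a call to one of the `*snprintf' functions, the caller could
store a NUL unit at the returned position to drop a partial character.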
The functions `*asprintf' have a RESULTP argument. The result will
be freshly allocated and stored in `*RESULTP'.
The functions `*asnprintf' have a (RESULTBUF, LENGTHP) argument
pair. If RESULTBUF is not NULL and the result fits into `*LENGTHP'
units, it is put in RESULTBUF, and RESULTBUF is returned. Otherwise, a
freshly allocated string is returned. In both cases, `*LENGTHP' is set
to the length (number of units) of the returned string. In case of
error, NULL is returned and `errno' is set.
The following functions take an ASCII format string and return a
result that is a `char *' string in locale encoding.
-- Function: int ulc_sprintf (char *BUF, const char *FORMAT, ...)
-- Function: int ulc_snprintf (char *BUF, size_t SIZE, const char
*FORMAT, ...)
-- Function: int ulc_asprintf (char **RESULTP, const char *FORMAT, ...)
-- Function: char * ulc_asnprintf (char *RESULTBUF, size_t *LENGTHP,
const char *FORMAT, ...)
-- Function: int ulc_vsprintf (char *BUF, const char *FORMAT, va_list
AP)
-- Function: int ulc_vsnprintf (char *BUF, size_t SIZE, const char
*FORMAT, va_list AP)
-- Function: int ulc_vasprintf (char **RESULTP, const char *FORMAT,
va_list AP)
-- Function: char * ulc_vasnprintf (char *RESULTBUF, size_t *LENGTHP,
const char *FORMAT, va_list AP)
The following functions take an ASCII format string and return a
result in UTF-8 format.
-- Function: int u8_sprintf (uint8_t *BUF, const char *FORMAT, ...)
-- Function: int u8_snprintf (uint8_t *BUF, size_t SIZE, const char
*FORMAT, ...)
-- Function: int u8_asprintf (uint8_t **RESULTP, const char *FORMAT,
...)
-- Function: uint8_t * u8_asnprintf (uint8_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, ...)
-- Function: int u8_vsprintf (uint8_t *BUF, const char *FORMAT,
va_list AP)
-- Function: int u8_vsnprintf (uint8_t *BUF, size_t SIZE, const char
*FORMAT, va_list AP)
-- Function: int u8_vasprintf (uint8_t **RESULTP, const char *FORMAT,
va_list AP)
-- Function: uint8_t * u8_vasnprintf (uint8_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, va_list AP)
The following functions take a UTF-8 format string and return a
result in UTF-8 format.
-- Function: int u8_u8_sprintf (uint8_t *BUF, const uint8_t *FORMAT,
...)
-- Function: int u8_u8_snprintf (uint8_t *BUF, size_t SIZE, const
uint8_t *FORMAT, ...)
-- Function: int u8_u8_asprintf (uint8_t **RESULTP, const uint8_t
*FORMAT, ...)
-- Function: uint8_t * u8_u8_asnprintf (uint8_t *RESULTBUF, size_t
*LENGTHP, const uint8_t *FORMAT, ...)
-- Function: int u8_u8_vsprintf (uint8_t *BUF, const uint8_t *FORMAT,
va_list AP)
-- Function: int u8_u8_vsnprintf (uint8_t *BUF, size_t SIZE, const
uint8_t *FORMAT, va_list AP)
-- Function: int u8_u8_vasprintf (uint8_t **RESULTP, const uint8_t
*FORMAT, va_list AP)
-- Function: uint8_t * u8_u8_vasnprintf (uint8_t *RESULTBUF, size_t
*LENGTHP, const uint8_t *FORMAT, va_list AP)
The following functions take an ASCII format string and return a
result in UTF-16 format.
-- Function: int u16_sprintf (uint16_t *BUF, const char *FORMAT, ...)
-- Function: int u16_snprintf (uint16_t *BUF, size_t SIZE, const char
*FORMAT, ...)
-- Function: int u16_asprintf (uint16_t **RESULTP, const char *FORMAT,
...)
-- Function: uint16_t * u16_asnprintf (uint16_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, ...)
-- Function: int u16_vsprintf (uint16_t *BUF, const char *FORMAT,
va_list AP)
-- Function: int u16_vsnprintf (uint16_t *BUF, size_t SIZE, const char
*FORMAT, va_list AP)
-- Function: int u16_vasprintf (uint16_t **RESULTP, const char
*FORMAT, va_list AP)
-- Function: uint16_t * u16_vasnprintf (uint16_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, va_list AP)
The following functions take a UTF-16 format string and return a
result in UTF-16 format.
-- Function: int u16_u16_sprintf (uint16_t *BUF, const uint16_t
*FORMAT, ...)
-- Function: int u16_u16_snprintf (uint16_t *BUF, size_t SIZE, const
uint16_t *FORMAT, ...)
-- Function: int u16_u16_asprintf (uint16_t **RESULTP, const uint16_t
*FORMAT, ...)
-- Function: uint16_t * u16_u16_asnprintf (uint16_t *RESULTBUF, size_t
*LENGTHP, const uint16_t *FORMAT, ...)
-- Function: int u16_u16_vsprintf (uint16_t *BUF, const uint16_t
*FORMAT, va_list AP)
-- Function: int u16_u16_vsnprintf (uint16_t *BUF, size_t SIZE, const
uint16_t *FORMAT, va_list AP)
-- Function: int u16_u16_vasprintf (uint16_t **RESULTP, const uint16_t
*FORMAT, va_list AP)
-- Function: uint16_t * u16_u16_vasnprintf (uint16_t *RESULTBUF,
size_t *LENGTHP, const uint16_t *FORMAT, va_list AP)
The following functions take an ASCII format string and return a
result in UTF-32 format.
-- Function: int u32_sprintf (uint32_t *BUF, const char *FORMAT, ...)
-- Function: int u32_snprintf (uint32_t *BUF, size_t SIZE, const char
*FORMAT, ...)
-- Function: int u32_asprintf (uint32_t **RESULTP, const char *FORMAT,
...)
-- Function: uint32_t * u32_asnprintf (uint32_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, ...)
-- Function: int u32_vsprintf (uint32_t *BUF, const char *FORMAT,
va_list AP)
-- Function: int u32_vsnprintf (uint32_t *BUF, size_t SIZE, const char
*FORMAT, va_list AP)
-- Function: int u32_vasprintf (uint32_t **RESULTP, const char
*FORMAT, va_list AP)
-- Function: uint32_t * u32_vasnprintf (uint32_t *RESULTBUF, size_t
*LENGTHP, const char *FORMAT, va_list AP)
The following functions take a UTF-32 format string and return a
result in UTF-32 format.
-- Function: int u32_u32_sprintf (uint32_t *BUF, const uint32_t
*FORMAT, ...)
-- Function: int u32_u32_snprintf (uint32_t *BUF, size_t SIZE, const
uint32_t *FORMAT, ...)
-- Function: int u32_u32_asprintf (uint32_t **RESULTP, const uint32_t
*FORMAT, ...)
-- Function: uint32_t * u32_u32_asnprintf (uint32_t *RESULTBUF, size_t
*LENGTHP, const uint32_t *FORMAT, ...)
-- Function: int u32_u32_vsprintf (uint32_t *BUF, const uint32_t
*FORMAT, va_list AP)
-- Function: int u32_u32_vsnprintf (uint32_t *BUF, size_t SIZE, const
uint32_t *FORMAT, va_list AP)
-- Function: int u32_u32_vasprintf (uint32_t **RESULTP, const uint32_t
*FORMAT, va_list AP)
-- Function: uint32_t * u32_u32_vasnprintf (uint32_t *RESULTBUF,
size_t *LENGTHP, const uint32_t *FORMAT, va_list AP)
The following functions take an ASCII format string and produce
output in locale encoding to a `FILE' stream.
-- Function: int ulc_fprintf (FILE *STREAM, const char *FORMAT, ...)
-- Function: int ulc_vfprintf (FILE *STREAM, const char *FORMAT,
va_list AP)
File: libunistring.info, Node: uniname.h, Next: unictype.h, Prev: unistdio.h, Up: Top
7 Names of Unicode characters `<uniname.h>'
*******************************************
This include file implements the association between a Unicode
character and its name.
The name of a Unicode character makes it possible to distinguish it
from other, similar-looking characters. For example, the character `x'
has the name `"LATIN SMALL LETTER X"' and is therefore different from
the character named `"MULTIPLICATION SIGN"'.
-- Macro: unsigned int UNINAME_MAX
This macro expands to a constant that is the required size of the
buffer for a Unicode character name.
-- Function: char * unicode_character_name (ucs4_t UC, char *BUF)
Looks up the name of a Unicode character, in uppercase ASCII. BUF
must point to a buffer, at least `UNINAME_MAX' bytes in size.
Returns the filled BUF, or NULL if the character does not have a
name.
-- Function: ucs4_t unicode_name_character (const char *NAME)
Looks up the Unicode character with a given name, in upper- or
lowercase ASCII. Returns the character if found, or
`UNINAME_INVALID' if not found.
-- Macro: ucs4_t UNINAME_INVALID
This macro expands to a constant that is a special return value of
the `unicode_name_character' function.
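The lookup contract (case-insensitive ASCII name in, character or a
special value out) can be sketched with a toy two-entry table. All
identifiers below are hypothetical, and the sentinel value is an
assumption made for the sketch; the library consults its own complete
name table.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for UNINAME_INVALID.  */
#define SKETCH_INVALID ((uint32_t) 0xFFFF)

struct sketch_entry
{
  const char *name;
  uint32_t uc;
};

/* Toy table holding only the two characters mentioned above.  */
static const struct sketch_entry sketch_table[] = {
  { "LATIN SMALL LETTER X", 0x0078 },
  { "MULTIPLICATION SIGN", 0x00D7 },
};

/* Compares two ASCII strings, ignoring case.  Returns 1 if equal.  */
static int
sketch_equal_nocase (const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (toupper ((unsigned char) *a) != toupper ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* Returns the character named NAME, or SKETCH_INVALID if not found.  */
static uint32_t
sketch_name_character (const char *name)
{
  size_t count = sizeof sketch_table / sizeof sketch_table[0];
  for (size_t i = 0; i < count; i++)
    if (sketch_equal_nocase (name, sketch_table[i].name))
      return sketch_table[i].uc;
  return SKETCH_INVALID;
}
```

A caller compares the result against the special value before using
it as a character.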
File: libunistring.info, Node: unictype.h, Next: uniwidth.h, Prev: uniname.h, Up: Top
8 Unicode character classification and properties `<unictype.h>'
****************************************************************
This include file declares functions that classify Unicode characters
and that test whether Unicode characters have specific properties.
The classification assigns a "general category" to every Unicode
character. This is similar to the classification provided by ISO C in
`<wctype.h>'.
Properties are the data that guides various text processing
algorithms in the presence of specific Unicode characters.
* Menu:
* General category::
* Canonical combining class::
* Bidirectional category::
* Decimal digit value::
* Digit value::
* Numeric value::
* Mirrored character::
* Properties::
* Scripts::
* Blocks::
* ISO C and Java syntax::
* Classifications like in ISO C::
File: libunistring.info, Node: General category, Next: Canonical combining class, Up: unictype.h
8.1 General category
====================
Every Unicode character or code point has a _general category_
assigned to it. This classification is important for most algorithms
that work on Unicode text.
The GNU libunistring library provides two kinds of API for working
with general categories. The object oriented API uses a variable to
denote every predefined general category value or combinations thereof.
The low-level API uses a bit mask instead. The advantage of the object
oriented API is that if only a few predefined general category values
are used, the data tables are relatively small. When you combine
general category values (using `uc_general_category_or',
`uc_general_category_and', or `uc_general_category_and_not'), or when
you use the low-level bit masks, a big table is used that holds the
complete general category information for all Unicode characters.
* Menu:
* Object oriented API::
* Bit mask API::
File: libunistring.info, Node: Object oriented API, Next: Bit mask API, Up: General category
8.1.1 The object oriented API for general category
--------------------------------------------------
-- Type: uc_general_category_t
This data type denotes a general category value. It is an
immediate type that can be copied by simple assignment, without
involving memory allocation. It is not an array type.
The following are the predefined general category values. Additional
general categories may be added in the future.
-- Constant: uc_general_category_t UC_CATEGORY_L
-- Constant: uc_general_category_t UC_CATEGORY_Lu
-- Constant: uc_general_category_t UC_CATEGORY_Ll
-- Constant: uc_general_category_t UC_CATEGORY_Lt
-- Constant: uc_general_category_t UC_CATEGORY_Lm
-- Constant: uc_general_category_t UC_CATEGORY_Lo
-- Constant: uc_general_category_t UC_CATEGORY_M
-- Constant: uc_general_category_t UC_CATEGORY_Mn
-- Constant: uc_general_category_t UC_CATEGORY_Mc
-- Constant: uc_general_category_t UC_CATEGORY_Me
-- Constant: uc_general_category_t UC_CATEGORY_N
-- Constant: uc_general_category_t UC_CATEGORY_Nd
-- Constant: uc_general_category_t UC_CATEGORY_Nl
-- Constant: uc_general_category_t UC_CATEGORY_No
-- Constant: uc_general_category_t UC_CATEGORY_P
-- Constant: uc_general_category_t UC_CATEGORY_Pc
-- Constant: uc_general_category_t UC_CATEGORY_Pd
-- Constant: uc_general_category_t UC_CATEGORY_Ps
-- Constant: uc_general_category_t UC_CATEGORY_Pe
-- Constant: uc_general_category_t UC_CATEGORY_Pi
-- Constant: uc_general_category_t UC_CATEGORY_Pf
-- Constant: uc_general_category_t UC_CATEGORY_Po
-- Constant: uc_general_category_t UC_CATEGORY_S
-- Constant: uc_general_category_t UC_CATEGORY_Sm
-- Constant: uc_general_category_t UC_CATEGORY_Sc
-- Constant: uc_general_category_t UC_CATEGORY_Sk
-- Constant: uc_general_category_t UC_CATEGORY_So
-- Constant: uc_general_category_t UC_CATEGORY_Z
-- Constant: uc_general_category_t UC_CATEGORY_Zs
-- Constant: uc_general_category_t UC_CATEGORY_Zl
-- Constant: uc_general_category_t UC_CATEGORY_Zp
-- Constant: uc_general_category_t UC_CATEGORY_C
-- Constant: uc_general_category_t UC_CATEGORY_Cc
-- Constant: uc_general_category_t UC_CATEGORY_Cf
-- Constant: uc_general_category_t UC_CATEGORY_Cs
-- Constant: uc_general_category_t UC_CATEGORY_Co
-- Constant: uc_general_category_t UC_CATEGORY_Cn
The following are alias names for predefined General category values.
-- Macro: uc_general_category_t UC_LETTER
This is another name for `UC_CATEGORY_L'.
-- Macro: uc_general_category_t UC_UPPERCASE_LETTER
This is another name for `UC_CATEGORY_Lu'.
-- Macro: uc_general_category_t UC_LOWERCASE_LETTER
This is another name for `UC_CATEGORY_Ll'.
-- Macro: uc_general_category_t UC_TITLECASE_LETTER
This is another name for `UC_CATEGORY_Lt'.
-- Macro: uc_general_category_t UC_MODIFIER_LETTER
This is another name for `UC_CATEGORY_Lm'.
-- Macro: uc_general_category_t UC_OTHER_LETTER
This is another name for `UC_CATEGORY_Lo'.
-- Macro: uc_general_category_t UC_MARK
This is another name for `UC_CATEGORY_M'.
-- Macro: uc_general_category_t UC_NON_SPACING_MARK
This is another name for `UC_CATEGORY_Mn'.
-- Macro: uc_general_category_t UC_COMBINING_SPACING_MARK
This is another name for `UC_CATEGORY_Mc'.
-- Macro: uc_general_category_t UC_ENCLOSING_MARK
This is another name for `UC_CATEGORY_Me'.
-- Macro: uc_general_category_t UC_NUMBER
This is another name for `UC_CATEGORY_N'.
-- Macro: uc_general_category_t UC_DECIMAL_DIGIT_NUMBER
This is another name for `UC_CATEGORY_Nd'.
-- Macro: uc_general_category_t UC_LETTER_NUMBER
This is another name for `UC_CATEGORY_Nl'.
-- Macro: uc_general_category_t UC_OTHER_NUMBER
This is another name for `UC_CATEGORY_No'.
-- Macro: uc_general_category_t UC_PUNCTUATION
This is another name for `UC_CATEGORY_P'.
-- Macro: uc_general_category_t UC_CONNECTOR_PUNCTUATION
This is another name for `UC_CATEGORY_Pc'.
-- Macro: uc_general_category_t UC_DASH_PUNCTUATION
This is another name for `UC_CATEGORY_Pd'.
-- Macro: uc_general_category_t UC_OPEN_PUNCTUATION
This is another name for `UC_CATEGORY_Ps' ("start punctuation").
-- Macro: uc_general_category_t UC_CLOSE_PUNCTUATION
This is another name for `UC_CATEGORY_Pe' ("end punctuation").
-- Macro: uc_general_category_t UC_INITIAL_QUOTE_PUNCTUATION
This is another name for `UC_CATEGORY_Pi'.
-- Macro: uc_general_category_t UC_FINAL_QUOTE_PUNCTUATION
This is another name for `UC_CATEGORY_Pf'.
-- Macro: uc_general_category_t UC_OTHER_PUNCTUATION
This is another name for `UC_CATEGORY_Po'.
-- Macro: uc_general_category_t UC_SYMBOL
This is another name for `UC_CATEGORY_S'.
-- Macro: uc_general_category_t UC_MATH_SYMBOL
This is another name for `UC_CATEGORY_Sm'.
-- Macro: uc_general_category_t UC_CURRENCY_SYMBOL
This is another name for `UC_CATEGORY_Sc'.
-- Macro: uc_general_category_t UC_MODIFIER_SYMBOL
This is another name for `UC_CATEGORY_Sk'.
-- Macro: uc_general_category_t UC_OTHER_SYMBOL
This is another name for `UC_CATEGORY_So'.
-- Macro: uc_general_category_t UC_SEPARATOR
This is another name for `UC_CATEGORY_Z'.
-- Macro: uc_general_category_t UC_SPACE_SEPARATOR
This is another name for `UC_CATEGORY_Zs'.
-- Macro: uc_general_category_t UC_LINE_SEPARATOR
This is another name for `UC_CATEGORY_Zl'.
-- Macro: uc_general_category_t UC_PARAGRAPH_SEPARATOR
This is another name for `UC_CATEGORY_Zp'.
-- Macro: uc_general_category_t UC_OTHER
This is another name for `UC_CATEGORY_C'.
-- Macro: uc_general_category_t UC_CONTROL
This is another name for `UC_CATEGORY_Cc'.
-- Macro: uc_general_category_t UC_FORMAT
This is another name for `UC_CATEGORY_Cf'.
-- Macro: uc_general_category_t UC_SURROGATE
This is another name for `UC_CATEGORY_Cs'. All code points in this
category are invalid characters.
-- Macro: uc_general_category_t UC_PRIVATE_USE
This is another name for `UC_CATEGORY_Co'.
-- Macro: uc_general_category_t UC_UNASSIGNED
This is another name for `UC_CATEGORY_Cn'. Some code points in
this category are invalid characters.
The following functions combine general categories, like in a
boolean algebra, except that there is no `not' operation.
-- Function: uc_general_category_t uc_general_category_or
(uc_general_category_t CATEGORY1, uc_general_category_t
CATEGORY2)
Returns the union of two general categories. This corresponds to
the unions of the two sets of characters.
-- Function: uc_general_category_t uc_general_category_and
(uc_general_category_t CATEGORY1, uc_general_category_t
CATEGORY2)
Returns the intersection of two general categories as bit masks.
This _does not_ correspond to the intersection of the two sets of
characters.
-- Function: uc_general_category_t uc_general_category_and_not
(uc_general_category_t CATEGORY1, uc_general_category_t
CATEGORY2)
Returns the intersection of a general category with the complement
of a second general category, as bit masks. This _does not_
correspond to the intersection with complement, when viewing the
categories as sets of characters.
The following functions associate general categories with their name.
-- Function: const char * uc_general_category_name
(uc_general_category_t CATEGORY)
Returns the name of a general category. Returns NULL if the
general category corresponds to a bit mask that does not have a
name.
-- Function: uc_general_category_t uc_general_category_byname (const
char *CATEGORY_NAME)
Returns the general category given by name, e.g. `"Lu"'.
The following functions view general categories as sets of Unicode
characters.
-- Function: uc_general_category_t uc_general_category (ucs4_t UC)
Returns the general category of a Unicode character.
This function uses a big table.
-- Function: bool uc_is_general_category (ucs4_t UC,
uc_general_category_t CATEGORY)
Tests whether a Unicode character belongs to a given category.
The CATEGORY argument can be a predefined general category or the
combination of several predefined general categories.
File: libunistring.info, Node: Bit mask API, Prev: Object oriented API, Up: General category
8.1.2 The bit mask API for general category
-------------------------------------------
The following are the predefined general category values as bit masks.
Additional general categories may be added in the future.
-- Macro: uint32_t UC_CATEGORY_MASK_L
-- Macro: uint32_t UC_CATEGORY_MASK_Lu
-- Macro: uint32_t UC_CATEGORY_MASK_Ll
-- Macro: uint32_t UC_CATEGORY_MASK_Lt
-- Macro: uint32_t UC_CATEGORY_MASK_Lm
-- Macro: uint32_t UC_CATEGORY_MASK_Lo
-- Macro: uint32_t UC_CATEGORY_MASK_M
-- Macro: uint32_t UC_CATEGORY_MASK_Mn
-- Macro: uint32_t UC_CATEGORY_MASK_Mc
-- Macro: uint32_t UC_CATEGORY_MASK_Me
-- Macro: uint32_t UC_CATEGORY_MASK_N
-- Macro: uint32_t UC_CATEGORY_MASK_Nd
-- Macro: uint32_t UC_CATEGORY_MASK_Nl
-- Macro: uint32_t UC_CATEGORY_MASK_No
-- Macro: uint32_t UC_CATEGORY_MASK_P
-- Macro: uint32_t UC_CATEGORY_MASK_Pc
-- Macro: uint32_t UC_CATEGORY_MASK_Pd
-- Macro: uint32_t UC_CATEGORY_MASK_Ps
-- Macro: uint32_t UC_CATEGORY_MASK_Pe
-- Macro: uint32_t UC_CATEGORY_MASK_Pi
-- Macro: uint32_t UC_CATEGORY_MASK_Pf
-- Macro: uint32_t UC_CATEGORY_MASK_Po
-- Macro: uint32_t UC_CATEGORY_MASK_S
-- Macro: uint32_t UC_CATEGORY_MASK_Sm
-- Macro: uint32_t UC_CATEGORY_MASK_Sc
-- Macro: uint32_t UC_CATEGORY_MASK_Sk
-- Macro: uint32_t UC_CATEGORY_MASK_So
-- Macro: uint32_t UC_CATEGORY_MASK_Z
-- Macro: uint32_t UC_CATEGORY_MASK_Zs
-- Macro: uint32_t UC_CATEGORY_MASK_Zl
-- Macro: uint32_t UC_CATEGORY_MASK_Zp
-- Macro: uint32_t UC_CATEGORY_MASK_C
-- Macro: uint32_t UC_CATEGORY_MASK_Cc
-- Macro: uint32_t UC_CATEGORY_MASK_Cf
-- Macro: uint32_t UC_CATEGORY_MASK_Cs
-- Macro: uint32_t UC_CATEGORY_MASK_Co
-- Macro: uint32_t UC_CATEGORY_MASK_Cn
The following function views general categories as sets of Unicode
characters.
-- Function: bool uc_is_general_category_withtable (ucs4_t UC,
uint32_t BITMASK)
Tests whether a Unicode character belongs to a given category.
The BITMASK argument can be a predefined general category bitmask
or the combination of several predefined general category bitmasks.
This function uses a big table comprising all general categories.
File: libunistring.info, Node: Canonical combining class, Next: Bidirectional category, Prev: General category, Up: unictype.h
8.2 Canonical combining class
=============================
Every Unicode character or code point has a _canonical combining
class_ assigned to it.
What is the meaning of the canonical combining class? Essentially,
it indicates the priority with which a combining character is attached
to its base character. The characters for which the canonical
combining class is 0 are the base characters, and the characters for
which it is greater than 0 are the combining characters. Combining
characters are rendered near, attached to, or around their base
character, and combining characters with small combining classes are
attached "first" or "closer" to the base character.
The canonical combining class of a character is a number in the range
0..255. The possible values are described in the Unicode Character
Database `http://www.unicode.org/Public/UNIDATA/UCD.html'. The list
here is not definitive; more values can be added in future versions.
-- Constant: int UC_CCC_NR
The canonical combining class value for "Not Reordered" characters.
The value is 0.
-- Constant: int UC_CCC_OV
The canonical combining class value for "Overlay" characters.
-- Constant: int UC_CCC_NK
The canonical combining class value for "Nukta" characters.
-- Constant: int UC_CCC_KV
The canonical combining class value for "Kana Voicing" characters.
-- Constant: int UC_CCC_VR
The canonical combining class value for "Virama" characters.
-- Constant: int UC_CCC_ATBL
The canonical combining class value for "Attached Below Left"
characters.
-- Constant: int UC_CCC_ATB
The canonical combining class value for "Attached Below"
characters.
-- Constant: int UC_CCC_ATAR
The canonical combining class value for "Attached Above Right"
characters.
-- Constant: int UC_CCC_BL
The canonical combining class value for "Below Left" characters.
-- Constant: int UC_CCC_B
The canonical combining class value for "Below" characters.
-- Constant: int UC_CCC_BR
The canonical combining class value for "Below Right" characters.
-- Constant: int UC_CCC_L
The canonical combining class value for "Left" characters.
-- Constant: int UC_CCC_R
The canonical combining class value for "Right" characters.
-- Constant: int UC_CCC_AL
The canonical combining class value for "Above Left" characters.
-- Constant: int UC_CCC_A
The canonical combining class value for "Above" characters.
-- Constant: int UC_CCC_AR
The canonical combining class value for "Above Right" characters.
-- Constant: int UC_CCC_DB
The canonical combining class value for "Double Below" characters.
-- Constant: int UC_CCC_DA
The canonical combining class value for "Double Above" characters.
-- Constant: int UC_CCC_IS
The canonical combining class value for "Iota Subscript"
characters.
The following function looks up the canonical combining class of a
character.
-- Function: int uc_combining_class (ucs4_t UC)
Returns the canonical combining class of a Unicode character.
File: libunistring.info, Node: Bidirectional category, Next: Decimal digit value, Prev: Canonical combining class, Up: unictype.h
8.3 Bidirectional category
==========================
Every Unicode character or code point has a _bidirectional category_
assigned to it.
The bidirectional category guides the bidirectional algorithm
(`http://www.unicode.org/reports/tr9/'). The possible values are the
following.
-- Constant: int UC_BIDI_L
The bidirectional category for "Left-to-Right" characters.
-- Constant: int UC_BIDI_LRE
The bidirectional category for "Left-to-Right Embedding"
characters.
-- Constant: int UC_BIDI_LRO
The bidirectional category for "Left-to-Right Override" characters.
-- Constant: int UC_BIDI_R
The bidirectional category for "Right-to-Left" characters.
-- Constant: int UC_BIDI_AL
The bidirectional category for "Right-to-Left Arabic" characters.
-- Constant: int UC_BIDI_RLE
The bidirectional category for "Right-to-Left Embedding"
characters.
-- Constant: int UC_BIDI_RLO
The bidirectional category for "Right-to-Left Override" characters.
-- Constant: int UC_BIDI_PDF
The bidirectional category for "Pop Directional Format" characters.
-- Constant: int UC_BIDI_EN
The bidirectional category for "European Number" characters.
-- Constant: int UC_BIDI_ES
The bidirectional category for "European Number Separator"
characters.
-- Constant: int UC_BIDI_ET
The bidirectional category for "European Number Terminator"
characters.
-- Constant: int UC_BIDI_AN
The bidirectional category for "Arabic Number" characters.
-- Constant: int UC_BIDI_CS
The bidirectional category for "Common Number Separator"
characters.
-- Constant: int UC_BIDI_NSM
The bidirectional category for "Non-Spacing Mark" characters.
-- Constant: int UC_BIDI_BN
The bidirectional category for "Boundary Neutral" characters.
-- Constant: int UC_BIDI_B
The bidirectional category for "Paragraph Separator" characters.
-- Constant: int UC_BIDI_S
The bidirectional category for "Segment Separator" characters.
-- Constant: int UC_BIDI_WS
The bidirectional category for "Whitespace" characters.
-- Constant: int UC_BIDI_ON
The bidirectional category for "Other Neutral" characters.
The following functions implement the association between a
bidirectional category and its name.
-- Function: const char * uc_bidi_category_name (int CATEGORY)
Returns the name of a bidirectional category.
-- Function: int uc_bidi_category_byname (const char *CATEGORY_NAME)
Returns the bidirectional category given by name, e.g. `"LRE"'.
The following functions view bidirectional categories as sets of
Unicode characters.
-- Function: int uc_bidi_category (ucs4_t UC)
Returns the bidirectional category of a Unicode character.
-- Function: bool uc_is_bidi_category (ucs4_t UC, int CATEGORY)
Tests whether a Unicode character belongs to a given bidirectional
category.
File: libunistring.info, Node: Decimal digit value, Next: Digit value, Prev: Bidirectional category, Up: unictype.h
8.4 Decimal digit value
=======================
Decimal digits (like the digits from `0' to `9') exist in many
scripts. The following function converts a decimal digit character to
its numerical value.
-- Function: int uc_decimal_value (ucs4_t UC)
Returns the decimal digit value of a Unicode character. The
return value is an integer in the range 0..9, or -1 for characters
that do not represent a decimal digit.
File: libunistring.info, Node: Digit value, Next: Numeric value, Prev: Decimal digit value, Up: unictype.h
8.5 Digit value
===============
Digit characters are like decimal digit characters, possibly in
special forms, such as superscript, subscript, or circled. The
following function converts a digit character to its numerical value.
-- Function: int uc_digit_value (ucs4_t UC)
Returns the digit value of a Unicode character. The return value
is an integer in the range 0..9, or -1 for characters that do not
represent a digit.
File: libunistring.info, Node: Numeric value, Next: Mirrored character, Prev: Digit value, Up: unictype.h
8.6 Numeric value
=================
There are also characters that represent numbers without a digit
system, like the Roman numerals, and fractional numbers, like 1/4 or
3/4.
The following type represents the numeric value of a Unicode
character.
-- Type: uc_fraction_t
This is a structure type with the following fields:
int numerator;
int denominator;
An integer N is represented by `numerator = N', `denominator = 1'.
The following function converts a number character to its numerical
value.
-- Function: uc_fraction_t uc_numeric_value (ucs4_t UC)
Returns the numeric value of a Unicode character. The return
value is a fraction, or the pseudo-fraction `{ 0, 0 }' for
characters that do not represent a number.
File: libunistring.info, Node: Mirrored character, Next: Properties, Prev: Numeric value, Up: unictype.h
8.7 Mirrored character
======================
Character mirroring is used to associate the closing parenthesis
character with the opening parenthesis character, the closing brace
character with the opening brace character, and so on.
The following function looks up the mirrored character of a Unicode
character.
-- Function: bool uc_mirror_char (ucs4_t UC, ucs4_t *PUC)
Stores the mirrored character of a Unicode character UC in `*PUC'
and returns `true', if it exists. Otherwise it stores UC
unmodified in `*PUC' and returns `false'.
File: libunistring.info, Node: Properties, Next: Scripts, Prev: Mirrored character, Up: unictype.h
8.8 Properties
==============
This section defines boolean properties of Unicode characters. A
character either has a given property or does not have it. In other
words, a property can be viewed as a subset of the set of Unicode
characters.
The GNU libunistring library provides two kinds of API for working
with properties. The object oriented API uses a type `uc_property_t'
to designate a property. In the function-based API, which is a bit more
low level, a property is merely a function.
* Menu:
* Properties as objects::
* Properties as functions::
File: libunistring.info, Node: Properties as objects, Next: Properties as functions, Up: Properties
8.8.1 Properties as objects - the object oriented API
-----------------------------------------------------
The following type designates a property on Unicode characters.
-- Type: uc_property_t
This data type denotes a boolean property on Unicode characters.
It is an immediate type that can be copied by simple assignment,
without involving memory allocation. It is not an array type.
Many Unicode properties are predefined.
The following are general properties.
-- Constant: uc_property_t UC_PROPERTY_WHITE_SPACE
-- Constant: uc_property_t UC_PROPERTY_ALPHABETIC
-- Constant: uc_property_t UC_PROPERTY_OTHER_ALPHABETIC
-- Constant: uc_property_t UC_PROPERTY_NOT_A_CHARACTER
-- Constant: uc_property_t UC_PROPERTY_DEFAULT_IGNORABLE_CODE_POINT
-- Constant: uc_property_t
UC_PROPERTY_OTHER_DEFAULT_IGNORABLE_CODE_POINT
-- Constant: uc_property_t UC_PROPERTY_DEPRECATED
-- Constant: uc_property_t UC_PROPERTY_LOGICAL_ORDER_EXCEPTION
-- Constant: uc_property_t UC_PROPERTY_VARIATION_SELECTOR
-- Constant: uc_property_t UC_PROPERTY_PRIVATE_USE
-- Constant: uc_property_t UC_PROPERTY_UNASSIGNED_CODE_VALUE
The following properties are related to case folding.
-- Constant: uc_property_t UC_PROPERTY_UPPERCASE
-- Constant: uc_property_t UC_PROPERTY_OTHER_UPPERCASE
-- Constant: uc_property_t UC_PROPERTY_LOWERCASE
-- Constant: uc_property_t UC_PROPERTY_OTHER_LOWERCASE
-- Constant: uc_property_t UC_PROPERTY_TITLECASE
-- Constant: uc_property_t UC_PROPERTY_SOFT_DOTTED
The following properties are related to identifiers.
-- Constant: uc_property_t UC_PROPERTY_ID_START
-- Constant: uc_property_t UC_PROPERTY_OTHER_ID_START
-- Constant: uc_property_t UC_PROPERTY_ID_CONTINUE
-- Constant: uc_property_t UC_PROPERTY_OTHER_ID_CONTINUE
-- Constant: uc_property_t UC_PROPERTY_XID_START
-- Constant: uc_property_t UC_PROPERTY_XID_CONTINUE
-- Constant: uc_property_t UC_PROPERTY_PATTERN_WHITE_SPACE
-- Constant: uc_property_t UC_PROPERTY_PATTERN_SYNTAX
The following properties have an influence on shaping and rendering.
-- Constant: uc_property_t UC_PROPERTY_JOIN_CONTROL
-- Constant: uc_property_t UC_PROPERTY_GRAPHEME_BASE
-- Constant: uc_property_t UC_PROPERTY_GRAPHEME_EXTEND
-- Constant: uc_property_t UC_PROPERTY_OTHER_GRAPHEME_EXTEND
-- Constant: uc_property_t UC_PROPERTY_GRAPHEME_LINK
The following properties relate to bidirectional reordering.
-- Constant: uc_property_t UC_PROPERTY_BIDI_CONTROL
-- Constant: uc_property_t UC_PROPERTY_BIDI_LEFT_TO_RIGHT
-- Constant: uc_property_t UC_PROPERTY_BIDI_HEBREW_RIGHT_TO_LEFT
-- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_RIGHT_TO_LEFT
-- Constant: uc_property_t UC_PROPERTY_BIDI_EUROPEAN_DIGIT
-- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_BIDI_EUR_NUM_TERMINATOR
-- Constant: uc_property_t UC_PROPERTY_BIDI_ARABIC_DIGIT
-- Constant: uc_property_t UC_PROPERTY_BIDI_COMMON_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_BIDI_BLOCK_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_BIDI_SEGMENT_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_BIDI_WHITESPACE
-- Constant: uc_property_t UC_PROPERTY_BIDI_NON_SPACING_MARK
-- Constant: uc_property_t UC_PROPERTY_BIDI_BOUNDARY_NEUTRAL
-- Constant: uc_property_t UC_PROPERTY_BIDI_PDF
-- Constant: uc_property_t UC_PROPERTY_BIDI_EMBEDDING_OR_OVERRIDE
-- Constant: uc_property_t UC_PROPERTY_BIDI_OTHER_NEUTRAL
The following properties deal with number representations.
-- Constant: uc_property_t UC_PROPERTY_HEX_DIGIT
-- Constant: uc_property_t UC_PROPERTY_ASCII_HEX_DIGIT
The following properties deal with CJK.
-- Constant: uc_property_t UC_PROPERTY_IDEOGRAPHIC
-- Constant: uc_property_t UC_PROPERTY_UNIFIED_IDEOGRAPH
-- Constant: uc_property_t UC_PROPERTY_RADICAL
-- Constant: uc_property_t UC_PROPERTY_IDS_BINARY_OPERATOR
-- Constant: uc_property_t UC_PROPERTY_IDS_TRINARY_OPERATOR
Other miscellaneous properties are:
-- Constant: uc_property_t UC_PROPERTY_ZERO_WIDTH
-- Constant: uc_property_t UC_PROPERTY_SPACE
-- Constant: uc_property_t UC_PROPERTY_NON_BREAK
-- Constant: uc_property_t UC_PROPERTY_ISO_CONTROL
-- Constant: uc_property_t UC_PROPERTY_FORMAT_CONTROL
-- Constant: uc_property_t UC_PROPERTY_DASH
-- Constant: uc_property_t UC_PROPERTY_HYPHEN
-- Constant: uc_property_t UC_PROPERTY_PUNCTUATION
-- Constant: uc_property_t UC_PROPERTY_LINE_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_PARAGRAPH_SEPARATOR
-- Constant: uc_property_t UC_PROPERTY_QUOTATION_MARK
-- Constant: uc_property_t UC_PROPERTY_SENTENCE_TERMINAL
-- Constant: uc_property_t UC_PROPERTY_TERMINAL_PUNCTUATION
-- Constant: uc_property_t UC_PROPERTY_CURRENCY_SYMBOL
-- Constant: uc_property_t UC_PROPERTY_MATH
-- Constant: uc_property_t UC_PROPERTY_OTHER_MATH
-- Constant: uc_property_t UC_PROPERTY_PAIRED_PUNCTUATION
-- Constant: uc_property_t UC_PROPERTY_LEFT_OF_PAIR
-- Constant: uc_property_t UC_PROPERTY_COMBINING
-- Constant: uc_property_t UC_PROPERTY_COMPOSITE
-- Constant: uc_property_t UC_PROPERTY_DECIMAL_DIGIT
-- Constant: uc_property_t UC_PROPERTY_NUMERIC
-- Constant: uc_property_t UC_PROPERTY_DIACRITIC
-- Constant: uc_property_t UC_PROPERTY_EXTENDER
-- Constant: uc_property_t UC_PROPERTY_IGNORABLE_CONTROL
The following function looks up a property by its name.
-- Function: uc_property_t uc_property_byname (const char
*PROPERTY_NAME)
Returns the property given by name, e.g. `"White space"'. If a
property with the given name exists, the result will satisfy the
`uc_property_is_valid' predicate. Otherwise the result will not
satisfy this predicate and must not be passed to functions that
expect an `uc_property_t' argument.
This function references a big table of all predefined properties.
Its use can significantly increase the size of your application.
-- Function: bool uc_property_is_valid (uc_property_t property)
Returns `true' when the given property is valid, or `false'
otherwise.
The following function views a property as a set of Unicode
characters.
-- Function: bool uc_is_property (ucs4_t UC, uc_property_t PROPERTY)
Tests whether the Unicode character UC has the given property.
File: libunistring.info, Node: Properties as functions, Prev: Properties as objects, Up: Properties
8.8.2 Properties as functions - the functional API
--------------------------------------------------
The following are general properties.
-- Function: bool uc_is_property_white_space (ucs4_t UC)
-- Function: bool uc_is_property_alphabetic (ucs4_t UC)
-- Function: bool uc_is_property_other_alphabetic (ucs4_t UC)
-- Function: bool uc_is_property_not_a_character (ucs4_t UC)
-- Function: bool uc_is_property_default_ignorable_code_point (ucs4_t
UC)
-- Function: bool uc_is_property_other_default_ignorable_code_point
(ucs4_t UC)
-- Function: bool uc_is_property_deprecated (ucs4_t UC)
-- Function: bool uc_is_property_logical_order_exception (ucs4_t UC)
-- Function: bool uc_is_property_variation_selector (ucs4_t UC)
-- Function: bool uc_is_property_private_use (ucs4_t UC)
-- Function: bool uc_is_property_unassigned_code_value (ucs4_t UC)
The following properties are related to case folding.
-- Function: bool uc_is_property_uppercase (ucs4_t UC)
-- Function: bool uc_is_property_other_uppercase (ucs4_t UC)
-- Function: bool uc_is_property_lowercase (ucs4_t UC)
-- Function: bool uc_is_property_other_lowercase (ucs4_t UC)
-- Function: bool uc_is_property_titlecase (ucs4_t UC)
-- Function: bool uc_is_property_soft_dotted (ucs4_t UC)
The following properties are related to identifiers.
-- Function: bool uc_is_property_id_start (ucs4_t UC)
-- Function: bool uc_is_property_other_id_start (ucs4_t UC)
-- Function: bool uc_is_property_id_continue (ucs4_t UC)
-- Function: bool uc_is_property_other_id_continue (ucs4_t UC)
-- Function: bool uc_is_property_xid_start (ucs4_t UC)
-- Function: bool uc_is_property_xid_continue (ucs4_t UC)
-- Function: bool uc_is_property_pattern_white_space (ucs4_t UC)
-- Function: bool uc_is_property_pattern_syntax (ucs4_t UC)
The following properties have an influence on shaping and rendering.
-- Function: bool uc_is_property_join_control (ucs4_t UC)
-- Function: bool uc_is_property_grapheme_base (ucs4_t UC)
-- Function: bool uc_is_property_grapheme_extend (ucs4_t UC)
-- Function: bool uc_is_property_other_grapheme_extend (ucs4_t UC)
-- Function: bool uc_is_property_grapheme_link (ucs4_t UC)
The following properties relate to bidirectional reordering.
-- Function: bool uc_is_property_bidi_control (ucs4_t UC)
-- Function: bool uc_is_property_bidi_left_to_right (ucs4_t UC)
-- Function: bool uc_is_property_bidi_hebrew_right_to_left (ucs4_t UC)
-- Function: bool uc_is_property_bidi_arabic_right_to_left (ucs4_t UC)
-- Function: bool uc_is_property_bidi_european_digit (ucs4_t UC)
-- Function: bool uc_is_property_bidi_eur_num_separator (ucs4_t UC)
-- Function: bool uc_is_property_bidi_eur_num_terminator (ucs4_t UC)
-- Function: bool uc_is_property_bidi_arabic_digit (ucs4_t UC)
-- Function: bool uc_is_property_bidi_common_separator (ucs4_t UC)
-- Function: bool uc_is_property_bidi_block_separator (ucs4_t UC)
-- Function: bool uc_is_property_bidi_segment_separator (ucs4_t UC)
-- Function: bool uc_is_property_bidi_whitespace (ucs4_t UC)
-- Function: bool uc_is_property_bidi_non_spacing_mark (ucs4_t UC)
-- Function: bool uc_is_property_bidi_boundary_neutral (ucs4_t UC)
-- Function: bool uc_is_property_bidi_pdf (ucs4_t UC)
-- Function: bool uc_is_property_bidi_embedding_or_override (ucs4_t UC)
-- Function: bool uc_is_property_bidi_other_neutral (ucs4_t UC)
The following properties deal with number representations.
-- Function: bool uc_is_property_hex_digit (ucs4_t UC)
-- Function: bool uc_is_property_ascii_hex_digit (ucs4_t UC)
The following properties deal with CJK.
-- Function: bool uc_is_property_ideographic (ucs4_t UC)
-- Function: bool uc_is_property_unified_ideograph (ucs4_t UC)
-- Function: bool uc_is_property_radical (ucs4_t UC)
-- Function: bool uc_is_property_ids_binary_operator (ucs4_t UC)
-- Function: bool uc_is_property_ids_trinary_operator (ucs4_t UC)
Other miscellaneous properties are:
-- Function: bool uc_is_property_zero_width (ucs4_t UC)
-- Function: bool uc_is_property_space (ucs4_t UC)
-- Function: bool uc_is_property_non_break (ucs4_t UC)
-- Function: bool uc_is_property_iso_control (ucs4_t UC)
-- Function: bool uc_is_property_format_control (ucs4_t UC)
-- Function: bool uc_is_property_dash (ucs4_t UC)
-- Function: bool uc_is_property_hyphen (ucs4_t UC)
-- Function: bool uc_is_property_punctuation (ucs4_t UC)
-- Function: bool uc_is_property_line_separator (ucs4_t UC)
-- Function: bool uc_is_property_paragraph_separator (ucs4_t UC)
-- Function: bool uc_is_property_quotation_mark (ucs4_t UC)
-- Function: bool uc_is_property_sentence_terminal (ucs4_t UC)
-- Function: bool uc_is_property_terminal_punctuation (ucs4_t UC)
-- Function: bool uc_is_property_currency_symbol (ucs4_t UC)
-- Function: bool uc_is_property_math (ucs4_t UC)
-- Function: bool uc_is_property_other_math (ucs4_t UC)
-- Function: bool uc_is_property_paired_punctuation (ucs4_t UC)
-- Function: bool uc_is_property_left_of_pair (ucs4_t UC)
-- Function: bool uc_is_property_combining (ucs4_t UC)
-- Function: bool uc_is_property_composite (ucs4_t UC)
-- Function: bool uc_is_property_decimal_digit (ucs4_t UC)
-- Function: bool uc_is_property_numeric (ucs4_t UC)
-- Function: bool uc_is_property_diacritic (ucs4_t UC)
-- Function: bool uc_is_property_extender (ucs4_t UC)
-- Function: bool uc_is_property_ignorable_control (ucs4_t UC)
File: libunistring.info, Node: Scripts, Next: Blocks, Prev: Properties, Up: unictype.h
8.9 Scripts
===========
The Unicode characters are subdivided into scripts.
The following type is used to represent a script:
-- Type: uc_script_t
This data type is a structure type that refers to statically
allocated read-only data. It contains the following fields:
const char *name;
The `name' field contains the name of the script.
The following functions look up a script.
-- Function: const uc_script_t * uc_script (ucs4_t UC)
Returns the script of a Unicode character. Returns NULL if UC
does not belong to any script.
-- Function: const uc_script_t * uc_script_byname (const char
*SCRIPT_NAME)
Returns the script given by its name, e.g. `"HAN"'. Returns NULL
if a script with the given name does not exist.
The following function views a script as a set of Unicode characters.
-- Function: bool uc_is_script (ucs4_t UC, const uc_script_t *SCRIPT)
Tests whether a Unicode character belongs to a given script.
The following gives a global picture of all scripts.
-- Function: void uc_all_scripts (const uc_script_t **SCRIPTS, size_t
*COUNT)
Get the list of all scripts. Stores a pointer to an array of all
scripts in `*SCRIPTS' and the length of this array in `*COUNT'.
File: libunistring.info, Node: Blocks, Next: ISO C and Java syntax, Prev: Scripts, Up: unictype.h
8.10 Blocks
===========
The Unicode characters are subdivided into blocks. A block is an
interval of Unicode code points.
The following type is used to represent a block.
-- Type: uc_block_t
This data type is a structure type that refers to statically
allocated data. It contains the following fields:
ucs4_t start;
ucs4_t end;
const char *name;
The `start' field is the first Unicode code point in the block.
The `end' field is the last Unicode code point in the block.
The `name' field is the name of the block.
The following function looks up a block.
-- Function: const uc_block_t * uc_block (ucs4_t UC)
Returns the block a character belongs to.
The following function views a block as a set of Unicode characters.
-- Function: bool uc_is_block (ucs4_t UC, const uc_block_t *BLOCK)
Tests whether a Unicode character belongs to a given block.
The following gives a global picture of all blocks.
-- Function: void uc_all_blocks (const uc_block_t **BLOCKS, size_t
*COUNT)
Get the list of all blocks. Stores a pointer to an array of all
blocks in `*BLOCKS' and the length of this array in `*COUNT'.
File: libunistring.info, Node: ISO C and Java syntax, Next: Classifications like in ISO C, Prev: Blocks, Up: unictype.h
8.11 ISO C and Java syntax
==========================
The following properties are taken from language standards. The
supported language standards are ISO C 99 and Java.
-- Function: bool uc_is_c_whitespace (ucs4_t UC)
Tests whether a Unicode character is considered whitespace in ISO
C 99.
-- Function: bool uc_is_java_whitespace (ucs4_t UC)
Tests whether a Unicode character is considered whitespace in Java.
The following enumerated values are the possible return values of
the functions `uc_c_ident_category' and `uc_java_ident_category'.
-- Constant: int UC_IDENTIFIER_START
This return value means that the given character is valid as first
or subsequent character in an identifier.
-- Constant: int UC_IDENTIFIER_VALID
This return value means that the given character is valid as
subsequent character only.
-- Constant: int UC_IDENTIFIER_INVALID
This return value means that the given character is not valid in
an identifier.
-- Constant: int UC_IDENTIFIER_IGNORABLE
This return value (only for Java) means that the given character
is ignorable.
The following functions determine whether a given character can be a
constituent of an identifier in the given programming language.
-- Function: int uc_c_ident_category (ucs4_t UC)
Returns the categorization of a Unicode character with respect to
the ISO C 99 identifier syntax.
-- Function: int uc_java_ident_category (ucs4_t UC)
Returns the categorization of a Unicode character with respect to
the Java identifier syntax.
File: libunistring.info, Node: Classifications like in ISO C, Prev: ISO C and Java syntax, Up: unictype.h
8.12 Classifications like in ISO C
==================================
The following character classifications mimic those declared in the
ISO C header files `<ctype.h>' and `<wctype.h>'. These functions are
deprecated, because this set of functions was designed with ASCII in
mind and cannot reflect the more diverse reality of the Unicode
character set. But they can be a quick-and-dirty porting aid when
migrating from `wchar_t' APIs to Unicode strings.
-- Function: bool uc_is_alnum (ucs4_t UC)
Tests for any character for which `uc_is_alpha' or `uc_is_digit' is
true.
-- Function: bool uc_is_alpha (ucs4_t UC)
Tests for any character for which `uc_is_upper' or `uc_is_lower' is
true, or any character that is one of a locale-specific set of
characters for which none of `uc_is_cntrl', `uc_is_digit',
`uc_is_punct', or `uc_is_space' is true.
-- Function: bool uc_is_cntrl (ucs4_t UC)
Tests for any control character.
-- Function: bool uc_is_digit (ucs4_t UC)
Tests for any character that corresponds to a decimal-digit
character.
-- Function: bool uc_is_graph (ucs4_t UC)
Tests for any character for which `uc_is_print' is true and
`uc_is_space' is false.
-- Function: bool uc_is_lower (ucs4_t UC)
Tests for any character that corresponds to a lowercase letter or
is one of a locale-specific set of characters for which none of
`uc_is_cntrl', `uc_is_digit', `uc_is_punct', or `uc_is_space' is
true.
-- Function: bool uc_is_print (ucs4_t UC)
Tests for any printing character.
-- Function: bool uc_is_punct (ucs4_t UC)
Tests for any printing character that is one of a locale-specific
set of characters for which neither `uc_is_space' nor
`uc_is_alnum' is true.
-- Function: bool uc_is_space (ucs4_t UC)
Tests for any character that corresponds to a locale-specific set
of characters for which none of `uc_is_alnum', `uc_is_graph', or
`uc_is_punct' is true.
-- Function: bool uc_is_upper (ucs4_t UC)
Tests for any character that corresponds to an uppercase letter or
is one of a locale-specific set of characters for which none of
`uc_is_cntrl', `uc_is_digit', `uc_is_punct', or `uc_is_space' is
true.
-- Function: bool uc_is_xdigit (ucs4_t UC)
Tests for any character that corresponds to a hexadecimal-digit
character.
-- Function: bool uc_is_blank (ucs4_t UC)
Tests for any character that corresponds to a standard blank
character or a locale-specific set of characters for which
`uc_is_alnum' is false.
File: libunistring.info, Node: uniwidth.h, Next: uniwbrk.h, Prev: unictype.h, Up: Top
9 Display width `<uniwidth.h>'
******************************
This include file declares functions that return the display width,
measured in columns, of characters or strings, when output to a device
that uses non-proportional fonts.
Note that for some rarely used characters the actual fonts or
terminal emulators can use a different width. There is no mechanism
for communicating the display width of characters across a Unix
pseudo-terminal (tty). Also, there are scripts with complex rendering,
like the Indic scripts. For these scripts, there is no such concept as
non-proportional fonts. Therefore the results of these functions
usually work fine on most scripts and on most characters but can fail
to represent the actual display width.
These functions are locale dependent. The ENCODING argument
identifies the encoding (e.g. `"ISO-8859-2"' for Polish).
-- Function: int uc_width (ucs4_t UC, const char *ENCODING)
Determines and returns the number of column positions required for
UC. Returns -1 if UC is a control character that has an influence
on the column position when output.
-- Function: int u8_width (const uint8_t *S, size_t N, const char
*ENCODING)
-- Function: int u16_width (const uint16_t *S, size_t N, const char
*ENCODING)
-- Function: int u32_width (const uint32_t *S, size_t N, const char
*ENCODING)
Determines and returns the number of column positions required for
the first N units (or fewer if S ends before this) in S. This
function ignores control characters in the string.
-- Function: int u8_strwidth (const uint8_t *S, const char *ENCODING)
-- Function: int u16_strwidth (const uint16_t *S, const char *ENCODING)
-- Function: int u32_strwidth (const uint32_t *S, const char *ENCODING)
Determines and returns the number of column positions required for
S. This function ignores control characters in the string.
File: libunistring.info, Node: uniwbrk.h, Next: unilbrk.h, Prev: uniwidth.h, Up: Top
10 Word breaks in strings `<uniwbrk.h>'
***************************************
This include file declares functions for determining where in a
string "words" start and end. Here "words" are not necessarily the
same as entities that can be looked up in dictionaries, but rather
groups of consecutive characters that should not be split by text
processing operations.
* Menu:
* Word breaks in a string::
* Word break property::
File: libunistring.info, Node: Word breaks in a string, Next: Word break property, Up: uniwbrk.h
10.1 Word breaks in a string
============================
The following functions determine the word breaks in a string.
-- Function: void u8_wordbreaks (const uint8_t *S, size_t N, char *P)
-- Function: void u16_wordbreaks (const uint16_t *S, size_t N, char *P)
-- Function: void u32_wordbreaks (const uint32_t *S, size_t N, char *P)
-- Function: void ulc_wordbreaks (const char *S, size_t N, char *P)
Determines the word break points in S, an array of N units, and
stores the result at `P[0..N-1]'.
`P[i] = 1'
means that there is a word boundary between `S[i-1]' and
`S[i]'.
`P[i] = 0'
means that `S[i-1]' and `S[i]' must not be separated.
`P[0]' is always set to 0. If an application wants to consider a
word break to be present at the beginning of the string (before
`S[0]') or at the end of the string (after `S[0..N-1]'), it has to
treat these cases explicitly.
File: libunistring.info, Node: Word break property, Prev: Word breaks in a string, Up: uniwbrk.h
10.2 Word break property
========================
This is a more low-level API. The word break property is a property
defined in Unicode Standard Annex #29, section "Word Boundaries", see
`http://www.unicode.org/reports/tr29/#Word_Boundaries'. It is used for
determining the word breaks in a string.
The following are the possible values of the word break property.
More values may be added in the future.
-- Constant: int WBP_OTHER
-- Constant: int WBP_CR
-- Constant: int WBP_LF
-- Constant: int WBP_NEWLINE
-- Constant: int WBP_EXTEND
-- Constant: int WBP_FORMAT
-- Constant: int WBP_KATAKANA
-- Constant: int WBP_ALETTER
-- Constant: int WBP_MIDNUMLET
-- Constant: int WBP_MIDLETTER
-- Constant: int WBP_MIDNUM
-- Constant: int WBP_NUMERIC
-- Constant: int WBP_EXTENDNUMLET
The following function looks up the word break property of a
character.
-- Function: int uc_wordbreak_property (ucs4_t UC)
Returns the Word_Break property of a Unicode character.
File: libunistring.info, Node: unilbrk.h, Next: uninorm.h, Prev: uniwbrk.h, Up: Top
11 Line breaking `<unilbrk.h>'
******************************
This include file declares functions for determining where in a
string line breaks could or should be introduced, in order to make the
displayed string fit into a column of given width.
These functions are locale dependent. The ENCODING argument
identifies the encoding (e.g. `"ISO-8859-2"' for Polish).
The following enumerated values indicate whether, at a given
position, a line break is possible or not. Given a string S as an
array `S[0..N-1]' and a position I, the values have the following
meanings:
-- Constant: int UC_BREAK_MANDATORY
This value indicates that `S[I]' is a line break character.
-- Constant: int UC_BREAK_POSSIBLE
This value indicates that a line break may be inserted between
`S[I-1]' and `S[I]'.
-- Constant: int UC_BREAK_HYPHENATION
This value indicates that a hyphen and a line break may be
inserted between `S[I-1]' and `S[I]'. But beware of language
dependent hyphenation rules.
-- Constant: int UC_BREAK_PROHIBITED
This value indicates that `S[I-1]' and `S[I]' must not be
separated.
-- Constant: int UC_BREAK_UNDEFINED
This value is not used as a return value; rather, in the
overriding argument of the `u*_width_linebreaks' functions, it
indicates the absence of an override.
The following functions determine the positions at which line breaks
are possible.
-- Function: void u8_possible_linebreaks (const uint8_t *S, size_t N,
const char *ENCODING, char *P)
-- Function: void u16_possible_linebreaks (const uint16_t *S, size_t
N, const char *ENCODING, char *P)
-- Function: void u32_possible_linebreaks (const uint32_t *S, size_t
N, const char *ENCODING, char *P)
-- Function: void ulc_possible_linebreaks (const char *S, size_t N,
const char *ENCODING, char *P)
Determines the line break points in S, and stores the result at
`P[0..N-1]'. Every `P[I]' is assigned one of the values
`UC_BREAK_MANDATORY', `UC_BREAK_POSSIBLE', `UC_BREAK_HYPHENATION',
`UC_BREAK_PROHIBITED'.
The following functions determine where line breaks should be
inserted so that each line fits in a given width, when output to a
device that uses non-proportional fonts.
-- Function: int u8_width_linebreaks (const uint8_t *S, size_t N, int
WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
*OVERRIDE, const char *ENCODING, char *P)
-- Function: int u16_width_linebreaks (const uint16_t *S, size_t N,
int WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
*OVERRIDE, const char *ENCODING, char *P)
-- Function: int u32_width_linebreaks (const uint32_t *S, size_t N,
int WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
*OVERRIDE, const char *ENCODING, char *P)
-- Function: int ulc_width_linebreaks (const char *S, size_t N, int
WIDTH, int START_COLUMN, int AT_END_COLUMNS, const char
*OVERRIDE, const char *ENCODING, char *P)
Chooses the best line breaks, assuming that every character
occupies a width given by the `uc_width' function (see *note
uniwidth.h::).
The string is `S[0..N-1]'.
The maximum number of columns per line is given as WIDTH. The
starting column of the string is given as START_COLUMN. If the
algorithm shall keep room after the last piece, this amount of
room can be given as AT_END_COLUMNS.
OVERRIDE is an optional override; if `OVERRIDE[I] !=
UC_BREAK_UNDEFINED', `OVERRIDE[I]' takes precedence over `P[I]' as
returned by the `u*_possible_linebreaks' function.
The given ENCODING is used for disambiguating widths in `uc_width'.
Returns the column after the end of the string, and stores the
result at `P[0..N-1]'. Every `P[I]' is assigned one of the values
`UC_BREAK_MANDATORY', `UC_BREAK_POSSIBLE', `UC_BREAK_HYPHENATION',
`UC_BREAK_PROHIBITED'. Here the value `UC_BREAK_POSSIBLE'
indicates that a line break _should_ be inserted.
File: libunistring.info, Node: uninorm.h, Next: unicase.h, Prev: unilbrk.h, Up: Top
12 Normalization forms (composition and decomposition) `<uninorm.h>'
********************************************************************
This include file defines functions for transforming Unicode strings
to one of the four normal forms, known as NFC, NFD, NFKC, NFKD. These
transformations involve decomposition and -- for NFC and NFKC --
composition of Unicode characters.
* Menu:
* Decomposition of characters::
* Composition of characters::
* Normalization of strings::
* Normalizing comparisons::
* Normalization of streams::
File: libunistring.info, Node: Decomposition of characters, Next: Composition of characters, Up: uninorm.h
12.1 Decomposition of Unicode characters
========================================
The following enumerated values are the possible types of
decomposition of a Unicode character.
-- Constant: int UC_DECOMP_CANONICAL
Denotes canonical decomposition.
-- Constant: int UC_DECOMP_FONT
UCD marker: `<font>'. Denotes a font variant (e.g. a blackletter
form).
-- Constant: int UC_DECOMP_NOBREAK
UCD marker: `<noBreak>'. Denotes a no-break version of a space or
hyphen.
-- Constant: int UC_DECOMP_INITIAL
UCD marker: `<initial>'. Denotes an initial presentation form
(Arabic).
-- Constant: int UC_DECOMP_MEDIAL
UCD marker: `<medial>'. Denotes a medial presentation form
(Arabic).
-- Constant: int UC_DECOMP_FINAL
UCD marker: `<final>'. Denotes a final presentation form (Arabic).
-- Constant: int UC_DECOMP_ISOLATED
UCD marker: `<isolated>'. Denotes an isolated presentation form
(Arabic).
-- Constant: int UC_DECOMP_CIRCLE
UCD marker: `<circle>'. Denotes an encircled form.
-- Constant: int UC_DECOMP_SUPER
UCD marker: `<super>'. Denotes a superscript form.
-- Constant: int UC_DECOMP_SUB
UCD marker: `<sub>'. Denotes a subscript form.
-- Constant: int UC_DECOMP_VERTICAL
UCD marker: `<vertical>'. Denotes a vertical layout presentation
form.
-- Constant: int UC_DECOMP_WIDE
UCD marker: `<wide>'. Denotes a wide (or zenkaku) compatibility
character.
-- Constant: int UC_DECOMP_NARROW
UCD marker: `<narrow>'. Denotes a narrow (or hankaku)
compatibility character.
-- Constant: int UC_DECOMP_SMALL
UCD marker: `<small>'. Denotes a small variant form (CNS
compatibility).
-- Constant: int UC_DECOMP_SQUARE
UCD marker: `<square>'. Denotes a CJK squared font variant.
-- Constant: int UC_DECOMP_FRACTION
UCD marker: `<fraction>'. Denotes a vulgar fraction form.
-- Constant: int UC_DECOMP_COMPAT
UCD marker: `<compat>'. Denotes an otherwise unspecified
compatibility character.
The following constant denotes the maximum size of decomposition of
a single Unicode character.
-- Macro: unsigned int UC_DECOMPOSITION_MAX_LENGTH
This macro expands to a constant that is the required size of
buffer passed to the `uc_decomposition' and
`uc_canonical_decomposition' functions.
The following functions decompose a Unicode character.
-- Function: int uc_decomposition (ucs4_t UC, int *DECOMP_TAG, ucs4_t
*DECOMPOSITION)
Returns the character decomposition mapping of the Unicode
character UC. DECOMPOSITION must point to an array of at least
`UC_DECOMPOSITION_MAX_LENGTH' `ucs4_t' elements.
When a decomposition exists, `DECOMPOSITION[0..N-1]' and
`*DECOMP_TAG' are filled and N is returned. Otherwise -1 is
returned.
-- Function: int uc_canonical_decomposition (ucs4_t UC, ucs4_t
*DECOMPOSITION)
Returns the canonical character decomposition mapping of the
Unicode character UC. DECOMPOSITION must point to an array of at
least `UC_DECOMPOSITION_MAX_LENGTH' `ucs4_t' elements.
When a decomposition exists, `DECOMPOSITION[0..N-1]' is filled and
N is returned. Otherwise -1 is returned.
File: libunistring.info, Node: Composition of characters, Next: Normalization of strings, Prev: Decomposition of characters, Up: uninorm.h
12.2 Composition of Unicode characters
======================================
The following function composes a Unicode character from two Unicode
characters.
-- Function: ucs4_t uc_composition (ucs4_t UC1, ucs4_t UC2)
Attempts to combine the Unicode characters UC1, UC2. UC1 is known
to have canonical combining class 0.
Returns the combination of UC1 and UC2, if it exists. Returns 0
otherwise.
Not all decompositions can be recombined using this function. See
the Unicode file `CompositionExclusions.txt' for details.
File: libunistring.info, Node: Normalization of strings, Next: Normalizing comparisons, Prev: Composition of characters, Up: uninorm.h
12.3 Normalization of strings
=============================
The Unicode standard defines four normalization forms for Unicode
strings. The following type is used to denote a normalization form.
-- Type: uninorm_t
An object of type `uninorm_t' denotes a Unicode normalization form.
This is a scalar type; its values can be compared with `=='.
The following constants denote the four normalization forms.
-- Macro: uninorm_t UNINORM_NFD
Denotes Normalization form D: canonical decomposition.
-- Macro: uninorm_t UNINORM_NFC
Normalization form C: canonical decomposition, then canonical
composition.
-- Macro: uninorm_t UNINORM_NFKD
Normalization form KD: compatibility decomposition.
-- Macro: uninorm_t UNINORM_NFKC
Normalization form KC: compatibility decomposition, then canonical
composition.
The following functions operate on `uninorm_t' objects.
-- Function: bool uninorm_is_compat_decomposing (uninorm_t NF)
Tests whether the normalization form NF does compatibility
decomposition.
-- Function: bool uninorm_is_composing (uninorm_t NF)
Tests whether the normalization form NF includes canonical
composition.
-- Function: uninorm_t uninorm_decomposing_form (uninorm_t NF)
Returns the decomposing variant of the normalization form NF.
This maps NFC,NFD -> NFD and NFKC,NFKD -> NFKD.
The following functions apply a Unicode normalization form to a
Unicode string.
-- Function: uint8_t * u8_normalize (uninorm_t NF, const uint8_t *S,
size_t N, uint8_t *RESULTBUF, size_t *LENGTHP)
-- Function: uint16_t * u16_normalize (uninorm_t NF, const uint16_t
*S, size_t N, uint16_t *RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_normalize (uninorm_t NF, const uint32_t
*S, size_t N, uint32_t *RESULTBUF, size_t *LENGTHP)
Returns the specified normalization form of a string.
File: libunistring.info, Node: Normalizing comparisons, Next: Normalization of streams, Prev: Normalization of strings, Up: uninorm.h
12.4 Normalizing comparisons
============================
The following functions compare Unicode strings, ignoring differences
in normalization.
-- Function: int u8_normcmp (const uint8_t *S1, size_t N1, const
uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
-- Function: int u16_normcmp (const uint16_t *S1, size_t N1, const
uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
-- Function: int u32_normcmp (const uint32_t *S1, size_t N1, const
uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
Compares S1 and S2, ignoring differences in normalization.
NF must be either `UNINORM_NFD' or `UNINORM_NFKD'.
If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
set.
-- Function: char * u8_normxfrm (const uint8_t *S, size_t N, uninorm_t
NF, char *RESULTBUF, size_t *LENGTHP)
-- Function: char * u16_normxfrm (const uint16_t *S, size_t N,
uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
-- Function: char * u32_normxfrm (const uint32_t *S, size_t N,
uninorm_t NF, char *RESULTBUF, size_t *LENGTHP)
Converts the string S of length N to a NUL-terminated byte
sequence, in such a way that comparing `u8_normxfrm (S1)' and
`u8_normxfrm (S2)' with the `u8_cmp2' function is equivalent to
comparing S1 and S2 with the `u8_normcoll' function.
NF must be either `UNINORM_NFC' or `UNINORM_NFKC'.
-- Function: int u8_normcoll (const uint8_t *S1, size_t N1, const
uint8_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
-- Function: int u16_normcoll (const uint16_t *S1, size_t N1, const
uint16_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
-- Function: int u32_normcoll (const uint32_t *S1, size_t N1, const
uint32_t *S2, size_t N2, uninorm_t NF, int *RESULTP)
Compares S1 and S2, ignoring differences in normalization, using
the collation rules of the current locale.
NF must be either `UNINORM_NFC' or `UNINORM_NFKC'.
If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
set.
File: libunistring.info, Node: Normalization of streams, Prev: Normalizing comparisons, Up: uninorm.h
12.5 Normalization of streams of Unicode characters
===================================================
A "stream of Unicode characters" is essentially a function that
accepts an `ucs4_t' argument repeatedly, optionally combined with a
function that "flushes" the stream.
-- Type: struct uninorm_filter
This is the data type of a stream of Unicode characters that
normalizes its input according to a given normalization form and
passes the normalized character sequence to the encapsulated
stream of Unicode characters.
-- Function: struct uninorm_filter * uninorm_filter_create (uninorm_t
NF, int (*STREAM_FUNC) (void *STREAM_DATA, ucs4_t UC), void
*STREAM_DATA)
Creates and returns a normalization filter for Unicode characters.
The pair (STREAM_FUNC, STREAM_DATA) is the encapsulated stream.
`STREAM_FUNC (STREAM_DATA, UC)' receives the Unicode character UC
and returns 0 if successful, or -1 with `errno' set upon failure.
Returns the new filter, or NULL with `errno' set upon failure.
-- Function: int uninorm_filter_write (struct uninorm_filter *FILTER,
ucs4_t UC)
Stuffs a Unicode character into a normalizing filter. Returns 0
if successful, or -1 with `errno' set upon failure.
-- Function: int uninorm_filter_flush (struct uninorm_filter *FILTER)
Brings data buffered in the filter to its destination, the
encapsulated stream.
Returns 0 if successful, or -1 with `errno' set upon failure.
Note! If after calling this function, additional characters are
written into the filter, the resulting character sequence in the
encapsulated stream will not necessarily be normalized.
-- Function: int uninorm_filter_free (struct uninorm_filter *FILTER)
Brings data buffered in the filter to its destination, the
encapsulated stream, then closes and frees the filter.
Returns 0 if successful, or -1 with `errno' set upon failure.
File: libunistring.info, Node: unicase.h, Next: uniregex.h, Prev: uninorm.h, Up: Top
13 Case mappings `<unicase.h>'
******************************
This include file defines functions for case mapping for Unicode
strings and case insensitive comparison of Unicode strings and C
strings.
These string functions fix the problems that were mentioned in *note
char * strings::, namely, they handle the Croatian LETTER DZ WITH
CARON, the German LATIN SMALL LETTER SHARP S, the Greek sigma and the
Lithuanian i correctly.
* Menu:
* Case mappings of characters::
* Case mappings of strings::
* Case mappings of substrings::
* Case insensitive comparison::
* Case detection::
File: libunistring.info, Node: Case mappings of characters, Next: Case mappings of strings, Up: unicase.h
13.1 Case mappings of characters
================================
The following functions implement case mappings on Unicode
characters -- for those cases only where the result of the mapping is
again a single Unicode character.
These mappings are locale and context independent.
*WARNING!* These functions are not sufficient for languages such as
German, Greek and Lithuanian. Better use the functions below that
treat an entire string at once and are language aware.
-- Function: ucs4_t uc_toupper (ucs4_t UC)
Returns the uppercase mapping of the Unicode character UC.
-- Function: ucs4_t uc_tolower (ucs4_t UC)
Returns the lowercase mapping of the Unicode character UC.
-- Function: ucs4_t uc_totitle (ucs4_t UC)
Returns the titlecase mapping of the Unicode character UC.
The titlecase mapping of a character is to be used when the
character should look like upper case and the following characters
are lower cased.
For most characters, this is the same as the uppercase mapping.
There are only a few characters where the title case variant and the
upper case variant are different. These characters occur in the
Latin writing of the Croatian, Bosnian, and Serbian languages.
Lower case             Title case             Upper case
------------------------------------------------------------------
LATIN SMALL LETTER LJ  LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
                       L WITH SMALL LETTER J  LJ
LATIN SMALL LETTER NJ  LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
                       N WITH SMALL LETTER J  NJ
LATIN SMALL LETTER DZ  LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
                       D WITH SMALL LETTER Z  DZ
LATIN SMALL LETTER     LATIN CAPITAL LETTER   LATIN CAPITAL LETTER
DZ WITH CARON          D WITH SMALL LETTER    DZ WITH CARON
                       Z WITH CARON
File: libunistring.info, Node: Case mappings of strings, Next: Case mappings of substrings, Prev: Case mappings of characters, Up: unicase.h
13.2 Case mappings of strings
=============================
Case mapping should always be performed on entire strings, not on
individual characters. The functions in this section do so.
These functions make it possible to apply a normalization after the
case mapping. The reason is that if you want to treat `ä' and `Ä' the
same, you most often also want to treat the composed and decomposed
forms of such a character, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
and U+0041 LATIN CAPITAL LETTER A U+0308 COMBINING DIAERESIS the same.
The NF argument designates the normalization.
These functions are locale dependent. The ISO639_LANGUAGE argument
identifies the language (e.g. `"tr"' for Turkish). NULL means to use
locale independent case mappings.
-- Function: const char * uc_locale_language ()
Returns the ISO 639 language code of the current locale. Returns
`""' if it is unknown, or in the "C" locale.
-- Function: uint8_t * u8_toupper (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
size_t *LENGTHP)
-- Function: uint16_t * u16_toupper (const uint16_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
*RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_toupper (const uint32_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
*RESULTBUF, size_t *LENGTHP)
Returns the uppercase mapping of a string.
The NF argument identifies the normalization form to apply after
the case-mapping. It can also be NULL, for no normalization.
-- Function: uint8_t * u8_tolower (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
size_t *LENGTHP)
-- Function: uint16_t * u16_tolower (const uint16_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
*RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_tolower (const uint32_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
*RESULTBUF, size_t *LENGTHP)
Returns the lowercase mapping of a string.
The NF argument identifies the normalization form to apply after
the case-mapping. It can also be NULL, for no normalization.
-- Function: uint8_t * u8_totitle (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
size_t *LENGTHP)
-- Function: uint16_t * u16_totitle (const uint16_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
*RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_totitle (const uint32_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
*RESULTBUF, size_t *LENGTHP)
Returns the titlecase mapping of a string.
Mapping to title case means that, in each word, the first cased
character is being mapped to title case and the remaining
characters of the word are being mapped to lower case.
The NF argument identifies the normalization form to apply after
the case-mapping. It can also be NULL, for no normalization.
File: libunistring.info, Node: Case mappings of substrings, Next: Case insensitive comparison, Prev: Case mappings of strings, Up: unicase.h
13.3 Case mappings of substrings
================================
Case mapping of a substring cannot simply be performed by extracting
the substring and then applying the case mapping function to it. This
does not work because case mapping requires some information about the
surrounding characters. The following functions make it possible to apply case
mappings to substrings of a given string, while taking into account the
characters that precede it (the "prefix") and the characters that
follow it (the "suffix").
-- Type: casing_prefix_context_t
This data type denotes the case-mapping context that is given by a
prefix string. It is an immediate type that can be copied by
simple assignment, without involving memory allocation. It is not
an array type.
-- Constant: casing_prefix_context_t unicase_empty_prefix_context
This constant is the case-mapping context that corresponds to an
empty prefix string.
The following functions return `casing_prefix_context_t' objects:
-- Function: casing_prefix_context_t u8_casing_prefix_context (const
uint8_t *S, size_t N)
-- Function: casing_prefix_context_t u16_casing_prefix_context (const
uint16_t *S, size_t N)
-- Function: casing_prefix_context_t u32_casing_prefix_context (const
uint32_t *S, size_t N)
Returns the case-mapping context of a given prefix string.
-- Function: casing_prefix_context_t u8_casing_prefixes_context (const
uint8_t *S, size_t N, casing_prefix_context_t A_CONTEXT)
-- Function: casing_prefix_context_t u16_casing_prefixes_context
(const uint16_t *S, size_t N, casing_prefix_context_t
A_CONTEXT)
-- Function: casing_prefix_context_t u32_casing_prefixes_context
(const uint32_t *S, size_t N, casing_prefix_context_t
A_CONTEXT)
Returns the case-mapping context of the prefix concat(A, S), given
the case-mapping context of the prefix A.
-- Type: casing_suffix_context_t
This data type denotes the case-mapping context that is given by a
suffix string. It is an immediate type that can be copied by
simple assignment, without involving memory allocation. It is not
an array type.
-- Constant: casing_suffix_context_t unicase_empty_suffix_context
This constant is the case-mapping context that corresponds to an
empty suffix string.
The following functions return `casing_suffix_context_t' objects:
-- Function: casing_suffix_context_t u8_casing_suffix_context (const
uint8_t *S, size_t N)
-- Function: casing_suffix_context_t u16_casing_suffix_context (const
uint16_t *S, size_t N)
-- Function: casing_suffix_context_t u32_casing_suffix_context (const
uint32_t *S, size_t N)
Returns the case-mapping context of a given suffix string.
-- Function: casing_suffix_context_t u8_casing_suffixes_context (const
uint8_t *S, size_t N, casing_suffix_context_t A_CONTEXT)
-- Function: casing_suffix_context_t u16_casing_suffixes_context
(const uint16_t *S, size_t N, casing_suffix_context_t
A_CONTEXT)
-- Function: casing_suffix_context_t u32_casing_suffixes_context
(const uint32_t *S, size_t N, casing_suffix_context_t
A_CONTEXT)
Returns the case-mapping context of the suffix concat(S, A), given
the case-mapping context of the suffix A.
The following functions perform a case mapping, considering the
prefix context and the suffix context.
-- Function: uint8_t * u8_ct_toupper (const uint8_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint16_t * u16_ct_toupper (const uint16_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint32_t * u32_ct_toupper (const uint32_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
*LENGTHP)
Returns the uppercase mapping of a string that is surrounded by a
prefix and a suffix.
-- Function: uint8_t * u8_ct_tolower (const uint8_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint16_t * u16_ct_tolower (const uint16_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint32_t * u32_ct_tolower (const uint32_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
*LENGTHP)
Returns the lowercase mapping of a string that is surrounded by a
prefix and a suffix.
-- Function: uint8_t * u8_ct_totitle (const uint8_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint16_t * u16_ct_totitle (const uint16_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint32_t * u32_ct_totitle (const uint32_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
*LENGTHP)
Returns the titlecase mapping of a string that is surrounded by a
prefix and a suffix.
For example, to uppercase the UTF-8 substring between `s +
start_index' and `s + end_index' of a string that extends from `s' to
`s + u8_strlen (s)', you can use the statements
size_t result_length;
uint8_t *result =
u8_ct_toupper (s + start_index, end_index - start_index,
u8_casing_prefix_context (s, start_index),
u8_casing_suffix_context (s + end_index,
u8_strlen (s) - end_index),
iso639_language, NULL, NULL, &result_length);
File: libunistring.info, Node: Case insensitive comparison, Next: Case detection, Prev: Case mappings of substrings, Up: unicase.h
13.4 Case insensitive comparison
================================
The following functions implement comparison that ignores
differences in case and normalization.
-- Function: uint8_t * u8_casefold (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF,
size_t *LENGTHP)
-- Function: uint16_t * u16_casefold (const uint16_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint16_t
*RESULTBUF, size_t *LENGTHP)
-- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
*RESULTBUF, size_t *LENGTHP)
Returns the case folded string.
Comparing `u8_casefold (S1)' and `u8_casefold (S2)' with the
`u8_cmp2' function is equivalent to comparing S1 and S2 with
`u8_casecmp'.
The NF argument identifies the normalization form to apply after
the case-mapping. It can also be NULL, for no normalization.
-- Function: uint8_t * u8_ct_casefold (const uint8_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint8_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint16_t * u16_ct_casefold (const uint16_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint16_t *RESULTBUF, size_t
*LENGTHP)
-- Function: uint32_t * u32_ct_casefold (const uint32_t *S, size_t N,
casing_prefix_context_t PREFIX_CONTEXT,
casing_suffix_context_t SUFFIX_CONTEXT, const char
*ISO639_LANGUAGE, uninorm_t NF, uint32_t *RESULTBUF, size_t
*LENGTHP)
Returns the case folded string. The case folding takes into
account the case mapping contexts of the prefix and suffix strings.
-- Function: int u8_casecmp (const uint8_t *S1, size_t N1, const
uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int u16_casecmp (const uint16_t *S1, size_t N1, const
uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int u32_casecmp (const uint32_t *S1, size_t N1, const
uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int ulc_casecmp (const char *S1, size_t N1, const char
*S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF,
int *RESULTP)
Compares S1 and S2, ignoring differences in case and normalization.
The NF argument identifies the normalization form to apply after
the case-mapping. It can also be NULL, for no normalization.
If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
set.
The following functions additionally take into account the sorting
rules of the current locale.
-- Function: char * u8_casexfrm (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
*LENGTHP)
-- Function: char * u16_casexfrm (const uint16_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
*LENGTHP)
-- Function: char * u32_casexfrm (const uint32_t *S, size_t N, const
char *ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
*LENGTHP)
-- Function: char * ulc_casexfrm (const char *S, size_t N, const char
*ISO639_LANGUAGE, uninorm_t NF, char *RESULTBUF, size_t
*LENGTHP)
Converts the string S of length N to a NUL-terminated byte
sequence, in such a way that comparing `u8_casexfrm (S1)' and
`u8_casexfrm (S2)' with the gnulib function `memcmp2' is
equivalent to comparing S1 and S2 with `u8_casecoll'.
NF must be either `UNINORM_NFC', `UNINORM_NFKC', or NULL for no
normalization.
-- Function: int u8_casecoll (const uint8_t *S1, size_t N1, const
uint8_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int u16_casecoll (const uint16_t *S1, size_t N1, const
uint16_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int u32_casecoll (const uint32_t *S1, size_t N1, const
uint32_t *S2, size_t N2, const char *ISO639_LANGUAGE,
uninorm_t NF, int *RESULTP)
-- Function: int ulc_casecoll (const char *S1, size_t N1, const char
*S2, size_t N2, const char *ISO639_LANGUAGE, uninorm_t NF,
int *RESULTP)
Compares S1 and S2, ignoring differences in case and normalization,
using the collation rules of the current locale.
The NF argument identifies the normalization form to apply after
the case-mapping. It must be either `UNINORM_NFC' or
`UNINORM_NFKC'. It can also be NULL, for no normalization.
If successful, sets `*RESULTP' to -1 if S1 < S2, 0 if S1 = S2, 1
if S1 > S2, and returns 0. Upon failure, returns -1 with `errno'
set.
File: libunistring.info, Node: Case detection, Prev: Case insensitive comparison, Up: unicase.h
13.5 Case detection
===================
The following functions determine whether a Unicode string is
entirely in upper case, or entirely in lower case, or entirely in title
case, or already case-folded.
-- Function: int u8_is_uppercase (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u16_is_uppercase (const uint16_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u32_is_uppercase (const uint32_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
Sets `*RESULTP' to true if mapping NFD(S) to upper case is a
no-op, or to false otherwise, and returns 0. Upon failure,
returns -1 with `errno' set.
-- Function: int u8_is_lowercase (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u16_is_lowercase (const uint16_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u32_is_lowercase (const uint32_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
Sets `*RESULTP' to true if mapping NFD(S) to lower case is a
no-op, or to false otherwise, and returns 0. Upon failure,
returns -1 with `errno' set.
-- Function: int u8_is_titlecase (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u16_is_titlecase (const uint16_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u32_is_titlecase (const uint32_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
Sets `*RESULTP' to true if mapping NFD(S) to title case is a
no-op, or to false otherwise, and returns 0. Upon failure,
returns -1 with `errno' set.
-- Function: int u8_is_casefolded (const uint8_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u16_is_casefolded (const uint16_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u32_is_casefolded (const uint32_t *S, size_t N, const
char *ISO639_LANGUAGE, bool *RESULTP)
Sets `*RESULTP' to true if applying case folding to NFD(S) is a
no-op, or to false otherwise, and returns 0. Upon failure,
returns -1 with `errno' set.
The following functions determine whether case mappings have any
effect on a Unicode string.
-- Function: int u8_is_cased (const uint8_t *S, size_t N, const char
*ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u16_is_cased (const uint16_t *S, size_t N, const char
*ISO639_LANGUAGE, bool *RESULTP)
-- Function: int u32_is_cased (const uint32_t *S, size_t N, const char
*ISO639_LANGUAGE, bool *RESULTP)
Sets `*RESULTP' to true if case matters for S, that is, if mapping
NFD(S) to either upper case or lower case or title case is not a
no-op. Sets `*RESULTP' to false if NFD(S) maps to itself under the
upper case mapping, under the lower case mapping, and under the
title case mapping; in other words, when NFD(S) consists entirely
of caseless characters. Returns 0 if successful. Upon failure,
returns -1 with `errno' set.
File: libunistring.info, Node: uniregex.h, Next: Using the library, Prev: unicase.h, Up: Top
14 Regular expressions `<uniregex.h>'
*************************************
This include file is not yet implemented.
File: libunistring.info, Node: Using the library, Next: More functionality, Prev: uniregex.h, Up: Top
15 Using the library
********************
This chapter explains some practical considerations regarding the
installation of this library and the compiler options that are needed
in order to use it.
* Menu:
* Installation::
* Compiler options::
* Include files::
* Autoconf macro::
* Reporting problems::
File: libunistring.info, Node: Installation, Next: Compiler options, Up: Using the library
15.1 Installation
=================
Before you can use the library, it must be installed. First, you
have to make sure all dependencies are installed. They are listed in
the file `DEPENDENCIES'.
Then you can proceed to build and install the library, as described
in the file `INSTALL'. For installation on Windows systems, please
refer to the file `README.woe32'.
File: libunistring.info, Node: Compiler options, Next: Include files, Prev: Installation, Up: Using the library
15.2 Compiler options
=====================
Let's denote as `LIBUNISTRING_PREFIX' the value of the `--prefix'
option that you passed to `configure' while installing this package.
If you didn't pass any `--prefix' option, then the package is installed
in `/usr/local'.
Let's denote as `LIBUNISTRING_INCLUDEDIR' the directory where the
include files were installed. This is usually the same as
`${LIBUNISTRING_PREFIX}/include', unless you passed an `--includedir'
option to `configure', in which case it is the value of that option.
Let's further denote as `LIBUNISTRING_LIBDIR' the directory where
the library itself was installed. This is the value that you passed
with the `--libdir' option to `configure', or otherwise the same as
`${LIBUNISTRING_PREFIX}/lib'. Recall that when building in 64-bit mode
on a 64-bit GNU/Linux system that supports executables in either 64-bit
mode or 32-bit mode, you should have used the option
`--libdir=${LIBUNISTRING_PREFIX}/lib64'.
So that the compiler finds the include files, you have to pass it the
option `-I${LIBUNISTRING_INCLUDEDIR}'.
So that the compiler finds the library during its linking pass, you
have to pass it the options `-L${LIBUNISTRING_LIBDIR} -lunistring'. On
some systems, in some configurations, you also have to pass options
needed for linking with `libiconv'. The autoconf macro
`gl_LIBUNISTRING' (see *note Autoconf macro::) deals with this
particularity.
File: libunistring.info, Node: Include files, Next: Autoconf macro, Prev: Compiler options, Up: Using the library
15.3 Include files
==================
Most of the include files have been presented in the introduction
(see *note Introduction::) and in the subsequent detailed chapters.
Another include file is `<unistring/version.h>'. It contains the
version number of the libunistring library.
-- Macro: int _LIBUNISTRING_VERSION
This constant contains the version of libunistring that is being
used at compile time. It encodes the major and minor parts of the
version number only. These parts are encoded in the form
`(major<<8) + minor'.
-- Constant: int _libunistring_version
This constant contains the version of libunistring that is being
used at run time. It encodes the major and minor parts of the
version number only. These parts are encoded in the form
`(major<<8) + minor'.
It is possible that `_libunistring_version' is greater than
`_LIBUNISTRING_VERSION'. This can happen when you use `libunistring'
as a shared library, and a newer, binary backward-compatible version
has been installed after your program that uses `libunistring' was
installed.
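The encoding `(major<<8) + minor' can be unpacked with two small
helpers, sketched below. The names `decode_major', `decode_minor' and
`runtime_is_newer' are hypothetical. The helpers take plain `int'
arguments so that the sketch does not depend on the library itself.

```c
/* Hypothetical helpers, not part of libunistring.  They assume the
   encoding (major<<8) + minor described above.  */

/* Extracts the major part of an encoded version number.  */
int
decode_major (int version)
{
  return version >> 8;
}

/* Extracts the minor part of an encoded version number.  */
int
decode_minor (int version)
{
  return version & 0xff;
}

/* Returns 1 if the run time version is greater than the compile time
   version, 0 otherwise.  */
int
runtime_is_newer (int runtime_version, int compile_time_version)
{
  return runtime_version > compile_time_version;
}
```

In a program that includes `<unistring/version.h>' and is linked with
the library, these helpers would be applied to `_LIBUNISTRING_VERSION'
and `_libunistring_version'.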
File: libunistring.info, Node: Autoconf macro, Next: Reporting problems, Prev: Include files, Up: Using the library
15.4 Autoconf macro
===================
GNU Gnulib provides an autoconf macro that tests for the availability
of `libunistring'. It is contained in the Gnulib module
`libunistring', see
`http://www.gnu.org/software/gnulib/MODULES.html#module=libunistring'.
The macro is called `gl_LIBUNISTRING'. It searches for an installed
libunistring. If found, it sets and AC_SUBSTs `HAVE_LIBUNISTRING=yes'
and the `LIBUNISTRING' and `LTLIBUNISTRING' variables and augments the
`CPPFLAGS' variable, and defines the C macro `HAVE_LIBUNISTRING' to 1.
Otherwise, it sets and AC_SUBSTs `HAVE_LIBUNISTRING=no' and
`LIBUNISTRING' and `LTLIBUNISTRING' to empty.
The complexities that `gl_LIBUNISTRING' deals with are the following:
* On some operating systems, in some configurations, libunistring
depends on `libiconv', and the options for linking with libiconv
must be mentioned explicitly on the link command line.
* GNU `libunistring', if installed, is not necessarily already in the
search path (`CPPFLAGS' for the include file search path,
`LDFLAGS' for the library search path).
* GNU `libunistring', if installed, is not necessarily already in the
run time library search path. To avoid the need for setting an
environment variable like `LD_LIBRARY_PATH', the macro adds the
appropriate run time search path options to the `LIBUNISTRING'
variable. This works on most systems.
File: libunistring.info, Node: Reporting problems, Prev: Autoconf macro, Up: Using the library
15.5 Reporting problems
=======================
If you encounter any problem, please don't hesitate to send a
detailed bug report to the `bug-libunistring@gnu.org' mailing list.
Alternatively, you can use the bug tracker at the project page
`https://savannah.gnu.org/projects/libunistring'.
Please always include the version number of this library, and a short
description of your operating system and compilation environment with
corresponding version numbers.
For problems that appear while building and installing
`libunistring', for which you don't find the remedy in the `INSTALL'
file, please include a description of the options that you passed to
the `configure' script.
File: libunistring.info, Node: More functionality, Next: Licenses, Prev: Using the library, Up: Top
16 More advanced functionality
******************************
For bidirectional reordering of strings, we recommend the GNU
FriBidi library: `http://www.fribidi.org/'.
For the rendering of Unicode strings outside of the context of a
given toolkit (KDE/Qt or GNOME/Gtk), we recommend the Pango library:
`http://www.pango.org/'.
File: libunistring.info, Node: Licenses, Next: Index, Prev: More functionality, Up: Top
Appendix A Licenses
*******************
The files of this package are covered by the licenses indicated in
each particular file or directory. Here is a summary:
* The `libunistring' library is covered by the GNU Lesser General
Public License (LGPL). A copy of the license is included in *note
GNU LGPL::.
* This manual is free documentation. It is dually licensed under the
GNU FDL and the GNU GPL. This means that you can redistribute this
manual under either of these two licenses, at your choice.
This manual is covered by the GNU FDL. Permission is granted to
copy, distribute and/or modify this document under the terms of the
GNU Free Documentation License (FDL), either version 1.2 of the
License, or (at your option) any later version published by the
Free Software Foundation (FSF); with no Invariant Sections, with no
Front-Cover Text, and with no Back-Cover Texts. A copy of the
license is included in *note GNU FDL::.
This manual is covered by the GNU GPL. You can redistribute it
and/or modify it under the terms of the GNU General Public License
(GPL), either version 3 of the License, or (at your option) any
later version published by the Free Software Foundation (FSF). A
copy of the license is included in *note GNU GPL::.
* Menu:
* GNU GPL:: GNU General Public License
* GNU LGPL:: GNU Lesser General Public License
* GNU FDL:: GNU Free Documentation License
File: libunistring.info, Node: GNU GPL, Next: GNU LGPL, Up: Licenses
A.1 GNU GENERAL PUBLIC LICENSE
==============================
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Preamble
========
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains
free software for all its users. We, the Free Software Foundation, use
the GNU General Public License for most of our software; it applies
also to any other work released this way by its authors. You can apply
it to your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you
have certain responsibilities if you distribute copies of the software,
or if you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the
manufacturer can do so. This is fundamentally incompatible with the
aim of protecting users' freedom to change the software. The
systematic pattern of such abuse occurs in the area of products for
individuals to use, which is precisely where it is most unacceptable.
Therefore, we have designed this version of the GPL to prohibit the
practice for those products. If such problems arise substantially in
other domains, we stand ready to extend this provision to those domains
in future versions of the GPL, as needed to protect the freedom of
users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
====================
0. Definitions.
"This License" refers to version 3 of the GNU General Public
License.
"Copyright" also means copyright-like laws that apply to other
kinds of works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the
work in a fashion requiring copyright permission, other than the
making of an exact copy. The resulting work is called a "modified
version" of the earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work
based on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it
on a computer or modifying a private copy. Propagation includes
copying, distribution (with or without modification), making
available to the public, and in some countries other activities as
well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user
through a computer network, with no transfer of a copy, is not
conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to
the extent that warranties are provided), that licensees may
convey the work under this License, and how to view a copy of this
License. If the interface presents a list of user commands or
options, such as a menu, a prominent item in the list meets this
criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any
non-source form of a work.
A "Standard Interface" means an interface that either is an
official standard defined by a recognized standards body, or, in
the case of interfaces specified for a particular programming
language, one that is widely used among developers working in that
language.
The "System Libraries" of an executable work include anything,
other than the work as a whole, that (a) is included in the normal
form of packaging a Major Component, but which is not part of that
Major Component, and (b) serves only to enable use of the work
with that Major Component, or to implement a Standard Interface
for which an implementation is available to the public in source
code form. A "Major Component", in this context, means a major
essential component (kernel, window system, and so on) of the
specific operating system (if any) on which the executable work
runs, or a compiler used to produce the work, or an object code
interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including
scripts to control those activities. However, it does not include
the work's System Libraries, or general-purpose tools or generally
available free programs which are used unmodified in performing
those activities but which are not part of the work. For example,
Corresponding Source includes interface definition files
associated with source files for the work, and the source code for
shared libraries and dynamically linked subprograms that the work
is specifically designed to require, such as by intimate data
communication or control flow between those subprograms and other
parts of the work.
The Corresponding Source need not include anything that users can
regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running
a covered work is covered by this License only if the output,
given its content, constitutes a covered work. This License
acknowledges your rights of fair use or other equivalent, as
provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise
remains in force. You may convey covered works to others for the
sole purpose of having them make modifications exclusively for
you, or provide you with facilities for running those works,
provided that you comply with the terms of this License in
conveying all material for which you do not control copyright.
Those thus making or running the covered works for you must do so
exclusively on your behalf, under your direction and control, on
terms that prohibit them from making any copies of your
copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section
10 makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under
article 11 of the WIPO copyright treaty adopted on 20 December
1996, or similar laws prohibiting or restricting circumvention of
such measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such
circumvention is effected by exercising rights under this License
with respect to the covered work, and you disclaim any intention
to limit operation or modification of the work as a means of
enforcing, against the work's users, your or third parties' legal
rights to forbid circumvention of technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the
code; keep intact all notices of the absence of any warranty; and
give all recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these
conditions:
a. The work must carry prominent notices stating that you
modified it, and giving a relevant date.
b. The work must carry prominent notices stating that it is
released under this License and any conditions added under
section 7. This requirement modifies the requirement in
section 4 to "keep intact all notices".
c. You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable
section 7 additional terms, to the whole of the work, and all
its parts, regardless of how they are packaged. This License
gives no permission to license the work in any other way, but
it does not invalidate such permission if you have separately
received it.
d. If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has
interactive interfaces that do not display Appropriate Legal
Notices, your work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered
work, and which are not combined with it such as to form a larger
program, in or on a volume of a storage or distribution medium, is
called an "aggregate" if the compilation and its resulting
copyright are not used to limit the access or legal rights of the
compilation's users beyond what the individual works permit.
Inclusion of a covered work in an aggregate does not cause this
License to apply to the other parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this
License, in one of these ways:
a. Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b. Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for
as long as you offer spare parts or customer support for that
product model, to give anyone who possesses the object code
either (1) a copy of the Corresponding Source for all the
software in the product that is covered by this License, on a
durable physical medium customarily used for software
interchange, for a price no more than your reasonable cost of
physically performing this conveying of source, or (2) access
to copy the Corresponding Source from a network server at no
charge.
c. Convey individual copies of the object code with a copy of
the written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially,
and only if you received the object code with such an offer,
in accord with subsection 6b.
d. Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access
to the Corresponding Source in the same way through the same
place at no further charge. You need not require recipients
to copy the Corresponding Source along with the object code.
If the place to copy the object code is a network server, the
Corresponding Source may be on a different server (operated
by you or a third party) that supports equivalent copying
facilities, provided you maintain clear directions next to
the object code saying where to find the Corresponding Source.
Regardless of what server hosts the Corresponding Source, you
remain obligated to ensure that it is available for as long
as needed to satisfy these requirements.
e. Convey the object code using peer-to-peer transmission,
provided you inform other peers where the object code and
Corresponding Source of the work are being offered to the
general public at no charge under subsection 6d.
A separable portion of the object code, whose source code is
excluded from the Corresponding Source as a System Library, need
not be included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means
any tangible personal property which is normally used for personal,
family, or household purposes, or (2) anything designed or sold for
incorporation into a dwelling. In determining whether a product
is a consumer product, doubtful cases shall be resolved in favor of
coverage. For a particular product received by a particular user,
"normally used" refers to a typical or common use of that class of
product, regardless of the status of the particular user or of the
way in which the particular user actually uses, or expects or is
expected to use, the product. A product is a consumer product
regardless of whether the product has substantial commercial,
industrial or non-consumer uses, unless such uses represent the
only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to
install and execute modified versions of a covered work in that
User Product from a modified version of its Corresponding Source.
The information must suffice to ensure that the continued
functioning of the modified object code is in no case prevented or
interfered with solely because modification has been made.
If you convey an object code work under this section in, or with,
or specifically for use in, a User Product, and the conveying
occurs as part of a transaction in which the right of possession
and use of the User Product is transferred to the recipient in
perpetuity or for a fixed term (regardless of how the transaction
is characterized), the Corresponding Source conveyed under this
section must be accompanied by the Installation Information. But
this requirement does not apply if neither you nor any third party
retains the ability to install modified object code on the User
Product (for example, the work has been installed in ROM).
The requirement to provide Installation Information does not
include a requirement to continue to provide support service,
warranty, or updates for a work that has been modified or
installed by the recipient, or for the User Product in which it
has been modified or installed. Access to a network may be denied
when the modification itself materially and adversely affects the
operation of the network or violates the rules and protocols for
communication across the network.
Corresponding Source conveyed, and Installation Information
provided, in accord with this section must be in a format that is
publicly documented (and with an implementation available to the
public in source code form), and must require no special password
or key for unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of
this License by making exceptions from one or more of its
conditions. Additional permissions that are applicable to the
entire Program shall be treated as though they were included in
this License, to the extent that they are valid under applicable
law. If additional permissions apply only to part of the Program,
that part may be used separately under those permissions, but the
entire Program remains governed by this License without regard to
the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part
of it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material
you add to a covered work, you may (if authorized by the copyright
holders of that material) supplement the terms of this License
with terms:
a. Disclaiming warranty or limiting liability differently from
the terms of sections 15 and 16 of this License; or
b. Requiring preservation of specified reasonable legal notices
or author attributions in that material or in the Appropriate
Legal Notices displayed by works containing it; or
c. Prohibiting misrepresentation of the origin of that material,
or requiring that modified versions of such material be
marked in reasonable ways as different from the original
version; or
d. Limiting the use for publicity purposes of names of licensors
or authors of the material; or
e. Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f. Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified
versions of it) with contractual assumptions of liability to
the recipient, for any liability that these contractual
assumptions directly impose on those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as
you received it, or any part of it, contains a notice stating that
it is governed by this License along with a term that is a further
restriction, you may remove that term. If a license document
contains a further restriction but permits relicensing or
conveying under this License, you may add to a covered work
material governed by the terms of that license document, provided
that the further restriction does not survive such relicensing or
conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in
the form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights
under this License (including any patent licenses granted under
the third paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly
and finally terminates your license, and (b) permanently, if the
copyright holder fails to notify you of the violation by some
reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from
that copyright holder, and you cure the violation prior to 30 days
after your receipt of the notice.
Termination of your rights under this section does not terminate
the licenses of parties who have received copies or rights from
you under this License. If your rights have been terminated and
not permanently reinstated, you do not qualify to receive new
licenses for the same material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer
transmission to receive a copy likewise does not require
acceptance. However, nothing other than this License grants you
permission to propagate or modify any covered work. These actions
infringe copyright if you do not accept this License. Therefore,
by modifying or propagating a covered work, you indicate your
acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not
responsible for enforcing compliance by third parties with this
License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a
covered work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or
could give under the previous paragraph, plus a right to
possession of the Corresponding Source of the work from the
predecessor in interest, if the predecessor has it or can get it
with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you
may not impose a license fee, royalty, or other charge for
exercise of rights granted under this License, and you may not
initiate litigation (including a cross-claim or counterclaim in a
lawsuit) alleging that any patent claim is infringed by making,
using, selling, offering for sale, or importing the Program or any
portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.
The work thus licensed is called the contributor's "contributor
version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner,
permitted by this License, of making, using, or selling its
contributor version, but do not include claims that would be
infringed only as a consequence of further modification of the
contributor version. For purposes of this definition, "control"
includes the right to grant patent sublicenses in a manner
consistent with the requirements of this License.
Each contributor grants you a non-exclusive, worldwide,
royalty-free patent license under the contributor's essential
patent claims, to make, use, sell, offer for sale, import and
otherwise run, modify and propagate the contents of its
contributor version.
In the following three paragraphs, a "patent license" is any
express agreement or commitment, however denominated, not to
enforce a patent (such as an express permission to practice a
patent or covenant not to sue for patent infringement). To
"grant" such a patent license to a party means to make such an
agreement or commitment not to enforce a patent against the party.
If you convey a covered work, knowingly relying on a patent
license, and the Corresponding Source of the work is not available
for anyone to copy, free of charge and under the terms of this
License, through a publicly available network server or other
readily accessible means, then you must either (1) cause the
Corresponding Source to be so available, or (2) arrange to deprive
yourself of the benefit of the patent license for this particular
work, or (3) arrange, in a manner consistent with the requirements
of this License, to extend the patent license to downstream
recipients. "Knowingly relying" means you have actual knowledge
that, but for the patent license, your conveying the covered work
in a country, or your recipient's use of the covered work in a
country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate,
modify or convey a specific copy of the covered work, then the
patent license you grant is automatically extended to all
recipients of the covered work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that
are specifically granted under this License. You may not convey a
covered work if you are a party to an arrangement with a third
party that is in the business of distributing software, under
which you make payment to the third party based on the extent of
your activity of conveying the work, and under which the third
party grants, to any of the parties who would receive the covered
work from you, a discriminatory patent license (a) in connection
with copies of the covered work conveyed by you (or copies made
from those copies), or (b) primarily for and in connection with
specific products or compilations that contain the covered work,
unless you entered into that arrangement, or that patent license
was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order,
agreement or otherwise) that contradict the conditions of this
License, they do not excuse you from the conditions of this
License. If you cannot convey a covered work so as to satisfy
simultaneously your obligations under this License and any other
pertinent obligations, then as a consequence you may not convey it
at all. For example, if you agree to terms that obligate you to
collect a royalty for further conveying from those to whom you
convey the Program, the only way you could satisfy both those
terms and this License would be to refrain entirely from conveying
the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a
single combined work, and to convey the resulting work. The terms
of this License will continue to apply to the part which is the
covered work, but the special requirements of the GNU Affero
General Public License, section 13, concerning interaction through
a network will apply to the combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new
versions of the GNU General Public License from time to time.
Such new versions will be similar in spirit to the present
version, but may differ in detail to address new problems or
concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU
General Public License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that numbered version or of any later version published by the
Free Software Foundation. If the Program does not specify a
version number of the GNU General Public License, you may choose
any version ever published by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that
proxy's public statement of acceptance of a version permanently
authorizes you to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE
COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE
RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE
THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
THE POSSIBILITY OF SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely
approximates an absolute waiver of all civil liability in
connection with the Program, unless a warranty or assumption of
liability accompanies a copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
===========================
How to Apply These Terms to Your New Programs
=============================================
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.

    ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
    Copyright (C) YEAR NAME OF AUTHOR

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or (at
    your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see `http://www.gnu.org/licenses/'.

Also add information on how to contact you by electronic and paper
mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

    PROGRAM Copyright (C) YEAR NAME OF AUTHOR
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License. Of course, your
program's commands might be different; for a GUI interface, you would
use an "about box".
You should also get your employer (if you work as a programmer) or
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. For more information on this, and how to apply and follow
the GNU GPL, see `http://www.gnu.org/licenses/'.
The GNU General Public License does not permit incorporating your
program into proprietary programs. If your program is a subroutine
library, you may consider it more useful to permit linking proprietary
applications with the library. If this is what you want to do, use the
GNU Lesser General Public License instead of this License. But first,
please read `http://www.gnu.org/philosophy/why-not-lgpl.html'.
File: libunistring.info, Node: GNU LGPL, Next: GNU FDL, Prev: GNU GPL, Up: Licenses
A.2 GNU LESSER GENERAL PUBLIC LICENSE
=====================================
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. `http://fsf.org/'
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU
Lesser General Public License, and the "GNU GPL" refers to version
3 of the GNU General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface
provided by the Library, but which is not otherwise based on the
Library. Defining a subclass of a class defined by the Library is
deemed a mode of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the
Library with which the Combined Work was made is also called the
"Linked Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source
code for portions of the Combined Work that, considered in
isolation, are based on the Application, and not on the Linked
Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any
data and utility programs needed for reproducing the Combined Work
from the Application, but excluding the System Libraries of the
Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this
License without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an
Application that uses the facility (other than as an argument
passed when the facility is invoked), then you may convey a copy
of the modified version:
a. under this License, provided that you make a good faith
effort to ensure that, in the event an Application does not
supply the function or data, the facility still operates, and
performs whatever part of its purpose remains meaningful, or
b. under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material
from a header file that is part of the Library. You may convey
such object code under terms of your choice, provided that, if the
incorporated material is not limited to numerical parameters, data
structure layouts and accessors, or small macros, inline functions
and templates (ten or fewer lines in length), you do both of the
following:
a. Give prominent notice with each copy of the object code that
the Library is used in it and that the Library and its use are
covered by this License.
b. Accompany the object code with a copy of the GNU GPL and this
license document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each
of the following:
a. Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b. Accompany the Combined Work with a copy of the GNU GPL and
this license document.
c. For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to
the copies of the GNU GPL and this license document.
d. Do one of the following:
0. Convey the Minimal Corresponding Source under the terms
of this License, and the Corresponding Application Code
in a form suitable for, and under terms that permit, the
user to recombine or relink the Application with a
modified version of the Linked Version to produce a
modified Combined Work, in the manner specified by
section 6 of the GNU GPL for conveying Corresponding
Source.
1. Use a suitable shared library mechanism for linking with
the Library. A suitable mechanism is one that (a) uses
at run time a copy of the Library already present on the
user's computer system, and (b) will operate properly
with a modified version of the Library that is
interface-compatible with the Linked Version.
e. Provide Installation Information, but only if you would
otherwise be required to provide such information under
section 6 of the GNU GPL, and only to the extent that such
information is necessary to install and execute a modified
version of the Combined Work produced by recombining or
relinking the Application with a modified version of the
Linked Version. (If you use option 4d0, the Installation
Information must accompany the Minimal Corresponding Source
and Corresponding Application Code. If you use option 4d1,
you must provide the Installation Information in the manner
specified by section 6 of the GNU GPL for conveying
Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other
library facilities that are not Applications and are not covered
by this License, and convey such a combined library under terms of
your choice, if you do both of the following:
a. Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities, conveyed under the terms of this License.
b. Give prominent notice with the combined library that part of
it is a work based on the Library, and explaining where to
find the accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new
versions of the GNU Lesser General Public License from time to
time. Such new versions will be similar in spirit to the present
version, but may differ in detail to address new problems or
concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered
version of the GNU Lesser General Public License "or any later
version" applies to it, you have the option of following the terms
and conditions either of that published version or of any later
version published by the Free Software Foundation. If the Library
as you received it does not specify a version number of the GNU
Lesser General Public License, you may choose any version of the
GNU Lesser General Public License ever published by the Free
Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License
shall apply, that proxy's public statement of acceptance of any
version is permanent authorization for you to choose that version
for the Library.
File: libunistring.info, Node: GNU FDL, Prev: GNU LGPL, Up: Licenses
A.3 GNU Free Documentation License
==================================
Version 1.3, 3 November 2008
Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
`http://fsf.org/'
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or
noncommercially. Secondarily, this License preserves for the
author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.
It complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for
free software, because free software needs free documentation: a
free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless
of subject matter or whether it is published as a printed book.
We recommend this License principally for works whose purpose is
instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium,
that contains a notice placed by the copyright holder saying it
can be distributed under the terms of this License. Such a notice
grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The
"Document", below, refers to any such manual or work. Any member
of the public is a licensee, and is addressed as "you". You
accept the license if you copy, modify or distribute the work in a
way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could
fall directly within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary Section may not
explain any mathematics.) The relationship could be a matter of
historical connection with the subject or with related matters, or
of legal, commercial, philosophical, ethical or political position
regarding them.
The "Invariant Sections" are certain Secondary Sections whose
titles are designated, as being those of Invariant Sections, in
the notice that says that the Document is released under this
License. If a section does not fit the above definition of
Secondary then it is not allowed to be designated as Invariant.
The Document may contain zero Invariant Sections. If the Document
does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
that says that the Document is released under this License. A
Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images
composed of pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for input to
text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an
otherwise Transparent file format whose markup, or absence of
markup, has been arranged to thwart or discourage subsequent
modification by readers is not Transparent. An image format is
not Transparent if used for any substantial amount of text. A
copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format,
SGML or XML using a publicly available DTD, and
standard-conforming simple HTML, PostScript or PDF designed for
human modification. Examples of transparent image formats include
PNG, XCF and JPG. Opaque formats include proprietary formats that
can be read and edited only by proprietary word processors, SGML or
XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the
material this License requires to appear in the title page. For
works in formats which do not have any title page as such, "Title
Page" means the text near the most prominent appearance of the
work's title, preceding the beginning of the body of the text.
The "publisher" means any person or entity that distributes copies
of the Document to the public.
A section "Entitled XYZ" means a named subunit of the Document
whose title either is precisely XYZ or contains XYZ in parentheses
following text that translates XYZ in another language. (Here XYZ
stands for a specific section name mentioned below, such as
"Acknowledgements", "Dedications", "Endorsements", or "History".)
To "Preserve the Title" of such a section when you modify the
Document means that it remains a section "Entitled XYZ" according
to this definition.
The Document may include Warranty Disclaimers next to the notice
which states that this License applies to the Document. These
Warranty Disclaimers are considered to be included by reference in
this License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and
has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License
applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However,
you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow
the conditions in section 3.
You may also lend copies, under the same conditions stated above,
and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly
have printed covers) of the Document, numbering more than 100, and
the Document's license notice requires Cover Texts, you must
enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and
Back-Cover Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies. The
front cover must present the full title with all words of the
title equally prominent and visible. You may add other material
on the covers in addition. Copying with changes limited to the
covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in
other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto
adjacent pages.
If you publish or distribute Opaque copies of the Document
numbering more than 100, you must either include a
machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a computer-network location from
which the general network-using public has access to download
using public-standard network protocols a complete Transparent
copy of the Document, free of added material. If you use the
latter option, you must take reasonably prudent steps, when you
begin distribution of Opaque copies in quantity, to ensure that
this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you
distribute an Opaque copy (directly or through your agents or
retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of
the Document well before redistributing any large number of
copies, to give them a chance to provide you with an updated
version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document
under the conditions of sections 2 and 3 above, provided that you
release the Modified Version under precisely this License, with
the Modified Version filling the role of the Document, thus
licensing distribution and modification of the Modified Version to
whoever possesses a copy of it. In addition, you must do these
things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title
distinct from that of the Document, and from those of
previous versions (which should, if there were any, be listed
in the History section of the Document). You may use the
same title as a previous version if the original publisher of
that version gives permission.
B. List on the Title Page, as authors, one or more persons or
entities responsible for authorship of the modifications in
the Modified Version, together with at least five of the
principal authors of the Document (all of its principal
authors, if it has fewer than five), unless they release you
from this requirement.
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license
notice giving the public permission to use the Modified
Version under the terms of this License, in the form shown in
the Addendum below.
G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document's
license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title,
and add to it an item stating at least the title, year, new
authors, and publisher of the Modified Version as given on
the Title Page. If there is no section Entitled "History" in
the Document, create one stating the title, year, authors,
and publisher of the Document as given on its Title Page,
then add an item describing the Modified Version as stated in
the previous sentence.
J. Preserve the network location, if any, given in the Document
for public access to a Transparent copy of the Document, and
likewise the network locations given in the Document for
previous versions it was based on. These may be placed in
the "History" section. You may omit a network location for a
work that was published at least four years before the
Document itself, or if the original publisher of the version
it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the
section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section
titles.
M. Delete any section Entitled "Endorsements". Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled
"Endorsements" or to conflict in title with any Invariant
Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no
material copied from the Document, you may at your option
designate some or all of these sections as invariant. To do this,
add their titles to the list of Invariant Sections in the Modified
Version's license notice. These titles must be distinct from any
other section titles.
You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text
has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text,
and a passage of up to 25 words as a Back-Cover Text, to the end
of the list of Cover Texts in the Modified Version. Only one
passage of Front-Cover Text and one of Back-Cover Text may be
added by (or through arrangements made by) any one entity. If the
Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may
replace the old one, on explicit permission from the previous
publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this
License give permission to use their names for publicity for or to
assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under
this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination
all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license notice, and that you preserve all
their Warranty Disclaimers.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name
but different contents, make the title of each such section unique
by adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in
the list of Invariant Sections in the license notice of the
combined work.
In the combination, you must combine any sections Entitled
"History" in the various original documents, forming one section
Entitled "History"; likewise combine any sections Entitled
"Acknowledgements", and any sections Entitled "Dedications". You
must delete all sections Entitled "Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the
documents in all other respects.
You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert
a copy of this License into the extracted document, and follow
this License in all other respects regarding verbatim copying of
that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other
separate and independent documents or works, in or on a volume of
a storage or distribution medium, is called an "aggregate" if the
copyright resulting from the compilation is not used to limit the
legal rights of the compilation's users beyond what the individual
works permit. When the Document is included in an aggregate, this
License does not apply to the other works in the aggregate which
are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half
of the entire aggregate, the Document's Cover Texts may be placed
on covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic
form. Otherwise they must appear on printed covers that bracket
the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section
4. Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also
include the original English version of this License and the
original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of
this License or a notice or disclaimer, the original version will
prevail.
If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to
Preserve its Title (section 1) will typically require changing the
actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense, or distribute it is void,
and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly
and finally terminates your license, and (b) permanently, if the
copyright holder fails to notify you of the violation by some
reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from
that copyright holder, and you cure the violation prior to 30 days
after your receipt of the notice.
Termination of your rights under this section does not terminate
the licenses of parties who have received copies or rights from
you under this License. If your rights have been terminated and
not permanently reinstated, receipt of a copy of some or all of
the same material does not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
`http://www.gnu.org/copyleft/'.
Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If
the Document does not specify a version number of this License,
you may choose any version ever published (not as a draft) by the
Free Software Foundation. If the Document specifies that a proxy
can decide which future versions of this License can be used, that
proxy's public statement of acceptance of a version permanently
authorizes you to choose that version for the Document.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works. A
public wiki that anybody can edit is an example of such a server.
A "Massive Multiauthor Collaboration" (or "MMC") contained in the
site means any set of copyrightable works thus published on the MMC
site.
"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.
"Incorporate" means to publish or republish a Document, in whole or
in part, as part of another Document.
An MMC is "eligible for relicensing" if it is licensed under this
License, and if all works that were first published under this
License somewhere other than this MMC, and subsequently
incorporated in whole or in part into the MMC, (1) had no cover
texts or invariant sections, and (2) were thus incorporated prior
to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the
site under CC-BY-SA on the same site at any time before August 1,
2009, provided the MMC is eligible for relicensing.
ADDENDUM: How to use this License for your documents
====================================================
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:
Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the "with...Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with
the Front-Cover Texts being LIST, and with the Back-Cover Texts
being LIST.
If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License, to
permit their use in free software.
File: libunistring.info, Node: Index, Prev: Licenses, Up: Top
Index
*****