<unistr.h>
This include file declares elementary functions for Unicode strings. It is essentially the equivalent of what <string.h> is for C strings.
The following function is available to verify the integrity of a Unicode string.
This function checks whether a Unicode string is well-formed. It returns NULL if valid, or a pointer to the first invalid unit otherwise.
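To illustrate this contract, here is a minimal standalone sketch of a UTF-8 well-formedness check. The name demo_u8_check is hypothetical and not part of libunistring, and the real function may differ in detail. This sketch rejects invalid lead bytes, missing continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF. It reports the start of the offending sequence as the first invalid unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not libunistring's code.  Returns NULL if the
   n units at s are well-formed UTF-8, else a pointer to the start of
   the first ill-formed sequence. */
const uint8_t *
demo_u8_check (const uint8_t *s, size_t n)
{
  const uint8_t *end = s + n;
  while (s < end)
    {
      uint8_t c = s[0];
      size_t len;
      uint32_t uc, min;     /* min: smallest code point for this length */
      if (c < 0x80)
        { s++; continue; }  /* ASCII, one unit */
      else if (c >= 0xc2 && c < 0xe0)
        { len = 2; uc = c & 0x1f; min = 0x80; }
      else if (c >= 0xe0 && c < 0xf0)
        { len = 3; uc = c & 0x0f; min = 0x800; }
      else if (c >= 0xf0 && c < 0xf5)
        { len = 4; uc = c & 0x07; min = 0x10000; }
      else
        return s;                       /* invalid lead byte */
      if ((size_t) (end - s) < len)
        return s;                       /* truncated sequence */
      for (size_t i = 1; i < len; i++)
        {
          if ((s[i] & 0xc0) != 0x80)
            return s;                   /* missing continuation byte */
          uc = (uc << 6) | (s[i] & 0x3f);
        }
      if (uc < min || (uc >= 0xd800 && uc < 0xe000) || uc > 0x10ffff)
        return s;                       /* overlong, surrogate, too large */
      s += len;
    }
  return NULL;
}
```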
The following functions perform conversions between the different forms of Unicode strings.
Converts a UTF-8 string to a UTF-16 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
Converts a UTF-8 string to a UTF-32 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
Converts a UTF-16 string to a UTF-8 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
Converts a UTF-16 string to a UTF-32 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
Converts a UTF-32 string to a UTF-8 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
Converts a UTF-32 string to a UTF-16 string.
The resultbuf and lengthp arguments are as described in chapter Conventions.
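The following sketch shows one such conversion, UTF-32 to UTF-16, under an assumed reading of the resultbuf/lengthp convention: if resultbuf is non-NULL and the result fits in *lengthp units, the result is stored there; otherwise a freshly malloc'd buffer is returned. In both cases *lengthp receives the result length. On failure, NULL is returned with errno set. The name demo_u32_to_u16 is hypothetical; consult chapter Conventions for the authoritative rules.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not libunistring's code: converts n UTF-32 units
   at s to UTF-16 using the assumed resultbuf/lengthp convention. */
uint16_t *
demo_u32_to_u16 (const uint32_t *s, size_t n,
                 uint16_t *resultbuf, size_t *lengthp)
{
  /* First pass: validate and count the UTF-16 units needed. */
  size_t needed = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t uc = s[i];
      if (uc > 0x10ffff || (uc >= 0xd800 && uc < 0xe000))
        { errno = EILSEQ; return NULL; }   /* not a Unicode character */
      needed += (uc >= 0x10000 ? 2 : 1);
    }
  /* Use the caller's buffer only if it is present and large enough. */
  uint16_t *result = resultbuf;
  if (result == NULL || needed > *lengthp)
    {
      result = malloc ((needed > 0 ? needed : 1) * sizeof (uint16_t));
      if (result == NULL)
        { errno = ENOMEM; return NULL; }
    }
  /* Second pass: encode, splitting supplementary characters into a
     high surrogate and a low surrogate. */
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t uc = s[i];
      if (uc < 0x10000)
        result[j++] = (uint16_t) uc;
      else
        {
          uc -= 0x10000;
          result[j++] = (uint16_t) (0xd800 + (uc >> 10));
          result[j++] = (uint16_t) (0xdc00 + (uc & 0x3ff));
        }
    }
  *lengthp = needed;
  return result;
}
```

With this convention, the caller compares the returned pointer with its own buffer to decide whether the result must be freed.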
The following functions inspect and return details about the first character in a Unicode string.
Returns the length (number of units) of the first character in s, which is at most n units long. Returns 0 if it is the NUL character. Returns -1 upon failure.
This function is similar to mblen, except that it operates on a Unicode string and that s must not be NULL.
Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd, and an appropriate number of units is returned.
The number of available units, n, must be > 0.
This function fails if an invalid sequence of units is encountered at the beginning of s, or if additional units (after the n provided units) would be needed to form a character.
This function is similar to mbtowc, except that it operates on a Unicode string, puc and s must not be NULL, n must be > 0, and the NUL character is not treated specially.
This function is identical to u8_mbtouc/u16_mbtouc/u32_mbtouc.
Earlier versions of this function performed fewer range-checks on the sequence of units.
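As an illustration of the mbtouc-style contract, here is a hypothetical standalone sketch for UTF-16 (demo_u16_mbtouc is not a libunistring name). demo_ucs4_t stands in for libunistring's ucs4_t, a 32-bit unsigned type. On failure this sketch stores U+FFFD and consumes one unit. The "appropriate number of units" the real function returns may differ in some cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_ucs4_t;   /* stand-in for libunistring's ucs4_t */

/* Hypothetical sketch, not libunistring's code.  Decodes the first
   character of the n UTF-16 units at s into *puc. */
int
demo_u16_mbtouc (demo_ucs4_t *puc, const uint16_t *s, size_t n)
{
  assert (puc != NULL && s != NULL && n > 0);   /* stated preconditions */
  uint16_t c = s[0];
  if (c < 0xd800 || c >= 0xe000)
    { *puc = c; return 1; }     /* BMP character; NUL is not special */
  if (c < 0xdc00 && n >= 2 && s[1] >= 0xdc00 && s[1] < 0xe000)
    {
      /* High surrogate followed by low surrogate: one character. */
      *puc = 0x10000 + ((demo_ucs4_t) (c - 0xd800) << 10) + (s[1] - 0xdc00);
      return 2;
    }
  *puc = 0xfffd;                /* lone or truncated surrogate */
  return 1;
}
```

Because the return value is at least 1 even on failure, a decoding loop that advances by the returned count always makes progress.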
Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Upon failure, *puc is set to 0xfffd, and -1 is returned for an invalid sequence of units, -2 is returned for an incomplete sequence of units.
The number of available units, n, must be > 0.
This function is similar to u8_mbtouc, except that the return value gives more details about the failure, similar to mbrtowc.
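The distinction between -1 (invalid) and -2 (incomplete) can be sketched for UTF-8 as follows. demo_u8_mbtoucr and demo_ucs4_t are hypothetical names. This simplified sketch detects overlong forms, surrogates and out-of-range values only once the whole sequence is present. The real function may classify some truncated sequences as invalid earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_ucs4_t;   /* stand-in for libunistring's ucs4_t */

/* Hypothetical sketch, not libunistring's code: -1 = invalid sequence,
   -2 = incomplete sequence (more than n units would be needed). */
int
demo_u8_mbtoucr (demo_ucs4_t *puc, const uint8_t *s, size_t n)
{
  assert (n > 0);
  uint8_t c = s[0];
  int len;
  demo_ucs4_t uc, min;
  if (c < 0x80)
    { *puc = c; return 1; }
  else if (c >= 0xc2 && c < 0xe0) { len = 2; uc = c & 0x1f; min = 0x80; }
  else if (c >= 0xe0 && c < 0xf0) { len = 3; uc = c & 0x0f; min = 0x800; }
  else if (c >= 0xf0 && c < 0xf5) { len = 4; uc = c & 0x07; min = 0x10000; }
  else
    { *puc = 0xfffd; return -1; }           /* invalid lead byte */
  for (int i = 1; i < len; i++)
    {
      if ((size_t) i >= n)
        { *puc = 0xfffd; return -2; }       /* well-formed so far, truncated */
      if ((s[i] & 0xc0) != 0x80)
        { *puc = 0xfffd; return -1; }       /* not a continuation byte */
      uc = (uc << 6) | (s[i] & 0x3f);
    }
  if (uc < min || (uc >= 0xd800 && uc < 0xe000) || uc > 0x10ffff)
    { *puc = 0xfffd; return -1; }           /* overlong, surrogate, too large */
  *puc = uc;
  return len;
}
```

A streaming decoder can treat -2 as "wait for more input" and -1 as a hard error, which is the main benefit over the plain mbtouc contract.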
The following function stores a Unicode character as a Unicode string in memory.
Puts the multibyte character represented by uc in s, returning its length. Returns -1 upon failure, -2 if the number of available units, n, is too small. The latter case cannot occur if n >= 6, 2 or 1 for UTF-8, UTF-16 or UTF-32, respectively.
This function is similar to wctomb, except that it operates on a Unicode string, s must not be NULL, and the argument n must be specified.
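A UTF-8 encoder following this return-value contract might look like the sketch below. demo_u8_uctomb and demo_ucs4_t are hypothetical names, and the parameter types are choices of this sketch. It fails with -1 for surrogates and values above U+10FFFF, and with -2 when fewer than the required units are available.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_ucs4_t;   /* stand-in for libunistring's ucs4_t */

/* Hypothetical sketch, not libunistring's code. */
int
demo_u8_uctomb (uint8_t *s, demo_ucs4_t uc, ptrdiff_t n)
{
  int len;
  if (uc < 0x80)
    len = 1;
  else if (uc < 0x800)
    len = 2;
  else if (uc < 0x10000)
    {
      if (uc >= 0xd800 && uc < 0xe000)
        return -1;                   /* surrogates are not characters */
      len = 3;
    }
  else if (uc < 0x110000)
    len = 4;
  else
    return -1;                       /* beyond U+10FFFF */
  if (n < len)
    return -2;                       /* caller's buffer is too small */
  if (len == 1)
    { s[0] = (uint8_t) uc; return 1; }
  /* Fill continuation bytes from the end, 6 bits each. */
  for (int i = len - 1; i > 0; i--)
    {
      s[i] = (uint8_t) (0x80 | (uc & 0x3f));
      uc >>= 6;
    }
  static const uint8_t lead[5] = { 0, 0, 0xc0, 0xe0, 0xf0 };
  s[0] = (uint8_t) (lead[len] | uc);  /* remaining high bits */
  return len;
}
```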
The following functions copy Unicode strings in memory.
Copies n units from src to dest.
This function is similar to memcpy, except that it operates on Unicode strings.
Copies n units from src to dest, returning a pointer after the last written unit.
This function is similar to mempcpy, except that it operates on Unicode strings.
Copies n units from src to dest, guaranteeing correct behavior for overlapping memory areas.
This function is similar to memmove, except that it operates on Unicode strings.
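The key difference from the <string.h> originals is that n counts units, not bytes. The hypothetical UTF-16 sketches below (not libunistring's code) scale n by the unit size before calling memcpy or memmove. They assume that the copy and move variants return dest, as memcpy and memmove do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketches, not libunistring's code. */
uint16_t *
demo_u16_cpy (uint16_t *dest, const uint16_t *src, size_t n)
{
  memcpy (dest, src, n * sizeof (uint16_t));   /* n units -> 2n bytes */
  return dest;
}

uint16_t *
demo_u16_pcpy (uint16_t *dest, const uint16_t *src, size_t n)
{
  memcpy (dest, src, n * sizeof (uint16_t));
  return dest + n;                             /* after last written unit */
}

uint16_t *
demo_u16_move (uint16_t *dest, const uint16_t *src, size_t n)
{
  memmove (dest, src, n * sizeof (uint16_t));  /* safe when areas overlap */
  return dest;
}
```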
The following function fills a Unicode string.
Sets the first n characters of s to uc. uc should be a character that occupies only 1 unit.
This function is similar to memset, except that it operates on Unicode strings.
The following function compares two Unicode strings of the same length.
Compares s1 and s2, each of length n, lexicographically. Returns a negative value if s1 compares smaller than s2, a positive value if s1 compares larger than s2, or 0 if they compare equal.
This function is similar to memcmp, except that it operates on Unicode strings.
The following function compares two Unicode strings of possibly different lengths.
Compares s1 and s2, lexicographically. Returns a negative value if s1 compares smaller than s2, a positive value if s1 compares larger than s2, or 0 if they compare equal.
This function is similar to the gnulib function memcmp2, except that it operates on Unicode strings.
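For UTF-8, comparing unit by unit gives the same order as comparing code points, so a comparison of strings with different lengths can be sketched as below. First the common prefix is compared, then the shorter string sorts first if the prefix is equal. demo_u8_cmp2 is a hypothetical name, not libunistring's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not libunistring's code. */
int
demo_u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2)
{
  size_t n = n1 < n2 ? n1 : n2;
  /* Compare the common prefix. */
  for (size_t i = 0; i < n; i++)
    if (s1[i] != s2[i])
      return s1[i] < s2[i] ? -1 : 1;
  /* Equal prefix: the shorter string compares smaller. */
  return n1 < n2 ? -1 : n1 > n2 ? 1 : 0;
}
```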
The following function searches for a given Unicode character.
Searches the string at s for uc. Returns a pointer to the first occurrence of uc in s, or NULL if uc does not occur in s.
This function is similar to memchr, except that it operates on Unicode strings.
The following function counts the number of Unicode characters.
Counts and returns the number of Unicode characters in the n units from s.
This function is similar to the gnulib function mbsnlen, except that it operates on Unicode strings.
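The difference between units and characters is clearest in UTF-16, where a surrogate pair is two units but one character. The hypothetical sketch below (demo_u16_mbsnlen, not libunistring's code) counts characters. The text above does not say how malformed units are counted, so this sketch assumes each one counts as a single character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not libunistring's code. */
size_t
demo_u16_mbsnlen (const uint16_t *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; count++)
    {
      /* A high surrogate followed by a low surrogate is one character. */
      if (s[i] >= 0xd800 && s[i] < 0xdc00
          && i + 1 < n && s[i + 1] >= 0xdc00 && s[i + 1] < 0xe000)
        i += 2;
      else
        i += 1;   /* BMP character, or a malformed unit on its own */
    }
  return count;
}
```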
The following function copies a Unicode string.
Makes a freshly allocated copy of s, of length n.
The following functions inspect and return details about the first character in a NUL-terminated Unicode string.
Returns the length (number of units) of the first character in s. Returns 0 if it is the NUL character. Returns -1 upon failure.
Returns the length (number of units) of the first character in s, putting its ucs4_t representation in *puc. Returns 0 if it is the NUL character. Returns -1 upon failure.
Forward iteration step. Advances the pointer past the next character, or returns NULL if the end of the string has been reached. Puts the character's ucs4_t representation in *puc.
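The typical use is a loop that calls the step function until it returns NULL. Below is a hypothetical sketch of such a step for NUL-terminated UTF-16, together with the loop shape in the usage note. demo_u16_next and demo_ucs4_t are not libunistring names. The treatment of malformed units (store U+FFFD and stop) and storing 0 at the end are choices of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_ucs4_t;   /* stand-in for libunistring's ucs4_t */

/* Hypothetical sketch, not libunistring's code. */
const uint16_t *
demo_u16_next (demo_ucs4_t *puc, const uint16_t *s)
{
  if (s[0] == 0)
    { *puc = 0; return NULL; }          /* end of string */
  if (s[0] < 0xd800 || s[0] >= 0xe000)
    { *puc = s[0]; return s + 1; }      /* BMP character */
  /* s[1] exists: s[0] is not NUL, so at worst s[1] is the terminator. */
  if (s[0] < 0xdc00 && s[1] >= 0xdc00 && s[1] < 0xe000)
    {
      *puc = 0x10000 + ((demo_ucs4_t) (s[0] - 0xd800) << 10) + (s[1] - 0xdc00);
      return s + 2;                     /* surrogate pair */
    }
  *puc = 0xfffd;                        /* malformed: this sketch stops */
  return NULL;
}
```

Usage: `for (p = str; (p = demo_u16_next (&uc, p)) != NULL; ) { /* use uc */ }` visits each character once.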
The following function inspects and returns details about the previous character in a Unicode string.
Backward iteration step. Moves the pointer back to the previous character (the one that ends at s), or returns NULL if the beginning of the string (specified by start) has been reached. Puts the character's ucs4_t representation in *puc.
Note that this function works only on well-formed Unicode strings.
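The reason for the well-formedness restriction is visible in a UTF-8 sketch. Stepping backward means skipping continuation bytes (10xxxxxx) until a lead byte is found, and this gives the right answer only if the bytes form valid sequences. demo_u8_prev and demo_ucs4_t are hypothetical names, not libunistring's code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_ucs4_t;   /* stand-in for libunistring's ucs4_t */

/* Hypothetical sketch, not libunistring's code; assumes [start, s) is
   well-formed UTF-8. */
const uint8_t *
demo_u8_prev (demo_ucs4_t *puc, const uint8_t *s, const uint8_t *start)
{
  if (s <= start)
    return NULL;                        /* already at the beginning */
  const uint8_t *p = s - 1;
  /* Back up over continuation bytes to the lead byte. */
  while (p > start && (*p & 0xc0) == 0x80)
    p--;
  int len = (int) (s - p);
  uint8_t c = *p;
  demo_ucs4_t uc = (len == 1) ? c
                   : (len == 2) ? (demo_ucs4_t) (c & 0x1f)
                   : (len == 3) ? (demo_ucs4_t) (c & 0x0f)
                   : (demo_ucs4_t) (c & 0x07);
  for (int i = 1; i < len; i++)
    uc = (uc << 6) | (p[i] & 0x3f);     /* append 6 bits per byte */
  *puc = uc;
  return p;
}
```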
The following functions determine the length of a Unicode string.
Returns the number of units in s.
This function is similar to strlen and wcslen, except that it operates on Unicode strings.
Returns the number of units in s, but at most maxlen.
This function is similar to strnlen and wcsnlen, except that it operates on Unicode strings.
The following functions copy portions of Unicode strings in memory.
Copies src to dest.
This function is similar to strcpy and wcscpy, except that it operates on Unicode strings.
Copies src to dest, returning the address of the terminating NUL in dest.
This function is similar to stpcpy, except that it operates on Unicode strings.
Copies no more than n units of src to dest.
This function is similar to strncpy and wcsncpy, except that it operates on Unicode strings.
Copies no more than n units of src to dest. Returns a pointer past the last non-NUL unit written into dest. In other words, if the units written into dest include a NUL, the return value is the address of the first such NUL unit; otherwise it is dest + n.
This function is similar to stpncpy, except that it operates on Unicode strings.
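The return-value rule of the stpncpy-like function is easiest to see in a sketch. demo_u8_stpncpy is a hypothetical name. This sketch assumes that, like stpncpy, the function pads the remainder of the n units with NUL after copying a shorter src.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not libunistring's code. */
uint8_t *
demo_u8_stpncpy (uint8_t *dest, const uint8_t *src, size_t n)
{
  size_t i = 0;
  /* Copy until src's terminating NUL or until n units are written. */
  for (; i < n && src[i] != 0; i++)
    dest[i] = src[i];
  uint8_t *result = dest + i;   /* first NUL written, or dest + n */
  /* Pad the remaining units with NUL, as stpncpy does. */
  for (; i < n; i++)
    dest[i] = 0;
  return result;
}
```

If src is at least n units long, no NUL is written and the result is dest + n, so the caller must terminate dest itself.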
Appends src onto dest.
This function is similar to strcat and wcscat, except that it operates on Unicode strings.
Appends no more than n units of src onto dest.
This function is similar to strncat and wcsncat, except that it operates on Unicode strings.
The following functions compare two Unicode strings. They ignore locale-dependent collation rules.
Compares s1 and s2, lexicographically. Returns a negative value if s1 compares smaller than s2, a positive value if s1 compares larger than s2, or 0 if they compare equal.
This function is similar to strcmp and wcscmp, except that it operates on Unicode strings.
Compares s1 and s2 using the collation rules of the current locale. Returns -1 if s1 < s2, 0 if s1 = s2, 1 if s1 > s2. Upon failure, sets errno and returns any value.
This function is similar to strcoll and wcscoll, except that it operates on Unicode strings.
Note that this function may consider different canonical normalizations of the same string as having a large distance. It is therefore better to use the function u8_normcoll instead of this one; see Normalization forms (composition and decomposition) <uninorm.h>.
Compares no more than n units of s1 and s2.
This function is similar to strncmp and wcsncmp, except that it operates on Unicode strings.
The following function allocates a duplicate of a Unicode string.
Duplicates s, returning an identical malloc'd string.
This function is similar to strdup and wcsdup, except that it operates on Unicode strings.
The following functions search for a given Unicode character.
Finds the first occurrence of uc in str.
This function is similar to strchr and wcschr, except that it operates on Unicode strings.
Finds the last occurrence of uc in str.
This function is similar to strrchr and wcsrchr, except that it operates on Unicode strings.
The following functions search for the first occurrence of some Unicode character in or outside a given set of Unicode characters.
Returns the length of the initial segment of str which consists entirely of Unicode characters not in reject.
This function is similar to strcspn and wcscspn, except that it operates on Unicode strings.
Returns the length of the initial segment of str which consists entirely of Unicode characters in accept.
This function is similar to strspn and wcsspn, except that it operates on Unicode strings.
Finds the first occurrence in str of any character in accept.
This function is similar to strpbrk and wcspbrk, except that it operates on Unicode strings.
The following functions search whether a given Unicode string is a substring of another Unicode string.
Finds the first occurrence of needle in haystack.
This function is similar to strstr and wcsstr, except that it operates on Unicode strings.
Tests whether str starts with prefix.
Tests whether str ends with suffix.
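Both tests reduce to unit comparisons on NUL-terminated strings. The hypothetical UTF-8 sketches below (demo_u8_startswith and demo_u8_endswith, not libunistring's code) check the leading units and the trailing units respectively. An empty prefix or suffix always matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketches, not libunistring's code. */
bool
demo_u8_startswith (const uint8_t *str, const uint8_t *prefix)
{
  for (; *prefix != 0; str++, prefix++)
    if (*str != *prefix)
      return false;       /* mismatch, or str ended before prefix did */
  return true;
}

bool
demo_u8_endswith (const uint8_t *str, const uint8_t *suffix)
{
  size_t len = strlen ((const char *) str);
  size_t slen = strlen ((const char *) suffix);
  /* Compare the last slen units of str with suffix. */
  return slen <= len && memcmp (str + len - slen, suffix, slen) == 0;
}
```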
The following function does one step in tokenizing a Unicode string.
Divides str into tokens separated by characters in delim.
This function is similar to strtok_r and wcstok, except that it operates on Unicode strings. Its interface is actually more similar to wcstok than to strtok.
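The wcstok-style protocol works like this: pass the string on the first call and NULL afterwards, with a caller-owned pointer that holds the resume position. The hypothetical sketch below uses NUL-terminated UTF-32, where each unit is one character, so delimiter matching needs no decoding. demo_u32_strtok and demo_in_set are illustrative names, not libunistring's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Is uc one of the characters in the NUL-terminated set delim? */
static int
demo_in_set (uint32_t uc, const uint32_t *delim)
{
  for (; *delim != 0; delim++)
    if (*delim == uc)
      return 1;
  return 0;
}

/* Hypothetical sketch, not libunistring's code. */
uint32_t *
demo_u32_strtok (uint32_t *str, const uint32_t *delim, uint32_t **ptr)
{
  if (str == NULL)
    str = *ptr;                        /* continue where we stopped */
  while (*str != 0 && demo_in_set (*str, delim))
    str++;                             /* skip leading delimiters */
  if (*str == 0)
    {
      *ptr = str;
      return NULL;                     /* no more tokens */
    }
  uint32_t *token = str;
  while (*str != 0 && !demo_in_set (*str, delim))
    str++;                             /* find the end of the token */
  if (*str != 0)
    *str++ = 0;                        /* terminate token, resume after it */
  *ptr = str;
  return token;
}
```

Because the state lives in *ptr rather than in a static variable, two tokenizations can be interleaved safely, as with strtok_r and wcstok.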
This document was generated by Bruno Haible on October, 16 2024 using texi2html 1.78a.