summaryrefslogtreecommitdiff
path: root/doc/RE
diff options
context:
space:
mode:
authorJörg Frings-Fürst <debian@jff.email>2019-07-12 09:18:45 +0200
committerJörg Frings-Fürst <debian@jff.email>2019-07-12 09:18:45 +0200
commited7b90269a529d90d151969348e55347fae02c0d (patch)
tree7c72e5a49112a844a53a2c10ef1fda47867aee92 /doc/RE
parent6cac1738b6aebf75b29d225775030d291319f9a2 (diff)
parent5c29b174a11e992dac68553f08645fe4a825c315 (diff)
Merge branch 'feature/upstream' into develop
Diffstat (limited to 'doc/RE')
-rw-r--r--doc/RE123
1 files changed, 69 insertions, 54 deletions
diff --git a/doc/RE b/doc/RE
index 963d009..72957dd 100644
--- a/doc/RE
+++ b/doc/RE
@@ -1,4 +1,4 @@
-Oniguruma Regular Expressions Version 6.8.0 2018/07/26
+Oniguruma Regular Expressions Version 6.9.2 2019/03/29
syntax: ONIG_SYNTAX_ONIGURUMA (default)
@@ -81,15 +81,23 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
\O true anychar (?m:.) (* original function)
- \X Extended Grapheme Cluster (?>\O(?:\Y\O)*)
+ \X Text Segment \X === (?>\O(?:\Y\O)*)
- \X doesn't check whether matching start position is boundary.
- Write as \y\X if you want to ensure it.
+ The meaning of this operator changes depending on the setting of
+ the option (?y{..}).
- Unicode case:
- See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
+ \X doesn't check whether matching start position is boundary or not.
+ Please write as \y\X if you want to ensure it.
- Not Unicode: (?>\r\n|\O)
+ [Extended Grapheme Cluster mode] (default)
+ Unicode case:
+ See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
+
+ Not Unicode case: \X === (?>\r\n|\O)
+
+ [Word mode]
+ Currently, this mode is supported in Unicode only.
+ See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
Character Property
@@ -119,17 +127,17 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
? 1 or 0 times
* 0 or more times
+ 1 or more times
- {n,m} at least n but no more than m times
+ {n,m} (n <= m) at least n but no more than m times
{n,} at least n times
{,n} at least 0 but no more than n times ({0,n})
{n} n times
reluctant
- ?? 1 or 0 times
+ ?? 0 or 1 times
*? 0 or more times
+? 1 or more times
- {n,m}? at least n but not more than m times
+ {n,m}? (n <= m) at least n but not more than m times
{n,}? at least n times
{,n}? at least 0 but not more than n times (== {0,n}?)
@@ -138,8 +146,10 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
?+ 1 or 0 times
*+ 0 or more times
++ 1 or more times
+ {n,m} (n > m) at least m but not more than n times
- ({n,m}+, {n,}+, {n}+ are possessive op. in ONIG_SYNTAX_JAVA only)
+ {n,m}+, {n,}+, {n}+ are possessive operators in ONIG_SYNTAX_JAVA and
+ ONIG_SYNTAX_PERL only.
ex. /a*+/ === /(?>a*)/
@@ -150,8 +160,6 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
$ end of the line
\b word boundary
\B non-word boundary
- \y Extended Grapheme Cluster boundary
- \Y Extended Grapheme Cluster non-boundary
\A beginning of string
\Z end of string, or before newline at the end
@@ -160,6 +168,24 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
\K keep (keep start position of the result string)
+ \y Text Segment boundary
+ \Y Text Segment non-boundary
+
+ The meaning of these operators(\y, \Y) changes depending on the setting
+ of the option (?y{..}).
+
+ [Extended Grapheme Cluster mode] (default)
+ Unicode case:
+ See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
+
+ Not Unicode:
+ All positions except between \r and \n.
+
+ [Word mode]
+ Currently, this mode is supported in Unicode only.
+ See [Unicode Standard Annex #29: http://unicode.org/reports/tr29/]
+
+
6. Character class
@@ -221,20 +247,28 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
(?#...) comment
- (?imxWDSP-imxWDSP:subexp) option on/off for subexp
+ (?imxWDSPy-imxWDSP:subexp) option on/off for subexp
+
+ i: ignore case
+ m: multi-line (dot (.) also matches newline)
+ x: extended form
+ W: ASCII only word (\w, \p{Word}, [[:word:]])
+ ASCII only word bound (\b)
+ D: ASCII only digit (\d, \p{Digit}, [[:digit:]])
+ S: ASCII only space (\s, \p{Space}, [[:space:]])
+ P: ASCII only POSIX properties (includes W,D,S)
+ (alnum, alpha, blank, cntrl, digit, graph,
+ lower, print, punct, space, upper, xdigit, word)
+
+ y{?}: Text Segment mode
+ This option changes the meaning of \X, \y, \Y.
+ Currently, this option is supported in Unicode only.
- i: ignore case
- m: multi-line (dot (.) also matches newline)
- x: extended form
- W: ASCII only word (\w, \p{Word}, [[:word:]])
- ASCII only word bound (\b)
- D: ASCII only digit (\d, \p{Digit}, [[:digit:]])
- S: ASCII only space (\s, \p{Space}, [[:space:]])
- P: ASCII only POSIX properties (includes W,D,S)
- (alnum, alpha, blank, cntrl, digit, graph,
- lower, print, punct, space, upper, xdigit, word)
+ y{g}: Extended Grapheme Cluster mode (default)
+ y{w}: Word mode
+ See [Unicode Standard Annex #29]
- (?imxWDSP-imxWDSP) isolated option
+ (?imxWDSPy-imxWDSP) isolated option
* It makes a group to the next ')' or end of the pattern.
/ab(?i)c|def|gh/ == /ab(?i:c|def|gh)/
@@ -336,7 +370,7 @@ syntax: ONIG_SYNTAX_ONIGURUMA (default)
else_exp can be omitted.
Then it works as a backreference validity checker.
- [ backreference validity checker ] (* original)
+ [ Backreference validity checker ] (* original)
(?(n)), (?(-n)), (?(+n)), (?(n+level)) ...
(?(<n>)), (?('-n')), (?(<+n>)) ...
@@ -470,10 +504,15 @@ A-1. Syntax-dependent options
A-2. Original extensions
- + hexadecimal digit char type \h, \H
- + named group (?<name>...), (?'name'...)
- + named backref \k<name>
- + subexp call \g<name>, \g<group-num>
+ + hexadecimal digit char type \h, \H
+ + true anychar \O
+ + text segment boundary \y, \Y
+ + backreference validity checker (?(...))
+ + named group (?<name>...), (?'name'...)
+ + named backref \k<name>
+ + subexp call \g<name>, \g<group-num>
+ + absent expression (?~|...|...)
+ + absent stopper (?|...)
A-3. Missing features compared with perl 5.8.0
@@ -528,28 +567,4 @@ A-4. Differences with Japanized GNU regex(version 0.12) of Ruby 1.8
/(?:()|())*\1\2/ =~ ""
/(?:\1a|())*/ =~ "a"
-
-A-5. Features disabled in default syntax
-
- + capture history
-
- (?@...) and (?@<name>...)
-
- ex. /(?@a)*/.match("aaa") ==> [<0-1>, <1-2>, <2-3>]
-
- see sample/listcap.c file.
-
-
-A-6. Problems
-
- + Invalid encoding byte sequence is not checked.
-
- ex. UTF-8
-
- * Invalid first byte is treated as a character.
- /./u =~ "\xa3"
-
- * Incomplete byte sequence is not checked.
- /\w+/ =~ "a\xf3\x8ec"
-
// END