non-deterministic finite automata (NFAs or NDFAs).
the syntax of regular expressions in Perl:
If use locale is in effect, the case map is taken from the current locale. See the perllocale manpage.
The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the ``.'' match any character whatsoever, while yet allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.
These are usually written as ``the /x modifier'', even though the delimiter in question might not actually be a slash. In fact, any of these modifiers may also be embedded within the regular expression itself using the new (?...) construct. See below.
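As a minimal sketch of the two points above (a non-slash delimiter and an embedded modifier), the following compares a trailing /i with an inline (?i). The sample string and variable names are illustrative assumptions, not taken from the manpage.

```perl
use strict;
use warnings;

my $text = "Hello World";

# Trailing modifier with the usual slash delimiter.
my $with_flag = $text =~ /hello/i ? 1 : 0;

# Same modifier with a different delimiter: still "the /i modifier".
my $with_braces = $text =~ m{hello}i ? 1 : 0;

# Embedded modifier: (?i) inside the pattern itself, which is handy
# when the pattern arrives as a plain string.
my $pattern  = '(?i)hello';
my $embedded = $text =~ /$pattern/ ? 1 : 0;

# Without either form the match is case-sensitive and fails.
my $plain = $text =~ /hello/ ? 1 : 0;

print "flag=$with_flag braces=$with_braces embedded=$embedded plain=$plain\n";
```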
The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside of a character class, where they are unaffected by /x), you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early. See the C-comment deletion code in the perlop manpage.
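The rules above can be sketched with a small /x pattern. The date format and the "item #42" string are hypothetical examples chosen only to show comments, ignored whitespace, and the two ways of writing a literal space or # character.

```perl
use strict;
use warnings;

# Whitespace and # comments inside the pattern are ignored under /x.
# Braces are used as the delimiter, so the comments must not contain
# an unbalanced brace.
my $date_re = qr{
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
}x;

my @parts = "released 2024-01-31" =~ $date_re;
print "@parts\n";

# A real space is written as a hex escape here, and a real '#' is
# backslashed; unescaped, both would be swallowed by /x.
my $tag_re = qr/ item \x20 \# (\d+) /x;
my ($num) = "item #42" =~ $tag_re;
print "$num\n";
```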
A reasonable explanation of /m and /s (working from observed behavior to the underlying rule):
By default, the ``^'' character is guaranteed to match at only the beginning of the string, the ``$'' character at only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ``^'' or ``$''. You may, however, wish to treat a string as a multi-line buffer, such that the ``^'' will match after any newline within the string, and ``$'' will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice is now deprecated.)
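A short sketch of the difference, assuming a hypothetical three-line buffer: without /m the ``^'' anchor is found only at the start of the string, while with /m both anchors apply to every line.

```perl
use strict;
use warnings;

my $buffer = "alpha\nbeta\ngamma\n";

# Default: ^ matches only at the very beginning of the string,
# so a global match finds just the first word.
my @first_only = $buffer =~ /^(\w+)/g;

# With /m: ^ also matches after each embedded newline and $ before
# each newline, so every line is picked up.
my @every_line = $buffer =~ /^(\w+)$/mg;

print scalar(@first_only), " match without /m, ",
      scalar(@every_line), " matches with /m\n";
```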
To facilitate multi-line substitutions, the ``.'' character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another module.
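As an illustration, the sketch below tries the same pattern with and without /s on a hypothetical string whose tagged text spans two lines, then uses /s in a substitution.

```perl
use strict;
use warnings;

my $markup = "<b>bold\ntext</b>";

# Without /s the dot stops at the newline, so the closing tag is
# never reached and the match fails.
my ($without_s) = $markup =~ m{<b>(.*)</b>};

# With /s the dot also matches the newline.
my ($with_s) = $markup =~ m{<b>(.*)</b>}s;

# The same modifier makes a substitution span both lines.
(my $stripped = $markup) =~ s{<b>.*</b>}{[removed]}s;

print defined $without_s ? "matched\n" : "no match without /s\n";
print "$stripped\n";
```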
The two modifiers govern different metacharacters, so they never compete with each other. /s affects only ``.'': with /s the string is treated as a single line, and ``.'' matches a newline as well.
/m affects only ``^'' and ``$'': with /m the string is treated as multiple lines, so ``^'' and ``$'' match at the start and end of each line, whether or not ``.'' also appears in the pattern.
You can verify this with a regex debugging tool.
With /m alone, ``.'' still never matches a newline. To make ``.'' match a newline, add /s.
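Instead of a debugging tool, the behavior can also be checked directly in Perl. This sketch uses a hypothetical two-line string and records, for each modifier combination, whether ``^...$'' can isolate the second line and whether ``.'' can cross the newline.

```perl
use strict;
use warnings;

my $text = "one\ntwo";

# Can ^two$ match the second line? Only /m changes this.
my %line = (
    none => ($text =~ /^two$/   ? 1 : 0),
    m    => ($text =~ /^two$/m  ? 1 : 0),
    s    => ($text =~ /^two$/s  ? 1 : 0),
    ms   => ($text =~ /^two$/ms ? 1 : 0),
);

# Can . match the newline between the words? Only /s changes this.
my %dot = (
    none => ($text =~ /one.two/   ? 1 : 0),
    m    => ($text =~ /one.two/m  ? 1 : 0),
    s    => ($text =~ /one.two/s  ? 1 : 0),
    ms   => ($text =~ /one.two/ms ? 1 : 0),
);

for my $mods (qw(none m s ms)) {
    printf "%-4s line=%d dot=%d\n", $mods, $line{$mods}, $dot{$mods};
}
```

The table that this prints shows the two columns varying independently, which is why /ms can be used when both effects are wanted.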
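A reasonable explanation of /x (results obtained by stepping through examples):
/x is also called extended mode, as Regex Match Tracer reports. It allows whitespace and # comments inside the regular expression; that whitespace, and everything after a #, is not matched against the subject string.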
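Escape sequences \Q...\E
Text between \Q and \E loses its special meaning: punctuation in between is treated as ordinary characters.
Text between \U and \E gets the same treatment as \Q...\E and, in addition, has its lowercase letters converted to uppercase. In case-sensitive mode it can match only uppercase text.
Text between \L and \E gets the same treatment as \Q...\E and, in addition, has its uppercase letters converted to lowercase. In case-sensitive mode it can match only lowercase text.
(In Perl itself, \U and \L change case only and do not quote metacharacters; combine them with \Q when both effects are needed.)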
Note
\Q...\E is useful when the expression needs a fairly long stretch of literal text that contains special characters.
Examples
Expression        Description
\Q(a+b)*3\E       Matches the text "(a+b)*3".
\(a\+b\)\*3       Without \Q...\E, every special character has to be escaped individually.
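The two table rows can be tried in Perl as follows. This is a minimal sketch: the variable names and the "aab3" counterexample are assumptions added for illustration, and quotemeta is Perl's built-in function that performs the same escaping as \Q.

```perl
use strict;
use warnings;

my $literal = '(a+b)*3';

# \Q...\E quotes the interpolated text, so it matches literally.
my $quoted = '(a+b)*3' =~ /\Q$literal\E/ ? 1 : 0;

# The hand-escaped form from the table is equivalent.
my $escaped = '(a+b)*3' =~ /\(a\+b\)\*3/ ? 1 : 0;

# Unquoted, the same text is read as a pattern: zero or more
# repetitions of "a...ab", followed by "3". It matches "aab3".
my $as_pattern = 'aab3' =~ /$literal/ ? 1 : 0;

# Quoted, it no longer matches that string.
my $quoted_vs_aab3 = 'aab3' =~ /\Q$literal\E/ ? 1 : 0;

# quotemeta returns the escaped string itself.
my $meta = quotemeta($literal);
print "$meta\n";
```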