Regular Expressions: DFA and NFA

A regular expression engine is based on either a deterministic finite automaton (DFA) or a non-deterministic finite automaton (NFA, also written NDFA).

Pattern modifiers in Perl's regular expression syntax:

i

Do case-insensitive pattern matching.

If use locale is in effect, the case map is taken from the current locale. See the perllocale manpage.

m

Treat string as multiple lines. That is, change ``^'' and ``$'' from matching at only the very start or end of the string to the start or end of any line anywhere within the string.

s

Treat string as single line. That is, change ``.'' to match any character whatsoever, even a newline, which it normally would not match.

The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s without /m will force ``^'' to match only at the beginning of the string and ``$'' to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the ``.'' match any character whatsoever, while yet allowing ``^'' and ``$'' to match, respectively, just after and just before newlines within the string.

x

Extend your pattern's legibility by permitting whitespace and comments.

These are usually written as ``the /x modifier'', even though the delimiter in question might not actually be a slash. In fact, any of these modifiers may also be embedded within the regular expression itself using the new (?...) construct. See below.
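To make the embedded form concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for Perl (an assumption on my part; the text above discusses Perl only), since Python accepts the same `(?i)` prefix syntax. The sample string and variable names are made up for illustration.

```python
import re

# (?i) at the start of the pattern switches on case-insensitive matching,
# the same effect as passing re.IGNORECASE (the counterpart of Perl's /i).
inline = re.compile(r"(?i)hello")
flagged = re.compile(r"hello", re.IGNORECASE)

inline_hit = inline.search("Say HELLO") is not None
flagged_hit = flagged.search("Say HELLO") is not None
# Without any modifier the match is case-sensitive and fails here.
plain_hit = re.search(r"hello", "Say HELLO") is not None
```

The embedded form is handy when the pattern is passed around as a plain string and there is no place to attach a flag.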

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside of a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early. See the C-comment deletion code in the perlop manpage.
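The rules in the paragraph above can be sketched as follows. This is an illustration in Python, whose `re.VERBOSE` flag is the counterpart of Perl's /x (an assumed substitution; the source is about Perl). The patterns and sample strings are hypothetical.

```python
import re

# Whitespace and # comments inside the pattern are ignored in verbose mode.
date = re.compile(
    r"""
    (\d{4})   # year
    -         # literal hyphen
    (\d{2})   # month
    """,
    re.VERBOSE,
)

# A real space or '#' has to be backslashed (or placed in a character class).
spaced = re.compile(r"a\ b \#1", re.VERBOSE)   # matches the text "a b#1"
# An unescaped space is dropped, so this behaves like the pattern "ab".
ignored = re.compile(r"a b", re.VERBOSE)

year_month = date.search("posted 2008-12").groups()
```

Note how the unescaped space in the last pattern silently disappears; that is the most common surprise when switching an existing pattern to extended mode.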

A reasonable explanation of /m and /s (working from observed behavior to the underlying rule):

By default, the ``^'' character is guaranteed to match at only the beginning of the string, the ``$'' character at only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines will not be matched by ``^'' or ``$''. You may, however, wish to treat a string as a multi-line buffer, such that the ``^'' will match after any newline within the string, and ``$'' will match before any newline. At the cost of a little more overhead, you can do this by using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this practice is now deprecated.)

To facilitate multi-line substitutions, the ``.'' character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't. The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another module.
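The default behavior and the effect of each modifier can be checked with a short experiment. The sketch below uses Python's `re` module, where `re.MULTILINE` and `re.DOTALL` correspond to Perl's /m and /s (an assumed stand-in, not Perl itself). The sample text is made up.

```python
import re

text = "first line\nsecond line\n"

# Default: ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With /m (re.MULTILINE): ^ also matches just after each embedded newline.
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)

# Default: $ matches at the end, or just before a newline at the end.
ends_default = re.findall(r"\w+$", text)
# With /m: $ also matches just before every embedded newline.
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

# Default: . refuses to match a newline; with /s (re.DOTALL) it does.
dot_default = re.search(r"line.second", text) is not None
dot_single = re.search(r"line.second", text, re.DOTALL) is not None
```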

When a . has to match a newline, /s is the modifier that governs it: for the purposes of ., the string is treated as a single line.
When ^ or $ has to match at the start or end of a line inside the string, /m is the modifier that governs it, even if the same pattern also uses . to match a newline: for the purposes of ^ and $, the string is treated as multiple lines.

The two modifiers are independent and combine as a union: each one covers the metacharacters that the other leaves alone.
You can verify the details with a regex debugging tool.

With /m alone, . never matches a newline, because /m affects only ^ and $. To make . match a newline, use /s.
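A small test makes the independence of the two modifiers visible. As before, this is a sketch in Python, with `re.MULTILINE` and `re.DOTALL` standing in for Perl's /m and /s (an assumption). The sample text is hypothetical.

```python
import re

text = "alpha\nbeta\ngamma"

# /m alone: ^ and $ anchor per line, but . still stops at a newline.
m_only = re.findall(r"^.+$", text, re.MULTILINE)
# /s alone: . crosses newlines, but ^ and $ see only the whole string.
s_only = re.findall(r"^.+$", text, re.DOTALL)

# This pattern needs both: ^ must anchor at an inner line start (/m)
# and . must match the newline between "beta" and "gamma" (/s).
pattern = r"^beta.gamma$"
both = re.search(pattern, text, re.MULTILINE | re.DOTALL)
m_alone = re.search(pattern, text, re.MULTILINE)
s_alone = re.search(pattern, text, re.DOTALL)
```

Only the combined form matches, which is the /ms case described earlier.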

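A reasonable explanation of /x (obtained by debugging examples):
/x is also called extended mode, which is what Regex Match Tracer calls it. It allows whitespace and # comments inside the regular expression, but that text (the whitespace and the characters following a #) does not take part in matching the actual string.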

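Escape sequences \Q...\E

Starting with \Q and ending with \E makes the punctuation in between lose its special meaning; the characters in between are treated as ordinary characters.

In Regex Match Tracer, starting with \U and ending with \E does the same as \Q...\E and also converts the lowercase letters in between to uppercase. In case-sensitive mode it can only match uppercase text.

In Regex Match Tracer, starting with \L and ending with \E does the same as \Q...\E and also converts the uppercase letters in between to lowercase. In case-sensitive mode it can only match lowercase text.

Note that in Perl itself, \U and \L only change case; they do not quote metacharacters the way \Q does.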

Notes

\Q...\E is a good fit when an expression needs a fairly long stretch of literal text that contains special characters.

Example

Expression	Description
\Q(a+b)*3\E	Matches the text "(a+b)*3".
\(a\+b\)\*3	Without \Q...\E, each special character has to be escaped individually.
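The two rows of the table can be compared in code. Python's `re` module has no \Q...\E, so this sketch uses `re.escape` as the closest analogue (a swapped-in technique, not the \Q...\E syntax itself). The variable names are hypothetical.

```python
import re

literal = "(a+b)*3"

# re.escape backslash-escapes every regex metacharacter in the string,
# playing the role that \Q...\E plays in the engines described above.
quoted = re.compile(re.escape(literal))
# The same pattern with each special character escaped by hand.
by_hand = re.compile(r"\(a\+b\)\*3")

quoted_hit = quoted.fullmatch(literal) is not None
by_hand_hit = by_hand.fullmatch(literal) is not None
# Left unescaped, the pattern means "zero or more (a+b) groups, then 3",
# so it matches "aab3" but not the literal text.
unquoted_hit = re.fullmatch(r"(a+b)*3", literal) is not None
unquoted_other = re.fullmatch(r"(a+b)*3", "aab3") is not None
```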

Posted on 2008-12-21 23:02 by CopyHoo. Category: Java Web
