My Long Programming Journey

Focused on Java Web development

Groovy Journey Series, Part 5 (Regex Groups)

Capturing groups:

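One of the most useful features of Groovy regular expressions is the ability to capture parts of a string into groups. In the following example, suppose we want to extract exactly "Liverpool, England":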

locationData = "Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″"

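We could use String's split() method to cut out the "Liverpool, England" part we need, though we would then have to remove the comma. Alternatively, we can use a regular expression. The syntax in the following example may look a little unfamiliar.
First, we define a regular expression and wrap every part we are interested in in parentheses: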

myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./

Next, we create a matcher with the =~ operator:

matcher = ( locationData =~ myRegularExpression )
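The sketch below compares =~ with the plain Java API. As far as we know, `locationData =~ regex` produces essentially the same object as `Pattern.compile(regex).matcher(locationData)`, so both approaches should behave identically. The degree, minute, and second marks are written as Unicode escapes (°, ′, ″). Those are assumed to be the characters the original data contained.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Assumed data: degree/minute/second marks written as Unicode escapes
def locationData = "Liverpool, England: 53\u00b0 25\u2032 0\u2033 N 3\u00b0 0\u2032 0\u2033"
def myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./

// Groovy's find operator =~ returns a java.util.regex.Matcher
def groovyMatcher = (locationData =~ myRegularExpression)

// The same kind of matcher, built by hand with the plain Java API
def javaMatcher = Pattern.compile(myRegularExpression).matcher(locationData)

assert groovyMatcher instanceof Matcher
assert javaMatcher instanceof Matcher
println groovyMatcher.getClass().name
println "${groovyMatcher.matches()} / ${javaMatcher.matches()}"
```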

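The variable matcher holds a java.util.regex.Matcher that Groovy has enhanced. You can work with it exactly as you would with a Matcher object on the Java platform. A more convenient approach, though, is to index matcher like a two-dimensional array: matcher[0] is the first match, and matcher[0][1] is that match's first captured group.
Let's look at the first dimension of the data, matcher[0]: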

["Liverpool, England: 53° 25′ 0″ N 3° 0′ 0″", "Liverpool", "England", "53", "25", "0", "N", "3", "0", "0"]

The whole matched string comes first, followed by every captured group in order, all combined into a single array.
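This is a small self-contained check of that structure, using the same assumed Unicode characters as before. matcher[0] should be a List holding the whole match plus the nine groups. The plain Java `group(n)` call should return the same values once `matches()` has succeeded.

```groovy
def locationData = "Liverpool, England: 53\u00b0 25\u2032 0\u2033 N 3\u00b0 0\u2032 0\u2033"
def matcher = (locationData =~ /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./)

// matcher[0] is the first match: element 0 is the whole match,
// elements 1..9 are the captured groups (all Strings)
def firstMatch = matcher[0]
assert firstMatch instanceof List
assert firstMatch.size() == 10
assert firstMatch[1..2] == ["Liverpool", "England"]
assert firstMatch[3..9] == ["53", "25", "0", "N", "3", "0", "0"]

// The plain Java API gives the same values after a successful matches()
assert matcher.matches()
assert matcher.group(1) == firstMatch[1]
println firstMatch
```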

Now we can easily print the data we want:

if (matcher.matches()) {
    println(matcher.getCount() + " occurrence of the regular expression was found in the string.")
    println(matcher[0][1] + " is in the " + matcher[0][6] + " hemisphere. (According to: " + matcher[0][0] + ")")
    // matcher[0] is a List: call size() as a method, not .size as a property
    for (int i = 0; i < matcher[0].size(); i++) {
        println(matcher[0][i])
    }
}
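Instead of reading the groups by index, you can unpack them into named variables. The sketch below assumes Groovy 1.6 or later, where multiple assignment is available. It also assumes that list elements beyond the listed variables (here the longitude groups) are simply ignored. `eachWithIndex` stands in for the manual index loop above.

```groovy
def locationData = "Liverpool, England: 53\u00b0 25\u2032 0\u2033 N 3\u00b0 0\u2032 0\u2033"
def matcher = (locationData =~ /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./)
assert matcher.matches()

// Multiple assignment: unpack the first match into named variables;
// the remaining (longitude) groups are not assigned to anything
def (whole, city, country, latDeg, latMin, latSec, hemisphere) = matcher[0]
String summary = "${city}, ${country} is in the ${hemisphere} hemisphere"
println summary

// eachWithIndex visits every element of the match together with its index
matcher[0].eachWithIndex { value, i ->
    println "group ${i}: ${value}"
}
```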

Non-capturing groups:

Sometimes we need a non-capturing group to get the data we want. In the following example, our goal is to drop the middle name(s):

names = [
    "Graham James Edward Miller",
    "Andrew Gregory Macintyre"
]

printClosure = {
    def matcher = (it =~ /(.*?)(?: .+)+ (.*)/)  // notice the non-capturing group in the middle
    if (matcher.matches())
        println(matcher[0][2] + ", " + matcher[0][1])
}
names.each(printClosure)

Output:

Miller, Graham
Macintyre, Andrew
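The sketch below shows what ?: actually changes by running the first name through the same pattern with and without it. Without ?:, the middle group becomes an ordinary capturing group, so it gets a group number and the surname moves from group 2 to group 3.

```groovy
def name = "Graham James Edward Miller"

// Non-capturing middle group: only two numbered groups exist
def nonCapturing = (name =~ /(.*?)(?: .+)+ (.*)/)
assert nonCapturing.matches()
assert nonCapturing.groupCount() == 2
assert nonCapturing[0][2] == "Miller"

// Same pattern with an ordinary (capturing) middle group:
// the middle names now occupy group 2 and the surname shifts to group 3
def capturing = (name =~ /(.*?)( .+)+ (.*)/)
assert capturing.matches()
assert capturing.groupCount() == 3
assert capturing[0][3] == "Miller"
println capturing[0]
```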

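Non-capturing groups can be confusing at first. Put simply, a non-capturing group still has to match its part of the string, but the text it matches is not saved as a group, so it never shows up in matcher[0]. That is how we skip the parts we don't want.
For example: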

names = [
    "ZDW   love beijing",
    "Angel   love beijing",
    "Ghost   hate beijing"
]

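We only want the name at the start and the city at the end, filtering out "love" in the middle. This is where a non-capturing group comes in: write ?: right after the opening parenthesis of the group you want to skip, as in (?:...).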

nameClosure = {
    def myMatcher = (it =~ /(.*?)(?:   .+)+ (.*)/)
    if (myMatcher.matches()) {
        println(myMatcher[0][1] + " " + myMatcher[0][2])
    }
}

names.each(nameClosure)

Let's break this part down:

(?:   .+)

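Every group is wrapped in (), and the ?: right after the opening parenthesis marks this one as non-capturing. The whitespace after ?: must match the separator in the original strings, which here is the run of spaces between the words. If the words were separated by a comma or some other symbol, put that character there instead.
.+ matches any characters, at least one.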
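The following is a minimal sketch of swapping the separator. The input is hypothetical: the same data, but with words separated by ", " instead of spaces. Only the separator inside the non-capturing group and before the last group changes.

```groovy
// Hypothetical input: words separated by ", " instead of spaces
def names = [
    "ZDW, love, beijing",
    "Angel, love, beijing",
    "Ghost, hate, beijing"
]

def results = names.collect { line ->
    // (?:, .+)+ skips the middle part; the separator is now ", "
    def m = (line =~ /(.*?)(?:, .+)+, (.*)/)
    m.matches() ? m[0][1] + " " + m[0][2] : null
}
results.each { println it }
```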

Replacement:

Sometimes we need to replace a given string or symbol inside a larger string with something else.
For example:

excerpt = "At school, Harry had no one. Everybody knew that Dudley's gang hated that odd Harry Potter "+
"in his baggy old clothes and broken glasses, and nobody liked to disagree with Dudley's gang.";
matcher = (excerpt =~ /Harry Potter/);
excerpt = matcher.replaceAll("Tanya Grotter");

matcher = (excerpt =~ /Harry/);
excerpt = matcher.replaceAll("Tanya");
println("Publish it! "+excerpt);

This example does two things: first it replaces "Harry Potter" with "Tanya Grotter", then it replaces the remaining "Harry" with "Tanya".
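The order of the two replacements matters. The sketch below shows this using `String.replaceAll(regex, replacement)`, which, per the Java documentation, is equivalent to building a matcher and calling its `replaceAll`. If "Harry" is replaced first, the full name turns into "Tanya Potter" and the second pattern never finds "Harry Potter".

```groovy
def excerpt = "At school, Harry had no one. Everybody knew that Dudley's gang hated that odd Harry Potter " +
        "in his baggy old clothes and broken glasses, and nobody liked to disagree with Dudley's gang."

// Full name first, then the bare first name
def fullNameFirst = excerpt.replaceAll(/Harry Potter/, "Tanya Grotter").replaceAll(/Harry/, "Tanya")

// Wrong order: "Harry Potter" has already become "Tanya Potter" before the second call
def shortNameFirst = excerpt.replaceAll(/Harry/, "Tanya").replaceAll(/Harry Potter/, "Tanya Grotter")

println "Publish it! " + fullNameFirst
```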

Reluctant Operators

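This term is better left in English; the Chinese rendering, 勉強操作符, reads oddly.
The quantifiers * and + (as in .* and .+) are greedy by default: they match as much text as possible, so they sometimes swallow parts we don't want. That is when we need reluctant operators.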
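The sketch below shows the difference on a small made-up string. The greedy `.*` runs to the last closing parenthesis that still allows a match. The reluctant `.*?` stops at the first one.

```groovy
def text = "(one) and (two)"

// Greedy: .* grabs as much as possible, up to the LAST ')'
def greedy = (text =~ /\((.*)\)/)
// Reluctant: .*? takes as little as possible, stopping at the FIRST ')'
def reluctant = (text =~ /\((.*?)\)/)

assert greedy.find()
assert reluctant.find()
def greedyGroup = greedy.group(1)
def reluctantGroup = reluctant.group(1)
println "greedy:    " + greedyGroup
println "reluctant: " + reluctantGroup
```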

Suppose we only want each pope's name and the years of his reign.

/Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/

The pattern above uses ordinary (greedy) groups. To make a quantifier reluctant, simply append ? to it: .* becomes .*? and .+ becomes .+?.

Try it yourself and see what it prints:

popesArray = [
    "Pope Anastasius I 399-401",
    "Pope Innocent I 401-417",
    "Pope Zosimus 417-418",
    "Pope Boniface I 418-422",
    "Pope Celestine I 422-432",
    "Pope Sixtus III 432-440",
    "Pope Leo I the Great 440-461",
    "Pope Hilarius 461-468",
    "Pope Simplicius 468-483",
    "Pope Felix III 483-492",
    "Pope Gelasius I 492-496",
    "Pope Anastasius II 496-498",
    "Pope Symmachus 498-514"
]

myClosure = {
    myMatcher = (it =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/);
    if (myMatcher.matches())
        println(myMatcher[0][1]+": "+myMatcher[0][2]+" to "+myMatcher[0][3]);
}
popesArray.each(myClosure);

This does more or less what we want.
Try removing the ? and see what goes wrong.
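If you would rather check the answer, the sketch below runs a few of the entries above through both versions of the pattern. Without the ?, the greedy first group keeps everything up to the dates, including regnal numbers and epithets. The optional `(?: .*)?` group then has nothing left to do.

```groovy
def popes = ["Pope Anastasius I 399-401", "Pope Zosimus 417-418", "Pope Leo I the Great 440-461"]

def reluctantPattern = /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/
def greedyPattern    = /Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/

// Collect group 1 (the name) for every entry under the given pattern
def namesWith = { String pattern ->
    popes.collect { line ->
        def m = (line =~ pattern)
        m.matches() ? m[0][1] : null
    }
}

def reluctantNames = namesWith(reluctantPattern)
def greedyNames = namesWith(greedyPattern)
println "reluctant: " + reluctantNames
println "greedy:    " + greedyNames
```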

posted on 2008-05-13 10:56 by 々上善若水々
