-------------------------------------------
Original work is valued here.
Reposting is welcome;
please credit the source.
This is hereby stated.
Copyright @zhyiwww
Source link
http://www.aygfsteel.com/zhyiwww
--------------------------------------------

Non-greedy matching in grep

In a recent project I wanted to use grep to extract every hyperlink from an HTML page,
for example from the following snippet:

<tr class=rb><td class=pl><a href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a> 　　<a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">雅虎郵箱</a> 　　<a href=http://www.126.com/>126郵箱</a> 　　<a href=http://mail.sina.com.cn/>新浪郵箱</a> 　　<a href=http://mail.qq.com/>QQ郵箱</a> 　　<a href=http://www.hotmail.com/>Hotmail</a></td><td><a href=mail.htm>更多 »</a></td></tr>

<tr class=ry><td class=pl><a href=wangmei.htm>視頻</a></td><td><a href=http://www.youku.com/>優酷網</a>　　<a href="

The result was:
<tr class=rb><td class=pl><a href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a> 　　<a href="

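This happens because the pattern matches greedily. I wanted non-greedy matching instead, so I tried appending \? to the * quantifier and changed the command to: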

C:\tmp>grep -ior "href=.*\?\/>" a.txt
The result:
href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a> 　　<a href="http://cn.mail.yahoo.com/?id=40014
" class="greenfont">雅虎郵箱</a> 　　<a href=http://www.126.com/>126郵箱</a> 　　<a href=http://mail.sina.com.cn/>新浪郵
箱</a> 　　<a href=http://mail.qq.com/>QQ郵箱</a> 　　<a href=http://www.hotmail.com/>
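
A likely explanation for the unchanged output, assuming GNU grep: in its default (basic) syntax, \? is the "zero or one" operator, not a laziness modifier, and POSIX-style matching always takes the longest match. The sketch below uses a shortened, hypothetical sample line (not the original a.txt) to show that adding \? after * makes no difference to what is matched.

```shell
#!/bin/sh
# Hypothetical one-line sample, shortened from the page quoted in the post.
line='<a href=mail.htm>Mail</a> <a href=http://mail.163.com/>163</a>'

# In GNU grep's basic syntax, \? means "zero or one of the preceding item",
# so .*\? is just an optional .* and still matches as much as it can.
with_qmark=$(printf '%s\n' "$line" | grep -o 'href=.*\?/>')

# The same pattern without \? for comparison.
plain=$(printf '%s\n' "$line" | grep -o 'href=.*/>')

printf 'with \\?: %s\n' "$with_qmark"
printf 'plain  : %s\n' "$plain"
```

Both variables should hold the same single match, running from the first href= to the last />, which is the same behaviour seen in the output above.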

The result I expect is:

href=mail.htm
href=http://mail.163.com/
href=
href=http://www.126.com/
href=http://mail.sina.com.cn/
href=http://mail.qq.com/
href=http://www.hotmail.com/
href=mail.htm
I don't know how to achieve this. If you have a solution, please share it. Thanks in advance.
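
One possible way to get non-greedy matching, assuming a GNU grep built with Perl-compatible regex support (the -P option, which not every grep has): in that syntax *? is a lazy quantifier. The sample line below is hypothetical and shortened; it is a sketch, not a tested replacement for the command in the post.

```shell
#!/bin/sh
# Hypothetical one-line sample, shortened from the page quoted in the post.
line='<a href=mail.htm>Mail</a> <a href=http://mail.163.com/>163</a>'

# -P selects Perl-compatible syntax, where .*? matches as little as possible.
# (?=>) is a lookahead: the closing > must follow, but it is not printed.
lazy=$(printf '%s\n' "$line" | grep -oP 'href=.*?(?=>)')

printf '%s\n' "$lazy"
```

Each link is printed on its own line because the match stops at the first > after href=. For a quoted value followed by other attributes, such as the Yahoo link in the sample, this pattern would also include those attributes, so it would not reproduce the bare "href=" line in the expected output.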


Posted on 2008-09-26 13:25 by zhyiwww. Category: linux

FeedBack:

# re: Non-greedy matching in grep

2008-09-26 13:54 | zhyiwww

I also tried the following approach:
grep -ior "href=[a-z1-9A-Z\?/:\.]*" b.txt
The result:
href=mail.htm
href=http://mail.163.com/
href=
href=http://www.126.com/
href=http://mail.sina.com.cn/
href=http://mail.qq.com/
href=http://www.hotmail.com/
href=mail.htm
However, this approach does not use the regular expression's non-greedy mode.
I still don't know how to solve the problem with non-greedy matching.
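
A common substitute for lazy matching in greps without -P is a negated bracket expression: rather than listing the characters a link may contain, exclude the characters that end it. The sketch below assumes a link value ends at the first space or >, and uses a hypothetical sample line. Note that the range 1-9 in the command above leaves out the digit 0, which a negated class avoids.

```shell
#!/bin/sh
# Hypothetical sample with one unquoted and one quoted href value.
line='<a href=mail.htm>Mail</a> <a href="http://example.com/?id=1" class="g">Ex</a>'

# [^ >]* matches any run of characters that are neither a space nor >,
# so each match stops at the end of the attribute value.
links=$(printf '%s\n' "$line" | grep -io 'href=[^ >]*')

printf '%s\n' "$links"
```

Unlike the explicit character class, this keeps quoted values, so the second match includes the quotes and the URL instead of a bare "href=".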
