-------------------------------------------
Original work is valued here.
Reposting is welcome,
but please credit the source.
This is a formal notice.
Copyright @zhyiwww
Source link:
http://www.aygfsteel.com/zhyiwww
--------------------------------------------


Non-greedy matching with grep

Recently, on a project, I wanted to use grep to extract every hyperlink from an HTML page, such as the following snippet:

<tr class=rb><td class=pl><a href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a>
<a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">雅虎郵箱</a> 　　<a href=http://www.126.com/>126郵箱</a> 　　<a href=http://mail.sina.com.cn/>新浪郵箱</a> 　　<a href=http://mail.qq.com/>QQ郵箱</a> 　　<a href=http://www.hotmail.com/>Hotmail</a></td><td><a href=mail.htm>更多 »</a></td></tr>

<tr class=ry><td class=pl><a href=wangmei.htm>視頻</a></td><td><a href=http://www.youku.com/>優酷網</a>　　<a href="

Matching this with a greedy pattern gives the following result:
<tr class=rb><td class=pl><a href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a> 　　<a href="

This happens because the pattern is greedy. I wanted a non-greedy match instead, and tried to get one by adding \? after the * quantifier:

C:\tmp>grep -ior "href=.*\?\/>" a.txt
The result:
href=mail.htm>郵箱</a></td><td><a href=http://mail.163.com/>163郵箱</a> 　　<a href="http://cn.mail.yahoo.com/?id=40014" class="greenfont">雅虎郵箱</a> 　　<a href=http://www.126.com/>126郵箱</a> 　　<a href=http://mail.sina.com.cn/>新浪郵箱</a> 　　<a href=http://mail.qq.com/>QQ郵箱</a> 　　<a href=http://www.hotmail.com/>
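The following explains why the attempt above had no effect, assuming GNU grep. In grep's basic regular expressions, \? is itself the "zero or one" quantifier (the BRE spelling of ERE's ?). It is not a laziness modifier, so .*\? only means "an optional run of any characters". POSIX regular expressions also always return the leftmost-longest match, so neither BRE nor ERE has non-greedy quantifiers.

The sketch below uses a made-up sample line and two hypothetical helper functions, greedy and greedy_qmark. Both patterns print the same greedy match.

```shell
#!/bin/sh
# Made-up sample line containing two links.
line='<a href=x.htm>X</a> <a href=y.htm>Y</a>'

# Greedy: .* extends to the LAST '>' on the line.
greedy() { printf '%s\n' "$1" | grep -o 'href=.*>'; }

# In GNU BRE, \? means "zero or one", so .*\? is (.*)?: optional but
# still greedy, and POSIX matching picks the leftmost-longest match anyway.
greedy_qmark() { printf '%s\n' "$1" | grep -o 'href=.*\?>'; }

greedy "$line"        # prints: href=x.htm>X</a> <a href=y.htm>Y</a>
greedy_qmark "$line"  # prints the same line
```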

The result I expected:

href=mail.htm
href=http://mail.163.com/
href=
href=http://www.126.com/
href=http://mail.sina.com.cn/
href=http://mail.qq.com/
href=http://www.hotmail.com/
href=mail.htm
I don't know how to achieve this. If you have a solution, I would be grateful for any pointers. Thanks in advance.
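If a real lazy quantifier is wanted, GNU grep's -P option (Perl-compatible regular expressions) does understand .*?. This option is only available when grep was built with PCRE support, and BSD grep has no -P.

The function below is one possible sketch. Its name, extract_hrefs, is made up for illustration. The lazy .*? stops at the first '>', '"' or space, and the lookahead (?=...) keeps that terminator out of the printed match. As a result, a quoted value such as href="http://cn.mail.yahoo.com/?id=40014" prints as a bare href=, just as in the expected output.

```shell
#!/bin/sh
# Hypothetical helper: print each "href=" prefix up to, but not including,
# the first '>', '"' or space. Requires GNU grep with PCRE support (-P).
extract_hrefs() {
  # -o: print only the matched part, one match per line
  # -i: case-insensitive, as in the original commands
  # -P: Perl-compatible syntax, where .*? is lazy and (?=...) is a lookahead
  grep -oiP 'href=.*?(?=[>" ])' "$1"
}
```

For example, extract_hrefs a.txt should print one href= line per link in the file.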


Posted on 2008-09-26 13:25 by zhyiwww, filed under: linux

Feedback:

# re: Non-greedy matching with grep

2008-09-26 13:54 | zhyiwww

I then tried the following approach:
grep -ior "href=[a-z0-9A-Z\?/:\.]*" b.txt
The result:
href=mail.htm
href=http://mail.163.com/
href=
href=http://www.126.com/
href=http://mail.sina.com.cn/
href=http://mail.qq.com/
href=http://www.hotmail.com/
href=mail.htm
However, this approach does not use the regex non-greedy mode.
I still don't know how to solve this problem with non-greedy matching.
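The workaround in this comment lists the characters a URL may contain. A common alternative that also avoids -P takes the opposite approach: it lists only the characters that end the value, using a negated bracket expression. [^>" ]* can never run past the first '>', '"' or space, so greedy matching no longer overshoots.

Below is a minimal sketch with a hypothetical function name. The bracket expression is plain POSIX syntax. The -o flag is not POSIX, but GNU and BSD grep both provide it.

```shell
#!/bin/sh
# Hypothetical helper: emulate a non-greedy match without -P by matching
# only characters that cannot terminate the attribute value.
extract_hrefs_portable() {
  # [^>" ]* stops at the first '>', '"' or space, so the greedy * is harmless
  grep -oi 'href=[^>" ]*' "$1"
}
```

On the snippet in the post, this should produce the same list as the whitelist command above. It also accepts any character other than the three terminators.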
