Wireless & Mobile Internet Technology R&D

Put yourself in the other person's shoes


Shell: Merging and Splitting

Posted on 2009-11-29 11:57 by Gavin.lee. Category: Linux shell basics

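• sort: practical sorting.
The general form of the sort command is:
sort -cmu -o output_file [other options] +pos1 +pos2 input_files
The main sort options are:
-c Check whether the file is already sorted.
-m Merge two sorted files.
-u Remove all duplicate lines.
-o Name of the output file that receives the sorted result.
Other options:
-b Ignore leading blanks when sorting on fields.
-n Sort the field numerically.
-t Field separator; use it when fields are separated by something other than spaces or tabs.
-r Reverse the sort order or comparison.
+pos1 pos1 is a field number; the sort key starts at this field.
-pos2 pos2 is a field number; the sort key ends just before this field, so the field is left out of the comparison. Generally used together with +pos1.
pos1 can be written as m.n, where m is a field number and n is a character offset; for example, 4.6 means sort on the fifth field, starting from the seventh character.
The +pos1 -pos2 forms count fields from 0 and are obsolete; current sort implementations use -k pos1[,pos2] instead, with fields counted from 1.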
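The -t, -n and -r options combine naturally with a -k key. The sketch below is only an illustration: the file scores.txt and its name:score records are made up for the example and are not part of the original session.

```shell
#!/bin/sh
# Fix the locale so the ordering is the same on every system.
export LC_ALL=C

# Hypothetical sample data: name:score, separated by ':'.
tmp=$(mktemp -d)
printf '%s\n' 'carol:7' 'alice:12' 'bob:3' > "$tmp/scores.txt"

# Sort on field 2 only (-k2,2), numerically (n), with ':' as the separator (-t).
by_score=$(sort -t: -k2,2n "$tmp/scores.txt")

# Same key in reverse (r): the highest score comes first.
top=$(sort -t: -k2,2nr "$tmp/scores.txt" | head -n 1)

printf '%s\n' "$by_score"
printf 'top: %s\n' "$top"
rm -rf "$tmp"
```

Without the n modifier the scores would be compared as text, and 12 would sort before 3.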
Examples:
Sort on the first field
-bash-3.00$ sort -k1,1 sed.txt
Print the last line of the sorted output
-bash-3.00$ sort -k1,1 sed.txt | tail -1
Print the first line of the sorted output
-bash-3.00$ sort -k1,1 sed.txt | head -1
Feed the sort output to awk
-bash-3.00$ sort -k1,1 sed.txt | head -1 | awk '{if($1=="caodejun")print $1}'
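Merge two sorted files
-bash-3.00$ sort -m sed.txt sort.txt
Files must already be sorted before they are merged. Merging is useful for transaction processing and for any kind of update. For example, suppose two appliance names were left out of a file and stored in a separate file; they can now be merged into a single file. The merge form is 'sort -m sorted_file1 sorted_file2'.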
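Since -m relies on sorted input, it can help to check the files with -c first. A minimal sketch, assuming two small made-up files a.txt and b.txt:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical inputs, each already in sorted order.
tmp=$(mktemp -d)
printf '%s\n' apple cherry > "$tmp/a.txt"
printf '%s\n' banana date > "$tmp/b.txt"

# -c only checks the order: exit status 0 means the file is sorted.
inputs_sorted=no
sort -c "$tmp/a.txt" && sort -c "$tmp/b.txt" && inputs_sorted=yes

# -m interleaves the two sorted files without sorting them again.
merged=$(sort -m "$tmp/a.txt" "$tmp/b.txt")

printf 'inputs sorted: %s\n' "$inputs_sorted"
printf '%s\n' "$merged"
rm -rf "$tmp"
```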

Remove duplicate lines
-bash-3.00$ sort -u sed.txt

• uniq
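uniq removes or suppresses repeated lines in a text file. It compares only adjacent lines, so it normally assumes the input is already sorted. Sorting is not enforced: uniq accepts unsorted text, even arbitrary lines, but repeated lines that are not adjacent are then left in place.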
-bash-3.00$ who | awk '{print $1} ' |uniq
liuzk423
605408211
shuzigui
nefu_luyanshen
waterlooz
wsoangel
tomotoboy
xp55699312
zyy0904
caodejun
duke1988
605408211
nefu_luyanshen
zyy0904
lonelysand

Show only the lines that are repeated
-bash-3.00$ who | awk '{print $1} ' |uniq -d

-c prints the number of times each line occurs.
-bash-3.00$ who | awk '{print $1} ' |uniq -c
   1 liuzk423
   1 605408211
   1 shuzigui
   1 nefu_luyanshen
   1 waterlooz
   1 wsoangel
   1 tomotoboy
   1 xp55699312
   1 zyy0904
   1 caodejun
   1 duke1988
   1 605408211
   1 nefu_luyanshen
   1 zyy0904
   1 lonelysand
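At first I did not understand why nefu_luyanshen, which clearly appears more than once, is reported with a count of 1. The reason is that uniq compares only adjacent lines: the who output is not sorted by user name, so the repeated names are never next to each other and each occurrence is counted separately. For the same reason, uniq -d above printed nothing.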
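Sorting before uniq brings the repeated names together so they are counted as one group. The sketch below uses a made-up list of user names rather than real who output:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical user list with non-adjacent repeats.
users='alice
bob
alice
carol
bob
alice'

# Unsorted: no two equal lines are adjacent, so uniq removes nothing.
unsorted_lines=$(printf '%s\n' "$users" | uniq | wc -l | tr -d ' ')

# Sorted first: uniq -c now reports the real counts (printed as name=count).
counts=$(printf '%s\n' "$users" | sort | uniq -c | awk '{print $2 "=" $1}')

# Sorted first: uniq -d lists each repeated name once.
dups=$(printf '%s\n' "$users" | sort | uniq -d)

printf 'lines after plain uniq: %s\n' "$unsorted_lines"
printf '%s\n' "$counts"
printf 'repeated: %s\n' $dups
```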

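To test only part of each line for uniqueness, skip the leading fields. In the older syntax, n is the number of leading fields to skip; the current form is -f n. Here each line has only one field, so skipping two fields leaves nothing to compare, every line counts as a repeat of the first, and only the first line is printed.
-bash-3.00$ who | awk '{print $1} ' |uniq -f 2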
liuzk423
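Field skipping is easier to follow on lines that have several fields. In this sketch the three records (user, terminal, host) are invented for the example:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical records: user, terminal, host (single spaces between fields).
records='alice pts/1 hostA
bob pts/2 hostA
carol pts/3 hostB'

# -f 2 skips the first two fields, so only the host part is compared.
# Adjacent lines with the same host collapse to the first one seen.
one_per_host=$(printf '%s\n' "$records" | uniq -f 2)

printf '%s\n' "$one_per_host"
```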

• join
join links two sorted files on a common field. Some systems require fewer than 20 fields per file when using join; if there are more than 20 fields, a DBMS is the better tool. The general form is:
join [options] in_file1 in_file2

-bash-3.00$ cat sed.txt
605408211   pts/16       Jul 31 13:54   (218.0.1.42)
caodejun   pts/44       Jul 31 14:16    (219.148.133.31)
duke1988   pts/45       Jul 31 14:41    (218.104.163.66)
liuzk423   pts/6        Jul 20 08:27    (219.245.104.240)
nefu_luyanshen   pts/23       Jul 31 14:33      (218.25.6.142)
nefu_luyanshen   pts/48       Jul 31 12:59      (218.25.6.142)
shuzigui   pts/21       Jul 31 12:11    (121.35.248.193)
tomotoboy   pts/41       Jul 31 13:31   (219.221.99.155)
waterlooz   pts/25       Jul 31 08:48   (121.0.29.225)
wsoangel   pts/35       Jul 31 13:40    (116.233.219.10)
xp55699312   pts/42       Jul 31 14:12 (61.152.132.103)
zyy0904    pts/43       Jul 31 13:53    (125.33.195.36)
-bash-3.00$ cat sort.txt
605408211   pts/16       Jul 31 13:54   (218.0.1.42)
caodejun   pts/44       Jul 31 14:16    (219.148.133.31)
duke1988   pts/45       Jul 31 14:41    (218.104.163.66)
-bash-3.00$ join sed.txt sort.txt
605408211 pts/16 Jul 31 13:54 (218.0.1.42) pts/16 Jul 31 13:54 (218.0.1.42)
caodejun pts/44 Jul 31 14:16 (219.148.133.31) pts/44 Jul 31 14:16 (219.148.133.31)
duke1988 pts/45 Jul 31 14:41 (218.104.163.66) pts/45 Jul 31 14:41 (218.104.163.66)

Choosing what is output: -a1 -a2 also prints the unpaired lines from both files
-bash-3.00$ join -a1 -a2 sed.txt sort.txt
605408211 pts/16 Jul 31 13:54 (218.0.1.42) pts/16 Jul 31 13:54 (218.0.1.42)
caodejun pts/44 Jul 31 14:16 (219.148.133.31) pts/44 Jul 31 14:16 (219.148.133.31)
duke1988 pts/45 Jul 31 14:41 (218.104.163.66) pts/45 Jul 31 14:41 (218.104.163.66)
liuzk423 pts/6 Jul 20 08:27 (219.245.104.240)
nefu_luyanshen pts/23 Jul 31 14:33 (218.25.6.142)
nefu_luyanshen pts/48 Jul 31 12:59 (218.25.6.142)
shuzigui pts/21 Jul 31 12:11 (121.35.248.193)
tomotoboy pts/41 Jul 31 13:31 (219.221.99.155)
waterlooz pts/25 Jul 31 08:48 (121.0.29.225)
wsoangel pts/35 Jul 31 13:40 (116.233.219.10)
xp55699312 pts/42 Jul 31 14:12 (61.152.132.103)
zyy0904 pts/43 Jul 31 13:53 (125.33.195.36)
-bash-3.00$ join -o 1.1 2.2 sed.txt sort.txt
605408211 pts/16
caodejun pts/44
duke1988 pts/45
-bash-3.00$ join -o 1.1 2.2 2.3 sed.txt sort.txt
605408211 pts/16 Jul
caodejun pts/44 Jul
duke1988 pts/45 Jul
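The same three behaviours (plain join, -a, -o) can be tried on smaller data. This is a sketch with two made-up files, tty.txt and host.txt, both sorted on their first field:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical inputs, sorted on the join field (field 1).
tmp=$(mktemp -d)
printf '%s\n' 'alice pts/1' 'bob pts/2' 'carol pts/3' > "$tmp/tty.txt"
printf '%s\n' 'alice 10.0.0.1' 'carol 10.0.0.3' > "$tmp/host.txt"

# Default: only keys present in both files are printed.
inner=$(join "$tmp/tty.txt" "$tmp/host.txt")

# -a1: also print lines from file 1 that have no partner in file 2.
left=$(join -a1 "$tmp/tty.txt" "$tmp/host.txt")

# -o: pick output fields as file.field (here the user and the address).
picked=$(join -o 1.1,2.2 "$tmp/tty.txt" "$tmp/host.txt")

printf '%s\n' "$inner" '--' "$left" '--' "$picked"
rm -rf "$tmp"
```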

• cut
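cut extracts columns or fields from standard input or from text files. The extracted text can then be pasted into another text file; paste is covered in the next section.
The general form of cut is:
cut [options] file1 file2
The available options are:
-c list  Character positions to cut.
-f list  Fields to cut.
-d   Field separator to use instead of the default tab.
-c takes a list of positions or ranges, for example:
-c 1,5-7  Cut character 1, then characters 5 through 7.
-c1-50  Cut the first 50 characters.
-f uses the same list format as -c:
-f 1,5  Cut field 1 and field 5.
-f 1,10-12  Cut field 1 and fields 10 through 12.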

-bash-3.00$ ps -ef | cut -c1-8
-bash-3.00$ ps -ef | cut -d: -f1
-bash-3.00$ ps -ef | cut -d: -f1,3
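Because ps output varies from system to system, a fixed line makes the list formats easier to check. The sketch below uses a typical passwd-style line as assumed sample input:

```shell
#!/bin/sh
# Assumed sample input: one passwd-style line with ':' separated fields.
line='root:x:0:0:root:/root:/bin/bash'

# Characters 1 to 4.
first_chars=$(printf '%s\n' "$line" | cut -c1-4)

# Character 1, then characters 5 to 7.
mixed_chars=$(printf '%s\n' "$line" | cut -c1,5-7)

# Fields 1 and 7, with ':' as the separator; cut keeps the separator between them.
name_and_shell=$(printf '%s\n' "$line" | cut -d: -f1,7)

# A field range: fields 3 to 4.
ids=$(printf '%s\n' "$line" | cut -d: -f3-4)

printf '%s\n' "$first_chars" "$mixed_chars" "$name_and_shell" "$ids"
```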

• paste
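cut extracts columns or fields from text files or standard output, and paste then joins such data back together into a related file. paste matches lines by position, so before pasting data from two different sources, sort them first and make sure both files have the same number of lines.
paste puts the corresponding lines of the different files on one line. By default it separates the pieces with a tab; with -d, the given character becomes the field separator. The form of paste is:
paste [-d list] [-s] [-] file1 file2
The options mean:
-d   Field separator to use instead of the tab. For example, to separate fields with @, use -d@.
-s   Merge all lines of each file into a single line instead of pasting line by line.
-    Read standard input. For example, ls -l | paste - displays the output in a single column.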
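The three options can be seen side by side in a small sketch. The files names.txt and ttys.txt are made up for the example:

```shell
#!/bin/sh
# Hypothetical inputs with the same number of lines.
tmp=$(mktemp -d)
printf '%s\n' alice bob > "$tmp/names.txt"
printf '%s\n' pts/1 pts/2 > "$tmp/ttys.txt"

# Line-by-line paste, with '@' instead of the default tab.
side_by_side=$(paste -d@ "$tmp/names.txt" "$tmp/ttys.txt")

# -s: all lines of one file on a single line, here joined with ','.
serial=$(paste -s -d, "$tmp/names.txt")

# '-' reads standard input; two dashes take two input lines per output line.
two_columns=$(printf '%s\n' a b c d | paste - -)

printf '%s\n' "$side_by_side" "$serial" "$two_columns"
rm -rf "$tmp"
```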

• split
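split breaks a large file into smaller ones. Files sometimes grow so large that splitting them first makes them easier to transfer. Tools such as vi or sort can also run into trouble when a file is too big for their working buffers, so sometimes there is no choice but to split the file into smaller pieces.
The general form of split is:
split -output_file-size input-filename output-filename
Here output_file-size is the number of lines in each piece; current versions write it as -l lines. output-filename is the prefix for the pieces; without it the pieces are named xaa, xab, and so on, as in the listing below.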
-bash-3.00$ ps -ef |split -10
-bash-3.00$ ls
a.out        greeting.sh      main.c     sort.txt     xac  xai  xao
append.sed   grepgrepstrings  nohup.out  test         xad  xaj  xap
change.sed   grepstr          readme.sh  test.sh      xae  xak  xaq
core.log     hello            seawolf    user.online  xaf  xal  xar
factorial    hello.cpp        sed.out    xaa          xag  xam
factorial.c  main             sed.txt    xab          xah  xan
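A split is only useful if the pieces can be put back together. The sketch below splits a made-up 25-line file with an explicit prefix (part_, chosen for the example) and checks that cat restores the original:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical input: a 25-line file with one number per line.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$tmp/big.txt"

# Split into pieces of 10 lines each, named part_aa, part_ab, part_ac.
( cd "$tmp" && split -l 10 big.txt part_ )

# Count the pieces and the lines in the last, shorter piece.
pieces=$(ls "$tmp" | grep -c '^part_')
last_lines=$(wc -l < "$tmp/part_ac" | tr -d ' ')

# The suffixes sort alphabetically, so cat with a glob restores the order.
roundtrip=no
cat "$tmp"/part_* | cmp -s - "$tmp/big.txt" && roundtrip=ok

printf 'pieces: %s, last piece: %s lines, roundtrip: %s\n' \
    "$pieces" "$last_lines" "$roundtrip"
rm -rf "$tmp"
```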
