Find duplicate records in a text file
Example:
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
UNIX:
Display the number of occurrences of each record, followed by the record
> sort f1.txt|uniq -c
2 abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
1 aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
2 tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
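The sort step matters because uniq only compares adjacent lines. A minimal sketch of the difference, assuming a POSIX shell with mktemp available; the scratch directory and the variable names (tmp, f, unsorted, counts) are illustrative and not part of the original post:

```shell
# Work in a scratch directory so an existing f1.txt is not overwritten.
tmp=$(mktemp -d)
f="$tmp/f1.txt"

# The five sample records from the example above.
cat > "$f" <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# uniq compares adjacent lines only, so without sort the two "tas" records
# (lines 3 and 5 of the file) are counted separately.
unsorted=$(uniq -c "$f")

# Sorting first groups identical records; the final sort -rn orders the
# counted lines numerically, most frequent first.
counts=$(sort "$f" | uniq -c | sort -rn)

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort, by frequency:\n%s\n' "$counts"
```

On the sample data the unsorted run reports four groups instead of three, which is why every command in this post pipes through sort first.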
Display only the duplicate records (each one listed once)
> sort f1.txt|uniq -d
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
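If the original line order should be kept, awk can be swapped in for sort and uniq. This is a different technique from the one in the post, shown only as a sketch; the scratch file and the names tmp, f, dups and seen are assumptions for illustration:

```shell
# Work in a scratch directory so an existing f1.txt is not overwritten.
tmp=$(mktemp -d)
f="$tmp/f1.txt"

# The five sample records from the example above.
cat > "$f" <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# seen[$0] counts how often each whole line has appeared so far.
# The post-increment returns the old count, so the pattern is true only on
# the second occurrence and each duplicated record is printed exactly once.
dups=$(awk 'seen[$0]++ == 1' "$f")

printf '%s\n' "$dups"
```

The records come out in the order in which their second copy appears in the file, and no sorting is needed. The trade-off is that awk keeps every distinct line in memory.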
Display distinct records (every record once, duplicates collapsed)
> sort f1.txt|uniq
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
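A few related variants, sketched under the same assumptions as the commands above (POSIX sort, uniq and awk); the scratch file and variable names are illustrative only. Note that "distinct" is not the same as "appears only once", which is what uniq -u reports:

```shell
# Work in a scratch directory so an existing f1.txt is not overwritten.
tmp=$(mktemp -d)
f="$tmp/f1.txt"

# The five sample records from the example above.
cat > "$f" <<'EOF'
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
abc 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
aer 1000 3452 2463 2343 2176 7654 3452 8765 5643 3452
tas 3420 3562 2123 1343 2176 7654 3252 8765 5643 3452
EOF

# sort -u sorts and drops repeated lines in one step.
distinct_sorted=$(sort -u "$f")

# awk prints a line only the first time it is seen, keeping file order.
distinct_in_order=$(awk '!seen[$0]++' "$f")

# uniq -u keeps only the records that occur exactly once.
once_only=$(sort "$f" | uniq -u)

printf 'distinct, sorted:\n%s\n' "$distinct_sorted"
printf 'distinct, original order:\n%s\n' "$distinct_in_order"
printf 'occurring once:\n%s\n' "$once_only"
```

For plain whole-line comparisons like these, sort -u should give the same result as sort piped into uniq.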
Reference:
Shell: How To Remove Duplicate Text Lines
Windows:
Notepad++ can sort by line, and remove the duplicate lines at the same time.
- Open the menu: TextFX --> TextFX Tools
- Make sure "sort outputs only unique..." is checked
- Select a block of text (Ctrl+A selects the entire document)
- Click "sort lines case sensitive" or "sort lines case insensitive"
Posted on 2012-04-11 12:10 by zJun's帛羅閣. Category: Development Environment