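A Simple Lucene Example

A Simple Lucene Example <1>

Keywords: lucene
A note up front: this article uses Lucene 2.0. The main difference between 2.0 and earlier versions shows up when querying.
I actually wrote this code a few months ago, but I was lazy, so it is only reaching my blog today. I can't write deep articles, so I just record simple notes and bits and pieces. The code here is for my own amusement, and I hope the experts won't pelt me with demands for refactoring...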

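1. On Windows, create a folder named s on the C: drive and put any three txt files in it. Name them whatever you like; here they are called "1.txt", "2.txt" and "3.txt".
The contents of 1.txt are as follows:

Code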
中華人民共和國
全國人民
2006年

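"2.txt" and "3.txt" can contain anything you like. I was too lazy to write more, so I just copied the contents of 1.txt into both.

2. Download the Lucene package and put it on the classpath.
Build the index: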

Code

package lighter.javaeye.com;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * author lighter date 2006-8-7
 */
public class TextFileIndexer {
    public static void main(String[] args) throws Exception {
        /* Folder to index: the s folder on the C: drive */
        File fileDir = new File("c:\\s");

        /* Where the index files are stored */
        File indexDir = new File("c:\\index");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        // The last argument (true) creates a new index, replacing any existing one
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer,
                true);
        File[] textFiles = fileDir.listFiles();
        long startTime = new Date().getTime();

        // Add one Document to the index for each .txt file
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile()
                    && textFiles[i].getName().endsWith(".txt")) {
                System.out.println("File " + textFiles[i].getCanonicalPath()
                        + " is being indexed....");
                String temp = FileReaderAll(textFiles[i].getCanonicalPath(),
                        "GBK");
                System.out.println(temp);
                Document document = new Document();
                // The path is stored so it can be shown in results, but is not searchable
                Field FieldPath = new Field("path", textFiles[i].getPath(),
                        Field.Store.YES, Field.Index.NO);
                // The body is stored, tokenized for search, and keeps term vectors
                Field FieldBody = new Field("body", temp, Field.Store.YES,
                        Field.Index.TOKENIZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS);
                document.add(FieldPath);
                document.add(FieldBody);
                indexWriter.addDocument(document);
            }
        }
        // optimize() optimizes the index
        indexWriter.optimize();
        indexWriter.close();

        // Measure how long indexing took
        long endTime = new Date().getTime();
        System.out
                .println("It took "
                        + (endTime - startTime)
                        + " ms to add the documents to the index! "
                        + fileDir.getPath());
    }

    /* Reads the whole file in the given charset; line breaks are dropped */
    public static String FileReaderAll(String FileName, String charset)
            throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(FileName), charset));
        StringBuilder temp = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                temp.append(line);
            }
        } finally {
            // Close the reader even if reading fails
            reader.close();
        }
        return temp.toString();
    }
}

Indexing output:

Code
File C:\s\1.txt is being indexed....
中華人民共和國全國人民2006年
File C:\s\2.txt is being indexed....
中華人民共和國全國人民2006年
File C:\s\3.txt is being indexed....
中華人民共和國全國人民2006年
It took 297 ms to add the documents to the index! c:\s
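Notice that the three lines of 1.txt come out run together in the output above. BufferedReader.readLine() drops the line terminator, and FileReaderAll appends the lines with nothing between them. Below is a minimal sketch, in plain Java with no Lucene, of a reader that lets the caller choose the separator. The names ReadAllSketch, joinLines and readAll are made up for illustration, and the sketch assumes Java 8 or later, which is newer than the Java this post's code was written for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original post or of Lucene.
class ReadAllSketch {

    // Joins every line from the reader with the given separator.
    // readLine() strips line terminators, so the separator is all that
    // ends up between two lines.
    static String joinLines(Reader in, String separator) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append(separator);
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Opens a file in an explicit charset (the post uses "GBK") and joins its lines.
    static String readAll(Path file, Charset charset, String separator) throws IOException {
        return joinLines(new InputStreamReader(Files.newInputStream(file), charset), separator);
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\n";
        System.out.println(joinLines(new StringReader(text), ""));  // lines fused together
        System.out.println(joinLines(new StringReader(text), " ")); // lines kept apart
    }
}
```

For space-delimited text such as English, joining with an empty separator fuses the last word of one line with the first word of the next, so a space or newline separator is the safer choice before handing the text to an analyzer.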

3. With the index built, on to querying....

Code

package lighter.javaeye.com;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class TestQuery {
    public static void main(String[] args) throws IOException, ParseException {
        String queryString = "中華";
        // Open the index built by TextFileIndexer
        IndexSearcher searcher = new IndexSearcher("c:\\index");

        Analyzer analyzer = new StandardAnalyzer();
        // Parse the query string against the "body" field. A ParseException
        // propagates to the caller; swallowing it would leave query null.
        QueryParser qp = new QueryParser("body", analyzer);
        Query query = qp.parse(queryString);

        Hits hits = searcher.search(query);
        if (hits.length() > 0) {
            System.out.println("Found: " + hits.length() + " results!");
        }
        searcher.close();
    }
}

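A Simple Lucene Example <2>

Lucene is actually quite simple. It mainly does two things: building an index and running searches.
Let's look at some of the terms used in Lucene. I don't plan to introduce them in detail, only to touch on them, because there is a good thing in this world called search.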

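IndexWriter: one of the most important classes in Lucene. It adds documents to the index and controls a number of parameters used during indexing.

Analyzer: the analyzer, used to analyze the various kinds of text the search engine encounters. Commonly used ones include StandardAnalyzer, StopAnalyzer and WhitespaceAnalyzer.

Directory: where the index is stored. Lucene offers two storage locations, disk and memory, and provides the FSDirectory and RAMDirectory classes accordingly. The index is normally kept on disk.

Document: a document. A Document is the unit of indexing; any file you want indexed must first be converted into a Document object.

Field: a field. A Document is made up of one or more named fields.

IndexSearcher: the most basic search tool in Lucene; every search goes through IndexSearcher.

Query: a query. Lucene supports fuzzy queries, semantic queries, phrase queries, combined queries and so on, through classes such as TermQuery, BooleanQuery, RangeQuery and WildcardQuery.

QueryParser: a tool for parsing user input. It scans the string the user typed and produces a Query object.

Hits: once a search finishes, the results have to be returned and shown to the user; only then has the search served its purpose. In Lucene, the collection of search results is represented by an instance of the Hits class.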
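To make the two jobs of indexing and searching concrete, here is a toy inverted index in plain Java. It is only a sketch of the general idea, not how Lucene is implemented: the class TinyIndexSketch and its methods are invented for illustration, tokenization is a naive whitespace split, and it assumes Java 8 or later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy index, loosely playing the roles of IndexWriter and IndexSearcher.
class TinyIndexSketch {
    // term -> sorted IDs of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into lowercase terms and record the document ID under each.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Searching": look the term up and return the matching document IDs.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<Integer>emptySortedSet()
                : Collections.unmodifiableSortedSet(hits);
    }

    public static void main(String[] args) {
        TinyIndexSketch index = new TinyIndexSketch();
        index.addDocument(0, "lighter javaeye com");
        index.addDocument(1, "lighter blog");
        System.out.println(index.search("lighter")); // [0, 1]
        System.out.println(index.search("javaeye")); // [0]
        System.out.println(index.search("lucene"));  // []
    }
}
```

The point of the structure is that a search is a single map lookup by term instead of a scan over every document; the cost is paid once, at indexing time.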

That was a pile of definitions; now let's look at a few simple examples:
1. A simple StandardAnalyzer test

Code

package lighter.javaeye.com;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StandardAnalyzerTest
{
    // Constructor
    public StandardAnalyzerTest()
    {
    }

    public static void main(String[] args)
    {
        // Create a StandardAnalyzer object
        Analyzer aAnalyzer = new StandardAnalyzer();
        // Test string
        StringReader sr = new StringReader("lighter javaeye com is the are on");
        // Create a TokenStream; "name" is the field name the text belongs to
        TokenStream ts = aAnalyzer.tokenStream("name", sr);
        try {
            int i = 0;
            Token t = ts.next();
            while (t != null)
            {
                // Counter used to number the output lines
                i++;
                // Print the token after analysis
                System.out.println("Line " + i + ": " + t.termText());
                // Fetch the next token; next() returns null at the end of the stream
                t = ts.next();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:

Quote
Line 1: lighter
Line 2: javaeye
Line 3: com

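A few notes:
StandardAnalyzer is the "standard analyzer" built into Lucene. It does the following:
1. Splits the original sentence into words at whitespace.
2. Converts all uppercase letters to lowercase.
3. Removes words that carry little meaning (stop words), such as "is", "the" and "are", and also strips all punctuation.
Compare the output with new StringReader("lighter javaeye com is the are on") and it becomes clear.
I won't explain the API here; see the official Lucene documentation for details. One thing to note: the code here uses the Lucene 2 API, which differs noticeably from version 1.4.3.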
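The three behaviours listed above can be imitated in a few lines of plain Java. This is only a rough sketch of the idea, not StandardAnalyzer's actual grammar-based tokenizer: the class SimpleAnalyzerSketch, the method analyze and the four-word stop list are all assumptions made for illustration, chosen to match the test string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; not a Lucene class.
class SimpleAnalyzerSketch {
    // Assumed stop list for this sketch only; Lucene ships its own, longer list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "the", "are", "on"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Splitting on runs of non-letter, non-digit characters discards
        // whitespace and punctuation in one step.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT); // normalize case
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token); // keep only non-stop words
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lighter JavaEye com, is the are on!"));
        // prints [lighter, javaeye, com]
    }
}
```

The same analysis has to be applied to the query string at search time, which is why the examples in this post pass a StandardAnalyzer to QueryParser as well as to IndexWriter.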

2. Another example: build a simple index and search it

Code

package lighter.javaeye.com;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class FSDirectoryTest {

    // Path where the index is created
    public static final String path = "c:\\index2";

    public static void main(String[] args) throws Exception {
        // Two documents, each with a single stored, tokenized "name" field
        Document doc1 = new Document();
        doc1.add(new Field("name", "lighter javaeye com", Field.Store.YES, Field.Index.TOKENIZED));

        Document doc2 = new Document();
        doc2.add(new Field("name", "lighter blog", Field.Store.YES, Field.Index.TOKENIZED));

        // Create a new on-disk index at path
        IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path, true), new StandardAnalyzer(), true);
        // Index at most 3 terms per field; the setting stays in effect for later documents
        writer.setMaxFieldLength(3);
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(path);
        QueryParser qp = new QueryParser("name", new StandardAnalyzer());

        Query query = qp.parse("lighter");
        Hits hits = searcher.search(query);
        System.out.println("Search for \"lighter\": " + hits.length() + " results");

        query = qp.parse("javaeye");
        hits = searcher.search(query);
        System.out.println("Search for \"javaeye\": " + hits.length() + " results");

        searcher.close();
    }
}

Output:

Code
Search for "lighter": 2 results
Search for "javaeye": 1 results
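A word on setMaxFieldLength(3) in the example above: it caps how many terms of a single field are indexed for each document. Both name fields here have at most three terms, so nothing is cut off and "javaeye" is still found. The sketch below imitates the cap in plain Java. The class MaxFieldLengthSketch and the method indexedTerms are hypothetical, and the whitespace split is a simplification of what StandardAnalyzer really does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a per-field term cap; not a Lucene class.
class MaxFieldLengthSketch {

    // Returns the terms that would survive a cap of maxFieldLength terms.
    static List<String> indexedTerms(String fieldValue, int maxFieldLength) {
        String trimmed = fieldValue.trim();
        if (trimmed.isEmpty() || maxFieldLength <= 0) {
            return new ArrayList<>();
        }
        String[] tokens = trimmed.split("\\s+");
        int kept = Math.min(tokens.length, maxFieldLength);
        // Terms beyond the cap are simply never indexed, so they cannot be searched.
        return new ArrayList<>(Arrays.asList(tokens).subList(0, kept));
    }

    public static void main(String[] args) {
        System.out.println(indexedTerms("lighter javaeye com", 3)); // [lighter, javaeye, com]
        System.out.println(indexedTerms("lighter javaeye com", 2)); // [lighter, javaeye]
    }
}
```

With a cap of 2, "com" would never reach the index, and a search for it would return nothing even though the stored field still contains the word.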

Reposted from: http://tb.blog.csdn.net/TrackBack.aspx?PostId=1797992


Posted on 2007-12-05 17:04 by 宋針還. Category: Search Engines
