唐朝之千年不變 (Tang Dynasty, Unchanged for a Thousand Years)
Programming Tips

August 16, 2007

Lucene Data Indexing and Search Example

Platform: Lucene 2.1.0, JRE 1.4, Oracle 10g, IBM WebSphere.
Table: Article. Columns: ID (auto-increment), Title (String), Content (String). 550,000 records in total.
Building the index for Article:

import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.cn.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import java.sql.*;
import oracle.jdbc.pool.*;

public class Index {
    // Database connection settings
    private String url = "jdbc:oracle:thin:@//192.168.0.1:1521/Test";
    private String user = "terry";
    private String password = "dev";
    private Connection con = null;
    private Statement st = null;
    private ResultSet rs = null;
    // Directory in which the index files are written
    private String indexUrl = "E:\\ArticleIndex";

    // Opens the connection and returns every row of the Article table.
    private ResultSet getResult() throws Exception {
        OracleDataSource ods = new OracleDataSource();

        ods.setURL(this.url);
        ods.setUser(this.user);
        ods.setPassword(this.password);

        this.con = ods.getConnection();
        this.st = this.con.createStatement();
        this.rs = this.st.executeQuery("SELECT * FROM Article");

        return this.rs;
    }

    // Builds a fresh index (create=true overwrites any existing one) and reports the elapsed time.
    public void createIndex() throws Exception {
        ResultSet rs = this.getResult();

        Analyzer chineseAnalyzer = new ChineseAnalyzer();
        IndexWriter indexWriter = new IndexWriter(this.indexUrl, chineseAnalyzer, true);
        indexWriter.setMergeFactor(100);
        indexWriter.setMaxBufferedDocs(100);

        java.util.Date startDate = new java.util.Date();

        System.out.println("Indexing started: " + startDate);

        executeIndex(rs, indexWriter);

        // Merge the segments, then flush and release the index.
        indexWriter.optimize();

        indexWriter.close();

        java.util.Date endDate = new java.util.Date();

        System.out.println("Indexing finished: " + endDate);
        System.out.println("Time taken: " + (endDate.getTime() - startDate.getTime()) + "ms");
    }

    // Adds one Lucene Document per database row.
    private void executeIndex(ResultSet rs, IndexWriter indexWriter) throws Exception {
        int i = 0;

        while (rs.next()) {
            int id = rs.getInt("ID");
            String title = rs.getString("TITLE");
            String info = rs.getString("CONTENT");

            // Field values must not be null, so map NULL columns to the empty string.
            if (title == null) title = "";
            if (info == null) info = "";

            Document doc = new Document();

            // ID is stored for retrieval but not searchable; Title and Content are stored and tokenized.
            Field idField = new Field("ID", Integer.toString(id), Field.Store.YES, Field.Index.NO, Field.TermVector.NO);
            Field titleField = new Field("Title", title, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES);
            // The Content field must hold the article body, not the title.
            Field infoField = new Field("Content", info, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES);

            doc.add(idField);
            doc.add(titleField);
            doc.add(infoField);

            indexWriter.addDocument(doc);

            i++;
        }

        this.close();

        System.out.println("Records processed: " + i);
    }

    // Releases the JDBC resources.
    private void close() throws Exception {
        this.rs.close();
        this.st.close();
        this.con.close();
    }
}

Searching:

import java.io.*;
import org.apache.lucene.analysis.cn.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.queryParser.QueryParser;

import java.util.*;

public class Search {

    private static final String indexUrl = "E:\\ArticleIndex";

    public static void main(String[] args) throws Exception {
        // Index-building code; comment it out when only searching.
        //Index index=new Index();

        //index.createIndex();

        File indexDir = new File(indexUrl);
        FSDirectory fdir = FSDirectory.getDirectory(indexDir);

        IndexSearcher searcher = new IndexSearcher(fdir);

        // Parse the query with the Chinese analyzer (required, so that it matches the analyzer used for indexing).
        QueryParser parser = new QueryParser("Title", new ChineseAnalyzer());
        Query query = parser.parse("李湘");

        Date startDate = new Date();
        System.out.println("Search started: " + startDate);

        Hits result = searcher.search(query);

        // Print the stored Content field of every hit.
        for (int i = 0; i < result.length(); i++) {
            Document doc = result.doc(i);

            System.out.println("Content: " + doc.get("Content"));
        }

        Date endDate = new Date();

        System.out.println("Records found: " + result.length());
        System.out.println("Time taken: " + (endDate.getTime() - startDate.getTime()) + "ms");

        searcher.close();
    }

}

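In my test, building the index files took about 11 minutes. In general, search performance was roughly on par with running a LIKE query in SQL.

Of course, this is only a rough test on my part. Over the coming period I will be studying the Lucene source code in depth.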
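
To make concrete what the indexing step above produces, here is a minimal sketch of an inverted index in plain Java. It is a toy, not Lucene's implementation: the class name ToyIndex, its methods, and the whitespace/punctuation tokenizer are all assumptions made for illustration, and a real analyzer such as ChineseAnalyzer segments text quite differently.

```java
import java.util.*;

/** Toy inverted index (hypothetical, not Lucene): maps each term to the IDs of documents containing it. */
public class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Lower-cases the text and splits on anything that is not a letter or digit; a stand-in for a real analyzer. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexing: record the document ID under every term the text contains. */
    public void add(int id, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Searching: a single map lookup instead of scanning every row, as a LIKE '%term%' filter would. */
    public Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(hits);
    }
}
```

The up-front cost of filling the term map corresponds to the indexing time measured above; each lookup afterwards does not depend on scanning all stored text. How that trade-off plays out for Lucene against Oracle on this data set is something only a more careful benchmark than my rough test would show.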

posted @ 2007-08-16 10:57 jacksontoto

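Building Various Lucene Queries

Summary: The second step in the search workflow is building a Query. This post introduces Query and how to build one. When a user enters a keyword, the search engine does not hand it straight to the backend for retrieval; it should first analyze and process the keyword into a form the backend can understand. Only then can retrieval be efficient and return more relevant results. In Lucene, this processing amounts to building a Query object. As for the Query object itself, it is merely Luce...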
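
The idea of turning raw keyword input into a structured form before retrieval can be sketched in plain Java as follows. This is only an illustration under stated assumptions: KeywordQuery, its methods, and the "-term" exclusion syntax are hypothetical and are not Lucene's Query class or its QueryParser grammar.

```java
import java.util.*;

/** Hypothetical structured form of a user's keyword input; not Lucene's Query class. */
public class KeywordQuery {
    private final List<String> required = new ArrayList<>();
    private final List<String> prohibited = new ArrayList<>();

    /** Normalizes raw input such as "  Lucene   -SQL " into required and prohibited terms. */
    public static KeywordQuery parse(String raw) {
        KeywordQuery q = new KeywordQuery();
        for (String token : raw.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty() || token.equals("-")) {
                continue; // nothing usable in this token
            }
            if (token.startsWith("-")) {
                q.prohibited.add(token.substring(1));
            } else {
                q.required.add(token);
            }
        }
        return q;
    }

    /** True when the text contains every required term and none of the prohibited ones. */
    public boolean matches(String text) {
        Set<String> words = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        return words.containsAll(required) && Collections.disjoint(words, prohibited);
    }

    public List<String> getRequired() {
        return required;
    }

    public List<String> getProhibited() {
        return prohibited;
    }
}
```

For example, KeywordQuery.parse("  Lucene   -SQL ") yields one required term, "lucene", and one prohibited term, "sql"; the backend then works with those normalized terms rather than with the raw string the user typed.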

posted @ 2007-08-16 10:56 jacksontoto
