
Lucene is an open-source Apache project that implements a full-text search engine in Java. It is powerful, and its API is simple. Broadly speaking, building and searching an index with Lucene is much like working with a database: a Document can be thought of as a row, and a Field as a column. Building a search engine with Lucene is about as simple as connecting to a database through JDBC.

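This article covers Lucene 2.0, which is not compatible with the widely used and widely documented Lucene 1.4.3.

Lucene 2.0 can be downloaded from http://apache.justdn.org/lucene/java/

Let's start with an example to get a rough idea of how Lucene works.

The examples are written as JUnit test methods. (To keep the code clean and readable, all exceptions are simply thrown.)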

a) An example that builds an index from files

[code]
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void testIndexHello() throws IOException
{
    Date date1 = new Date();
    // Create the writer that builds the index.
    // 1st argument: the directory in which the index is created.
    // 2nd argument: the text analyzer; the standard one is used here, but you can write your own.
    // 3rd argument: if true, c:\\index is emptied before the index is built.
    IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);

    // The folder that holds the source files.
    File file = new File("c:\\file");
    /*
     * Index the contents of the files under c:\\file and attach each file's
     * path to its document so that the path comes back with the search results.
     */
    if (file.isDirectory())
    {
        String[] fileList = file.list();
        for (int i = 0; i < fileList.length; i++)
        {
            // A new document; think of it as one row in a database.
            Document doc = new Document();
            File f = new File(file, fileList[i]);
            Reader reader = new BufferedReader(new FileReader(f));
            doc.add(new Field("file", reader)); // add a field to the document
            doc.add(new Field("path", f.getAbsolutePath(), Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
        }
    }
    writer.close(); // required: only then is the data written to the index directory
    Date date2 = new Date();
    System.out.println("Elapsed: " + (date2.getTime() - date1.getTime()) + " ms");
}
[/code]

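Note: building an index is inherently slow, so do not be surprised if the elapsed time printed at the end is fairly long.

b) An example of full-text search using the index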

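[code]
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public void HelloSearch() throws IOException, ParseException
{
    // Like IndexWriter above, IndexSearcher is the tool class, here for searching.
    IndexSearcher indexSearcher = new IndexSearcher("c:\\index");
    // Parses query strings against the "file" field; the analyzer tokenizes them.
    QueryParser queryParser = new QueryParser("file", new StandardAnalyzer());
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    // Query is an abstract class; more on this below.
    Query query = queryParser.parse(br.readLine());
    Hits hits = indexSearcher.search(query);
    System.out.println("Searching...");
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        System.out.println("Content: " + doc.get("file")); // note what this prints
        System.out.println("File path: " + doc.get("path"));
    }
    indexSearcher.close();
}
[/code]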

As the two examples show, Lucene is fairly simple to use.

If you run them, you may wonder why doc.get("file") returns null. This is explained shortly.

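Next, let's look at how an index is built.

As the example above shows, building an index involves Document, IndexWriter, and Field.

The simplest procedure is:

First, create a Document, an IndexWriter, and the Fields with new.

Then add the Fields to the Document with Document.add().

Next, add the Document to the index with IndexWriter.addDocument().

Finally, call IndexWriter.close() to close the writer. This step is essential: only after it is called is the index written to the index directory, and many beginners overlook it.

There is not much to say about Document: just think of it as a row in a database.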

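Field is more important and also more complex.

It has five constructors:

Field(String name, byte[] value, Field.Store store)

Field(String name, Reader reader)

Field(String name, Reader reader, Field.TermVector termVector)

Field(String name, String value, Field.Store store, Field.Index index)

Field(String name, String value, Field.Store store, Field.Index index, Field.TermVector termVector)

Field has three inner classes, Field.Index, Field.Store, and Field.TermVector, and the constructors take them as parameters.

Note: TermVector was added in Lucene 1.4. It records, per document, a vector of the terms in the field, which is useful for things such as similarity comparisons rather than ordinary searching. It is rarely needed; the default is Field.TermVector.NO, and the setting makes no difference to ordinary queries.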

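Different combinations serve different purposes in full-text search, as the table below shows:

Field.Index	Field.Store	Description
`TOKENIZED` (tokenized)	`YES`	Searchable and retrievable; suits an article's title, or its body if the body is not too long
`TOKENIZED`	`NO`	Searchable but not retrievable; suits an article's body, which may be very long
`NO`	`YES`	Not searchable; stored only as an attachment to the searchable content, e.g. a URL
`UN_TOKENIZED`	`YES/NO`	Not tokenized; it is searched as a whole, so searching for part of it finds nothing
`NO`	`NO`	Not a valid combination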

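The two Reader-based constructors, Field(String name, Reader reader) and Field(String name, Reader reader, Field.TermVector termVector), behave as Field.Index.TOKENIZED with Field.Store.NO. This is why the file content came back as null in the example above: it was indexed but not stored. If you do need the content, you can read it through the file's path, since the path was stored as an attachment and is returned with the search results. In web development, large data is usually kept in a database rather than in the file system, and certainly not in the index directory, because operating on data that large would put a heavier load on the server.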
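The difference between "indexed" and "stored" can be illustrated without Lucene at all. The following is a minimal plain-Java sketch, not Lucene's implementation: the class ToyIndex and its methods newDocument, addField, search, and get are hypothetical names invented for this illustration, and the tokenizing is just a split on whitespace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "indexed" versus "stored" fields. This is NOT Lucene code. */
final class ToyIndex {
    // "field:term" -> ids of the documents whose indexed field contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    // per document: field name -> original value, kept only for stored fields
    private final List<Map<String, String>> stored = new ArrayList<Map<String, String>>();

    /** Starts a new, empty document and returns its id. */
    int newDocument() {
        stored.add(new HashMap<String, String>());
        return stored.size() - 1;
    }

    /** Adds a field; 'index' makes it searchable, 'store' makes it retrievable. */
    void addField(int docId, String name, String value, boolean store, boolean index) {
        if (!store && !index) {
            throw new IllegalArgumentException("a field must be stored, indexed, or both");
        }
        if (store) {
            stored.get(docId).put(name, value);
        }
        if (index) {
            for (String term : value.trim().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                String key = name + ":" + term;
                Set<Integer> docs = postings.get(key);
                if (docs == null) {
                    docs = new TreeSet<Integer>();
                    postings.put(key, docs);
                }
                docs.add(docId);
            }
        }
    }

    /** Returns the ids of the documents whose indexed field contains the term. */
    Set<Integer> search(String field, String term) {
        Set<Integer> docs = postings.get(field + ":" + term.toLowerCase());
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    /** Returns the stored value of a field, or null if it was not stored. */
    String get(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.newDocument();
        // Indexed but not stored, like the Reader-based "file" field above.
        index.addField(doc, "file", "hello lucene world", false, true);
        // Stored but not indexed, like the "path" field above.
        index.addField(doc, "path", "c:\\file\\hello.txt", true, false);

        System.out.println(index.search("file", "lucene")); // prints [0]
        System.out.println(index.get(doc, "file"));         // prints null
        System.out.println(index.get(doc, "path"));         // prints c:\file\hello.txt
    }
}
```

The document is found through its indexed field, yet asking for that field's value yields null, while the stored path comes back intact. That is the same pattern as in HelloSearch().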

Now for IndexWriter.

It is the writer that writes the index, and its job is fairly simple:

1. Use addDocument() to add the documents that are ready to be indexed.

2. Call close() to write the index to the index directory.

Its constructors are:

IndexWriter(Directory d, Analyzer a, boolean create)

(To be continued)

Continued from http://www.javaeye.com/post/190334
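IndexWriter(File path, Analyzer a, boolean create)
IndexWriter(String path, Analyzer a, boolean create)
As you can see, constructing one requires an index directory, an analyzer (usually the standard one), and a flag that says whether to empty the index directory first.
It also has some configuration methods, such as one that sets the maximum length of a Field.
Here is an example: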
[code]
public void IndexMaxField() throws IOException
{
    IndexWriter indexWriter = new IndexWriter("c:\\index", new StandardAnalyzer(), true);
    Document doc1 = new Document();
    doc1.add(new Field("name1", "程序員之家", Field.Store.YES, Field.Index.TOKENIZED));
    Document doc2 = new Document();
    doc2.add(new Field("name2", "Welcome to the Home of programers", Field.Store.YES, Field.Index.TOKENIZED));
    // Add each document twice, with a different field-length limit each time.
    indexWriter.setMaxFieldLength(5);
    indexWriter.addDocument(doc1);
    indexWriter.setMaxFieldLength(3);
    indexWriter.addDocument(doc1);
    indexWriter.setMaxFieldLength(0);
    indexWriter.addDocument(doc2);
    indexWriter.setMaxFieldLength(3);
    indexWriter.addDocument(doc2);
    indexWriter.close();
}

public void SearcherMaxField() throws ParseException, IOException
{
    searchAndPrint("name1", "程序員");
    searchAndPrint("name1", "程序員之家");
    searchAndPrint("name2", "Welcome");
    searchAndPrint("name2", "the");
    searchAndPrint("name2", "home");
}

// Runs one keyword query against one field and prints the stored values of the hits.
private void searchAndPrint(String field, String keyword) throws ParseException, IOException
{
    QueryParser queryParser = new QueryParser(field, new StandardAnalyzer());
    Query query = queryParser.parse(keyword);
    IndexSearcher indexSearcher = new IndexSearcher("c:\\index");
    Hits hits = indexSearcher.search(query);
    System.out.println("You searched for: " + keyword);
    System.out.println("Found " + hits.length() + " result(s)");
    System.out.println("They are:");
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        System.out.println(doc.get(field));
    }
    indexSearcher.close();
}
[/code]
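To summarize what running it shows:
1. The Field length limit only restricts what can be searched. With Field.Store.YES, the full value is still saved in the index directory.
2. Searching for "the" finds nothing because, when analyzing English, Lucene skips words such as the, to, and of, which carry no meaning for a search.
3. new StandardAnalyzer() tokenizes English by splitting on whitespace and dropping those stop words, while Chinese text is split into individual characters.
4. The maximum Field length is counted from 0, like an array index:
程序員之家 -- limit 3 --> 程序員之 (tokens 0, 1, 2, 3)
Welcome to the home of programmers -- limit 3 --> Welcome to the home of programmers (tokens 0, 1, 2)
Try a few other values yourself to get a better feel for it.
(To be continued)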

[Image: Lucene.JPG, an illustration explaining Lucene]
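Continued from http://www.javaeye.com/post/190335
By now we can build an index with Lucene.
Here are a few more features that round things out:
1. Index format
An index directory can use one of two formats. In the multifile format, each part (segment) of the index is written as a set of separate files. In the compound format, those files are packed into a single file per segment, which keeps the number of files in the directory, and the number of open file handles, low.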
2. Where the index can be stored
An index can live in one of two places: 1. on disk, 2. in memory.
Use FSDirectory for an index on disk and RAMDirectory for one in memory; an in-memory index is lost once the program exits.
FSDirectory.getDirectory(File file, boolean create)
FSDirectory.getDirectory(String path, boolean create)
These two factory methods return a directory.
For memory, new RAMDirectory() is all that is needed.
Combine either one with IndexWriter(Directory d, Analyzer a, boolean create), for example:
IndexWriter diskWriter = new IndexWriter(FSDirectory.getDirectory("c:\\index", true), new StandardAnalyzer(), true);
IndexWriter ramWriter = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
3. Merging indexes
Use IndexWriter.addIndexes(Directory[] dirs) to add other index directories to the current index.
Here is an example:
[code]
public void UniteIndex() throws IOException
{
    IndexWriter writerDisk = new IndexWriter(FSDirectory.getDirectory("c:\\indexDisk", true), new StandardAnalyzer(), true);
    Document docDisk = new Document();
    docDisk.add(new Field("name", "程序員之家", Field.Store.YES, Field.Index.TOKENIZED));
    writerDisk.addDocument(docDisk);
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter writerRam = new IndexWriter(ramDir, new StandardAnalyzer(), true);
    Document docRam = new Document();
    docRam.add(new Field("name", "程序員雜志", Field.Store.YES, Field.Index.TOKENIZED));
    writerRam.addDocument(docRam);
    writerRam.close(); // essential: the in-memory writer must be closed before merging
    writerDisk.addIndexes(new Directory[]{ramDir});
    writerDisk.close();
}

public void UniteSearch() throws ParseException, IOException
{
    QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
    Query query = queryParser.parse("程序員");
    IndexSearcher indexSearcher = new IndexSearcher("c:\\indexDisk");
    Hits hits = indexSearcher.search(query);
    System.out.println("Found " + hits.length() + " result(s)");
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        System.out.println(doc.get("name"));
    }
    indexSearcher.close();
}
[/code]
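This example merges an in-memory index into the index on disk.
Note: before merging, you must call close() on the IndexWriter of the index that is being merged in.
4. Other index operations
The IndexReader class is used to operate on an existing index; it provides operations such as deleting Documents.
The next part covers full-text search.
Full-text search mainly uses IndexSearcher, Query (through its subclasses), Hits, and Document, and sometimes QueryParser.
The main steps are:
1. new QueryParser(field name, new analyzer)
2. Query query = queryParser.parse("query string"); at this point you can use the reflection API to see what type the query actually is
3. new IndexSearcher(index directory).search(query); this returns Hits
4. Use Hits.doc(n) to iterate over the matching Documents
5. Read the Field values from each Document
Steps 1 and 2 exist only to produce a Query instance; which concrete type you get depends on the analyzer and on the query string.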
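Take the earlier example:
QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
Query query = queryParser.parse("程序員");
/* here the returned object is an org.apache.lucene.search.PhraseQuery */
IndexSearcher indexSearcher = new IndexSearcher("c:\\indexDisk");
Hits hits = indexSearcher.search(query);
Whatever the type, what comes back is always a subclass of Query, so you could skip these two steps and simply instantiate a Query subclass yourself. In practice the two steps are still the usual choice: here they yield a PhraseQuery, a powerful Query subclass that matches several terms in sequence, and QueryParser also lets you set how the keywords relate to one another. This is the most common approach.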
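IndexSearcher:
IndexSearcher holds an internal IndexReader that reads the index. Its close() method does not so much close the IndexSearcher itself as close that internal IndexReader.

QueryParser lets you set the relationship between keywords (AND or OR) with parser.setDefaultOperator(), which was called setOperator() in Lucene 1.4, and it automatically separates the keywords in the query string at whitespace.
Note: when searching with QueryParser, be sure to use the same analyzer that was used to build the index.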
Query:
The Lucene 2.0 API documentation lists many subclasses:
BooleanQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery, MultiPhraseQuery, MultiTermQuery, PhraseQuery, PrefixQuery, RangeQuery, SpanQuery, TermQuery
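Each has its own use; the documentation explains them.
The next part covers Lucene's analyzers.
An analyzer is made up of a tokenizer and filters. For English, the tokenizer splits the text into words at whitespace, and a filter removes words such as the, to, and of so that they are neither indexed nor searched.
The one used most often is StandardAnalyzer, Lucene's standard analyzer, which combines several of Lucene's built-in analysis components.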
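The tokenizer-plus-filter idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Lucene's StandardAnalyzer: the class ToyAnalyzer, its methods, and the short stop-word list are all hypothetical, and real analyzers handle punctuation and non-English text as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: a whitespace tokenizer followed by a stop-word filter. NOT Lucene code. */
final class ToyAnalyzer {
    // A tiny illustrative stop list, assumed for this sketch only.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the", "to", "of", "a", "an", "and"));

    /** Tokenizer step: split the text on runs of whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Filter step: lower-case every token and drop the stop words. */
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    /** The analyzer is simply the tokenizer followed by the filter. */
    static List<String> analyze(String text) {
        return filter(tokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Welcome to the Home of programers"));
        // prints [welcome, home, programers]
    }
}
```

Because the same analysis has to be applied to both the indexed text and the query string, a query for "the" ends up with no terms at all, which matches the behaviour seen in SearcherMaxField().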
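The last part: advanced search with Lucene
1. Sorting
Lucene has built-in sorting through IndexSearcher.search(query, sort), but it does not always do what is needed, in which case you have to implement a custom sort.
That means implementing two interfaces: ScoreDocComparator and SortComparatorSource
and then calling IndexSearcher.search(query, new Sort(new SortField(String field, SortComparatorSource)));
Here is an example.
First, building the index: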
[code]
public void IndexSort() throws IOException
{
    IndexWriter writer = new IndexWriter("C:\\indexStore", new StandardAnalyzer(), true);
    // One document per value, added in deliberately unsorted order.
    String[] values = {"1", "4", "3", "5", "9", "6", "7"};
    for (int i = 0; i < values.length; i++)
    {
        Document doc = new Document();
        doc.add(new Field("sort", values[i], Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
    writer.close();
}
[/code]
(To be continued)
Next, the search examples:
[code]
public void SearchSort1() throws IOException, ParseException
{
    IndexSearcher indexSearcher = new IndexSearcher("C:\\indexStore");
    QueryParser queryParser = new QueryParser("sort", new StandardAnalyzer());
    Query query = queryParser.parse("4");

    Hits hits = indexSearcher.search(query);
    System.out.println("Found " + hits.length() + " result(s)");
    Document doc = hits.doc(0);
    System.out.println(doc.get("sort"));
    indexSearcher.close();
}

public void SearchSort2() throws IOException, ParseException
{
    IndexSearcher indexSearcher = new IndexSearcher("C:\\indexStore");
    // RangeQuery was not covered above; it matches a range of terms (see the API documentation).
    Query query = new RangeQuery(new Term("sort", "1"), new Term("sort", "9"), true);
    Hits hits = indexSearcher.search(query, new Sort(new SortField("sort", new MySortComparatorSource())));
    System.out.println("Found " + hits.length() + " result(s)");
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        System.out.println(doc.get("sort"));
    }
    indexSearcher.close();
}

public class MyScoreDocComparator implements ScoreDocComparator
{
    // Value of the sort field for every document, indexed by document number.
    private Integer[] sort;

    public MyScoreDocComparator(String s, IndexReader reader, String fieldname) throws IOException
    {
        sort = new Integer[reader.maxDoc()];
        for (int i = 0; i < reader.maxDoc(); i++)
        {
            Document doc = reader.document(i);
            sort[i] = Integer.valueOf(doc.get(fieldname));
        }
    }

    // Orders two hits by their cached field values, ascending.
    public int compare(ScoreDoc i, ScoreDoc j)
    {
        return sort[i.doc].compareTo(sort[j.doc]);
    }

    public int sortType()
    {
        return SortField.INT;
    }

    public Comparable sortValue(ScoreDoc i)
    {
        return sort[i.doc];
    }
}

public class MySortComparatorSource implements SortComparatorSource
{
    private static final long serialVersionUID = -9189690812107968361L;

    public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
            throws IOException
    {
        if (fieldname.equals("sort"))
            return new MyScoreDocComparator("sort", reader, fieldname);
        return null;
    }
}
[/code]
The output of SearchSort1() is not sorted, whereas the output of SearchSort2() is.
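The core idea of the custom comparator, caching one value per document number and comparing hits by looking up that cache, can be shown in plain Java. This is a hedged sketch rather than Lucene code: the class CachedValueSort and its methods buildCache and sortDocIds are hypothetical names, and the document numbers are simply array positions.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy version of the comparator idea: order doc ids by a cached per-document value. NOT Lucene code. */
final class CachedValueSort {

    /** Parses one integer per document; the array position plays the role of the document number. */
    static int[] buildCache(String[] fieldValues) {
        int[] cache = new int[fieldValues.length];
        for (int i = 0; i < fieldValues.length; i++) {
            cache[i] = Integer.parseInt(fieldValues[i]);
        }
        return cache;
    }

    /** Returns a copy of the given doc ids ordered by their cached value, ascending. */
    static Integer[] sortDocIds(Integer[] docIds, final int[] cache) {
        Integer[] sorted = docIds.clone();
        Arrays.sort(sorted, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                // Same role as compare(ScoreDoc, ScoreDoc): look both values up in the cache.
                return Integer.compare(cache[a], cache[b]);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        // The same values, in the same order, as the documents added by IndexSort().
        String[] values = {"1", "4", "3", "5", "9", "6", "7"};
        int[] cache = buildCache(values);
        Integer[] hits = {0, 1, 2, 3, 4, 5, 6};

        StringBuilder out = new StringBuilder();
        for (Integer id : sortDocIds(hits, cache)) {
            out.append(values[id]).append(' ');
        }
        System.out.println(out.toString().trim()); // prints 1 3 4 5 6 7 9
    }
}
```

Building the cache once and comparing by array lookup keeps each comparison cheap, which is presumably why the comparator above reads every document in its constructor instead of inside compare().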
2. Multi-field search with MultiFieldQueryParser
If you want to enter keywords without caring which Field they appear in, use MultiFieldQueryParser.
You can use its constructor, after which it works just like searching a single Field, or its static parse method:
MultiFieldQueryParser.parse(String[] queries, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer)
The third parameter is the unusual one, and it is another place where Lucene 2.0 differs from 1.4.3.
An example makes it clear:
String[] fields = {"filename", "contents", "description"};
BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD, // may appear in this Field
        BooleanClause.Occur.MUST, // must appear in this Field
        BooleanClause.Occur.MUST_NOT}; // must not appear in this Field
MultiFieldQueryParser.parse("query", fields, flags, analyzer);
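What the three flags mean can be sketched in plain Java. The following is an illustration under stated assumptions, not Lucene's implementation: the class OccurSketch, its own Occur enum, and the matches method are hypothetical, a "document" is just a map from field name to text, and "contains the term" is a plain substring test.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of SHOULD / MUST / MUST_NOT applied to one term across several fields. NOT Lucene code. */
final class OccurSketch {
    enum Occur { SHOULD, MUST, MUST_NOT }

    /** Returns true if the document satisfies the per-field flags for the given term. */
    static boolean matches(Map<String, String> doc, String term, String[] fields, Occur[] flags) {
        if (fields.length != flags.length) {
            throw new IllegalArgumentException("fields and flags must have the same length");
        }
        boolean hasMust = false;   // at least one MUST clause exists (and was satisfied)
        boolean anyShould = false; // at least one SHOULD clause matched
        for (int i = 0; i < fields.length; i++) {
            String value = doc.get(fields[i]);
            boolean present = value != null && value.contains(term);
            switch (flags[i]) {
                case MUST:
                    hasMust = true;
                    if (!present) {
                        return false; // a required field lacks the term
                    }
                    break;
                case MUST_NOT:
                    if (present) {
                        return false; // a prohibited field contains the term
                    }
                    break;
                case SHOULD:
                    anyShould |= present;
                    break;
            }
        }
        // Without any MUST clause, at least one SHOULD clause has to match.
        return hasMust || anyShould;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<String, String>();
        doc.put("filename", "report.txt");
        doc.put("contents", "lucene query parser");
        doc.put("description", "notes");

        String[] fields = {"filename", "contents", "description"};
        Occur[] flags = {Occur.SHOULD, Occur.MUST, Occur.MUST_NOT};

        System.out.println(matches(doc, "lucene", fields, flags)); // prints true
        doc.put("description", "lucene notes");
        System.out.println(matches(doc, "lucene", fields, flags)); // prints false
    }
}
```

The second call fails only because the term now also appears in the field flagged MUST_NOT, even though the MUST field still contains it.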
(To be continued)

Posted on 2006-12-25 11:05 by rendong. Category: j2ee
