
Lucene Notes 2 -- Main Classes

1. Main classes in Lucene
1.1. The Document class
1.1.1. Common methods
Method
Description

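void add(Field field)
Adds a field to the Document.

void removeField(String name)
Removes the field with the given name. If several fields share that name, only the one added first is removed; if no such field exists, the Document is left unchanged.

void removeFields(String name)
Removes all fields with the given name. If no such field exists, the Document is left unchanged.

Field getField(String name)
Returns the field with the given name. If several fields share that name, the one added first is returned; if no such field exists, null is returned.

Enumeration fields()
Returns all fields of the Document as an Enumeration.

Field[] getFields(String name)
Returns an array of the Fields with the given name.

String[] getValues(String name)
Returns an array of the values of the Fields with the given name.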
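The "first added" versus "all" semantics above can be pictured with a small stand-in. The sketch below is not Lucene code: FieldListModel and its methods are hypothetical names, and it only assumes that a document behaves like an ordered list of (name, value) pairs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a Document: an ordered list of (name, value) pairs.
// This is NOT Lucene code; it only mirrors the semantics described above.
class FieldListModel {
    private final List<String[]> fields = new ArrayList<>();

    public void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Removes only the first-added field with this name, if any.
    public void removeField(String name) {
        Iterator<String[]> it = fields.iterator();
        while (it.hasNext()) {
            if (it.next()[0].equals(name)) {
                it.remove();
                return;
            }
        }
    }

    // Removes every field with this name.
    public void removeFields(String name) {
        fields.removeIf(f -> f[0].equals(name));
    }

    // Returns the first-added value for this name, or null if absent.
    public String get(String name) {
        for (String[] f : fields) {
            if (f[0].equals(name)) return f[1];
        }
        return null;
    }

    // Returns all values for this name in insertion order (empty if absent).
    public List<String> getValues(String name) {
        List<String> out = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(name)) out.add(f[1]);
        }
        return out;
    }
}
```

In this model, adding "name" twice and calling removeField("name") once leaves the second value in place, while removeFields("name") leaves none.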

1.1.2. Example
Document doc1 = new Document();
doc1.add(new Field("name", "word1 word2 word3",
        Field.Store.NO, Field.Index.TOKENIZED));

Document doc2 = new Document();
doc2.add(new Field("name", "word1 word2 word3",
        Field.Store.NO, Field.Index.TOKENIZED));

1.2. The Field class
1.2.1. Constructors
1) public Field(String name, String value, Store store, Index index); // value given directly as a String

2) public Field(String name, String value, Store store, Index index, TermVector termVector);

3) public Field(String name, Reader reader); // value read from an external Reader

4) public Field(String name, Reader reader, TermVector termVector);

5) public Field(String name, byte[] value, Store store); // value given directly as binary bytes

When a Field's value is binary, Lucene's compression feature can be used to compress it.

1.2.2. The Store class
Static field
Description

Store.NO
The Field's value is not stored.

Store.YES
The Field's value is stored.

Store.COMPRESS
The Field's value is stored in compressed form.

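1.2.3. The Index class
Static field
Description

Index.NO
The Field is not indexed.

Index.TOKENIZED
The Field is tokenized first and then indexed.

Index.UN_TOKENIZED
The Field is indexed without being tokenized.

Index.NO_NORMS
The Field is indexed without an Analyzer and without storing norms, so length normalization and index-time boosting no longer affect its score. The main purpose is to reduce memory consumption.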
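The practical difference between TOKENIZED and UN_TOKENIZED is which terms end up in the index. The sketch below is only an illustration under stated assumptions: IndexModeSketch is a hypothetical name, and a naive whitespace split with lowercasing stands in for a real Analyzer such as StandardAnalyzer, which does more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IndexModeSketch {
    // Returns the terms that would be placed in the index for one field value.
    // tokenize=true mimics TOKENIZED with a naive whitespace analyzer (an assumption);
    // tokenize=false mimics UN_TOKENIZED: the whole value is a single term.
    static List<String> terms(String value, boolean tokenize) {
        List<String> out = new ArrayList<>();
        if (!tokenize) {
            out.add(value);
            return out;
        }
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

With the value "word1 word2 word3", the tokenized form yields three searchable terms, whereas the untokenized form yields one term that only matches the complete string.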

1.2.4. Example
new Field("name", "word1 word2 word3",Field.Store.YES,Field.Index.TOKENIZED)

1.3. The IndexWriter class
1.3.1. Constructors
1) public IndexWriter(String path, Analyzer a, boolean create)

2) public IndexWriter(File path, Analyzer a, boolean create)

3) public IndexWriter(Directory d, Analyzer a, boolean create)

First parameter: where the index is stored.

Second parameter: the analyzer, a subclass of org.apache.lucene.analysis.Analyzer.

Third parameter: when true, IndexWriter discards any index already in the directory and builds a new one; when false, IndexWriter adds to the existing index incrementally. When updating an existing index, therefore, this value must be false.

1.3.2. Adding documents
public void addDocument(Document doc)

public void addDocument(Document doc, Analyzer analyzer) // analyzes this document with the supplied Analyzer instead of the one declared when the IndexWriter was constructed

writer.addDocument(doc1);

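1.3.3. Performance parameters
1) mergeFactor determines how often segments are merged while documents are added. Smaller values use less memory during indexing but index more slowly; larger values use more memory and index faster, which suits batch indexing. The default is 10.

writer.setMergeFactor(10);

2) maxMergeDocs limits the number of documents in a single Segment. A larger maxMergeDocs suits indexing large batches of documents; incremental indexing should use a smaller maxMergeDocs.

writer.setMaxMergeDocs(1000);

3) minMergeDocs controls how many documents are held in memory before they are written out; it has no effect on the size of the Segments on disk. In Lucene 2.0 this setting is exposed as setMaxBufferedDocs.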
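How a buffer size and a merge factor interact can be pictured with a toy simulation. The sketch below is not Lucene's merge code. It assumes a simplified policy: the in-memory buffer is flushed as a new segment every maxBufferedDocs documents, and whenever the last mergeFactor segments have equal size they are merged into one. MergeFactorSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class MergeFactorSketch {
    // Returns the on-disk segment sizes after adding numDocs documents.
    // Documents still sitting in the buffer at the end are not included.
    static List<Integer> simulate(int numDocs, int maxBufferedDocs, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < numDocs; i++) {
            buffered++;
            if (buffered == maxBufferedDocs) {
                segments.add(buffered); // flush the buffer as a new segment
                buffered = 0;
                mergeTail(segments, mergeFactor);
            }
        }
        return segments;
    }

    // While the last mergeFactor segments have equal size, merge them into one.
    static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            boolean equal = true;
            for (int j = n - mergeFactor; j < n; j++) {
                if (segments.get(j) != size) {
                    equal = false;
                    break;
                }
            }
            if (!equal) return;
            for (int j = 0; j < mergeFactor; j++) {
                segments.remove(segments.size() - 1); // drop the merged inputs
            }
            segments.add(size * mergeFactor); // add the merged segment
        }
    }
}
```

Under these assumptions, simulate(1234, 10, 10) leaves segments of 1000, 100, 100, 10, 10 and 10 documents, with 4 documents still in the buffer. A larger merge factor postpones merging and leaves more segments on disk in the meantime.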

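1.3.4. Limiting Field length
maxFieldLength limits the number of terms indexed for a Field; the default is 10,000, and terms beyond the limit are ignored. The calls below raise the limit to 100,000 before the second document is added.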

public void setMaxFieldLength(int maxFieldLength)

writer.addDocument(doc1);

writer.setMaxFieldLength(100000);

writer.addDocument(doc2);

1.3.5. Compound index format
setUseCompoundFile(boolean); the default is true.

writer.setUseCompoundFile(true);  // use the compound index format

writer.setUseCompoundFile(false); // use the multi-file format

1.3.6. Optimizing the index
writer.optimize();

Merges the multiple segments on disk into a single new segment. This does not speed up indexing; on the contrary, it slows indexing down. It should therefore be called only after the index has been fully built.

1.3.7. Example
// Build the index, replacing any existing index at path.
IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
writer.addDocument(doc1);
writer.addDocument(doc2);
System.out.println(writer.docCount());
writer.close();

// Search the "name" field for word1.
IndexSearcher searcher = new IndexSearcher(path);
QueryParser parser = new QueryParser("name", new StandardAnalyzer());
Query query = parser.parse("word1");
Hits hits = searcher.search(query);
System.out.println("Searching for word1 found " + hits.length() + " results");

1.4. The Directory class
Directory: represents the location where the index is stored.

a) FSDirectory.getDirectory(path, true): the second argument, when true, deletes the existing contents of the directory.

IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path, true), new StandardAnalyzer(), true); // deletes the existing index

or

FSDirectory fsDir = FSDirectory.getDirectory(path, true);
IndexWriter writer = new IndexWriter(fsDir, new StandardAnalyzer(), true);

b) RAMDirectory keeps the index in memory. Reads are fast, but the contents are lost as soon as the program exits.

RAMDirectory ramDir = new RAMDirectory();
IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);

or

IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);

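1.5. The IndexReader class
IndexReader is the tool for reading an index.

1.5.1. Deleting a document
IndexReader reader = IndexReader.open(path);
reader.deleteDocument(0); // delete the first document (document number 0)
reader.close();

1.5.2. Undeleting
reader.undeleteAll();

1.5.3. Deleting by field
reader.deleteDocuments(new Term("name", "word1"));

These calls only mark documents as deleted. To remove them physically, optimize the index once with IndexWriter.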
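The reason undeleteAll can work, and the reason physical removal needs an optimize pass, is that deletion is only a mark. The sketch below models that idea with a list and a bit set. It is not Lucene's implementation: SoftDeleteSketch and its methods are hypothetical names chosen to echo the calls above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SoftDeleteSketch {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    void add(String doc) {
        docs.add(doc);
    }

    // Marks a document as deleted without removing it.
    void delete(int docNum) {
        deleted.set(docNum);
    }

    // Clears all deletion marks; possible only because nothing was removed yet.
    void undeleteAll() {
        deleted.clear();
    }

    // Number of documents not marked as deleted.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // One more than the largest document number, deleted documents included.
    int maxDoc() {
        return docs.size();
    }

    // Rewrites the list without the marked documents; survivors are renumbered.
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) kept.add(docs.get(i));
        }
        docs = kept;
        deleted = new BitSet();
    }
}
```

In this model a deleted document still occupies its number until optimize runs, after which undeleting is no longer possible.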

1.5.4. Example
IndexReader reader = IndexReader.open(path);

// Document numbers run up to maxDoc(); skip those marked as deleted.
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i)) continue;
    System.out.println(reader.document(i));
}

System.out.println("Version: " + reader.getVersion());
System.out.println("Number of documents in the index: " + reader.numDocs());

// reader.deleteDocuments(new Term("name", "word1"));

// List every document that contains the term, with its frequency.
Term term1 = new Term("name", "word1");
TermDocs docs = reader.termDocs(term1);
while (docs.next()) {
    System.out.println("Number of a Document containing " + term1 + ": " + docs.doc());
    System.out.println("Occurrences of the Term in that document: " + docs.freq());
}

reader.close();
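The document numbers and frequencies printed for a term come from an inverted index, which maps each term to the documents that contain it. The sketch below shows that data structure in miniature. It is an assumption-laden illustration rather than Lucene's on-disk format: PostingsSketch and its methods are hypothetical, and terms are split on whitespace only.

```java
import java.util.Map;
import java.util.TreeMap;

class PostingsSketch {
    // term -> (document number -> occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();
    private int nextDoc = 0;

    // Adds a document given as whitespace-separated terms; returns its number.
    int addDocument(String text) {
        int doc = nextDoc++;
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(doc, 1, Integer::sum);
        }
        return doc;
    }

    // Returns doc -> frequency for a term, in increasing document order.
    Map<Integer, Integer> termDocs(String term) {
        TreeMap<Integer, Integer> hits = postings.get(term);
        return hits == null ? new TreeMap<Integer, Integer>() : hits;
    }
}
```

Iterating over the map returned by termDocs corresponds to the while loop over docs.next() above: the key plays the role of docs.doc() and the value the role of docs.freq().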

1.6. The IndexModifier class
Combines most of IndexWriter's functionality with IndexReader's ability to delete from the index. It is a new class in Lucene 2.0.

1.6.1. Example
public static void main(String[] args) throws Exception {

IndexModifier modifier=new IndexModifier("C:\\Q1",new StandardAnalyzer(),true);

Document doc1=new Document();

doc1.add(new Field("bookname","鋼鐵是怎樣煉成的",Field.Store.YES,Field.Index.TOKENIZED));

Document doc2=new Document();

doc2.add(new Field("bookname","山山水水",Field.Store.YES,Field.Index.TOKENIZED));

modifier.addDocument(doc1);

modifier.addDocument(doc2);

System.out.println(modifier.docCount());

modifier.setUseCompoundFile(false);

modifier.close();

IndexModifier mo=new IndexModifier("C:\\Q1",new StandardAnalyzer(),false);

mo.deleteDocument(0);

System.out.println(mo.docCount());

mo.close();

}

1.7. The IndexSearcher class
1.7.1. Constructors
IndexSearcher searcher = new IndexSearcher(String path);

IndexSearcher searcher = new IndexSearcher(Directory directory);

IndexSearcher searcher = new IndexSearcher(IndexReader r);

IndexSearcher searcher = new IndexSearcher(IndexReader r, boolean closeReader);

For example:

IndexSearcher searcher = new IndexSearcher(path);

IndexSearcher searcher = new IndexSearcher(FSDirectory.getDirectory(path, false));

1.7.2. The search methods
// Return a Hits object

public Hits search(Query query)
public Hits search(Query query, Filter filter)
public Hits search(Query query, Sort sort)
public Hits search(Query query, Filter filter, Sort sort)

// Return only the n highest-scoring Documents

public TopDocs search(Query query, Filter filter, int n)
public TopDocs search(Weight weight, Filter filter, int n)
public TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort)
public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)

// Take a HitCollector and store the results in it

public void search(Query query, HitCollector results)
public void search(Query query, Filter filter, HitCollector results)
public void search(Weight weight, Filter filter, HitCollector results)

1.7.3. The Searcher explain method
public Explanation explain(Query query, int doc) throws IOException

for(int i=0;i<hits.length()&&i<10;i++)

{

Document d=hits.doc(i);

System.out.println(i+" "+hits.score(i)+" "+d.get("contents"));

System.out.println(searcher.explain(query,hits.id(i)).toString());

}

1.7.4. Example
IndexSearcher searcher = new IndexSearcher(path);
QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
Query query = parser.parse("11");
Hits hits = searcher.search(query);
System.out.println("Searching for 11 found " + hits.length() + " results");

// Print at most the first 10 hits with rank, score and stored contents.
for (int i = 0; i < hits.length() && i < 10; i++) {
    Document d = hits.doc(i);
    System.out.println(d + " " + i + " " + hits.score(i) + " " + d.get("contents"));
}

searcher.close();

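1.8. The Hits class
1.8.1. Overview
Hits holds the results of a search.

1.8.2. Common methods

Method
Description

int length()
Returns the total number of results found.

Document doc(int i)
Returns the i-th document.

int id(int i)
Returns the internal ID of the i-th document.

float score(int i)
Returns the score of the i-th document.

Iterator iterator()
Returns an iterator over the Hits collection.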

1.8.3. Example
for (int i = 0; i < hits.length() && i < 10; i++) {
    Document d = hits.doc(i);
    System.out.println(d + " " + " " + hits.score(i) + " " + d.get("contents"));
    System.out.println("Internal ID of the document: " + hits.id(i));
}

1.9. The QueryParser class
1.9.1. Changing the default boolean logic
a) The default relation is OR

Query query = null;
QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
query = parser.parse("hello world!");
System.out.println(query.toString());

b) Changing the default boolean logic

Query query = null;
QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
query = parser.parse("hello world"); // appending ! after world causes an error
System.out.println(query.toString());

c) The AND, OR and NOT keywords

Instead of changing the default boolean logic, the user can state the boolean relation between terms directly in the query string, for example hello AND world. The operators must be written in upper case.

Logical AND: AND (upper case)

Logical OR: OR (upper case)

Logical NOT: - for example: hello -world

NOT also works, for example: hello NOT world
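What the three boolean relations mean for a single document can be stated in a few lines. The sketch below is a toy matcher over a set of terms, not QueryParser or BooleanQuery; BooleanMatchSketch and its methods are hypothetical names, and documents are split on whitespace only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BooleanMatchSketch {
    // True if the document contains at least one of the terms (OR).
    static boolean matchesAny(Set<String> docTerms, List<String> terms) {
        for (String t : terms) {
            if (docTerms.contains(t)) return true;
        }
        return false;
    }

    // True if the document contains every term (AND).
    static boolean matchesAll(Set<String> docTerms, List<String> terms) {
        return docTerms.containsAll(terms);
    }

    // True if the document contains required but not prohibited (required NOT prohibited).
    static boolean matchesNot(Set<String> docTerms, String required, String prohibited) {
        return docTerms.contains(required) && !docTerms.contains(prohibited);
    }

    // Splits a text into its set of whitespace-separated terms.
    static Set<String> termsOf(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }
}
```

A document containing only "hello" and "lucene" matches hello OR world and hello NOT world, but not hello AND world. This is why switching the default operator to AND narrows the result set.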

1.9.2. Searching without splitting a phrase
To have a phrase handled as one unit instead of being split into separate terms, enclose it in quotation marks.

String queryStr="\"God helps those who help themselves\"";

QueryParser parser = new QueryParser("bookname",new StandardAnalyzer());

parser.setDefaultOperator(QueryParser.AND_OPERATOR);

Query query=parser.parse(queryStr);

System.out.println(query.toString());

1.9.3. Setting the slop value, with FuzzyQuery support
String queryStr="\"God helps those who help themselves\"~1"; // set the slop to 1

QueryParser parser = new QueryParser("bookname",new StandardAnalyzer());

Query query=parser.parse(queryStr);

System.out.println(query.toString());

1.9.4. Using wildcards, with WildcardQuery support
String queryStr="wor?";

QueryParser parser = new QueryParser("bookname",new StandardAnalyzer());

parser.setDefaultOperator(QueryParser.AND_OPERATOR);

Query query=parser.parse(queryStr);

System.out.println(query.toString());

1.9.5. Searching a specified Field
String queryStr="linux publishdate:2006-09-01";

QueryParser parser = new QueryParser("bookname",new StandardAnalyzer());

parser.setDefaultOperator(QueryParser.AND_OPERATOR);

Query query=parser.parse(queryStr);

System.out.println(query.toString());

This is useful, for example, when the user is asked to restrict the search to one particular aspect.

1.9.6. Range searches, with RangeQuery support
String queryStr="[1990-01-01 TO 1998-12-31]";
QueryParser parser = new QueryParser("publishdate", new StandardAnalyzer());
Query query = parser.parse(queryStr);
System.out.println(query.toString());

The output is publishdate:[081xmghs0 TO 0boeetj3z]

The reason is that if dates are indexed as their display strings, comparisons are actually made in lexicographic string order. If each date is first converted to a time in milliseconds, two dates can be compared exactly. Lucene therefore provides the DateTools utility to handle this conversion internally: it turns a millisecond-resolution time into a long string, which is then indexed. So when dealing with date data, it is best to convert it with DateTools before indexing.
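The underlying idea is that an index compares strings, so a date has to be encoded as a string whose lexicographic order matches chronological order. The sketch below shows one way to do that: a zero-padded, fixed-width base-36 rendering of the millisecond timestamp. It is an illustration under stated assumptions, not the actual DateTools format; SortableDateSketch, encode and millis are hypothetical names, and dates are taken as midnight UTC.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

class SortableDateSketch {
    // Width of the largest possible value, so every encoded string has equal length.
    private static final int WIDTH = Long.toString(Long.MAX_VALUE, 36).length();

    // Encodes a non-negative millisecond timestamp as a zero-padded base-36 string.
    // Equal-length strings over 0-9a-z sort in the same order as the numbers.
    static String encode(long millis) {
        if (millis < 0) throw new IllegalArgumentException("negative time");
        String s = Long.toString(millis, 36);
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    // Milliseconds since the epoch for midnight UTC on the given date.
    static long millis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }
}
```

By contrast, unpadded display strings can sort in the wrong order: "2006-9-1" compares as greater than "2006-10-01" because the character '9' is greater than '1'.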

1.9.7. SpanQuery is not yet supported
1.10. The MultiFieldQueryParser class -- searching multiple fields
// Run a different query on each Field

public static Query parse(String[] queries, String[] fields, Analyzer analyzer) throws ParseException

// Run the same query on several Fields, specifying the boolean relation between them

public static Query parse(String query, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) throws ParseException

// Run a different query on each Field, specifying the boolean relation between them

public static Query parse(String[] queries, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) throws ParseException

For example:

String[] queries = {"鋼", "[10 TO 20]"};
String[] fields = {"bookname", "price"};
BooleanClause.Occur[] clauses = {BooleanClause.Occur.MUST, BooleanClause.Occur.MUST};
Query query = MultiFieldQueryParser.parse(queries, fields, clauses, new StandardAnalyzer());
System.out.println(query.toString());

1.11. The MultiSearcher class -- searching multiple indexes
IndexSearcher searcher1 = new IndexSearcher(path1);
IndexSearcher searcher2 = new IndexSearcher(path2);
IndexSearcher[] searchers = {searcher1, searcher2};
MultiSearcher searcher = new MultiSearcher(searchers);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i));
}

1.12. The ParallelMultiSearcher class -- multithreaded searching
IndexSearcher searcher1=new IndexSearcher(path1);

IndexSearcher searcher2=new IndexSearcher(path2);

IndexSearcher [] searchers={searcher1,searcher2};

ParallelMultiSearcher searcher=new ParallelMultiSearcher(searchers);

long start=System.currentTimeMillis();

Hits hits=searcher.search(query);

long end=System.currentTimeMillis();

System.out.println((end-start)+"ms");

This article comes from a CSDN blog; please cite the source when reposting: http://blog.csdn.net/xiaoping8411/archive/2010/03/23/5409953.aspx

posted on 2010-06-21 10:05 rogerfan. Category: [Open-Source Technology]
