Today I tried out Lucene. There are quite a few documents about it online, but most of them focus on concepts, design, or more advanced topics, and few explain how to get started using it, so I wrote this one.

Basic Usage of Lucene

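This article does not set out to explain Lucene's concepts or design. It only shows how to use Lucene to meet several common full-text search requirements, so if you want an in-depth understanding of Lucene, it will not offer you much. To go deeper after reading it, visit http://lucene.apache.org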

1. Overview

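As systems accumulate more and more information, fishing the one needle you want out of that ocean of information becomes very important. Full-text search is the usual solution to this problem, and Lucene is a toolkit for implementing full-text search: any application can embed it to provide full-text search.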

2. Environment Setup

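Download lucene.jar from lucene.apache.org and add the jar to the project's build path; Lucene can then be used directly in the project. The code in this article is written against the Lucene 1.4 API: calls such as Field.Keyword, Field.Text, Field.UnIndexed, and BooleanQuery.add(query, required, prohibited) no longer exist in Lucene 2.0 and later, so use a 1.4.x jar to run the examples as written.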

3. Usage

3.1. Basic Concepts

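This section covers the concepts you will meet most often in practice, explained by analogy with the familiar relational database. A full-text search with Lucene follows roughly the same flow as a database query: table → query on certain fields or conditions → matching records returned.

On the indexing side, the first concept is IndexWriter, which builds the index, the counterpart of a database table. When building the index you must specify how it is built, that is, how the text in each record's fields is split into terms; in Lucene this component is called an Analyzer. Lucene provides Analyzers for several situations, such as SimpleAnalyzer, StandardAnalyzer, and GermanAnalyzer. StandardAnalyzer is the one used most often, partly because it handles Chinese text, although it does so by making each Chinese character a separate token rather than recognizing words. Once the index exists, you add the records to be indexed; Lucene calls a record a Document, which is similar to a row in a database table. A Document's fields are Field objects, much like database columns. Each Field can be indexed or not, tokenized or not, and stored or not, and Lucene offers several combinations of these options. With these elements you can build an index.

On the search side, the first concept is Query. Lucene provides several commonly used Query types: TermQuery, MultiTermQuery (the base class of WildcardQuery and FuzzyQuery), BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, FuzzyQuery, RangeQuery, and SpanQuery. A Query determines how the searched fields are matched: wildcard or fuzzy, analyzed keyword, phrase, range, compound queries, and so on. QueryParser builds the appropriate Query from a query string, and MultiFieldQueryParser searches several fields for the same keywords. IndexSearcher determines which directory's index is searched, somewhat like choosing which database table to query. With an IndexSearcher and a Query you can run a search; Lucene returns the results as Hits. Iterating over the Hits gives you each matching Document, from which you can read the stored Field values.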
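To make the Analyzer idea concrete, here is a deliberately simplified tokenizer in plain Java. It is a toy, not Lucene code: the class name `ToyAnalyzer` is hypothetical, and it only imitates two behaviors assumed for the Lucene 1.4 StandardAnalyzer, namely lowercasing Latin text and turning each Chinese character into a separate token. It ignores StandardAnalyzer's special handling of e-mail addresses, host names, and English stop words.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer for illustration only; it is NOT Lucene's StandardAnalyzer.
// It imitates two StandardAnalyzer behaviors: lowercasing Latin text and
// emitting one token per Chinese (Han) character.
class ToyAnalyzer {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);                             // a Han character ends any Latin word
                tokens.add(new String(Character.toChars(cp)));   // and is a token on its own
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp)); // extend the current word
            } else {
                flush(word, tokens);                             // spaces and punctuation separate words
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [lucene, 用, 于, 索, 引]
        System.out.println(tokenize("Lucene 用于索引"));
    }
}
```

Because each Chinese character becomes its own token, a two-character word such as 索引 can only be found by checking that both characters occur next to each other.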

I hope this introduction to the basic concepts of indexing and full-text search gives you a working understanding of Lucene.
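The four Field factory methods in Lucene 1.4 differ only in three yes/no options: indexed, tokenized, and stored. The enum below is a hypothetical plain-Java summary of those options (it is not part of Lucene); running its `main` prints them as a small table.

```java
// Illustrative summary of the four Lucene 1.4 Field factory methods.
// FieldKind is a hypothetical name; this enum is not part of Lucene.
enum FieldKind {
    KEYWORD(true, false, true),      // Field.Keyword: exact values such as ids or names
    TEXT(true, true, true),          // Field.Text(name, String value): searchable, retrievable text
    UN_INDEXED(false, false, true),  // Field.UnIndexed: returned with hits, never searchable
    UN_STORED(true, true, false);    // Field.UnStored: searchable, original text not kept

    final boolean indexed;    // can a query match this field at all?
    final boolean tokenized;  // is the value split into terms by the Analyzer?
    final boolean stored;     // can Document.get() return the value for a hit?

    FieldKind(boolean indexed, boolean tokenized, boolean stored) {
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.stored = stored;
    }

    public static void main(String[] args) {
        for (FieldKind k : values()) {
            System.out.printf("%-10s indexed=%-5b tokenized=%-5b stored=%b%n",
                    k, k.indexed, k.tokenized, k.stored);
        }
    }
}
```

Note that only the `Field.Text(String, String)` overload is stored; the variant that reads its value from a `Reader` is indexed and tokenized but not stored.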

3.2. Implementing Full-Text Search Requirements

The code that builds the index:

```java
import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Lucene 1.4 API; the class name is illustrative.
public class IndexDemo {

    // Use the same Analyzer for indexing and for parsing queries.
    private Analyzer analyzer = new StandardAnalyzer();

    private void createIndex(String indexFilePath) throws Exception {
        IndexWriter iwriter = getWriter(indexFilePath);

        Document doc = new Document();
        doc.add(Field.Keyword("name", "jerry"));                // indexed as one term, stored
        doc.add(Field.Text("sender", "bluedavy@gmail.com"));    // tokenized, indexed, stored
        doc.add(Field.Text("receiver", "google@gmail.com"));
        doc.add(Field.Text("title", "用于索引的标题"));           // "a title to be indexed"
        doc.add(Field.UnIndexed("content", "不建立索引的内容"));  // stored only: "content that is not indexed"

        Document doc2 = new Document();
        doc2.add(Field.Keyword("name", "jerry.lin"));
        doc2.add(Field.Text("sender", "bluedavy@hotmail.com"));
        doc2.add(Field.Text("receiver", "msn@hotmail.com"));
        doc2.add(Field.Text("title", "用于索引的第二个标题"));     // "the second title to be indexed"
        doc2.add(Field.Text("content", "建立索引的内容"));        // "content that is indexed"

        iwriter.addDocument(doc);
        iwriter.addDocument(doc2);
        iwriter.optimize();   // merge the index segments into one for faster searching
        iwriter.close();
    }

    private IndexWriter getWriter(String indexFilePath) throws Exception {
        // The third IndexWriter argument is "create": true starts a new index
        // (overwriting any existing one), false appends to the existing index.
        File segments = new File(indexFilePath + File.separator + "segments");
        boolean create = !segments.exists();
        return new IndexWriter(indexFilePath, analyzer, create);
    }
}
```
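To picture what `createIndex` produces, the following sketch builds a toy in-memory inverted index from the same two documents. It is a simplified model, not Lucene's file format: `ToyIndex` and its methods are hypothetical, and its tokenizer simply makes every Chinese character a term, which approximates what StandardAnalyzer does with the titles. It shows why searching `content` can only find the second document, while the first document's content can still be read back from a hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index (NOT Lucene's on-disk format) showing what
// createIndex() conceptually builds. All names here are hypothetical.
class ToyIndex {
    // field -> term -> ids of the documents that contain the term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // per-document stored values, i.e. what Document.get() would return
    private final List<Map<String, String>> stored = new ArrayList<>();

    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    // Like Field.Keyword: the whole value is one term, and it is stored.
    void keyword(int doc, String field, String value) {
        index(doc, field, value);
        stored.get(doc).put(field, value);
    }

    // Like Field.Text: the value is split into terms, and it is stored.
    void text(int doc, String field, String value) {
        for (String term : tokens(value)) index(doc, field, term);
        stored.get(doc).put(field, value);
    }

    // Like Field.UnIndexed: stored for display, but never searchable.
    void unIndexed(int doc, String field, String value) {
        stored.get(doc).put(field, value);
    }

    Set<Integer> termDocs(String field, String term) {
        return postings.getOrDefault(field, new HashMap<>()).getOrDefault(term, new TreeSet<>());
    }

    String get(int doc, String field) {
        return stored.get(doc).get(field);
    }

    private void index(int doc, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(doc);
    }

    // Simplified analysis: each Han character is a term, other letters/digits
    // form lowercase words, everything else separates terms.
    private static List<String> tokens(String value) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : (value + " ").toCharArray()) {   // trailing space flushes the last word
            boolean han = Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
                continue;
            }
            if (word.length() > 0) { out.add(word.toString()); word.setLength(0); }
            if (han) out.add(String.valueOf(c));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int d0 = index.newDocument();
        index.keyword(d0, "name", "jerry");
        index.text(d0, "title", "用于索引的标题");
        index.unIndexed(d0, "content", "不建立索引的内容");
        int d1 = index.newDocument();
        index.keyword(d1, "name", "jerry.lin");
        index.text(d1, "title", "用于索引的第二个标题");
        index.text(d1, "content", "建立索引的内容");

        System.out.println(index.termDocs("title", "索"));   // [0, 1]
        System.out.println(index.termDocs("content", "索")); // [1]: doc 0's content is not indexed
        System.out.println(index.termDocs("name", "jerry")); // [0]: Keyword values match exactly
        System.out.println(index.get(0, "content"));        // doc 0's content is still retrievable
    }
}
```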

3.2.1. Wildcard query on one field

```java
// Lucene 1.4 API; indexFilePath points at the index built above.
// Wildcard patterns are not analyzed, and a leading '*' makes Lucene check every term in the field.
Query query = new WildcardQuery(new Term("sender", "*davy*"));

Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
searcher.close();
```
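Conceptually, a wildcard query is answered in two steps: the pattern is expanded against the terms indexed in the field, and documents containing any matching term are returned. The sketch below models only the expansion step in plain Java with hypothetical names; its sample term list assumes StandardAnalyzer kept each sender address as one lowercase token. When the pattern starts with `*` there is no fixed prefix to jump to, so every term has to be tested, which is why leading wildcards are slow on large indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy model of wildcard-term expansion (hypothetical names, not Lucene code).
class ToyWildcard {

    // Translate a wildcard pattern into a regex: '*' = any run of characters,
    // '?' = exactly one character, every other character is matched literally.
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Test every term in the field's dictionary against the pattern.
    static List<String> expand(String wildcard, List<String> termDictionary) {
        Pattern pattern = compile(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed "sender" terms: one lowercase token per e-mail address.
        List<String> senderTerms = Arrays.asList("bluedavy@gmail.com", "bluedavy@hotmail.com");
        System.out.println(expand("*davy*", senderTerms)); // both addresses match
    }
}
```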

3.2.2. Analyzed keyword query on one field

```java
// QueryParser (org.apache.lucene.queryParser, Lucene 1.4 static API) analyzes
// "索引" ("index") with the same Analyzer used for indexing, then searches "title".
Query query = QueryParser.parse("索引", "title", analyzer);

Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
searcher.close();
```
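Because StandardAnalyzer turns 索引 into the two tokens 索 and 引, QueryParser cannot look up a single term; in Lucene 1.4 it builds a PhraseQuery from the tokens, so both characters must appear next to each other and in order. The hypothetical `ToyPhrase` class below shows that adjacency check on plain token lists; Lucene performs it using the term positions recorded in the index.

```java
import java.util.Arrays;
import java.util.List;

// Toy phrase matching on token lists (hypothetical helper, not Lucene code).
class ToyPhrase {

    // True if `phrase` occurs in `docTokens` as consecutive tokens, in order.
    static boolean containsPhrase(List<String> docTokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;                 // an empty phrase matches nothing
        }
        outer:
        for (int start = 0; start + phrase.size() <= docTokens.size(); start++) {
            for (int k = 0; k < phrase.size(); k++) {
                if (!docTokens.get(start + k).equals(phrase.get(k))) {
                    continue outer;       // mismatch: try the next start position
                }
            }
            return true;                  // every phrase token lined up
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> title = Arrays.asList("用", "于", "索", "引", "的", "标", "题"); // 用于索引的标题
        System.out.println(containsPhrase(title, Arrays.asList("索", "引"))); // true
        System.out.println(containsPhrase(title, Arrays.asList("引", "索"))); // false
    }
}
```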

3.2.3. Keyword query across multiple fields

```java
// Runs the same query against "title" and "content"; a match in either field is enough.
Query query = MultiFieldQueryParser.parse("索引", new String[] {"title", "content"}, analyzer);

Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
searcher.close();
```
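In Lucene 1.4, the static `MultiFieldQueryParser.parse` parses the query once per field and adds each per-field query to a BooleanQuery as an optional clause, so a document is a hit when any of the listed fields matches. The toy sketch below reproduces just that OR step over precomputed per-field matches; `ToyMultiField` is a hypothetical name, and the sample matches assume the two documents indexed earlier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of combining per-field matches with OR (hypothetical, not Lucene code).
class ToyMultiField {

    // matchesPerField: field name -> ids of documents in which that field matched.
    static Set<Integer> anyField(Map<String, Set<Integer>> matchesPerField) {
        Set<Integer> hits = new TreeSet<>();
        for (Set<Integer> ids : matchesPerField.values()) {
            hits.addAll(ids);   // optional clauses: a match in any field is enough
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> perField = new LinkedHashMap<>();
        perField.put("title", new TreeSet<>(Arrays.asList(0, 1)));  // both titles contain 索引
        perField.put("content", new TreeSet<>(Arrays.asList(1)));   // doc 0's content is UnIndexed
        System.out.println(anyField(perField));                     // prints [0, 1]
    }
}
```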

3.2.4. Compound query (combining several query conditions)

```java
Query query = MultiFieldQueryParser.parse("索引", new String[] {"title", "content"}, analyzer);
Query mquery = new WildcardQuery(new Term("sender", "bluedavy*"));
// "name" was indexed with Field.Keyword, so this matches "jerry" exactly, not "jerry.lin"
TermQuery tquery = new TermQuery(new Term("name", "jerry"));

// add(query, required, prohibited): (true, false) makes each clause mandatory,
// so a hit must satisfy all three queries.
BooleanQuery bquery = new BooleanQuery();
bquery.add(query, true, false);
bquery.add(mquery, true, false);
bquery.add(tquery, true, false);

Searcher searcher = new IndexSearcher(indexFilePath);
Hits hits = searcher.search(bquery);
for (int i = 0; i < hits.length(); i++) {
    System.out.println(hits.doc(i).get("name"));
}
searcher.close();
```
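The two booleans passed to the Lucene 1.4 `BooleanQuery.add(query, required, prohibited)` select how a clause participates: (true, false) must match, (false, true) must not match, and (false, false) is optional. The toy evaluator below, with hypothetical names, applies those rules to precomputed sets of matching document ids; it decides only which documents match and ignores scoring. The sample sets in `main` assume the two documents indexed earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy evaluation of (required, prohibited) clause flags over precomputed
// per-clause matches. It models which documents match, not Lucene's scoring.
class ToyBoolean {

    static final class Clause {
        final Set<Integer> matches;
        final boolean required;   // (true, false)  -> the clause must match (AND)
        final boolean prohibited; // (false, true)  -> the clause must not match (NOT)
                                  // (false, false) -> optional (OR)
        Clause(Set<Integer> matches, boolean required, boolean prohibited) {
            if (required && prohibited) {
                // toy choice: reject a combination that could never match
                throw new IllegalArgumentException("required and prohibited are exclusive");
            }
            this.matches = matches;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    static Set<Integer> evaluate(List<Clause> clauses) {
        Set<Integer> required = null;
        Set<Integer> optional = new TreeSet<>();
        for (Clause c : clauses) {
            if (c.required) {
                if (required == null) required = new TreeSet<>(c.matches);
                else required.retainAll(c.matches);   // AND of required clauses
            } else if (!c.prohibited) {
                optional.addAll(c.matches);           // OR of optional clauses
            }
        }
        // When required clauses exist, optional ones only raise the score of
        // documents that already match, so they do not change the hit set.
        Set<Integer> hits = (required != null) ? required : optional;
        for (Clause c : clauses) {
            if (c.prohibited) hits.removeAll(c.matches); // drop prohibited matches
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Integer> both = new TreeSet<>(Arrays.asList(0, 1));
        Set<Integer> jerryOnly = new TreeSet<>(Arrays.asList(0));
        System.out.println(evaluate(Arrays.asList(
                new Clause(both, true, false),       // 索引 in title or content
                new Clause(both, true, false),       // sender matches bluedavy*
                new Clause(jerryOnly, true, false)   // name is exactly "jerry"
        )));                                         // prints [0]
    }
}
```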

4. Summary

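I hope the examples above have shown you the basic way to use Lucene. For full-text search I suggest running an analyzed keyword search first to find meaningful results, and only then falling back to wildcard or similar looser searches ^_^, although the right order ultimately depends on your search requirements. Lucene offers many other useful features for you to explore as you use it: mastering its built-in Query types, using Filter and Sort, writing your own Analyzer or Query, and even learning more about search-engine techniques in general (word segmentation, index ranking, etc.).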
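The suggested search order, a precise analyzed query first and a looser wildcard or fuzzy query only as a fallback, can be sketched independently of Lucene. `FallbackSearch` and the two search functions are hypothetical stand-ins; in real code they would wrap something like a QueryParser query and a WildcardQuery.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: precise search first, looser search only as a fallback.
class FallbackSearch {

    static <R> List<R> search(String userInput,
                              Function<String, List<R>> analyzedSearch,
                              Function<String, List<R>> looseSearch) {
        List<R> hits = analyzedSearch.apply(userInput);       // e.g. a QueryParser query
        return hits.isEmpty() ? looseSearch.apply(userInput)  // e.g. a WildcardQuery
                              : hits;
    }

    public static void main(String[] args) {
        List<String> hits = FallbackSearch.<String>search("davy",
                q -> Collections.<String>emptyList(),      // analyzed search finds nothing
                q -> Arrays.asList("jerry", "jerry.lin"));  // stand-in for a *davy* wildcard search
        System.out.println(hits);                           // prints [jerry, jerry.lin]
    }
}
```

Keeping the looser query as a fallback avoids paying for a full term scan when the cheaper, more precise query already answers the search.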