
Lucene full-text search: application example and code

Mon, 14 Jan 2008 02:38:00 GMT

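Summary: Implementing full-text search with Lucene involves three main steps: 1. Build the index: create the Lucene index files from the data already in the site's news database. 2. Search the index: once the index exists, run full-text searches using the standard analyzer or an analyzer implementation used directly. 3. Maintain the index: the news database keeps changing (records are added, modified, and deleted), and each change must be reflected in the Lucene index files. ...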

Posted by 大田… 2008-01-14 10:38

Lucene usage examples

Mon, 14 Jan 2008 02:32:00 GMT

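Summary: A note first: this article uses Lucene 2.0. Version 2.0 differs from earlier versions mainly in how queries are written. The code was actually written several months ago; I was lazy and only got around to posting it on my blog today. I cannot write deep articles, so I just record simple notes and odds and ends. The code is for my own amusement, so I hope the experts will not throw refactoring critiques at me... 1. On Windows, create a folder named s on the C: drive, ...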

Posted by 大田… 2008-01-14 10:32

Introduction to basic Lucene usage

Tue, 13 Feb 2007 03:50:00 GMT

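I tried Lucene today and found that, although there are quite a few documents about it online, most of them lean toward concepts, design, or more advanced topics. Introductory, how-to-use material is scarce, so I wrote this one.

Introduction to basic Lucene usage

This article does not set out to explain Lucene's concepts or design. It only shows how to use Lucene to meet a few common full-text search requirements, so if you want an in-depth understanding of Lucene you will not gain much here. For that, after reading this article, visit http://lucene.apache.org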

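1. Overview

As the amount of information in a system grows, fishing the one needle you want out of that ocean of data becomes very important. Full-text search is the usual solution to this kind of problem, and Lucene is a tool for implementing it: any application can add full-text search by embedding Lucene.

2. Environment setup

Download lucene.jar from lucene.apache.org and add it to the project's build path; Lucene can then be used directly in the project. The samples in this article use the Lucene 1.x API (for example Field.Keyword, Field.Text, and the static QueryParser.parse), which Lucene 2.0 removed, so they need a 1.x jar.

3. Usage

3.1. Basic concepts

This section covers the concepts you meet most often when using Lucene, explained by analogy with a database, which most readers know well. Full-text search with Lucene resembles this database flow: table → query on the relevant fields with query conditions → matching records returned.

First comes IndexWriter, which builds the index; the index plays the role of a database table. When building the index you must specify how it is to be built, that is, how the field values of its records are split into terms. Lucene calls this an Analyzer and ships several for different environments: SimpleAnalyzer, StandardAnalyzer, GermanAnalyzer, and others. StandardAnalyzer is the one used most often because it supports Chinese.

Once the index exists, the records to be indexed have to be inserted into it. Lucene calls such a record a Document, similar to a row in a database table. The fields of the record are added as Field objects, much as in a database. Lucene distinguishes fields by combinations of indexed or not indexed and tokenized or not tokenized. These elements are essentially all you need to build an index.

Searching involves a few more concepts. The first is Query. Lucene provides several commonly used ones: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, FuzzyQuery, RangeQuery, SpanQuery. A Query specifies how a field is to be searched: fuzzy, semantic, phrase, range, combined, and so on. Next is QueryParser, which can create the different Query types; MultiFieldQueryParser additionally supports searching several fields for the same keyword. IndexSearcher designates which directory's index files to search and how, a little like choosing which index table to query in a database and how the record fields are broken down for the query. With an IndexSearcher and a Query you can run the search; Lucene returns the results as Hits. Iterating over the Hits yields the Document for each result, and from each Document you can read the values of its Fields.

Hopefully this introduction to the basic concepts of index building and full-text search gives you a working picture of Lucene.

3.2. Implementing full-text search requirements

Code for building the index: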

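    import java.io.File;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Both methods rely on an Analyzer field named analyzer declared in the enclosing class.
    private void createIndex(String indexFilePath) throws Exception {
        IndexWriter iwriter = getWriter(indexFilePath);

        // First document
        Document doc = new Document();
        doc.add(Field.Keyword("name", "jerry"));                // indexed as one term, stored
        doc.add(Field.Text("sender", "bluedavy@gmail.com"));    // analyzed, indexed, stored
        doc.add(Field.Text("receiver", "google@gmail.com"));
        doc.add(Field.Text("title", "用于索引的标题"));
        doc.add(Field.UnIndexed("content", "不建立索引的内容")); // stored only, not searchable

        // Second document
        Document doc2 = new Document();
        doc2.add(Field.Keyword("name", "jerry.lin"));
        doc2.add(Field.Text("sender", "bluedavy@hotmail.com"));
        doc2.add(Field.Text("receiver", "msn@hotmail.com"));
        doc2.add(Field.Text("title", "用于索引的第二个标题"));
        doc2.add(Field.Text("content", "建立索引的内容"));

        iwriter.addDocument(doc);
        iwriter.addDocument(doc2);
        iwriter.optimize();
        iwriter.close();
    }

    private IndexWriter getWriter(String indexFilePath) throws Exception {
        // Create a new index only when no "segments" file exists yet;
        // otherwise open the existing index and add to it.
        File segments = new File(indexFilePath + File.separator + "segments");
        boolean create = !segments.exists();
        return new IndexWriter(indexFilePath, analyzer, create);
    }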

3.2.1. Fuzzy (wildcard) search for a keyword in one field

    // Matches every document whose sender field has a term containing "davy"
    Query query = new WildcardQuery(new Term("sender", "*davy*"));
    Searcher searcher = new IndexSearcher(indexFilePath);
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("name"));
    }
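To make the wildcard behaviour concrete, here is a minimal, library-free sketch of how a pattern such as `*davy*` can be checked against the terms of a field. It is not Lucene's implementation: the class `ToyWildcard` and its methods are hypothetical names, and it only assumes the usual convention that `*` matches any run of characters and `?` matches exactly one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy wildcard matcher; an illustration only, not Lucene code.
class ToyWildcard {
    // True if term matches pattern ('*' = any run of characters, '?' = exactly one character).
    static boolean matches(String pattern, String term) {
        int p = 0, t = 0, star = -1, mark = 0;
        int n = pattern.length();
        while (t < term.length()) {
            if (p < n && pattern.charAt(p) == '*') {
                star = p++;          // remember the star and try matching zero characters first
                mark = t;
            } else if (p < n && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                p = star + 1;        // backtrack: let the last star absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < n && pattern.charAt(p) == '*') p++;  // trailing stars match the empty string
        return p == n;
    }

    // Conceptually, a wildcard search selects every indexed term that matches the pattern.
    static List<String> expand(String pattern, List<String> indexedTerms) {
        List<String> result = new ArrayList<>();
        for (String term : indexedTerms) {
            if (matches(pattern, term)) result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> senders = Arrays.asList("bluedavy@gmail.com", "bluedavy@hotmail.com");
        System.out.println(expand("*davy*", senders));
    }
}
```

A pattern that starts with `*` forces a check of every term in the field, which is why leading wildcards tend to be slow on large indexes.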

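3.2.2. Semantic (analyzed) search for a keyword in one field

    // The query text is run through the analyzer before the title field is searched
    Query query = QueryParser.parse("索引", "title", analyzer);
    Searcher searcher = new IndexSearcher(indexFilePath);
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("name"));
    }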

3.2.3. Searching several fields for a keyword

    // The same keyword is searched in both the title and the content field
    Query query = MultiFieldQueryParser.parse("索引", new String[]{"title", "content"}, analyzer);
    Searcher searcher = new IndexSearcher(indexFilePath);
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("name"));
    }

3.2.4. Compound search (combining several query conditions)

    Query query = MultiFieldQueryParser.parse("索引", new String[]{"title", "content"}, analyzer);
    Query mquery = new WildcardQuery(new Term("sender", "bluedavy*"));
    TermQuery tquery = new TermQuery(new Term("name", "jerry"));

    // add(query, required, prohibited): all three clauses are required here
    BooleanQuery bquery = new BooleanQuery();
    bquery.add(query, true, false);
    bquery.add(mquery, true, false);
    bquery.add(tquery, true, false);

    Searcher searcher = new IndexSearcher(indexFilePath);
    Hits hits = searcher.search(bquery);
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i).get("name"));
    }
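The required/prohibited flag pair passed to `add` can be pictured as set operations on the documents each sub-query matches. The sketch below is a toy model under that assumption, not Lucene's scoring code: `ToyBooleanQuery` is a hypothetical class, each clause is represented by an assumed set of matching document IDs, and rejecting a clause that is both required and prohibited is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of required / optional / prohibited clauses; not Lucene code.
class ToyBooleanQuery {
    private final List<Set<Integer>> required = new ArrayList<>();
    private final List<Set<Integer>> optional = new ArrayList<>();
    private final List<Set<Integer>> prohibited = new ArrayList<>();

    // hits = IDs of the documents matched by one sub-query.
    void add(Set<Integer> hits, boolean isRequired, boolean isProhibited) {
        if (isRequired && isProhibited) {
            throw new IllegalArgumentException("a clause cannot be both required and prohibited");
        }
        if (isProhibited) prohibited.add(hits);
        else if (isRequired) required.add(hits);
        else optional.add(hits);
    }

    Set<Integer> search() {
        Set<Integer> result = new TreeSet<>();
        if (!required.isEmpty()) {
            // Every required clause must match: intersect the hit sets.
            result.addAll(required.get(0));
            for (Set<Integer> r : required) result.retainAll(r);
        } else {
            // With no required clause, any optional clause may match: take the union.
            for (Set<Integer> o : optional) result.addAll(o);
        }
        // A prohibited clause removes its documents from the result.
        for (Set<Integer> p : prohibited) result.removeAll(p);
        return result;
    }

    public static void main(String[] args) {
        ToyBooleanQuery q = new ToyBooleanQuery();
        q.add(new HashSet<>(Arrays.asList(0, 1)), true, false); // keyword found in documents 0 and 1
        q.add(new HashSet<>(Arrays.asList(0, 1)), true, false); // sender pattern matches both documents
        q.add(new HashSet<>(Arrays.asList(0)), true, false);    // exact name matches only document 0
        System.out.println(q.search());
    }
}
```

With all three clauses required, as in the sample above, only documents matched by every sub-query survive.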

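4. Summary

I trust the walkthrough above gives you a basic idea of how to use Lucene. For full-text search I suggest starting with a semantic search to find the meaningful content first and only then running fuzzy-style searches ^_^, although in the end this depends on the search requirements. Lucene offers many other useful features that are left for you to explore as you use it: mastering the Query classes Lucene provides, using Filter and sorting, writing your own Analyzer, implementing your own Query, and so on. You can even go on to study search-engine techniques such as word segmentation and index ordering.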

Posted by 大田… 2007-02-13 11:50

Lucene In Action ch 4 notes (I) -- Analysis

Tue, 13 Feb 2007 03:32:00 GMT

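This chapter discusses Lucene's analysis process and several Analyzers in detail.

During indexing, the text to be indexed is first analyzed: it is processed and split into terms, and then the index is built. Different Analyzers apply different rules, so choosing the right Analyzer matters when you use Lucene in a program.

1. Using Analyzers

Before using an Analyzer, let's look at what text looks like after analysis: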

Listing 4.1 Visualizing analyzer effects
Analyzing "The quick brown fox jumped over the lazy dogs"
WhitespaceAnalyzer:
    [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
SimpleAnalyzer:
    [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]
StopAnalyzer:
    [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
StandardAnalyzer:
    [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]
　

Analyzing "XY&Z Corporation - xyz@example.com"
WhitespaceAnalyzer:
    [XY&Z] [Corporation] [-] [xyz@example.com]
SimpleAnalyzer:
    [xy] [z] [corporation] [xyz] [example] [com]
StopAnalyzer:
    [xy] [z] [corporation] [xyz] [example] [com]
StandardAnalyzer:
    [xy&z] [corporation] [xyz@example.com]

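The output above comes from an example discussed later. It shows how each Analyzer treats the text. For "The quick brown fox jumped over the lazy dogs", WhitespaceAnalyzer and SimpleAnalyzer simply split the words into terms (SimpleAnalyzer also lowercases them), while the other two Analyzers additionally remove stop words. For "XY&Z Corporation - xyz@example.com", the Analyzers differ in how they handle & and -. With this first impression of analysis, let's look at how it is applied at different stages.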
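The differences in Listing 4.1 can be imitated with a few lines of plain Java. This is a rough sketch, not Lucene's code: `ToyAnalyzers` and its methods are hypothetical names, the stop-word list is an assumed short sample, and StandardAnalyzer is left out because its grammar (e-mail addresses, company names) is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-ins for three analyzers; not Lucene's implementation.
class ToyAnalyzers {
    // Assumed sample stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "or", "to", "is"));

    // WhitespaceAnalyzer-like: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // SimpleAnalyzer-like: split at non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // StopAnalyzer-like: same as simple(), then drop stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "XY&Z Corporation - xyz@example.com";
        System.out.println(whitespace(text)); // [XY&Z, Corporation, -, xyz@example.com]
        System.out.println(simple(text));     // [xy, z, corporation, xyz, example, com]
        System.out.println(stop(text));       // [xy, z, corporation, xyz, example, com]
    }
}
```

Because the tokens produced at indexing time are the only things that can be searched, the choice between these behaviours decides, for example, whether "XY&Z" can ever be found as a single term.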

I. Indexing Analysis

As covered in ch2 (indexing), an index is built with IndexWriter, and constructing an IndexWriter requires an Analyzer, as shown below:

    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter writer = new IndexWriter(directory, analyzer, true);

The writer can then be used to index documents, as follows:

    Document doc = new Document();
    doc.add(Field.Text("title", "This is the title"));
    doc.add(Field.UnStored("contents", "...document contents..."));
    writer.addDocument(doc);

This uses the Analyzer given when the IndexWriter was constructed. To use a different Analyzer for one particular document, call this method instead:

    writer.addDocument(doc, analyzer);

II. QueryParser Analysis

Analysis is the key to term searches: the terms produced by the Analyzer at search time must match the terms that were indexed, or the search returns nothing. When QueryParser parses a user's search expression, you can specify an Analyzer, as shown below:

    Query query = QueryParser.parse(expression, "contents", analyzer);

That uses QueryParser's static method. With a QueryParser instance, the Analyzer is supplied to the constructor instead:

    QueryParser parser = new QueryParser("contents", analyzer);
    query = parser.parse(expression);

QueryParser analyzes individual pieces of the expression, not the expression as a whole, which may include operators, parenthesis, and other special expression syntax to denote range, wildcard, and fuzzy searches.

QueryParser analyzes all text in the same way and does not know how each field was indexed, so searching a field that was indexed as a Keyword can run into problems.
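The Keyword problem can be shown without Lucene at all. The sketch below is a toy model with hypothetical names (`KeywordMismatchDemo`, `exactMatch`, `analyzedMatch`): a keyword field is assumed to keep its value as one unanalyzed term, while the query side is assumed to run a SimpleAnalyzer-like step over the user's input before the lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of the analyzed-query versus keyword-term mismatch; not Lucene code.
class KeywordMismatchDemo {
    // SimpleAnalyzer-like analysis: split at non-letters and lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Term-query-like lookup: the term is used exactly as given.
    static boolean exactMatch(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Query-parser-like lookup: the input is analyzed first, then each term is looked up.
    static boolean analyzedMatch(Set<String> indexedTerms, String input) {
        for (String term : analyze(input)) {
            if (indexedTerms.contains(term)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The keyword field holds its value as a single, unanalyzed term.
        Set<String> keywordTerms = Collections.singleton("XY&Z");
        System.out.println(exactMatch(keywordTerms, "XY&Z"));    // true
        System.out.println(analyzedMatch(keywordTerms, "XY&Z")); // false: the input became "xy" and "z"
    }
}
```

In this model the usual ways out are to build the term query for such a field directly, or to make sure the query-side analysis leaves that field's text untouched.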

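Another issue is how to handle text that contains other elements, such as HTML or XML documents: they carry element tags, which are normally not indexed. A related issue is field-specific indexing, for example how to search an HTML document's header and body separately. An Analyzer cannot solve this on its own at present, because each Analyzer invocation processes a single field. We discuss this further later.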

2. Analyzing the Analyzer

To understand in detail how Lucene analyzes text you need to know how an Analyzer works, so let's look. Analyzer is the base class of every XXXAnalyzer, and it is surprisingly simple, much simpler than I expected: it has just one method, tokenStream(String fieldName, Reader reader). Some implementations, such as SimpleAnalyzer, ignore the fieldName parameter. Here is that class:

    public final class SimpleAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new LowerCaseTokenizer(reader);
        }
    }

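As you can see, this class is also remarkably simple: it only uses LowerCaseTokenizer. What does LowerCaseTokenizer do? The name gives most of it away: it splits the text at non-letter characters, discarding them, and converts all text to lowercase.

The returned TokenStream is an enumerator-like class that yields successive Tokens and returns null when it reaches the end.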
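The "enumerator-like" contract is easy to mimic in plain Java. The class below is a hypothetical `ToyLowerCaseTokenizer`, not the Lucene class of a similar name: it returns plain strings rather than Token objects, and it only assumes the behaviour described above (letters form tokens, everything else separates them, and `next()` returns null at the end).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy enumerator-like token stream; not Lucene's LowerCaseTokenizer.
class ToyLowerCaseTokenizer {
    private final Reader reader;

    ToyLowerCaseTokenizer(Reader reader) {
        this.reader = reader;
    }

    // Returns the next lowercased run of letters, or null when the input is exhausted.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetter(c)) {
                sb.append((char) Character.toLowerCase(c));
            } else if (sb.length() > 0) {
                break; // a non-letter ends the current token
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Drains the stream the way a caller would: loop until next() returns null.
    static List<String> tokenize(String text) {
        ToyLowerCaseTokenizer stream = new ToyLowerCaseTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("XY&Z Corporation")); // [xy, z, corporation]
    }
}
```

Pulling tokens one at a time like this lets the caller process arbitrarily long input without holding all tokens in memory.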

Posted by 大田… 2007-02-13 11:32

Lucene In Action ch 3 notes -- Add search

Tue, 13 Feb 2007 03:31:00 GMT

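Today's topic is ch3, Add search to your Application: actually using Lucene search to find what you are looking for.

1. Implementing a simple search feature

This chapter covers only the simple Lucene search API, which involves the following classes:

Lucene basic search API:

Class	Purpose
IndexSearcher	Entry point for searching an index. All searches go through one of the overloaded search methods of an IndexSearcher instance.
Query (and subclasses)	Each subclass encapsulates the logic of a particular search type. Query instances are passed to IndexSearcher's search method.
QueryParser	Processes a human-readable expression and converts it into a concrete Query instance.
Hits	Holds the search results. Returned by IndexSearcher's search method.

Let's look at a few examples from the book.

LiaTestCase.java is a class that extends TestCase; the examples below all extend it.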

    package lia.common;

    import junit.framework.TestCase;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.document.Document;

    import java.io.IOException;
    import java.util.Date;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    /**
     * LIA base class for test cases.
     */
    public abstract class LiaTestCase extends TestCase {
      private String indexDir = System.getProperty("index.dir"); // the test index has already been built
      protected Directory directory;

      protected void setUp() throws Exception {
        directory = FSDirectory.getDirectory(indexDir, false);
      }

      protected void tearDown() throws Exception {
        directory.close();
      }

      /**
       * For troubleshooting
       */
      protected final void dumpHits(Hits hits) throws IOException {
        if (hits.length() == 0) {
          System.out.println("No hits");
        }

        for (int i = 0; i < hits.length(); i++) {
          Document doc = hits.doc(i);
          System.out.println(hits.score(i) + ":" + doc.get("title"));
        }
      }

      protected final void assertHitsIncludeTitle(Hits hits, String title)
          throws IOException {
        for (int i = 0; i < hits.length(); i++) {
          Document doc = hits.doc(i);
          if (title.equals(doc.get("title"))) {
            assertTrue(true);
            return;
          }
        }

        fail("title '" + title + "' not found");
      }

      protected final Date parseDate(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s);
      }
    }

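I. Searching for a specific Term and parsing a user-entered expression with QueryParser

To search for one specific term, TermQuery is all you need; a single term is especially suitable for Keyword searches. Parsing a user-entered expression fits the way users actually search, and that parsing is done by QueryParser. If the expression cannot be parsed, an exception is thrown, from which you can obtain the error details and give the user an appropriate message. Parsing also needs an Analyzer to analyze the user's input; the Analyzer determines which Terms are produced, and these make up the Query instance.

Here is an example, BasicSearchingTest.java: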

    package lia.searching;

    import lia.common.LiaTestCase;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class BasicSearchingTest extends LiaTestCase {

      public void testTerm() throws Exception {
        IndexSearcher searcher = new IndexSearcher(directory);
        Term t = new Term("subject", "ant");      // build a Term
        Query query = new TermQuery(t);
        Hits hits = searcher.search(query);       // search
        assertEquals("JDwA", 1, hits.length());   // check the result

        t = new Term("subject", "junit");
        hits = searcher.search(new TermQuery(t));
        assertEquals(2, hits.length());

        searcher.close();
      }

      public void testKeyword() throws Exception {   // test keyword search
        IndexSearcher searcher = new IndexSearcher(directory);
        Term t = new Term("isbn", "1930110995");     // keyword term
        Query query = new TermQuery(t);
        Hits hits = searcher.search(query);
        assertEquals("JUnit in Action", 1, hits.length());

        searcher.close();
      }

      public void testQueryParser() throws Exception {   // test QueryParser
        IndexSearcher searcher = new IndexSearcher(directory);

        // Parse the search expression into a Query instance
        Query query = QueryParser.parse("+JUNIT +ANT -MOCK",
                                        "contents",
                                        new SimpleAnalyzer());
        Hits hits = searcher.search(query);
        assertEquals(1, hits.length());
        Document d = hits.doc(0);
        assertEquals("Java Development with Ant", d.get("title"));

        // Parse the search expression into a Query instance
        query = QueryParser.parse("mock OR junit",
                                  "contents",
                                  new SimpleAnalyzer());
        hits = searcher.search(query);
        assertEquals("JDwA and JIA", 2, hits.length());

        searcher.close();
      }
    }

Posted by 大田… 2007-02-13 11:31

Lucene In Action ch 2 notes -- indexing in detail

Tue, 13 Feb 2007 03:29:00 GMT

Lucene In Action ch2 gives a systematic treatment of indexing, so let's take a look.

1. The indexing process

The data to be indexed must first be converted to text, because Lucene can only index text. Analysis then filters the text, removing the so-called stop words mentioned in ch1, and the index is built. The resulting index is an inverted index.
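As a rough illustration of what "inverted index" means, the sketch below maps each term to the set of documents that contain it. It is a toy under stated assumptions, not Lucene's on-disk format: `ToyInvertedIndex` is a hypothetical class, the analysis step is a simple lowercase split, and the stop-word list is an assumed sample.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> IDs of the documents containing it. Not Lucene code.
class ToyInvertedIndex {
    // Assumed sample stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "or", "to", "is"));

    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Analyze the text (lowercase, split at non-alphanumerics, drop stop words) and record each term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up one term; an unknown term yields an empty set.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(0, "Amsterdam has lots of bridges");
        index.add(1, "Venice has lots of canals");
        System.out.println(index.search("lots"));   // [0, 1]
        System.out.println(index.search("venice")); // [1]
        System.out.println(index.search("of"));     // []  (stop word, never indexed)
    }
}
```

A search only has to look up the term and read its document list, rather than scan every document, which is what makes full-text lookups fast.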

2. Basic index operations

The basic operations are adding, deleting, and updating.

I. Adding

Here is the example code, the BaseIndexingTestCase class:

    package lia.indexing;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.SimpleAnalyzer;

    import junit.framework.TestCase;
    import java.io.IOException;

    /**
     *
     */
    public abstract class BaseIndexingTestCase extends TestCase {
      protected String[] keywords = {"1", "2"};
      protected String[] unindexed = {"Netherlands", "Italy"};
      protected String[] unstored = {"Amsterdam has lots of bridges",
                                     "Venice has lots of canals"};
      protected String[] text = {"Amsterdam", "Venice"};
      protected Directory dir;

      // setUp method
      protected void setUp() throws IOException {
        String indexDir =
          System.getProperty("java.io.tmpdir", "tmp") +
          System.getProperty("file.separator") + "index-dir";
        dir = FSDirectory.getDirectory(indexDir, true);
        addDocuments(dir);
      }

      protected void addDocuments(Directory dir)
        throws IOException {
        IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
          true);    // obtain an IndexWriter instance
        writer.setUseCompoundFile(isCompound());
        for (int i = 0; i < keywords.length; i++) {
          Document doc = new Document();        // add a document
          doc.add(Field.Keyword("id", keywords[i]));
          doc.add(Field.UnIndexed("country", unindexed[i]));
          doc.add(Field.UnStored("contents", unstored[i]));
          doc.add(Field.Text("city", text[i]));
          writer.addDocument(doc);
        }
        writer.optimize();   // optimize the index
        writer.close();
      }

      // Override this method to supply a different Analyzer
      protected Analyzer getAnalyzer() {
        return new SimpleAnalyzer();
      }

      // Override this method to choose whether the compound file format is used
      protected boolean isCompound() {
        return true;
      }

      // Test adding documents
      public void testIndexWriter() throws IOException {
        IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
          false);
        assertEquals(keywords.length, writer.docCount());
        writer.close();
      }

      // Test IndexReader
      public void testIndexReader() throws IOException {
        IndexReader reader = IndexReader.open(dir);
        assertEquals(keywords.length, reader.maxDoc());
        assertEquals(keywords.length, reader.numDocs());
        reader.close();
      }
    }

This is a test superclass that other test cases can extend to test different features. The code above is commented in detail.

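When adding Fields you may need to handle synonyms. There are two ways to add them:

a. Build a group of synonyms and loop over them, appending each to a single String that is then added as one Field.

b. Add each synonym to the same field as the base word, as follows:

    String baseWord = "fast";
    String[] synonyms = {"quick", "rapid", "speedy"};

    Document doc = new Document();
    doc.add(Field.Text("word", baseWord));
    for (int i = 0; i < synonyms.length; i++) {
        doc.add(Field.Text("word", synonyms[i]));   // same field name, additional value
    }

Internally, Lucene adds every one of these words to a Field named word, so a search can use any of the given words.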

Posted by 大田… 2007-02-13 11:29

Lucene In Action ch 1 notes -- basic concepts

Tue, 13 Feb 2007 03:28:00 GMT

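In chapter 1 the author mainly explains what Lucene is and what it can be used for, and gives an indexing example and a searching example. The examples introduce a few basic (core) concepts and give the reader an overview of Lucene. The chapter then surveys the search frameworks that are popular today.

Let's focus on the indexing and searching example and then go over some basic concepts.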

    package lia.meetlucene;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    import java.io.File;
    import java.io.IOException;
    import java.io.FileReader;
    import java.util.Date;

    /**
     * This code was originally written for
     * Erik's Lucene intro java.net article
     */
    public class Indexer {

      public static void main(String[] args) throws Exception {
        if (args.length != 2) {
          throw new Exception("Usage: java " + Indexer.class.getName() + " ");
        }
        File indexDir = new File(args[0]); // create the Lucene index in this directory
        File dataDir = new File(args[1]);  // directory holding the files to be indexed

        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();

        System.out.println("Indexing " + numIndexed + " files took "
          + (end - start) + " milliseconds");
      }

      public static int index(File indexDir, File dataDir) throws IOException {
        if (!dataDir.exists() || !dataDir.isDirectory()) {
          throw new IOException(dataDir + " does not exist or is not a directory");
        }

        IndexWriter writer = new IndexWriter(indexDir,
          new StandardAnalyzer(), true);   // (1) create the Lucene index
        writer.setUseCompoundFile(false);

        indexDirectory(writer, dataDir);

        int numIndexed = writer.docCount();
        writer.optimize();
        writer.close();                    // close index
        return numIndexed;
      }

      private static void indexDirectory(IndexWriter writer, File dir) throws IOException {
        File[] files = dir.listFiles();

        for (int i = 0; i < files.length; i++) {
          File f = files[i];
          if (f.isDirectory()) {
            indexDirectory(writer, f);     // (2) recurse
          } else if (f.getName().endsWith(".txt")) {
            indexFile(writer, f);
          }
        }
      }

      private static void indexFile(IndexWriter writer, File f) throws IOException {
        if (f.isHidden() || !f.exists() || !f.canRead()) {
          return;
        }

        System.out.println("Indexing " + f.getCanonicalPath());

        Document doc = new Document();
        doc.add(Field.Text("contents", new FileReader(f)));        // (3) index file content
        doc.add(Field.Keyword("filename", f.getCanonicalPath()));  // (4) index file name
        writer.addDocument(doc);                                   // (5) add document in Lucene index
      }
    }

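The Indexer above uses a few Lucene APIs to index the files under a directory. It takes two arguments at run time: the directory in which to store the index and the directory of files to index.

The class above needs the following Lucene classes to perform the indexing:

IndexWriter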
