DANCE WITH JAVA

Building high-quality systems


Apache Lucene: the simplest possible example

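Just as every language has a Hello World that gives you a first taste of it, Lucene can be demonstrated with a very small example. The one below (taken from Lucene in Action) consists of two classes.
The first one builds the index: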

package my;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class Indexer {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new Exception("Usage: java " + Indexer.class.getName()
                    + " <index dir> <data dir>");
        }
        File indexDir = new File(args[0]);
        File dataDir = new File(args[1]);

        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();

        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }

    // open an index and start file directory traversal
    public static int index(File indexDir, File dataDir) throws IOException {
        if (!dataDir.exists() || !dataDir.isDirectory()) {
            throw new IOException(dataDir
                    + " does not exist or is not a directory");
        }

        // true = create a new index, overwriting any existing one
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
                true);
        // write separate index files instead of one compound file
        writer.setUseCompoundFile(false);

        indexDirectory(writer, dataDir);

        int numIndexed = writer.docCount();
        writer.optimize(); // merge segments for faster searching
        writer.close();
        return numIndexed;
    }

    // recursive method that calls itself when it finds a directory
    private static void indexDirectory(IndexWriter writer, File dir)
            throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // listFiles() returns null if the directory cannot be read
        }
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isDirectory()) {
                indexDirectory(writer, f);
            } else if (f.getName().endsWith(".txt")) {
                indexFile(writer, f);
            }
        }
    }

    // method to actually index file using Lucene
    private static void indexFile(IndexWriter writer, File f)
            throws IOException {
        if (f.isHidden() || !f.exists() || !f.canRead()) {
            return;
        }
        System.out.println("Indexing " + f.getCanonicalPath());

        Document doc = new Document();
        // file body: tokenized and indexed, but not stored
        doc.add(Field.Text("contents", new FileReader(f)));
        // file path: stored and indexed as a single untokenized term
        doc.add(Field.Keyword("filename", f.getCanonicalPath()));
        writer.addDocument(doc);
    }
}
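The recursive traversal in indexDirectory can be tried on its own, without Lucene on the classpath. What follows is only a minimal sketch using the JDK: the class name TxtFileWalker and the method collect are hypothetical names introduced for illustration, not part of Lucene or of the book's example. It applies the same filters as the Indexer (.txt suffix, not hidden, readable) and assumes that an unreadable directory should simply be skipped.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene or the original example): collects
// the same .txt files that indexDirectory would pass on to indexFile.
public class TxtFileWalker {

    // Recursively gathers readable, non-hidden .txt files under dir.
    public static List<File> collect(File dir) {
        List<File> found = new ArrayList<>();
        walk(dir, found);
        return found;
    }

    private static void walk(File dir, List<File> found) {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : files) {
            if (f.isDirectory()) {
                walk(f, found); // descend into subdirectories
            } else if (f.getName().endsWith(".txt")
                    && !f.isHidden() && f.canRead()) {
                found.add(f);
            }
        }
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no argument is given.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collect(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Running it against the data directory before indexing is a quick way to see which files the Indexer would pick up.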

The second one performs the search:

package my;

import java.io.File;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new Exception("Usage: java " + Searcher.class.getName()
                    + " <index dir> <query>");
        }
        File indexDir = new File(args[0]);
        String q = args[1];

        if (!indexDir.exists() || !indexDir.isDirectory()) {
            throw new Exception(indexDir
                    + " does not exist or is not a directory.");
        }
        search(indexDir, q);
    }

    public static void search(File indexDir, String q) throws Exception {
        // false = open the existing index directory, do not create a new one
        Directory fsDir = FSDirectory.getDirectory(indexDir, false);
        IndexSearcher is = new IndexSearcher(fsDir);

        // parse the query against the "contents" field written by Indexer
        Query query = QueryParser.parse(q, "contents", new StandardAnalyzer());

        long start = new Date().getTime();
        Hits hits = is.search(query);
        long end = new Date().getTime();

        System.err.println("Found " + hits.length() + " document(s) (in "
                + (end - start) + " milliseconds) that matched query '" + q
                + "':");

        // print the stored file path of every matching document
        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            System.out.println(doc.get("filename"));
        }
    }
}

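OK, that gives us a simple implementation: it indexes every .txt file under a directory, then prints the names of the .txt files that contain a given string. Note that matching is done on the terms produced by StandardAnalyzer, so the query matches whole words rather than arbitrary substrings.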
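To make the index-then-search idea behind the two classes concrete, here is a toy inverted index in plain Java. It is a conceptual sketch only, not how Lucene stores or searches data: the class TinyInvertedIndex and its methods are hypothetical names invented for this illustration, and the lowercase-and-split tokenizer is an assumption that only loosely resembles what an analyzer does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, NOT Lucene's implementation.
public class TinyInvertedIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercases the text, splits it on anything that is not a letter or
    // digit, and records every resulting term for docName.
    public void add(String docName, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Returns the names of the documents containing the single given term,
    // sorted alphabetically; an empty set if the term was never indexed.
    public Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet()
                            : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("a.txt", "Lucene is a search library");
        index.add("b.txt", "Hello World, hello Lucene");
        index.add("c.txt", "Nothing relevant here");
        System.out.println(index.search("lucene")); // prints [a.txt, b.txt]
        System.out.println(index.search("hello"));  // prints [b.txt]
        System.out.println(index.search("luc"));    // prints [] (terms, not substrings)
    }
}
```

The point of the sketch is the split of work: the expensive pass over the text happens once at indexing time, and a search is then a lookup by term instead of a scan of every file.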
The next post will introduce Lucene's core classes.

Posted on 2007-06-12 09:46 by dreamstone. Category: search engines, Lucene

