A Quick Try at Lucene Full-Text Search

HTML parser
package com.rain.util;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

import org.apache.lucene.demo.html.HTMLParser;

public class HTMLDocParser {

private String htmlPath;
private HTMLParser htmlParser;

public HTMLDocParser(String htmlPath){
  this.htmlPath=htmlPath;
  initHtmlParser();
}
public void initHtmlParser(){
  InputStream inputStream=null;
  try{
   inputStream=new FileInputStream(htmlPath);
  }catch(FileNotFoundException e){
   e.printStackTrace();
  }
  if(null!=inputStream){
   try{
    htmlParser=new HTMLParser(new InputStreamReader(inputStream,"utf-8"));
   }catch(UnsupportedEncodingException e){
    e.printStackTrace();
   }
  }
}
public String getTitle(){
  if(null!=htmlParser){
   try{
    return htmlParser.getTitle();
   }catch(IOException e){
    e.printStackTrace();
   }catch(InterruptedException e){
    e.printStackTrace();
   }
  }
  return "";
}
public Reader getContent(){
  if(null!=htmlParser){
   try{
    return htmlParser.getReader();
   }catch(IOException e){
    e.printStackTrace();
   }
  }
  return null;
}
public String getPath(){
  return this.htmlPath;
}
}

Entity bean describing the structure of a search result
package com.rain.search;

public class SearchResultBean {
    //absolute path of the HTML file that matched
    private String htmlPath;

    //title extracted from the HTML file
    private String htmlTitle;

    public String getHtmlPath() {
        return htmlPath;
    }

    public void setHtmlPath(String htmlPath) {
        this.htmlPath = htmlPath;
    }

    public String getHtmlTitle() {
        return htmlTitle;
    }

    public void setHtmlTitle(String htmlTitle) {
        this.htmlTitle = htmlTitle;
    }
}

Implementation of the indexing subsystem

package com.rain.index;

import java.io.File;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Field;

import com.rain.util.HTMLDocParser;

public class IndexManager {

//the directory that stores HTML files
private final String dataDir="E:\\dataDir";

//the directory that is used to store a Lucene index
private final String indexDir="E:\\indexDir";

public boolean creatIndex()throws IOException{
  if(true==inIndexExist()){
   return true;
  }
  File dir=new File(dataDir);
  if(!dir.exists()){
   return false;
  }
  File[] htmls=dir.listFiles();
  Directory fsDirectory=FSDirectory.getDirectory(indexDir,true);
  Analyzer analyzer=new StandardAnalyzer();
  IndexWriter indexWriter=new IndexWriter(fsDirectory,analyzer,true);
  for(int i=0;i<htmls.length;i++){
   String htmlPath=htmls[i].getAbsolutePath();
   if(htmlPath.endsWith(".html")||htmlPath.endsWith(".htm")){
    addDocument(htmlPath,indexWriter);
   }
  }
  indexWriter.optimize();
  indexWriter.close();
  return true;
}

public void addDocument(String htmlPath,IndexWriter indexWriter){
  HTMLDocParser htmlParser=new HTMLDocParser(htmlPath);
  String path=htmlParser.getPath();
  String title=htmlParser.getTitle();
  Reader content=htmlParser.getContent();
  if(null==content){
   //the file could not be opened or parsed, so there is nothing to index
   return;
  }

  Document document=new Document();
  document.add(new Field("path",path,Field.Store.YES,Field.Index.NO));
  document.add(new Field("title",title,Field.Store.YES,Field.Index.TOKENIZED));
  document.add(new Field("content",content));
  try{
   indexWriter.addDocument(document);
  }catch(IOException e){
   e.printStackTrace();
  }
}
public String getDataDir(){
  return this.dataDir;
}

public String getIndexDir(){
  return this.indexDir;
}

public boolean inIndexExist(){
  File directory=new File(indexDir);
  //listFiles() returns null when the index directory does not exist yet
  File[] files=directory.listFiles();
  return null!=files&&0<files.length;
}
}
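
The two filesystem checks that IndexManager relies on (picking out the HTML files in the data directory, and deciding whether the index directory already has content) can be tried without Lucene. Below is a minimal sketch using only java.io. The class HtmlFileSupport and its methods isHtmlFile, listHtmlPaths and hasEntries are hypothetical names made up for this illustration; they are not part of the original code. The sketch matches on ".htm" with the dot so that a name that merely ends in the letters "htm" is not picked up, and it treats a missing directory as empty.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the original post.
final class HtmlFileSupport {

    private HtmlFileSupport() {
    }

    // True when the file name ends in ".html" or ".htm", ignoring case.
    static boolean isHtmlFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    // Returns the absolute paths of the HTML files directly inside dir.
    // File.listFiles() returns null when dir is missing or is not a
    // directory, so that case yields an empty list.
    static List<String> listHtmlPaths(File dir) {
        List<String> paths = new ArrayList<String>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return paths;
        }
        for (File entry : entries) {
            if (entry.isFile() && isHtmlFile(entry.getName())) {
                paths.add(entry.getAbsolutePath());
            }
        }
        return paths;
    }

    // True only when dir exists, is a directory, and holds at least one entry.
    static boolean hasEntries(File dir) {
        File[] entries = dir.listFiles();
        return entries != null && entries.length > 0;
    }
}
```

With helpers like these, the indexing loop would iterate over listHtmlPaths(new File(dataDir)), and the "does an index exist" check would reduce to hasEntries(new File(indexDir)).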

Implementation of the search function
package com.rain.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import com.rain.index.IndexManager;

public class SearchManager {
private String searchWord;
private IndexManager indexManager;
private Analyzer analyzer;

public SearchManager(String searchWord){
  this.searchWord=searchWord;
  this.indexManager=new IndexManager();
  this.analyzer=new StandardAnalyzer();
}

/**
     * do search
     */
public List search(){
  List searchResult=new ArrayList();
  if(false==indexManager.inIndexExist()){
   try{
    if(false==indexManager.creatIndex()){
     return searchResult;
    }
   }catch(IOException e){
    e.printStackTrace();
    return searchResult;
   }
  }
  IndexSearcher indexSearcher=null;
  try{
   indexSearcher=new IndexSearcher(indexManager.getIndexDir());
  }catch(IOException e){
   e.printStackTrace();
  }
  QueryParser queryParser=new QueryParser("content",analyzer);
  Query query=null;
  try{
   query=queryParser.parse(searchWord);
  }catch(ParseException e){
   e.printStackTrace();
  }
  if(null!=indexSearcher){
   try{
    if(null!=query){
     Hits hits=indexSearcher.search(query);
     for(int i=0;i<hits.length();i++){
      SearchResultBean resultBean=new SearchResultBean();
      resultBean.setHtmlPath(hits.doc(i).get("path"));
      resultBean.setHtmlTitle(hits.doc(i).get("title"));
      searchResult.add(resultBean);
     }
    }
   }catch(IOException e){
    e.printStackTrace();
   }finally{
    //release the index files held by the searcher
    try{
     indexSearcher.close();
    }catch(IOException e){
     e.printStackTrace();
    }
   }
  }
  return searchResult;
}

}
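
SearchManager hands the search word to the query parser exactly as it receives it. When the word comes from a request parameter it can be null (field missing) or blank, and neither is likely to parse into a useful query. Below is a minimal sketch of normalizing the word first. SearchWords and normalize are hypothetical names for this illustration and do not appear in the original code.

```java
// Hypothetical helper for illustration only; not part of the original post.
final class SearchWords {

    private SearchWords() {
    }

    // Returns the trimmed search word, or null when there is nothing
    // to search for (the input was null, empty, or only whitespace).
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }
}
```

A caller could return an empty result list when normalize gives null, and only otherwise build the query from the returned value.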

Implementation of the request manager

package com.rain.servlet;

import java.io.IOException;
import java.util.List;

import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.rain.search.SearchManager;

/**
* @author zhourui
* 2007-1-28
*/
public class SearchController extends HttpServlet {
private static final long serialVersionUID=1L;

/* (non-Javadoc)
* @see javax.servlet.http.HttpServlet#doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)
*/
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
  String searchWord=request.getParameter("searchWord");
  SearchManager searchManager=new SearchManager(searchWord);
  List searchResult=searchManager.search();
  //expose the results to search.jsp, then forward the request to it
  request.setAttribute("searchResult",searchResult);
  RequestDispatcher dispatcher=request.getRequestDispatcher("search.jsp");
  dispatcher.forward(request, response);
}

}

Submitting a search request to the web server
<form action="SearchController" method="post">
      <table>
        <tr>
          <td colspan="3">
            SearchWord:<input type="text" name="searchWord" id="searchWord" size="40">
            <input id="doSearch" type="submit" value="search">
          </td>
        </tr>
      </table>
    </form>
Displaying the search results
<%@ page import="java.util.List,com.rain.search.SearchResultBean" %>
<table class="result">
      <%
        List searchResult=(List)request.getAttribute("searchResult");
        int resultCount=0;
        if(null!=searchResult){
        resultCount=searchResult.size();
        }
        for(int i=0;i<resultCount;i++){
        SearchResultBean resultBean=(SearchResultBean)searchResult.get(i);
        String title=resultBean.getHtmlTitle();
        String path=resultBean.getHtmlPath();
        %>
        <tr>
           <td class="title"><h3><a href="<%=path%>"><%=title%></a></h3></td>
        </tr>
        <%
        }
      %>
    </table>
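
The JSP fragment writes the title and path of each hit straight into the markup. A title that contains characters such as <, & or a double quote could break the generated page. Below is a minimal sketch of escaping a value before it is printed. HtmlEscaper and escape are hypothetical names for this illustration, not part of the original code, and the set of characters handled is the usual minimum rather than an exhaustive one.

```java
// Hypothetical helper for illustration only; not part of the original post.
final class HtmlEscaper {

    private HtmlEscaper() {
    }

    // Replaces the characters that are significant in HTML text and
    // attribute values with character references. A null input is
    // rendered as an empty string.
    static String escape(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

In the page, the expressions would then print HtmlEscaper.escape(title) and HtmlEscaper.escape(path) instead of the raw values.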

posted on 2007-01-29 09:57 by 周銳, category: Lucene
