<?xml version="1.0" encoding="utf-8" standalone="yes"?><link>http://www.aygfsteel.com/dreamer/</link><description>java咖啡</description><language>zh-cn</language><lastBuildDate>Sat, 17 May 2025 10:52:47 GMT</lastBuildDate><pubDate>Sat, 17 May 2025 10:52:47 GMT</pubDate><ttl>60</ttl><item><title>Keyword Highlighting in Lucene</title><link>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220386.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Wed, 06 Aug 2008 03:24:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220386.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/220386.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220386.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/220386.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/220386.html</trackback:ping><description><![CDATA[<table style="border-collapse: collapse; word-wrap: break-word" cellspacing="0" cellpadding="0" width="760" align="center" bgcolor="#ffffff" border="0"> <tbody> <tr> <td align="center" height="30"><font style="font-size: 14pt" color="#02368d"><strong>Keyword Highlighting in Lucene</strong></font><br /> </td> </tr> <tr> <td align="center" height="9"><img height="9" alt="" src="http://blog.chinaunix.net/templates/default/images/right_line.gif" width="502" border="0" /></td> </tr> <tr> <td align="center"> <table style="border-collapse: collapse; word-wrap: break-word" cellspacing="0" cellpadding="0" width="740" border="0"> <tbody> <tr> <td width="740"> <div class="wmqeeuq" id="art" style="margin: 15px" width="560"> <div>This article is reposted from: <a >http://daihaixiang.blog.163.com/blog/static/38301342008416104432532/</a></div> <div> </div> <div> </div> <div> <p>Lucene's org.apache.lucene.search.highlight package provides tools for highlighting search keywords. When you search with Baidu or Google, the terms in each result's summary that match the keywords are highlighted; both Baidu and Google highlight them in red.</p> <p>With the tools Lucene provides, highlighting is easy to implement.</p> 
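<p>As a rough illustration of the idea only (this is not Lucene's Highlighter), the plain-Java sketch below wraps every literal occurrence of a keyword in a pre-tag and a post-tag. The class name KeywordHighlightSketch, the method highlight and the bracket markers are assumptions made for this example; an HTML page would pass font tags instead.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it only illustrates the
// preTag/postTag idea with plain substring matching.
class KeywordHighlightSketch {

    // Wraps every literal occurrence of keyword in text with preTag and postTag.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the keyword as a literal, not as a regular expression.
        Matcher matcher = Pattern.compile(Pattern.quote(keyword)).matcher(text);
        // Matcher.quoteReplacement keeps '$' and '\' in the replacement from being
        // interpreted as group references.
        return matcher.replaceAll(Matcher.quoteReplacement(preTag + keyword + postTag));
    }

    public static void main(String[] args) {
        String summary = "因为灿烂给人以安静的舒适感";
        // Bracket markers stand in for the HTML tags used on a real page.
        System.out.println(highlight(summary, "因为", "[", "]"));
    }
}
```

<p>Unlike this sketch, Lucene's Highlighter works on the token stream produced by an analyzer, so it matches terms rather than raw substrings.</p>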
<p>Highlighting works as follows: search with the keywords the user entered, find the matching result documents, extract a summary text for each document, and then apply the configured highlight format to the terms in the summary that are identical or similar to the keywords. When the summary is displayed on a web page, the keyword-related text appears highlighted.</p> <p>The org.apache.lucene.search.highlight.SimpleHTMLFormatter class builds a highlight format; this is the simplest way to construct one. For example:</p> <p>SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("&lt;font color='red'&gt;", "&lt;/font&gt;");</p> <p>The constructor is declared as public SimpleHTMLFormatter(String preTag, String postTag). This highlight format relies on web pages: HTML marks up content with tags, so there is a preTag and a postTag.</p> <p>The format constructed above displays keywords that appear in the summary in red, to distinguish them from the rest of the text.</p> <p>Use the format object to construct an org.apache.lucene.search.highlight.Highlighter instance. The Highlighter tokenizes the text of the Field obtained from a search result (here, the summary text), finds the terms that are identical or similar to the search keywords, adds the highlight format to the summary text, and returns a new, formatted summary that shows up highlighted on a web page.</p> <p>The following simple example demonstrates the highlighting process.</p> <p>The test class is shown below:</p> <p>package org.shirdrn.lucene.learn.highlight;</p> <p>import java.io.IOException;<br /> import java.io.StringReader;</p> <p>import net.teamhot.lucene.ThesaurusAnalyzer;</p> <p>import org.apache.lucene.analysis.Analyzer;<br /> import org.apache.lucene.analysis.TokenStream;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.document.Field;<br /> import org.apache.lucene.index.CorruptIndexException;<br /> import org.apache.lucene.index.IndexWriter;<br /> import org.apache.lucene.queryParser.ParseException;<br /> import org.apache.lucene.queryParser.QueryParser;<br /> import org.apache.lucene.search.Hits;<br /> import org.apache.lucene.search.IndexSearcher;<br /> import org.apache.lucene.search.Query;<br /> import org.apache.lucene.search.highlight.Highlighter;<br /> import org.apache.lucene.search.highlight.QueryScorer;<br /> import org.apache.lucene.search.highlight.SimpleFragmenter;<br /> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;</p> <p>public class MyHighLighter {<br /> <br /> private String indexPath = "F:\\index";<br /> private Analyzer analyzer;<br /> private IndexSearcher searcher;<br /> <br /> public MyHighLighter(){<br />    analyzer = new ThesaurusAnalyzer();<br /> }<br /> <br /> public void createIndex() 
throws IOException {  <font color="#339966"> // Builds the index</font><br />    IndexWriter writer = new IndexWriter(indexPath,analyzer,true);<br />    Document docA = new Document();<br />    String fileTextA = "因为火烧云总是燃烧着消失在太阳冲下地平线的时刻，然后便是宁静的自然的天籁，没有谁会在这样的时光的镜片里伤感自语，因为灿烂给人以安静的舒适感。";<br />    Field fieldA = new Field("contents", fileTextA, Field.Store.YES,Field.Index.TOKENIZED);<br />    docA.add(fieldA); <br />   <br />    Document docB = new Document();<br />    String fileTextB = "因为带有以伤痕为代价的美丽风景总是让人不由地惴惴不安，紧接着袭面而来的抑或是病痛抑或是灾难，没有谁会能够安逸着恬然，因为模糊让人撕心裂肺地想呐喊。";<br />    Field fieldB = new Field("contents", fileTextB, Field.Store.YES,Field.Index.TOKENIZED);<br />    docB.add(fieldB); <br />   <br />    Document docC = new Document();<br />    String fileTextC = "我喜欢上了一个人孤独地行游，在梦与海洋的交接地带炽烈燃烧着。"+<br />    "因为，一条孤独的鱼喜欢上了火焰的颜色，真是荒唐地不合逻辑。";<br />    Field fieldC = new Field("contents", fileTextC, Field.Store.YES,Field.Index.TOKENIZED);<br />    docC.add(fieldC); <br />   <br />    writer.addDocument(docA);<br />    writer.addDocument(docB);<br />    writer.addDocument(docC);<br />    writer.optimize();<br />    writer.close();<br /> }<br /> <br /> public void search(String fieldName,String keyword) throws CorruptIndexException, IOException, ParseException{  <font color="#339966"> // Searches the index and highlights the results<br /> </font>   searcher = new IndexSearcher(indexPath); <br />    QueryParser queryParse = new QueryParser(fieldName, analyzer);    <font color="#339966"> // Build a QueryParser to parse the keywords entered by the user</font><br />    Query query = queryParse.parse(keyword); <br />    Hits hits = searcher.search(query);<br />    for(int i=0;i&lt;hits.length();i++){<br />     Document doc = hits.doc(i);<br />     String text = doc.get(fieldName);<br />     SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("&lt;font color='red'&gt;", "&lt;/font&gt;");    <br />             Highlighter highlighter = new Highlighter(simpleHTMLFormatter,new QueryScorer(query));    <br />             highlighter.setTextFragmenter(new 
SimpleFragmenter(text.length()));       <br />             if (text != null) {    <br />                 TokenStream tokenStream = analyzer.tokenStream(fieldName,new StringReader(text));    <br />                 String highLightText = highlighter.getBestFragment(tokenStream, text); <br />                 System.out.println("★高亮显示第 "+(i+1) +" 条检索结果如下所示："); <br />                 System.out.println(highLightText);    <br />             } <br />    }<br />    searcher.close();<br /> }<br /> <br /> <br /> public static void main(String[] args) {    <font color="#339966">// Test entry point</font><br />    MyHighLighter mhl = new MyHighLighter();<br />    try {<br />     mhl.createIndex();<br />     mhl.search("contents", "因为");<br />    } catch (CorruptIndexException e) {<br />     e.printStackTrace();<br />    } catch (IOException e) {<br />     e.printStackTrace();<br />    } catch (ParseException e) {<br />     e.printStackTrace();<br />    }<br /> }</p> <p>}</p> <p>Notes on the program:</p> <p>1. The createIndex() method uses the ThesaurusAnalyzer to index the given texts. Each Document has a Field named contents. In a real application you could add another Field named path that holds the location of the retrieved file (a local path or a URL).</p> <p>2. Search against the index that was built. First parse the keywords entered by the user with QueryParser. It must use the same analyzer as the one used for indexing; otherwise there is no guarantee that the parsed Query (built from terms) retrieves a sensible result set.</p> <p>3. Run the search with the parsed Query; the results are stored in Hits. Iterate over them and extract the content of each matching Document. The program uses that content directly as the summary and highlights it. A real application would have a separate step that extracts the summary (or looks up the summary of each result file in a database). Once the summary is available, the highlight format can be applied to it.</p> <p>4. To use only the first N characters of a result file as the summary, set the summary length in highlighter.setTextFragmenter(new SimpleFragmenter(text.length())); here the whole text is shown as the summary.</p> <p>Running the program produces the following output:</p> <p>词库未被初始化，开始初始化词库.<br /> 初始化词库结束。用时:3906毫秒;<br /> 共添加195574个词语。<br /> ★高亮显示第 1 条检索结果如下所示：<br /> <font color="#ff0000">&lt;font color='red'&gt;因为&lt;/font&gt;</font>火烧云总是燃烧着消失在太阳冲下地平线的时刻，然后便是宁静的自然的天籁，没有谁会在这样的时光的镜片里伤感自语<font color="#ff0000">，&lt;font color='red'&gt;因为&lt;/font&gt;</font>灿烂给人以安静的舒适感。<br /> ★高亮显示第 2 条检索结果如下所示：<br /> <font color="#ff0000">&lt;font color='red'&gt;因为&lt;/font&gt;</font>带有以伤痕为代价的美丽风景总是让人不由地惴惴不安，紧接着袭面而来的抑或是病痛抑或是灾难，没有谁会能够安逸着恬然<font color="#ff0000">，&lt;font 
color='red'&gt;因为&lt;/font&gt;</font>模糊让人撕心裂肺地想呐喊。<br /> ★高亮显示第 3 条检索结果如下所示：<br /> 我喜欢上了一个人孤独地行游，在梦与海洋的交接地带炽烈燃烧着。<font color="#ff0000">&lt;font color='red'&gt;因为&lt;/font&gt;</font>，一条孤独的鱼喜欢上了火焰的颜色，真是荒唐地不合逻辑。</p> <p>When these results are rendered in an HTML page, the keyword "因为" is highlighted in red.</p> </div> </div> </td> </tr> </tbody> </table> </td> </tr> </tbody> </table> <img src ="http://www.aygfsteel.com/dreamer/aggbug/220386.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-08-06 11:24 <a href="http://www.aygfsteel.com/dreamer/archive/2008/08/06/220386.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Keyword Highlighting in Lucene</title><link>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220383.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Wed, 06 Aug 2008 03:22:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220383.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/220383.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/08/06/220383.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/220383.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/220383.html</trackback:ping><description><![CDATA[<p>package searchfileexample;</p> <p>import javax.servlet.*;<br /> import javax.servlet.http.*;<br /> import java.io.*;<br /> import java.io.IOException;<br /> import java.io.StringReader;</p> <p>import org.apache.lucene.analysis.Analyzer;<br /> import org.apache.lucene.analysis.TokenStream;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.document.Field;<br /> import org.apache.lucene.index.CorruptIndexException;<br /> import org.apache.lucene.index.IndexWriter;<br /> import org.apache.lucene.queryParser.ParseException;<br /> import 
org.apache.lucene.queryParser.QueryParser;<br /> import org.apache.lucene.search.Hits;<br /> import org.apache.lucene.search.IndexSearcher;<br /> import org.apache.lucene.search.Query;<br /> import org.apache.lucene.search.highlight.Highlighter;<br /> import org.apache.lucene.search.highlight.QueryScorer;<br /> import org.apache.lucene.search.highlight.SimpleFragmenter;<br /> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;<br /> import org.apache.lucene.analysis.standard.StandardAnalyzer;</p> <p><br /> public class MyHighLighterServlet extends HttpServlet {<br />   private static final String CONTENT_TYPE = "text/html; charset=GB18030";</p> <p>  private String indexPath = "C:\\index";<br />   private Analyzer analyzer;<br />   private IndexSearcher searcher;</p> <p>  //Initialize global variables<br />   public void init() throws ServletException {<br />     analyzer = new StandardAnalyzer();<br />   }<br />   public void createIndex() throws IOException {   // Builds the index<br />        IndexWriter writer = new IndexWriter(indexPath,analyzer,true);<br />        Document docA = new Document();<br />        String fileTextA = "因为火烧云总是燃烧着消失在太阳冲下地平线的时刻，然后便是宁静的自然的天籁，没有谁会在这样的时光的镜片里伤感自语，因为灿烂给人以安静的舒适感。";<br />        Field fieldA = new Field("contents", fileTextA, Field.Store.YES,Field.Index.TOKENIZED);<br />        docA.add(fieldA); <br />   <br />        Document docB = new Document();<br />        String fileTextB = "因为带有以伤痕为代价的美丽风景总是让人不由地惴惴不安，紧接着袭面而来的抑或是病痛抑或是灾难，没有谁会能够安逸着恬然，因为模糊让人撕心裂肺地想呐喊。";<br />        Field fieldB = new Field("contents", fileTextB, Field.Store.YES,Field.Index.TOKENIZED);<br />        docB.add(fieldB); <br />   <br />        Document docC = new Document();<br />        String fileTextC = "我喜欢上了一个人孤独地行游，在梦与海洋的交接地带炽烈燃烧着。"+<br />        "因为，一条孤独的鱼喜欢上了火焰的颜色，真是荒唐地不合逻辑,原因。";<br />        Field fieldC = new Field("contents", fileTextC, Field.Store.YES,Field.Index.TOKENIZED);<br />        docC.add(fieldC); <br />   <br />        
writer.addDocument(docA);<br />        writer.addDocument(docB);<br />        writer.addDocument(docC);<br />        writer.optimize();<br />        writer.close();<br />     }<br />   <br />     public void search(String fieldName,String keyword,PrintWriter out) throws CorruptIndexException, IOException, ParseException{   // Searches the index and highlights the results<br />        searcher = new IndexSearcher(indexPath); <br />        QueryParser queryParse = new QueryParser(fieldName, analyzer);     // Build a QueryParser to parse the keywords entered by the user<br />        Query query = queryParse.parse(keyword); <br />        Hits hits = searcher.search(query);<br />        for(int i=0;i&lt;hits.length();i++){<br />         Document doc = hits.doc(i);<br />         String text = doc.get(fieldName);<br />         SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("&lt;font color='red'&gt;", "&lt;/font&gt;");    <br />                 Highlighter highlighter = new Highlighter(simpleHTMLFormatter,new QueryScorer(query));    <br />                 highlighter.setTextFragmenter(new SimpleFragmenter(text.length()));       <br />                 if (text != null) {    <br />                     TokenStream tokenStream = analyzer.tokenStream(fieldName,new StringReader(text));    <br />                     String highLightText = highlighter.getBestFragment(tokenStream, text); <br />                     System.out.println("★高亮显示第 "+(i+1) +" 条检索结果如下所示："); <br />                     out.println(highLightText);    <br />                 } <br />        }<br />        searcher.close();<br />     }</p> <p>  //Process the HTTP Get request<br />   public void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {<br />     response.setContentType(CONTENT_TYPE);<br />     PrintWriter out = response.getWriter();<br />     out.println("&lt;html&gt;");<br />     out.println("&lt;head&gt;&lt;title&gt;MyHighLighterServlet&lt;/title&gt;&lt;/head&gt;");<br />     out.println("&lt;body bgcolor=\"#ffffff\"&gt;");</p> <p>   <br />      try 
{<br />       createIndex();<br />       search("contents", "因为",out);<br />      } catch (CorruptIndexException e) {<br />       e.printStackTrace();<br />      } catch (IOException e) {<br />       e.printStackTrace();<br />      } catch (ParseException e) {<br />       e.printStackTrace();<br />      }</p> <p>    <br />     <br />     <br />     out.println("&lt;/body&gt;&lt;/html&gt;");<br />   }</p> <p>  //Clean up resources<br />   public void destroy() {<br />   }<br /> }</p> <img src ="http://www.aygfsteel.com/dreamer/aggbug/220383.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-08-06 11:22 <a href="http://www.aygfsteel.com/dreamer/archive/2008/08/06/220383.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>The Meaning of _blank and _self</title><link>http://www.aygfsteel.com/dreamer/archive/2008/07/08/213238.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Tue, 08 Jul 2008 02:23:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/07/08/213238.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/213238.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/07/08/213238.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/213238.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/213238.html</trackback:ping><description><![CDATA[_blank -- opens the link in a new window <br /> _parent -- opens the link in the parent window <br /> _self -- opens the link in the current page; this is the default <br /> _top -- opens the link in the top-level window <br /> _search -- also opens the search pane <br /> <br /> the name of a frame -- opens the link in that frame <img src ="http://www.aygfsteel.com/dreamer/aggbug/213238.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-07-08 10:23 <a 
href="http://www.aygfsteel.com/dreamer/archive/2008/07/08/213238.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>prototype.js Development Notes</title><link>http://www.aygfsteel.com/dreamer/archive/2008/06/05/206071.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Thu, 05 Jun 2008 07:56:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/06/05/206071.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/206071.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/06/05/206071.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/206071.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/206071.html</trackback:ping><description><![CDATA[     Summary: Table of Contents 1. Programming Guide 1.1. What is Prototype? 1.2. Related articles 1.3. Utility methods 1.3.1. Using the $() method 1.3.2. Using the $F() method 1.3.3. Using the $A() method 1.3.4. Using the $H() method 1.3.5. Using the $R() method 1.3.6. Using Try.these()...  
<a href='http://www.aygfsteel.com/dreamer/archive/2008/06/05/206071.html'>Read more</a><img src ="http://www.aygfsteel.com/dreamer/aggbug/206071.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-06-05 15:56 <a href="http://www.aygfsteel.com/dreamer/archive/2008/06/05/206071.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Full-Text Search, Version 2: Handling TXT, Word, and Excel Files</title><link>http://www.aygfsteel.com/dreamer/archive/2008/03/19/187293.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Wed, 19 Mar 2008 08:52:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/03/19/187293.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/187293.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/03/19/187293.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/187293.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/187293.html</trackback:ping><description><![CDATA[<p>package searchfileexample;</p> <p>/**<br />  * Reads Excel files<br />  */<br /> import java.io.*;<br /> import org.apache.poi.hssf.usermodel.HSSFWorkbook;<br /> import org.apache.poi.hssf.usermodel.HSSFSheet;<br /> import org.apache.poi.hssf.usermodel.HSSFCell;<br /> import org.apache.poi.hssf.usermodel.HSSFDateUtil;<br /> import java.util.Date;<br /> import org.apache.poi.hssf.usermodel.HSSFRow;</p> <p>public class ExcelReader {<br />   // Reader used for text files<br />   private BufferedReader reader = null;</p> <p>  // File type<br />   private String filetype;</p> <p>  // Binary input stream of the file<br />   private InputStream is = null;</p> <p>  // Current sheet<br />   private int currSheet;</p> <p>  // Current position<br />   private int currPosition;</p> <p>  // Number of sheets<br />   private int numOfSheets;</p> <p>  // HSSFWorkbook<br />   
HSSFWorkbook workbook = null;<br />   // Cells are separated by a space<br />   private static String EXCEL_LINE_DELIMITER = " ";</p> <p>  // Maximum number of columns<br />   private static int MAX_EXCEL_COLUMNS = 64;</p> <p>  public int rows = 0;<br />   public int getRows() {<br />     return rows;<br />   }</p> <p>  // Constructor: creates an ExcelReader</p> <p>  public ExcelReader(String inputfile) throws IOException, Exception {<br />     // Reject a null or blank argument<br />     if (inputfile == null || inputfile.trim().equals("")) {<br />       throw new IOException("no input file specified");<br />     }<br />     // Store the file name extension in filetype<br />     this.filetype = inputfile.substring(inputfile.lastIndexOf(".") + 1);<br />     // Start at row 0<br />     currPosition = 0;<br />     // Start at sheet 0<br />     currSheet = 0;<br />     // Open the file input stream<br />     is = new FileInputStream(inputfile);<br />     // Check the file format<br />     if (filetype.equalsIgnoreCase("txt")) {<br />       // For txt, read through a BufferedReader<br />       reader = new BufferedReader(new InputStreamReader(is));<br />     }<br />     else if (filetype.equalsIgnoreCase("xls")) {<br />       // For Excel, read through an HSSFWorkbook<br />       workbook = new HSSFWorkbook(is);<br />       // Record the number of sheets<br />       numOfSheets = workbook.getNumberOfSheets();<br />     }<br />     else {<br />       throw new Exception("File Type Not Supported");<br />     }<br />   }</p> <p>  // readLine reads one line of the file<br />   public String readLine() throws IOException {<br />     // For a txt file, read via the reader<br />     if (filetype.equalsIgnoreCase("txt")) {<br />       String str = reader.readLine();<br />       // Skip blank lines and read the next one; stop at end of file (null)<br />       while (str != null && str.trim().equals("")) {<br />         str = reader.readLine();<br />       }<br />       return str;<br />     }<br />     // For an XLS file, read via the POI API<br />     else if (filetype.equalsIgnoreCase("xls")) {<br />       // Get the current sheet from currSheet<br />       HSSFSheet sheet = workbook.getSheetAt(currSheet);<br />       rows = sheet.getLastRowNum();<br />       // Check whether the current row is past the end of the current sheet<br />       if 
(currPosition > sheet.getLastRowNum()) {<br />         // Reset the row position<br />         currPosition = 0;<br />         // Check whether more sheets remain<br />         while (currSheet != numOfSheets - 1) {<br />           // Get the next sheet<br />           sheet = workbook.getSheetAt(currSheet + 1);<br />           // Has the current row reached the end of the file?<br />           if (currPosition == sheet.getLastRowNum()) {<br />             // Advance to the next sheet<br />             currSheet++;<br />             continue;<br />           }<br />           else {<br />             // Take the current row number<br />             int row = currPosition;<br />             currPosition++;<br />             // Read the data of the current row<br />             return getLine(sheet, row);<br />           }<br />         }<br />         return null;<br />       }<br />       // Take the current row number<br />       int row = currPosition;<br />       currPosition++;<br />       // Read the data of the current row<br />       return getLine(sheet, row);<br />     }<br />     return null;<br />   }</p> <p>  // getLine returns one row of a sheet<br />   private String getLine(HSSFSheet sheet, int row) {<br />     // Fetch the row by number<br />     HSSFRow rowline = sheet.getRow(row);<br />     // POI returns null for a row that was never written<br />     if (rowline == null) {<br />       return "";<br />     }<br />     // Create the string buffer<br />     StringBuffer buffer = new StringBuffer();<br />     // Number of columns in the current row<br />     int filledColumns = rowline.getLastCellNum();<br />     HSSFCell cell = null;<br />     // Loop over all columns<br />     for (int i = 0; i < filledColumns; i++) {<br />       // Get the current cell<br />       cell = rowline.getCell( (short) i);<br />       String cellvalue = null;<br />       if (cell != null) {<br />         // Check the cell type<br />         switch (cell.getCellType()) {<br />           // Cell type is NUMERIC<br />           case HSSFCell.CELL_TYPE_NUMERIC: {<br />             // Check whether the cell holds a Date<br />             if (HSSFDateUtil.isCellDateFormatted(cell)) {<br />               // For a Date cell, get the cell's Date value<br />               Date date = cell.getDateCellValue();<br />               // Convert the Date to a locale-formatted string<br />               cellvalue = cell.getDateCellValue().toLocaleString();<br 
/>             }<br />             // Plain number<br />             else {<br />               // Get the cell's numeric value<br />               Integer num = new Integer( (int) cell<br />                                         .getNumericCellValue());<br />               cellvalue = String.valueOf(num);<br />             }<br />             break;<br />           }<br />           // Cell type is STRING<br />           case HSSFCell.CELL_TYPE_STRING:</p> <p>            // Get the cell's string value<br />             cellvalue = cell.getStringCellValue().replaceAll("'", "''");<br />             break;<br />             // Default cell value<br />           default:<br />             cellvalue = " ";<br />         }<br />       }<br />       else {<br />         cellvalue = "";<br />       }<br />       // Insert the delimiter between fields<br />       buffer.append(cellvalue).append(EXCEL_LINE_DELIMITER);<br />     }<br />     // Return the row as a string<br />     return buffer.toString();<br />   }</p> <p>  // close releases the resources<br />   public void close() {<br />     // Close the InputStream if it is not null<br />     if (is != null) {<br />       try {<br />         is.close();<br />       }<br />       catch (IOException e) {<br />         is = null;<br />       }<br />     }<br />     // Close the BufferedReader if it is not null<br />     if (reader != null) {<br />       try {<br />         reader.close();<br />       }<br />       catch (IOException e) {<br />         reader = null;<br />       }<br />     }<br />   }</p> <p>  public static void main(String[] args) {<br />     try {<br />       ExcelReader er = new ExcelReader("d:\\xp.xls");<br />       String line = er.readLine();<br />       while (line != null) {<br />         System.out.println(line);<br />         line = er.readLine();<br />       }<br />       er.close();<br />     }<br />     catch (Exception e) {<br />       e.printStackTrace();<br />     }<br />   }</p> <p>}<br /> <br /> </p> <p>package searchfileexample;</p> <p>import javax.servlet.*;<br /> import javax.servlet.http.*;<br /> import 
java.io.*;<br /> import java.util.*;</p> <p>import org.apache.lucene.analysis.standard.StandardAnalyzer;<br /> import org.apache.lucene.index.IndexWriter;</p> <p>import java.io.File;<br /> import java.io.FileNotFoundException;<br /> import java.io.IOException;<br /> import java.util.Date;<br /> import org.apache.lucene.demo.FileDocument;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.document.Field;<br /> import java.io.FileReader;<br /> import org.apache.lucene.index.*;<br /> import java.text.DateFormat;<br /> import org.apache.poi.hdf.extractor.WordDocument;<br /> import java.io.InputStream;<br /> import java.io.StringWriter;<br /> import java.io.PrintWriter;<br /> import java.io.FileInputStream;<br /> import java.io.*;<br /> import org.textmining.text.extraction.WordExtractor;<br /> import org.apache.poi.hssf.usermodel.HSSFWorkbook;</p> <p>/**<br />  * Builds an index for all files under a directory<br />  * &lt;p&gt;Title: &lt;/p&gt;<br />  * &lt;p&gt;Description: &lt;/p&gt;<br />  * &lt;p&gt;Copyright: Copyright (c) 2007&lt;/p&gt;<br />  * &lt;p&gt;Company: &lt;/p&gt;<br />  * @author not attributable<br />  * @version 1.0<br />  * Depending on the file type, the index files can be written to different folders, so that index data is stored by category.<br />  */</p> <p>public class IndexFilesServlet<br />     extends HttpServlet {<br />   static final File INDEX_DIR = new File("index");</p> <p>  //Initialize global variables<br />   public void init() throws ServletException {<br />   }</p> <p>  //Process the HTTP Get request<br />   public void service(HttpServletRequest request, HttpServletResponse response) throws<br />       ServletException, IOException {<br />     final File docDir = new File("a"); // folder containing the files to index<br />     if (!docDir.exists() || !docDir.canRead()) {<br />       System.out.println("Document directory '" + docDir.getAbsolutePath() +<br />                          "' does not exist or is not readable, please check the path");<br />       // Return instead of calling System.exit, which would stop the whole servlet container<br />       return;<br />     }</p> <p>    Date start = new Date();<br />     try {<br />       IndexWriter writer = new 
IndexWriter(INDEX_DIR, new StandardAnalyzer(), true); // true: overwrite the existing index; false: keep the existing index<br />       System.out.println("Indexing to directory '" + INDEX_DIR + "'...");<br />       indexDocs(writer, docDir);<br />       System.out.println("Optimizing...");<br />       writer.optimize();<br />       writer.close();</p> <p>      Date end = new Date();<br />       System.out.println(end.getTime() - start.getTime() +<br />                          " total milliseconds");</p> <p>    }<br />     catch (IOException e) {<br />       System.out.println(" caught a " + e.getClass() +<br />                          "\n with message: " + e.getMessage());<br />     }</p> <p>  }</p> <p>  //Clean up resources<br />   public void destroy() {<br />   }</p> <p>  public void indexDocs(IndexWriter writer, File file) throws IOException {<br />     // do not try to index files that cannot be read<br />     int index = 0;<br />     String filehouzui = "";<br />     // use the last dot so that names such as report.v2.doc still yield the extension<br />     index = file.getName().lastIndexOf(".");<br />     //strFileName = strFileName.substring(0, index) +DateUtil.getCurrDateTime() + "." 
+ strFileName.substring(index + 1);<br />     filehouzui = file.getName().substring(index + 1);</p> <p>    if (file.canRead()) {<br />       if (file.isDirectory()) {<br />         String[] files = file.list();<br />         // an IO error could occur<br />         if (files != null) {<br />           for (int i = 0; i < files.length; i++) {<br />             indexDocs(writer, new File(file, files[i]));<br />           }<br />         }<br />       }<br />       else {<br />         System.out.println("adding " + file);<br />         try {<br />           if (filehouzui.equals("doc")) {<br />             writer.addDocument(getWordDocument(file, new FileInputStream(file)));<br />           }<br />           else if (filehouzui.equals("txt")) {<br />             writer.addDocument(getTxtDocument(file, new FileInputStream(file)));<br />           }<br />           else if (filehouzui.equals("xls")) {<br />             writer.addDocument(getExcelDocument(file, new FileInputStream(file)));<br />           }<br />           //writer.addDocument(parseFile(file));</p> <p>          //writer.addDocument(FileDocument.Document(file));// path stores the file's relative path<br />         }<br />         // at least on windows, some temporary files raise this exception with an "access denied" message<br />         // checking if the file can be read doesn't help<br />         catch (Exception fnfe) {<br />           ;<br />         }<br />       }<br />     }<br />   }</p> <p>  /**<br />    * @param file<br />    *<br />    * Converts a File into a Document<br />    */<br />   public Document parseFile(File file) throws Exception {<br />     Document doc = new Document();<br />     doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                       Field.Index.UN_TOKENIZED)); // absolute path of the file<br />     try {<br />       doc.add(new Field("contents", new FileReader(file))); // index the file content<br />       doc.add(new Field("title", file.getName(), Field.Store.YES,<br />                         
Field.Index.UN_TOKENIZED));<br />       // index the last-modified time<br />       doc.add(new Field("modified",<br />                         String.valueOf(DateFormat.<br />                                        getDateTimeInstance().format(new<br />           Date(file.lastModified()))), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED));<br />       //doc.removeField("title");<br />     }<br />     catch (Exception e) {<br />       e.printStackTrace();<br />     }<br />     return doc;<br />   }</p> <p> <br />   /**<br />    * @param file<br />    *<br />    * Reads a Word document with POI.<br />    * Not very reliable: the Word text is extracted incompletely.<br />    */<br />   public Document getDocument(File file, FileInputStream is) throws Exception {<br />     String bodyText = null;<br />     try {<br />       WordDocument wd = new WordDocument(is);<br />       StringWriter docTextWriter = new StringWriter();<br />       wd.writeAllText(new PrintWriter(docTextWriter));<br />       bodyText = docTextWriter.toString();<br />       docTextWriter.close();<br />       //   bodyText   =   new   WordExtractor().extractText(is);<br />       System.out.println("word content====" + bodyText);<br />     }<br />     catch (Exception e) {<br />       ;<br />     }<br />     if ( (bodyText != null)) {<br />       Document doc = new Document();<br />       doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED)); // absolute path of the file<br />       doc.add(new Field("contents", bodyText, Field.Store.YES,<br />                         Field.Index.TOKENIZED));<br />       return doc;<br />     }<br />     return null;<br />   }</p> <p>  //Document   doc   =   getDocument(new   FileInputStream(new   File(file)));<br />   /**<br />    * @param file<br />    *<br />    * Reads a Word document with tm-extractors-0.4.jar.<br />    * Works well.<br />    */<br />   public Document getWordDocument(File file, FileInputStream is) throws<br />       Exception {<br />     String bodyText = null;<br />     try {<br />       
WordExtractor extractor = new WordExtractor();<br />       System.out.println("word文档");<br />       bodyText = extractor.extractText(is);<br />       if ( (bodyText != null)) {<br />         Document doc = new Document();<br />         doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                           Field.Index.UN_TOKENIZED)); // absolute path of the file<br />         doc.add(new Field("contents", bodyText, Field.Store.YES,<br />                           Field.Index.TOKENIZED));<br />         System.out.println("word content====" + bodyText);<br />         return doc;<br />       }<br />     }<br />     catch (Exception e) {<br />       ;<br />     }<br />     return null;<br />   }</p> <p>  /**<br />    * @param file<br />    *<br />    * Reads a TXT document<br />    */<br />   public Document getTxtDocument(File file, FileInputStream is) throws<br />       Exception {<br />     try {<br />       Reader textReader = new FileReader(file);<br />       Document doc = new Document();<br />       doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED)); // absolute path of the file<br />       doc.add(new Field("contents", textReader));<br />       return doc;<br />     }<br />     catch (Exception e) {<br />       ;<br />     }<br />     return null;<br />   }</p> <p>  /**<br />    * Reads an Excel file with POI<br />    * @param file File<br />    * @param is FileInputStream<br />    * @throws Exception<br />    * @return Document<br />    */<br />   public Document getExcelDocument(File file, FileInputStream is) throws<br />       Exception {<br />     String bodyText = "";<br />     try {<br />       System.out.println("读取excel文件");<br />       ExcelReader er = new ExcelReader(file.getAbsolutePath());<br />       bodyText = er.readLine();<br />       int rows = 0;<br />       rows = er.getRows();<br />       for (int i = 0; i < rows; i++) {<br />         bodyText = bodyText + er.readLine();<br />         
System.out.println("bodyText===" + bodyText);<br />       }<br />       Document doc = new Document();<br />       doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED)); //取文件的l对路径<br />       doc.add(new Field("contents", bodyText, Field.Store.YES,<br />                         Field.Index.TOKENIZED));<br />       System.out.println("word content====" + bodyText);<br />       return doc;<br />     }<br />     catch (Exception e) {<br />       System.out.println(e);<br />     }<br />     return null;<br />   }<br /> }<br /> </p> <p><br />  </p> <p>package searchfileexample;</p> <p>import javax.servlet.*;<br /> import javax.servlet.http.*;<br /> import java.io.*;<br /> import java.util.*;</p> <p>import org.apache.lucene.analysis.Analyzer;<br /> import org.apache.lucene.analysis.standard.StandardAnalyzer;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.index.FilterIndexReader;<br /> import org.apache.lucene.index.IndexReader;<br /> import org.apache.lucene.queryParser.QueryParser;<br /> import org.apache.lucene.search.Hits;<br /> import org.apache.lucene.search.IndexSearcher;<br /> import org.apache.lucene.search.Query;<br /> import org.apache.lucene.search.Searcher;</p> <p>import java.io.BufferedReader;<br /> import java.io.FileReader;<br /> import java.io.IOException;<br /> import java.io.InputStreamReader;<br /> import java.util.Date;<br /> import org.apache.lucene.queryParser.*;</p> <p>public class SearchFileServlet<br />     extends HttpServlet {<br />   private static final String CONTENT_TYPE = "text/html; charset=GBK";</p> <p>  //Initialize global variables<br />   public void init() throws ServletException {<br />   }</p> <p>  /** Use the norms from one field for all fields.  Norms are read into memory,<br />    * using a byte of memory per document per searched field.  
This can cause<br />    * search of large collections with a large number of fields to run out of<br />    * memory.  If all of the fields contain only a single token, then the norms<br />    * are all identical, then single norm vector may be shared. */<br />   private static class OneNormsReader<br />       extends FilterIndexReader {<br />     private String field;</p> <p>    public OneNormsReader(IndexReader in, String field) {<br />       super(in);<br />       this.field = field;<br />     }</p> <p>    public byte[] norms(String field) throws IOException {<br />       return in.norms(this.field);<br />     }<br />   }</p> <p>  //Process the HTTP Get request<br />   public void service(HttpServletRequest request, HttpServletResponse response) throws<br />       ServletException, IOException {<br />     response.setContentType(CONTENT_TYPE);<br />     PrintWriter out = response.getWriter();</p> <p>    String[] args = {<br />         "a", "b"};<br />     String usage =<br />         "Usage: java org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-raw] [-norms field]";<br />     if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {<br />       System.out.println(usage);<br />       System.exit(0);<br />     }</p> <p>    String index = "index"; //该值是用来存放生成的烦引文件的文g夹的名称Q不能改?br />     String field = "contents"; //不能修改  field  的?br />     String queries = null; //是用来存N要检索的关键字的一个文件?br />     queries = "D:/lfy_programe/全文?SearchFileExample/aa.txt";<br />     System.out.println("-----------------------" + request.getContextPath());<br />     int repeat = 1;<br />     boolean raw = false;<br />     String normsField = null;</p> <p>    for (int i = 0; i < args.length; i++) {<br />       if ("-index".equals(args[i])) {<br />         index = args[i + 1];<br />         i++;<br />       }<br />       else if ("-field".equals(args[i])) {<br />         field = args[i + 1];<br />         i++;<br />       }<br />       
else if ("-queries".equals(args[i])) {<br />         queries = args[i + 1];<br />         i++;<br />       }<br />       else if ("-repeat".equals(args[i])) {<br />         repeat = Integer.parseInt(args[i + 1]);<br />         i++;<br />       }<br />       else if ("-raw".equals(args[i])) {<br />         raw = true;<br />       }<br />       else if ("-norms".equals(args[i])) {<br />         normsField = args[i + 1];<br />         i++;<br />       }<br />     }</p> <p>    IndexReader reader = IndexReader.open(index);</p> <p>    if (normsField != null) {<br />       reader = new OneNormsReader(reader, normsField);</p> <p>    }<br />     Searcher searcher = new IndexSearcher(reader); //用来打开索引文g<br />     Analyzer analyzer = new StandardAnalyzer(); //分析?br />     //Analyzer analyzer = new StandardAnalyzer();</p> <p>    BufferedReader in = null;<br />     if (queries != null) {<br />       in = new BufferedReader(new FileReader(queries));<br />     }<br />     else {<br />       in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));<br />     }<br />     QueryParser parser = new QueryParser(field, analyzer);</p> <p>    out.println("<html>");<br />     out.println("<head><title>SearchFileServlet</title></head>");<br />     out.println("<body bgcolor=\"#ffffff\">");</p> <p>    while (true) {<br />       if (queries == null) { // prompt the user<br />         System.out.println("Enter query: ");</p> <p>      }<br />       String line = in.readLine(); //l成查询关键字字W串<br />       System.out.println("查询字符?==" + line);</p> <p>      if (line == null || line.length() == -1) {<br />         break;<br />       }</p> <p>      line = line.trim();<br />       if (line.length() == 0) {<br />         break;<br />       }</p> <p>      Query query = null;<br />       try {<br />         query = parser.parse(line);<br />       }<br />       catch (ParseException ex) {<br />       }<br />       System.out.println("Searching for: " + query.toString(field)); //每个关键?/p> <p>      
Hits hits = searcher.search(query);</p> <p>      if (repeat > 0) { // repeat & time as benchmark<br />         Date start = new Date();<br />         for (int i = 0; i < repeat; i++) {<br />           hits = searcher.search(query);<br />         }<br />         Date end = new Date();<br />         System.out.println("Time: " + (end.getTime() - start.getTime()) + "ms");<br />       }<br />       out.println("<p>查询刎ͼ" + hits.length() + "个含有[" +<br />                   query.toString(field) + "]的文?lt;/p>");</p> <p>      System.out.println("查询刎ͼ" + hits.length() + " 个含?[" +<br />                          query.toString(field) + "]的文?);</p> <p>      final int HITS_PER_PAGE = 10; //查询q回的最大记录数<br />       int currentNum = 5; //当前记录?/p> <p>      for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {<br />         //start = start + currentNum;<br />         int end = Math.min(hits.length(), start + HITS_PER_PAGE);</p> <p>        for (int i = start; i < end; i++) {</p> <p>          //if (raw) {                              // output raw format<br />           System.out.println("doc=" + hits.id(i) + " score=" + hits.score(i)); //score是接q度的意?br />           //continue;<br />           //}</p> <p>          Document doc = hits.doc(i);<br />           String path = doc.get("path");</p> <p>          if (path != null) {<br />             System.out.println( (i + 1) + ". " + path);<br />             out.println("<p>" + (i + 1) + ". " + path + "</p>");<br />             String title = doc.get("title");<br />             System.out.println("   modified: " + doc.get("modified"));<br />             if (title != null) {<br />               System.out.println("   Title: " + doc.get("title"));<br />             }<br />           }<br />           else {<br />             System.out.println( (i + 1) + ". 
" + "No path for this document");<br />           }<br />         }</p> <p>        if (queries != null) { // non-interactive<br />           break;<br />         }</p> <p>        if (hits.length() > end) {<br />           System.out.println("more (y/n) ? ");<br />           line = in.readLine();<br />           if (line.length() == 0 || line.charAt(0) == 'n') {<br />             break;<br />           }<br />         }<br />       }<br />     }<br />     reader.close();</p> <p>    out.println("</body></html>");<br />   }</p> <p>//Clean up resources<br />   public void destroy() {<br />   }<br /> }<br /> </p> <p><br />  </p> <p> </p> <img src ="http://www.aygfsteel.com/dreamer/aggbug/187293.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-03-19 16:52 <a href="http://www.aygfsteel.com/dreamer/archive/2008/03/19/187293.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>全文?/title><link>http://www.aygfsteel.com/dreamer/archive/2008/03/18/186938.html</link><dc:creator>轩辕</dc:creator><author>轩辕</author><pubDate>Tue, 18 Mar 2008 02:35:00 GMT</pubDate><guid>http://www.aygfsteel.com/dreamer/archive/2008/03/18/186938.html</guid><wfw:comment>http://www.aygfsteel.com/dreamer/comments/186938.html</wfw:comment><comments>http://www.aygfsteel.com/dreamer/archive/2008/03/18/186938.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/dreamer/comments/commentRss/186938.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/dreamer/services/trackbacks/186938.html</trackback:ping><description><![CDATA[<p>package searchfileexample;</p> <p>import org.apache.lucene.analysis.standard.StandardAnalyzer;<br /> import org.apache.lucene.index.IndexWriter;</p> <p>import java.io.File;<br /> import java.io.FileNotFoundException;<br /> import java.io.IOException;<br /> import 
java.util.Date;<br /> import org.apache.lucene.demo.FileDocument;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.document.Field;<br /> import java.io.FileReader;<br /> import org.apache.lucene.index.*;<br /> import java.text.DateFormat;<br /> import org.apache.poi.hdf.extractor.WordDocument;<br /> import java.io.InputStream;<br /> import java.io.StringWriter;<br /> import java.io.PrintWriter;<br /> import java.io.FileInputStream;<br /> import java.io.*;<br /> import org.textmining.text.extraction.WordExtractor;</p> <p>/**<br />  * l某个目录下的所有文件生成烦?br />  * <p>Title: </p><br />  * <p>Description: </p><br />  * <p>Copyright: Copyright (c) 2007</p><br />  * <p>Company: </p><br />  * @author not attributable<br />  * @version 1.0<br />  * Ҏ文g的不同,可以把烦引文件创建到不同的文件夹下去Q这样可以分cM存烦引信息?br />  */</p> <p>/** Index all text files under a directory. */<br /> public class IndexFiles {</p> <p>  private IndexFiles() {}</p> <p>  static final File INDEX_DIR = new File("index");</p> <p>  /** Index all text files under a directory. 
*/<br />   public static void main(String[] args) {<br />     String usage = "java org.apache.lucene.demo.IndexFiles <root_directory>";<br />     //String[] arg = {"a","b"};<br />     //System.out.println(arg[0]);<br />     /*<br />          if (args.length == 0) {<br />       System.err.println("Usage: " + usage);<br />       System.exit(1);<br />          }*/<br />     /*<br />         if (INDEX_DIR.exists()) {<br />           System.out.println("Cannot save index to '" +INDEX_DIR+ "' directory, please delete it first");<br />           System.exit(1);<br />         }*/</p> <p>    final File docDir = new File("a"); //需要生成烦引的文g的文件夹<br />     if (!docDir.exists() || !docDir.canRead()) {<br />       System.out.println("Document directory '" + docDir.getAbsolutePath() +<br />                          "' does not exist or is not readable, please check the path");<br />       System.exit(1);<br />     }</p> <p>    Date start = new Date();<br />     try {<br />       IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true); //true-覆盖原有的烦?false-不覆盖原有的索引<br />       System.out.println("Indexing to directory '" + INDEX_DIR + "'...");<br />       indexDocs(writer, docDir);<br />       System.out.println("Optimizing...");<br />       writer.optimize();<br />       writer.close();</p> <p>      Date end = new Date();<br />       System.out.println(end.getTime() - start.getTime() +<br />                          " total milliseconds");</p> <p>    }<br />     catch (IOException e) {<br />       System.out.println(" caught a " + e.getClass() +<br />                          "\n with message: " + e.getMessage());<br />     }<br />   }</p> <p>  static void indexDocs(IndexWriter writer, File file) throws IOException {<br />     // do not try to index files that cannot be read<br />     if (file.canRead()) {<br />       if (file.isDirectory()) {<br />         String[] files = file.list();<br />         // an IO error could occur<br />         if (files != null) 
{<br />           for (int i = 0; i < files.length; i++) {<br />             indexDocs(writer, new File(file, files[i]));<br />           }<br />         }<br />       }<br />       else {<br />         System.out.println("adding " + file);<br />         try {</p> <p>          writer.addDocument(getDocument2(file, new FileInputStream(file)));<br />           //writer.addDocument(parseFile(file));</p> <p>          //writer.addDocument(FileDocument.Document(file));//path 存放文g的相对\?br />         }<br />         // at least on windows, some temporary files raise this exception with an "access denied" message<br />         // checking if the file can be read doesn't help<br />         catch (Exception fnfe) {<br />           ;<br />         }<br />       }<br />     }<br />   }</p> <p>  /**<br />    *@paramfile<br />    *<br />    *把File变成Document<br />    */<br />   static Document parseFile(File file) throws Exception {<br />     Document doc = new Document();<br />     doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                       Field.Index.UN_TOKENIZED)); //取文件的l对路径<br />     try {<br />       doc.add(new Field("contents", new FileReader(file))); //索引文g内容<br />       doc.add(new Field("title", file.getName(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED));<br />       //索引最后修Ҏ?br />       doc.add(new Field("modified",<br />                         String.valueOf(DateFormat.<br />                                        getDateTimeInstance().format(new<br />           Date(file.lastModified()))), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED));<br />       //doc.removeField("title");<br />     }<br />     catch (Exception e) {<br />       e.printStackTrace();<br />     }<br />     return doc;<br />   }</p> <p>  /**<br />    *@paramfile<br />    *<br />    *转换word文</p> <p>         static String changeWord(File file) throws Exception {<br />     String re = "";<br />     try {<br />    
   WordDocument wd = new WordDocument(is);<br />         StringWriter docTextWriter = new StringWriter();<br />         wd.writeAllText(new PrintWriter(docTextWriter));<br />         docTextWriter.close();<br />         bodyText = docTextWriter.toString();</p> <p>    } catch (Exception e) {<br />         e.printStackTrace();<br />     }<br />     return re;<br />          }*/<br />   /**<br />    *@paramfile<br />    *<br />    *使用POIdword文<br />    */<br />   static Document getDocument(File file, FileInputStream is) throws Exception {</p> <p>    String bodyText = null;</p> <p>    try {</p> <p>      //BufferedReader wt = new BufferedReader(new InputStreamReader(is));<br />       //bodyText = wt.readLine();<br />       //System.out.println("word ===="+bodyText);</p> <p>      WordDocument wd = new WordDocument(is);<br />       StringWriter docTextWriter = new StringWriter();<br />       wd.writeAllText(new PrintWriter(docTextWriter));<br />       bodyText = docTextWriter.toString();<br />       docTextWriter.close();<br />       //   bodyText   =   new   WordExtractor().extractText(is);<br />       System.out.println("word content====" + bodyText);<br />     }<br />     catch (Exception e) {<br />       ;</p> <p>    }</p> <p>    if ( (bodyText != null)) {<br />       Document doc = new Document();<br />       doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED)); //取文件的l对路径<br />       doc.add(new Field("contents", bodyText, Field.Store.YES,<br />                         Field.Index.TOKENIZED));</p> <p>      return doc;<br />     }<br />     return null;<br />   }</p> <p>  //Document   doc   =   getDocument(new   FileInputStream(new   File(file)));<br />   /**<br />    *@paramfile<br />    *<br />    *使用tm-extractors-0.4.jardword文<br />    */<br />   static Document getDocument2(File file, FileInputStream is) throws Exception {</p> <p>    String bodyText = null;</p> <p>    try {</p> <p>      
//FileInputStream in = new FileInputStream("D:/lfy_programe/全文?SearchFileExample/a/aa.doc");<br />       //  FileInputStream in = new FileInputStream ("D:/szqxjzhbase/技术测?新徏 Microsoft Word 文.doc");<br />       WordExtractor extractor = new WordExtractor();<br />       System.out.println(is.available());</p> <p>      bodyText = extractor.extractText(is);</p> <p>//    System.out.println("the result length is"+str.length());<br />       System.out.println("word content===="+bodyText);</p> <p>    }<br />     catch (Exception e) {<br />       ;</p> <p>    }</p> <p>    if ( (bodyText != null)) {<br />       Document doc = new Document();<br />       doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,<br />                         Field.Index.UN_TOKENIZED)); //取文件的l对路径<br />       doc.add(new Field("contents", bodyText, Field.Store.YES,<br />                         Field.Index.TOKENIZED));</p> <p>      return doc;<br />     }<br />     return null;<br />   }</p> <p>}<br /> </p> <p><br />  </p> <p>package searchfileexample;</p> <p><br /> import org.apache.lucene.analysis.Analyzer;<br /> import org.apache.lucene.analysis.standard.StandardAnalyzer;<br /> import org.apache.lucene.document.Document;<br /> import org.apache.lucene.index.FilterIndexReader;<br /> import org.apache.lucene.index.IndexReader;<br /> import org.apache.lucene.queryParser.QueryParser;<br /> import org.apache.lucene.search.Hits;<br /> import org.apache.lucene.search.IndexSearcher;<br /> import org.apache.lucene.search.Query;<br /> import org.apache.lucene.search.Searcher;</p> <p><br /> import java.io.BufferedReader;<br /> import java.io.FileReader;<br /> import java.io.IOException;<br /> import java.io.InputStreamReader;<br /> import java.util.Date;<br /> import org.apache.lucene.analysis.SimpleAnalyzer;<br /> import org.apache.lucene.analysis.KeywordAnalyzer;<br /> import org.apache.lucene.analysis.WhitespaceAnalyzer;<br /> import org.apache.lucene.document.Fieldable;</p> <p>/** Simple 
command-line based search demo. */<br /> public class SearchFiles {</p> <p>  /** Use the norms from one field for all fields.  Norms are read into memory,<br />    * using a byte of memory per document per searched field.  This can cause<br />    * search of large collections with a large number of fields to run out of<br />    * memory.  If all of the fields contain only a single token, then the norms<br />    * are all identical, then single norm vector may be shared. */<br />   private static class OneNormsReader extends FilterIndexReader {<br />     private String field;</p> <p>    public OneNormsReader(IndexReader in, String field) {<br />       super(in);<br />       this.field = field;<br />     }</p> <p>    public byte[] norms(String field) throws IOException {<br />       return in.norms(this.field);<br />     }<br />   }</p> <p>  private SearchFiles() {}</p> <p>  /** Simple command-line based search demo. */<br />   public static void main(String[] arg) throws Exception {<br />     String[] args = {"a","b"};<br />     String usage =<br />       "Usage: java org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-raw] [-norms field]";<br />     if (args.length > 0 && ("-h".equals(args[0]) || "-help".equals(args[0]))) {<br />       System.out.println(usage);<br />       System.exit(0);<br />     }</p> <p>    String index = "index";//该值是用来存放生成的烦引文件的文g夹的名称Q不能改?br />     String field = "contents";//不能修改  field  的?br />     String queries = null;//是用来存N要检索的关键字的一个文件?br />     queries = "D:/lfy_programe/全文?SearchFileExample/aa.txt";</p> <p>    int repeat = 1;<br />     boolean raw = false;<br />     String normsField = null;</p> <p>    for (int i = 0; i < args.length; i++) {<br />       if ("-index".equals(args[i])) {<br />         index = args[i+1];<br />         i++;<br />       } else if ("-field".equals(args[i])) {<br />         field = args[i+1];<br />         i++;<br />       } else if ("-queries".equals(args[i])) {<br />    
     queries = args[i+1];<br />         i++;<br />       } else if ("-repeat".equals(args[i])) {<br />         repeat = Integer.parseInt(args[i+1]);<br />         i++;<br />       } else if ("-raw".equals(args[i])) {<br />         raw = true;<br />       } else if ("-norms".equals(args[i])) {<br />         normsField = args[i+1];<br />         i++;<br />       }<br />     }</p> <p>    IndexReader reader = IndexReader.open(index);</p> <p>    if (normsField != null)<br />       reader = new OneNormsReader(reader, normsField);</p> <p>    Searcher searcher = new IndexSearcher(reader);//用来打开索引文g<br />     Analyzer analyzer = new StandardAnalyzer();//分析?br />     //Analyzer analyzer = new StandardAnalyzer();</p> <p>    BufferedReader in = null;<br />     if (queries != null) {<br />       in = new BufferedReader(new FileReader(queries));<br />     } else {<br />       in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));<br />     }<br />       QueryParser parser = new QueryParser(field, analyzer);<br />     while (true) {<br />       if (queries == null)                        // prompt the user<br />         System.out.println("Enter query: ");</p> <p>      String line = in.readLine();//l成查询关键字字W串<br />       System.out.println("查询字符?=="+line);</p> <p>      if (line == null || line.length() == -1)<br />         break;</p> <p>      line = line.trim();<br />       if (line.length() == 0)<br />         break;</p> <p>      Query query = parser.parse(line);<br />       System.out.println("Searching for: " + query.toString(field));//每个关键?/p> <p>      Hits hits = searcher.search(query);</p> <p>      if (repeat > 0) {                           // repeat & time as benchmark<br />         Date start = new Date();<br />         for (int i = 0; i < repeat; i++) {<br />           hits = searcher.search(query);<br />         }<br />         Date end = new Date();<br />         System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");<br />       }</p> <p>      
System.out.println("查询刎ͼ" + hits.length() + " 个含?["+query.toString(field)+"]的文?);</p> <p>      final int HITS_PER_PAGE = 10;//查询q回的最大记录数<br />       int currentNum = 2;//当前记录?br />       for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {<br />         //start = start + currentNum;<br />         int end = Math.min(hits.length(), start + HITS_PER_PAGE);<br />         for (int i = start; i < end; i++) {</p> <p>          //if (raw) {                              // output raw format<br />             System.out.println("doc="+hits.id(i)+" score="+hits.score(i));//score是接q度的意?br />             //continue;<br />           //}</p> <p>          Document doc = hits.doc(i);<br />           String path = doc.get("path");</p> <p><br />           if (path != null) {<br />             System.out.println((i+1) + ". " + path);<br />             String title = doc.get("title");<br />             System.out.println("   modified: " + doc.get("modified"));<br />             if (title != null) {<br />               System.out.println("   Title: " + doc.get("title"));<br />             }<br />           } else {<br />             System.out.println((i+1) + ". " + "No path for this document");<br />           }<br />         }</p> <p>        if (queries != null)                      // non-interactive<br />           break;</p> <p>        if (hits.length() > end) {<br />           System.out.println("more (y/n) ? 
");<br />           line = in.readLine();<br />           if (line.length() == 0 || line.charAt(0) == 'n')<br />             break;<br />         }<br />       }<br />     }<br />     reader.close();<br />   }<br /> }<br /> </p> <p><br />  </p> <p>package searchfileexample;</p> <p>import javax.servlet.*;<br /> import javax.servlet.http.*;<br /> import java.io.*;<br /> import java.util.*;<br /> import org.textmining.text.extraction.WordExtractor;</p> <p>public class ReadWord extends HttpServlet {<br />   private static final String CONTENT_TYPE = "text/html; charset=GBK";</p> <p>  //Initialize global variables<br />   public void init() throws ServletException {<br />   }</p> <p>  //Process the HTTP Get request<br />   public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {<br />     response.setContentType(CONTENT_TYPE);<br />     FileInputStream in = new FileInputStream ("D:/lfy_programe/全文?SearchFileExample/a/aa.doc");<br />        //  FileInputStream in = new FileInputStream ("D:/szqxjzhbase/技术测?新徏 Microsoft Word 文档.doc");<br />    WordExtractor extractor = new WordExtractor(); <br />    System.out.println(in.available());<br />   String str = null;<br />   try {<br />     str = extractor.extractText(in);<br />   }<br />   catch (Exception ex) {<br />   }<br /> //    System.out.println("the result length is"+str.length()); <br />    System.out.println(str); </p> <p>  }</p> <p>  //Clean up resources<br />   public void destroy() {<br />   }<br /> }</p> <p>1.英文的模p查询问?br /> 查询时的关键字的后边加上通配W?nbsp; " * " 可以了?/p> <p>2.IndexFiles.java<br /> 用来索引文g的javac?/p> <p>3.SearchFiles.java<br /> 用来搜烦的javac?/p> <p>4.ReadWord.java<br /> 使用tm-extractors-0.4.jar来读取word文g</p> <p><br />  </p> <p> </p> <img src ="http://www.aygfsteel.com/dreamer/aggbug/186938.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/dreamer/" target="_blank">轩辕</a> 2008-03-18 10:35 
<a href="http://www.aygfsteel.com/dreamer/archive/2008/03/18/186938.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>使用tm-extractors-0.4.jar来读取word文ghttp://www.aygfsteel.com/dreamer/archive/2008/03/18/186937.html轩辕轩辕Tue, 18 Mar 2008 02:33:00 GMThttp://www.aygfsteel.com/dreamer/archive/2008/03/18/186937.htmlhttp://www.aygfsteel.com/dreamer/comments/186937.htmlhttp://www.aygfsteel.com/dreamer/archive/2008/03/18/186937.html#Feedback5http://www.aygfsteel.com/dreamer/comments/commentRss/186937.htmlhttp://www.aygfsteel.com/dreamer/services/trackbacks/186937.htmlpackage searchfileexample;

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
import org.textmining.text.extraction.WordExtractor;

public class ReadWord extends HttpServlet {
  private static final String CONTENT_TYPE = "text/html; charset=GBK";

  //Initialize global variables
  public void init() throws ServletException {
  }

  //Process the HTTP Get request
  public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setContentType(CONTENT_TYPE);
    FileInputStream in = new FileInputStream("D:/lfy_programe/全文?SearchFileExample/a/aa.doc");
    //  FileInputStream in = new FileInputStream ("D:/szqxjzhbase/技术测?新徏 Microsoft Word 文.doc");
    try {
      WordExtractor extractor = new WordExtractor();
      System.out.println(in.available());
      // Extract the plain text of the Word document
      String str = extractor.extractText(in);
      //    System.out.println("the result length is"+str.length());
      System.out.println(str);
    }
    catch (Exception ex) {
      // Report the failure instead of silently swallowing it
      ex.printStackTrace();
    }
    finally {
      // Release the file handle whether or not extraction succeeded
      in.close();
    }
  }

  //Clean up resources
  public void destroy() {
  }
}
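
The servlet above opens a stream, hands it to an extractor, and has to cope with extraction failures. A minimal sketch of that pattern using only the standard library follows. The nested `TextExtractor` interface is a hypothetical stand-in for the `extractText` call used above (it is not part of tm-extractors), and `extractOrNull` is an illustrative name, not something from the original post.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SafeExtractDemo {
    /** Hypothetical stand-in for an extractor such as WordExtractor. */
    interface TextExtractor {
        String extractText(InputStream in) throws Exception;
    }

    /** Runs the extractor, always closes the stream, and returns null on failure. */
    static String extractOrNull(InputStream source, TextExtractor extractor) {
        try (InputStream in = source) {
            return extractor.extractText(in);
        } catch (Exception ex) {
            // Log the failure so it is not silently lost
            System.err.println("extraction failed: " + ex);
            return null;
        }
    }

    public static void main(String[] args) {
        // A trivial extractor that just decodes the bytes as UTF-8 text
        TextExtractor plain = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        InputStream data = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractOrNull(data, plain));
    }
}
```

In a servlet, the same shape would wrap the `FileInputStream`, with the real extractor passed in place of the lambda. `readAllBytes` assumes Java 9 or later.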



轩辕 2008-03-18 10:33
]]>
AJAX Upload with Upload Progress Control
http://www.aygfsteel.com/dreamer/archive/2007/08/07/135004.html

轩辕 2007-08-07 17:02
]]>
AJAX File Upload
http://www.aygfsteel.com/dreamer/archive/2007/08/07/135000.html
http://www.matrix.org.cn/resource/article/2007-01-08/09db6d69-9ec6-11db-ab77-2bbe780ebfbf.html

轩辕 2007-08-07 16:54
]]>
Program Download (Java Program)
http://www.aygfsteel.com/dreamer/archive/2007/08/01/133810.html

/*
 * Created on 2006-1-11
 *
 * To change the template for this generated file, go to
 * Window > Preferences > Java > Code Generation > Code and Comments
 */

package com.abc.cc.util.file;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import javax.servlet.ServletOutputStream;
import java.io.FileInputStream;

import com.abc.callcenter.DataStatistic.Export.CreatUDStatisticExport;
import com.abc.callcenter.uds.unitedealwith.UniteUtil;

/**
 * Created: 2006-2-9
 * Function: Workbench > Document Management > File Download
 *
 * @author asx
 */

public class Down extends HttpServlet {

    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        System.out.println("logining Down");
        response.setContentType("text/html; charset=GBK");
        // Build the path of the export file under /exportfile/
        String downfile = request.getRealPath("/") + "/exportfile/" + TimeTool.getCurrentDateForEight() + "_" + StringTool.getExportFileName(Integer.parseInt(request.getParameter("fileName")));
        try {
            downfile = new String(downfile.getBytes("GBK"));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("downfile = " + downfile);
        String fileName = buildFilename(downfile);
        System.out.println("fileName = " + fileName);

        String strBeginDate = request.getParameter("excel_begindate"); // start date
        String strEndDate = request.getParameter("excel_enddate"); // end date
        String strUnite_dept = request.getParameter("excel_department_name"); // department
        try {
            strUnite_dept = UniteUtil.Query_NameDepartment("" + strUnite_dept);
        } catch (Exception e) {
            e.printStackTrace();
        }

        CreatUDStatisticExport cue = new CreatUDStatisticExport();
        cue.queryPrintInfo(strBeginDate, strEndDate, strUnite_dept, request);

        System.out.println("logining Down1");

        try {
            fileName = response.encodeURL(new String(fileName.getBytes(), "iso-8859-1"));
            response.reset();
            response.setContentType("APPLICATION/OCTET-STREAM");
            response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
            ServletOutputStream out = response.getOutputStream();
            FileInputStream inStream = new FileInputStream(downfile);

            // Loop to read the data from the stream and write it to the response
            byte[] b = new byte[1024];
            int len;
            while ((len = inStream.read(b, 0, b.length)) > 0) {
                out.write(b, 0, len);
            }

            out.close();
            inStream.close();
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }

    
    public void doPost(HttpServletRequest request, HttpServletResponse response) {
        doGet(request, response);
    }

    
    
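    /**
     * Converts the path of the file into its bare file name.
     *
     * @param sou the file path, using "/" as the separator
     * @return the part of the path after the last "/"
     */
    private static String buildFilename(String sou) {
        while (sou.indexOf("/") > -1) {
            sou = sou.substring(sou.indexOf("/") + 1);
        }
        return sou;
    }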

}
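
The three pieces the download servlet relies on (reducing a path to a file name, building the `Content-Disposition` value, and copying a stream in fixed-size chunks) can be tried outside a servlet container. The sketch below is only an illustration with the standard library: the class name `DownloadHelpersDemo`, the method names, and the sample path are assumptions made for this example, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class DownloadHelpersDemo {
    /** Returns the part of the path after the last '/', like buildFilename above. */
    static String baseName(String path) {
        // lastIndexOf returns -1 when there is no '/', so the whole string is kept
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Builds the Content-Disposition value for an attachment download. */
    static String attachmentHeader(String fileName) {
        return "attachment; filename=\"" + fileName + "\"";
    }

    /** Copies the input to the output in 1024-byte chunks; returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int len;
        while ((len = in.read(buffer, 0, buffer.length)) > 0) {
            out.write(buffer, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical export path, used only for illustration
        String name = baseName("/exportfile/20060111_report.xls");
        System.out.println(attachmentHeader(name));

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[3000]), sink);
        System.out.println(copied + " bytes copied");
    }
}
```

A single `lastIndexOf` call gives the same result as the repeated `indexOf`/`substring` loop in `buildFilename`, without creating an intermediate string per path segment.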


轩辕 2007-08-01 15:23
]]>