A simple implementation of incremental indexing with Lucene
老丁, Sun, 31 May 2009 08:37:00 GMT
http://www.aygfsteel.com/laoding/articles/279230.html

Building a search program with Lucene makes retrieval much faster, but the price is building the index. Indexing is memory-hungry and slow (assuming a large data set; if there is little data, why use Lucene for full-text search at all? Just my humble opinion), so index construction becomes the bottleneck. Rebuilding the whole index after every data update is clearly unreasonable: why not add the newly updated records on top of the existing index files instead? That is what incremental indexing does. After the index has been built, the ID of the last database record is saved. On the next run that ID is read back, the records added since then are fetched, and their index entries are appended to the existing index. Here is a simple example.
The database table has two columns, id and title. Enough talk; here is the code, which should be self-explanatory:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class Index {

    public static void main(String[] args) {
        Index index = new Index();
        String path = "d:\\index";              // directory where the index files are stored
        String storeIdPath = "d:\\storeId.txt"; // file holding the last indexed ID
        Connection conn = null;
        try {
            String storeId = getStoreId(storeIdPath);
            conn = index.getConnection();
            ResultSet rs = index.getResult(conn, storeId);
            index.indexBuilding(path, storeIdPath, rs);
            System.out.println(getStoreId(storeIdPath)); // print the ID stored by this run
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (conn != null) {
                try {
                    conn.close(); // also releases the statement and its result set
                } catch (Exception ignored) {
                }
            }
        }
    }

    // Open the MySQL connection
    public Connection getConnection() throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        String url = "jdbc:mysql://localhost:3306/ding";
        String userName = "root";
        String password = "ding";
        return DriverManager.getConnection(url, userName, password);
    }

    // Fetch only the rows added since the last run (id greater than the stored ID).
    // Assumes a numeric id column; MySQL converts the string parameter for the comparison.
    public ResultSet getResult(Connection conn, String storeId) throws Exception {
        PreparedStatement stmt = conn.prepareStatement(
                "select id, title from newitem where id > ? order by id");
        stmt.setString(1, storeId);
        return stmt.executeQuery();
    }

    public boolean indexBuilding(String path, String storeIdPath, ResultSet rs) { // works the same way if rs is replaced by a List
        try {
            Analyzer luceneAnalyzer = new StandardAnalyzer();
            // Read the stored ID to decide between an incremental update and a full rebuild
            boolean isEmpty = true;
            File file = new File(storeIdPath);
            if (file.exists()) {
                BufferedReader br = new BufferedReader(new FileReader(file));
                try {
                    String line = br.readLine();
                    if (line != null && line.trim().length() > 0) {
                        isEmpty = false;
                    }
                } finally {
                    br.close();
                }
            }

            // Third argument is "create": false appends to the existing index (incremental indexing)
            IndexWriter writer = new IndexWriter(path, luceneAnalyzer, isEmpty);
            String storeId = "";
            boolean indexFlag = false;
            while (rs.next()) {
                // for (Iterator it = list.iterator(); it.hasNext();) {
                String id = rs.getString("id");
                String title = rs.getString("title");
                writer.addDocument(createDocument(id, title));
                storeId = id; // rows are ordered by id, so the last one is the largest; crude, but convenient here
                indexFlag = true;
            }
            writer.optimize();
            writer.close();
            if (indexFlag) {
                // Save the last ID to the file on disk for the next run
                writeStoreId(storeIdPath, storeId);
            }
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Error: " + e.getClass() + "\n   message: " + e.getMessage());
            return false;
        }
    }

    public static Document createDocument(String id, String title) {
        Document doc = new Document();
        // The ID is stored as a single, untokenized term
        doc.add(new Field("ID", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("TITLE", title, Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }

    // Read the ID stored on disk ("0" if none has been stored yet)
    public static String getStoreId(String path) {
        String storeId = null;
        try {
            File file = new File(path);
            if (!file.exists()) {
                file.createNewFile();
            }
            BufferedReader br = new BufferedReader(new FileReader(file));
            try {
                storeId = br.readLine();
            } finally {
                br.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (storeId == null || storeId.trim().length() == 0) {
            storeId = "0";
        }
        return storeId.trim();
    }

    // Write the ID to the file on disk, replacing the previous value
    public static boolean writeStoreId(String path, String storeId) {
        try {
            PrintWriter out = new PrintWriter(new FileWriter(path));
            try {
                out.write(storeId);
            } finally {
                out.close();
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}

The code here is quite simple and leaves plenty of room for improvement; adapt it as you see fit. It only illustrates the principle of incremental indexing. Corrections are welcome!

老丁 2009-05-31 16:37

Indexing Word/PDF/HTML/TXT files with Lucene and searching them (search engine)
老丁, Fri, 31 Oct 2008 11:05:00 GMT
http://www.aygfsteel.com/laoding/articles/237868.html
Download the Lucene jars yourself.
First, the code that builds the index:
import java.io.File;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class TextFileIndexer {

    public static void main(String[] args) throws Exception {
        /* Folder whose files are to be indexed; here it is the s folder on drive D */
        File fileDir = new File("d:\\s");

        /* Where the index files are placed */
        File indexDir = new File("d:\\index");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        long startTime = new Date().getTime();

        // Add a document to the index
        System.out.println("File is being indexed...");

        /*
         * This is the part to change: the file path and the method used to read the file
         * (ReadFile is the class holding the read methods shown below).
         */
        String path = "d:\\s\\2.doc";
        String temp = ReadFile.readWord(path);
        // String path = "d:\\s\\index.htm";
        // String temp = ReadFile.readHtml(path);
        Document document = new Document();
        Field FieldPath = new Field("path", path,
                Field.Store.YES, Field.Index.NO);
        Field FieldBody = new Field("body", temp, Field.Store.YES,
                Field.Index.TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS);
        document.add(FieldPath);
        document.add(FieldBody);
        indexWriter.addDocument(document);

        // optimize() optimizes the index
        indexWriter.optimize();
        indexWriter.close();

        // Measure how long indexing took
        long endTime = new Date().getTime();
        System.out.println("It took "
                + (endTime - startTime)
                + " milliseconds to add the documents in "
                + fileDir.getPath() + " to the index!");
    }
}

The places to change are marked in the comments above. All we need to do is swap in the file path and the method that reads the file.
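Instead of editing the path and the reader call by hand for each file, the indexer could loop over the files in d:\s and choose the reader from the file extension. Below is a minimal sketch of that choice. It assumes the `ReadFile` methods shown below. `FileKinds` and `kindOf` are hypothetical names, not part of Lucene or POI, and the string `switch` needs Java 7 or later.

```java
import java.util.Locale;

// Hypothetical helper: decide which ReadFile method should handle a file,
// based only on its (case-insensitive) extension.
class FileKinds {

    enum FileKind { WORD, PDF, HTML, TXT, UNSUPPORTED }

    static FileKind kindOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FileKind.UNSUPPORTED; // no extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc":
                return FileKind.WORD; // POI's HWPF reads the old binary .doc format only
            case "pdf":
                return FileKind.PDF;
            case "htm":
            case "html":
                return FileKind.HTML;
            case "txt":
                return FileKind.TXT;
            default:
                return FileKind.UNSUPPORTED;
        }
    }
}
```

In the indexer, you would put a `switch (FileKinds.kindOf(f.getName()))` inside `for (File f : fileDir.listFiles())`. It would then call `ReadFile.readWord`, `readPdf`, `readHtml` or `readTxt` and skip `UNSUPPORTED` files. Because HWPF does not handle .docx, that extension is deliberately not mapped to WORD.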

Now let's look at the file-reading methods in detail.

1. First, Word documents:
I use POI here. Download the relevant jars yourself and add them to the project; the same goes for every jar used below, so I won't repeat this.
Here is the code:
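    // Uses org.apache.poi.hwpf.HWPFDocument, org.apache.poi.hwpf.usermodel.Range
    // and org.apache.poi.hwpf.usermodel.Paragraph (binary .doc files only)
    public static String readWord(String path) {
        StringBuffer content = new StringBuffer(""); // document content
        FileInputStream in = null;
        try {
            in = new FileInputStream(path);
            HWPFDocument doc = new HWPFDocument(in);
            Range range = doc.getRange();
            int paragraphCount = range.numParagraphs(); // number of paragraphs
            for (int i = 0; i < paragraphCount; i++) { // read the text paragraph by paragraph
                Paragraph pp = range.getParagraph(i);
                content.append(pp.text());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                }
            }
        }
        return content.toString().trim();
    }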

2. For PDF files I use PDFBox:
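    // PDFBox 0.7.x/1.x API (PDFParser, PDDocument, PDFTextStripper); package names differ between versions
    public static String readPdf(String path) throws Exception {
        StringBuffer content = new StringBuffer(""); // document content
        FileInputStream fis = new FileInputStream(path);
        PDDocument pdDoc = null;
        try {
            PDFParser p = new PDFParser(fis);
            p.parse();
            pdDoc = p.getPDDocument();
            PDFTextStripper ts = new PDFTextStripper();
            content.append(ts.getText(pdDoc));
        } finally {
            if (pdDoc != null) {
                pdDoc.close(); // release the parsed document
            }
            fis.close();
        }
        return content.toString().trim();
    }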

3. HTML files:
    public static String readHtml(String urlString) {
        StringBuffer content = new StringBuffer("");
        File file = new File(urlString);
        BufferedReader reader = null;
        try {
            // Read the page. Mind the character encoding: it must match the charset
            // declared in the HTML header, otherwise the text comes out garbled.
            reader = new BufferedReader(new InputStreamReader(
                    new FileInputStream(file), "utf-8"));
            String line = null;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                }
            }
        }
        // Note: this returns the raw HTML, tags included; strip the markup
        // if only the visible text should be indexed
        return content.toString();
    }
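You can handle the charset warning in the code above by reading the declaration from the page itself. This is a minimal sketch that assumes the page declares its encoding with a `charset=` attribute, as in `<meta http-equiv="Content-Type" content="text/html; charset=gb2312">` or `<meta charset="gbk">`. `HtmlCharset.detect` is a hypothetical helper, and a regular expression is a shortcut rather than real HTML parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: find the charset declared in an HTML header.
class HtmlCharset {

    // "charset", optional spaces, "=", an optional quote, then the encoding name
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the first declared charset, or the fallback if none is found.
    static String detect(String htmlHead, String fallback) {
        Matcher m = CHARSET.matcher(htmlHead);
        return m.find() ? m.group(1) : fallback;
    }
}
```

One way to use it is to decode the first few kilobytes of the file as ISO-8859-1. That encoding maps every byte to a character, so the ASCII `<meta>` tag survives. Then call `detect(head, "utf-8")` and reopen the file with an `InputStreamReader` using the returned name.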

4. TXT files:

    public static String readTxt(String path) {
        StringBuffer content = new StringBuffer(""); // document content
        BufferedReader br = null;
        try {
            // FileReader decodes with the platform's default charset
            br = new BufferedReader(new FileReader(path));
            String s1 = null;
            while ((s1 = br.readLine()) != null) {
                content.append(s1).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException ignored) {
                }
            }
        }
        return content.toString().trim();
    }
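`FileReader` always decodes with the platform's default charset. A text file saved in a different encoding, such as a GBK file read on a UTF-8 system, therefore comes out garbled. On Java 7 or later, here is a sketch of the same method that takes the charset explicitly. It assumes the caller knows the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class TxtReader {

    // Reads the whole file with the given charset and joins the lines with '\n'.
    static String readTxt(String path, Charset charset) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path), charset);
        StringBuilder content = new StringBuilder();
        for (String line : lines) {
            content.append(line).append('\n');
        }
        return content.toString().trim();
    }
}
```

For example, `TxtReader.readTxt("d:\\s\\a.txt", Charset.forName("GBK"))`. If the bytes are not valid in the given charset, `Files.readAllLines` throws a `MalformedInputException` instead of silently returning garbage.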

Next comes the search code:

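import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class TestQuery {

    public static void main(String[] args) throws IOException, ParseException {
        Hits hits = null;
        // Replace with your own search text ("in accordance with the decision of the State Council")
        String queryString = "根据国务院的决定";
        Query query = null;

        IndexSearcher searcher = new IndexSearcher("d:\\index"); // must be the directory where the index is stored

        Analyzer analyzer = new StandardAnalyzer();
        try {
            QueryParser qp = new QueryParser("body", analyzer);
            /*
             * When indexing we stored the content in the "body" field, and we search "body" as well,
             * so the "body" in
             *     QueryParser qp = new QueryParser("body", analyzer);
             * corresponds to the "body" in the indexing code:
             *     Field FieldBody = new Field("body", temp, Field.Store.YES,
             *             Field.Index.TOKENIZED,
             *             Field.TermVector.WITH_POSITIONS_OFFSETS);
             */
            query = qp.parse(queryString);
        } catch (ParseException e) {
            System.out.println("Could not parse the query: " + e.getMessage());
            searcher.close();
            return;
        }

        hits = searcher.search(query);
        if (hits.length() > 0) {
            System.out.println("Found " + hits.length() + " results!");
            for (int i = 0; i < hits.length(); i++) { // print the search results
                Document document = hits.doc(i);
                System.out.println("contents: " + document.get("body"));
                // document.get("body") returns the full stored text of the body field;
                // to print the file path instead, use document.get("path")
            }
        } else {
            System.out.println("0 results!");
        }
        searcher.close();
    }
}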


老丁 2008-10-31 19:05
Lucene query syntax (search engine)
老丁, Fri, 31 Oct 2008 10:07:00 GMT
http://www.aygfsteel.com/laoding/articles/237857.html
Source: http://liyu2000.nease.net/article/Lucene/queryparsersyntax.htm

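Overview

Lucene provides an API that makes it easy to build your own queries, and it also offers a powerful query language through QueryParser.

This article describes the syntax supported by Lucene's query parser. The parser is generated with the JavaCC tool and turns a query string into a Lucene Query object.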

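Terms

A query is broken up into terms and operators. There are two types of terms: single terms and phrases.

A single term is a single word such as "test" or "hello".

A phrase is a group of words surrounded by double quotes such as "hello dolly".

Multiple terms can be combined with Boolean operators to form a more complex query (see below).

Note: the analyzer used to create the index is also applied to the terms and phrases in the query string, so it is important to choose an analyzer that will not interfere with the terms used in the query string.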
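The following toy splitter makes the distinction between single terms and phrases concrete. It is a simplification that ignores operators, fields and escaping, and it is not how Lucene's JavaCC-generated parser works internally. Text inside double quotes stays together as one phrase; everything else is split on whitespace. `TermSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: split a query string into single terms and "quoted phrases".
class TermSplitter {

    static List<String> split(String query) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inPhrase = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                current.append(c);
                if (inPhrase) { // the closing quote ends the phrase
                    parts.add(current.toString());
                    current.setLength(0);
                }
                inPhrase = !inPhrase;
            } else if (Character.isWhitespace(c) && !inPhrase) {
                if (current.length() > 0) { // whitespace outside quotes ends a single term
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }
}
```

For instance, `split("test \"hello dolly\" hello")` yields three parts: `test`, the phrase `"hello dolly"`, and `hello`.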

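Fields

Lucene supports fielded data. You can search a specific field or use the default field. The field names and the default field depend on how the particular index was built.

To search a field, type the field name followed by a colon ":" and then the term you are looking for.

For example, suppose a Lucene index contains two fields, title and text, with text as the default field. If you want to find the document titled "The Right Way" that contains the text "don't go this way", you can enter: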

title:"The Right Way" AND text:go

or

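title:"The Right Way" AND go

Since text is the default field, the field name can be omitted.

Note: a field name applies only to the term that directly follows it, so the query

title:Do it right

will only look for "Do" in the title field. "it" and "right" are still searched in the default field (here, the text field).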
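Here is a toy sketch of the rule that a field name applies only to the term right after it. It handles whitespace-separated single terms only, with no phrases or operators, and `FieldScope` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration: which field does each whitespace-separated term search?
class FieldScope {

    static List<String> assign(String query, String defaultField) {
        List<String> result = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            // "field:term" scopes only this term; a bare term goes to the default field
            result.add(colon > 0 ? token : defaultField + ":" + token);
        }
        return result;
    }
}
```

With text as the default field, `assign("title:Do it right", "text")` gives `title:Do`, `text:it`, `text:right`, matching the note above.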

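Term Modifiers

Lucene supports modifying query terms to provide a wide range of search options.

Wildcard Searches

Lucene supports single-character and multiple-character wildcard searches.

Use "?" for a single-character wildcard.

Use "*" for a multiple-character wildcard.

The single-character wildcard matches any single character. For example, to search for "text" or "test", you can use:

te?t

The multiple-character wildcard matches zero or more characters. For example, to search for test, tests or tester, you can use:

test*

You can also use wildcards in the middle of a term:

te*t

Note: you cannot use a ? or * symbol as the first character of a search.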
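Lucene expands a wildcard term against the terms in the index. The sketch below only illustrates what "?" and "*" mean for a single term, using plain recursion; it is not Lucene's implementation, and `Wildcard` is a hypothetical name.

```java
// Illustrates ? and * semantics for one term (not Lucene's implementation).
class Wildcard {

    // '?' matches exactly one character, '*' matches zero or more characters.
    static boolean matches(String pattern, String term) {
        return matches(pattern, 0, term, 0);
    }

    private static boolean matches(String p, int i, String t, int j) {
        if (i == p.length()) {
            return j == t.length(); // pattern used up: the term must be used up too
        }
        char c = p.charAt(i);
        if (c == '*') {
            // let '*' absorb 0, 1, 2, ... characters of the term
            for (int k = j; k <= t.length(); k++) {
                if (matches(p, i + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (j == t.length()) {
            return false;
        }
        return (c == '?' || c == t.charAt(j)) && matches(p, i + 1, t, j + 1);
    }
}
```

So `te?t` accepts test and text but not tet, while `te*t` accepts tet and test but not tests.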

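Fuzzy Searches

Lucene supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To do a fuzzy search, add a tilde "~" to the end of a single-word term. For example, to search for a term similar in spelling to "roam":

roam~

This search will find terms such as foam and roams.

Note: terms found by a fuzzy search automatically get a boost factor of 0.2.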
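Fuzzy matching is driven by edit distance. Below is a standard dynamic-programming sketch of the Levenshtein distance, which shows why foam and roams both count as close to roam: each is one edit away. How Lucene turns the distance into a match decision (it compares a similarity derived from the distance against a minimum-similarity threshold) is not reproduced here. `EditDistance` is a hypothetical name.

```java
// Classic dynamic-programming Levenshtein (edit) distance: the minimum number of
// single-character insertions, deletions or substitutions that turn a into b.
class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // "" -> first j chars of b: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> "": i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the current row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Here `levenshtein("roam", "foam")` (one substitution) and `levenshtein("roam", "roams")` (one insertion) are both 1.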

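Proximity Searches

Lucene also supports finding words that are within a specific distance of each other. To do a proximity search, add a tilde "~" to the end of a phrase. For example, to search a document for "apache" and "jakarta" within 10 words of each other, use: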

"jakarta apache"~10

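"Within 10 words" is a simplification. As far as I understand Lucene's sloppy phrase matching, for a two-word phrase the slop is the number of position moves needed to line the words up in phrase order. Words in order with g words between them therefore need slop g, while reversed order needs g + 2. The sketch below computes that for a tokenized document. Treat it as an approximation, not Lucene's scorer; `TwoWordSlop` is a hypothetical name.

```java
// Approximation: minimum slop a two-word phrase query "first second"~slop
// would need to match a document given as an array of tokens.
class TwoWordSlop {

    // Returns -1 if either word is missing.
    static int requiredSlop(String[] tokens, String first, String second) {
        int best = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) continue;
            for (int j = 0; j < tokens.length; j++) {
                if (!tokens[j].equals(second)) continue;
                // in order: j - i - 1 words in between; reversed order costs 2 extra moves
                int slop = Math.abs(j - i - 1);
                if (best < 0 || slop < best) best = slop;
            }
        }
        return best;
    }

    static boolean matches(String[] tokens, String first, String second, int maxSlop) {
        int s = requiredSlop(tokens, first, second);
        return s >= 0 && s <= maxSlop;
    }
}
```

In "jakarta is an apache project", two words separate jakarta and apache, so the phrase needs slop 2 and `"jakarta apache"~10` matches. The adjacent but reversed "apache jakarta" also needs slop 2.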
Boosting a Term

Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for jakarta apache and you want the term "jakarta" to be more relevant, boost it using the ^ symbol along with the boost factor next to the term. You would type:

jakarta^4 apache

This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:

"jakarta apache"^4 "jakarta lucene"

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).

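Boolean Operators

Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be ALL CAPS.)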

OR

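OR is the default conjunction operator: if there is no Boolean operator between two terms, OR is used. OR links two terms and finds a matching document if either of the terms exists in it. This is equivalent to a union of sets. The symbol || can be used in place of the word OR.

To search for documents that contain either "jakarta apache" or just "jakarta", use the query: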

"jakarta apache" jakarta

or

"jakarta apache" OR jakarta

AND

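The AND operator matches documents in which both terms exist. This is equivalent to an intersection of sets. The symbol && can be used in place of the word AND.

To search for documents that contain both "jakarta apache" and "jakarta lucene", use the query: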

"jakarta apache" AND "jakarta lucene"

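+

The "+" (required) operator requires that the term after the "+" symbol exist in the corresponding field of a document.

To search for documents that must contain "jakarta" and may contain "lucene", use the query:

+jakarta lucene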

NOT

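The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference of sets. The symbol ! can be used in place of the word NOT.

To search for documents that contain "jakarta apache" but not "jakarta lucene", use the query: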

"jakarta apache" NOT "jakarta lucene"

Note: the NOT operator cannot be used with just one term. For example, the following query returns no results:

NOT "jakarta apache"

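-

The "-" (prohibit) operator excludes documents that contain the term after the "-" symbol.

To search for documents that contain "jakarta apache" but not "jakarta lucene", use the query: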

"jakarta apache" -"jakarta lucene"

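The required, prohibited and plain-term rules can be summarized as a per-document check. The sketch below is a toy evaluator, not Lucene's BooleanQuery; it assumes single-word terms only (no quoted phrases) and uses a hypothetical class named ClauseMatcher. It also reproduces the rule that a query made only of prohibited terms matches nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy evaluator for queries like "+jakarta apache -lucene" against one document's terms. */
final class ClauseMatcher {
    static boolean matches(String query, Set<String> docTerms) {
        List<String> required = new ArrayList<>();
        List<String> prohibited = new ArrayList<>();
        List<String> optional = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (token.startsWith("+")) required.add(token.substring(1));
            else if (token.startsWith("-")) prohibited.add(token.substring(1));
            else optional.add(token);
        }
        for (String t : prohibited) {
            if (docTerms.contains(t)) return false;  // "-" excludes the document
        }
        for (String t : required) {
            if (!docTerms.contains(t)) return false; // "+" terms must be present
        }
        if (!required.isEmpty()) return true;        // optional terms would only affect ranking
        for (String t : optional) {
            if (docTerms.contains(t)) return true;   // otherwise one optional term must match
        }
        return false; // covers the prohibited-only case: "-lucene" alone matches nothing
    }
}
```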
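Grouping

Lucene supports using parentheses to group clauses into sub-queries. This is very useful if you want to control the Boolean logic of a query.

To search for documents that contain either "jakarta" or "apache", and also contain "website", use the query:

(jakarta OR apache) AND website

This removes any ambiguity: website must be present, and at least one of jakarta or apache must be present as well.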
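Because OR, AND and NOT behave like set union, intersection and difference, a grouped query can be evaluated directly on sets of document IDs. This is a sketch assuming a tiny in-memory term-to-documents map; the PostingSets helper is hypothetical and ignores scoring entirely.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy postings (term -> IDs of documents containing it); Boolean operators as set operations. */
final class PostingSets {
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {   // union
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {  // intersection
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {  // difference: a NOT b
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    /** Documents containing a term; unknown terms match no documents. */
    static Set<Integer> postings(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Set.of());
    }
}
```

For (jakarta OR apache) AND website, the parenthesized group is evaluated first: `and(or(postings(idx, "jakarta"), postings(idx, "apache")), postings(idx, "website"))`. The parentheses simply fix the order in which the set operations are applied.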

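Escaping Special Characters

Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, put a \ before the character. For example, to search for (1+1):2, use the query:

\(1\+1\)\:2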

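Escaping by hand is error-prone, so it is worth doing in code. As far as I know, Lucene's QueryParser offers a static escape(String) method in the 2.x releases; the class below is a standalone sketch of the same idea so that it runs without the Lucene jars. It escapes every special character individually, which also covers && and ||. The name QueryEscaper is hypothetical.

```java
/** Sketch: prefix each Lucene query-syntax character with a backslash. */
final class QueryEscaper {
    // Backslash, + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape the special character
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

`QueryEscaper.escape("(1+1):2")` yields the query `\(1\+1\)\:2` shown above.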


Posted by 老丁, 2008-10-31 18:07
]]>
Introduction to Lucene (search engine)
http://www.aygfsteel.com/laoding/articles/237852.html
老丁, Fri, 31 Oct 2008 09:33:00 GMT

1. What is Lucene

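Apache Lucene is an open-source search engine library that makes it easy to add full-text search to Java software. Lucene's main job is to index every word in a document; searching that index is far more efficient than traditional word-by-word comparison. Lucene provides a set of APIs for reading, filtering and analyzing documents and for building and using indexes. Besides being efficient and simple, its most important strength is that users can customize its behavior to their own needs. Lucene is a subproject of the Apache Software Foundation. It is an open-source full-text retrieval toolkit: rather than a complete full-text search engine, it is a framework for building one, providing a complete query engine and indexing engine plus part of a text-analysis engine. Its goal is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.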
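The reason an index beats word-by-word comparison is the inverted index: a map from each word to the documents containing it, so a search becomes a lookup instead of a scan over every document. The following is a toy in-memory version for illustration only (the class TinyInvertedIndex is hypothetical); Lucene's real index is an on-disk, segmented structure with far more detail.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: each word maps to the sorted IDs of the documents that contain it. */
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Indexes a document: split on non-letter/digit characters and lower-case each word. */
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** A single map lookup instead of scanning the text of every document. */
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }
}
```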

2.     What Lucene can do

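Lucene lets you add indexing and search capabilities to your applications. It can index any data that can be converted to text and make it searchable. Lucene does not care about the source, format or even the language of the data, as long as you can convert it to text. This means you can index and search data stored in files: web pages on remote servers, documents in the local file system, plain text files, Microsoft Word documents, HTML or PDF files, or any other format from which text can be extracted.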

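Likewise, you can use Lucene to index data stored in a database, giving users full-text search capabilities that many databases do not provide. Once you have integrated Lucene, users of your application can search like this: +George +Rice -eat -pudding, Apple -pie +Tiger, animal:monkey AND food:banana, and so on. With Lucene you can index and search email messages, mailing-list archives, instant-messaging logs, your wiki, and much more.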

3.     Advantages of Lucene

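(1) The index file format is independent of the application platform. Lucene defines an index file format based on 8-bit bytes, so compatible systems and applications on different platforms can share the same index files.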

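(2) On top of the inverted index used by traditional full-text search engines, Lucene implements segmented (chunked) indexing: it can build a separate index for newly added files, which speeds up indexing, and later merge it with the existing index to optimize it. This extension mechanism lets an index grow dynamically.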

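(3) The text-analysis interface is independent of language and file format. The indexer builds index files by consuming a stream of tokens, so supporting a new language or file format only requires implementing the text-analysis interface.

(4) A powerful query engine is implemented out of the box, so users get strong query capabilities without writing any code of their own. By default, Lucene's query implementation supports Boolean operators, fuzzy queries, grouped queries and more.

(5) Optimized searching. For full-text retrieval, Lucene does not read the full content of every matching record (Document) after the first index lookup. Instead, it puts only the IDs of the 100 best-matching results (TopDocs) into a result cache and returns them.

(6) Lucene also filters out low-scoring results automatically while collecting hits. This differs from database applications, which must return every matching row.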
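Item (2) above describes building a small index for new files and merging it into the existing one later. Here is a minimal sketch of that merge step, assuming each segment is a plain term-to-document-ID map that numbers its own documents from 0: the merged index keeps the old IDs and shifts the new segment's IDs past them. SegmentMerge is a hypothetical name, and Lucene's actual segment files hold much more than this.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy segment merge: the second segment's local document IDs are shifted past the first's. */
final class SegmentMerge {
    static Map<String, Set<Integer>> merge(Map<String, Set<Integer>> first, int firstDocCount,
                                           Map<String, Set<Integer>> second) {
        Map<String, Set<Integer>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : first.entrySet()) {
            merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
        for (Map.Entry<String, Set<Integer>> e : second.entrySet()) {
            Set<Integer> target = merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
            for (int localId : e.getValue()) {
                target.add(localId + firstDocCount); // rebase the new segment's document IDs
            }
        }
        return merged;
    }
}
```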

4.     Querying

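An Analyzer splits a string into individual words according to some rules and removes the meaningless ones. "Meaningless" here means stop words such as "of" and "the" in English, or common function words in Chinese: they appear very often in text but carry little key information, so removing them shrinks the index files, improves efficiency and improves the hit rate.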

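Tokenization rules vary widely, but they share one goal: splitting text into meaningful words. This is relatively easy in English, because English text is already made of words separated by spaces; Chinese, by contrast, needs some method to split a continuous run of characters into individual words.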
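A toy analyzer makes the two steps concrete: split text into words, then drop stop words. This sketch assumes space-separated text and a deliberately tiny, hypothetical stop-word list; it would not segment Chinese, which, as noted above, needs a dedicated word-segmentation method. It is not Lucene's Analyzer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: split on non-letter/digit characters, lower-case, drop stop words. */
final class ToyAnalyzer {
    // Hypothetical, deliberately tiny stop-word list
    private static final Set<String> STOP_WORDS = Set.of("of", "the", "a", "and");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```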

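(1)      Wildcard searches

The single-character wildcard ? matches any one character. For example, to search for "text" or "test", use: te?t

The multiple-character wildcard * matches zero or more characters. For example, to search for test, tests or tester, use: test*

You can also use the multiple-character wildcard in the middle of a term: te*t

Note: you cannot use * or ? as the first character of a search.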
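The two wildcard rules can be checked with a small matcher. This sketch translates a wildcard pattern into a Java regular expression, an assumption made for brevity; Lucene itself matches wildcards against the terms in its index rather than compiling regexes. The class name Wildcard is hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch of wildcard semantics: '?' = exactly one character, '*' = zero or more characters. */
final class Wildcard {
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') regex.append('.');
            else if (c == '*') regex.append(".*");
            else regex.append(Pattern.quote(String.valueOf(c))); // literal character
        }
        return Pattern.matches(regex.toString(), term);
    }
}
```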

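(2)      Fuzzy searches

Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, add the tilde "~" at the end of a single-word term. For example, to search for a term similar in spelling to "roam", use: roam~

This search will find terms such as foam and roams.

Note: terms found by a fuzzy search automatically get a boost factor of 0.2.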
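Fuzzy matching rests on the edit distance between two words: the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. Below is the standard dynamic-programming computation (the class name EditDistance is hypothetical). It only measures distance; how Lucene turns that distance into a match threshold differs between versions and is not modeled here.

```java
/** Classic dynamic-programming Levenshtein distance using two rolling rows. */
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> first j chars of b
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                     // first i chars of a -> ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Both example words are one edit away from roam: foam substitutes one letter, and roams appends one.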

(3)      Boolean operators

Boolean operators combine terms through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be written in ALL CAPS.)

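(4)      Escaping special characters

Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, put a \ before the character. For example, to search for (1+1):2, use the query:

\(1\+1\)\:2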

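5.     Lessons from experience

(1)      Keywords are case-sensitive

Keywords such as OR, AND and TO are case-sensitive: Lucene only recognizes them in upper case and treats lower-case versions as ordinary words.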

(2)      Read/write exclusion

Only one write operation on an index is allowed at a time, but searches can run while the index is being written.

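(3)      File locks

If the process exits while it is writing the index, a lock file is left in the tmp directory and later write operations cannot proceed. You can delete the lock file by hand, provided no other process is still writing to the index.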

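(4)       Date format

Lucene has no date type of its own: dates are indexed as strings, so they behave like dates only when written in a compact, fixed-width form such as yyyyMMddHHmmss. If you pass Lucene a value like yy-MM-dd HH:mm:ss, it will not be treated as a time.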
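Since dates end up as strings in the index, they should be written so that string order equals chronological order. Here is a sketch of that conversion with the JDK's SimpleDateFormat, assuming all times are stored in UTC (a hypothetical choice; any fixed time zone works if used consistently). Lucene 2.x also ships a DateTools class for this purpose; the SortableDates helper below is standalone and hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Encodes dates as fixed-width yyyyMMddHHmmss strings so that string order is time order. */
final class SortableDates {
    static String encode(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: store everything in UTC
        return fmt.format(date);
    }

    /** Converts a "yyyy-MM-dd HH:mm:ss" string (read as UTC) into the compact form. */
    static String fromDisplay(String display) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return encode(in.parse(display));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad date: " + display, e);
        }
    }
}
```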

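(5)      Updating the index

Lucene cannot update a document in place: it must be deleted and then added again. (IndexWriter.updateDocument, available since Lucene 2.1, performs this delete-then-add in a single call, but the cost is the same.) With a large, fast-changing data set this is quite troublesome: building an index is slow, memory-hungry and hard on the disk, so the index cannot keep up with queries in real time.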

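(6)      Fetching results from the middle

Lucene cannot start fetching results from the middle of the result list. For example, when a user asks for page 10, Lucene has to retrieve all preceding results, sort them, discard the earlier ones and only then return the requested page.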
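The cost of deep paging follows from how top hits are collected: to return page p (counting from 0) you must first rank the best (p + 1) * size results, then throw the earlier ones away. The sketch below keeps those results in a bounded min-heap, the general technique behind top-N hit collection; it works on a plain array of scores, and the class TopKPaging is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Why deep paging is costly: page p needs the top (p + 1) * size hits first. */
final class TopKPaging {
    /** Returns the scores on 0-based page `page`, highest score first. */
    static List<Double> page(double[] scores, int page, int size) {
        int k = (page + 1) * size;                         // hits that must be collected and ranked
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap of the best k scores so far
        for (double s : scores) {
            heap.offer(s);
            if (heap.size() > k) heap.poll();              // evict the current lowest score
        }
        List<Double> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());              // best first
        int from = Math.min(page * size, top.size());
        return new ArrayList<>(top.subList(from, top.size())); // discard the earlier pages
    }
}
```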

(7)      English queries

When searching English text, for example the sentence "jiangxi strong", entering an incomplete word such as jiang or stron returns no results; only the complete words jiangxi or strong match.
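The behavior in (7) follows from the term dictionary: a plain query term is looked up as a whole word, so jiang never equals the indexed term jiangxi. A prefix (wildcard) query such as jiang* instead walks the sorted terms that start with the prefix. Here is a toy sketch, assuming whitespace-separated, lower-cased terms and a hypothetical TermDictionary class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy term dictionary: exact lookup vs. prefix scan over sorted terms. */
final class TermDictionary {
    private final NavigableSet<String> terms = new TreeSet<>();

    void addText(String text) {
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
    }

    /** A plain query term only matches a whole indexed term. */
    boolean exact(String term) {
        return terms.contains(term.toLowerCase(Locale.ROOT));
    }

    /** A prefix query (e.g. jiang*) scans the sorted terms starting at the prefix. */
    List<String> prefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p)) break; // sorted order: no later term can share the prefix
            out.add(t);
        }
        return out;
    }
}
```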



Posted by 老丁, 2008-10-31 17:33
]]>
A simple Lucene search implementation (search engine)
http://www.aygfsteel.com/laoding/articles/226902.html
老丁, Thu, 04 Sep 2008 05:06:00 GMT

First, download the Lucene jar files. I won't go into that here; they are easy to find online.

Create a web project named luceneTest in Eclipse.

Add the jar files to your web project.

Create a class Index.java with the following code:


import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

/*
 * Create Date: 2007-10-26 2:52:53 PM
 *
 * Author: dingkm
 *
 * Version: V1.0
 *
 * Description: builds a small demo index with two documents (Lucene 2.x API).
 */

public class Index {

    /**
     * Entry point: builds the index and reports success.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        try {
            new Index().index();
            System.out.println("create index success!!!");
        } catch (IOException e) {
            // CorruptIndexException and LockObtainFailedException are both IOExceptions
            e.printStackTrace();
        }
    }

    public void index() throws IOException {
        long start = System.currentTimeMillis();

        // Directory in which the index is created
        String path = "c:\\index2";

        // First document: three stored, tokenized fields
        Document doc1 = new Document();
        doc1.add(new Field("name", "中华人民共和国", Field.Store.YES, Field.Index.TOKENIZED));
        doc1.add(new Field("content", "标题或正文包", Field.Store.YES, Field.Index.TOKENIZED));
        doc1.add(new Field("time", "20080715", Field.Store.YES, Field.Index.TOKENIZED));

        // Second document: only a "name" field
        Document doc2 = new Document();
        doc2.add(new Field("name", "大中国中", Field.Store.YES, Field.Index.TOKENIZED));

        // The final "true" creates a new index, overwriting any existing one in this directory
        IndexWriter writer = new IndexWriter(FSDirectory.getDirectory(path, true), new StandardAnalyzer(), true);
        try {
            writer.setMaxMergeDocs(10);
            // Only the first 3 tokens of each field are indexed; stored values are kept in full
            writer.setMaxFieldLength(3);
            writer.addDocument(doc1);
            writer.addDocument(doc2);
        } finally {
            writer.close(); // flushes the index and releases the write lock
        }

        System.out.println("=========================");
        System.out.print(System.currentTimeMillis() - start);
        System.out.println("total milliseconds");
        System.out.println("=========================");
    }
}

Run this class and you will see the following output:
=========================
375total milliseconds
=========================
create index success!!!

The index has been created successfully.

Next, create the search class Search.java:

import java.io.IOException;
import java.text.NumberFormat;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

/*
 * Create Date: 2007-10-26 2:56:12 PM
 *
 * Author: dingkm
 *
 * Version: V1.0
 *
 * Description: searches the demo index built by Index.java (Lucene 2.x API).
 */

public class Search {

    /**
     * Entry point: searches the index in c:\index2.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        String path = "c:\\index2";
        try {
            new Search().search(path);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    public void search(String path) throws IOException, ParseException {
        IndexSearcher searcher = new IndexSearcher(path);
        try {
            // Parse the query against the "name" field with the analyzer used at index time
            QueryParser qp = new QueryParser("name", new StandardAnalyzer());
            Query query = qp.parse("中");
            Hits hits = searcher.search(query);
            NumberFormat format = NumberFormat.getNumberInstance();
            System.out.println("Found " + hits.length() + " results");
            // Print every hit; fields a document does not have come back as null
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println(doc.get("name"));
                System.out.println("content=" + doc.get("content"));
                System.out.println("time=" + doc.get("time"));
                System.out.println("Score: " + format.format(hits.score(i) * 100.0) + "%");
            }
        } finally {
            searcher.close();
        }
    }
}

Run it and you will get the following output:

Found 2 results
中华人民共和国
content=标题或正文包
time=20080715
Score: ?9.727%
大中国中
content=null
time=null
Score: ?9.727%

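That completes our small program.

This is my first published article, so the explanation is fairly brief and may be unclear in places. I hope you will bear with me.

If anything is unclear, feel free to leave a comment.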



Posted by 老丁, 2008-09-04 13:06
]]>