
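This article uses test cases to introduce the various query types in Lucene, along with the simplified syntax for writing them through QueryParser.
After reading it you will know Lucene's basic queries, and you can study all of the test code to deepen your understanding.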

Query types

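If you already know SQL, you may want to see what the query syntax tree looks like in Lucene. This section briefly introduces the query types that Lucene can use directly.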

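1. TermQuery
Searches for one specific term. It already appeared in the example at the beginning of the article, and it is commonly used to look up keywords.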

    [Test]
    public void Keyword()
    {
        IndexSearcher searcher = new IndexSearcher(directory);
        // Look up the exact, untokenized value "1930110995" in the isbn field
        Term t = new Term("isbn", "1930110995");
        Query query = new TermQuery(t);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }

Note that Lucene does not enforce the uniqueness of keyword values. Keeping them unique is the user's responsibility.

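TermQuery and QueryParser

If the string passed to QueryParser's Parse method contains only a single word, QueryParser automatically converts it into a TermQuery, as long as the analyzer produces a single token from that word.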

2. RangeQuery
Matches a range of terms. It is often used for dates. As before, here is an example:

    namespace dotLucene.inAction.BasicSearch
    {
        public class RangeQueryTest : LiaTestCase
        {
            private Term begin, end;

            [SetUp]
            protected override void Init()
            {
                begin = new Term("pubmonth", "200004");
                end = new Term("pubmonth", "200206");
                base.Init();
            }

            [Test]
            public void Inclusive()
            {
                // true: the begin and end terms themselves are part of the range
                RangeQuery query = new RangeQuery(begin, end, true);
                IndexSearcher searcher = new IndexSearcher(directory);

                Hits hits = searcher.Search(query);
                Assert.AreEqual(1, hits.Length());
            }

            [Test]
            public void Exclusive()
            {
                // false: the begin and end terms are excluded
                RangeQuery query = new RangeQuery(begin, end, false);
                IndexSearcher searcher = new IndexSearcher(directory);

                Hits hits = searcher.Search(query);
                Assert.AreEqual(0, hits.Length());
            }
        }
    }

The third argument of RangeQuery specifies whether the begin and end terms themselves are included in the range.
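Lucene orders terms as strings. A RangeQuery on pubmonth therefore works because the values are zero-padded yyyyMM strings, which sort in the same order as the dates they represent. The C# sketch below is not Lucene code. The class TermRangeSketch is hypothetical and only illustrates what the inclusive flag changes, using ordinal string comparison.

```csharp
using System;

// Hypothetical helper (not part of Lucene) showing what RangeQuery's
// inclusive flag means. Terms are compared as strings (ordinal order),
// which is why pubmonth stores zero-padded yyyyMM values.
public static class TermRangeSketch
{
    public static bool InRange(string term, string lower, string upper, bool inclusive)
    {
        int vsLower = string.CompareOrdinal(term, lower);
        int vsUpper = string.CompareOrdinal(term, upper);

        if (inclusive)
            return vsLower >= 0 && vsUpper <= 0;   // [lower TO upper]
        return vsLower > 0 && vsUpper < 0;         // {lower TO upper}
    }
}
```

For example, InRange("200206", "200004", "200206", true) is true, and the same call with inclusive set to false is false. This is consistent with the Inclusive and Exclusive tests above if the only book in range was published in 200206, as the message "JDwA in 200206" in the next test suggests.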

RangeQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        // Inclusive range on the pubmonth field, written with [ ]
        Query query = QueryParser.Parse("pubmonth:[200004 TO 200206]", "subject", new SimpleAnalyzer());
        Assert.IsTrue(query is RangeQuery);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());

        // Exclusive range written with { }; pubmonth is the default field here
        query = QueryParser.Parse("{200004 TO 200206}", "pubmonth", new SimpleAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(0, hits.Length(), "JDwA in 200206");
    }

Lucene uses [ ] for inclusive ranges and { } for exclusive ranges.
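The following toy C# parser handles only the range part of the syntax, to make the bracket convention concrete. RangeSyntaxSketch is hypothetical and much simpler than QueryParser. It does not handle field prefixes such as pubmonth: or quoted bounds.

```csharp
using System;

// Hypothetical toy parser (not QueryParser) for the range syntax alone:
// "[lower TO upper]" is inclusive, "{lower TO upper}" is exclusive.
// Field prefixes such as "pubmonth:" and quoted bounds are not supported.
public static class RangeSyntaxSketch
{
    public static (string Lower, string Upper, bool Inclusive) Parse(string text)
    {
        text = text.Trim();
        if (text.Length < 2)
            throw new FormatException("range expression is too short");

        bool inclusive;
        char open = text[0], close = text[text.Length - 1];
        if (open == '[' && close == ']')
            inclusive = true;
        else if (open == '{' && close == '}')
            inclusive = false;
        else
            throw new FormatException("expected [a TO b] or {a TO b}");

        // Split the inside of the brackets on the TO keyword.
        string body = text.Substring(1, text.Length - 2);
        string[] bounds = body.Split(new[] { " TO " }, StringSplitOptions.None);
        if (bounds.Length != 2)
            throw new FormatException("expected exactly one ' TO '");
        return (bounds[0].Trim(), bounds[1].Trim(), inclusive);
    }
}
```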

3. PrefixQuery

Matches terms that begin with a given prefix. It is commonly used to search hierarchical categories.

    [Test]
    public void TestPrefixQuery()
    {
        // Matches every category that starts with "/Computers"
        PrefixQuery query = new PrefixQuery(new Term("category", "/Computers"));

        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());

        // A longer prefix narrows the result down
        query = new PrefixQuery(new Term("category", "/Computers/JUnit"));
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }
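Conceptually, a PrefixQuery matches every indexed term that starts with the prefix, and the comparison is case-sensitive. The C# sketch below filters an in-memory list rather than walking Lucene's term dictionary. PrefixSketch is a hypothetical name, and the category strings in the usage are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch (not Lucene code) of PrefixQuery's matching rule:
// keep every term that starts with the prefix, case-sensitively.
// Lucene walks its sorted term dictionary; here we just filter a list.
public static class PrefixSketch
{
    public static List<string> Match(IEnumerable<string> terms, string prefix)
    {
        return terms
            .Where(t => t.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

The match is a plain string prefix, so /Computers would also match a category such as /ComputersScience if one existed. Using /Computers/ as the prefix limits the match to subcategories.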

PrefixQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        QueryParser qp = new QueryParser("category", new SimpleAnalyzer());
        qp.SetLowercaseWildcardTerms(false);
        Query query = qp.Parse("/Computers*");
        Console.Out.WriteLine("query = {0}", query.ToString());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());

        query = qp.Parse("/Computers/JUnit*");
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }

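Note that this test uses a QueryParser instance instead of the static QueryParser.Parse method, because an instance lets us change some of QueryParser's default settings. In this example the category values are capitalized, but by default QueryParser lowercases any query term that contains *, so /Computers* becomes /computers*. That lowercased prefix no longer matches the indexed terms, which begin with /Computers, so we have to turn this default off. That is exactly what qp.SetLowercaseWildcardTerms(false) does.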
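The effect of lowercasing can be reproduced with plain strings. In this hypothetical C# sketch, CountMatches lowercases the text before the trailing * when lowercaseWildcardTerms is true, loosely mimicking the QueryParser setting, and then matches case-sensitively as the index does. The category list in the usage is sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: what happens to a prefix query such as "/Computers*"
// when the query text is lowercased first, loosely mimicking QueryParser's
// lowercase-wildcard-terms setting. Matching stays case-sensitive.
public static class WildcardCaseSketch
{
    public static int CountMatches(IEnumerable<string> terms, string queryText,
                                   bool lowercaseWildcardTerms)
    {
        string prefix = queryText.TrimEnd('*');
        if (lowercaseWildcardTerms)
            prefix = prefix.ToLowerInvariant();
        return terms.Count(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

With the terms /Computers/Ant and /Computers/JUnit, "/Computers*" finds nothing when lowercasing is on and finds both terms when it is off.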

4. BooleanQuery

Combines several query clauses into a single query.

The following two examples test an AND combination and an OR combination, respectively.

    [Test]
    public void And()
    {
        TermQuery searchingBooks =
            new TermQuery(new Term("subject", "junit"));

        RangeQuery currentBooks =
            new RangeQuery(new Term("pubmonth", "200301"),
                           new Term("pubmonth", "200312"),
                           true);
        // required = true, prohibited = false: both clauses must match (AND)
        BooleanQuery currentSearchingBooks = new BooleanQuery();
        currentSearchingBooks.Add(searchingBooks, true, false);
        currentSearchingBooks.Add(currentBooks, true, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(currentSearchingBooks);

        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }

    [Test]
    public void Or()
    {
        TermQuery junitBooks = new TermQuery(
            new Term("category", "/Computers/JUnit"));
        TermQuery antBooks = new TermQuery(
            new Term("category", "/Computers/Ant"));
        // required = false, prohibited = false: either clause may match (OR)
        BooleanQuery junitOrAntBooks = new BooleanQuery();
        junitOrAntBooks.Add(junitBooks, false, false);
        junitOrAntBooks.Add(antBooks, false, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(junitOrAntBooks);
        Console.Out.WriteLine("or = " + junitOrAntBooks);
        AssertHitsIncludeTitle(hits, "Java Development with Ant");
        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }

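When is it AND, and when is it OR? The answer lies in the arguments of BooleanQuery's Add method.

The first argument is the query clause to add.

The second argument, required, states whether the clause must be satisfied. true means it must match; false means it does not have to.

The third argument, prohibited, states whether the clause must be rejected. true means results that satisfy this clause are excluded; false means results may satisfy it.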

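This gives three meaningful combinations, shown in the table below. Setting both flags to true would be contradictory.

    required   prohibited   meaning
    false      false        optional: the clause may match (OR)
    true       false        the clause must match (AND)
    false      true         the clause must not match (NOT)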
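The table can be turned into a small decision function. BooleanMatchSketch and ClauseResult are hypothetical C# types. They model whether one document matches, given which clauses it satisfies, and they ignore scoring. Rejecting the contradictory required-and-prohibited combination is this sketch's own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model (not Lucene code) of BooleanQuery's match rules for a
// single document. Each ClauseResult records whether the document satisfied
// the clause and how the clause was added. Scoring is ignored.
public sealed class ClauseResult
{
    public bool Matches { get; }
    public bool Required { get; }
    public bool Prohibited { get; }

    public ClauseResult(bool matches, bool required, bool prohibited)
    {
        // This sketch rejects the contradictory combination outright.
        if (required && prohibited)
            throw new ArgumentException("a clause cannot be both required and prohibited");
        Matches = matches;
        Required = required;
        Prohibited = prohibited;
    }
}

public static class BooleanMatchSketch
{
    public static bool DocumentMatches(IReadOnlyList<ClauseResult> clauses)
    {
        // A matching prohibited clause always excludes the document (NOT).
        if (clauses.Any(c => c.Prohibited && c.Matches))
            return false;

        // Every required clause must match (AND).
        if (clauses.Any(c => c.Required && !c.Matches))
            return false;

        // With at least one required clause, optional clauses only affect scoring.
        if (clauses.Any(c => c.Required))
            return true;

        // Otherwise at least one optional clause must match (OR).
        return clauses.Any(c => !c.Prohibited && c.Matches);
    }
}
```

In this model, the And test corresponds to two required clauses. The Or test corresponds to two optional clauses, at least one of which must match. A query made only of prohibited clauses matches nothing.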

BooleanQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("pubmonth:[200301 TO 200312] AND junit", "subject", new SimpleAnalyzer());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());

        query = QueryParser.Parse("/Computers/JUnit OR /Computers/Ant", "category", new WhitespaceAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }

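Note that AND and OR must be written in uppercase. To express "A and not B", write A AND -B; +A -B works as well.

By default, QueryParser treats the whitespace between terms as OR. For Google-style behavior, where every word must match, change this default through a QueryParser instance.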

    [Test]
    public void TestQueryParserDefaultAND()
    {
        QueryParser qp = new QueryParser("subject", new SimpleAnalyzer());
        // Treat whitespace between clauses as AND instead of OR
        qp.SetOperator(QueryParser.DEFAULT_OPERATOR_AND);
        Query query = qp.Parse("pubmonth:[200301 TO 200312] junit");
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
    }
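The effect of the default operator on plain words, and of the + and - prefixes, can be modeled in a few lines. DefaultOperatorSketch is a hypothetical simplification, not QueryParser's implementation. It understands only +word, -word and bare words, and it marks bare words as required when the default operator is AND.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily simplified model of how whitespace-separated words
// become BooleanQuery clauses. It understands only "+word", "-word" and bare
// words; AND/OR keywords, fields, phrases and ranges are not handled.
public static class DefaultOperatorSketch
{
    public static List<(string Term, bool Required, bool Prohibited)> Parse(
        string text, bool defaultIsAnd)
    {
        var clauses = new List<(string Term, bool Required, bool Prohibited)>();
        string[] tokens = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (token[0] == '+')
                clauses.Add((token.Substring(1), true, false));   // must match
            else if (token[0] == '-')
                clauses.Add((token.Substring(1), false, true));   // must not match
            else
                clauses.Add((token, defaultIsAnd, false));        // depends on the default
        }
        return clauses;
    }
}
```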
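5. PhraseQuery
Searches for a phrase. The key concept here is slop: the positional distance allowed between the terms, that is, how far the terms may be moved from the positions given in the query. The match distance also affects scoring. The closer the match, the higher the score, and a slop of 0 (an exact phrase) is the best match. The examples below make this easy to see, and you do not need to know exactly how slop is computed. A large slop does slow queries down, however, so keep it small in practice. With enough slop, PhraseQuery also matches terms that appear in a different order, but a reversed order needs a larger slop (in the Reverse test, "fox quick" needs a slop of 3). Besides raising the number of hits, this has a large effect on performance. SpanNearQuery lets you require the terms to appear in order, which can improve performance.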
    private IndexSearcher searcher;

    [SetUp]
    protected void Init()
    {
        // Index a single sample document
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                                             new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.Add(Field.Text("field",
                           "the quick brown fox jumped over the lazy dog"));
        writer.AddDocument(doc);
        writer.Close();

        searcher = new IndexSearcher(directory);
    }

    // Returns true if the phrase matches the sample document with the given slop
    private bool matched(String[] phrase, int slop)
    {
        PhraseQuery query = new PhraseQuery();
        query.SetSlop(slop);

        for (int i = 0; i < phrase.Length; i++)
        {
            query.Add(new Term("field", phrase[i]));
        }

        Hits hits = searcher.Search(query);
        return hits.Length() > 0;
    }

    [Test]
    public void SlopComparison()
    {
        String[] phrase = new String[] {"quick", "fox"};

        Assert.IsFalse(matched(phrase, 0), "exact phrase not found");

        Assert.IsTrue(matched(phrase, 1), "close enough");
    }

    [Test]
    public void Reverse()
    {
        String[] phrase = new String[] {"fox", "quick"};

        Assert.IsFalse(matched(phrase, 2), "not close enough");

        Assert.IsTrue(matched(phrase, 3), "just enough");
    }

    [Test]
    public void Multiple()
    {
        Assert.IsFalse(matched(new String[] {"quick", "jumped", "lazy"}, 3), "not close enough");
        Assert.IsTrue(matched(new String[] {"quick", "jumped", "lazy"}, 4), "just enough");
        Assert.IsFalse(matched(new String[] {"lazy", "jumped", "quick"}, 7), "almost but not quite");
        Assert.IsTrue(matched(new String[] {"lazy", "jumped", "quick"}, 8), "bingo");
    }
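A small brute-force model makes the slop rule concrete. SlopSketch is a hypothetical C# sketch and a simplification of Lucene's sloppy phrase matching: it ignores scoring and special handling of repeated terms. Each query term at phrase index i, found at document position p, is shifted to p - i. The distance of a match is the spread between the largest and smallest shifted values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch (a simplification, not Lucene's SloppyPhraseScorer)
// of the distance a sloppy phrase match needs. If query term i is taken at
// document position p_i, each term is shifted to p_i - i; the distance is
// max(p_i - i) - min(p_i - i). A phrase matches with slop s when some choice
// of positions gives a distance <= s. Brute force: small inputs only.
public static class SlopSketch
{
    public static int? MinDistance(string[] docTokens, string[] phrase)
    {
        if (phrase.Length == 0)
            return null;

        // All document positions of each phrase term.
        List<List<int>> positions = phrase
            .Select(term => Enumerable.Range(0, docTokens.Length)
                                      .Where(p => docTokens[p] == term)
                                      .ToList())
            .ToList();
        if (positions.Any(list => list.Count == 0))
            return null;   // a term is missing: no slop can help

        int? best = null;
        Search(positions, 0, new int[phrase.Length], ref best);
        return best;
    }

    public static bool Matches(string[] docTokens, string[] phrase, int slop)
    {
        int? distance = MinDistance(docTokens, phrase);
        return distance.HasValue && distance.Value <= slop;
    }

    // Tries every combination of positions and keeps the smallest distance.
    private static void Search(List<List<int>> positions, int i, int[] chosen, ref int? best)
    {
        if (i == positions.Count)
        {
            int[] shifted = chosen.Select((p, idx) => p - idx).ToArray();
            int distance = shifted.Max() - shifted.Min();
            if (best == null || distance < best)
                best = distance;
            return;
        }
        foreach (int p in positions[i])
        {
            chosen[i] = p;
            Search(positions, i + 1, chosen, ref best);
        }
    }
}
```

Under this model, the phrases in SlopComparison, Reverse and Multiple need distances of 1, 3, 4 and 8. These are the slop values at which those tests switch from false to true.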

PhraseQuery and QueryParser

When building a phrase query with QueryParser, you can set the slop in two ways, as shown below:

    [Test]
    public void TestQueryParser()
    {
        // Without slop, "quick fox" must match exactly and finds nothing
        Query q1 = QueryParser.Parse("\"quick fox\"",
                                     "field", new SimpleAnalyzer());
        Hits hits1 = searcher.Search(q1);
        Assert.AreEqual(0, hits1.Length());

        // Method 1: append ~slop to the quoted phrase
        Query q2 = QueryParser.Parse("\"quick fox\"~1",
                                     "field", new SimpleAnalyzer());
        Hits hits2 = searcher.Search(q2);
        Assert.AreEqual(1, hits2.Length());

        // Method 2: set a default phrase slop on a QueryParser instance
        QueryParser qp = new QueryParser("field", new SimpleAnalyzer());
        qp.SetPhraseSlop(1);
        Query q3 = qp.Parse("\"quick fox\"");
        Assert.AreEqual("\"quick fox\"~1", q3.ToString("field"), "sloppy, implicitly");
        Hits hits3 = searcher.Search(q3);
        Assert.AreEqual(1, hits3.Length());
    }

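6. WildcardQuery
Wildcard search: ? matches exactly one character, and * matches any number of characters. Note that all matching terms (wild, mild and mildew) get the same score, and that child does not match "?ild*" at all.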
    [Test]
    public void Wildcard()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "wild"),
                Field.Text("contents", "child"),
                Field.Text("contents", "mild"),
                Field.Text("contents", "mildew")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // ? matches exactly one character, * matches any number of characters
        Query query = new WildcardQuery(
            new Term("contents", "?ild*"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(3, hits.Length(), "child no match");
        // All wildcard matches get the same score
        Assert.AreEqual(hits.Score(0), hits.Score(1), 0.0, "score the same");
        Assert.AreEqual(hits.Score(1), hits.Score(2), 0.0, "score the same");
    }
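The pattern semantics used by the test above can be checked with a standard dynamic-programming matcher. WildcardSketch is a hypothetical C# sketch that applies the rules to single strings. Lucene applies the same kind of test to the terms in its index.

```csharp
using System;

// Hypothetical sketch of WildcardQuery's pattern semantics:
// '?' matches exactly one character, '*' matches any run of characters
// (including none). Matching is case-sensitive.
public static class WildcardSketch
{
    public static bool Matches(string pattern, string text)
    {
        // dp[i, j]: does pattern[0..i) match text[0..j)?
        var dp = new bool[pattern.Length + 1, text.Length + 1];
        dp[0, 0] = true;
        for (int i = 1; i <= pattern.Length; i++)
        {
            char pc = pattern[i - 1];
            // Only a run of '*' can match the empty text.
            dp[i, 0] = dp[i - 1, 0] && pc == '*';
            for (int j = 1; j <= text.Length; j++)
            {
                if (pc == '*')
                    dp[i, j] = dp[i - 1, j] || dp[i, j - 1];   // match empty, or consume one char
                else
                    dp[i, j] = dp[i - 1, j - 1] && (pc == '?' || pc == text[j - 1]);
            }
        }
        return dp[pattern.Length, text.Length];
    }
}
```

"?ild*" accepts wild, mild and mildew. It rejects child, because the single ? can consume only the c, and "hild" does not continue with "ild".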

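WildcardQuery and QueryParser
Note that, for performance reasons, QueryParser does not allow a wildcard at the beginning of a term.
For the same reason, a term whose only wildcard is a trailing * is converted into a PrefixQuery.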
    [Test, ExpectedException(typeof(ParseException))]
    public void TestQueryParserException()
    {
        // A leading wildcard is rejected by QueryParser
        Query query = QueryParser.Parse("?ild*", "contents", new WhitespaceAnalyzer());
    }

    [Test]
    public void TestQueryParserTailAsterisk()
    {
        // A single trailing * becomes a PrefixQuery, not a WildcardQuery
        Query query = QueryParser.Parse("mild*", "contents", new WhitespaceAnalyzer());
        Assert.IsTrue(query is PrefixQuery);
        Assert.IsFalse(query is WildcardQuery);
    }

    [Test]
    public void TestQueryParser()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "wild"),
                Field.Text("contents", "child"),
                Field.Text("contents", "mild"),
                Field.Text("contents", "mildew")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // mi?d* matches mild and mildew
        Query query = QueryParser.Parse("mi?d*", "contents", new WhitespaceAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }
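The parser rules above, together with the earlier rule that a single plain word becomes a TermQuery, can be restated as a small decision function. ParsedTypeSketch is a hypothetical C# sketch, not QueryParser's code. It returns the name of the query class these rules predict for one bare word, and it throws FormatException for a leading wildcard, where the real parser throws ParseException.

```csharp
using System;

// Hypothetical, simplified restatement of the QueryParser rules described
// in the text for a single bare word. It is not QueryParser's real code.
public static class ParsedTypeSketch
{
    public static string Classify(string word)
    {
        if (word.Length == 0)
            throw new ArgumentException("empty word");
        if (word[0] == '*' || word[0] == '?')
            throw new FormatException("leading wildcard not allowed");

        int star = word.IndexOf('*');
        bool hasQuestion = word.IndexOf('?') >= 0;
        if (star < 0 && !hasQuestion)
            return "TermQuery";                       // no wildcard at all
        if (!hasQuestion && star == word.Length - 1)
            return "PrefixQuery";                     // only a trailing *
        return "WildcardQuery";                       // anything else
    }
}
```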
7. FuzzyQuery
Fuzzy search. Note that, unlike with WildcardQuery, the two matching terms receive different scores.

    [Test]
    public void Fuzzy()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "fuzzy"),
                Field.Text("contents", "wuzzy")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        Query query = new FuzzyQuery(new Term("contents", "wuzza"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
        Assert.IsTrue(hits.Score(0) != hits.Score(1), "wuzzy closer than fuzzy");
        Assert.AreEqual("wuzzy", hits.Doc(0).Get("contents"), "wuzza bear");
    }
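FuzzyQuery is based on edit distance: the fewer single-character edits needed to turn the query term into an indexed term, the more similar the two are. The C# sketch below computes the Levenshtein distance. The similarity formula 1 - distance / min(length) and the 0.5 threshold are assumptions modeled on older Lucene releases, so check the FuzzyQuery documentation for your version. FuzzySketch is a hypothetical name.

```csharp
using System;

// Hypothetical sketch of the edit-distance idea behind FuzzyQuery.
// Levenshtein distance counts single-character insertions, deletions and
// substitutions. The similarity formula below is an assumption modeled on
// older Lucene releases, not a guaranteed description of any version.
public static class FuzzySketch
{
    public static int EditDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // delete all of a
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // insert all of b
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Assumed similarity: 1 - distance / length of the shorter string.
    public static double Similarity(string query, string term)
    {
        int shorter = Math.Min(query.Length, term.Length);
        if (shorter == 0)
            return query == term ? 1.0 : 0.0;
        return 1.0 - (double)EditDistance(query, term) / shorter;
    }
}
```

Under these assumptions, wuzza is one edit away from wuzzy (similarity 0.8) and two edits away from fuzzy (0.6). Both clear a 0.5 threshold, which fits the two hits and the higher score for wuzzy in the test above.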


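FuzzyQuery and QueryParser

Note the difference from the slop notation of PhraseQuery. In a phrase, ~ must be followed by a number; a fuzzy term uses ~ on its own, as in wuzza~.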

    [Test]
    public void TestQueryParser()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "fuzzy"),
                Field.Text("contents", "wuzzy")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // A trailing ~ on a single term produces a FuzzyQuery
        Query query = QueryParser.Parse("wuzza~", "contents", new SimpleAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
    }

Posted on 2008-09-22 10:17 by 梓楓, category: lucene
