
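This article uses test cases to walk through the various query types in Lucene and their shorthand QueryParser forms.
After reading it you will know Lucene's basic queries, and you can study all of the test code to deepen your understanding.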

Concrete query types

If you know SQL, you may be curious how Lucene's queries are structured. This section briefly introduces the query types that Lucene can use directly.

1. TermQuery
Matches one specific term, as already shown in the example at the beginning of the article. It is commonly used for keyword lookups.

    [Test]
    public void Keyword()
    {
        IndexSearcher searcher = new IndexSearcher(directory);
        Term t = new Term("isbn", "1930110995");
        Query query = new TermQuery(t);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }

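Note that Lucene does not enforce the uniqueness of keyword values; ensuring that a keyword (such as an ISBN) is unique is the user's responsibility.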

TermQuery and QueryParser

If the string passed to QueryParser's Parse method contains only a single word, it is automatically converted into a TermQuery.

2. RangeQuery
Matches a range of values, typically dates. Again, let's look at an example:

    namespace dotLucene.inAction.BasicSearch
    {
        public class RangeQueryTest : LiaTestCase
        {
            private Term begin, end;

            [SetUp]
            protected override void Init()
            {
                begin = new Term("pubmonth", "200004");
                end = new Term("pubmonth", "200206");
                base.Init();
            }

            [Test]
            public void Inclusive()
            {
                RangeQuery query = new RangeQuery(begin, end, true);
                IndexSearcher searcher = new IndexSearcher(directory);
                Hits hits = searcher.Search(query);
                Assert.AreEqual(1, hits.Length());
            }

            [Test]
            public void Exclusive()
            {
                RangeQuery query = new RangeQuery(begin, end, false);
                IndexSearcher searcher = new IndexSearcher(directory);
                Hits hits = searcher.Search(query);
                Assert.AreEqual(0, hits.Length());
            }
        }
    }

The third argument of RangeQuery specifies whether the start and end values themselves are included in the range.
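The inclusive flag only changes how the two boundary values are treated. A minimal sketch, assuming (as in Lucene) that terms are compared as strings character by character, which is why pubmonth values need a fixed-width yyyyMM format. `InRange` is a hypothetical helper, and the three pubmonth values are invented so that one book falls exactly on the upper bound, matching the outcomes of the tests above:

```csharp
using System;
using System.Linq;

// Hypothetical helper: ordinal string comparison, so "200206" sorts after "200004".
bool InRange(string term, string lower, string upper, bool inclusive)
{
    int lo = string.CompareOrdinal(term, lower);
    int hi = string.CompareOrdinal(term, upper);
    return inclusive ? lo >= 0 && hi <= 0 : lo > 0 && hi < 0;
}

// Hypothetical pubmonth values; only "200206" lies in [200004, 200206], on the boundary.
string[] pubmonths = { "199910", "200206", "200310" };

int inclusiveHits = pubmonths.Count(m => InRange(m, "200004", "200206", true));
int exclusiveHits = pubmonths.Count(m => InRange(m, "200004", "200206", false));
Console.WriteLine($"{inclusiveHits} {exclusiveHits}"); // 1 0
```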

RangeQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("pubmonth:[200004 TO 200206]", "subject", new SimpleAnalyzer());
        Assert.IsTrue(query is RangeQuery);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());

        query = QueryParser.Parse("{200004 TO 200206}", "pubmonth", new SimpleAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(0, hits.Length(), "JDwA in 200206");
    }

Lucene uses [] for an inclusive range and {} for an exclusive range.
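The bracket notation can be illustrated with a small hypothetical parser, `ParseRange`. It is not QueryParser's grammar: it ignores any `field:` prefix and only accepts matching [ ] or { } pairs.

```csharp
using System;

// Hypothetical helper: splits "[a TO b]" or "{a TO b}" into bounds plus an inclusive flag.
(string Lower, string Upper, bool Inclusive) ParseRange(string text)
{
    if (text.Length < 2) throw new FormatException("range too short");
    char open = text[0], close = text[text.Length - 1];
    bool inclusive = open == '[' && close == ']';
    bool exclusive = open == '{' && close == '}';
    if (!inclusive && !exclusive)
        throw new FormatException("expected [a TO b] or {a TO b}");

    string[] parts = text.Substring(1, text.Length - 2)
                         .Split(new[] { " TO " }, StringSplitOptions.None);
    if (parts.Length != 2) throw new FormatException("expected exactly one TO");
    return (parts[0].Trim(), parts[1].Trim(), inclusive);
}

Console.WriteLine(ParseRange("[200004 TO 200206]")); // (200004, 200206, True)
Console.WriteLine(ParseRange("{200004 TO 200206}")); // (200004, 200206, False)
```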

3. PrefixQuery

Matches terms that begin with a given prefix; commonly used for searching hierarchical categories.

    [Test]
    public void TestPrefixQuery()
    {
        PrefixQuery query = new PrefixQuery(new Term("category", "/Computers"));

        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());

        query = new PrefixQuery(new Term("category", "/Computers/JUnit"));
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }
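Prefix matching on hierarchical paths amounts to a case-sensitive "starts with" test against each indexed term. A sketch with hypothetical category values (the third one is invented) and a hypothetical `CountPrefix` helper:

```csharp
using System;
using System.Linq;

// Hypothetical category values, one per book.
string[] categories = { "/Computers/JUnit", "/Computers/Ant", "/Philosophy/Eastern" };

// Like PrefixQuery: a term matches if it starts with the prefix (ordinal, case-sensitive).
int CountPrefix(string prefix) =>
    categories.Count(c => c.StartsWith(prefix, StringComparison.Ordinal));

Console.WriteLine(CountPrefix("/Computers"));       // 2
Console.WriteLine(CountPrefix("/Computers/JUnit")); // 1
```

Because matching is purely character-based, the prefix /Computers would also match a hypothetical /ComputersX category; ending the prefix with a slash avoids that.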

PrefixQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        QueryParser qp = new QueryParser("category", new SimpleAnalyzer());
        qp.SetLowercaseWildcardTerms(false);
        Query query = qp.Parse("/Computers*");
        Console.Out.WriteLine("query = {0}", query.ToString());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
        query = qp.Parse("/Computers/JUnit*");
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }

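Note that here we use a QueryParser instance rather than the static QueryParser.Parse method. An instance lets us change some of QueryParser's default settings. In the example above, the category values contain uppercase letters, but by default QueryParser lowercases any query term containing a *, turning it into /computers*. We would then fail to find the original /Computers*, so we need to change this default, which is exactly what qp.SetLowercaseWildcardTerms(false) does.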
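The effect of the lowercasing default can be simulated directly. This sketch does not call QueryParser; it assumes, as the text describes, that wildcard terms are lowercased by default while the indexed category keeps its original case. `Matches` is a hypothetical helper:

```csharp
using System;

string indexedTerm = "/Computers/JUnit"; // stored with its original case
string queryText = "/Computers*";

// Strip the trailing '*' to get the prefix, lowercasing it first when the
// (simulated) lowercase-wildcard-terms option is on.
bool Matches(bool lowercaseWildcardTerms)
{
    string prefix = queryText.TrimEnd('*');
    if (lowercaseWildcardTerms) prefix = prefix.ToLowerInvariant();
    return indexedTerm.StartsWith(prefix, StringComparison.Ordinal);
}

Console.WriteLine(Matches(true));  // False: "/computers" is not a prefix of "/Computers/JUnit"
Console.WriteLine(Matches(false)); // True
```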

4. BooleanQuery

Combines several conditions.

The two examples below test an AND combination and an OR combination respectively.

    [Test]
    public void And()
    {
        TermQuery searchingBooks =
            new TermQuery(new Term("subject", "junit"));

        RangeQuery currentBooks =
            new RangeQuery(new Term("pubmonth", "200301"),
                           new Term("pubmonth", "200312"),
                           true);
        BooleanQuery currentSearchingBooks = new BooleanQuery();
        currentSearchingBooks.Add(searchingBooks, true, false);
        currentSearchingBooks.Add(currentBooks, true, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(currentSearchingBooks);

        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }

    [Test]
    public void Or()
    {
        TermQuery methodologyBooks = new TermQuery(
            new Term("category",
                     "/Computers/JUnit"));
        TermQuery easternPhilosophyBooks = new TermQuery(
            new Term("category",
                     "/Computers/Ant"));
        BooleanQuery enlightenmentBooks = new BooleanQuery();
        enlightenmentBooks.Add(methodologyBooks, false, false);
        enlightenmentBooks.Add(easternPhilosophyBooks, false, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(enlightenmentBooks);
        Console.Out.WriteLine("or = " + enlightenmentBooks);
        AssertHitsIncludeTitle(hits, "Java Development with Ant");
        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }

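So when is it AND and when is it OR? The answer lies in the arguments to BooleanQuery's Add method.

The first argument is the query clause to add.

The second argument, required, says whether the clause must be satisfied: true means it must match; false means it need not.

The third argument, prohibited, says whether the clause must be rejected: true means documents matching it are excluded from the results; false means such documents are allowed.

This gives three meaningful combinations (setting both flags to true would be contradictory):

required = true,  prohibited = false: the clause must match (AND)
required = false, prohibited = false: the clause is optional (OR)
required = false, prohibited = true:  documents matching the clause are excluded (NOT)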
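The three flag combinations can be checked with a small set-based evaluator. This is a simplified model (hypothetical `Evaluate` helper and hypothetical doc-ID sets), not Lucene's BooleanScorer, and it ignores scoring; it assumes, as in Lucene, that a document must also match at least one non-prohibited clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each clause: the doc IDs it matches plus the two flags passed to BooleanQuery.Add.
IEnumerable<int> Evaluate(IEnumerable<int> allDocs,
                          params (HashSet<int> Docs, bool Required, bool Prohibited)[] clauses)
{
    foreach (int doc in allDocs)
    {
        bool ok = true, anyMatch = false;
        foreach (var c in clauses)
        {
            bool hit = c.Docs.Contains(doc);
            if (c.Required && !hit) ok = false;   // a required clause is missing
            if (c.Prohibited && hit) ok = false;  // a prohibited clause is present
            if (!c.Prohibited && hit) anyMatch = true;
        }
        if (ok && anyMatch) yield return doc;
    }
}

var docs = Enumerable.Range(0, 4);      // hypothetical doc IDs 0..3
var junit = new HashSet<int> { 0, 2 };  // docs whose subject is "junit"
var y2003 = new HashSet<int> { 0, 1 };  // docs published in 2003

Console.WriteLine(string.Join(",", Evaluate(docs, (junit, true, false), (y2003, true, false))));   // 0
Console.WriteLine(string.Join(",", Evaluate(docs, (junit, false, false), (y2003, false, false)))); // 0,1,2
Console.WriteLine(string.Join(",", Evaluate(docs, (junit, true, false), (y2003, false, true))));   // 2
```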

BooleanQuery and QueryParser

    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("pubmonth:[200301 TO 200312] AND junit", "subject", new SimpleAnalyzer());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
        query = QueryParser.Parse("/Computers/JUnit OR /Computers/Ant", "category", new WhitespaceAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }

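Note that AND and OR must be written in uppercase. To express "A and not B", write A AND -B; +A -B also works.

By default QueryParser treats whitespace between terms as OR (unlike Google, which treats it as AND), but you can change this through a QueryParser instance.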

    [Test]
    public void TestQueryParserDefaultAND()
    {
        QueryParser qp = new QueryParser("subject", new SimpleAnalyzer());
        qp.SetOperator(QueryParser.DEFAULT_OPERATOR_AND);
        Query query = qp.Parse("pubmonth:[200301 TO 200312] junit");
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
    }
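How the + and - prefixes and the default operator translate into the required/prohibited flags of the previous section can be sketched as follows. `ToClauses` is a hypothetical helper that handles only bare terms with optional +/- prefixes; it does not parse the AND, OR and NOT keywords, parentheses or field names.

```csharp
using System;
using System.Linq;

// Hypothetical helper: +term -> required, -term -> prohibited,
// bare term -> required only when the default operator is AND, otherwise optional.
(string Term, bool Required, bool Prohibited)[] ToClauses(string query, bool defaultAnd) =>
    query.Split(' ', StringSplitOptions.RemoveEmptyEntries)
         .Select(tok =>
             tok[0] == '+' ? (tok.Substring(1), true, false) :
             tok[0] == '-' ? (tok.Substring(1), false, true) :
             (tok, defaultAnd, false))
         .ToArray();

foreach (var c in ToClauses("+junit -ant java", defaultAnd: false))
    Console.WriteLine($"{c.Term}: required={c.Required}, prohibited={c.Prohibited}");
```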
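5. PhraseQuery
Matches phrases. The key concept here is slop: how far the terms may be displaced from their positions in the phrase. Slop also affects scoring: a match with slop 0 (the exact phrase) scores best. The example below makes this easy to see; you do not need to know the details of how slop is computed. A large slop does hurt query performance, however, so keep it small in practice. Note that, given enough slop, PhraseQuery also matches the terms in a different order (reversing the terms costs more slop, as the Reverse test shows). This raises the hit rate but can also have a large impact on performance; SpanNearQuery lets you require that the terms appear in order, which improves performance.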
    private IndexSearcher searcher;

    [SetUp]
    protected void Init()
    {
        // set up sample document
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                                             new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.Add(Field.Text("field",
                           "the quick brown fox jumped over the lazy dog"));
        writer.AddDocument(doc);
        writer.Close();

        searcher = new IndexSearcher(directory);
    }

    private bool matched(String[] phrase, int slop)
    {
        PhraseQuery query = new PhraseQuery();
        query.SetSlop(slop);

        for (int i = 0; i < phrase.Length; i++)
        {
            query.Add(new Term("field", phrase[i]));
        }

        Hits hits = searcher.Search(query);
        return hits.Length() > 0;
    }

    [Test]
    public void SlopComparison()
    {
        String[] phrase = new String[] {"quick", "fox"};

        Assert.IsFalse(matched(phrase, 0), "exact phrase not found");

        Assert.IsTrue(matched(phrase, 1), "close enough");
    }

    [Test]
    public void Reverse()
    {
        String[] phrase = new String[] {"fox", "quick"};

        Assert.IsFalse(matched(phrase, 2), "exact phrase not found");

        Assert.IsTrue(matched(phrase, 3), "close enough");
    }

    [Test]
    public void Multiple()
    {
        Assert.IsFalse(matched(new String[] {"quick", "jumped", "lazy"}, 3), "not close enough");
        Assert.IsTrue(matched(new String[] {"quick", "jumped", "lazy"}, 4), "just enough");
        Assert.IsFalse(matched(new String[] {"lazy", "jumped", "quick"}, 7), "almost but not quite");
        Assert.IsTrue(matched(new String[] {"lazy", "jumped", "quick"}, 8), "bingo");
    }
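The slop values in the tests above can be reproduced with a simple position-offset model: for phrase term i found at document position p_i, compute p_i - i; the slop needed is the largest offset minus the smallest. This is a simplified model under the assumption that each phrase term occurs once in the document; Lucene's real sloppy-phrase scorer also handles repeated terms and turns the distance into a score. `RequiredSlop` and `Matched` are hypothetical helpers.

```csharp
using System;
using System.Linq;

string[] doc = "the quick brown fox jumped over the lazy dog".Split(' ');

// Offset of term i is (its document position) - i; slop needed = max offset - min offset.
int RequiredSlop(string[] phrase)
{
    if (phrase.Any(t => Array.IndexOf(doc, t) < 0)) return int.MaxValue; // term absent
    int[] offsets = phrase.Select((term, i) => Array.IndexOf(doc, term) - i).ToArray();
    return offsets.Max() - offsets.Min();
}

bool Matched(string[] phrase, int slop) => RequiredSlop(phrase) <= slop;

Console.WriteLine(RequiredSlop(new[] { "quick", "fox" }));            // 1
Console.WriteLine(RequiredSlop(new[] { "fox", "quick" }));            // 3
Console.WriteLine(RequiredSlop(new[] { "quick", "jumped", "lazy" })); // 4
Console.WriteLine(RequiredSlop(new[] { "lazy", "jumped", "quick" })); // 8
```

The four printed values are exactly the thresholds at which the SlopComparison, Reverse and Multiple tests switch from no match to a match.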

PhraseQuery and QueryParser

When using QueryParser for a phrase query, the slop has to be specified up front. There are two ways to do this, shown below:

    [Test]
    public void TestQueryParser()
    {
        Query q1 = QueryParser.Parse("\"quick fox\"",
            "field", new SimpleAnalyzer());
        Hits hits1 = searcher.Search(q1);
        Assert.AreEqual(0, hits1.Length());

        Query q2 = QueryParser.Parse("\"quick fox\"~1",   // first way: ~N suffix in the query
            "field", new SimpleAnalyzer());
        Hits hits2 = searcher.Search(q2);
        Assert.AreEqual(1, hits2.Length());

        QueryParser qp = new QueryParser("field", new SimpleAnalyzer());
        qp.SetPhraseSlop(1);                               // second way: default slop on the parser
        Query q3 = qp.Parse("\"quick fox\"");
        Assert.AreEqual("\"quick fox\"~1", q3.ToString("field"), "sloppy, implicitly");
        Hits hits3 = searcher.Search(q3);
        Assert.AreEqual(1, hits3.Length());
    }
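Both ways of setting the slop end up producing a list of terms plus a slop value. A hypothetical `ParsePhrase` helper (not QueryParser itself) makes this explicit; its `defaultSlop` parameter stands in for `SetPhraseSlop`.

```csharp
using System;

// Hypothetical helper: "\"quick fox\"~1" -> terms {quick, fox}, slop 1.
// Without a ~N suffix, defaultSlop is used (like QueryParser.SetPhraseSlop).
(string[] Terms, int Slop) ParsePhrase(string text, int defaultSlop = 0)
{
    int close = text.LastIndexOf('"');
    if (text.Length < 2 || text[0] != '"' || close <= 0)
        throw new FormatException("expected a quoted phrase");

    string[] terms = text.Substring(1, close - 1)
                         .Split(' ', StringSplitOptions.RemoveEmptyEntries);
    string suffix = text.Substring(close + 1);
    int slop = suffix.StartsWith("~") ? int.Parse(suffix.Substring(1)) : defaultSlop;
    return (terms, slop);
}

var explicitSlop = ParsePhrase("\"quick fox\"~1");
var implicitSlop = ParsePhrase("\"quick fox\"", defaultSlop: 1);
Console.WriteLine($"{string.Join(" ", explicitSlop.Terms)} ~{explicitSlop.Slop}"); // quick fox ~1
Console.WriteLine($"{string.Join(" ", implicitSlop.Terms)} ~{implicitSlop.Slop}"); // quick fox ~1
```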

6. WildcardQuery
Wildcard search. Note that the matching terms (wild, mild and mildew) all receive the same score; child does not match the pattern ?ild*.
    [Test]
    public void Wildcard()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "wild"),
                Field.Text("contents", "child"),
                Field.Text("contents", "mild"),
                Field.Text("contents", "mildew")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        Query query = new WildcardQuery(
            new Term("contents", "?ild*"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(3, hits.Length(), "child no match");
        Assert.AreEqual(hits.Score(0), hits.Score(1), 0.0, "score the same");
        Assert.AreEqual(hits.Score(1), hits.Score(2), 0.0, "score the same");
    }
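The matching rule behind WildcardQuery, where ? stands for exactly one character and * for zero or more, can be sketched as a short recursive matcher. `WildcardMatch` is a hypothetical helper written for clarity rather than speed (it can take exponential time on patterns with many *s); the term list is the one indexed in the test above.

```csharp
using System;
using System.Linq;

// '?' matches exactly one character, '*' matches zero or more characters.
bool WildcardMatch(string pattern, string text)
{
    if (pattern.Length == 0) return text.Length == 0;
    if (pattern[0] == '*')
        // Either '*' matches nothing, or it consumes one character of text and stays.
        return WildcardMatch(pattern.Substring(1), text)
            || (text.Length > 0 && WildcardMatch(pattern, text.Substring(1)));
    if (text.Length == 0) return false;
    return (pattern[0] == '?' || pattern[0] == text[0])
        && WildcardMatch(pattern.Substring(1), text.Substring(1));
}

string[] terms = { "wild", "child", "mild", "mildew" };
Console.WriteLine(string.Join(",", terms.Where(t => WildcardMatch("?ild*", t)))); // wild,mild,mildew
```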

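WildcardQuery and QueryParser
Note that, for performance reasons, QueryParser does not allow a wildcard at the beginning of a term.
Also for performance reasons, a term whose only wildcard is a trailing * is converted into a PrefixQuery.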
    [Test, ExpectedException(typeof(ParseException))]
    public void TestQueryParserException()
    {
        Query query = QueryParser.Parse("?ild*", "contents", new WhitespaceAnalyzer());
    }

    [Test]
    public void TestQueryParserTailAsterrisk()
    {
        Query query = QueryParser.Parse("mild*", "contents", new WhitespaceAnalyzer());
        Assert.IsTrue(query is PrefixQuery);
        Assert.IsFalse(query is WildcardQuery);
    }

    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("mi?d*", "contents", new WhitespaceAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }
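The rules above can be summarized as a hypothetical `Classify` function that mirrors what the text says QueryParser does; it is not QueryParser's code. The performance reasoning is that the term dictionary is sorted, so the characters before the first wildcard let Lucene jump straight to the matching range, whereas a leading wildcard leaves nothing to narrow the scan and every term would have to be checked.

```csharp
using System;

// Leading wildcard -> rejected; only a trailing '*' -> PrefixQuery;
// any other wildcard -> WildcardQuery; no wildcard -> TermQuery.
string Classify(string term)
{
    if (term.Length == 0) throw new ArgumentException("empty term");
    if (term[0] == '*' || term[0] == '?')
        throw new FormatException("leading wildcard not allowed: " + term);

    int firstWildcard = term.IndexOfAny(new[] { '*', '?' });
    if (firstWildcard < 0) return "TermQuery";
    if (firstWildcard == term.Length - 1 && term[firstWildcard] == '*') return "PrefixQuery";
    return "WildcardQuery";
}

Console.WriteLine(Classify("mild*")); // PrefixQuery
Console.WriteLine(Classify("mi?d*")); // WildcardQuery
Console.WriteLine(Classify("mild"));  // TermQuery
```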
7. FuzzyQuery
Fuzzy search. Note that, unlike with WildcardQuery, the two matching terms receive different scores.

    [Test]
    public void Fuzzy()
    {
        Query query = new FuzzyQuery(new Term("contents", "wuzza"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
        Assert.IsTrue(hits.Score(0) != hits.Score(1), "wuzzy closer than fuzzy");
        Assert.AreEqual("wuzzy", hits.Doc(0).Get("contents"), "wuzza bear");
    }
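FuzzyQuery is based on Levenshtein edit distance, which explains why the scores differ: wuzzy is one edit away from wuzza, while fuzzy is two. The sketch computes the distance and a similarity value. The similarity formula (1 minus the distance divided by the shorter term's length) and the 0.5 cut-off mentioned below are assumptions based on older Lucene versions, not something this article states.

```csharp
using System;

// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                               d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

// Assumed similarity formula: 1 - distance / length of the shorter term.
double Similarity(string a, string b) =>
    1.0 - (double)Levenshtein(a, b) / Math.Min(a.Length, b.Length);

Console.WriteLine(Levenshtein("wuzza", "wuzzy")); // 1
Console.WriteLine(Levenshtein("wuzza", "fuzzy")); // 2
Console.WriteLine(Similarity("wuzza", "wuzzy"));
Console.WriteLine(Similarity("wuzza", "fuzzy"));
```

Under these assumptions wuzzy gets the higher similarity, and both terms stay above a minimum similarity of 0.5, which is consistent with the test finding two hits with wuzzy ranked first.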


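FuzzyQuery and QueryParser

Note the difference from the slop syntax of PhraseQuery: there, the ~ must be followed by a number, whereas a fuzzy query needs only a trailing ~.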

    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("wuzza~", "contents", new SimpleAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
    }
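The two uses of ~ can be told apart by what precedes it. A hypothetical `ClassifyTilde` helper using regular expressions, covering only the two forms discussed here:

```csharp
using System;
using System.Text.RegularExpressions;

// "phrase"~N (a number is required) -> sloppy PhraseQuery; word~ -> FuzzyQuery.
string ClassifyTilde(string text)
{
    if (Regex.IsMatch(text, "^\"[^\"]+\"~\\d+$")) return "PhraseQuery (slop)";
    if (Regex.IsMatch(text, "^[^\"\\s~]+~$")) return "FuzzyQuery";
    return "other";
}

Console.WriteLine(ClassifyTilde("\"quick fox\"~1")); // PhraseQuery (slop)
Console.WriteLine(ClassifyTilde("wuzza~"));         // FuzzyQuery
```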

Posted on 2008-09-22 10:17 by 梓楓, category: lucene
