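This article uses test cases to walk through the query types available in Lucene, together with the simplified QueryParser syntax for each of them.
After reading it you should know Lucene's basic query types, and you can study all of the test code to deepen your understanding.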
The query types
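If you already know SQL, you may be curious what Lucene's query syntax tree looks like. This section briefly introduces the query types that Lucene can use directly.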
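1. TermQuery
Matches one specific term. It was already introduced in the example at the start of the article, and it is typically used to look up keywords such as an ISBN.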
    [Test]
    public void Keyword()
    {
        IndexSearcher searcher = new IndexSearcher(directory);
        Term t = new Term("isbn", "1930110995");
        Query query = new TermQuery(t);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }
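Note that Lucene does not enforce uniqueness of keyword values. Guaranteeing it is up to you.
TermQuery and QueryParser
If the query string passed to QueryParser's Parse method contains only a single word, QueryParser automatically turns it into a TermQuery.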
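To make the idea concrete, here is a toy sketch of what a TermQuery boils down to: an exact lookup of one field/term pair in an inverted index. `ToyIndex` and its methods are hypothetical and are not Lucene.Net's API. Real Lucene adds analysis, scoring and on-disk postings on top of this. The sketch also shows why keyword uniqueness is the application's job, because nothing stops two documents from sharing an isbn.

```csharp
using System;
using System.Collections.Generic;

// A toy inverted index (hypothetical, not Lucene.Net's API) that maps
// "field:term" to the ids of the documents containing exactly that term.
public class ToyIndex
{
    private readonly Dictionary<string, List<int>> postings =
        new Dictionary<string, List<int>>();
    private int nextDocId = 0;

    // Adds a document given as field -> keyword pairs and returns its id.
    // Like Lucene, it does NOT check whether a keyword is already in use.
    public int AddDocument(Dictionary<string, string> fields)
    {
        int id = nextDocId++;
        foreach (var pair in fields)
        {
            string key = pair.Key + ":" + pair.Value;
            if (!postings.TryGetValue(key, out var ids))
            {
                ids = new List<int>();
                postings[key] = ids;
            }
            ids.Add(id);
        }
        return id;
    }

    // The essence of a TermQuery: an exact, case-sensitive lookup of one term.
    public IReadOnlyList<int> TermQuery(string field, string term)
    {
        return postings.TryGetValue(field + ":" + term, out var ids)
            ? ids
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

Indexing a second document with the same isbn simply produces a second hit. If isbn must be unique, the application has to check for an existing document before adding a new one.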
2. RangeQuery
Matches a range of terms and is typically used for dates. Again, let's look at an example:
namespace dotLucene.inAction.BasicSearch
{
    public class RangeQueryTest : LiaTestCase
    {
        private Term begin, end;

        [SetUp]
        protected override void Init()
        {
            begin = new Term("pubmonth", "200004");
            end = new Term("pubmonth", "200206");
            base.Init();
        }

        [Test]
        public void Inclusive()
        {
            // true: the begin and end terms themselves are part of the range
            RangeQuery query = new RangeQuery(begin, end, true);
            IndexSearcher searcher = new IndexSearcher(directory);
            Hits hits = searcher.Search(query);
            Assert.AreEqual(1, hits.Length());
        }

        [Test]
        public void Exclusive()
        {
            // false: the begin and end terms are excluded
            RangeQuery query = new RangeQuery(begin, end, false);
            IndexSearcher searcher = new IndexSearcher(directory);
            Hits hits = searcher.Search(query);
            Assert.AreEqual(0, hits.Length());
        }
    }
}
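The third argument of the RangeQuery constructor says whether the begin and end terms themselves are included in the range. Terms are compared as strings, which is why the month is stored in a fixed-width form such as 200004.
RangeQuery and QueryParser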
    [Test]
    public void TestQueryParser()
    {
        // [] : inclusive range, parsed into a RangeQuery
        Query query = QueryParser.Parse("pubmonth:[200004 TO 200206]", "subject", new SimpleAnalyzer());
        Assert.IsTrue(query is RangeQuery);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
        // {} : exclusive range, so the book from 200206 drops out
        query = QueryParser.Parse("{200004 TO 200206}", "pubmonth", new SimpleAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(0, hits.Length(), "JDwA in 200206");
    }
In the query syntax, square brackets [ ] denote an inclusive range and curly braces { } an exclusive one.
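The sketch below uses hypothetical helpers, not Lucene.Net's parser. It parses the `[a TO b]` / `{a TO b}` notation and applies the range to a list of term strings. It assumes terms are compared in ordinal string order, which is why a fixed-width yyyyMM value like pubmonth sorts correctly. The pubmonth values used in the test are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers (not Lucene.Net's API) showing how a term range such
// as pubmonth:[200004 TO 200206] can be interpreted.
public static class RangeSketch
{
    // Parses "[lower TO upper]" (inclusive) or "{lower TO upper}" (exclusive).
    public static (string Lower, string Upper, bool Inclusive) Parse(string text)
    {
        bool inclusive = text.StartsWith("[") && text.EndsWith("]");
        bool exclusive = text.StartsWith("{") && text.EndsWith("}");
        if (!inclusive && !exclusive)
            throw new FormatException("expected [a TO b] or {a TO b}: " + text);
        string[] parts = text.Substring(1, text.Length - 2)
            .Split(new[] { " TO " }, StringSplitOptions.None);
        if (parts.Length != 2)
            throw new FormatException("expected exactly one TO: " + text);
        return (parts[0].Trim(), parts[1].Trim(), inclusive);
    }

    // Terms are compared as plain strings (ordinal order).
    public static bool InRange(string term, string lower, string upper, bool inclusive)
    {
        int lo = string.CompareOrdinal(term, lower);
        int hi = string.CompareOrdinal(term, upper);
        return inclusive ? lo >= 0 && hi <= 0 : lo > 0 && hi < 0;
    }

    // Counts how many of the given terms fall inside the range expression.
    public static int Count(IEnumerable<string> terms, string rangeText)
    {
        var (lower, upper, inclusive) = Parse(rangeText);
        return terms.Count(t => InRange(t, lower, upper, inclusive));
    }
}
```

With a term that sits exactly on the upper bound, the inclusive form counts it and the exclusive form does not. This mirrors the Inclusive and Exclusive tests above.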
3. PrefixQuery
Matches terms that begin with a given prefix. It is commonly used to search hierarchical categories such as /Computers/JUnit.
    [Test]
    public void TestPrefixQuery()
    {
        // Matches every category term that starts with "/Computers"
        PrefixQuery query = new PrefixQuery(new Term("category", "/Computers"));
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());

        query = new PrefixQuery(new Term("category", "/Computers/JUnit"));
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }
PrefixQuery and QueryParser
    [Test]
    public void TestQueryParser()
    {
        QueryParser qp = new QueryParser("category", new SimpleAnalyzer());
        // Keep "/Computers*" as typed instead of lowercasing it to "/computers*"
        qp.SetLowercaseWildcardTerms(false);
        Query query = qp.Parse("/Computers*");
        Console.Out.WriteLine("query = {0}", query.ToString());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
        query = qp.Parse("/Computers/JUnit*");
        hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length(), "JUnit in Action");
    }
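Note that here we use a QueryParser instance rather than the static QueryParser.Parse method, because an instance lets us change some of QueryParser's default settings. In this example the category values are capitalized. By default, however, QueryParser lowercases every query term that contains *, turning the query into /computers*, which no longer matches /Computers in the indexed text. We therefore switch this default off, which is exactly what qp.SetLowercaseWildcardTerms(false) does.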
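Here is a small, hypothetical illustration of the pitfall just described. Matching a prefix against indexed terms is case-sensitive, so lowercasing `/Computers*` to `/computers*` finds nothing. The `PrefixSketch` helpers and the third category value in the test are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration (not Lucene.Net code): prefix matching against
// indexed terms is an exact, case-sensitive comparison.
public static class PrefixSketch
{
    public static List<string> Match(IEnumerable<string> terms, string prefix)
    {
        return terms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }

    // Turns a query string such as "/Computers*" into a prefix, optionally
    // lowercasing it the way QueryParser does by default for wildcard terms.
    public static List<string> MatchQueryString(
        IEnumerable<string> terms, string query, bool lowercaseWildcardTerms)
    {
        string prefix = query.TrimEnd('*');
        if (lowercaseWildcardTerms)
            prefix = prefix.ToLowerInvariant();
        return Match(terms, prefix);
    }
}
```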
4. BooleanQuery
Combines several conditions.
The two examples below test an AND combination and an OR combination respectively.
    [Test]
    public void And()
    {
        TermQuery searchingBooks =
            new TermQuery(new Term("subject", "junit"));
        RangeQuery currentBooks =
            new RangeQuery(new Term("pubmonth", "200301"),
                           new Term("pubmonth", "200312"),
                           true);
        BooleanQuery currentSearchingBooks = new BooleanQuery();
        // required = true, prohibited = false: both clauses must match (AND)
        currentSearchingBooks.Add(searchingBooks, true, false);
        currentSearchingBooks.Add(currentBooks, true, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(currentSearchingBooks);
        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }

    [Test]
    public void Or()
    {
        TermQuery junitBooks = new TermQuery(
            new Term("category", "/Computers/JUnit"));
        TermQuery antBooks = new TermQuery(
            new Term("category", "/Computers/Ant"));
        BooleanQuery junitOrAntBooks = new BooleanQuery();
        // required = false, prohibited = false: either clause may match (OR)
        junitOrAntBooks.Add(junitBooks, false, false);
        junitOrAntBooks.Add(antBooks, false, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(junitOrAntBooks);
        Console.Out.WriteLine("or = " + junitOrAntBooks);
        AssertHitsIncludeTitle(hits, "Java Development with Ant");
        AssertHitsIncludeTitle(hits, "JUnit in Action");
    }
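So when is it AND and when is it OR? The answer lies in the arguments of BooleanQuery's Add method.
The first argument is the clause (sub-query) to add.
The second, required, says whether the clause must match: true means it must, false means it need not.
The third, prohibited, says whether the clause must not match: true means documents matching this clause are excluded, false means such documents are allowed.
That gives three valid combinations, shown in the table below:
required   prohibited   meaning
true       false        the clause must match (AND, +)
false      false        the clause is optional (OR)
false      true         the clause must not match (NOT, -)
Setting both required and prohibited to true is not a valid combination.
BooleanQuery and QueryParser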
    [Test]
    public void TestQueryParser()
    {
        Query query = QueryParser.Parse("pubmonth:[200301 TO 200312] AND junit", "subject", new SimpleAnalyzer());
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
        // WhitespaceAnalyzer keeps "/Computers/JUnit" as one case-preserved term
        query = QueryParser.Parse("/Computers/JUnit OR /Computers/Ant", "category", new WhitespaceAnalyzer());
        hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }
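Note that AND and OR must be written in uppercase. To express "A and not B", write A AND -B; +A -B works as well.
By default QueryParser treats the space between terms as OR, but you can change this through a QueryParser instance: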
    [Test]
    public void TestQueryParserDefaultAND()
    {
        QueryParser qp = new QueryParser("subject", new SimpleAnalyzer());
        // Treat whitespace between terms as AND instead of the default OR
        qp.SetOperator(QueryParser.DEFAULT_OPERATOR_AND);
        Query query = qp.Parse("pubmonth:[200301 TO 200312] junit");
        IndexSearcher searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Assert.AreEqual(1, hits.Length());
    }
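The required/prohibited rules described above fit in a few lines of code. This sketch assumes that a document is just a set of terms. `Clause` and `BooleanSketch` are hypothetical names, not Lucene.Net's BooleanQuery implementation. The rule it encodes is that every required clause must match, no prohibited clause may match, and at least one non-prohibited clause must match. As a consequence, a query made only of prohibited clauses matches nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model (not Lucene.Net's BooleanQuery) of one clause.
public sealed class Clause
{
    public Func<ISet<string>, bool> Matches { get; }
    public bool Required { get; }
    public bool Prohibited { get; }

    public Clause(Func<ISet<string>, bool> matches, bool required, bool prohibited)
    {
        if (required && prohibited)
            throw new ArgumentException("a clause cannot be both required and prohibited");
        Matches = matches;
        Required = required;
        Prohibited = prohibited;
    }
}

public static class BooleanSketch
{
    // A document (modelled as its set of terms) matches when every required
    // clause matches, no prohibited clause matches, and at least one
    // non-prohibited clause matches.
    public static bool Matches(ISet<string> doc, IEnumerable<Clause> clauses)
    {
        var list = clauses.ToList();
        if (list.Any(c => c.Required && !c.Matches(doc))) return false;
        if (list.Any(c => c.Prohibited && c.Matches(doc))) return false;
        return list.Any(c => !c.Prohibited && c.Matches(doc));
    }

    // Convenience factory: a clause that tests for a single term.
    public static Clause Term(string term, bool required, bool prohibited) =>
        new Clause(doc => doc.Contains(term), required, prohibited);
}
```

In QueryParser terms, `junit AND ant` (or `+junit +ant`) corresponds to two required clauses. `junit ant` under the default OR operator gives two optional clauses, and `+ant -junit` gives one required and one prohibited clause.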
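5. PhraseQuery
Searches for a phrase. The key concept here is slop, which is how far the words may be displaced from their positions in the phrase. Slop also affects scoring: an exact match, with no displacement, is the best match. The examples below make this easy to follow. You don't need to understand exactly how slop distances are computed, but a large slop slows queries down, so keep the value small in practice. Note also that PhraseQuery does not enforce word order: with enough slop, the words also match in reverse order (see the Reverse test). This raises recall, but it can also hurt performance considerably. SpanNearQuery lets you control word order and can perform better.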
    private IndexSearcher searcher;

    [SetUp]
    protected void Init()
    {
        // set up sample document
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
                                             new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.Add(Field.Text("field",
                           "the quick brown fox jumped over the lazy dog"));
        writer.AddDocument(doc);
        writer.Close();
        searcher = new IndexSearcher(directory);
    }

    // Does a PhraseQuery for the given words, with the given slop, find the document?
    private bool matched(String[] phrase, int slop)
    {
        PhraseQuery query = new PhraseQuery();
        query.SetSlop(slop);
        for (int i = 0; i < phrase.Length; i++)
        {
            query.Add(new Term("field", phrase[i]));
        }
        Hits hits = searcher.Search(query);
        return hits.Length() > 0;
    }

    [Test]
    public void SlopComparison()
    {
        String[] phrase = new String[] {"quick", "fox"};
        Assert.IsFalse(matched(phrase, 0), "exact phrase not found");
        Assert.IsTrue(matched(phrase, 1), "close enough");
    }

    [Test]
    public void Reverse()
    {
        String[] phrase = new String[] {"fox", "quick"};
        Assert.IsFalse(matched(phrase, 2), "exact phrase not found");
        Assert.IsTrue(matched(phrase, 3), "close enough");
    }

    [Test]
    public void Multiple()
    {
        Assert.IsFalse(matched(new String[] {"quick", "jumped", "lazy"}, 3), "not close enough");
        Assert.IsTrue(matched(new String[] {"quick", "jumped", "lazy"}, 4), "just enough");
        Assert.IsFalse(matched(new String[] {"lazy", "jumped", "quick"}, 7), "almost but not quite");
        Assert.IsTrue(matched(new String[] {"lazy", "jumped", "quick"}, 8), "bingo");
    }
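In practice you can ignore how slop is calculated. For the curious, here is a simplified model that reproduces the thresholds in the tests above. For each phrase word it takes the word's position in the document minus the word's offset within the phrase. The slop a match needs is the spread (max minus min) of those values, minimized over all occurrences. The model is intended to mirror how Lucene's sloppy phrase matching measures distance, but `SlopSketch` is only an illustration. It is not Lucene.Net's SloppyPhraseScorer, and it assumes plain whitespace tokenization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model (an assumption, not Lucene.Net's SloppyPhraseScorer) of the
// smallest slop that lets a phrase match a whitespace-tokenized text.
public static class SlopSketch
{
    // Returns the minimum slop needed, or -1 if some phrase word is missing.
    public static int MinSlop(string text, string[] phrase)
    {
        if (phrase.Length == 0) return 0;
        string[] tokens = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // For each phrase word i: the positions where it occurs, shifted back by
        // its offset i in the phrase. An exact phrase yields identical values.
        var shifted = new List<int[]>();
        for (int i = 0; i < phrase.Length; i++)
        {
            int offset = i;
            int[] positions = Enumerable.Range(0, tokens.Length)
                .Where(p => tokens[p] == phrase[offset])
                .Select(p => p - offset)
                .ToArray();
            if (positions.Length == 0) return -1;
            shifted.Add(positions);
        }
        // Try every combination of occurrences; a combination needs a slop
        // equal to the spread (max - min) of its shifted positions.
        int best = int.MaxValue;
        Search(shifted, 0, int.MaxValue, int.MinValue, ref best);
        return best;
    }

    private static void Search(List<int[]> shifted, int i, int min, int max, ref int best)
    {
        if (i == shifted.Count)
        {
            best = Math.Min(best, max - min);
            return;
        }
        foreach (int p in shifted[i])
            Search(shifted, i + 1, Math.Min(min, p), Math.Max(max, p), ref best);
    }

    public static bool Matches(string text, string[] phrase, int slop)
    {
        int needed = MinSlop(text, phrase);
        return needed >= 0 && needed <= slop;
    }
}
```

For "the quick brown fox jumped over the lazy dog", the model gives the following minimum slops: 1 for quick fox, 3 for fox quick, 4 for quick jumped lazy and 8 for lazy jumped quick. These are exactly the points at which the assertions above switch from false to true.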
PhraseQuery and QueryParser
To get a sloppy phrase query from QueryParser you have to specify the slop. There are two ways to do this:
    [Test]
    public void TestQueryParser()
    {
        // Without a slop, "quick fox" is an exact phrase and does not match
        Query q1 = QueryParser.Parse("\"quick fox\"",
            "field", new SimpleAnalyzer());
        Hits hits1 = searcher.Search(q1);
        Assert.AreEqual(0, hits1.Length());
        // Way 1: append ~slop to the quoted phrase
        Query q2 = QueryParser.Parse("\"quick fox\"~1",
            "field", new SimpleAnalyzer());
        Hits hits2 = searcher.Search(q2);
        Assert.AreEqual(1, hits2.Length());
        // Way 2: set a default phrase slop on a QueryParser instance
        QueryParser qp = new QueryParser("field", new SimpleAnalyzer());
        qp.SetPhraseSlop(1);
        Query q3 = qp.Parse("\"quick fox\"");
        Assert.AreEqual("\"quick fox\"~1", q3.ToString("field"), "sloppy, implicitly");
        Hits hits3 = searcher.Search(q3);
        Assert.AreEqual(1, hits3.Length());
    }
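6. WildcardQuery
Wildcard search: ? matches exactly one character and * matches any number of characters, including none. Note that all matching terms get the same score. In the example below, wild, mild and mildew score identically, while child does not match at all because ? stands for exactly one character.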
    [Test]
    public void Wildcard()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "wild"),
                Field.Text("contents", "child"),
                Field.Text("contents", "mild"),
                Field.Text("contents", "mildew")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // '?' matches exactly one character, '*' any number of characters
        Query query = new WildcardQuery(
            new Term("contents", "?ild*"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(3, hits.Length(), "child no match");
        Assert.AreEqual(hits.Score(0), hits.Score(1), 0.0, "score the same");
        Assert.AreEqual(hits.Score(1), hits.Score(2), 0.0, "score the same");
    }
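The matching rule itself is easy to state in code. The following dynamic-programming matcher is a standalone illustration, not Lucene.Net's WildcardTermEnum, of which terms a pattern such as `?ild*` accepts: `?` consumes exactly one character and `*` consumes any run of characters, including an empty one. It models only matching, not scoring.

```csharp
using System;

// A minimal wildcard matcher (illustration only, not Lucene.Net code).
public static class WildcardSketch
{
    public static bool Matches(string pattern, string text)
    {
        // dp[i, j] is true when pattern[0..i) matches text[0..j).
        var dp = new bool[pattern.Length + 1, text.Length + 1];
        dp[0, 0] = true;
        for (int i = 1; i <= pattern.Length; i++)
        {
            char pc = pattern[i - 1];
            // Only '*' can match an empty prefix of the text.
            dp[i, 0] = pc == '*' && dp[i - 1, 0];
            for (int j = 1; j <= text.Length; j++)
            {
                if (pc == '*')
                    // '*' either matches nothing or swallows one more character.
                    dp[i, j] = dp[i - 1, j] || dp[i, j - 1];
                else
                    // '?' matches any single character; anything else must be equal.
                    dp[i, j] = dp[i - 1, j - 1] && (pc == '?' || pc == text[j - 1]);
            }
        }
        return dp[pattern.Length, text.Length];
    }
}
```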
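WildcardQuery and QueryParser
Note that, for performance reasons, QueryParser does not allow a wildcard at the start of a term.
Also for performance reasons, QueryParser converts a term whose only wildcard is a trailing * into a PrefixQuery.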
    [Test, ExpectedException(typeof(ParseException))]
    public void TestQueryParserException()
    {
        // A leading wildcard is rejected by QueryParser
        QueryParser.Parse("?ild*", "contents", new WhitespaceAnalyzer());
    }

    [Test]
    public void TestQueryParserTailAsterisk()
    {
        // A single trailing '*' becomes a PrefixQuery, not a WildcardQuery
        Query query = QueryParser.Parse("mild*", "contents", new WhitespaceAnalyzer());
        Assert.IsTrue(query is PrefixQuery);
        Assert.IsFalse(query is WildcardQuery);
    }

    [Test]
    public void TestQueryParser()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "wild"),
                Field.Text("contents", "child"),
                Field.Text("contents", "mild"),
                Field.Text("contents", "mildew")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // Matches mild and mildew
        Query query = QueryParser.Parse("mi?d*", "contents", new WhitespaceAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length());
    }
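The two QueryParser rules above can be summarised as a classifier for a single query word: a leading wildcard is rejected, and a word whose only wildcard is a trailing * becomes a PrefixQuery. `WildcardTermClassifier` and `ParsedKind` are hypothetical. The real QueryParser throws ParseException, and FormatException stands in for it here.

```csharp
using System;

// Hypothetical classifier (not Lucene.Net's QueryParser grammar) that mirrors
// the rules described above for a single query word.
public enum ParsedKind { Term, Prefix, Wildcard }

public static class WildcardTermClassifier
{
    public static ParsedKind Classify(string word)
    {
        if (string.IsNullOrEmpty(word))
            throw new ArgumentException("empty word", nameof(word));
        // Leading wildcards are rejected.
        if (word[0] == '*' || word[0] == '?')
            throw new FormatException("leading wildcard not allowed: " + word);
        int star = word.IndexOf('*');
        bool hasQuestion = word.IndexOf('?') >= 0;
        if (star < 0 && !hasQuestion)
            return ParsedKind.Term;
        // The first '*' is the last character, so it is the only wildcard.
        if (!hasQuestion && star == word.Length - 1)
            return ParsedKind.Prefix;
        return ParsedKind.Wildcard;
    }
}
```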
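7. FuzzyQuery
Fuzzy search matches terms that are similar to the query term. Note that, unlike with WildcardQuery, the two matching terms here receive different scores: the closer term scores higher.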
    [Test]
    public void Fuzzy()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "fuzzy"),
                Field.Text("contents", "wuzzy")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        Query query = new FuzzyQuery(new Term("contents", "wuzza"));
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
        Assert.IsTrue(hits.Score(0) != hits.Score(1), "wuzzy closer than fuzzy");
        Assert.AreEqual("wuzzy", hits.Doc(0).Get("contents"), "wuzza bear");
    }
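Fuzzy matching is based on edit distance. Below is a standard Levenshtein distance plus a similarity score. The similarity formula, 1 - distance / min(length of the two words), and the 0.5 threshold are assumptions modelled on early Lucene versions. Do not read them as a statement of what your Lucene.Net version does. They are enough to show why wuzzy (one edit away from wuzza) outranks fuzzy (two edits away).

```csharp
using System;

// Illustration of the edit distance behind FuzzyQuery. The similarity formula
// and the 0.5 default threshold are assumptions, not Lucene.Net's exact code.
public static class FuzzySketch
{
    // Classic Levenshtein distance: minimum number of single-character
    // insertions, deletions and substitutions turning a into b.
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }

    // Assumed similarity: 1 - distance / length of the shorter word.
    public static double Similarity(string query, string term)
    {
        int shorter = Math.Min(query.Length, term.Length);
        if (shorter == 0) return query.Length == term.Length ? 1.0 : 0.0;
        return 1.0 - (double)Levenshtein(query, term) / shorter;
    }

    public static bool IsMatch(string query, string term, double minSimilarity = 0.5) =>
        Similarity(query, term) > minSimilarity;
}
```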
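FuzzyQuery and QueryParser
Note how this differs from the slop notation of PhraseQuery. There, the ~ must be followed by a number, whereas for a fuzzy query the ~ follows the term directly.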
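    [Test]
    public void TestQueryParser()
    {
        IndexSingleFieldDocs(new Field[]
            {
                Field.Text("contents", "fuzzy"),
                Field.Text("contents", "wuzzy")
            });
        IndexSearcher searcher = new IndexSearcher(directory);
        // A trailing '~' with no number turns the term into a FuzzyQuery
        Query query = QueryParser.Parse("wuzza~", "contents", new SimpleAnalyzer());
        Hits hits = searcher.Search(query);
        Assert.AreEqual(2, hits.Length(), "both close enough");
    }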
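Because ~ means two different things, a tiny hypothetical parser helps keep them apart. Inside quotes, ~N sets a phrase slop, while a bare trailing ~ on a word requests a fuzzy match. `TildeSketch` handles only the forms used in this article and is not Lucene.Net's QueryParser.

```csharp
using System;
using System.Globalization;

// Hypothetical parser (not Lucene.Net's QueryParser) for the two uses of '~'.
public static class TildeSketch
{
    public static (string Kind, string Text, int Slop) Parse(string query)
    {
        if (query.StartsWith("\""))
        {
            // Phrase: "words" optionally followed by ~N, where N is the slop.
            int close = query.IndexOf('"', 1);
            if (close < 0) throw new FormatException("unterminated phrase");
            string phrase = query.Substring(1, close - 1);
            string rest = query.Substring(close + 1);
            int slop = 0;
            if (rest.StartsWith("~"))
                slop = int.Parse(rest.Substring(1), CultureInfo.InvariantCulture);
            return ("Phrase", phrase, slop);
        }
        // Single word with a bare trailing '~': fuzzy match (slop unused).
        if (query.EndsWith("~"))
            return ("Fuzzy", query.Substring(0, query.Length - 1), 0);
        return ("Term", query, 0);
    }
}
```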