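PhraseQuery uses term position information to run proximity queries. A TermQuery on "we" and "motherland", for example, returns every document that contains those terms. Sometimes, though, we want the two terms to be separated by only one or two words rather than sitting arbitrarily far apart, and that is what PhraseQuery is for. Take the following document: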
    doc.add(Field.Text("field", "the quick brown fox jumped over the lazy dog"));
Then:
    String[] phrase = new String[] {"quick", "fox"};
    assertFalse("exact phrase not found", matched(phrase, 0));
    assertTrue("close enough", matched(phrase, 1));
With more than two terms:
    assertFalse("not close enough", matched(new String[] {"quick", "jumped", "lazy"}, 3));
    assertTrue("just enough", matched(new String[] {"quick", "jumped", "lazy"}, 4));
    assertFalse("almost but not quite", matched(new String[] {"lazy", "jumped", "quick"}, 7));
    assertTrue("bingo", matched(new String[] {"lazy", "jumped", "quick"}, 8));

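The number is the slop, set as shown below. It is the maximum number of position moves allowed when rearranging the document's terms into the queried phrase, in the order the terms were added to the query.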
    query.setSlop(slop);

Order matters:
    String[] phrase = new String[] {"fox", "quick"};
    assertFalse("hop flop", matched(phrase, 2));
    assertTrue("hop hop slop", matched(phrase, 3));

The principle is illustrated in the figure below:


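For the query terms quick and fox, fox only has to move one position to match quick brown fox. For the query terms fox and quick, fox has to move three positions. The larger the distance moved, the lower the document's score, and the less likely the document is to appear near the top of the results.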
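The move counting described above can be modelled in a few lines of plain Java. This is only a sketch, not Lucene's implementation: the class `PhraseSlopSketch` and the method `requiredSlop` are hypothetical names, and the code assumes every phrase term occurs in the document and looks only at its first occurrence. It subtracts each term's offset in the phrase from its position in the document; an exact phrase puts all terms on the same shifted value, so the spread of the shifted values is the slop needed.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: models slop as the spread of shifted positions. */
final class PhraseSlopSketch {

    /**
     * Returns the smallest slop at which the phrase would match, assuming
     * each phrase term occurs in the document (first occurrence only).
     */
    static int requiredSlop(String document, String... phrase) {
        if (phrase.length == 0) {
            return 0;
        }
        List<String> tokens = Arrays.asList(document.split(" "));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < phrase.length; i++) {
            int position = tokens.indexOf(phrase[i]);
            if (position < 0) {
                throw new IllegalArgumentException("term not in document: " + phrase[i]);
            }
            // Shift the document position by the term's offset in the phrase.
            int shifted = position - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumped over the lazy dog";
        System.out.println(requiredSlop(doc, "quick", "fox"));            // 1
        System.out.println(requiredSlop(doc, "fox", "quick"));            // 3
        System.out.println(requiredSlop(doc, "quick", "jumped", "lazy")); // 4
        System.out.println(requiredSlop(doc, "lazy", "jumped", "quick")); // 8
    }
}
```

The four values agree with the thresholds in the assertions earlier in this post: each phrase fails at one less than the printed slop and matches at the printed slop.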

SpanQuery uses position information to build more interesting queries:

SpanQuery type    Description
SpanTermQuery     Used in conjunction with the other span query types. On its own,
                  it's functionally equivalent to TermQuery.
SpanFirstQuery    Matches spans that occur within the first part of a field.
SpanNearQuery     Matches spans that occur near one another.
SpanNotQuery      Matches spans that don't overlap one another.
SpanOrQuery       Aggregates matches of span queries.

SpanFirstQuery: To query for spans that occur within the first n positions of a field, use SpanFirstQuery.



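    // Span clauses on field "f", reused by the span examples in this post
    SpanTermQuery quick = new SpanTermQuery(new Term("f", "quick"));
    SpanTermQuery brown = new SpanTermQuery(new Term("f", "brown"));
    SpanTermQuery red = new SpanTermQuery(new Term("f", "red"));
    SpanTermQuery fox = new SpanTermQuery(new Term("f", "fox"));
    SpanTermQuery lazy = new SpanTermQuery(new Term("f", "lazy"));
    SpanTermQuery sleepy = new SpanTermQuery(new Term("f", "sleepy"));
    SpanTermQuery dog = new SpanTermQuery(new Term("f", "dog"));
    SpanTermQuery cat = new SpanTermQuery(new Term("f", "cat"));

    // The first 2 positions are not enough to reach "brown"
    SpanFirstQuery sfq = new SpanFirstQuery(brown, 2);
    assertNoMatches(sfq);
    // The first 3 positions are
    sfq = new SpanFirstQuery(brown, 3);
    assertOnlyBrownFox(sfq);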
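Why 2 fails and 3 succeeds becomes clearer when a span is treated as a half-open range of positions. The sketch below is plain Java, not Lucene code: `SpanFirstSketch` and `matchesFirst` are hypothetical names, and it assumes the indexed text is "the quick brown fox jumps over the lazy dog", where brown sits at position 2 (counting from 0) and therefore covers the range [2, 3). A span matches when its end does not exceed n.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: a single-term span within the first n positions. */
final class SpanFirstSketch {

    /** True if the first occurrence of term ends at or before position n. */
    static boolean matchesFirst(String document, String term, int n) {
        int start = Arrays.asList(document.split(" ")).indexOf(term);
        if (start < 0) {
            return false; // the term does not occur at all
        }
        int end = start + 1; // exclusive end of a one-term span
        return end <= n;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps over the lazy dog";
        System.out.println(matchesFirst(doc, "brown", 2)); // false
        System.out.println(matchesFirst(doc, "brown", 3)); // true
    }
}
```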

SpanNearQuery:

Spans that are near one another

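First, a note on PhraseQuery: it is not one of the span query classes, but it can do a similar job.

The terms in a matching document are usually adjacent. To allow for intermediate terms between the query terms in the original document, or to match the terms in reverse order, PhraseQuery provides a slop factor. The slop is the maximum distance allowed between the terms, but it is not a distance in the usual sense: it is the number of position moves needed to assemble the given phrase in order. In other words, PhraseQuery always measures against the order in which the terms were added to the query. With quick brown fox as the document, the phrase quick fox needs a slop of 1, because a single move is enough, whereas fox quick needs three moves, so its slop is 3.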

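Next, consider SpanTermQuery, a subclass of SpanQuery, used as a clause inside a SpanNearQuery.

A SpanNearQuery also matches terms within a given distance, but the terms do not have to appear in the order given: a separate flag states whether they must match in order or may also match in reverse. The slop here is not a count of position moves either. It is measured over the range from the start of the first span to the end of the last span, counting the positions that the matched terms do not occupy.

Using SpanTermQuery objects as the SpanQuery clauses of a SpanNearQuery gives results very similar to a PhraseQuery. The third argument of the SpanNearQuery constructor is the inOrder flag: true requires the clauses to match in the order given, while false also accepts them in reverse order.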

For example, take the document: the quick brown fox jumps over the lazy dog

    public void testSpanNearQuery() throws Exception {
        SpanQuery[] quick_brown_dog = new SpanQuery[] {quick, brown, dog};

        // In order, slop 0: the three terms are not adjacent
        SpanNearQuery snq = new SpanNearQuery(quick_brown_dog, 0, true);
        assertNoMatches(snq);

        // In order, slop 4: still not enough
        snq = new SpanNearQuery(quick_brown_dog, 4, true);
        assertNoMatches(snq);

        // In order, slop 5: matches
        snq = new SpanNearQuery(quick_brown_dog, 5, true);
        assertOnlyBrownFox(snq);

        // Any order, slop 3, two terms: matches although lazy follows fox
        snq = new SpanNearQuery(new SpanQuery[] {lazy, fox}, 3, false);
        assertOnlyBrownFox(snq);

        // The same two terms as a PhraseQuery: the phrase order is fixed,
        // so lazy followed by fox needs a slop of 5
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("f", "lazy"));
        pq.add(new Term("f", "fox"));
        pq.setSlop(4);
        assertNoMatches(pq); // slop 4 does not match

        pq.setSlop(5);
        assertOnlyBrownFox(pq); // slop 5 matches
    }
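The slop values in the test above can be reproduced with a small plain-Java model. It is a sketch under stated assumptions rather than Lucene's algorithm: `SpanNearSketch` and `matchesNear` are hypothetical names, every clause is a single term, and only the first occurrence of each term is used. The model takes the range from the first matched term to the last, subtracts the positions the terms themselves occupy, and compares the remaining gap count with the slop. With inOrder set, the terms must also appear in the order given.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: a simplified model of SpanNearQuery matching. */
final class SpanNearSketch {

    static boolean matchesNear(String document, int slop, boolean inOrder, String... terms) {
        if (terms.length == 0) {
            return false;
        }
        List<String> tokens = Arrays.asList(document.split(" "));
        int first = Integer.MAX_VALUE;
        int last = Integer.MIN_VALUE;
        int previous = -1;
        for (String term : terms) {
            int position = tokens.indexOf(term);
            if (position < 0) {
                return false; // a clause has no match in this document
            }
            if (inOrder && position <= previous) {
                return false; // clauses must appear in the order given
            }
            previous = position;
            first = Math.min(first, position);
            last = Math.max(last, position);
        }
        // Positions inside the enclosing span that no matched term occupies
        int gaps = (last + 1 - first) - terms.length;
        return gaps <= slop;
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox jumps over the lazy dog";
        System.out.println(matchesNear(doc, 4, true, "quick", "brown", "dog")); // false
        System.out.println(matchesNear(doc, 5, true, "quick", "brown", "dog")); // true
        System.out.println(matchesNear(doc, 3, false, "lazy", "fox"));          // true
        System.out.println(matchesNear(doc, 3, true, "lazy", "fox"));           // false
    }
}
```

Under this model lazy and fox match out of order with a slop of 3, because only three positions separate them, while the PhraseQuery on the same two terms needs 5 because it also has to undo the reversal.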


3. PhrasePrefixQuery is mainly used for synonym-style queries, where several alternative terms are accepted at one position of the phrase:
    IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
    Document doc1 = new Document();
    doc1.add(Field.Text("field", "the quick brown fox jumped over the lazy dog"));
    writer.addDocument(doc1);
    Document doc2 = new Document();
    doc2.add(Field.Text("field","the fast fox hopped over the hound"));
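    writer.addDocument(doc2);
    writer.close(); // finish indexing before searching
    IndexSearcher searcher = new IndexSearcher(directory);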

    PhrasePrefixQuery query = new PhrasePrefixQuery();
    query.add(new Term[] {new Term("field", "quick"), new Term("field", "fast")});
    query.add(new Term("field", "fox"));

    Hits hits = searcher.search(query);
    assertEquals("fast fox match", 1, hits.length());
    query.setSlop(1);
    hits = searcher.search(query);
    assertEquals("both match", 2, hits.length());
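The two hit counts follow from the same move counting that PhraseQuery uses, with a set of alternatives at the first position. The plain-Java sketch below models this; `MultiTermPhraseSketch` and `matches` are hypothetical names, not Lucene API, and the code simply takes the first alternative that occurs in the document, which is enough for these two documents but is a simplification.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: a phrase with alternative terms per position. */
final class MultiTermPhraseSketch {

    /** alternatives[i] lists the terms accepted at offset i of the phrase. */
    static boolean matches(String document, int slop, String[]... alternatives) {
        if (alternatives.length == 0) {
            return false;
        }
        List<String> tokens = Arrays.asList(document.split(" "));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < alternatives.length; i++) {
            int position = -1;
            for (String term : alternatives[i]) {
                position = tokens.indexOf(term);
                if (position >= 0) {
                    break; // simplification: take the first alternative that occurs
                }
            }
            if (position < 0) {
                return false; // none of the alternatives occurs
            }
            int shifted = position - i; // same shift as for a plain phrase
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min <= slop;
    }

    public static void main(String[] args) {
        String doc1 = "the quick brown fox jumped over the lazy dog";
        String doc2 = "the fast fox hopped over the hound";
        String[] first = {"quick", "fast"};
        String[] second = {"fox"};
        System.out.println(matches(doc1, 0, first, second)); // false
        System.out.println(matches(doc2, 0, first, second)); // true
        System.out.println(matches(doc1, 1, first, second)); // true
    }
}
```

With a slop of 0 only "fast fox" is adjacent, which gives one hit. With a slop of 1 the word brown between quick and fox is tolerated, so both documents match.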