Elasticsearch: implementing OR with should and setting minimum_should_match


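To get the OR of a traditional database in Elasticsearch, you use the should keyword under bool. For A or B, a hit should match at least one of A and B. The following query, however, returns not only documents that match at least one of A and B, but also documents that match neither: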

 {
   "query": {
     "bool": {
       "filter": [
           {"range": {"date.keyword": {"gt": "20170101", "lt": "20170201"}}}
       ],
       "should": [
           {"term": {"A.keyword": "0000000000"}},
           {"term": {"B.keyword": "0000000001"}}
       ]
     }
   }
 }

The official Elasticsearch documentation describes should as follows:

should

The clause (query) should appear in the matching document. If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score. If the bool query is a filter context or has neither must or filter then at least one of the should queries must match a document for it to match the bool query. This behavior may be explicitly controlled by settings the minimum_should_match parameter.

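In other words: if the bool of a query contains a filter or must clause in addition to should, a document can match without satisfying any of the should clauses. The should clauses then only affect the score, so a document that matches none of them is still returned, just with _score=0. For the query above, hits().total() is therefore the same with or without the should clauses; only the scores differ.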
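The rule quoted above can be sketched as a small Python model. This is only an illustration of the documented behavior, not Elasticsearch code: the function name `bool_query_matches` and its arguments are hypothetical, each clause is reduced to a True/False outcome for a single document, the bool query is assumed to have at least one should clause, and the filter-context case from the quote is left out.

```python
from typing import Optional, Sequence


def bool_query_matches(
    required: Sequence[bool],
    should: Sequence[bool],
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Decide whether one document matches a simplified bool query.

    required: outcome of every must/filter clause for this document.
    should:   outcome of every should clause for this document.
    """
    # Every must/filter clause has to match.
    if not all(required):
        return False
    if minimum_should_match is None:
        # Default described in the quoted documentation (query context):
        # 0 when a must or filter clause is present, otherwise 1.
        minimum_should_match = 0 if required else 1
    # Count the should clauses that matched and compare with the minimum.
    return sum(should) >= minimum_should_match


# The filter matches, A and B both miss: the document is still a hit.
print(bool_query_matches([True], [False, False]))
# With minimum_should_match=1 the same document is rejected.
print(bool_query_matches([True], [False, False], minimum_should_match=1))
```

The first call models the problem query: with a filter present, the default minimum is 0, so neither A nor B has to match.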

There are two ways to get the OR behavior of a traditional database:

  1. Wrap the should clauses in a nested bool under must, and put that must alongside filter
    {
      "query": {
        "bool": {
          "filter": [
              {"range": {"date.keyword": {"gt": "20170101", "lt": "20170201"}}}
          ],
          "must": [
              {
                "bool": {
                  "should": [
                      {"term": {"A.keyword": "0000000000"}},
                      {"term": {"B.keyword": "0000000001"}}
                  ]
                }
              }
          ]
        }
      }
    }

  2. Use the minimum_should_match parameter described in the official documentation

| Type | Example | Description |
|------|---------|-------------|
| Integer | 3 | Indicates a fixed value regardless of the number of optional clauses. |
| Negative integer | -2 | Indicates that the total number of optional clauses, minus this number should be mandatory. |
| Percentage | 75% | Indicates that this percent of the total number of optional clauses are necessary. The number computed from the percentage is rounded down and used as the minimum. |
| Negative percentage | -25% | Indicates that this percent of the total number of optional clauses can be missing. The number computed from the percentage is rounded down, before being subtracted from the total to determine the minimum. |
| Combination | 3<90% | A positive integer, followed by the less-than symbol, followed by any of the previously mentioned specifiers is a conditional specification. It indicates that if the number of optional clauses is equal to (or less than) the integer, they are all required, but if it's greater than the integer, the specification applies. In this example: if there are 1 to 3 clauses they are all required, but for 4 or more clauses only 90% are required. |
| Multiple combinations | 2<-25% 9<-3 | Multiple conditional specifications can be separated by spaces, each one only being valid for numbers greater than the one before it. In this example: if there are 1 or 2 clauses both are required, if there are 3-9 clauses all but 25% are required, and if there are more than 9 clauses, all but three are required. |
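To make the rules in the table concrete, here is a rough Python sketch that turns a minimum_should_match value into the number of should clauses that must match. It follows the descriptions in the table only; it is not the Elasticsearch implementation, the name `min_should_match` is hypothetical, and flooring a negative result at 0 is an assumption of this sketch.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Resolve a minimum_should_match spec for `clause_count` should clauses."""
    spec = spec.strip()

    if "<" in spec:
        # Conditional spec(s), e.g. "3<90%" or "2<-25% 9<-3".
        result = clause_count  # at or below the first bound, all are required
        for part in spec.split():
            bound_text, inner = part.split("<", 1)
            if clause_count <= int(bound_text):
                return result
            # Above this bound: the inner spec applies (unless a later,
            # larger bound is exceeded as well).
            result = min_should_match(clause_count, inner)
        return result

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Integer division rounds the computed number down.
        amount = clause_count * abs(percent) // 100
        result = clause_count - amount if percent < 0 else amount
    else:
        value = int(spec)
        # A negative integer means "all but this many".
        result = clause_count + value if value < 0 else value

    # Assumption of this sketch: never return a negative minimum.
    return max(0, result)


for count, spec in [(4, "75%"), (5, "-25%"), (4, "3<90%"), (10, "2<-25% 9<-3")]:
    print(count, spec, "->", min_should_match(count, spec))
```

For the two-clause A or B query discussed here, a value of 1 simply resolves to 1: at least one of the two term clauses has to match.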

minimum_should_match sets the minimum number of should clauses that must match. With minimum_should_match=1, at least one of the should clauses has to be satisfied. The query becomes:

{
  "query": {
    "bool": {
      "filter": [
          {"range": {"date.keyword": {"gt": "20170101", "lt": "20170201"}}}
      ],
      "should": [
          {"term": {"A.keyword": "0000000000"}},
          {"term": {"B.keyword": "0000000001"}}
      ],
      "minimum_should_match": 1
    }
  }
}

Both approaches return the same results.

minimum_should_match also accepts many other forms of value. This post explains them clearly:

http://blog.csdn.net/xiao_jun_0820/article/details/51095521
