maxBooleanClauses behavior in Solr 7.x vs. 8.x
This article describes the change to the boolean query clause limit introduced in Solr 8.1. It is useful if you plan to upgrade from 7.x to 8.x.
What Changed?
Starting with Solr 8.1, the number of clauses in a boolean query (Standard Query Parser) is restricted by a global limit. This limit is set by an entry in solr.xml:
<int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
In Solr 7.x, this limit was controlled by the following entry in solrconfig.xml:
<maxBooleanClauses>1024</maxBooleanClauses>
Starting with Solr 8.1, however, a limit set in solrconfig.xml is not honored if it exceeds the global limit defined in solr.xml (1024 by default).
Reason for the change
Changes made in Solr 7.0 set the effective value of BooleanQuery.getMaxClauseCount to Integer.MAX_VALUE-1 and imposed a restriction based on the (existing) solrconfig.xml setting at the Solr query parser level via a new utility helper method.
As a result, programmatically generated queries (whether built by low-level Lucene methods or by query rewriting) no longer had any safety valve to prevent effectively unbounded expansion.
This issue was fixed by:
- Restoring a default upper bound of 1024 on BooleanQuery.getMaxClauseCount
- Introducing a new solr.xml level setting for configuring this upper bound:
<int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
Note: This solr.xml limit is a hard upper bound that supersedes the existing solrconfig.xml setting. The solrconfig.xml setting has been left in place and still limits the size of user-specified boolean queries.
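The interaction between the two settings can be sketched as a simple minimum. The snippet below is only an illustrative model of the behavior described in this article, not Solr code; the function name effective_clause_limit and its parameters are hypothetical.

```python
def effective_clause_limit(solrconfig_limit: int, solr_xml_limit: int = 1024) -> int:
    """Model of the Solr 8.1+ behavior described above (not Solr code).

    solr_xml_limit is the global hard upper bound from solr.xml;
    solrconfig_limit is the per-core value from solrconfig.xml.
    The lower of the two is what a user-specified boolean query can reach.
    """
    return min(solrconfig_limit, solr_xml_limit)


# solrconfig.xml asks for 4096, but solr.xml still holds the default of 1024.
print(effective_clause_limit(4096))        # 1024
# Raising the solr.xml bound lets the solrconfig.xml value take effect.
print(effective_clause_limit(4096, 8192))  # 4096
# A lower solrconfig.xml value still restricts user queries.
print(effective_clause_limit(512, 8192))   # 512
```

In this model, raising only the solrconfig.xml value after an upgrade has no effect until the solr.xml bound is raised as well.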
Solution: If you use boolean queries and your use case needs a limit higher than 1024, increase the limit in the solr.xml file.
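The solr.xml entry uses the placeholder form ${solr.max.booleanClauses:1024}, which reads as "use the solr.max.booleanClauses property if it is set, otherwise 1024". One way to raise the limit is therefore to supply that property at startup (for example as a JVM system property such as -Dsolr.max.booleanClauses=2048), but verify this against your own deployment. The sketch below only mimics that lookup so the fallback behavior is easy to see; the resolve function is hypothetical and is not how Solr implements substitution.

```python
import re

# Matches a whole value of the form ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"^\$\{([^:}]+)(?::([^}]*))?\}$")


def resolve(value: str, props: dict) -> str:
    """Resolve a ${name:default} placeholder against a dict of properties."""
    match = _PLACEHOLDER.match(value.strip())
    if match is None:
        return value  # a literal value such as "1024" is returned unchanged
    name, default = match.group(1), match.group(2)
    if name in props:
        return props[name]
    if default is None:
        raise KeyError(f"property {name!r} is not set and has no default")
    return default


entry = "${solr.max.booleanClauses:1024}"
print(resolve(entry, {}))                                    # 1024
print(resolve(entry, {"solr.max.booleanClauses": "2048"}))   # 2048
```

Alternatively, the placeholder in solr.xml can be replaced with a literal number, at the cost of losing the ability to override it per environment.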
Recommendation
Explore using the Terms Query Parser instead of boolean queries.
TermsQParser functions similarly to the Term Query Parser, but takes multiple comma-separated values and returns documents matching any of the specified values.
This can be useful for generating filter queries from the external, human-readable terms returned by the faceting or terms components. In some cases it may also be more efficient than using the Standard Query Parser to generate a boolean query, since the default implementation method avoids scoring.
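As a rough illustration of the difference, the sketch below builds the same "match any of these values" filter in both forms. The field name id and the sample values are assumptions made for the example, and the helper functions are hypothetical; only the {!terms f=...} local-params syntax comes from Solr itself.

```python
def boolean_or_query(field: str, values: list) -> str:
    """Standard Query Parser form: one boolean clause per value."""
    return " OR ".join(f"{field}:{value}" for value in values)


def terms_filter_query(field: str, values: list) -> str:
    """Terms Query Parser form: a single comma-separated list of values.

    Assumes no value contains a comma, since a comma separates the values.
    """
    return "{!terms f=" + field + "}" + ",".join(values)


ids = ["doc1", "doc2", "doc3"]
print(boolean_or_query("id", ids))    # id:doc1 OR id:doc2 OR id:doc3
print(terms_filter_query("id", ids))  # {!terms f=id}doc1,doc2,doc3
```

The boolean form grows by one clause per value, so a long list of values counts directly against maxBooleanClauses, whereas the terms form passes the whole list to a single parser. Either string would typically be sent as an fq parameter.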