Option for case insensitive search at runtime #61162

markharwood · 2020-08-14T17:03:51Z

This meta issue tracks the changes needed to offer a "case insensitive" option on the term-level queries (term, terms, prefix, wildcard, regexp) at search time. It replaces the previous #53603, where the discussion had meandered.
In the query DSL we will offer a new case_insensitive flag, which can only be set to true to enable the new behaviour. When the flag is left unset, the existing behaviour is used, which is inconsistent: keyword fields with normalizers normalize query terms while text fields do not. Because of these inconsistencies and the lack of guarantees, setting the case_insensitive flag to false will throw an error.
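
The three states of the flag described above (unset, true, explicit false) could be sketched roughly as follows. This is only an illustration in plain Java: the class name CaseInsensitiveFlag, the resolve method and the error message are hypothetical and are not Elasticsearch code; only the case_insensitive key comes from the proposal.

```java
import java.util.Map;

/** Hypothetical sketch of the proposed flag handling; not Elasticsearch code. */
final class CaseInsensitiveFlag {

    /**
     * Returns true when case-insensitive matching was requested and false when
     * the flag is unset (the existing, field-dependent behaviour then applies).
     * An explicit false is rejected, as the proposal above describes.
     */
    static boolean resolve(Map<String, Object> queryBody) {
        Object raw = queryBody.get("case_insensitive");
        if (raw == null) {
            return false; // unset: keep the existing behaviour
        }
        if (Boolean.TRUE.equals(raw)) {
            return true; // the only value the proposal accepts
        }
        throw new IllegalArgumentException(
            "[case_insensitive] can only be set to true, got [" + raw + "]");
    }

    public static void main(String[] args) {
        Map<String, Object> withFlag = Map.of("value", "Foo", "case_insensitive", true);
        Map<String, Object> withoutFlag = Map.of("value", "Foo");
        System.out.println(resolve(withFlag));    // prints true
        System.out.println(resolve(withoutFlag)); // prints false
    }
}
```

Treating "unset" and "false" differently is what lets the existing inconsistent behaviour stay the default without anyone being able to ask for it explicitly.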

Tasks

Add case insensitive option to Lucene's RegExpQuery - merged: "RegExp - add case insensitive matching option" apache/lucene-solr#1541
regexp queries in Elasticsearch DSL - issue / PR: "Search - add case insensitive support for regex queries." #59441
Lucene query_string support for /Foo/i regex syntax - PR: "LUCENE-9445 Add support for case insensitive regex searches in QueryParser" apache/lucene-solr#1708
term queries in Elasticsearch DSL - issue: "Search - add case insensitive flag to term, prefix and wildcard queries" #61546
terms queries in Elasticsearch DSL
prefix queries in Elasticsearch DSL - issue: "Search - add case insensitive flag to term, prefix and wildcard queries" #61546
wildcard queries in Elasticsearch DSL - issue: "Search - add case insensitive flag to term, prefix and wildcard queries" #61546

elasticmachine · 2020-08-14T17:03:53Z

Pinging @elastic/es-search (:Search/Search)

markharwood · 2020-08-26T10:14:08Z

@jimczi / @jpountz - shall I open Lucene issues to add a case insensitive param to TermQuery, PrefixQuery and WildcardQuery, or create case insensitive variants in Elasticsearch?

jpountz · 2020-08-26T13:04:03Z

I wonder if we really need new queries or if we could reuse AutomatonQuery to build case-insensitive variants?

markharwood · 2020-08-26T13:26:27Z

> I wonder if we really need new queries or if we could reuse AutomatonQuery to build case-insensitive variants?

PrefixQuery and WildcardQuery already reuse AutomatonQuery: they're lightweight subclasses that take args and implement a toAutomaton() function. The case insensitive options we want to add can either be added as an "if" statement in core Lucene, or we can fork those classes as something like this:

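import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.automaton.Automata;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.MinimizationOperations;
import org.apache.lucene.util.automaton.Operations;

// CaseInsensitiveAutomatonQuery is the base class proposed in this comment.
public class CaseInsensitivePrefixQuery extends CaseInsensitiveAutomatonQuery {

    /** Constructs a case insensitive query for terms starting with <code>prefix</code>. */
    public CaseInsensitivePrefixQuery(Term prefix) {
        super(prefix, toAutomaton(prefix.bytes()), Integer.MAX_VALUE, true);
    }

    /** Build an automaton accepting all terms with the specified prefix, case insensitive. */
    public static Automaton toAutomaton(BytesRef prefix) {
        if (prefix == null) {
            throw new NullPointerException("prefix must not be null");
        }
        // One small automaton per code point of the prefix, built by the base class helper.
        List<Automaton> list = new ArrayList<>();
        String s = prefix.utf8ToString();
        Iterator<Integer> iter = s.codePoints().iterator();
        while (iter.hasNext()) {
            list.add(toCaseInsensitiveChar(iter.next(), Integer.MAX_VALUE));
        }
        // Any suffix may follow the prefix.
        list.add(Automata.makeAnyString());

        Automaton a = Operations.concatenate(list);
        a = MinimizationOperations.minimize(a, Integer.MAX_VALUE);
        return a;
    }
}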

The CaseInsensitiveAutomatonQuery base class proposed above offers a toCaseInsensitiveChar helper function to create [Ff][Oo][Oo]-type sequences from the input foo.
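
To make the [Ff][Oo][Oo] idea concrete without depending on Lucene, here is a rough stand-alone sketch that builds the equivalent java.util.regex pattern instead of an automaton. The class name CaseInsensitivePrefixSketch and the toPrefixPattern method are made up for illustration, and this toCaseInsensitiveChar returns a String rather than an Automaton, so it is an analogue of the proposed helper rather than its actual implementation. It assumes simple one-to-one case mappings as given by Character.toUpperCase and Character.toLowerCase.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative regex analogue of the proposed automaton helper; not Lucene code. */
final class CaseInsensitivePrefixSketch {

    /** Expands one code point into a class such as [Ff]; caseless code points are quoted as-is. */
    static String toCaseInsensitiveChar(int codePoint) {
        Set<Integer> variants = new LinkedHashSet<>();
        variants.add(Character.toUpperCase(codePoint));
        variants.add(Character.toLowerCase(codePoint));
        variants.add(codePoint);
        if (variants.size() == 1) {
            // No case variants (digits, punctuation, ...): match the literal only.
            return Pattern.quote(new StringBuilder().appendCodePoint(codePoint).toString());
        }
        StringBuilder sb = new StringBuilder("[");
        for (int variant : variants) {
            sb.appendCodePoint(variant);
        }
        return sb.append(']').toString();
    }

    /** Concatenates the per-character classes and allows any suffix, like makeAnyString(). */
    static String toPrefixPattern(String prefix) {
        StringBuilder sb = new StringBuilder();
        prefix.codePoints().forEach(cp -> sb.append(toCaseInsensitiveChar(cp)));
        return sb.append(".*").toString();
    }

    public static void main(String[] args) {
        String pattern = toPrefixPattern("foo");
        System.out.println(pattern); // prints [Ff][Oo][Oo].*
        // DOTALL so that ".*" also accepts line terminators in the suffix.
        Pattern compiled = Pattern.compile(pattern, Pattern.DOTALL);
        System.out.println(compiled.matcher("FOObar").matches()); // prints true
        System.out.println(compiled.matcher("fo").matches());     // prints false
    }
}
```

The automaton version in the comment above follows the same shape: one small alternative per character, concatenated, followed by "any string".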

mbudge · 2020-10-12T15:47:11Z

The problem we have with the Beats templates is that they explicitly set each field, which takes priority over any settings applied through dynamic templates. In event management and incident response, we need case insensitive search to mitigate the risk of important events being missed because keywords are case sensitive and data is collected from many different systems on the network. We would need to write a lot of code to do the string normalisation for every field in every parser/Logstash.

Instead we have a Python script which adds the lowercase normaliser to every field in the Beats template. But this means we have to run the template through the script every time a new version is released. With Elastic moving Beats/template management to Fleet, and enrichment moving from JavaScript to ingest pipelines, we would still have to run each template through the Python script to add the lowercase normaliser.

We would be happy with an index-level setting which adds the lowercase normaliser to every field when the index is created. That way teams who want to lowercase all keywords can apply this setting once in the index settings, and use KQL to do case-insensitive search without needing to add multi-fields.
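
For readers unfamiliar with the workaround, the template rewrite described above amounts to something like the following sketch. It is written in Java to match the other code in this thread rather than the Python of our actual script, which is not shown here. The class and method names are hypothetical, the mapping is assumed to be already parsed into nested mutable maps, and the normaliser name passed in is assumed to be defined for the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the template rewrite; not the actual script. */
final class LowercaseNormalizerSketch {

    /**
     * Walks a parsed "properties" section and adds the given normalizer to every
     * keyword field that does not already declare one. Object fields are followed
     * through their nested "properties"; multi-fields are left alone in this sketch.
     */
    @SuppressWarnings("unchecked")
    static void addNormalizer(Map<String, Object> properties, String normalizerName) {
        for (Object value : properties.values()) {
            if (!(value instanceof Map)) {
                continue;
            }
            Map<String, Object> field = (Map<String, Object>) value;
            if ("keyword".equals(field.get("type")) && !field.containsKey("normalizer")) {
                field.put("normalizer", normalizerName);
            }
            Object nested = field.get("properties");
            if (nested instanceof Map) {
                addNormalizer((Map<String, Object>) nested, normalizerName);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> hostname = new LinkedHashMap<>();
        hostname.put("type", "keyword");
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("hostname", hostname);

        addNormalizer(properties, "lowercase");
        System.out.println(hostname.get("normalizer")); // prints lowercase
    }
}
```

Having to repeat this for every template on every release is the maintenance cost an index-level setting would remove.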

markharwood · 2021-01-11T15:28:56Z

Closing as complete because there were issues with the 2 remaining tasks:

Query string case insensitive regex - there was no clean way to add the /i syntax to Lucene in a backwards compatible way.

Terms query - concerns over query complexity explosion and performance meant

markharwood added the >enhancement, :Search/Search and v8.0.0 labels Aug 14, 2020

markharwood self-assigned this Aug 14, 2020

elasticmachine added the Team:Search label Aug 14, 2020

markharwood mentioned this issue Aug 14, 2020

Support case insensitive search on new wildcard field and keyword #53603

Closed

jimczi added the Meta label Aug 14, 2020

jimczi changed the title from "Meta issue - option for case insensitive search at runtime" to "Option for case insensitive search at runtime" Aug 14, 2020

jimczi removed the v8.0.0 label Aug 14, 2020

markharwood mentioned this issue Aug 25, 2020

Search - add case insensitive flag to term, prefix and wildcard queries #61546

Closed

costin mentioned this issue Sep 2, 2020

EQL: Revisit case insensitivity #61883

Closed

markharwood closed this as completed Jan 11, 2021

davendu mentioned this issue Nov 26, 2021

support case_insensitive in terms query #71520

Open

thomaslow mentioned this issue Sep 26, 2022

Extended Search: option to ignore upper and lower case kitodo/kitodo-production#4980

Closed
