Cannot create analyzer that has more than 2 synonym_graph filters #111507

cipher450 · 2024-08-01T10:23:07Z

Elasticsearch Version

8.11.4

Installed Plugins

No response

Java Version

bundled

OS Version

Linux fedora 6.9.7-100.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 27 18:06:32 UTC 2024 x86_64 GNU/Linux

Problem Description

While switching my filters from the synonym type to synonym_graph, I tried to create a test index in the Kibana Dev Tools by using PUT with the mapping and settings shown under Steps to Reproduce. The request threw no error; it just timed out after a while and left an unusable index. When I then removed one of the filters from the analyzer, the index was created without problems. Keep in mind that this exact mapping and settings worked fine with the synonym type.

Steps to Reproduce

Use the following to create a test index :

{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter_ar": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_ar"
        },
        "synonyms_filter_fr": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_fr"
        },
        "synonyms_filter_en": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_en"
        }
      },
      "analyzer": {
        "synonyms_analyzer": {
          "filter": [
            "synonyms_filter_fr",
            "synonyms_filter_ar",
            "synonyms_filter_en"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "synonyme": {
            "type": "text",
            "analyzer": "default",
            "search_analyzer": "synonyms_analyzer"
          }
        }
      }
    }
  }
}

Logs (if relevant)

No response


elasticsearchmachine · 2024-08-01T16:09:12Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

carlosdelest · 2024-08-01T16:53:03Z

Hi @cipher450 :

It's possible that chaining the synonym graph filters together caused some interaction that ended up in invalid synonyms. That would mean that the index is not usable when reopened.

Did you get any error log messages in Elasticsearch? Having invalid synonyms should log the error to your Elasticsearch log.

In case no error is logged, could you provide your synonym set rules to try to reproduce the issue?

Thanks!

cipher450 · 2024-08-01T18:46:42Z

Hi @carlosdelest,

You were right: chaining the filters did produce some invalid synonyms. I don't know which rule was responsible, since I have deleted and added rules over time. After you mentioned this I tried again, and this time the index was created. Thanks for the reply!
However, now I'm concerned about this kind of issue occurring in production, where I plan to deploy a system with CRUD operations for managing synonym rules. How can I prevent such problems from happening?

carlosdelest · 2024-08-02T08:45:20Z

> However, now I'm concerned about this kind of issue occurring in production, where I plan to deploy a system with CRUD operations for managing synonym rules. How can I prevent such problems from happening?

Updating a synonym set will try to reload analyzers. If reloading fails, the update response will include the error from the reload analyzers step. When analyzers fail to reload, the change is not applied to the analyzer, and the index stays in green status until it is reopened (by closing and opening it, or by a node restart). That gives you the chance to revert your change, or to edit the synonym set further to fix the offending rule.
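As a rough illustration of the kind of update described above, the body below could be sent with PUT _synonyms/synonyms_fr (the synonyms_fr set name comes from the settings in this issue). The rule id and the synonym pair are made up for the example; this is a sketch, not the reporter's actual rules.

```json
{
  "synonyms_set": [
    {
      "id": "example-rule-1",
      "synonyms": "voiture, automobile"
    }
  ]
}
```

If the analyzers cannot be reloaded with the new rules, the error should show up in the response to this request, which is the point at which a CRUD layer can reject or roll back the change.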

Ways to prevent this issue before it happens:

- Use lenient: true in your synonym graph token filter to ignore rules that result in an error. This will prevent your indices from becoming red, but you will need to check the Elasticsearch log to ensure there are no errors in the synonyms.
- Have a staging environment where you apply these changes first to validate them.
- Have a copy of the indices (no data is needed on them) that references a copy of the synonym sets. Apply the changes to the synonym sets copy first to ensure the analyzers are reloaded correctly, and then update the original synonym sets used in your production indices.

We're working on a fix for this issue that will come in 8.16: basically, we will make lenient: true the default behaviour for reloadable analyzers.
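On versions before that default change, the setting can be spelled out explicitly. A minimal sketch, reusing one of the filter definitions from the reproduction steps with lenient added; the other two filters would take the same extra line:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter_fr": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_fr",
          "lenient": true
        }
      }
    }
  }
}
```

With lenient enabled, rules that fail to parse are skipped rather than failing the analyzer, so the Elasticsearch log remains the place to look for rules that were silently dropped.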

cipher450 · 2024-08-02T13:37:04Z

> Use lenient: true in your synonym graph token filter to ignore rules that result in an error. This will prevent your indices becoming red, but you will need to check the Elasticsearch log to ensure there are no errors in the synonyms.
>
> Have a staging environment where you apply these changes first to validate them.

This sounds good to me, I'll do just that. Once again, thank you for your answer!

cipher450 added the >bug and needs:triage labels Aug 1, 2024

Mikep86 added the :Search Relevance/Analysis and v8.11.4 labels and removed the needs:triage label Aug 1, 2024

elasticsearchmachine added the Team:Search Relevance label Aug 1, 2024

cipher450 closed this as completed Aug 2, 2024
