Cannot create analyzer that has more than 2 synonym_graph filters #111507

Closed
cipher450 opened this issue Aug 1, 2024 · 5 comments
Labels
>bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.11.4

Comments

@cipher450

Elasticsearch Version

8.11.4

Installed Plugins

No response

Java Version

bundled

OS Version

Linux fedora 6.9.7-100.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 27 18:06:32 UTC 2024 x86_64 GNU/Linux

Problem Description

While trying to switch from the synonym type to synonym_graph, I used the settings below. I first tried to create a test index in the Kibana Dev Tools with a PUT request, using the mapping and settings found in Steps to Reproduce. That ended up in an unusable index with no error thrown, just a timeout after a while. I then removed one of the filters from the analyzer, and the request worked fine and created my index. Keep in mind that this exact mapping/settings worked fine with the synonym type.

Steps to Reproduce

Use the following to create a test index :

{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter_ar": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_ar"
        },
        "synonyms_filter_fr": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_fr"
        },
        "synonyms_filter_en": {
          "type": "synonym_graph",
          "updateable": true,
          "synonyms_set": "synonyms_en"
        }
      },
      "analyzer": {
        "synonyms_analyzer": {
          "filter": [
            "synonyms_filter_fr",
            "synonyms_filter_ar",
            "synonyms_filter_en"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "synonyme": {
            "type": "text",
            "analyzer": "default",
            "search_analyzer": "synonyms_analyzer"
          }
        }
      }
    }
  }
}

Logs (if relevant)

No response

@cipher450 cipher450 added >bug needs:triage Requires assignment of a team area label labels Aug 1, 2024
@Mikep86 Mikep86 added :Search Relevance/Analysis How text is split into tokens v8.11.4 and removed needs:triage Requires assignment of a team area label labels Aug 1, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Aug 1, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@carlosdelest
Member

Hi @cipher450 :

It's possible that chaining the synonym graph filters together caused some interaction that ended up in invalid synonyms. That would mean that the index is not usable when reopened.

Did you get any error log messages in Elasticsearch? Having invalid synonyms should log the error to your Elasticsearch log.

In case no error is logged, could you provide your synonym set rules to try to reproduce the issue?

Thanks!

@cipher450
Author

cipher450 commented Aug 1, 2024

Hi @carlosdelest,

You were right, it was indeed the chaining of the filters that caused some invalid synonyms. I do not know which rule it was, since over time I deleted and added rules. I tried again after you mentioned this, and it did create the index. Thanks for the reply!
However, now I'm concerned about this kind of issue occurring in production, where I plan to deploy a system with CRUD operations for managing synonym rules. How can I prevent such problems from happening?

@carlosdelest
Member

However, now I'm concerned about this kind of issue occurring in production, where I plan to deploy a system with CRUD operations for managing synonym rules. How can I prevent such problems from happening?

Updating a synonym set triggers an analyzer reload. If the reload fails, the update response includes the reload analyzers error. In that case the change is not applied to the analyzer, and the index stays green until it is reopened (by closing and opening it, or by a node restart), so you have time to revert your change or fix the offending rules in the synonym set.
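
A minimal Python sketch of checking a synonym set update response for reload problems. The response shape used here (a reload_analyzers_details object with a _shards.failed count and an optional failures list, or a top-level error object) is an assumption and should be verified against the response your Elasticsearch version actually returns. reload_problems is a hypothetical helper, not part of any client library.

```python
from typing import Any, List, Mapping


def reload_problems(update_response: Mapping[str, Any]) -> List[str]:
    """Collect problems reported in a parsed synonym set update response.

    Assumed shape: {"result": ..., "reload_analyzers_details":
    {"_shards": {"total": n, "successful": n, "failed": n}, ...}}
    """
    problems: List[str] = []

    # A rejected request is assumed to carry a top-level "error" object.
    if "error" in update_response:
        problems.append(f"request failed: {update_response['error']}")
        return problems

    details = update_response.get("reload_analyzers_details")
    if details is None:
        problems.append("no reload_analyzers_details in response")
        return problems

    shards = details.get("_shards", {})
    failed = shards.get("failed", 0)
    if failed:
        problems.append(f"{failed} shard(s) failed to reload analyzers")
    for failure in shards.get("failures", []):
        problems.append(f"shard failure: {failure}")
    return problems
```

If the returned list is non-empty, the synonym set can be reverted or corrected before any index is reopened.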

Ways to prevent this issue before it happens:

  • Use lenient: true in your synonym graph token filter to ignore rules that result in an error. This will prevent your indices becoming red, but you will need to check the Elasticsearch log to ensure there are no errors in the synonyms.
  • Have a staging environment where you apply these changes first to validate them.
  • Have a copy of the indices (no data is needed on them) that reference a copy of the synonym sets. Apply the changes first to the synonym sets copy to ensure the analyzers are reloaded correctly, and then update the original synonym sets used in your production indices.
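
The copy-first approach in the last bullet could be sketched in Python as follows. This is an illustration under assumptions: put_synonyms is a hypothetical callable that you supply to send the update for a named synonym set and return the parsed response, the set name synonyms_fr_copy is made up, and the _shards.failed field is assumed to signal a failed reload.

```python
from typing import Any, Callable, Dict, List

Rules = List[Dict[str, str]]
# Hypothetical signature: (synonym set name, rules) -> parsed JSON response.
PutSynonyms = Callable[[str, Rules], Dict[str, Any]]


def failed_shards(response: Dict[str, Any]) -> int:
    """Read the assumed reload failure count from an update response."""
    details = response.get("reload_analyzers_details", {})
    return details.get("_shards", {}).get("failed", 0)


def update_via_copy(put_synonyms: PutSynonyms, rules: Rules,
                    copy_set: str, production_set: str) -> bool:
    """Apply rules to the copy first; update production only if that is clean.

    Returns True if the production synonym set was updated.
    """
    copy_response = put_synonyms(copy_set, rules)
    if "error" in copy_response or failed_shards(copy_response) > 0:
        # The copy rejected the rules, so production is left untouched.
        return False
    put_synonyms(production_set, rules)
    return True
```

A call might look like update_via_copy(my_put, new_rules, "synonyms_fr_copy", "synonyms_fr"), where my_put wraps whichever HTTP client you use. The empty indices that reference the copy must use the same analyzer chain as production for the check to be meaningful.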

We're working on a fix for this issue that will ship in 8.16: basically, we will make lenient: true the default behaviour for reloadable analyzers.
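
Until that default changes, lenient can be set explicitly on each filter. A small Python sketch, assuming the index settings are held as a dict before being sent in the index creation request; with_lenient is a hypothetical helper name, and the sample settings reuse one filter from the reproduction above.

```python
import copy
from typing import Any, Dict


def with_lenient(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of index settings with lenient: true on synonym filters."""
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    filters = updated.get("analysis", {}).get("filter", {})
    for definition in filters.values():
        if definition.get("type") in ("synonym", "synonym_graph"):
            definition["lenient"] = True
    return updated


settings = {
    "analysis": {
        "filter": {
            "synonyms_filter_fr": {
                "type": "synonym_graph",
                "updateable": True,
                "synonyms_set": "synonyms_fr",
            }
        }
    }
}
lenient_settings = with_lenient(settings)
```

With lenient enabled, invalid rules are skipped rather than failing the analyzer, so the Elasticsearch log still needs to be checked for rejected rules.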

@cipher450
Author

  • Use lenient: true in your synonym graph token filter to ignore rules that result in an error. This will prevent your indices becoming red, but you will need to check the Elasticsearch log to ensure there are no errors in the synonyms.
  • Have a staging environment where you apply these changes first to validate them.

This sounds good to me, I'll do just that. Once again, thank you for your answer!


4 participants