Automatic reloading of the user dic #84

Open
HIRANO-Satoshi opened this issue Feb 13, 2020 · 5 comments

Comments

@HIRANO-Satoshi

There is a nice plugin that syncs config files, such as the user dictionary, among ES instances.

Is it possible to add an automatic reload feature for when the user dictionary is updated?

Without such functionality, we need to close and reopen every index that uses the dictionary on all ES instances, or restart all ES instances. That is a burden.
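
For reference, the close/reopen workaround can be scripted against Elasticsearch's close index and open index REST endpoints (`POST /<index>/_close`, `POST /<index>/_open`). The following is only a minimal sketch: the class name `IndexReopen`, the base URL, and the index name are placeholders, and it assumes a cluster without authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

/** Hypothetical helper: closes and reopens one index so its analyzers are rebuilt. */
final class IndexReopen {
    /** Builds the close and open requests for one index, in the order they must be sent. */
    static List<HttpRequest> reopenRequests(String baseUrl, String index) {
        return List.of(
                post(baseUrl + "/" + index + "/_close"),
                post(baseUrl + "/" + index + "/_open"));
    }

    private static HttpRequest post(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends both requests and fails on the first non-2xx response; needs a reachable cluster. */
    static void reopen(HttpClient client, String baseUrl, String index) throws Exception {
        for (HttpRequest request : reopenRequests(baseUrl, index)) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                throw new IllegalStateException(request.uri() + " -> " + response.statusCode());
            }
        }
    }
}
```

A call such as `IndexReopen.reopen(HttpClient.newHttpClient(), "http://localhost:9200", "my_index")` would have to be repeated for every index that uses the dictionary, and the index is unavailable while it is closed, which is the burden described above.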

@eiennohito eiennohito transferred this issue from WorksApplications/Sudachi Dec 24, 2021
@eiennohito
Collaborator

eiennohito commented Dec 24, 2021

Transferred to the Elasticsearch plugin repo. Will try to implement this in some capacity.
Dictionary reload will not be supported for binary dictionaries of plain Sudachi (but may be supported for auto-compiled CSV-based dictionaries).

@eiennohito
Collaborator

After some consideration, there is a problem: what to do with old documents that were analyzed with a different dictionary.
After a dictionary reload, the same text can produce a different token stream, so some documents indexed with the old dictionary will no longer be searchable.

Possible actions:

  1. Do nothing (easiest implementation)
  2. Reindex all documents (this will be pretty difficult to implement atomically)

Any ideas on possible behavior?

@ackintosh

+1 to option 1 (do nothing). Users can reindex documents themselves, if needed, after updating the user dictionary.

@ackintosh

@eiennohito @mh-northlander
Also, I'm available to implement option 1 if that's okay with you, but since I'm new, I'll need some pointers to get started.

@mh-northlander
Collaborator

+1 to "do nothing".
Basically, users should reindex explicitly whenever they change Sudachi settings (including the dictionary).
Here we focus on cases where reindexing is not necessary or is skipped.

The ReloadAware.maybeReload() method should be responsible for reloading. Currently, its implementation (see DictionaryService.kt) does not check for file updates.
Note that most Sudachi classes (e.g. Tokenizer, PosMatcher) depend on the dictionary, and all of them need to be reloaded. We also need to consider when it is safe to reload, how to handle analyses that are in progress, and what to do with the analysis cache.
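
One possible shape for such a check, as a rough standalone sketch rather than the plugin's actual code: compare the file's modification time with the one recorded at load time, and publish everything built from the dictionary as a single immutable snapshot, so an analysis that already fetched the old snapshot can finish with it. The names `ReloadableResource` and `Snapshot` are hypothetical, the loader function stands in for whatever builds the dictionary-dependent objects, and the sketch is in plain Java rather than the plugin's Kotlin. It does not address the analysis cache.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical holder for everything built from one dictionary file, swapped as a unit. */
final class ReloadableResource<T> {
    /** Immutable pair: the loaded value and the file timestamp it was built from. */
    private static final class Snapshot<T> {
        final T value;
        final FileTime loadedFrom;

        Snapshot(T value, FileTime loadedFrom) {
            this.value = value;
            this.loadedFrom = loadedFrom;
        }
    }

    private final Path file;
    private final Function<Path, T> loader;
    private final AtomicReference<Snapshot<T>> current;

    ReloadableResource(Path file, Function<Path, T> loader) throws IOException {
        this.file = file;
        this.loader = loader;
        // Read the timestamp before loading, so a write during the load is caught next time.
        FileTime stamp = Files.getLastModifiedTime(file);
        this.current = new AtomicReference<>(new Snapshot<>(loader.apply(file), stamp));
    }

    /** Callers fetch the value once per analysis, so a running analysis keeps its snapshot. */
    T get() {
        return current.get().value;
    }

    /** Reloads only if the modification time changed; returns true when a swap happened. */
    synchronized boolean maybeReload() throws IOException {
        FileTime now = Files.getLastModifiedTime(file);
        if (now.equals(current.get().loadedFrom)) {
            return false;
        }
        current.set(new Snapshot<>(loader.apply(file), now));
        return true;
    }
}
```

Modification time is a cheap but imperfect signal (a file rewritten within the timestamp granularity can be missed); comparing size or a content hash as well would be a stricter variant.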

Because reloading a dictionary can cause problems, "auto" reload should be turned off by default.
If possible, I want to add a custom REST API to trigger it explicitly, like elasticsearch-configsync.
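
To illustrate the proposed policy (off by default, with an explicit trigger), here is a small sketch. `ReloadTrigger` and its method names are made up for this example; the reload action is passed in as a `Runnable`, and wiring it to an actual REST handler or plugin setting is left out.

```java
/** Hypothetical gate: automatic reload is opt-in, explicit reload always runs. */
final class ReloadTrigger {
    private final Runnable reloadAction;
    private final boolean autoReload;

    /** Default construction leaves automatic reload disabled. */
    ReloadTrigger(Runnable reloadAction) {
        this(reloadAction, false);
    }

    ReloadTrigger(Runnable reloadAction, boolean autoReload) {
        this.reloadAction = reloadAction;
        this.autoReload = autoReload;
    }

    /** Called on the analysis path; does nothing unless auto reload was enabled. */
    void beforeAnalysis() {
        if (autoReload) {
            reloadAction.run();
        }
    }

    /** Called for an explicit request (for example from a REST handler); always reloads. */
    void reloadNow() {
        reloadAction.run();
    }
}
```

With the default constructor, `beforeAnalysis()` never touches the dictionary, so behavior only changes when an operator calls `reloadNow()` or opts in to automatic reload.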

4 participants