POS Filter: Allow forward matching #21

Closed
sorami opened this issue Apr 2, 2018 · 1 comment

Comments

Collaborator

sorami commented Apr 2, 2018

The sudachi_part_of_speech filter excludes words that have the specified part-of-speech (POS) information.

Sudachi POS information is a list of 6 items. Currently, a user can filter out results by specifying one of the following:

  • 1st-4th items together (excluding asterisk items)
  • 5th item (活用型, conjugation type)
  • 6th item (活用形, conjugation form)

In each case, the user needs to write out the entire POS information; a partial value does not match.

It would be convenient if a user could write just part of the POS (say, the first 1 or 2 items of the POS information list) and have the filtering done by forward matching.

Thanks to @cidrugHug8 for mentioning the topic in the Qiita article "Elasticsearchのための新しい形態素解析器 「Sudachi」" ("Sudachi, a new morphological analyzer for Elasticsearch", in Japanese).

Collaborator Author

sorami commented Apr 10, 2018

Merged #22.

Forward matching is now supported for POS items 1-4, items 5-6, and item 6 alone (zero-based index ranges [0, 4), [4, 6), and [5]).

So now you can write the POS in any of these forms:

  • 1 - e.g., 名詞
  • 1,2 - e.g., 名詞,固有名詞
  • 1,2,3 - e.g., 名詞,固有名詞,地名
  • 1,2,3,4 - e.g., 名詞,固有名詞,地名,一般
  • 5 - e.g., 五段-カ行
  • 6 - e.g., 終止形-一般
  • 5,6 - e.g., 五段-カ行,終止形-一般
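
The forms listed above can be illustrated with a minimal Python sketch of forward matching. This is not the plugin's actual code; the names `ANCHORS`, `matches`, and `filter_tokens` are hypothetical, and the anchor ranges assume that the "[0-4], [4, 6], and [5]" note refers to zero-based, half-open index ranges over the 6-item POS list.

```python
from typing import List, Sequence, Tuple

# Hypothetical anchor ranges (zero-based, half-open) at which a stop tag may
# start: items 1-4, items 5-6, and item 6 alone. This is an assumed reading
# of "[0-4], [4, 6], and [5]", not taken from the plugin source.
ANCHORS = ((0, 4), (4, 6), (5, 6))

Token = Tuple[str, Sequence[str]]  # (surface form, 6-item POS list)


def matches(pos: Sequence[str], stoptag: str) -> bool:
    """Return True if the comma-separated stop tag forward-matches pos."""
    parts = stoptag.split(",")
    for start, end in ANCHORS:
        fits = len(parts) <= end - start
        # Compare the stop tag against a prefix of the anchored range.
        if fits and list(pos[start:start + len(parts)]) == parts:
            return True
    return False


def filter_tokens(tokens: List[Token], stoptags: List[str]) -> List[Token]:
    """Drop every token whose POS forward-matches any stop tag."""
    return [
        (surface, pos)
        for surface, pos in tokens
        if not any(matches(pos, tag) for tag in stoptags)
    ]


# Illustrative POS lists built from the examples in this thread.
tokyo = ("名詞", "固有名詞", "地名", "一般", "*", "*")
kaku = ("動詞", "一般", "*", "*", "五段-カ行", "終止形-一般")

kept = filter_tokens([("東京", tokyo), ("書く", kaku)], ["名詞,固有名詞"])
```

With the stop tag 名詞,固有名詞, the token 東京 is dropped and 書く is kept. A tag such as 一般 alone matches neither token in this sketch, because matching always starts at the beginning of one of the anchor ranges.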

sorami closed this as completed Apr 10, 2018