Add an ability to convert result of surface() method to normalized variants by specifying a projection #230
Comments
@eiennohito We need to consider how to pass the projection type without modifying BertJapaneseTokenizer in the transformers repo.
These three arguments are the candidates for specifying the projection type: https://github.com/huggingface/transformers/blob/v4.29.0/src/transformers/models/bert_japanese/tokenization_bert_japanese.py#L533-L546
In the long term it would be better to submit a patch to huggingface transformers to provide
also, add projection functionality Fixes #230
@hiroshi-matsuda-rit see https://github.com/WorksApplications/sudachi.rs/blob/da5aca62e3cef8892ceaf64e7ac4f9ef25c2f8d1/python/tests/test_projection.py, any comments on the functionality?
@eiennohito Thank you for all your efforts on this issue. The new implementation looks very nice.
No problem, I missed the requirement about supporting the chiTra forms. I will add that implementation as well.
@hiroshi-matsuda-rit I will add support only for normalized_and_surface, normalized_nouns, and dictionary_and_surface. Please tell me if other types are also needed (they require more work).
Excellent! Thanks for the additional implementations. (I think those three options are enough for us.) @eiennohito
Projection can be normalization transforms as specified by ChiTra or a user-passed callable.
Original surface will always be available as `raw_surface()`.

Specification
- `projection` argument to `Dictionary.create`, which can accept either a string or a callable.
- When `projection` is passed, instead of the usual `Morpheme.surface()` result, Sudachi.rs will produce a projected string.
- `surface()` and `raw_surface()` methods on `Morpheme`
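
To make the intended behaviour concrete, here is a minimal, self-contained sketch of the specification above. It is a toy model and not the Sudachi.rs implementation: `ToyMorpheme`, `resolve_projection`, `toy_tokens`, and the string names `"surface"` and `"normalized"` are hypothetical, and the surface/normalized pairs are supplied by hand instead of coming from a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple, Union

Projection = Callable[["ToyMorpheme"], str]


@dataclass
class ToyMorpheme:
    """Hypothetical stand-in for Morpheme, for illustration only."""

    _surface: str
    _normalized: str
    _projection: Optional[Projection] = None

    def raw_surface(self) -> str:
        # The original surface is always available, projection or not.
        return self._surface

    def normalized_form(self) -> str:
        return self._normalized

    def surface(self) -> str:
        # Without a projection, surface() behaves as usual.
        if self._projection is None:
            return self._surface
        # With a projection, surface() returns the projected string.
        return self._projection(self)


# Assumed names for string projections; the real set of names may differ.
NAMED_PROJECTIONS = {
    "surface": lambda m: m.raw_surface(),
    "normalized": lambda m: m.normalized_form(),
}


def resolve_projection(
    projection: Union[str, Projection, None]
) -> Optional[Projection]:
    """Accept either a string name or a callable, as in the specification."""
    if projection is None:
        return None
    if isinstance(projection, str):
        if projection not in NAMED_PROJECTIONS:
            raise ValueError(f"unknown projection: {projection!r}")
        return NAMED_PROJECTIONS[projection]
    if callable(projection):
        return projection
    raise TypeError("projection must be a string or a callable")


def toy_tokens(
    pairs: Iterable[Tuple[str, str]],
    projection: Union[str, Projection, None] = None,
) -> List[ToyMorpheme]:
    """Build toy morphemes from hand-written (surface, normalized) pairs."""
    resolved = resolve_projection(projection)
    return [ToyMorpheme(s, n, resolved) for s, n in pairs]


pairs = [("附属", "付属"), ("の", "の")]
plain = toy_tokens(pairs)
projected = toy_tokens(pairs, projection="normalized")
custom = toy_tokens(pairs, projection=lambda m: f"<{m.raw_surface()}>")
```

In this sketch `surface()` on `projected` returns the normalized strings while `raw_surface()` still returns the original text, which is the split between the two methods that the specification describes.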