improve __str__ and __repr__ #187

Merged 3 commits on Dec 7, 2021

@eiennohito eiennohito commented Dec 2, 2021

Based on WorksApplications/SudachiPy#166
Fixes #122

Closes #SudachiPy/166

The format differs slightly from the proposed one because of how WordId is formatted: the implemented version prints it as (dic_id, word_id).

>>> d = sudachipy.Dictionary()
>>> tok = d.create(sudachipy.SplitMode.A)
>>> mrs = tok.tokenize("外国人参政権")
>>> mrs
<MorphemeList[
  <Morpheme(外国, 0:2, (0, 375175))>,
  <Morpheme(人, 2:3, (0, 284079))>,
  <Morpheme(参政, 3:5, (0, 331513))>,
  <Morpheme(権, 5:6, (0, 522170))>,
]>
>>> str(mrs)
'外国 人 参政 権'
>>> mrs[0]
<Morpheme(外国, 0:2, (0, 375175))>
>>> str(mrs[0])
'外国'
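
For readers who want to see how the two methods split responsibilities, here is a minimal pure-Python sketch that reproduces the output format above. `MorphemeSketch` and `MorphemeListSketch` are hypothetical stand-ins written for illustration; they are not the classes shipped in sudachipy, and the real implementation may be structured differently.

```python
class MorphemeSketch:
    """Hypothetical stand-in for a morpheme; not the real sudachipy class."""

    def __init__(self, surface, begin, end, word_id):
        self.surface = surface
        self.begin = begin
        self.end = end
        self.word_id = word_id  # assumed to be a (dic_id, word_id) tuple

    def __str__(self):
        # str() gives just the surface form, for display to users.
        return self.surface

    def __repr__(self):
        # repr() adds the span and the word id, for debugging.
        return f"<Morpheme({self.surface}, {self.begin}:{self.end}, {self.word_id})>"


class MorphemeListSketch:
    """Hypothetical stand-in for a morpheme list; not the real sudachipy class."""

    def __init__(self, morphemes):
        self._morphemes = list(morphemes)

    def __getitem__(self, index):
        return self._morphemes[index]

    def __len__(self):
        return len(self._morphemes)

    def __str__(self):
        # Surfaces joined by a single space.
        return " ".join(str(m) for m in self._morphemes)

    def __repr__(self):
        # One indented morpheme repr per line, each followed by a comma.
        lines = ["<MorphemeList["]
        lines.extend(f"  {m!r}," for m in self._morphemes)
        lines.append("]>")
        return "\n".join(lines)


mrs = MorphemeListSketch([
    MorphemeSketch("外国", 0, 2, (0, 375175)),
    MorphemeSketch("人", 2, 3, (0, 284079)),
    MorphemeSketch("参政", 3, 5, (0, 331513)),
    MorphemeSketch("権", 5, 6, (0, 522170)),
])
print(repr(mrs))
print(str(mrs))
```

The sketch keeps `__str__` free of metadata so that `str(mrs)` can be used directly as a space-separated segmentation, while `__repr__` carries the span and id needed when inspecting results in a REPL.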

Remaining question:
Should the surface strings stay unquoted, as they are now, or should we put them in quotes, e.g. <Morpheme('外国', 0:2, (0, 375175))>?
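
To make the trade-off concrete, the following sketch formats the same fields both ways. The helper names `unquoted_repr` and `quoted_repr` are hypothetical and exist only for this comparison; the quoted variant simply applies Python's `!r` conversion to the surface, which is one possible way to quote it, not necessarily what the library would do.

```python
def unquoted_repr(surface, begin, end, word_id):
    # Current behaviour: the surface is inserted as-is.
    return f"<Morpheme({surface}, {begin}:{end}, {word_id})>"


def quoted_repr(surface, begin, end, word_id):
    # Alternative: !r wraps the surface in quotes and escapes special characters.
    return f"<Morpheme({surface!r}, {begin}:{end}, {word_id})>"


print(unquoted_repr("外国", 0, 2, (0, 375175)))
print(quoted_repr("外国", 0, 2, (0, 375175)))

# A surface that itself contains a comma and a space (made-up example)
# is harder to tell apart from the other fields when unquoted.
print(unquoted_repr("a, b", 0, 4, (0, 1)))
print(quoted_repr("a, b", 0, 4, (0, 1)))
```

Quoting costs two extra characters per morpheme but keeps the fields unambiguous when the surface contains commas, whitespace, or is empty.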

@eiennohito eiennohito merged commit afe1a1e into WorksApplications:develop Dec 7, 2021
@eiennohito eiennohito deleted the 122-ergonomics branch December 7, 2021 07:09

Successfully merging this pull request may close these issues.

Provide more user-friendly __repr__ and __str__ for Morpheme/MorphemeList/Dictionary/Tokenizer