This repository has been archived by the owner on Mar 9, 2023. It is now read-only.

Add repr for Morpheme/MorphemeList #166

Closed
wants to merge 1 commit into from

Conversation

polm
Contributor

@polm polm commented Sep 30, 2021

This makes it easier to check values when developing interactively. Probably should have been included with #124.

@kazuma-t
Member

kazuma-t commented Sep 30, 2021

For object.__repr__(), the Python language reference has the following description.

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment).

(from 3.3.1. Basic customization)

This implementation does not seem to serve that purpose. Even if it can't, I think it is undesirable for the output to be identical to that of __str__().
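To make the quoted guideline concrete, here is a small illustration using only the standard library's `datetime.date`, which is not part of SudachiPy and is chosen purely as a familiar example. Its `repr()` is a valid expression that recreates the object, while its `str()` is a different, human-oriented form:

```python
import datetime

d = datetime.date(2021, 9, 30)

# repr() is meant for developers: it looks like the expression that built the object.
developer_view = repr(d)

# str() is meant for display and is deliberately different from repr().
display_view = str(d)

# Because the repr is a valid expression, evaluating it yields an equal object.
rebuilt = eval(developer_view)

print(developer_view)
print(display_view)
print(rebuilt == d)
```

When an object cannot be rebuilt from an expression, the same section of the reference recommends a string of the form `<...some useful description...>` instead.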

@polm
Contributor Author

polm commented Sep 30, 2021

The current output looks like this:

<sudachipy.morphemelist.MorphemeList at 0x7f9afedf7940>
<sudachipy.morpheme.Morpheme at 0x7f9afdc2c850>

That doesn't follow the purpose in the language reference and also isn't useful for anything as far as I can tell.

@kazuma-t
Member

kazuma-t commented Sep 30, 2021

At a minimum, begin, end, dictionary_id, word_id, and is_oov are needed to identify a morpheme.

@eiennohito
Collaborator

It is impossible to instantiate Morpheme/MorphemeList directly, so I think we need to decide whether their __repr__ should be more useful to developers of SudachiPy or users of SudachiPy.

@kazuma-t
Member

kazuma-t commented Oct 1, 2021

How about a format like this:

<Morpheme (猫, 0:3, 0, 571365)>
<MorphemeList [(猫, 0:3, 0, 571365), (が, 3:6, 0, 45393), (ぴらる, 6:15, -1, -1)]>

Each entry is ({surface}, {begin}:{end}, {dict_id}, {word_id}); dict_id and word_id are -1 for OOV words.
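A minimal sketch of how the proposed format could be produced. `DemoMorpheme` and `DemoMorphemeList` are hypothetical stand-ins written for this illustration; they are not SudachiPy's actual classes, and the field names are assumptions taken from the names used in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True, repr=False)
class DemoMorpheme:
    """Hypothetical stand-in for a morpheme; not SudachiPy's real class."""
    surface: str
    begin: int
    end: int
    dictionary_id: int  # -1 for OOV, per the proposal above
    word_id: int        # -1 for OOV, per the proposal above

    def fields(self) -> str:
        # The parenthesised part shared by both proposed reprs.
        return (f"({self.surface}, {self.begin}:{self.end}, "
                f"{self.dictionary_id}, {self.word_id})")

    def __repr__(self) -> str:
        return f"<Morpheme {self.fields()}>"

    def __str__(self) -> str:
        # Kept distinct from __repr__: only the surface form.
        return self.surface


class DemoMorphemeList:
    """Hypothetical stand-in for a morpheme list."""

    def __init__(self, morphemes: Iterable[DemoMorpheme]) -> None:
        self._morphemes: List[DemoMorpheme] = list(morphemes)

    def __repr__(self) -> str:
        inner = ", ".join(m.fields() for m in self._morphemes)
        return f"<MorphemeList [{inner}]>"


cat = DemoMorpheme("猫", 0, 3, 0, 571365)
ga = DemoMorpheme("が", 3, 6, 0, 45393)
oov = DemoMorpheme("ぴらる", 6, 15, -1, -1)
morphemes = DemoMorphemeList([cat, ga, oov])

print(repr(cat))
print(repr(morphemes))
```

With the values from the example above, the two `print` calls reproduce the two proposed lines, and `str(cat)` stays different from `repr(cat)`.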

@eiennohito
Collaborator

The reasoning behind this format: being able to tell whether a word comes from a user dictionary, the system dictionary, or is OOV helps when debugging and resolving problems with user dictionaries.
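As a rough sketch of that debugging use, the dictionary id shown in the repr could be mapped to a word's origin. `describe_origin` is a hypothetical helper, not a SudachiPy function. The thread only states that -1 means OOV; treating 0 as the system dictionary and positive ids as user dictionaries is an assumption made for this illustration.

```python
def describe_origin(dictionary_id: int) -> str:
    """Classify a word's origin from its dictionary id (hypothetical helper)."""
    if dictionary_id < 0:
        # Stated in the thread: dict_id is -1 for OOV words.
        return "oov"
    if dictionary_id == 0:
        # Assumption: id 0 refers to the system dictionary.
        return "system"
    # Assumption: any positive id refers to a user dictionary.
    return "user"


for dict_id in (0, 1, -1):
    print(dict_id, describe_origin(dict_id))
```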


3 participants