Standardize language option #79

Open
rth opened this issue Jun 15, 2020 · 0 comments

rth commented Jun 15, 2020

From #78 (comment) by @joshlk

I think it would be sensible to identify different languages throughout the package using ISO two-letter codes (e.g. en, fr, de ...).

In particular, we should implement this for the Snowball stemmer in Python, which currently uses the full language names.

I am also wondering whether, in Rust, we should use a String for the language parameter or define an enum, e.g.

use vtext::lang;

let stemmer = SnowballStemmerParams::default().lang(lang::en).build();

The latter is probably simpler, but it makes it a bit harder to extend: if someone designs a custom estimator for a language not in the list (e.g. some ancient, infrequently used language), they would have to create a new enum.
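One possible middle ground, sketched below, is an enum with a catch-all variant, so that unlisted languages do not require a new type. Everything here is hypothetical and not part of the current vtext API: the `Lang` enum, its `Other` variant, and the `code` and `snowball_name` helpers are illustrative names only, and the three listed languages are just a sample.

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical language identifier (an assumption, not existing vtext API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Lang {
    En,
    Fr,
    De,
    /// Escape hatch for languages that are not in the list above.
    Other(String),
}

impl Lang {
    /// Two-letter code for the listed variants, or the stored code for `Other`.
    pub fn code(&self) -> &str {
        match self {
            Lang::En => "en",
            Lang::Fr => "fr",
            Lang::De => "de",
            Lang::Other(code) => code.as_str(),
        }
    }

    /// Full language name as used by Snowball, if the language is listed.
    pub fn snowball_name(&self) -> Option<&'static str> {
        match self {
            Lang::En => Some("english"),
            Lang::Fr => Some("french"),
            Lang::De => Some("german"),
            Lang::Other(_) => None,
        }
    }
}

impl FromStr for Lang {
    type Err = Infallible;

    /// Parses a code case-insensitively; unknown codes become `Lang::Other`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.to_ascii_lowercase().as_str() {
            "en" => Lang::En,
            "fr" => Lang::Fr,
            "de" => Lang::De,
            other => Lang::Other(other.to_string()),
        })
    }
}
```

With this shape, `"en".parse::<Lang>()` yields `Lang::En`, while a code such as `"la"` falls through to `Lang::Other("la")` and a custom estimator could match on it. The trade-off is that typos in a code are no longer caught at compile time or at parse time.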

Also, just to be consistent, the parameter name would be "lang", not "language", right?
