Add unicode_word_indices #91

Merged

Conversation

basile-henry
Contributor

This PR adds a new iterator, UnicodeWordIndices, along with the function unicode_word_indices. It is similar to UnicodeWords but also yields the byte offset of each word.

The motivation for this PR came from working on nushell/reedline#5, where I used split_word_bound_indices and then filtered the result using logic that is internal to unicode_words. I believe that PR would have been trivial with unicode_word_indices. Hopefully it can also be useful to others.
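
For illustration, here is a minimal sketch of the filtering workaround described above. It is not the code from this PR or from reedline: the helper name `word_indices` is hypothetical, the filter (keep segments containing at least one alphanumeric character) is an assumption about what the internal `unicode_words` logic amounts to, and the input segments are hard-coded in the `(byte offset, segment)` shape that `split_word_bound_indices` yields so the sketch needs no external crate.

```rust
/// Hypothetical helper: keep only the segments that contain at least one
/// alphanumeric character, preserving each segment's byte offset.
fn word_indices<'a>(segments: &[(usize, &'a str)]) -> Vec<(usize, &'a str)> {
    segments
        .iter()
        .copied()
        .filter(|(_, s)| s.chars().any(char::is_alphanumeric))
        .collect()
}

fn main() {
    // Segments for "Hello, world!" in (byte offset, segment) form,
    // hard-coded here instead of calling `split_word_bound_indices`.
    let segments = [(0, "Hello"), (5, ","), (6, " "), (7, "world"), (12, "!")];

    // Punctuation and whitespace segments are dropped; offsets are kept.
    let words = word_indices(&segments);
    assert_eq!(words, vec![(0, "Hello"), (7, "world")]);
    println!("{:?}", words);
}
```

With this PR, calling `unicode_word_indices` on the string should make the manual filter step unnecessary.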

Should I add more tests for unicode_word_indices? Or are the existing tests for unicode_words and the doc test for unicode_word_indices sufficient?

The iterator UnicodeWordIndices is similar to UnicodeWords but also provides byte offsets for each word
@Manishearth Manishearth closed this Mar 7, 2021
@Manishearth Manishearth reopened this Mar 7, 2021
@Manishearth
Member

Retriggering GHA

@Manishearth Manishearth merged commit cea3ce6 into unicode-rs:master Mar 9, 2021
@basile-henry basile-henry deleted the basile/unicode-word-indices branch March 9, 2021 06:54