Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow extraction from fields that may be collections. #21

Merged
merged 5 commits into from
Jan 5, 2023

Conversation

tmpfs
Copy link
Collaborator

@tmpfs tmpfs commented Dec 31, 2022

The current API only supports fields that extract &str directly from the document however there are times when we need to borrow from a collection in the document.

My use case is that I have a HashSet<String> to represent tags associated with the document.

With this PR my extract functions look like this:

// Label
fn label_extract(d: &Document) -> Vec<&str> {
    vec![d.2.label()]
}

// Tags
fn tags_extract(d: &Document) -> Vec<&str> {
    d.2.tags().iter().map(|s| &s[..]).collect()
}

I expected the extra heap allocation for the Vec to have a minor impact on performance but running the benchmarks shows a small performance improvement 🤷

@marcus-pousette
Copy link
Collaborator

Thanks for the PR.

Thinking about this feature fundamentally, this is basically multi-valued field support. If there is a performance penalty (even though it is not shown in the benchmarks because the formatter did some other optimisation) I would say it is worth it.

Could you update the README also?

@tmpfs
Copy link
Collaborator Author

tmpfs commented Dec 31, 2022

So should we update to have a single extractor function, this would simplify the code instead of passing around slices of FieldAccessor - what do you think?

Happy New Year too! 😁

@marcus-pousette
Copy link
Collaborator

marcus-pousette commented Dec 31, 2022

I was working on this issue before #19, which basically does that, but I had some troubles keeping the performance on the same level. But that would basically eliminate the need for passing around these functions and you would just impl the trait instead

I would say, keep this PR as it is (except fixing the README), and then we can move on later and simplify things even more!

Happy new year to you too! 🎉

@tmpfs
Copy link
Collaborator Author

tmpfs commented Jan 4, 2023

Ok thanks @marcus-pousette that makes sense, I have updated the README, lets get this merged as is and migrate to the trait based extractor later. Cheers 👍

@marcus-pousette marcus-pousette merged commit 8d215e4 into quantleaf:master Jan 5, 2023
@marcus-pousette
Copy link
Collaborator

Super! Great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants