Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: change type of split_by to Literal including None #3389

Merged
merged 5 commits into from
Oct 19, 2022

Conversation

julian-risch
Copy link
Member

@julian-risch julian-risch commented Oct 14, 2022

Related Issues

Proposed Changes:

This PR changes the type annotation of split_by from str to Literal["word", "sentence", "passage", None].
The type of the split_by parameter indicated here was str but it should be Optional[str] or even more explicitly be Literal["word", "sentence", "passage", None] instead: https://github.com/deepset-ai/haystack/blob/main/haystack/nodes/preprocessor/preprocessor.py#L54 and the schema file generated based on that with the following line should also allow None as the type of split_by then:

Checklist

@julian-risch julian-risch requested a review from a team as a code owner October 14, 2022 09:54
@julian-risch julian-risch requested review from masci and removed request for a team October 14, 2022 09:54
@julian-risch julian-risch marked this pull request as draft October 14, 2022 09:58
@sjrl
Copy link
Contributor

sjrl commented Oct 14, 2022

Hey, @julian-risch I realize this is still in draft, but I'd like to ask a quick question. Would it make sense to make this option a Literal like Literal["word", "sentence", "passage", None] in this scenario? I see we do that in other parts of Haystack like here. However, I'm not sure if using a Literal here would still cause errors in the config validation so I don't know if it is relevant here. Any insight on this would be appreciated!

@julian-risch
Copy link
Member Author

Hi @sjrl not a bad idea actually. 🙂 I was just thinking about a quick fix for the issue reported by a community member but a Literal would be a cleaner solution. I'll do that instead. Thanks for the hint. 👍

@julian-risch julian-risch added topic:preprocessing type:bug Something isn't working labels Oct 14, 2022
@julian-risch julian-risch changed the title bug: change type of split_by to Optional[str] bug: change type of split_by to Literal including None Oct 14, 2022
@julian-risch
Copy link
Member Author

@masci This PR is ready for review. 🙂

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@julian-risch julian-risch added this to the 1.10.0 milestone Oct 19, 2022
@julian-risch julian-risch merged commit 16723bf into main Oct 19, 2022
@julian-risch julian-risch deleted the fix-split-by-type branch October 19, 2022 08:11
masci pushed a commit that referenced this pull request Oct 20, 2022
* change type of split_by

* fix mpy and update schema files

* change split_by type to Literal

* handle ImportError for Literal py<3.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:preprocessing type:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Load PreProcessor with split_by: None from config fails
3 participants