Support rapidfuzz>=2.8.0 #3202

Closed
tstadel opened this issue Sep 12, 2022 · 1 comment
Labels
1.x breaking change Contributions wanted! Looking for external contributions topic:dependencies topic:eval type:refactor Not necessarily visible to the users

Comments

@tstadel
Member

tstadel commented Sep 12, 2022

Is your feature request related to a problem? Please describe.
rapidfuzz 2.8.0 introduced a fix that should make the custom implementation of boost_split_overlap in haystack.utils.calculate_context_similarity obsolete.

Describe the solution you'd like

  • Remove the version pin rapidfuzz<2.8.0
  • Remove the custom implementation of boost_split_overlap in haystack.utils.calculate_context_similarity
  • Find an appropriate threshold for similarity scores (currently 65) so that all tests in others/test_utils.py pass, and set it as the default value in haystack.utils.match_context, haystack.utils.match_contexts, haystack.Pipeline.eval, haystack.Pipeline.execute_eval_run and haystack.Pipeline._build_eval_dataframe
  • Make the similarity tests that use numpy.random deterministic by calling numpy.random.seed before they run
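
For the last point, a minimal sketch of what seeding could look like. The helper names `make_noisy_context` and `seeded_noisy_context` and the seed value 42 are made up for illustration and are not taken from others/test_utils.py; only `numpy.random.seed` and `numpy.random.choice` are real NumPy calls.

```python
import numpy as np


def make_noisy_context(text: str, noise_ratio: float = 0.1) -> str:
    """Hypothetical helper: replace a random fraction of characters with '#'."""
    chars = list(text)
    n_noise = int(len(chars) * noise_ratio)
    # Draws from numpy's global RNG, so the result depends on the current seed.
    positions = np.random.choice(len(chars), size=n_noise, replace=False)
    for pos in positions:
        chars[pos] = "#"
    return "".join(chars)


def seeded_noisy_context(text: str, seed: int = 42) -> str:
    """Seed right before the random calls so the generated input is pinned."""
    np.random.seed(seed)
    return make_noisy_context(text)


text = "The quick brown fox jumps over the lazy dog."
first = seeded_noisy_context(text)
second = seeded_noisy_context(text)
# Same seed, same noise positions, hence the same test input on every run.
assert first == second
```

In a pytest suite the seeding call could also sit in a fixture that runs before each similarity test, so individual tests do not have to repeat it.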

Describe alternatives you've considered

  • Keep the version pin

Additional context
The context similarity tests in others/test_utils.py should not need to be changed. Getting rid of some imprecisions (e.g. raising accuracy assessments in tests from 99% to 100%) would be appreciated.

@tstadel tstadel added Contributions wanted! Looking for external contributions breaking change topic:eval type:refactor Not necessarily visible to the users journey:advanced labels Sep 12, 2022
@masci masci added the P2 Medium priority, add to the next sprint if no P1 available label Nov 24, 2022
@masci masci added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Jan 25, 2023
@masci masci added 1.x and removed P3 Low priority, leave it in the backlog labels Dec 13, 2023
@masci
Contributor

masci commented Dec 13, 2023

Closing as won't do. We'll stay with 2.8 for the 1.x release line.

@masci masci closed this as completed Dec 13, 2023