Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed performance bug. Using a list where a set is needed. #2125

Merged
merged 1 commit into from
Feb 3, 2022

Conversation

baregawi
Copy link
Contributor

@baregawi baregawi commented Feb 3, 2022

Proposed changes:

  • change a list to a set to make "dedupe" linear time instead of quadratic time.

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

@baregawi
Copy link
Contributor Author

baregawi commented Feb 3, 2022

@ZanSara could you approve the workflow if you're around? Also, there is only 3 lines of code change if you could code review it. This would seem to slow down write_documents in base document store significantly.

@ZanSara
Copy link
Contributor

ZanSara commented Feb 3, 2022

Yep looks good! As long as the pipeline is green I'll approve this 👍

@ZanSara ZanSara merged commit 1fa682a into deepset-ai:master Feb 3, 2022
@ZanSara
Copy link
Contributor

ZanSara commented Feb 3, 2022

Thank you so much for the help by the way! You're doing a great job lately! 😊

@baregawi
Copy link
Contributor Author

baregawi commented Feb 3, 2022

It is my pleasure! It makes a lot more sense to build something like this together then everyone having their own, lower quality version of it lol. Thank Deepset for starting this!

I think my focus will be performance and batching at the moment since those seem to be bottlenecks to experimentation. Let me know if there is anywhere else I should focus on.

@baregawi baregawi deleted the fix_dedupe_perf_bug branch February 4, 2022 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:document_store topic:speed type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants