Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix error in log message #2719

Merged
merged 5 commits into from
Jun 24, 2022
Merged

fix error in log message #2719

merged 5 commits into from
Jun 24, 2022

Conversation

anakin87
Copy link
Member

@anakin87 anakin87 commented Jun 23, 2022

Related Issue(s): #1749

Proposed changes:

  • In order to specify a correct log message, pass the index argument from _handle_duplicate_documents further on to _drop_duplicate_documents and change the logging there.
    @ZanSara please give it a look.

Pre-flight checklist

@ZanSara ZanSara self-requested a review June 24, 2022 08:50
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 😊

One last thing missing is to actually pass the index parameter to _drop_duplicate_documents here:

documents = self._drop_duplicate_documents(documents)

Once done I will merge it 👍

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests before merging

@anakin87
Copy link
Member Author

One last thing missing is to actually pass the index parameter to _drop_duplicate_documents here:

@ZanSara sorry for forgetting 😅

@masci there are already these two tests:

def test_write_with_duplicate_doc_ids(document_store: BaseDocumentStore):

def test_write_with_duplicate_doc_ids_custom_index(document_store: BaseDocumentStore):

Should we think of other tests?

@ZanSara
Copy link
Contributor

ZanSara commented Jun 24, 2022

@masci I agree with @anakin87 here, this change affects only a log line, and has no functional impact on neither _handle_duplicate_documents or _drop_duplicate_documents. I don't think we need additional tests for it.

We might use caplog to check the log is emitted with the correct index name, but I don't think it's that important honestly 😄

@masci
Copy link
Contributor

masci commented Jun 24, 2022

@masci I agree with @anakin87 here, this change affects only a log line, and has no functional impact on neither _handle_duplicate_documents or _drop_duplicate_documents. I don't think we need additional tests for it.

This adds logic and I believe it's not directly covered https://github.com/deepset-ai/haystack/pull/2719/files#diff-f69bf03b2cc38f923c98a0d1a1375adac40255f48a5ee8754181c92af3197408R618

Edit: outside the context of this PR, but changing a signature of a function without the need of changing tests is a yellow flag to me.

@anakin87
Copy link
Member Author

I try to recap.

Current situation
What happens: dropped duplicate documents in CUSTOM_INDEX
Log message: "Document with id ... already exists in index DEFAULT_INDEX" 🐛

Fixed situation
What happens: dropped duplicate documents in CUSTOM_INDEX
Log message: "Document with id ... already exists in index CUSTOM_INDEX" 👌

In practice, the index parameter is passed to _drop_duplicate_documents only to return the right logging message, but the actual dropping does not depend on this parameter.

@masci however, if we identify significant test cases, we can include them. I can't think of anything particularly relevant, but as @ZanSara suggested, we could verify the log using caplog. @masci what do you think?

@ZanSara
Copy link
Contributor

ZanSara commented Jun 24, 2022

I stand by my opinion here that that line you link does not affect the functionality of _drop_duplicate_documents. It could be safely removed and placed into the log's f-string. We can actually do this, to make it more explicit that the index is used nowhere else.

Edit: outside the context of this PR, but changing a signature of a function without the need of changing tests is a yellow flag to me.

Agreed, but in this specific case this is a private function with a very limited use, and it's covered by two tests that check the important parameters. A reasonable solution here, to avoid the signature change, would be to merge _handle_duplicate_documents and _drop_duplicate_documents: I don't see at a first glance why they are separate.

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to test logging output. I stand by my point that index or self.index translates to a conditional branch and this would probably lower a diff coverage report (if we had one :)) but there's no need to block this PR.

@ZanSara ZanSara merged commit 42b1a5c into deepset-ai:master Jun 24, 2022
@anakin87 anakin87 deleted the fix_drop_duplicate_documents branch June 24, 2022 15:00
Krak91 pushed a commit to Krak91/haystack that referenced this pull request Jul 26, 2022
* fix error in log message

* Update Documentation & Code Style

* pass index to _drop_duplicate_documents

* make the use of index in logging more explicit

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants