Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve Document __repr__ #3385

Merged
merged 3 commits into from
Oct 19, 2022
Merged

fix: improve Document __repr__ #3385

merged 3 commits into from
Oct 19, 2022

Conversation

anakin87
Copy link
Member

Related Issues

Proposed Changes:

As reported in #3382, there was a minor bug in __repr__ method of the Document class.
Since "The truth value of an array with more than one element is ambiguous.", I check instead that the embedding is not None.

How did you test it?

Manual tests

Notes for the reviewer

Other small fixes

Checklist

@anakin87 anakin87 requested a review from a team as a code owner October 14, 2022 00:40
@anakin87 anakin87 requested review from mayankjobanputra and removed request for a team October 14, 2022 00:40
@masci
Copy link
Contributor

masci commented Oct 19, 2022

Hey @anakin87 thanks for the PR but I found out that the bug was already addressed in https://github.com/deepset-ai/haystack/pull/2891/files#diff-33a10a124046ca252619f7aee1dc2d1656668fa7cb5c23fef9c4ba0824d538e8R232 so I'm closing this one along with the related issue 🙏

@masci masci closed this Oct 19, 2022
@anakin87
Copy link
Member Author

Hey @masci!
I agree that #3382 has been fixed by #2891.

Anyway, while working on this PR, I discovered another bug that is still present in the main branch and would have been fixed by this PR.

import numpy as np
from haystack import Document
d = Document(
        content="some text",
        content_type="text",
        score=0.99988,
        meta={"name": "doc1"},
        embedding=np.random.rand(5).astype(np.float32),
    )

repr(d) is:

<Document: {'content': 'some text', 'content_type': 'text', 'score': 0.99988, 'meta': {'name': 'doc1'}, 
'embedding': array([0.19878574, 0.44114766, 0.93554574, 0.84111285, 0.03531026],
      dtype=float32), 'id': 'c216d056ab381f2f730e4b3639f4dbc5'}>

Bug: the embedding is still included in repr instead of its shape, because of this line

return f"<Document: {str(self.to_dict())}>"

@masci
Copy link
Contributor

masci commented Oct 19, 2022

@anakin87 I completely missed that. Let's rebase on main then!

@masci masci reopened this Oct 19, 2022
@masci masci requested review from masci and removed request for mayankjobanputra October 19, 2022 11:20
@anakin87 anakin87 marked this pull request as draft October 19, 2022 18:27
@anakin87 anakin87 marked this pull request as ready for review October 19, 2022 18:27
@anakin87
Copy link
Member Author

@masci done!

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@masci masci merged commit 3860bb9 into deepset-ai:main Oct 19, 2022
@anakin87 anakin87 deleted the fix_document_repr branch October 19, 2022 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema representation bug on Jupyter
2 participants