Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: store id_hash_keys in Document objects to make documents clonable #3697

Merged
merged 26 commits into from
Jan 23, 2023

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Dec 12, 2022

Related Issues

Proposed Changes:

  • Make Document object store their id_hash_key attribute after initialization. This allows nodes to "clone" the way Documents were created, for example when splitting or merging them together, and saves the user the burden to keep specifying the id_hash_key parameter every time.
  • Fixes ES and Weaviate to properly manage id_hash_keys
  • Fixes a few docstrings & typing in the Document class
  • Remove some dangerous defaults in the Document class, mainly the field_maps.

How did you test it?

  • Unit tests

Notes for the reviewer

Read the related discussions linked above as they contain valuable insights about why this feature was initially dismissed.

Note also that the possibility to pass an ID to the init of Document might have been an early workaround to the fact that id_hash_keys was lost. We might want to reconsider this feature later on.

Checklist

@ZanSara ZanSara added type:feature New feature or request topic:preprocessing labels Dec 12, 2022
@ZanSara ZanSara marked this pull request as ready for review January 12, 2023 10:04
@ZanSara ZanSara requested a review from a team as a code owner January 12, 2023 10:04
@ZanSara ZanSara requested review from bogdankostic and removed request for a team January 12, 2023 10:04
@ZanSara ZanSara changed the title feat: store id_hash_keys in Document objects feat: store id_hash_keys in Document objects to make documents clonable Jan 12, 2023
Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, mostly regarding doc strings. Also, I think we should follow our deprecation policy here, even though this is a minor change.

haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
haystack/schema.py Outdated Show resolved Hide resolved
ZanSara and others added 2 commits January 17, 2023 13:53
haystack/schema.py Outdated Show resolved Hide resolved
@ZanSara
Copy link
Contributor Author

ZanSara commented Jan 17, 2023

Also, I think we should follow our deprecation policy here, even though this is a minor change.

I'm not sure how to proceed here. This PR is mostly about persisting a parameter that has always been given to the Document. We could argue I'm adding stuff rather than removing it. Where can I add the deprecation notice? Do you have a specific API in mind?

@bogdankostic
Copy link
Contributor

Where can I add the deprecation notice? Do you have a specific API in mind?

I thought of adding category=FutureWarning, stacklevel=2 at this deprecation warning message and adding a test that uses id_hash_keys parameter in from_dict that will fail in v1.15.

Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ZanSara ZanSara merged commit 94f660c into main Jan 23, 2023
@ZanSara ZanSara deleted the clonable-documents branch January 23, 2023 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants