Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid usage of *args and **kwargs in any node's __init__ #2362

Merged
merged 8 commits into from
Apr 14, 2022

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Mar 28, 2022

Problem:

  • The YAML validation code seems to ignore **kwargs.
  • If a subclass uses **kwargs in its __init__ to transmit any unknown parameter to its superclass __init__, the validation for that node is going to fail and the nade cannot be instantiated, as in many cases the content of **kwargs is critical for the superclass.

Solution

  • The YAML schema generator should look for **kwargs in the signature and, if detected, merge the superclass' schema with the subclass'.

@TuanaCelik

@ZanSara ZanSara self-assigned this Mar 28, 2022
@ZanSara
Copy link
Contributor Author

ZanSara commented Mar 28, 2022

After a better inspection of the code, I realized that in fact we can't handle **kwargs properly. Once they go in the __init__() of the subclass we've got no guarantees that they are not meant to be used locally at least in part: assuming they will all be sent to the superclass is a rather implicit assumption that many custom (and future) nodes might not respect.

Given that **kwargs are generally not a necessity but simply a compactness measure, I propose to remove the usage of **kwargs from our nodes and document stores and forbid any future usage.

This will mean modifying:

In the docstore cases, it will be enough to copy-paste the parent signature down to the subclasses.
For FARMReader, this will imply modifying some more internal classes to stop forwarding kwargs around when it's not needed. I will open separate PRs for these changes and merge them into this one.

ZanSara and others added 6 commits April 13, 2022 13:21
* Remove kwargs from ESDocStore subclasses

* Remove kwargs from subclasses of SQLDocumentStore

* Remove kwargs from Weaviate

* Revert change in pinecone

* Fix tests

* Fix retriever test wirh weaviate

* Change Exception into DocumentStoreError
* Remove FARMReader kwargs without trying to replace them functionally

* Update Documentation & Code Style

* enforce same index values before and after saving/loading eval dataframes (#2398)

* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)

* Add tests for missing init and super

* Update Documentation & Code Style

* change in with endswith

* Move test in pipeline.py and change test in pipeline_yaml.py

* Update Documentation & Code Style

* Use caplog to test the warning

* Update Documentation & Code Style

* move tests into test_pipeline and use get_config

* Update Documentation & Code Style

* Unmock version name
@ZanSara ZanSara marked this pull request as ready for review April 14, 2022 11:20
@ZanSara ZanSara changed the title YAML schema generator cannot handle **kwargs Forbid usage of *args and **kwargs in any node's __init__ Apr 14, 2022
Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! To make the failing code-and-docs-check pass, I think you need to apply black manually to haystack/nodes/_json_schema.py and test/test_pipeline_yaml.py.

@ZanSara
Copy link
Contributor Author

ZanSara commented Apr 14, 2022

It sometimes fails when it checks too early. Given that the Code Style action passed, I think it's safe to merge. Thanks! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:pipeline type:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants