Bug: Load PreProcessor with split_by: None from config fails #3386

Closed
ugm2 opened this issue Oct 14, 2022 · 4 comments · Fixed by #3389
Labels
topic:preprocessing type:bug Something isn't working

Comments

@ugm2
Contributor

ugm2 commented Oct 14, 2022

The issue

Loading a PreProcessor node with split_by: None from a config fails.

Reproduce the issue

To reproduce the issue, run this script with the latest Haystack version and Python 3.8:

from haystack import Pipeline
from haystack.nodes.preprocessor import PreProcessor

processor = PreProcessor(
    split_by=None,
)

pipeline = Pipeline()
pipeline.add_node(processor, name="Preprocessor", inputs=["File"])

config = pipeline.get_config()
pipeline = Pipeline.load_from_config(config, pipeline_name="indexing")

The error

Where it fails:

    Draft7Validator(schema).validate(instance=pipeline_config)
    break
except ValidationError as validation:
    # If the validation comes from an unknown node, try to find it and retry:
    if list(validation.relative_schema_path) == ["properties", "components", "items", "anyOf"]:
        if validation.instance["type"] not in loaded_custom_nodes:
            logger.info(
                f"Missing definition for node of type {validation.instance['type']}. Looking into local classes..."
            )
            missing_component_class = BaseComponent.get_subclass(validation.instance["type"])
            schema = inject_definition_in_schema(node_class=missing_component_class, schema=schema)
            loaded_custom_nodes.append(validation.instance["type"])
            continue
        # A node with the given name was in the schema, but something else is wrong with it.
        # Probably it references unknown classes in its init parameters.
        raise PipelineSchemaError(
            f"Node of type {validation.instance['type']} found, but it failed validation. Possible causes:\n"
            " - The node is missing some mandatory parameter\n"
            " - Wrong indentation of some parameter in YAML\n"
            "See the stacktrace for more information."
        ) from validation
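
Because the snippet above re-raises with `from validation`, the original jsonschema error stays attached to the PipelineSchemaError as its `__cause__`. The following is a minimal sketch of reading that cause programmatically instead of scanning the traceback. The two exception classes and the `validate` and `root_cause` helpers are hypothetical stand-ins defined here so the example runs without Haystack or jsonschema installed; they are not the libraries' own classes.

```python
class ValidationError(Exception):
    """Stand-in for jsonschema.exceptions.ValidationError (assumption)."""


class PipelineSchemaError(Exception):
    """Stand-in for haystack.errors.PipelineSchemaError (assumption)."""


def validate(config):
    """Hypothetical validator that mimics the raise-from pattern shown above."""
    try:
        # Stand-in for Draft7Validator(schema).validate(instance=config).
        raise ValidationError(f"{config!r} is not valid under any of the given schemas")
    except ValidationError as validation:
        raise PipelineSchemaError("Node found, but it failed validation.") from validation


def root_cause(exc):
    """Follow the __cause__ chain down to the first exception that was raised."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


try:
    validate({"name": "Preprocessor", "params": {"split_by": None}})
except PipelineSchemaError as err:
    cause = root_cause(err)
    print(type(cause).__name__, "->", cause)
```

With the real libraries, the same `__cause__` lookup on a caught PipelineSchemaError should yield the jsonschema ValidationError, assuming the code path matches the excerpt above.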

The actual error:

On instance['components'][0]:
    {'name': 'Preprocessor',
     'params': {'split_by': None},
     'type': 'PreProcessor'}
Traceback (most recent call last):
  File "/home/paperspace/repositories/smart-search/env/lib/python3.8/site-packages/haystack/pipelines/config.py", line 338, in validate_schema
    Draft7Validator(schema).validate(instance=pipeline_config)
  File "/home/paperspace/repositories/smart-search/env/lib/python3.8/site-packages/jsonschema/validators.py", line 304, in validate
    raise error
jsonschema.exceptions.ValidationError: {'name': 'Preprocessor', 'type': 'PreProcessor', 'params': {'split_by': None}} is not valid under any of the given schemas

On instance['components'][0]:
    {'name': 'Preprocessor',
     'params': {'split_by': None},
     'type': 'PreProcessor'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "haystack_bug_test.py", line 12, in <module>
    pipeline = Pipeline.load_from_config(config, pipeline_name="indexing")
  File "/home/paperspace/repositories/smart-search/env/lib/python3.8/site-packages/haystack/pipelines/base.py", line 1913, in load_from_config
    validate_config(pipeline_config, strict_version_check=strict_version_check)
  File "/home/paperspace/repositories/smart-search/env/lib/python3.8/site-packages/haystack/pipelines/config.py", line 259, in validate_config
    validate_schema(
  File "/home/paperspace/repositories/smart-search/env/lib/python3.8/site-packages/haystack/pipelines/config.py", line 366, in validate_schema
    raise PipelineSchemaError(
haystack.errors.PipelineSchemaError: Node of type PreProcessor found, but it failed validation. Possible causes:
 - The node is missing some mandatory parameter
 - Wrong indentation of some parameter in YAML
See the stacktrace for more information.
@julian-risch
Member

@ugm2 Thank you also for reporting this bug, Unai! The problem seems to be the type of the split_by parameter indicated here: https://github.com/deepset-ai/haystack/blob/main/haystack/nodes/preprocessor/preprocessor.py#L54 and the schema file generated based on that with the following line:

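The schema line referred to in the comment above is not reproduced in this thread. As a rough illustration of the failure mode it describes, the sketch below shows how a schema entry that lists only string values rejects None, while an entry that also allows null accepts it. Both schema fragments and the `accepts` helper are assumptions made for this example: they are not copied from Haystack's generated schema, and `accepts` is a tiny stand-in for a JSON Schema validator that handles only `enum`, `type: null`, and `anyOf`.

```python
from typing import Any, Dict

# Hypothetical schema fragments for the split_by parameter (assumed shapes).
STRICT_SPLIT_BY: Dict[str, Any] = {"enum": ["word", "sentence", "passage"]}
NULLABLE_SPLIT_BY: Dict[str, Any] = {
    "anyOf": [
        {"enum": ["word", "sentence", "passage"]},
        {"type": "null"},
    ]
}


def accepts(schema: Dict[str, Any], value: Any) -> bool:
    """Minimal stand-in for a JSON Schema check (enum, type: null, anyOf only)."""
    if "anyOf" in schema:
        # Valid if at least one alternative accepts the value.
        return any(accepts(sub, value) for sub in schema["anyOf"])
    if "enum" in schema:
        return value in schema["enum"]
    if schema.get("type") == "null":
        return value is None
    # Keywords outside this sketch are treated as unconstrained.
    return True


for name, schema in [("strict", STRICT_SPLIT_BY), ("nullable", NULLABLE_SPLIT_BY)]:
    print(name, "accepts None:", accepts(schema, None))
```

Under these assumptions, a config containing split_by: None only validates once the parameter's schema entry explicitly permits null.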
@julian-risch
Member

Hi @ugm2, here is a PR that should fix the issue: #3389. If you want, you can try it out by checking out the PR branch even before we merge it into main.

@ugm2
Contributor Author

ugm2 commented Oct 14, 2022

Hey @julian-risch! I just checked out the PR branch and ran my script, and it worked like a charm! So yeah, that'd work for me 🙂

@julian-risch
Member

Great to hear that. Thank you for trying it out. 👍
