YAML Pipeline Validation #1981

Closed
1 of 4 tasks
ZanSara opened this issue Jan 10, 2022 · 6 comments · Fixed by #2226

@ZanSara
Contributor

ZanSara commented Jan 10, 2022

Rationale

Pipelines can be defined as YAML files. However, right now it's nearly impossible to tell whether a supplied pipeline is actually valid and, if not, what the reason is.

Providing validation can be extremely useful to users, especially less experienced ones, or users with very complex pipeline designs.

Main Goals

  • Investigate YAML schema possibilities
  • Define how strict the validation should be. For example, shall we allow for almost nonsensical pipelines, as long as the system could technically work with no exceptions? Shall warnings be thrown?
  • Validate pipeline syntax: verify that a YAML file is valid (e.g. params, node classes ...)
  • Validate pipeline semantics: verify that the combination of nodes makes sense (e.g. no retriever after reader)
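
To make the last two goals concrete, here is a minimal sketch of what a syntax check and a semantic check might look like. It is only an illustration under several assumptions: the config is an already-parsed YAML document (a plain dict with `components` and `pipelines` keys), and `KNOWN_NODES`, its parameter sets, and `validate_pipeline` are hypothetical names, not an existing Haystack API.

```python
from typing import Any, Dict, List

# Hypothetical registry: node class name -> (role, allowed parameter names).
# A real validator would derive this from the node classes themselves.
KNOWN_NODES = {
    "ElasticsearchRetriever": ("retriever", {"top_k"}),
    "FARMReader": ("reader", {"model_name_or_path", "top_k"}),
}


def validate_pipeline(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    declared = set()            # every component name found in the config
    roles: Dict[str, str] = {}  # component name -> role, for known classes only

    # Syntax: every component must use a known node class and known params.
    for comp in config.get("components", []):
        name, node_type = comp.get("name"), comp.get("type")
        declared.add(name)
        if node_type not in KNOWN_NODES:
            errors.append(f"Component '{name}': unknown node class '{node_type}'")
            continue
        role, allowed = KNOWN_NODES[node_type]
        roles[name] = role
        for param in comp.get("params") or {}:
            if param not in allowed:
                errors.append(f"Component '{name}': unknown parameter '{param}'")

    # Semantics: a retriever must not consume the output of a reader.
    for pipeline in config.get("pipelines", []):
        for node in pipeline.get("nodes", []):
            name = node.get("name")
            if name not in declared:
                errors.append(
                    f"Pipeline '{pipeline.get('name')}': "
                    f"node '{name}' is not a declared component"
                )
                continue
            for upstream in node.get("inputs", []):
                if roles.get(name) == "retriever" and roles.get(upstream) == "reader":
                    errors.append(
                        f"Pipeline '{pipeline.get('name')}': "
                        f"retriever '{name}' cannot follow reader '{upstream}'"
                    )
    return errors
```

Returning a list of messages rather than raising on the first problem leaves the strictness question open: the caller can decide whether to raise an exception, log warnings, or do both.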
@ZanSara ZanSara self-assigned this Jan 10, 2022
@lalitpagaria
Contributor

It would be great to build it as a CLI utility. CLI utility can -

@ZanSara
Contributor Author

ZanSara commented Jan 11, 2022

I'm not sure what the benefit of a CLI tool would be here to be honest:

  • performs pipeline validation: it could be done with a method call, rather than a CLI tool, but I see that it might be handy for some.
  • install required dependencies: a good error message can tell the user what to do without having to maintain code that handles pip from Python (like: This YAML requires the 'faiss' dependency group. Please run 'pip install haystack[faiss]')
  • convert YAMLs to different version if required: I'm not sure what you mean here. As a migration tool for future YAML versions?
  • view a YAML file: I guess a text editor like VSCode could suffice 🙂

However, if we end up making a CLI utility with a bigger scope in the future, these are nice features to keep in mind.
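
As a rough sketch of the error-message approach to missing dependencies mentioned above: the function name `missing_dependency_messages` and the module-to-extra mapping are hypothetical, and the message text simply mirrors the example in this comment.

```python
import importlib.util
from typing import Dict, Iterable, List


def missing_dependency_messages(
    required_modules: Iterable[str], extras: Dict[str, str]
) -> List[str]:
    """Build one actionable message per module that cannot be imported.

    `required_modules` holds top-level module names only.
    `extras` maps a module name to the dependency group that provides it
    (hypothetical mapping; falls back to the module name itself).
    """
    messages: List[str] = []
    for module in required_modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            extra = extras.get(module, module)
            messages.append(
                f"This YAML requires the '{extra}' dependency group. "
                f"Please run 'pip install haystack[{extra}]'"
            )
    return messages
```

This only reports what is missing and leaves installation to the user, so no code has to drive pip from Python.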

@lalitpagaria
Contributor

With view, I mean pipeline visualisation.

I agree with your view. Anyway, click is already part of haystack as a transitive dependency, which is why I suggested it. Yes, I think haystack as a CLI utility can provide good value to users.

@ZanSara
Contributor Author

ZanSara commented Jan 11, 2022

I see! I totally misunderstood the last one, sorry. I would say we could open a separate issue for a future CLI tool. As soon as we see there are enough features to justify the cost, we'll plan for it. It would be a shame to forget about these smaller features later on.

I think it's better if you open the issue, since you may know about other proposals that came up in the past. If so, please gather them so we have a good starting point for the discussion 🙂

@tholor
Member

tholor commented Jan 11, 2022

I agree, let's move this to a separate issue. One thing that might also be part of such a CLI is "setting up a haystack project / pipeline" based on some command line questionnaire / templates (a bit like cookiecutter). This might help you spin up a quick template for a "dense document retrieval pipeline based on elasticsearch" or let you choose the "right reader model" by prioritizing speed or accuracy etc.

@lalitpagaria
Contributor

Sure. Let me gather my thoughts, I will create an issue.
