Custom Components in Pipelines with REST API #955

Closed
guillim opened this issue Apr 9, 2021 · 5 comments
Labels: topic:pipeline, type:feature (New feature or request)


guillim (Contributor) commented Apr 9, 2021

Problem
I am handling multiple instances of Haystack because I have multiple customers, each with their own knowledge base and therefore custom needs.

Since the pipelines PR #922, I see a new possibility: having one rest_api/pipelines.yaml file for each of our customers while keeping the same code base across all of them. This will ease maintenance a lot.

However, we sometimes have to create custom nodes, as you suggest in your docs, but these components cannot yet be made available when using the REST API and Docker.

Describe the solution you'd like
A cleaner way to integrate custom nodes. I am sure my solution isn't clean, but I haven't found a better way yet.

Describe alternatives you've considered
To give you an example, let's say we want to create a slightly modified version of the EmbeddingRetriever.
At the moment, I create a file called haystack/custom.py in which I write my Python nodes:

```python
from typing import List

import numpy as np
from haystack import Document
from haystack.retriever.dense import EmbeddingRetriever


class TitleEmbeddingRetriever(EmbeddingRetriever):
    """EmbeddingRetriever variant that embeds each document's title instead of its text."""

    def embed_passages(self, docs: List[Document]) -> List[np.ndarray]:
        # Embed the title stored in the document metadata under "name"
        texts = [d.meta["name"] for d in docs]
        return self.embed(texts)
```

and I import them in haystack/__init__.py like so:

```python
from haystack.custom import TitleEmbeddingRetriever
```

Additional context
For the setup, I am using the REST API with Docker Compose.

guillim added the type:feature (New feature or request) label Apr 9, 2021
guillim (Contributor, Author) commented Apr 9, 2021

One suggestion I got from @psorianom is to create pip packages to handle custom nodes.

oryx1729 (Contributor) commented

Hi @guillim, thank you for raising the issue.

I think the fundamental premise here is that the Pipeline class must be aware of custom node definitions before a Pipeline can be loaded. This implies that the code for custom nodes has to be packaged into the Docker image.
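To make this premise concrete, here is a minimal stand-alone sketch (not Haystack's actual code) of a registry that maps the `type` names used in a pipeline YAML to Python classes. `NODE_REGISTRY`, `register`, and `load_component` are hypothetical names; the point is only that a lookup by name can succeed only if the module defining the class has already been imported in the running process.

```python
from typing import Any, Dict

# Hypothetical registry: maps the "type" names used in a pipeline YAML to classes.
NODE_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a node class under its class name so a config can refer to it."""
    NODE_REGISTRY[cls.__name__] = cls
    return cls


def load_component(spec: Dict[str, Any]) -> Any:
    """Instantiate one component from a parsed YAML entry, e.g.
    {"name": "Retriever", "type": "TitleEmbeddingRetriever", "params": {"top_k": 5}}.
    """
    type_name = spec["type"]
    if type_name not in NODE_REGISTRY:
        # The defining module was never imported in this process, so the
        # class never registered itself and cannot be built from its name.
        raise ValueError(
            f"Unknown component type {type_name!r}; import the module that "
            "defines it before loading the pipeline"
        )
    return NODE_REGISTRY[type_name](**spec.get("params", {}))
```

For example, loading an entry with `"type": "TitleEmbeddingRetriever"` raises `ValueError` until the module that defines and registers that class has been imported, which is exactly why the custom code has to ship with the container.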

I agree with you that the proposed solution of defining nodes in haystack/custom.py and adding imports in haystack/__init__.py could be made cleaner. A downside of this solution is that you always have to ship the Haystack source code with the REST API, instead of using Haystack simply as a package (installed with pip install).

An alternative approach could be to define the custom nodes together with the REST API (maybe in a dedicated custom_nodes folder), using a Python decorator (@component) or class inheritance to make the Pipeline class aware of the custom nodes. This would decouple the Haystack code itself from the custom nodes. Would a solution like this make sense for your use cases?
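The two registration options mentioned here could look roughly like the hypothetical sketch below. `CUSTOM_NODES`, `component`, and `CustomNode` are illustrative names, not part of Haystack's API. The decorator registers a class explicitly, while the base class relies on Python's `__init_subclass__` hook so that every subclass registers itself at definition time.

```python
from typing import Dict

# Hypothetical registry the REST API would consult for YAML "type" names.
CUSTOM_NODES: Dict[str, type] = {}


def component(cls: type) -> type:
    """Option 1: a decorator that registers the decorated class by name."""
    CUSTOM_NODES[cls.__name__] = cls
    return cls


class CustomNode:
    """Option 2: a base class whose subclasses register themselves when defined."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        CUSTOM_NODES[cls.__name__] = cls


# Example nodes, e.g. living in a custom_nodes/ folder next to the REST API.
@component
class UppercaseQuery:
    def run(self, query: str) -> str:
        return query.upper()


class StripQuery(CustomNode):
    def run(self, query: str) -> str:
        return query.strip()
```

With the decorator, a node can keep any base class it likes (for example an existing retriever); the base-class route needs no extra line per node but requires every custom node to inherit from the shared base. Note that the base class itself is not registered, only its subclasses.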

This is an important piece of functionality for Haystack, so I'm happy to brainstorm more ideas to simplify this workflow.


> One suggestion that I had from @psorianom is to create some pip packages to deal with custom nodes.

I like the idea, but my first reaction is that it might be more work than necessary for users to create and publish packages for custom nodes.

guillim (Contributor, Author) commented Apr 16, 2021

The custom_nodes folder sounds like a cleaner version of what I am currently doing with custom.py. I'd say it's the best option from this brainstorming so far.
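One way a custom_nodes folder could be wired up is to import every module in it when the REST API starts, before pipelines.yaml is loaded, so that whatever registration mechanism the nodes use has already run. This is a sketch under that assumption: `import_custom_nodes` and the flat module-naming scheme are hypothetical, while the loading steps follow the standard-library `importlib.util` recipe for importing a source file directly.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Dict


def import_custom_nodes(folder: str) -> Dict[str, ModuleType]:
    """Import every public .py file in `folder` (e.g. a custom_nodes/ directory).

    Importing executes each module's top-level code, so class definitions and any
    registration hooks run before the pipeline YAML is read.
    """
    modules: Dict[str, ModuleType] = {}
    for path in sorted(Path(folder).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        module_name = f"custom_nodes_{path.stem}"
        spec = importlib.util.spec_from_file_location(module_name, str(path))
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {path}")
        module = importlib.util.module_from_spec(spec)
        # Register before executing, as in the importlib docs' recipe for
        # importing a source file directly.
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
        modules[module_name] = module
    return modules
```

Because the folder is plain Python files, it could be mounted into the container next to pipelines.yaml without rebuilding Haystack or publishing a package.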

I understand (and agree) that most of your users would feel a bit overwhelmed by the need to develop a dedicated pip package for this specific purpose.

tholor (Member) commented Apr 16, 2021

Yeah, I would also imagine a solution where we mount a custom.py (or a folder) into the container together with pipelines.yaml. Maybe we could allow specifying a path like source_file in the YAML definition of the node 🤔
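A possible shape for the source_file idea, assuming a hypothetical optional `source_file` key in a node's YAML entry that points at a file mounted into the container. The YAML is shown already parsed into a dict, and `resolve_node_class` and `build_node` are illustrative names, not an existing Haystack API.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Any, Dict


def resolve_node_class(node: Dict[str, Any], builtin: Dict[str, type]) -> type:
    """Find the class for one node entry of an already-parsed pipelines.yaml.

    Assumes a hypothetical optional `source_file` key, e.g.
    {"name": "Retriever", "type": "TitleEmbeddingRetriever",
     "source_file": "/custom/custom.py", "params": {"top_k": 5}}
    where /custom is a directory mounted into the container.
    """
    source_file = node.get("source_file")
    if source_file is None:
        # No source_file: fall back to classes the application already knows.
        return builtin[node["type"]]
    path = Path(source_file)
    module_name = f"custom_source_{path.stem}"
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    # The node's "type" names the class to take from that file.
    return getattr(module, node["type"])


def build_node(node: Dict[str, Any], builtin: Dict[str, type]) -> Any:
    """Instantiate a node from its entry, passing "params" as keyword arguments."""
    cls = resolve_node_class(node, builtin)
    return cls(**node.get("params", {}))
```

In this sketch each call re-executes the referenced file; a fuller version would probably cache modules per path so that several nodes sharing one custom.py load it only once.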

oryx1729 changed the title from "Custom Components" to "Custom Components in Pipelines with REST API" Jun 23, 2021
guillim (Contributor, Author) commented Jun 23, 2021

Thanks for the work!
