Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Support for specifying vectorizer parameter #179

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pashva
Copy link

@pashva pashva commented May 12, 2024

feat: #95

I have added a optional parameter of vectorizer to the Vectorstore class
The hierarchy it follows is :-

  1. If a vectorizer is passed for example text2vec-transformers, the class will be created with the passed value.
  2. If nothing is passed to it then it will adopt the DEFAULT_VECTORIZER from the Weaviate configuration
  3. If the user does not want any vectorizers they can pass none as a string in the vectorizer param as well

@hsm207
Copy link
Collaborator

hsm207 commented May 12, 2024

@pashva thanks for your contribution!

Unfortunately, I can't accept it in the current state because:

  1. As discussed in Weaviate should allow the flexibility for the user to mention what vectorizer module that they want to use #95, the solution we're looking to implement is, instead of introducing another param in the init method, users will create their schema in weaviate themselves, and then when instantiating the weaviate vector store class in langchain, specify the index name.
  2. The CI pipeline is failing as there aren't tests
  3. This feature definitely will need a documentation update

We have a contributing doc to explain how to set up a dev environment to run tests, lint, etc.

@pashva
Copy link
Author

pashva commented May 12, 2024

So if I understand it correctly,
In the init class you want to check whether the index exists previously with some configuration else go ahead and create a default one? But this seems to be the current implementation right? Where we only create a collection if it does not exist?@hsm207

@hsm207
Copy link
Collaborator

hsm207 commented May 12, 2024

@pashva yes, your understanding is correct.

Although the current implementation in the init method is implemented the way want, it was not written with this feature in mind. The purpose was to not overwrite the schema that was created within langchain when the user provides the index_name to reuse an index langchain created earlier, as weaviate will throw an error if the new schema name already exists.

Therefore, we should verify that the use case we are currently considering does work as expected and not break things elsewhere e.g. when doing search or ingesting documents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants