Error when using InMemoryDocumentStore() #1090

Closed
raghav-menon opened this issue May 24, 2021 · 4 comments

@raghav-menon

I installed haystack and am trying to use InMemoryDocumentStore(). After initializing the document store, I receive the following error when I try to index documents:


ValueError Traceback (most recent call last)
in ()
----> 1 document_store.write_documents(dicts)

/usr/local/lib/python3.7/dist-packages/haystack/document_store/memory.py in write_documents(self, documents, index)
84 if document.id in self.indexes[index]:
85 # TODO Make error type consistent across document stores and add user options to deal with duplicate documents (ignore, overwrite, fail)
---> 86 raise ValueError(f"Duplicate Documents: write_documents() failed - Document with id '{document.id} already exists in index '{index}'")
87 self.indexes[index][document.id] = document
88

ValueError: Duplicate Documents: write_documents() failed - Document with id '4cdbad5973235417ea3e71769dc9c9ae already exists in index 'document'


When I tried this previously (about a month ago), it worked perfectly well. I am not sure why this error is occurring now. I would be grateful if you could point me in the right direction.

Thanks

Raghav

@tholor
Member

tholor commented May 24, 2021

Hey,

In #1000 we recently introduced a basic hash mechanism that prevents writing duplicate documents to the document store. Duplicates were a common source of errors and confusion for users in the past. By default, we generate the ID by hashing the document's text; you can customize this to use other fields (see #1000).
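
To illustrate why identical text now triggers the error, here is a minimal sketch of content-hash IDs. It is not haystack's implementation: `content_id` and `TinyStore` are hypothetical names, the `"text"` key is an assumed input format, and MD5 from `hashlib` is only a stand-in because the hash function haystack uses is not stated in this thread.

```python
import hashlib


def content_id(text: str) -> str:
    """Derive a deterministic ID from the document text.

    MD5 is a stand-in here; the real hash function is an assumption.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class TinyStore:
    """Hypothetical toy store, not haystack's InMemoryDocumentStore."""

    def __init__(self) -> None:
        self.docs = {}

    def write_documents(self, dicts):
        for d in dicts:
            doc_id = content_id(d["text"])
            # Same text -> same ID -> rejected as a duplicate.
            if doc_id in self.docs:
                raise ValueError(f"Document with id '{doc_id}' already exists")
            self.docs[doc_id] = d


store = TinyStore()
store.write_documents([{"text": "Berlin is a city."}])
try:
    # Writing the same text a second time reproduces the kind of error above.
    store.write_documents([{"text": "Berlin is a city."}])
except ValueError as err:
    print("duplicate rejected:", err)
```

Because the ID depends only on the text, re-running an indexing cell against a store that already holds the documents, or passing input that contains the same text twice, would both hit the duplicate check in a scheme like this.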

In #1088 we will improve the user options to specify what happens in the case of duplicates (skip, overwrite, fail...).
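
The three behaviours named above (skip, overwrite, fail) can be sketched as a pre-processing step on the input dicts. This is an illustration only: `resolve_duplicates` and its `policy` argument are hypothetical and are not the API planned in #1088, and the `"text"` key is an assumed input format.

```python
from typing import Dict, List


def resolve_duplicates(dicts: List[Dict], policy: str = "skip") -> List[Dict]:
    """Return dicts with unique 'text' values (hypothetical helper).

    skip      -> keep the first occurrence of each text
    overwrite -> keep the last occurrence of each text
    fail      -> raise ValueError on the first duplicate
    """
    if policy not in {"skip", "overwrite", "fail"}:
        raise ValueError(f"Unknown policy: {policy}")
    unique: Dict[str, Dict] = {}
    for d in dicts:
        text = d["text"]
        if text in unique:
            if policy == "fail":
                raise ValueError(f"Duplicate text: {text!r}")
            if policy == "skip":
                continue
        # First occurrence, or a later one under "overwrite".
        unique[text] = d
    return list(unique.values())


dicts = [
    {"text": "A", "meta": {"source": "first"}},
    {"text": "B", "meta": {"source": "only"}},
    {"text": "A", "meta": {"source": "second"}},
]
kept = resolve_duplicates(dicts, policy="skip")
print(len(kept), "unique documents kept")
```

The filtered list could then be passed to `document_store.write_documents(...)`. This only removes duplicates within one batch; it does not help if the store already contains a document with the same text.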

So your options are:

@raghav-menon
Author

Thank you. I tried the workaround and it worked.

Regards.

@Timoeller
Contributor

Nice, can we close this issue then?

@raghav-menon
Author

Sure, we can close the issue. Thanks and regards.
