-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EmbeddingRetriever
assumes the title of the document is in a field of meta
called name
#3258
Comments
Hey @ZanSara! I think that this line is superfluous and indeed is probably wrong:
In fact, before calling that method, in the haystack/haystack/nodes/retriever/dense.py Lines 1838 to 1868 in 6cb4e93
So for now, if you have such a document: {"content": "mycontent",
"meta": {"name": "myname"}} and you try to embed it with an [['myname', 'myname\ncontent']] I verified that with some experiments. Possible solutionWe can simply remove that line from |
Hey @anakin87, wow, thank you for digging into this one. So, although I don't believe this duplication benefits anyone, there might be parts of the code deep into the retriever that either assume the name will be in the first position of the list, and others that assume the name is already concatenated. My recommendation would be to proceed, remove the line, and then thoroughly test the resulting code. By this, I mean that we should make sure all tutorials run on it. If you can, running all tests locally would be also fantastic. Do you think you can do that? If it's not viable I'll do the tests myself, let me know. It will just take a bit longer as I have a lot to catch up after my holidays 😄 |
Mayank is already working on this. |
@julian-risch thanks! I wasn't aware yet 👍 |
Describe the bug
embed_documents
,EmbeddingRetriever
assumes the title of the document is in a field ofmeta
calledname
:haystack/haystack/nodes/retriever/_embedding_encoder.py
Line 189 in 7e79a48
Expected behavior
embed_meta_fields
parameter.Additional context
DenseRetriever
abstraction #3252 (comment)The text was updated successfully, but these errors were encountered: