-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support conjunctive queries in sparse retrieval #2361
Conversation
@bogdankostic it would be great if you could take a look at this and especially the naming of the param |
@Timoeller I'd also like to hear your opinion on the naming of the param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking already pretty good to me!
I think the naming of all_terms_must_match
is okay, as this param is well explained in the docstrings. I am happy to approve this PR once the test test_DeepsetCloudDocumentStore_query
is fixed and labels are added :)
I think the term is a bit long but fine, it describes what is happening pretty well. Although I like this feature I thought we had a discussion about conjunctive matching between terms and filters. Will this come in a later PR? |
Alright @Timoeller, conjunctive matching between terms and filters is already handled in #2359. |
@bogdankostic test is fixed. Currently there is a test failing due to some hf downtime I guess. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Currently we only support disjunctive queries in sparse retrieval. I.e. only one query term needs to be present in a document in order to be retrieved. This is a highly recall-favouring approach. In some use-cases (e.g. document retrieval) a more precision-focused approach might be better, however. Conjunctive queries ensure that all query terms must be present in a document in order to be retrieved. Within elasticsearch/OpenSearch this could easily be achieved by setting the
operator
param of thematch
clause toAND
.Proposed changes:
all_terms_must_match
toElasticsearchRetriever
'squery()
methodoperator
of match clause in elasticsearch and OpenSearch queries according toall_terms_must_match
all_terms_must_match
toDeepsetCloudDocumentStore
Status (please check what you already did):