Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ElasticsearchDocumentStore.get_all_documents in ElasticsearchFilterOnlyRetriever.retrieve #2151

Conversation

adri1wald
Copy link
Contributor

@adri1wald adri1wald commented Feb 9, 2022

Closes #2139

Proposed changes:

  • Previously ElasticsearchFilterOnlyRetriever.retrieve used ElasticsearchDocumentStore.query, which takes a top_k parameter but doesn't apply it to the es request body on the if query is None branch.
  • Based on discussion with @julian-risch and @mathislucka, top_k doesn't have semantic meaning without a query anyways, because there is nothing to score agains and thus no ordering over which to select the top k documents.
  • We came to the conclusion that "the expected behavior of the ElasticsearchFilterOnlyRetriever is to return all documents that match the given filter and not use a top_k at all"
  • ElasticsearchFilterOnlyRetriever.retrieve therefore now uses ElasticsearchDocumentStore.get_all_documents, which returns all documents matching the filters.

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

@adri1wald
Copy link
Contributor Author

The tests around retrievers are a bit confusing to me - would love some pointers on how to test this change.

@julian-risch
Copy link
Member

The tests around retrievers are a bit confusing to me - would love some pointers on how to test this change.

Hi @adri1wald sure, I think it would be nice to add a new test case here that is parameterized with an ElasticsearchDocumentStore and an ElasticsearchFilterOnlyRetriever:

@pytest.mark.parametrize("retriever", ["es_filter_only"], indirect=True)
@pytest.mark.parametrize("document_store", ["elasticsearch"], indirect=True)
def test_es_filter_only(document_store, retriever, docs):
...

There you could add let's say 12 documents of which 11 documents should be selected by the filter. If all 11 are returned, the test passes. Any other idea?

@ZanSara
Copy link
Contributor

ZanSara commented Mar 25, 2022

Hey @adri1wald, do you think you'll have time to work on this PR or shall we take over, apply the last touches and merge?

@adri1wald
Copy link
Contributor Author

Hey @ZanSara, really sorry I completely forgot about this PR :’) I will make some time this weekend to finish it off!

@ZanSara
Copy link
Contributor

ZanSara commented Mar 25, 2022

Great, thanks! 🙂

@julian-risch
Copy link
Member

Hi @adri1wald it would be really nice to include your changes in the next release. Do you think you could make the final changes by the end of next week (end of April)?

@adri1wald
Copy link
Contributor Author

Hey, major apologies for the delay. I've added the test case now. Let me know if there's anything else that needs doing!

@julian-risch
Copy link
Member

Hi @adri1wald thanks for adding the test. Right now, it's failing because of the following line of code:

        document_store.write_documents(docs)
>       retrieved_docs = retriever.retrieve(filters={"f1": ["0"]})
E       TypeError: retrieve() missing 1 required positional argument: 'query'

Currently, the retrieve method has no default value for query and therefore the test fails.

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, ready to merge! 👍 Thank you @adri1wald for your contribution!

@julian-risch julian-risch merged commit c401e86 into deepset-ai:master Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

top_k in ElasticsearchFilterOnlyRetriever has no effect when using ElasticsearchDocumentStore
3 participants