Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Store test refactoring #3449

Merged
merged 23 commits into from
Oct 31, 2022
Merged

Document Store test refactoring #3449

merged 23 commits into from
Oct 31, 2022

Conversation

masci
Copy link
Contributor

@masci masci commented Oct 21, 2022

Related Issues

Proposed Changes

Refactor the tests into a class hierarchy that reflects the Document Store classes.

Why?

  • The pytest markers we use to select which tests should be collected is not manageable anymore: we recently found out that entire portions of tests were not executed because the right combination of markers was never met. The way pytest collect tests organized into classes will let us getting rid of the markers matrix.
  • Computing coverage at the moment is beyond difficult

How

  • The DocumentStoreBaseTestAbstract class contains the tests for the abstract methods in BaseDocumentStore. These tests receive an instance of a document store in form of the ds fixture.
  • Classes deriving from DocumentStoreBaseTestAbstract are responsible for defining the ds fixture, returning the specific instance of the Document Store they intend to test
  • pytest will run the testing code from DocumentStoreBaseTestAbstract for each subclass, hence exercising a different Document Store at each round

Notes for the reviewer

  • This PR only touches the SearchEngineDocumentStore hierarchy, the other will follow in separate PRs
  • The tests marked with pytest.mark.elasticsearch were moved in the new hierarchy
  • Any other test in test_document_store.py was left there, waiting for the other Document Stores to be ported
  • The Elasticsearch and Opensearch jobs in the workflow were replaced with new ones using the test classes

Checklist

@masci masci requested a review from a team as a code owner October 21, 2022 13:53
@masci masci requested review from vblagoje and removed request for a team October 21, 2022 13:53
.github/workflows/tests.yml Outdated Show resolved Hide resolved
.github/workflows/tests.yml Outdated Show resolved Hide resolved
.github/workflows/tests.yml Outdated Show resolved Hide resolved
@ZanSara ZanSara added the type:refactor Not necessarily visible to the users label Oct 24, 2022
"pinecone",
"opensearch",
}
all_doc_stores = {"elasticsearch", "faiss", "sql", "memory", "milvus1", "milvus", "weaviate", "pinecone"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as we add test_memory.py, it will get removed from these existing "references"? We'll do that one-by-one for each document store @masci ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vblagoje correct, up to the point where we can remove test_document_store.py

@masci masci requested a review from vblagoje October 31, 2022 09:28
@@ -110,14 +115,94 @@ jobs:
run: pip install .[all]

- name: Run
run: pytest -m "unit" test/
run: pytest -m "unit" test/${{ matrix.topic }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, this needs to match the test directory, we add a small comment perhaps?

port=9201,
knn_engine="nmslib",
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have test_opensearch.py, and we are removing OpenSearchDocumentStore from get_document_store yet we don't do the same for ElasticsearchDocumentStore. We are not removing it from get_document_store and we added 'test_elasticsearch.py'. Just a bit confused here. What's happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elasticsearch keeps being used in some tests in test_document_store.py that are triggered with the complex markers matrix, while Opensearch had a way smaller test coverage there. The result is that Opensearch tests are now 100% implemented in the new module test_opensearch.py while Elasticsearch needs more iterations to finish moving tests out of test_document_store.py into test_elasticsearch.py. This can't be done without touching the other Document Stores so I plan to do that later.

Copy link
Member

@vblagoje vblagoje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks gtg. Somewhat complex refactoring, but I see its intention, and it seems like a very good one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:CI topic:document_store topic:DX Developer Experience type:refactor Not necessarily visible to the users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants