
Tests categories #1822

Merged: 5 commits merged into staging from miguel/tests_readme on Oct 6, 2022
Conversation

miguelgfierro (Collaborator)

Description

Added some of the work that @simonzhaoms and I developed for other projects

Related Issues

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.

tests/README.md Outdated
* **Functional tests:** These tests make sure that the components of the project not only run, but that their function is correct. For example, we want to check that computing the RMSE of an ML model evaluation gives a positive number. These tests can be run asynchronously in the nightly builds and can take hours. In these tests, we want to use real data.
* **Integration tests:** We want to make sure that the interaction between different components is correct. For example, the interaction between data ingestion pipelines and the compute where the model is trained, or between the compute and a database. These tests can be of variable length: if they are fast, we can add them to the PR gate; otherwise, we add them to the nightly builds. For this type of test, synthetic and real data can be used.
* **Smoke tests:** The smoke tests are gates to the slow tests in the nightly builds, meant to detect quick errors. If we are running a test with a large dataset that takes 4 hours, we want to create a faster version of it (maybe with a small percentage of the dataset or with 1 epoch) to ensure that it runs end to end without obvious failures. Smoke tests can run sequentially with functional or integration tests in the nightly builds, and should be fast, ideally less than 20 minutes. They use the same type of data as their longer counterparts.
* **Performance tests:** The performance tests measure the computation time or memory footprint of a piece of code and make sure that it stays within set bounds. For this type of test, synthetic data can be used. (A sketch of how these categories could be tagged with pytest markers follows below.)
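A minimal sketch of how these categories could be selected at run time with pytest markers. The marker names, file names, and the example test are illustrative assumptions, not the repository's actual configuration.

```python
# conftest.py -- register one marker per test category so that, for example,
# the PR gate can run `pytest -m "unit"` and the nightly build `pytest -m "functional"`.
def pytest_configure(config):
    for category in ("unit", "smoke", "functional", "integration", "performance"):
        config.addinivalue_line("markers", f"{category}: {category} tests")


# test_evaluation.py -- hypothetical functional test following the RMSE example above.
import math

import pytest


@pytest.mark.functional
def test_rmse_is_positive():
    predictions, truth = [1.0, 2.0, 3.0], [1.5, 1.5, 2.5]
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))
    assert rmse > 0  # an RMSE evaluation should give a positive number
```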
AbeOmor
Is there also going to be some sort of champion/challenger test to make sure the model is performing better than another model in production?

miguelgfierro (Collaborator, Author)

This is an interesting point of view, @AbeOmor. Would this be in the realm of regression tests, where we want to measure whether different versions of a library produce different values, or are you thinking in the context of benchmarking different models on the same dataset, as in https://github.com/microsoft/recommenders/blob/main/examples/06_benchmarks/movielens.ipynb?
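If the regression-test reading is the intended one, such a check could look like the following sketch; the champion baseline, tolerance, and inline RMSE computation are assumptions for illustration only.

```python
# Hypothetical champion/challenger-style regression test: the challenger model's
# metric must not degrade beyond the stored champion (production) baseline.
import math

import pytest

CHAMPION_RMSE = 0.95  # assumed metric of the model currently in production
TOLERANCE = 0.01      # assumed allowed degradation before the test fails


def challenger_rmse():
    # Stand-in for evaluating the challenger; a real test would score real predictions.
    predictions, truth = [3.8, 2.9, 4.6], [4.0, 3.0, 5.0]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))


@pytest.mark.functional
def test_challenger_not_worse_than_champion():
    assert challenger_rmse() <= CHAMPION_RMSE + TOLERANCE
```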


The nightly build tests are executed asynchronously and can take longer. Here we include the smoke and integration tests; their objective is not only to make sure that there are no errors, but also to make sure that the machine learning solutions are doing what we expect.
* **Data validation tests:** In the data validation tests, we ensure that the schema of the input and output data of each function in the pipeline matches the desired, prespecified schema. They should be fast and can be added to the PR gate. (A sketch follows after this list.)
* **Unit tests:** In the unit tests we just make sure that the Python utilities and notebooks run correctly. Unit tests are fast, ideally less than 5 minutes, and are run in every pull request; they belong to the PR gate. For this type of test, synthetic data can be used.
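A minimal sketch of a data validation test in the spirit of the bullet above; the expected schema, column names, and synthetic data are illustrative assumptions.

```python
# Hypothetical data validation test: check that a ratings DataFrame matches a
# prespecified schema. Fast enough to belong to the PR gate.
import pandas as pd
import pytest

EXPECTED_SCHEMA = {"userID": "int64", "itemID": "int64", "rating": "float64"}  # assumed schema


@pytest.mark.unit
def test_ratings_frame_matches_schema():
    # Synthetic data, as suggested for PR-gate tests.
    df = pd.DataFrame({"userID": [1, 2], "itemID": [10, 20], "rating": [4.0, 3.5]})
    assert list(df.columns) == list(EXPECTED_SCHEMA)
    assert {col: str(dtype) for col, dtype in df.dtypes.items()} == EXPECTED_SCHEMA
```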


One can argue that unit tests are the tests at the lowest (atomic) level of the code, meant to isolate issues in specific parts of the code. With this definition, running notebooks (end to end) doesn't fall under unit testing. But I am not sure where to put 'notebook tests'.

miguelgfierro (Collaborator, Author) commented on Sep 29, 2022

Yes, good thought. I was also struggling with whether to separate the notebook unit tests or keep them together with the library tests.

Looking at Wikipedia:

In computer programming, unit testing is a software testing method by which individual units of source code—sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures—are tested to determine whether they are fit for use

Based on this definition, if we consider a notebook as an individual unit, or as a unit made up of multiple modules, it could be argued that notebook tests fit into unit tests. It would be a different situation if, instead of just checking that the notebook works, we also checked that it returns the right information; that would be in the realm of functional tests.
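To make the distinction concrete, a notebook "unit" test in this sense might only execute the notebook, for example via papermill; the notebook path and parameters below are hypothetical.

```python
# Hypothetical notebook unit test: only verifies that the notebook executes end to end.
import papermill as pm


def test_quickstart_notebook_runs(tmp_path):
    pm.execute_notebook(
        "examples/00_quick_start/some_notebook.ipynb",  # hypothetical notebook path
        str(tmp_path / "output.ipynb"),
        parameters={"MOVIELENS_DATA_SIZE": "100k"},  # assumed notebook parameter
        kernel_name="python3",
    )
    # A functional test would additionally read metrics back from the executed
    # notebook and assert on their values, not just on successful execution.
```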

setuc commented on Sep 29, 2022

I would also want to add the following tests or checks as part of the PR gate / scheduled builds:

  • Code scans for potential security issues and code quality checks, e.g. unused imports. We are currently using LGTM in our repository together with SonarCloud.
  • We also want to enable container scans to find potential security issues, either in Python packages or in the underlying OS, as part of the PR gate, in addition to scheduled scans in the production pipelines.
  • Would we need to include an endpoint scan using OWASP?
  • Consideration for load testing, especially when working with larger models in CV and NLP. This is what you have documented as performance testing. (A load-test sketch follows below.)
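For the load-testing point, one common shape is a Locust scenario against a serving endpoint; the route, payload, and user behaviour below are assumptions, not part of this repository.

```python
# locustfile.py -- hypothetical load test for a model-serving endpoint.
from locust import HttpUser, between, task


class RecommenderUser(HttpUser):
    wait_time = between(1, 2)  # seconds each simulated user waits between requests

    @task
    def score(self):
        # Assumed scoring route and payload; replace with the real endpoint contract.
        self.client.post("/score", json={"userID": 1, "k": 10})
```

Run with, for example, `locust -f locustfile.py --host https://<endpoint>` to observe latency and throughput under concurrent users.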

miguelgfierro (Collaborator, Author)

@setuc really good points. We mentioned security in the RAI tests, but I'm not sure whether it would be convenient to create a separate category. In fact, https://en.wikipedia.org/wiki/Software_testing lists security testing as its own category.

About load testing, this is a really good point. We will incorporate it into the text.

miguelgfierro (Collaborator, Author)

@setuc @fazamani @AbeOmor I incorporated the feedback in the new commit.

miguelgfierro (Collaborator, Author)

Feedback from Erich Gamma:
Each project needs to come up with a strategy that ensures the quality of the project while preserving agility. It looks like you are already using different test categories following the idea of a testing pyramid; https://martinfowler.com/articles/practical-test-pyramid.html is a pretty good article on the topic.

Given that you already have a decent amount of tests in the Recommenders repo, I suggest that you start to reflect on what your pain points with the current tests are. For example:

  • Not enough tests: make it easier to implement tests, for example by providing a library of test fixtures/setups (a fixture sketch follows after this list).
  • Flaky tests: team effort to make tests more stable; establish a culture in which it is not OK to have flaky tests. File issues against these tests.
  • Not enough coverage: make the current coverage visible using a coverage report.
  • Not enough automation (typically an issue for smoke testing that involves a UI, not an issue for your project): investigate a UI testing framework like …
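On the first point, a shared fixture library usually starts as simple conftest.py fixtures; the names and the synthetic schema below are assumptions.

```python
# conftest.py -- illustrative shared fixture, one way to start the "library of
# test fixtures/setups" mentioned above.
import pandas as pd
import pytest


@pytest.fixture(scope="session")
def synthetic_ratings():
    """Small, deterministic ratings frame reusable across unit and data validation tests."""
    return pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 20, 10], "rating": [4.0, 3.0, 5.0]}
    )
```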

miguelgfierro (Collaborator, Author)

Quote from the link that Erich sent:

Write lots of small and fast unit tests. Write some more coarse-grained tests and very few high-level tests that test your application from end to end.

Given the shortcomings of the original names it's totally okay to come up with other names for your test layers, as long as you keep it consistent within your codebase and your team's discussions.

miguelgfierro merged commit 6de2112 into staging on Oct 6, 2022.
miguelgfierro deleted the miguel/tests_readme branch on October 6, 2022 at 18:07.