
Need unit tests for calc...() popgen functions #474

Open · bhaller opened this issue Sep 25, 2024 · 5 comments

Comments

@bhaller (Contributor) commented Sep 25, 2024

The need for unit tests for SLiM's popgen functions has been underlined by another discovery of a bug in them (https://groups.google.com/g/slim-discuss/c/Yacfk9EIYeU/m/bc72wVUzBAAJ). I'm not sure how to test them, though. I suppose a test could construct a population with known mutations, placed into the genomes at known positions/frequencies, and then check that the value calculated by the function matches an expected value calculated independently, from first principles or by other software. If someone can supply me with a test scenario and an expected value, I can construct a corresponding SLiM test, but I don't have the knowledge necessary to come up with appropriate scenarios and expected values.

These test scenarios wouldn't need to be large or complex; even a test with a genome of, say, ten base positions, with, say, five mutations present and four diploid individuals (eight genomes), would be quite sufficient to test that the math and logic are correct, I would think. It would be good to have such tests for all of the calc...() functions. Perhaps @npb596 or @petrelharp or @philippmesser could help me with this?
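To make the shape of such a test concrete, here is a minimal first-principles sketch in Python (the genotype matrix is made up for illustration; any small dataset with known mutation placements would do), computing nucleotide diversity (pi) directly from its definition as the average number of pairwise differences between genomes, per site:

```python
# First-principles expected value for nucleotide diversity (pi) on a tiny,
# hand-constructed dataset: 8 genomes (4 diploids), 10 sites, 5 mutations.
# The genotype matrix below is hypothetical, chosen only for illustration.
from itertools import combinations

N_SITES = 10  # total genome length; only the 5 segregating sites are listed

# Rows = genomes, columns = the five segregating sites; 1 = derived allele.
genomes = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
]

# pi = mean number of pairwise differences between genomes, per site.
pair_diffs = [
    sum(a != b for a, b in zip(g1, g2))
    for g1, g2 in combinations(genomes, 2)
]
pi = sum(pair_diffs) / len(pair_diffs) / N_SITES
print(f"expected pi = {pi:.6f}")
```

The printed value is what the corresponding calc...() function would be expected to reproduce, within numerical tolerance, on the same data.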

@bhaller (Contributor, Author) commented Sep 25, 2024

This could take the form of a VCF file and an expected value. My SLiM test could simply load the VCF and check for a match (within reasonable numerical tolerance) to the expected value.
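For illustration, a hypothetical fixture of that shape might look like the following sketch. The VCF content, the ten-base sequence length, and the simple per-site 2pq definition of heterozygosity used here are all assumptions made for the sketch; the exact definition SLiM's calc...() function uses would need to be checked against the manual.

```python
# Hypothetical test fixture: a minimal VCF paired with an expected value
# computed from first principles. A SLiM-side test would load the same VCF
# and assert that its calc...() result matches within tolerance.
VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ti0\ti1\ti2\ti3
1\t3\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0\t0|1
1\t7\t.\tG\tC\t.\tPASS\t.\tGT\t1|0\t0|0\t0|1\t0|0
"""

def expected_heterozygosity(vcf_text, seq_length):
    """Sum of 2*p*q over sites, averaged over the sequence length."""
    total = 0.0
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            continue
        genotypes = line.split("\t")[9:]  # sample columns
        alleles = [int(a) for gt in genotypes
                   for a in gt.replace("|", "/").split("/")]
        p = sum(alleles) / len(alleles)  # derived-allele frequency
        total += 2.0 * p * (1.0 - p)
    return total / seq_length

expected = expected_heterozygosity(VCF, seq_length=10)
print(f"expected heterozygosity = {expected:.6f}")  # 0.087500
```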

@petrelharp (Collaborator)

My recommendation is not to use expected values from theory (if that's what you meant); instead, compare to a value calculated independently, either by other software or by a separate, first-principles implementation.

I don't want to take this on right now, though - maybe a good student project?

@bhaller (Contributor, Author) commented Sep 25, 2024

OK. Why not expected values from theory?

@petrelharp (Collaborator)

Because that is so much more complicated - you have to worry about statistical power, how close is "close enough", etcetera. That sort of thing is good for validation, but not so good for unit tests (for one thing, you end up having to run a lot of simulations to make sure). What we do in tskit, for instance, is usually just pull up the definition of the thing, then code up some real simple implementation that doesn't worry about efficiency, and compare to that. msprime does have a whole validation.py script that does statistical comparisons to other simulation software, but that's a much messier thing.
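That pattern might look like the following sketch in test code. Here `production_theta` is a hypothetical stand-in for whatever function is under test (e.g. a calc...() function exposed to the test harness), and the naive reference implementation comes straight from the definition of Watterson's theta.

```python
# Sketch of the tskit-style unit-test pattern: a deliberately naive
# reference implementation, coded directly from the definition, compared
# against the production function. `production_theta` is hypothetical.
import math

def naive_wattersons_theta(genomes, seq_length):
    """Watterson's theta from the definition: S / (a_n * L), where S is the
    number of segregating sites and a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(genomes)
    num_sites = len(genomes[0])
    S = sum(
        len({g[site] for g in genomes}) > 1  # site is segregating
        for site in range(num_sites)
    )
    a_n = sum(1.0 / i for i in range(1, n))
    return S / (a_n * seq_length)

def test_wattersons_theta(production_theta):
    # Hypothetical dataset: 4 genomes; columns are sites of a 10-bp genome
    # (unlisted sites are monomorphic).
    genomes = [
        [0, 0, 1, 0, 1],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
    ]
    expected = naive_wattersons_theta(genomes, seq_length=10)
    assert math.isclose(production_theta(genomes, 10), expected, rel_tol=1e-9)
```

The reference implementation can be as slow as it likes; its only job is to be obviously correct by inspection against the definition.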

@bhaller (Contributor, Author) commented Sep 25, 2024

> Because that is so much more complicated - you have to worry about statistical power, how close is "close enough", etcetera. That sort of thing is good for validation, but not so good for unit tests (for one thing, you end up having to run a lot of simulations to make sure). What we do in tskit, for instance, is usually just pull up the definition of the thing, then code up some real simple implementation that doesn't worry about efficiency, and compare to that. msprime does have a whole validation.py script that does statistical comparisons to other simulation software, but that's a much messier thing.

Aha, I see. Yes, there are certainly problems with doing statistical tests for validation. SLiM already does tons of them, though. But if a precise comparison to the "right answer" is possible, that's certainly better!
