Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QC for AncientEurope_4A21 #1256

Merged
merged 1 commit into from
May 27, 2022

Conversation

mufernando
Copy link
Member

@mufernando mufernando commented May 16, 2022

@awohns and I QC'ed the AncientEurope_4A21.

@grahamgower
Copy link
Member

Nice work @mufernando!

The model comparison is failing with: AssertionError: State mismatch in populations in epoch[0], Pop1: 1 ≠ 0. This is happening because the catalog model uses legacy MassMigration events to do population splits, so the populations always exist. Whereas in the Demes model (and non-legacy msprime.Demography models) populations have an explicit time interval of existence. Msprime tracks this using the population's state variable (see https://github.com/tskit-dev/msprime/blob/09403d0b894645943e9cf6b714ecc77f3fe1c98b/msprime/demography.py#L3570-L3591). You can make the two models agree by specifying the population_id_map parameter to stdpopsim.DemographicModel. This is an (undocumented) list-of-dicts where there is one dict for each msprime.DemographyDebugger epoch, and the dict maps names to ids in each epoch. I might have this wrong (I didn't write the code), but hopefully @jeromekelleher can correct me and/or provide additional hints.

@petrelharp
Copy link
Contributor

Ah, thanks! We'd got this figured out, and our plan was for @awohns to re-write the model with the new msprime API, but perhaps this is an easier solution? (And, once they agreed, we were going to switch the implementations, putting the yaml one in main and the msprime one in qc.)

@mufernando
Copy link
Member Author

thanks for the reply @grahamgower.

I reimplemented the model using demes and added it to a separate YAML file for now under ancient_europe.yaml. There we are not re-using populations and setting the names to be exactly what they are in the figure (in contrast to how it was done before where pops were re-used and the names in the model do not agree with the actual names in the figures). @awohns will reimplement the model using msprime's new API and we will make sure the two implementations agree.

@mufernando
Copy link
Member Author

mufernando commented May 19, 2022

I'm having two issues here:

  • the sampling times don't agree. by default demes, will use the end_time as sampling time. @awohns added the actual sampling times of genomes from the paper. I'm not sure if we should add the sampling times from the paper? and if we do, it's not clear to me how to do that using demes.
  • still, the debugger will complain about mismatch in states. this won't show up in the tests right now, because it errors bc of sampling times first. it's not clear to me how to use the population id map @grahamgower mentioned above.

ping @grahamgower @jeromekelleher

@petrelharp
Copy link
Contributor

Perhaps @apragsdale can provide some input?

@jeromekelleher
Copy link
Member

Before diving into the weeds of doing the model comparisons here @mufernando, what's the reasoning for bringing Demes in here? I don't think we've used Demes as input anywhere else (unless I missed it) and there will be annoying quirks here in the different expressions of the same models between old-skool msprime, msprime 1.0 Demography API and Demes. It would be better to keep things uniform until we can do the catalog update, wholesale.

I agree it's a pain to have to use the old APIs when the new tools are so much better, though.

@mufernando
Copy link
Member Author

mufernando commented May 19, 2022 via email

@jeromekelleher
Copy link
Member

I would vote for doing one thing at a time, as there are enough moving parts already. I'm not really doing much here though at the moment, so I'd give other votes more weight!

@grahamgower
Copy link
Member

I don't have any good suggestion regarding sampling times. But just sticking to the msprime API for now is probably the path of least resistance. You can always call to_demes() on the msprime.Demography object and pass the demes.Graph object to demesdraw. Also, the msprime.DemographyDebugger output is much improved in msprime v1.x.

@jeromekelleher
Copy link
Member

We should keep the nicely crafted Demes model around somewhere though, as we'll definitely want to use this when we do the technology transition for the catalog. Maybe we open an issue with it and keep the model in there? Or just leave the model in the file, commented out?

@mufernando
Copy link
Member Author

I reimplemented the model now usingmsprime. @awohns some of the sampling times differ between your implementation and mine. it seems that the order in which you add the events (split/admix) matters to this? also see I had to add the default_sampling_time=None to some populations in your implementation.

@mufernando
Copy link
Member Author

also, it's not clear to me what to do with the Bronze age population. It has two sampling times but for now we only implemented one.

@petrelharp
Copy link
Contributor

Thanks for the heavy lifting here! I say just put in one sampling time (the more recent one?).

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #1256 (d6525c2) into main (77f9663) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head d6525c2 differs from pull request most recent head 7107232. Consider uploading reports for the commit 7107232 to get more accurate results

@@           Coverage Diff           @@
##             main    #1256   +/-   ##
=======================================
  Coverage   99.65%   99.66%           
=======================================
  Files         111      111           
  Lines        3523     3558   +35     
  Branches      438      438           
=======================================
+ Hits         3511     3546   +35     
  Misses          8        8           
  Partials        4        4           
Impacted Files Coverage Δ
stdpopsim/catalog/HomSap/demographic_models.py 100.00% <100.00%> (ø)
stdpopsim/qc/HomSap.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77f9663...7107232. Read the comment docs.

@awohns
Copy link
Contributor

awohns commented May 25, 2022

Minor changes to fix two failing tests:

  1. Model implementation has NEO defined before YAM in the population descriptions, which led to incorrect sampling population sampling times for those two populations.
  2. The "Bronze" population had start_sizes which disagreed between the two implementations, I think the qc version should have set initial_size to the population size at time zero, which is 592450737.7092822.

Look ok @mufernando?

@mufernando mufernando force-pushed the AncientEurope_4A21_qc branch 4 times, most recently from c37acd1 to df34cc1 Compare May 25, 2022 20:54
@mufernando
Copy link
Member Author

I think the last thing we need before we merge this is to figure out where Alice Pearson got the r_EU estimate for the growth rate of modern europeans (since the Bronze age). This is not in the preprint she linked in the model. @petrelharp did you hear from her?

@mufernando mufernando marked this pull request as ready for review May 25, 2022 21:57
@AliPearson
Copy link

I chose the growth rate pretty arbitrarily. I wanted to make the population size grow exponentially from the bronze age to present-day, resulting in a large enough population size to prevent coalescences in the first 100 generations or so. Not sure how realistic this is and i have since changed it to be constant from the bronze age, but that is what is in the preprint.

@petrelharp
Copy link
Contributor

Oh, good - so the number is actually in the preprint? We didn't find it, but there is a lot in there.

@petrelharp
Copy link
Contributor

If it's not in the preprint (and I just checked again?), we should just put in a comment saying it's a personal communication.

@AliPearson
Copy link

AliPearson commented May 27, 2022 via email

@petrelharp
Copy link
Contributor

No problem! I've added a note about that, so I think we're ready to merge in! Thanks, all!

added note about growth rate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants