Individuals not associated with nodes can lead to incorrect VCF #2448

Closed
jeromekelleher opened this issue Jul 28, 2022 · 0 comments
Labels
bug Something isn't working


@jeromekelleher
Member

Related to #2446 and #2257

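```python
import msprime

# Simulate 3 diploid individuals (6 sample nodes) and add mutations.
ts = msprime.sim_ancestry(3, sequence_length=1e2, random_seed=1234)
ts = msprime.sim_mutations(ts, rate=0.01, random_seed=1234)
# Add an extra individual that is not associated with any nodes.
tables = ts.dump_tables()
tables.individuals.add_row()
ts = tables.tree_sequence()
print(ts.as_vcf())
```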

gives

```
##fileformat=VCFv4.2
##source=tskit 0.5.2.dev0
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=100>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  tsk_0   tsk_1   tsk_2   tsk_3
1       2       0       C       T       .       PASS    .       GT      1|0     1|0     1|0
1       15      1       G       T       .       PASS    .       GT      1|0     0|0     0|0
1       19      2       A       T       .       PASS    .       GT      1|0     0|0     0|0
1       29      3       C       A       .       PASS    .       GT      1|0     1|0     1|0
1       35      4       T       C       .       PASS    .       GT      0|0     0|0     0|1
1       37      5       G       C       .       PASS    .       GT      0|1     0|1     0|0
1       56      6       C       G       .       PASS    .       GT      0|1     0|1     0|0
1       61      7       T       G       .       PASS    .       GT      0|0     0|0     1|0
1       71      8       C       G       .       PASS    .       GT      0|1     0|1     0|0
1       81      9       G       C       .       PASS    .       GT      1|0     0|0     0|0
```

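Note the extra sample tsk_3 in the header. It is the name given to the individual added by tables.individuals.add_row(), which has no nodes, so the header lists four samples while every data row contains only three genotype columns.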
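One quick way to catch this kind of inconsistency in the output is to compare the number of sample names on the #CHROM header line with the number of genotype columns on each data line. The sketch below does this in plain Python; the function name find_column_mismatches is hypothetical, and it assumes tab-delimited input as required by the VCF specification, where the first nine columns (CHROM through FORMAT) are fixed.

```python
from __future__ import annotations

# Number of fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.
N_FIXED = 9


def find_column_mismatches(vcf_text: str) -> list[tuple[int, int, int]]:
    """Return (line number, header samples, genotype columns) for every data
    line whose genotype count differs from the number of header sample names.

    Hypothetical helper for illustration; not part of tskit.
    """
    n_samples = None
    mismatches = []
    for lineno, line in enumerate(vcf_text.splitlines(), start=1):
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Everything after the nine fixed columns is a sample name.
            n_samples = len(fields) - N_FIXED
            continue
        if n_samples is None:
            raise ValueError("data line found before the #CHROM header line")
        n_genotypes = len(fields) - N_FIXED
        if n_genotypes != n_samples:
            mismatches.append((lineno, n_samples, n_genotypes))
    return mismatches
```

Run on the text returned by ts.as_vcf() in the reproduction above, this should flag every data line, each with 4 header samples but only 3 genotype columns.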

The current VCF output code essentially assumes that every individual is associated with sample nodes. When that assumption does not hold, it is possible to construct other examples in which incorrect data is written without any error being raised.
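The assumption above can also be checked before any VCF text is written. The following is a minimal sketch over a hypothetical plain-Python model of the data (a list of node IDs per individual plus a set of sample node IDs); the function names are illustrative, and this is not necessarily how the fix in 2d0d33e handles the case.

```python
from __future__ import annotations


# Hypothetical data model (not the tskit API): individual_nodes[i] lists the
# node IDs referenced by individual i; sample_nodes is the set of sample node IDs.
def individuals_without_samples(
    individual_nodes: list[list[int]], sample_nodes: set[int]
) -> list[int]:
    """Return the IDs of individuals that reference no sample nodes."""
    return [
        ind_id
        for ind_id, nodes in enumerate(individual_nodes)
        if not any(u in sample_nodes for u in nodes)
    ]


def require_sample_nodes(
    individual_nodes: list[list[int]], sample_nodes: set[int]
) -> None:
    """Raise an error instead of silently writing a header/data mismatch."""
    bad = individuals_without_samples(individual_nodes, sample_nodes)
    if bad:
        raise ValueError(f"individuals {bad} are not associated with any sample nodes")


# Mirrors the example: three diploid individuals plus one empty individual.
nodes = [[0, 1], [2, 3], [4, 5], []]
print(individuals_without_samples(nodes, set(range(6))))  # [3]
```

With tskit, comparable inputs could be built from [list(ind.nodes) for ind in ts.individuals()] and set(ts.samples()). Raising an error is one option; another is to drop such individuals from both the header and the data rows, as long as the two stay consistent.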

@jeromekelleher jeromekelleher added the bug Something isn't working label Jul 28, 2022
@jeromekelleher jeromekelleher changed the title Individuals not associated with nodes can lead to incorrent VCF Individuals not associated with nodes can lead to incorrect VCF Jul 28, 2022
jeromekelleher added two commits to jeromekelleher/tskit that referenced this issue Jul 28, 2022
@mergify mergify bot closed this as completed in 2d0d33e Jul 29, 2022