Fetch (optional) root-sequence JSON #1197

Merged
jameshadfield merged 3 commits into master from root-seq
Oct 7, 2020
Conversation

jameshadfield
Member

@jameshadfield jameshadfield commented Aug 7, 2020

Auspice has a long-standing issue: coloring by genotype at a position where no mutations were observed results in an uninformative coloring of the tree. This is because we don't store the ancestral (root) sequence in the dataset JSON, so we rely on mutations to infer the state at each position.
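To make the limitation concrete, here is a minimal TypeScript sketch of mutation-based inference. The `Mutation` and `TreeNode` shapes and the `inferRootState` name are hypothetical and are not Auspice's actual data structures. The sketch only illustrates that when no branch carries a mutation at a position, there is nothing to infer the root state from.

```typescript
// Hypothetical shapes for illustration only; not Auspice's internal types.
interface Mutation { from: string; pos: number; to: string; }
interface TreeNode { name: string; mutations: Mutation[]; children: TreeNode[]; }

// Walk the tree from the root. A node is only visited after all of its
// ancestors were checked, so the first mutation found at `pos` has no
// earlier change above it and its `from` state is the root state.
// If no branch mutates `pos`, the root state cannot be inferred.
function inferRootState(root: TreeNode, pos: number): string | undefined {
  const stack: TreeNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as TreeNode;
    for (const mutation of node.mutations) {
      if (mutation.pos === pos) return mutation.from;
    }
    for (const child of node.children) {
      stack.push(child);
    }
  }
  return undefined;
}
```

In this sketch an `undefined` result corresponds to the uninformative coloring described above.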

Upon dataset load we now make a request for the "root-sequence" sidecar file. If this request is successful, we use the data to color genotypes for which there are no mutations. The get-dataset script (which runs for Auspice Heroku deployments) has been modified to fetch vic & yam from the staging server, as the corresponding root-sequence JSONs were present there but not on nextstrain-data.
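The "optional" part of this request can be sketched roughly as follows. This is not the code in this PR: the function names, the injected `Fetcher` type, and the assumed sidecar shape (a JSON object mapping `nuc` or a gene name to a sequence string) are all assumptions made for illustration. The idea is simply that a missing or malformed sidecar resolves to `undefined` rather than failing the dataset load.

```typescript
// Assumed sidecar shape: sequence name ("nuc" or a gene name) -> sequence string.
type RootSequence = Record<string, string>;

// Minimal response/fetcher shapes so the sketch is not tied to one HTTP client.
interface MinimalResponse { ok: boolean; text(): Promise<string>; }
type Fetcher = (url: string) => Promise<MinimalResponse>;

// Returns the parsed sidecar, or undefined if the body is not a JSON object
// whose values are all strings.
function parseRootSequence(body: string): RootSequence | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return undefined;
  }
  const result: RootSequence = {};
  for (const key of Object.keys(parsed)) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== "string") return undefined;
    result[key] = value;
  }
  return result;
}

// The sidecar is optional: any failure (network error, non-2xx status,
// unparsable body) resolves to undefined instead of rejecting.
async function fetchRootSequence(url: string, fetcher: Fetcher): Promise<RootSequence | undefined> {
  try {
    const response = await fetcher(url);
    if (!response.ok) return undefined;
    return parseRootSequence(await response.text());
  } catch {
    return undefined;
  }
}
```

In a browser, the global `fetch` could be passed as the `fetcher` argument. The URL is left to the caller because the sketch makes no claim about how Auspice builds the request.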

You can see this in action via https://auspice-root-seq-doeexpnizvfdt.herokuapp.com/flu/seasonal/vic/ha/3y?c=gt-HA2_120 -- notice how the tree starts off with a grey coloring and then once the root-sequence JSON request arrives the coloring updates to show the correct AA (T). Contrast this with https://auspice-root-seq-doeexpnizvfdt.herokuapp.com/flu/seasonal/h3n2/ha/3y?c=gt-HA2_120 for which the root-sequence JSON isn't available.

@emmahodcroft do you have a root-sequence JSON for a TB dataset? Presumably that will be a large JSON and it'd be good to use that for testing to see if there are any issues.

@jameshadfield jameshadfield temporarily deployed to auspice-root-seq-doeexpnizvfdt August 7, 2020 06:47 Inactive
@jameshadfield jameshadfield temporarily deployed to auspice-root-seq-doeexpnizvfdt August 7, 2020 06:54 Inactive
jameshadfield added a commit to nextstrain/ncov that referenced this pull request Aug 7, 2020
This modifies the default workflow to produce the sidecar root-sequence JSON
for each build. This is in preparation for nextstrain/auspice#1197
(see there for an explanation of the advantages this JSON gives us)
@rneher
Member

rneher commented Sep 29, 2020

This would also be quite useful for Nextclade. We could then fetch the root sequence along with the tree and make it work for any of our builds.

Currently genotypes are unknown for positions without any mutations, as it is through mutations we infer the appropriate values to display. This commit changes the coloring to be grey in such cases, rather than blue/green.
Upon dataset load we now make a request for the "root-sequence" sidecar file. If this request is successful, we use the data to color genotypes for which there are no mutations (previously the nt/aa at such positions wasn't known).
This switches to getting vic & yam from the nextstrain staging bucket, as the root-sequence JSONs weren't present on the data bucket.
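As a rough illustration of how a root-sequence sidecar could resolve a genotype coloring such as `c=gt-HA2_120`, here is a minimal TypeScript sketch. It is not the implementation in these commits. The function names are hypothetical, only single-position keys are handled, and both the sidecar shape (gene name or `nuc` mapped to a sequence string) and the 1-based position convention are assumptions.

```typescript
// Assumed sidecar shape: sequence name ("nuc" or a gene name) -> sequence string.
type RootSequence = Record<string, string>;

interface GenotypeKey { gene: string; position: number; }

// Parses a single-position genotype color key such as "gt-HA2_120".
function parseGenotypeKey(key: string): GenotypeKey | undefined {
  const match = /^gt-(.+)_(\d+)$/.exec(key);
  if (match === null) return undefined;
  return { gene: match[1], position: parseInt(match[2], 10) };
}

// Returns the root state at the (assumed 1-based) position, or undefined when
// it is not known: sidecar missing, gene absent, or position out of range.
// An undefined result corresponds to the grey "unknown" coloring.
function rootGenotype(key: string, rootSequence: RootSequence | undefined): string | undefined {
  const parsed = parseGenotypeKey(key);
  if (parsed === undefined || rootSequence === undefined) return undefined;
  const sequence = rootSequence[parsed.gene];
  if (sequence === undefined) return undefined;
  if (parsed.position < 1 || parsed.position > sequence.length) return undefined;
  return sequence.charAt(parsed.position - 1);
}
```

With a toy sidecar `{ HA2: "GLT" }`, `rootGenotype("gt-HA2_3", sidecar)` returns `"T"`, while passing `undefined` for the sidecar returns `undefined` and the tree would stay grey.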
@jameshadfield jameshadfield temporarily deployed to auspice-root-seq-oafyzuc8kjwa3 October 7, 2020 01:37 Inactive
@jameshadfield
Member Author

Merging after rebasing onto master & retesting

@jameshadfield jameshadfield merged commit 29dca85 into master Oct 7, 2020
@jameshadfield jameshadfield deleted the root-seq branch October 7, 2020 02:00
jameshadfield added a commit to nextstrain/ncov that referenced this pull request Nov 2, 2020
This modifies the default workflow to produce the sidecar root-sequence JSON for each build. See nextstrain/auspice#1197 for the functionality this file gives Auspice.