Fetch (optional) root-sequence JSON #1197
Merged
Auspice has had a long-standing issue where coloring by the genotype at a position with no observed mutations produced an uninformative (grey) coloring of the tree. This is because we don't store the ancestral (root) sequence in the dataset JSON, so we have to infer it from mutations.
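To illustrate why mutation-only inference breaks down, here is a minimal sketch, not Auspice's actual code. It assumes a hypothetical `Node` type whose `mutations` map `(gene, 1-based position)` to `(from_residue, to_residue)`. A node's residue comes from the nearest mutation on its path to the root. If there is none, the root state can only be read from the "from" residue of some first-on-its-lineage mutation elsewhere in the tree. If the position never mutates, nothing is known and the result is `None` (rendered grey).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]          # (gene, 1-based position)
Change = Tuple[str, str]       # (from_residue, to_residue)


@dataclass
class Node:
    # Hypothetical minimal node: mutations on the branch leading to this node.
    mutations: Dict[Key, Change] = field(default_factory=dict)
    parent: Optional["Node"] = None


def root_state_from_mutations(nodes: List[Node], key: Key) -> Optional[str]:
    """Recover the root residue from a mutation with no earlier change above it."""
    for node in nodes:
        change = node.mutations.get(key)
        if change is None:
            continue
        ancestor = node.parent
        while ancestor is not None and key not in ancestor.mutations:
            ancestor = ancestor.parent
        if ancestor is None:
            # First change at this position on its lineage: "from" is the root state.
            return change[0]
    return None  # position never mutates: root state unknown without the sequence


def genotype_from_mutations(node: Node, gene: str, pos: int,
                            nodes: List[Node]) -> Optional[str]:
    """Residue at (gene, pos) for `node`, using mutations only."""
    key = (gene, pos)
    current: Optional[Node] = node
    while current is not None:
        change = current.mutations.get(key)
        if change is not None:
            return change[1]  # most recent change on the path wins
        current = current.parent
    return root_state_from_mutations(nodes, key)
```

For a position like HA2:120 in a dataset where it never mutates, `genotype_from_mutations` returns `None` for every node, which corresponds to the grey tree described above.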
Upon dataset load we now make a request for the "root-sequence" sidecar file. If this request succeeds, we use its data to color genotypes at positions with no mutations. The get-dataset script (which runs for Auspice Heroku deployments) has been modified to fetch the vic and yam datasets from the staging server, because the corresponding root-sequence JSONs were present there but not on nextstrain-data.
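The optional request and fallback can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the sidecar naming rule (`<dataset>_root-sequence.json`), the JSON shape (gene or `nuc` name mapped to a sequence string, 1-based positions) and the helper names `sidecar_url`, `fetch_root_sequence` and `resolve_genotype` are all hypothetical. A failed request yields `None`, so the tree keeps its mutation-only coloring.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.request import urlopen

RootSequence = Dict[str, str]  # assumed shape: gene/"nuc" -> sequence string


def sidecar_url(main_url: str) -> str:
    # Assumption: the sidecar sits next to the main JSON with this suffix.
    if main_url.endswith(".json"):
        main_url = main_url[: -len(".json")]
    return main_url + "_root-sequence.json"


def _http_json(url: str) -> Any:
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_root_sequence(main_url: str,
                        fetch: Callable[[str], Any] = _http_json) -> Optional[RootSequence]:
    """Request the optional sidecar; any failure just means "not available"."""
    try:
        return fetch(sidecar_url(main_url))
    except (OSError, ValueError):  # network/HTTP errors, malformed JSON
        return None


def resolve_genotype(inferred: Optional[str], gene: str, pos: int,
                     root_sequence: Optional[RootSequence]) -> Optional[str]:
    """Prefer the mutation-based value; fall back to the root sequence."""
    if inferred is not None:
        return inferred
    if root_sequence is None or gene not in root_sequence:
        return None
    seq = root_sequence[gene]
    return seq[pos - 1] if 1 <= pos <= len(seq) else None
```

Injecting `fetch` keeps the sketch testable offline; in a browser the same idea maps onto an asynchronous request whose result triggers a recoloring once it arrives.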
You can see this in action via https://auspice-root-seq-doeexpnizvfdt.herokuapp.com/flu/seasonal/vic/ha/3y?c=gt-HA2_120. Notice how the tree starts off with a grey coloring, and once the root-sequence JSON request returns, the coloring updates to show the correct amino acid (T). Contrast this with https://auspice-root-seq-doeexpnizvfdt.herokuapp.com/flu/seasonal/h3n2/ha/3y?c=gt-HA2_120, for which the root-sequence JSON isn't available.
@emmahodcroft do you have a root-sequence JSON for a TB dataset? Presumably it will be a large JSON, so it would be good to use it for testing to see whether there are any issues (e.g. with load time).