Drag drop metadata improvements #1244
The refactor was motivated by upcoming functionality and this influenced the implementation here. The change in action type name indicates that we will be able to extract more information from dropped files than just extra color-bys.
Here we use Microreact-style interpretation of column header names. Specifically, a double-underscore in a header name is interpreted as extra information, with valid suffixes "colour", "autocolour" and "shape". We ignore "shape" (as auspice doesn't have this capability) and "autocolour" (as that's our default). We parse "colour" as a column defining colours for the corresponding column. We enforce that these are (long) HEX values, but this should be relaxed in the future. Where multiple nodes with the same trait value define different colours, we average them, as the map does.
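As a rough illustration of the header interpretation described above, here is a minimal sketch. The helper names (`parseHeader`, `averageHex`) are hypothetical and not taken from the auspice code, and averaging channel-by-channel in RGB space is an assumption about how the colours might be combined.

```javascript
// Hypothetical helpers for illustration only; not auspice's implementation.
const KNOWN_SUFFIXES = new Set(["colour", "autocolour", "shape"]);
const LONG_HEX = /^#[0-9a-fA-F]{6}$/;

/* Split e.g. "random__colour" into {trait: "random", suffix: "colour"}.
   Headers without a recognised double-underscore suffix are returned whole. */
function parseHeader(header) {
  const idx = header.lastIndexOf("__");
  if (idx <= 0) return {trait: header, suffix: null};
  const suffix = header.slice(idx + 2);
  if (!KNOWN_SUFFIXES.has(suffix)) return {trait: header, suffix: null};
  return {trait: header.slice(0, idx), suffix};
}

/* Average long HEX colours channel-by-channel (assumed: plain RGB mean).
   Values that are not long HEX are skipped; returns undefined if none remain. */
function averageHex(hexes) {
  const valid = hexes.filter((h) => LONG_HEX.test(h));
  if (!valid.length) return undefined;
  const sums = [0, 0, 0];
  for (const h of valid) {
    for (let i = 0; i < 3; i++) {
      sums[i] += parseInt(h.slice(1 + 2 * i, 3 + 2 * i), 16);
    }
  }
  return "#" + sums
    .map((s) => Math.round(s / valid.length).toString(16).padStart(2, "0"))
    .join("");
}
```

With these, `parseHeader("random__colour")` yields the trait `random` with suffix `colour`, and a value like `blue` in a colour column would be skipped because it is not a long HEX.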
/* There are a number of "special case" columns we currently ignore */
const fieldsToIgnore = new Set(["name", "div", "vaccine", "labels", "hidden", "mutations", "url", "authors", "accession", "traits", "children"]);
fieldsToIgnore.add("num_date").add("year").add("month").add("date"); /* TODO - implement date parsing */
An obvious next step is to allow dates to be specified via additional metadata. Interpreting this as the num_date attr is slightly complex as the coloring & tree-metric are tied together in the code. A different approach would be to extend our color schemes to allow "temporal" types and then add temporal info in the CSV/TSV as simply an additional color-by.
So, this would mean sequences in a time-tree with inferred dates were allowed to have 'real' dates supplied? Would it change where they're plotted on X-axis?
I'm leaning towards allowing metadata traits (in drag-n-drop data or in the main JSON) which are encoded as (e.g.) YYYY-MM-DD to be inferred as temporal types by auspice, and a colour scale generated accordingly. A good example of this would be nCoV's "submitted date" field. These would be "just another color-by" and thus different to the num_date field which is used by auspice for the tree-time view. I don't see any easy way to use metadata to update / influence node (temporal) positioning, but am open to suggestions. (It's easy to see how such metadata could define the x-axis position of tips, but what do we do with internal nodes / tips not in the metadata?)
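To make the "inferred as temporal types" idea concrete, here is a minimal sketch of how such inference might look. Nothing like this exists in the PR; `isIsoDate` and `looksTemporal` are hypothetical names, and treating empty values as missing is an assumption.

```javascript
// Hypothetical sketch of inferring a "temporal" trait from its values.
const ISO_DATE = /^(\d{4})-(\d{2})-(\d{2})$/;

/* True only for strings of the form YYYY-MM-DD that name a real calendar day. */
function isIsoDate(value) {
  const m = ISO_DATE.exec(value);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Round-trip through Date to reject impossible dates such as 2020-02-30.
  const d = new Date(Date.UTC(year, month - 1, day));
  return d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
}

/* A trait is considered temporal if every non-empty value is a valid date. */
function looksTemporal(values) {
  const present = values.filter((v) => v !== undefined && v !== "");
  return present.length > 0 && present.every(isIsoDate);
}
```

A trait flagged this way would still be "just another color-by"; the sketch says nothing about node positioning.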
Sounds good! Yes, I agree, that's why I was curious - was wondering how you'd adjust the other parts of the tree to accommodate moving tips! Unfortunately I have no answers to offer, just curiosity :)
This all sounds super-cool James!! What a neat bunch of features. If you have a moment to supply a dummy CSV that shows an example how one can make use of these (how to organize them, etc) that would be fantastic! (No rush though!)
If you can provide a dummy file showing how to format the drag-n-drop files for the new things that are accepted, I'm happy to work from that, modify, add, etc, and play around to help review! (Sorry if I've missed this file already being somewhere!)
One way to test the functionality of this is via a modified version of the zika-tutorial at https://auspice-drag-drop-metad-2eccco.herokuapp.com/zika-sparse-metadata. Notably this build does not have the country included as a coloring or a geographic resolution. The additional metadata TSV at https://nextstrain-scratch.s3.amazonaws.com/james/zika-extra-metadata.tsv:

| strain | country | latitude | longitude | random | random__colour |
| --- | --- | --- | --- | --- | --- |
| 1_0087_PF | French Polynesia | -17.6797 | -149.4068 | blue | #184ae8 |
| 1_0181_PF | French Polynesia | -17.6797 | -149.4068 | blue | #184ae8 |
| 1_0199_PF | French Polynesia | -17.6797 | -149.4068 | blue | #184ae8 |
| Aedes_aegypti/USA/2016/FL05 | USA | 39.7837304 | -100.4458824 | green | #2dc234 |
| BRA/2016/FC_6706 | Brazil | -10.3333332 | -53.1999999 | red | #e7182a |
| Brazil/2015/ZBRA105 | Brazil | -10.3333332 | -53.1999999 | blue | #184ae8 |

... will add the following data to around 25 of the strains in the build:
I tested this out on the WA trees (https://dev.nextstrain.org/groups/blab/ncov/wa/1y) w/ real metadata, and overall, these are fantastic new features that work really well! Specifically, I tested:
Comments:
Thanks again for the awesome work, James!! These are super useful features.
Very good point about spelling @cassiawag! @jameshadfield would it be possible to let the code accept both
Allows for a common use case where one wants to filter the dataset to those samples in the CSV/TSV.
Previously only `<trait>__colour` was valid to specify the colour hexes, which was chosen to maximise compatibility with Microreact format files. Here we allow `<trait>__color` to be used as an alternate spelling. If both are specified, then `__colour` is used.
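The precedence rule in this commit message can be sketched as follows. `pickColourColumn` is a hypothetical name used only for illustration, and the function assumes the headers are available as an array of strings.

```javascript
// Hypothetical sketch: choose which column supplies colour hexes for a trait.
function pickColourColumn(headers, trait) {
  const british = `${trait}__colour`;
  const american = `${trait}__color`;
  // `__colour` wins when both spellings are present.
  if (headers.includes(british)) return british;
  if (headers.includes(american)) return american;
  return undefined; // no colour column supplied for this trait
}
```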
This interprets specific lat/long columns in the CSV/TSV as associating a strain with a geographic location. As this approach defines lat-longs _per-sample_ it is orthogonal to Nextstrain's approach (where we associate coords to a metadata trait). The approach employed here is to create a new (dummy) trait whose values represent the unique lat/longs provided. If a JSON does not define any lat/longs then the appropriate additional metadata will trigger the map to become available (and displayed).
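A minimal sketch of the dummy-trait idea, assuming rows have already been parsed into objects with `strain`, `latitude` and `longitude` keys (the column names used in the example TSV in this thread). `buildLocationTrait` and the `lat/long` string used as the trait value are hypothetical choices, not necessarily what the PR does.

```javascript
// Hypothetical sketch: derive a per-strain trait from per-sample lat/longs.
function buildLocationTrait(rows) {
  const traitPerStrain = {}; // strain name -> dummy trait value
  const coordsPerValue = {}; // dummy trait value -> coordinates for the map
  for (const row of rows) {
    const latitude = Number.parseFloat(row.latitude);
    const longitude = Number.parseFloat(row.longitude);
    // Skip rows where either coordinate is missing or not numeric.
    if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) continue;
    const value = `${latitude}/${longitude}`; // one value per unique lat/long
    traitPerStrain[row.strain] = value;
    coordsPerValue[value] = {latitude, longitude};
  }
  return {traitPerStrain, coordsPerValue};
}
```

Strains sharing the same coordinates end up with the same trait value, so the map would see one location per unique lat/long rather than one per sample.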
8a0fe85
to
9c15c4a
Compare
Thanks for the testing and comments @cassiawag & @emmahodcroft -- I've updated the documentation, allowed American spellings (color), and changed the value to "Strains in XXX.tsv" which makes the filtering look much nicer! P.S. Microreact uses
A number of improvements to the drag-and-drop metadata functionality. See commit messages and documentation changes in this PR for more details.
Closes #1242
Closes #1239