Allow JSONs to define language for auspice #1049 #1221

eharkins · 2020-10-20T19:11:35Z

This is really #1218 by @charlie-jones, but I added:

* a friendlier error message for when you use a language that auspice doesn't recognize
* documentation
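
A minimal sketch of the behaviour described above, assuming the dataset JSON requests a language under `meta.display_defaults.language` (the commit titles mention `display_defaults`, but the exact key path is an assumption here). `SUPPORTED_LANGUAGES` and `resolveLanguage` are hypothetical names for illustration, not Auspice's actual code.

```javascript
// Hypothetical list of locale codes the app ships translations for.
const SUPPORTED_LANGUAGES = ["en", "es", "pl"];

// Pick the display language requested by a dataset JSON, falling back to
// a default (with a readable warning) when the code is not recognised.
function resolveLanguage(datasetJson, fallback = "en") {
  const defaults = (datasetJson.meta && datasetJson.meta.display_defaults) || {};
  const requested = defaults.language; // assumed key name
  if (requested === undefined) return fallback;
  if (SUPPORTED_LANGUAGES.includes(requested)) return requested;
  console.warn(
    `Language "${requested}" is not recognised; falling back to "${fallback}". ` +
    `Supported languages: ${SUPPORTED_LANGUAGES.join(", ")}.`
  );
  return fallback;
}

const dataset = { meta: { display_defaults: { language: "pl" } } };
console.log(resolveLanguage(dataset)); // "pl"
```

Falling back with a warning, rather than throwing, keeps a dataset viewable even when its requested language is unavailable.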

eharkins · 2020-10-20T19:17:22Z

Since Auspice docs are in the process of migration, this will have to be updated in the migrated version as well. See nextstrain/docs.nextstrain.org#17

This is a squashed & rebased version of PR #1221, which itself superseded PR #1218. Closes #1049. Co-authored-by: Charlie Jones <ctwj88@gmail.com> Co-authored-by: eharkins <eli.harkins@gmail.com>

jameshadfield · 2021-03-10T02:11:52Z

Thanks @charlie-jones & @eharkins - this looks great.
Due to how the master branch has diverged since this PR, and the number of commits here, I ended up having to squash & rebase it in order to test it. That's PR #1303, which I'm going to merge now. Thanks!

…:master (#7)

* Enforce unnormalized frequencies when data is lacking

  This commit forces controls.normalizeFrequencies to be false if there are any pivots where the total frequency is less than 0.1%. This addresses two issues:

  1. In the existing code, attempting to normalize situations where pivots have 0% total frequency results in bad looking "all equal" bands. This commit removes the capacity to get into this bad looking app state.
  2. When filtering to a particular clade, we often want to switch to unnormalized frequencies anyway (as opposed to filtering to geography). This commit accomplishes this automatically because filtering to an emerging clade will generally result in pivots with <0.1% frequency.

* Show root-to-tip mutations in tip-clicked info box

* Clean up mutation display in tip-clicked info box

* Cap string length in tip-clicked info box

* Added Polish language to locales

* Added Polish language to sidebar options

* fixed typos in sidebar.json

* controls/filter: Avoid spread syntax with potentially large arrays

  Push each value individually instead of all at once, which results in more method calls but a much smaller call stack size. Alternatively, Array.concat could be used, but this follows the pattern of surrounding code and avoids reassignment of "options", which would also necessitate removal of the "const" declaration.

  The /tb/global build has ~277k genotype states, which resulted in a call to Array.push with as many arguments when the spread syntax was used. This blew through the call stack size limit on Chrome with an error like:

      RangeError: Maximum call stack size exceeded
        at FilterData.eval [as makeOptions] (filter.js?6bcb:65)
        at FilterData.render (filter.js?6bcb:100)
        …

  Firefox was unaffected, so presumably has a larger limit.
  Debugging was waylaid for a bit by the assumption that exceeding the call stack size necessarily meant deep recursion, but the lack of a deep stack trace led to the realization that it can also occur when a function's arguments are too many.

  Resolves nextstrain#1292.

* Correctly format BCE dates

  BCE dates were correctly interpreted but incorrectly rendered due to a bug in the final string-prettying step. This would result in a tree with the correct layout and positioning, but incorrect labels ("-undefined"). This commit remedies this and adds a test.

* Increase bundlesize limits

  This simply increases the limits to allow CI tests to pass. Large bundlesizes are a long-term concern, but given the recent bugs regarding bundling and other priorities this change essentially pushes improvements here to "sometime in the future".

* Correctly handle reversions & multiple mutations

  The function `collectMutations` now doesn't report reversions (where the tip state = the ancestral state) and combines multiple mutations (e.g. A->B->C is now A->C rather than two separate mutations).

* Ensure unique component keys for rendering

* Report all root-to-selected-tip mutations

* Indicate if frequencies can be normalized in sidebar

  Previous implementation continued to display the toggle icon but disabled its functionality (by forcing `normalizeFrequencies` to be `false`). Here we replace the toggle with a "not available" message & update the info-popup text.

* Update normalizeFrequencies flag via redux actions

  Upon initial parsing of the frequencies JSON as well as frequency data updates we may set redux→controls→normalizeFrequencies→false. This commit modifies the LOAD_FREQUENCIES and FREQUENCY_MATRIX actions to pass this information to the reducer, rather than updating the redux state directly from within the actions. I couldn't find any bugs caused by the previous implementation, but this change is more in line with suggested behaviour and should help future-proof work here.
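
The `controls/filter` entry above turns on a general JavaScript limitation: `target.push(...values)` passes every element as a separate argument, so a large enough array can exceed the engine's call stack limit. A minimal sketch of the safer pattern follows; `appendAll` is a hypothetical helper name, not the function in `filter.js`.

```javascript
// Append every element of `values` to `target` one call at a time.
// More method calls than `target.push(...values)`, but the call stack
// stays small regardless of how long `values` is.
function appendAll(target, values) {
  for (const value of values) {
    target.push(value);
  }
  return target;
}

const options = [];
const genotypeStates = Array.from({ length: 277000 }, (_, i) => `state-${i}`);

// options.push(...genotypeStates); // may throw RangeError on some engines
appendAll(options, genotypeStates);
console.log(options.length); // 277000
```

Because `options` is mutated in place, it can stay a `const` binding, which matches the reasoning given in the commit message for preferring this over `Array.concat`.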
* Update frequencies panel when data changes

  The frequencies component performed some basic comparisons between the previously rendered data and new data to avoid unnecessary re-renders. This logic was too simplistic and caused a bug where the component wouldn't re-render the graph when the data had indeed changed. This commit skips these checks. This may result in some unnecessary re-renders, however I couldn't find any in my testing.

  Closes nextstrain#1224

* [frequencies panel] Don't round frequencies below 1%

  When a frequency value is below 1% we now display "<1%" rather than rounding to the nearest integer which could lead to confusing output of "frequency: 0%".

  Closes nextstrain#1279

* Legend no longer obscures branches/tips.

  Shifts the top of the tree down slightly so that tips and branches cannot be hidden behind the (closed) legend, which prevents interacting with them. This only happens in rectangular / unrooted trees, as radial / clock views almost never have tips rendered in the top-left corner.

* Styling adjustments for mutation list

* Allow JSONs to define language

  This is a squashed & rebased version of PR nextstrain#1221, which itself superseded PR nextstrain#1218. Closes nextstrain#1049.

  Co-authored-by: Charlie Jones <ctwj88@gmail.com>
  Co-authored-by: eharkins <eli.harkins@gmail.com>

* Fix spelling typo

* changelog

* version bump to 2.24.0 for release

* Ensure metadata.display_defaults exists

  A bug was introduced in PR nextstrain#1280 where datasets which did not define `metadata.display_defaults` would crash, as the code assumed its existence. This property is optional in the dataset JSON. This commit ensures `display_defaults` exists in redux state after a dataset is loaded, thus allowing code to rely on its presence. This was preferred to checking for `display_defaults` in (all) the relevant sections of code, now and in future. (Using TypeScript, or expanding our smoke tests, would be approaches to avoiding these kinds of bugs in future.)
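
One way to read the `metadata.display_defaults` fix above is as a normalisation step at load time, so that later code can rely on the property being present. The sketch below assumes a plain-object metadata shape; `ensureDisplayDefaults` is a hypothetical name, not the function used in Auspice.

```javascript
// Return a copy of the dataset metadata in which `display_defaults`
// is always an object, so downstream code can read from it safely.
function ensureDisplayDefaults(metadata) {
  return { ...metadata, display_defaults: metadata.display_defaults || {} };
}

const withoutDefaults = ensureDisplayDefaults({ title: "example" });
console.log(withoutDefaults.display_defaults); // {}

const withDefaults = ensureDisplayDefaults({ display_defaults: { layout: "rect" } });
console.log(withDefaults.display_defaults.layout); // "rect"
```

Doing this once when the dataset is loaded avoids repeating an existence check at every place the property is read, which is the trade-off the commit message describes.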
* version bump to 2.24.1 for release

* Treat accessions and urls as special node traits

  The schema defines these as "special" properties, and we use them to render the value and link via `<AccessionAndUrl>` within the tip-clicked panel. These should not be available as valid traits for general display.

* Allow node_traits to define their own URLs

  This extends our interpretation of dataset-supplied traits to allow them to define a URL as well as a value. If a url is specified, then the value (in the tip-clicked panel) is rendered as a link.

  Closes nextstrain#1307

* Improve validation of URLs & add tests

  This improves our validation of URLs which should improve app stability.

* Generic scatterplot layouts

  This adds a new layout for scatterplots and allows users to choose the x and y variables from the available colorings. The defaults are the tree metric (x-axis) and the current color-by (y-axis).

  The layout algorithm is largely unchanged from the root-to-tip layout. This presupposes that node trait values will be numeric, and thus map nicely to an axis. Future work will allow scales to map non-numeric values (e.g. categorical, ordinal, boolean scales) to a d3 domain for rendering. Currently these traits get assigned `0` as their x and/or y values.

  Similarly, the algorithm presupposes that all nodes (internal and terminal) have values and should be rendered. There will be many cases where nodes (especially internal nodes) do not have traits assigned. In these cases we should hide them from view, and remove any connecting branches.

  Future work needed:

  * More testing is needed for rare use cases, e.g. trees without divergence, datasets with no colorings.
  * Dataset JSONs and URL queries should be able to select the scatterplot variables.

  This commit is based off previous work by trvrb.
  Co-authored-by: Trevor Bedford <trevor@bedford.io>

* Add support for "data_provenance" metadata

  In the early stages of COVID-19, we added support for acknowledging GISAID as the source of data in the Byline. This was inferred based on domain / dataset name heuristics. We now support data provenance to be defined in the dataset JSON (see nextstrain/augur#705) and all core nCoV builds have been updated to include this here. This commit parses and renders such information.

  Note that a previous commit removed the "Build info" from the byline for datasets displaying GISAID (see [0]), which I believe was an oversight. This commit reinstates it.

  [0] nextstrain@18d5d21

* Scatterplots improved for continuous variables

  This fixes some issues highlighted by the previous commit to improve rendering of scatterplots. We now limit scatterplot x,y variable choices to continuous-scaled colorings, and leave the display of other scale types to future work as this requires PhyloTree to switch to a new d3 scale.

  As not all nodes may have traits assigned (contrary to other tree layouts), we detect and hide those nodes from view, as well as any joining branches. We also expose the ability to toggle branches on/off.

  We also improve the starting variable choices for x & y.

* Link out to gisaid.org

  This adds a link to gisaid.org from the GISAID logo (when present). It also adjusts parsing so that data_provenance.name == "gisaid" will still get picked up.

* Unify clock and scatterplot layouts

  As the clock view is simply a specific type of scatterplot layout, this commit unifies the code and display between these two "separate" layouts. We preserve the clock button in the sidebar as this is a common action which we want to surface.

  **Show branch toggles**
  Are now rendered for both views. The layout of scatterplots does not consider internal nodes for calculating the domain if branches are not shown. Similarly, branch labels are not displayed if branches are not.
  **Regression Lines**
  These are now available for both layouts, and are toggled via a UI element similar to branches. Previously, the regression would be shown for clock layouts _if_ the branch metric was time, however the explicit UI element introduced here is better. For scatterplot views we calculate the regression with a free intercept, as the root node may not have co-ordinates defined (depending on chosen x,y variables), and additionally report the R^2. The display of the regression text can be improved in future commits.

  **Persist chosen state**
  To improve the UX, once a scatterplot has been viewed, we persist the x,y variables for future viewing. Similarly, the toggle state persists between clock & scatter layouts.

* Store scatterplot state in URL query

  See added documentation for available queries

* Render appropriate scatterplot axes grids

  This commit updates the logic for deciding the gridlines for both x and y axes for scatterplots. Previously we had a very limited range of cases to consider here. We now have two general functions available for creating grids - one for temporal scales and one for all other numeric scales (previously used only for divergence). We will need to add a third function when we expand scatterplots to plot non-continuous variables.

* [bugfix] Initialize `filtersInFooter` for all datasets

  This fixes a bug noticed in auspice.us [1] where certain datasets would not have the `controls.filtersInFooter` state set, causing a crash when metadata was dropped on. Type / prop checking would have alerted us to this.

  [1] nextstrain#1304

* [phylotree] fix case where x-axis grid wasn't present

  A missed conditional resulted in certain configurations of the scatterplot "missing" the x-axis grid. (The grid was incorrectly being calculated for a temporal scale, which resulted in no meaningful grid lines.)
  Closes nextstrain#1323

* [phylotree] fix edge cases surrounding display of branch labels

  Fixes a couple of edge cases introduced by the scatterplot functionality which would result in a tree rendering with branch labels when it shouldn't have them (and vice versa).

* make frequencies tend to 0 in absence of data

* Use trait titles for data filter display

  Each coloring variable is defined by both a "key" and a "title". Keys are (largely) used internally, whereas titles are intended for user-facing display. This commit improves the "Filter Data" sidebar UI to use titles, resulting in a more consistent (and nicer) UI.

  Closes nextstrain#1322

* changelog

* version bump to 2.25.0 for release

* [PhyloTree] branch label bugfix

  Fixes a bug where we sometimes ask PhyloTree to update the branch labels for a view without any branch labels, which would cause auspice to crash. I first tried to fix this in a80e186 but that didn't cover all the situations when this could arise.

* increase padding value for frequencies

* lint appeasement

* changelog

* version bump to 2.25.1 for release

* set frequencies explicitly to 0 if total is too low

* reduce frequency normalization threshold to single constant

* remove unassigned variable

* Allow continuous colorings to define anchor points

  The schema currently allows datasets to provide a scale for non-continuous scales where specific trait values are given colour hexes (missing values are given greys by auspice). Here we extend this to continuous scales by interpreting the same data structure as anchor points which we interpolate between using the same method as we currently use for generating default continuous color scales (d3's `interpolateRgb`).

* Allow legend entries to be user-defined

  This allows continuous colour scales to define custom legend entries, via a `legend` key in the JSON. This allows control over the values in the scale which we use as legend elements, the displayed text, and the range of values which each entry covers.
  Bounds are enforced to be non-overlapping. If overlapping bounds are detected, we revert to Auspice dynamically generating these.

  (This is a requirement for future work which will map continuous tip values to a legend entry, which will allow pie-chart display using the legend swatches.)

* Legend bound matching is (a, b] for continuous scales

  This restores the algorithm used to associate a hovered legend item to tips for continuous variables. Commit 0f37b1a (Mar 2018) incorrectly changed this to `tip \in [a, b]` rather than the intended (and documented) `tip \in (a, b]`. This takes on more importance given that the previous commit allows user-defined bounds.

  Note that the frequencies panel already used `(a, b]` matching, so now the legend matching mirrors this.

* Extend user-provided legend info beyond continuous scales

* Use filterOptions to modify search alg

* GitHub Action to create nextstrain.org PR

  This action will run on each auspice PR and create a corresponding PR on nextstrain.org which includes a commit using the version of auspice from this (auspice) PR.

  This functionality is extremely useful for auspice development as it will allow us to use a Heroku review app to test auspice in the context of nextstrain.org

  There are a number of future improvements to implement:

  * New auspice releases (tagged commits on `release` branch) would ideally create a PR on nextstrain.org which could be merged to update the version of auspice there.
  * Other consumers of auspice (e.g. auspice.us) could be added to this GitHub Action.

* Allow non-continuous scatterplot variables

  This implements a requested improvement to the original scatterplot implementation. The implementation hinges on two changes:

  (1) The collection of values for a given variable (e.g. x-var) need to be computed and passed to PhyloTree to act as the scale's domain.
  We reuse the colorScale machinery here, which could be optimised (see todo messages in code), but this has the advantage that the domain ordering matches the legend (unless user supplied).

  (2) PhyloTree needed to be modified to use non-linear scales, in this case `pointScale`.

  This commit should be fully functional, however there are some future improvements to be made:

  (i) Grid text is obscured and unreadable when there are many entries in the domain.
  (ii) Genotypes and Boolean scales are not yet available.
  (iii) Jitter should be added to nodes to avoid obfuscation.

* Layout changes occur via redux thunk

  This commit is in preparation for allowing genotype to be a scatterplot variable. This will complicate the allowable scatterplot variables and force these to update upon colorBy changes. This is much cleaner if layout is changed in a thunk.

* Allow genotype to be scatterplot variable

  Genotype is treated differently to other colorings in two important ways: (1) it can change value, for instance when changing the colorby to another genotype position and (2) it is stored in a different place to other colorings.

  These require scatterplot logic to be more complex as actions are no longer separate - we now require a NEW_COLOURS action to potentially update the layout which was formerly within the remit of the CHANGE_LAYOUT actions. This is achieved through a middleware layer.

  This implementation makes it clear that jitter and better domain spacing are crucial for scatterplots.

* Improve padding for categorical scatterplot variables

  This prevents nodes falling on the axis itself or at the very end of the grid, which was especially noticeable for traits with small domains.

* Add jitter to categorical scatterplots

* Apply clipping to first column of legend

  We have had issues in the past with legend values from column 1 overflowing into column 2. For instance, issue nextstrain#899 was fixed by PR nextstrain#914 which implemented a maximum character limit for legend names.
  This solution can produce misleading views, such as those described in nextstrain#1306.

  This solution implements a clipping mask for column 1, avoiding the complication of limiting the string size. Column 2 already has similar behaviour because the SVG element of the legend itself performs the clipping.

* changelog

* version bump to 2.26.0 for release

* Always show regression toggle for clock layout

  Fixes a bug where the ability to toggle regression lines was hidden for clock views. (The ability to hide this toggle is only intended for scatter layouts, where we should not expose the toggle unless both axes are showing continuous variables.)

* Adjust grayscale color ramp

  The existing grayscale color ramp (used for values absent in an explicitly specified color scale) had values that were too dark and threw off the overall color balance. This commit narrows the grayscale color ramp to be more in line with pastel color ramp.

* Inject a bit of color into the "grayscale" color ramp

  This adds a bit of blue into the grayscale color ramp. Still reads as mostly gray, but now colors seem to exist more in the same universe as canonical auspice color ramp.

* changelog

* version bump to 2.27.0 for release

* Styling adjustments to footer text

* Remove metadata download from GISAID datasets

  This commit uses dataProvenance in metadata to identify datasets using "GISAID" data. For these datasets, the full metadata download is swapped to an "acknowledgments" download that only includes the following fields:

  - strain
  - gisaid_epi_isl
  - genbank_accession
  - originating_lab
  - submitting_lab
  - author

* Cleanup metadata headers

  This commit cleans up naming of metadata headers in downloaded metadata TSV. It does the following:

  1. Keeps headers as input into "augur export" rather than renaming by title. Thus it has "originating_lab" rather than "Originating lab", "pango_lineage" rather than "PANGO lineage", etc...
  This should make it easier for people to process downloaded metadata from Auspice alongside metadata provisioned by Nextstrain (via GISAID or via S3).

  2. Makes "date" the second column as this is often what's most important. I couldn't figure out a way to intelligently order remaining fields. My first thought was to use metadata.colorings, but this isn't sorted.

  3. Fixes "accession". It had been exporting as "[object Object]".

* Update changelog

* version bump to 2.28.0 for release

Co-authored-by: Trevor Bedford <trevor@bedford.io>
Co-authored-by: James Hadfield <hadfield.james@gmail.com>
Co-authored-by: Michał Kowalski <mkowalski@argon.mcb.uj.edu.pl>
Co-authored-by: Thomas Sibley <tsibley@fredhutch.org>
Co-authored-by: james hadfield <jameshadfield@users.noreply.github.com>
Co-authored-by: Charlie Jones <ctwj88@gmail.com>
Co-authored-by: eharkins <eli.harkins@gmail.com>
Co-authored-by: Richard Neher <richard.neher@unibas.ch>
Co-authored-by: Muhammad Aditya Hilmy <mhilmy@hey.com>
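
The `collectMutations` entry in the commit list above describes two rules: chained changes at one position collapse to a single change (A->B->C becomes A->C), and reversions are dropped. A minimal sketch of that logic, assuming mutations arrive in root-to-tip order as `{from, pos, to}` objects; this data shape and the name `collapseMutations` are assumptions for illustration, not Auspice's implementation.

```javascript
// Collapse an ordered (root-to-tip) list of mutations so that each
// position reports at most one net change, and none if it reverted.
function collapseMutations(mutations) {
  const byPosition = new Map(); // pos -> { from: earliest state, to: latest state }
  for (const { from, pos, to } of mutations) {
    const seen = byPosition.get(pos);
    if (seen) {
      seen.to = to; // keep the ancestral `from`, update the final state
    } else {
      byPosition.set(pos, { from, to });
    }
  }
  const result = [];
  for (const [pos, { from, to }] of byPosition) {
    if (from !== to) result.push(`${from}${pos}${to}`); // skip reversions
  }
  return result;
}

const path = [
  { from: "A", pos: 10, to: "B" },
  { from: "G", pos: 42, to: "T" },
  { from: "B", pos: 10, to: "C" },
  { from: "T", pos: 42, to: "G" },
];
console.log(collapseMutations(path)); // [ 'A10C' ]
```

Position 10 goes A->B->C and is reported once as `A10C`; position 42 goes G->T->G, ends at its ancestral state, and is omitted.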

charlie-jones and others added 27 commits September 20, 2020 11:38

JSON issue fix

196d42d

JSON fix

2b5fe1b

Merge branch 'master' of https://github.com/nextstrain/auspice

26ec1c3

json display_defaults updates lang

bbecc0f

works on npm run dev (still not on npm run view)

692d35f

works on npm run dev (still not on npm run view) - with lint

8be29cd

works on npm run dev (still not on npm run view) - smoke tests

c574f8b

fixed lint, smoke tests pass on my macbook

fb76406

all of it works now

f2d40cc

feature done

7ac79f4

upd region->country (since I changed it earlier for testing)

533a9d4

JSON issue fix

345777e

JSON fix

898e16c

json display_defaults updates lang

b7d398b

works on npm run dev (still not on npm run view)

5e954dc

works on npm run dev (still not on npm run view) - with lint

6f81bc6

works on npm run dev (still not on npm run view) - smoke tests

a7d4e31

fixed lint, smoke tests pass on my macbook

0f5dc23

all of it works now

b6f1915

feature done

67bdbf1

upd region->country (since I changed it earlier for testing)

731c2ed

done

7a22b2c

finished

dcd7a4a

done

2e875b4

fixed

7245c6d

#1049 nice error message

26c048e

document #1049

b3a06c6

eharkins requested a review from jameshadfield October 20, 2020 19:11

jameshadfield temporarily deployed to auspice-1049-json-lang-imeeatz October 20, 2020 19:11 Inactive

lint

3afd719

eharkins temporarily deployed to auspice-1049-json-lang-imeeatz October 20, 2020 19:17 Inactive

jameshadfield mentioned this pull request Mar 10, 2021

Allow JSONs to define language #1303

Merged

jameshadfield closed this Mar 10, 2021
