Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Color by genotype refactoring + gene names with hyphens/underscores #681

Merged
merged 12 commits into from
Nov 19, 2018

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented Nov 15, 2018

This branch started as work to add support for hyphens and underscores in gene names (#669). It quickly led to refactoring the color by genotype handling so that the encoding/decoding could be handled in a single, well-factored place. That in turn led to cleanups around the color by controls. The story and rationale for all changes is in the commit messages. It will be more digestible if reviewed commit-by-commit in chronological order rather than as one big lump.

@jameshadfield, I'd like your input on these changes if you have time.

@emmahodcroft, I believe you had the original need for hyphens and underscores being supported in gene names in auspice. It should work on this branch!

…instead of (0, length).  In the UI, positions are one-origin along the
gene length, so the last position should be included.
Consolidates test/encoding/decoding logic into three main functions
which are used throughout the codebase instead of hardcoded assumptions
about the serialization format ("gt-<gene>_<pos>") in many locations.
This resolves existing and prevents future divergence in the serialized
format, while allowing it to change in the future with minimal or no
change outside of getGenotype.js.

Multiple positions are supported, but multiple genes are not at this
time.  I've left that for future work since there is not a UI for
multiple genes.  This refactoring will make it easier if we do.
Terminology (prot vs. gene) was normalized for consistency when trivial.

As a result of improved parsing, gene names with embedded underscores
and hyphens are now supported.  Resolves #669.
…ction

Leaves it up to the top-level render(), where the same conditional
exists, to decide when genotype controls should be shown.  The nested
conditional is unnecessary, and gtPositionInput() now matches the
gtGeneSelect() render function's behaviour in this regard.

Diff best viewed with whitespace ignored.
Default gene (HA1 or nucleotide) and position (1) selections are
replaced by informative placeholders and nothing changes until a full
selection is made.

If an invalid position is entered, the app remains at the last good
state instead of resetting back to a default state.  This allows a
correction of the position entered instead of requiring a re-entering.

The previous placeholders rarely or never triggered because default
values were set.
…rectly

This helps maintain consistency in the handling across different
components.
…lor by field

It is easier to reason about and keep track of whole state replacement
than the combined effects of individual field updates.

This is especially true for logically linked state like geneSelected and
positionSelected, which easily becomes out of sync with colorByGenotype.
For example, geneSelected and positionSelected were not reset when
switching from a genotype coloring to any other color by field.  A few
individual field updates remain where it doesn't make any more sense to
think about them in terms of replacement.
External state in redux now has one entrypoint
(componentWillReceiveProps) and one exitpoint (componentDidUpdate).

This consolidates calls to redux's dispatch() and ties it to changes in
local component state, simplifying the logic of the color by, gene, and
position input handlers.  setColorBy() and setGenotypeColoring() are now
one function (componentDidUpdate()) instead of two interacting
functions.
@tsibley
Copy link
Member Author

tsibley commented Nov 15, 2018

The corresponding JSON schema changes in augur to allow hyphens is https://github.com/nextstrain/augur/compare/schema-gene-names.

@jameshadfield
Copy link
Member

Brilliant. Would you have time tomorrow afternoon to talk through this?

@trvrb
Copy link
Member

trvrb commented Nov 15, 2018

This is great Tom. The handling of gt- as color by did seem messy.

@tsibley
Copy link
Member Author

tsibley commented Nov 15, 2018

@jameshadfield Yep, I'm certainly happy to walk through this. Tomorrow might be tight, however, with lab meeting in the morning and then our offsite meeting in the afternoon. I'll ping you on Slack to figure something else out.

@tsibley
Copy link
Member Author

tsibley commented Nov 16, 2018

Walked through this with James. Overall we're happy with it. I'm going to make the color by control debouncing more specific to genotype (so non-genotype color bys don't delay 400ms), and then it should be good to merge.

This avoids triggering a new state on every keypress in a short
sequence, which is not only slow but inconsistent with expectations that
intermediate colorations are shown while, for example, backspacing a
position input to delete it.

400ms seemed about right in my testing (300ms was too quick for most
typists, 500ms seemed sluggish), but that value may want to be adjusted.
It's good practice to avoid hardcoding such magic values into a myriad
of places and makes maintenance easier.  The name also lends more
context to its usage.

Using a global constant also means there is only a single place to
change the value, at least in auspice, if we need to.  It's not unlikely
we'll need to change it, because there are plenty of legitimate genes
named "nuc", e.g. <https://www.ncbi.nlm.nih.gov/gene/?term=nuc>.

Note that this doesn't refactor _all_ references to the string "nuc", as
the same value is also used for a different concept, the "mutation type"
or "mutType", which is valued "nuc" or "aa" (or sometimes false).  It's
important that magic value constants aren't used merely for their value
alone, but for the conceptual thing they represent so as not to conflate
changes in the future.
…to make it clear it returns a "mutType".

The mutType is closely related to but different from the genotype gene
name.  Embedding mut type into the function name makes it clear what's
returned and easier to find with other mut type code when grepping.
Though the development webpack config includes source maps, they were
disabled in production. When debugging a live build (or a local build
where you don't want hot reloading), it's very useful to have a source
map for the browser to translate between the packed dist/bundle.js and
the original code.

This matches the recommendations of webpack's documentation:

    https://webpack.js.org/guides/production/#source-mapping
…instead of bundling the former with the latter.

Changing the mutType must happen _before_ updating colors because the
entropy bars need to be recomputed for the new mutType before applying
the new genotype colorBy.  Otherwise, the entropy component tries to
apply the new genotype colorBy to bars of the wrong mutType, which in
turn causes all sorts of errors ("entropy out of sync" and selected
positions not matching the data bars).

The state dependencies are a bit tangled here, but de-tangling them is a
larger project for another time.
@tsibley
Copy link
Member Author

tsibley commented Nov 16, 2018

@jameshadfield I've re-applied the debouncing to only the genotype updates and repushed the branch. The new commit is 1b46f72, which makes these changes between the old and new version. Give it a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants