Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc core typescripting #3605

Closed
wants to merge 3 commits into from
Closed

Misc core typescripting #3605

wants to merge 3 commits into from

Conversation

rbuels
Copy link
Contributor

@rbuels rbuels commented Mar 22, 2023

These are some probably worthwhile core changes split out from the REST adapter PR #3503

better if blankStats() does not cast
since TS won't check if all the necessary fields are filled when casting
improve TS types for feature score stats
add jsdoc comments about region stats types
we use interbase coords
correct stats tests
typescript mst-reflection.ts, might be a tiny bit better than leaving it js
improve some ts types
some config slot utility types, some better types in mst-reflection.ts
StringEnumEditor does not need to use getPropertyType
improve more typescripting
@github-actions github-actions bot added the needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) label Mar 22, 2023
@rbuels rbuels added enhancement New feature or request housekeeping needs to be tidied up, is a drag on team performance and removed needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) enhancement New feature or request labels Mar 22, 2023
@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Merging #3605 (dab7e28) into main (b766f03) will increase coverage by 0.02%.
The diff coverage is 86.56%.

❗ Current head dab7e28 differs from pull request most recent head 13e0871. Consider uploading reports for the commit 13e0871 to get more accurate results

@@            Coverage Diff             @@
##             main    #3605      +/-   ##
==========================================
+ Coverage   62.89%   62.91%   +0.02%     
==========================================
  Files         869      866       -3     
  Lines       30227    30253      +26     
  Branches     7280     7300      +20     
==========================================
+ Hits        19010    19035      +25     
+ Misses      11031    11030       -1     
- Partials      186      188       +2     
Impacted Files Coverage Δ
packages/core/configuration/configurationSchema.ts 79.64% <ø> (ø)
packages/core/configuration/configurationSlot.ts 63.63% <ø> (ø)
packages/core/util/types/index.ts 70.96% <ø> (ø)
...readsheetView/models/ColumnDataTypes/LocString.tsx 12.12% <ø> (ø)
...rc/SpreadsheetView/models/ColumnDataTypes/Text.tsx 14.81% <ø> (ø)
plugins/wiggle/src/WiggleRPC/rpcMethods.ts 65.45% <0.00%> (ø)
plugins/wiggle/src/util.ts 71.79% <50.00%> (ø)
packages/core/util/mst-reflection.ts 61.53% <57.14%> (ø)
packages/core/data_adapters/BaseAdapter.ts 72.41% <72.72%> (-2.29%) ⬇️
packages/core/util/stats.ts 98.80% <97.43%> (-1.20%) ⬇️
... and 3 more

... and 17 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@github-actions github-actions bot added the needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) label Mar 24, 2023
@cmdcolin
Copy link
Collaborator

I cherry-picked the misc stuff off this branch (1075f3d) and the remainder is the stats code essentially
I think it's possible that the stats thing is worth reviewing still. It seems like there is "conflation" of concerns where our "stats" code applies to "quantitative stats" for y scalebar, feature density stats for toggling when to display 'too much data' warnings, and then just the issue of maybe undefined's existing in e.g. a score field on just some features

If it helps to decouple these concerns, I think stats estimation could go through a different code path

@cmdcolin
Copy link
Collaborator

example where 'two different return types' are used for the single function, feature density on one but not the other, indicating the stats estimation vs quantitative stats situation https://github.com/GMOD/jbrowse-components/pull/3605/files#diff-9d5632c87842dea801ef5acbc3f15abac30b437769d9b1ace2fc6f86542a752eR254-R263

@rbuels
Copy link
Contributor Author

rbuels commented Mar 24, 2023

Yes, with the current feature adapter API, we don't know exactly what kind of stats getRegionStats will return, so the quantitative stats and feature density stats have to use the same code path.

Having different kinds of stats estimation go through different code paths would necessitate some kind of change to the feature adapter API, probably something like getRegionScoreStats and getRegionFeatureDensityStats, is that what you had in mind?

@cmdcolin
Copy link
Collaborator

Having different kinds of stats estimation go through different code paths would necessitate some kind of change to the feature adapter API, probably something like getRegionScoreStats and getRegionFeatureDensityStats, is that what you had in mind?

currently there are 'estimateRegionsStats' used essentially for feature density, though it can also return number of bytes estimated also, and 'getMultiRegionStats' which is used by the quantitative tracks. could potentially rename to estimateMultiRegionStats to be more consistently named, and/or, use a more descriptive word than stats

@rbuels
Copy link
Contributor Author

rbuels commented Mar 24, 2023

Ah, that's a problem. estimateRegionsStats is supposed to be a protected method of data adapters, but it looks like it's being called directly in quite a number of places now in main. I think that needs to change. Calling code isn't supposed to know how the data adapter is getting the stats, whether they are precomputed or what. All the "estimateStats" code in the main should probably be renamed to call getRegionStats, which in the case of the base adapter just delegates to estimate the region stats, but is meant to be overridden by adapters that might actually have precomputed stats, like bigwig.

But that's kind of a separate question. Again: in order to avoid the same code path being used for feature density, scores, and (later) histogram bins, it seems like we would need to change the adapter API. Is that what you are thinking?

@cmdcolin
Copy link
Collaborator

I'm open to changing the API, but also don't necessarily understand your proposal to e.g. rename estimateStats to getRegionStats because that would, in the current way things are done, make quantiative and feature density type stats have a single code path again which is what the current approach already IMO properly avoids

@rbuels
Copy link
Contributor Author

rbuels commented Mar 24, 2023

The original design of the data adapters API had two public methods for stats: getRegionStats for single regions, and getMultiRegionStats for multiple regions. Those would return whatever stats were available, whether they were density, scores, histogram bins, or whatever. The estimateRegionStats method was an internal method of the base adapter that would fetch feature data and try to calculate some stats from it, and it was for use by data adapters that didn't have precomputed stats (which is a lot of them, but definitely not all). Looks like it wasn't marked protected, but it should have been. This design makes it possible to use any data adapter with any track type and have it pretty much work, which has always been part of the design of JBrowse.

Looks like about a year ago the estimateRegionsStats method started being used by calling code (that is, code outside of the data adapters). The problem with this is, this means that data adapters that have pre-computed feature stats, that don't estimate them, like NCList adapters, either won't have their pre-computed stats used, or they have to override the estimateRegionsStats method to return data that, despite the name of the method, is not actually estimated. So the method name then becomes inaccurate.

But let's focus on where we need to get to: we need a uniform data adapter API that lets (in theory) any data adapter by any track type. The method names need to be accurate for what they do.

And also, it sounds like you think it's important for quantitative stats (scores, stddev, etc) and density stats (featureDensity, bytes, precalculated histogram bins, etc) to be separated, so that the kind of stats that are being returned are easier for calling code to understand, is that right? Is the motivation more to make the calling code easier to write (avoid needing to use type guards or other runtime reflection to figure out what is in the returned data), or more for performance?

@cmdcolin
Copy link
Collaborator

@rbuels and I met today

we identified that there are (at least) two types of things that the word "stats" refers to, used in different contexts. as a result, we will try to distinguish these two usages by renaming some functions in BaseAdapter

quantitative stats

Quantitative stats are related to e.g. y-scalebars on the quantitative tracks. We propose rename functions like this

getRegionStats -> getRegionQuantitativeStats
getMultiRegionStats -> getMultiRegionQuantitativeStats

"feature density"-type stats

Feature density type stats are related to displaying the "Region too large, zoom in/force load" type messages. We propose renaming functions like this

estimateRegionsStats -> getMultiRegionFeatureDensityStats (alternative considered getRegionTooLargeStats)

Note 1. The getMultiRegionFeatureDensityStats can return feature density of byte size download estimate as a tagged type. this is currently what estimateRegionsStats already does.

Note 2. The singular method estimateRegionStats does not exist currently. Reason being is that we just call the 'multi-region' (estimateRegionsStats) version by default, even though only the first region of the multi-region set is looked at in the baseadapter's version. if we want to add the single region version, might be named getRegionFeatureDensityStats.

@cmdcolin
Copy link
Collaborator

note: also considered splitting feature density type stats into "byte range" stat estimate and "feature density" estimate, but then the display code would need to figure out how to decide which to call

the proposals there were that e.g. byte size estimate could be undefined by default, and if some calling code received undefined, it could further call feature density estimation. I proposed not doing this in favor of keeping "getMultiRegionFeatureDensityStats" able to return either a true feature density or a byte range estimate, but if there is a strong feeling for splitting it, let me know

@rbuels rbuels closed this Mar 30, 2023
@cmdcolin cmdcolin deleted the misc_core_typescripting branch June 8, 2023 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping needs to be tidied up, is a drag on team performance needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants