Fix dynspread not to count already-spread pixels and respect array bnds #1001

jbednar · 2021-04-11T05:03:27Z

Fixes #1000. As described in #1000, dynspread was not working as documented or designed, because it was iteratively measuring the density of already-spread pixels, which produced large dense regions that were indistinguishable from dense regions of datapoints. Also, it turns out, the density calculations were reading outside the bounds of the array, which is dangerous and likely to give incorrect readings for density, even though it does not involve writing outside the array.

Fixed both problems by (1) respecting the bounds of the array when computing density, and (2) iteratively measuring density using an increasing neighborhood (as before) but no longer with actual spreading for each interative step (unlike before). I.e., instead of spreading then checking to see if the density metric is exceeded, simply tries increasingly large values of a neighborhood radius to count how many pixels have neighbors within this radius, and stopping when that normalized number reaches the specified threshold. The neighborhood search is now twice as large as the spreading amount being tested, since unlike actual spreading each datapoint has to check distance to the neighboring datapoint's center, not edge, but it no longer has to do actual spreading, so the speed may not decrease much.

In any case, the behavior is now vastly more appropriate, and is approximately how I first envisioned dynspread working, smoothly switching to larger point sizes (even for large max_px values of e.g. 100) on zooming in. The default threshold of 0.5 actually still seems about right to me, so no need to update that. The max_px default of 3 now seems low; 5 seems more visible and clear once zooming in. But I'll leave the default value unchanged here since really it's the default value in the plotting library (e.g. HoloViews hv.operation.datashader.dynspread) that affects interactive use.

Here's an example using the datashader_dashboard, with max_px=10 (fairly big!):

DynspreadZoom.mp4

philippjfr · 2021-04-11T18:03:22Z

Nice! My main question I have is if you looked at possible performance implications?

Also, it turns out, the density calculations were reading outside the bounds of the array, which is dangerous and likely to give incorrect readings for density

😬

jbednar · 2021-04-11T18:14:08Z

I have not done any benchmarking, but I wasn't able to spot performance differences in my informal testing. It gets slower as the image size increases and as max_px increases (but only for images sparse enough to approach max_px).

jlstevens · 2021-04-11T18:35:25Z

Very nice! I'll give this PR a go tomorrow morning and review.

…unds

jbednar · 2021-05-25T15:00:04Z

I've pushed a version of dynspread that defines density for a given radius as "fraction of non-empty pixels that have a neighbor within the radius". This is the first time dynspread has had a consistent, accurate, and describable interpretation, rather than being an opaque metric whose threshold can only be set empirically. It also seems to work very well for points plots, enlarging the spread just enough in the various points cases so that points become visible up to the max_px as long as they do not overlap for the default threshold=0.5, and with meaningful levels of overlap controllable with values like 0.8 or 0.9. I'm now happy with it for points plots.

However, it no longer works at all for line plots, because by construction all points along the line always have an immediate neighbor, and so no spreading is ever applied. I'm still pondering whether there is a meaningful dynspread metric that (a) works using pixels alone, as it does now, yet (b) appropriately spreads line plots with low overlap while not spreading collections of many overlapping lines.

jlstevens · 2021-05-26T12:08:08Z

I've always felt that a single dynamic spreading approach for multiple glyphs (e.g points versus lines) would have such tradeoffs. Imho what lines really need is a dynamic thickness (with antialiasing) instead of a convolution type approach.

At the holoviews level we could know which glyph is being rasterized and choose different code paths that way. At the datashader level it would be up to the user to choose a dynspread approach appropriate to what is being rasterized. Of course, it would be even better If you can think of a single, easy-to-explain approach that works for both points and lines!

I also think that the work in this PR to improve dynspread behavior would be good to make available, even if it only works for points.

jbednar · 2021-05-27T01:34:22Z

To-do items: Ok, after discussion with @jlstevens and @philippjfr, we agreed:

For points plots, this new algorithm is a clear improvement and behaves very reasonably, and should go in. There might be other algorithms that would work as well, but this one is very simple to reason about and matches what we generally want dynspread to do, when operating under its current constraints.
For line plots, it's more complicated:
a. The new algorithm does not ever do any spreading for normal line/curve/area plots, because every point in such plots of spatially extended items always has a neighbor.
b. In nearly every case we could think of, spread(px=1) is actually what we'd want to see for a line plot, and that has always behaved well.
c. dynspread previously typically ended up spreading a line plot a little, nearly being equivalent to spread(px=1), whereas now line plots will no longer have spreading with dynspread. To re-enable spreading (previously only happening as a side effect of a broken algorithm), those examples will need updating.
d. It's conceivable that we could write a neighborhood-based dynspread function for use with lines that looks at the fraction of points filled out of the neighborhood radius, and this could let a line plot use no spreading when on balance it's a dense tangle, and spreading when zoomed in. Making such an algorithm would be a research project (probably not too hard?), and normalizing it to behave similarly with respect to the threshold value as the current algorithm would be difficult, so it seems like it could be a fully separate function if we do decide we need it. I propose that we not do that work, and just see if a fixed spread value is good enough; seems more of a priority to make an antialiased line with thickness than to have a special dynspread for lines.
Should we offer the old dynspread function somehow for backwards compatibility? Our conclusion is that preserving the old behavior is not worth the complexity, but we'll need some text about this issue in the release notes regardless.
a. Cons:
1. The old one was buggy in several ways and just did not work well.
2. Supporting both is more complex and messy.
3. Supporting both would e.g. require explaining to users to do something like datashader.transfer_functions.dynspread = datashader.transfer_functions.dynspread_old, which is not substantially easier than telling them to change dynspread to spread(px=1).
  b. Pros:
4. People who were relying on the previous buggy behavior would more easily be able to return to it.
The new dynspread appears to work as well as it can given the underlying assumptions, but see Smoother spreading step size #1012 for a proposal for how to eliminate the sharp jumps in spreading sizes currently required.

To-do list before merging: (Jim)

Update the Datashader tests that were verifying the old buggy behavior.
Search the datashader docs for dynspread used on line plots, and update those to use spread(px=1) instead.
Rename density to something more accurate. neighbor_fraction? [Skipped; seems like as good a name as any.]
Choose default max_px value. [The current value of 3 seems fine at the datashader level. It would have to be overridden at the plotting library level, and I personally suggest 5 at the HoloViews level, but that's not part of this PR.]
Write a release notes entry about dynspread updates.

To-do after merging:

Search related sites like holoviews for dynspread used on line plots, and update those to use spread(px=1) instead.

To-do after releasing datashader 0.13:

Update examples.pyviz and holoviews websites to use Datashader 0.13 so that their dynspread behavior improves.

jbednar · 2021-06-05T20:52:17Z

Tests are passing on python 3.6, but failing on later versions because of holoviz/spatialpandas#68 and possibly other issues with related libraries. Merging this now, but still need to address the downstream issues.

jbednar added this to the v0.12.2 milestone Apr 11, 2021

jbednar requested a review from jlstevens April 11, 2021 05:06

jbednar added 3 commits May 23, 2021 00:06

Fix dynspread not to count already-spread pixels and respect array bo…

a9e64aa

…unds

Updating missed bit

bd7336b

Updated tests to match new behavior

852291d

jbednar force-pushed the dynspread branch from 507ee3a to 852291d Compare May 23, 2021 05:07

Defined dynspread as fraction of non-empty pixels with neighbors

e4a6196

Skip unneeded zero-radius check for neighbors

ad31807

jbednar removed the request for review from jlstevens May 26, 2021 19:03

jbednar modified the milestones: v0.13.1, v0.13.0 May 26, 2021

This was referenced May 26, 2021

dynspread fails when the y-axis value is constant #775

Closed

Smoother spreading step size #1012

Open

jbednar added 4 commits June 4, 2021 01:08

Made dynspread threshold inclusive to allow full spreading

c1f659e

Updated tests for new dynspread

9d4ef6d

Temporarily pin to avoid broken dask release

8643778

Updated dynspread usage in examples

4f85417

jbednar added 2 commits June 5, 2021 15:53

Reverted temporary pin

c4ec3c9

Fixed typo

e1706bb

jbednar merged commit 1ae52b6 into master Jun 5, 2021

maximlt deleted the dynspread branch December 25, 2021 17:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix dynspread not to count already-spread pixels and respect array bnds #1001

Fix dynspread not to count already-spread pixels and respect array bnds #1001

jbednar commented Apr 11, 2021 •

edited

Loading

philippjfr commented Apr 11, 2021

jbednar commented Apr 11, 2021

jlstevens commented Apr 11, 2021

jbednar commented May 25, 2021

jlstevens commented May 26, 2021 •

edited

Loading

jbednar commented May 27, 2021 •

edited

Loading

jbednar commented Jun 5, 2021

Fix dynspread not to count already-spread pixels and respect array bnds #1001

Fix dynspread not to count already-spread pixels and respect array bnds #1001

Conversation

jbednar commented Apr 11, 2021 • edited Loading

philippjfr commented Apr 11, 2021

jbednar commented Apr 11, 2021

jlstevens commented Apr 11, 2021

jbednar commented May 25, 2021

jlstevens commented May 26, 2021 • edited Loading

jbednar commented May 27, 2021 • edited Loading

jbednar commented Jun 5, 2021

jbednar commented Apr 11, 2021 •

edited

Loading

jlstevens commented May 26, 2021 •

edited

Loading

jbednar commented May 27, 2021 •

edited

Loading