Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dynspread not to count already-spread pixels and respect array bnds #1001

Merged
merged 11 commits into from
Jun 5, 2021

Conversation

jbednar
Copy link
Member

@jbednar jbednar commented Apr 11, 2021

Fixes #1000. As described in #1000, dynspread was not working as documented or designed, because it was iteratively measuring the density of already-spread pixels, which produced large dense regions that were indistinguishable from dense regions of datapoints. Also, it turns out, the density calculations were reading outside the bounds of the array, which is dangerous and likely to give incorrect readings for density, even though it does not involve writing outside the array.

Fixed both problems by (1) respecting the bounds of the array when computing density, and (2) iteratively measuring density using an increasing neighborhood (as before) but no longer with actual spreading for each interative step (unlike before). I.e., instead of spreading then checking to see if the density metric is exceeded, simply tries increasingly large values of a neighborhood radius to count how many pixels have neighbors within this radius, and stopping when that normalized number reaches the specified threshold. The neighborhood search is now twice as large as the spreading amount being tested, since unlike actual spreading each datapoint has to check distance to the neighboring datapoint's center, not edge, but it no longer has to do actual spreading, so the speed may not decrease much.

In any case, the behavior is now vastly more appropriate, and is approximately how I first envisioned dynspread working, smoothly switching to larger point sizes (even for large max_px values of e.g. 100) on zooming in. The default threshold of 0.5 actually still seems about right to me, so no need to update that. The max_px default of 3 now seems low; 5 seems more visible and clear once zooming in. But I'll leave the default value unchanged here since really it's the default value in the plotting library (e.g. HoloViews hv.operation.datashader.dynspread) that affects interactive use.

Here's an example using the datashader_dashboard, with max_px=10 (fairly big!):

DynspreadZoom.mp4

@jbednar jbednar added this to the v0.12.2 milestone Apr 11, 2021
@jbednar jbednar requested a review from jlstevens April 11, 2021 05:06
@philippjfr
Copy link
Member

Nice! My main question I have is if you looked at possible performance implications?

Also, it turns out, the density calculations were reading outside the bounds of the array, which is dangerous and likely to give incorrect readings for density

😬

@jbednar
Copy link
Member Author

jbednar commented Apr 11, 2021

I have not done any benchmarking, but I wasn't able to spot performance differences in my informal testing. It gets slower as the image size increases and as max_px increases (but only for images sparse enough to approach max_px).

@jlstevens
Copy link
Collaborator

Very nice! I'll give this PR a go tomorrow morning and review.

@jbednar
Copy link
Member Author

jbednar commented May 25, 2021

I've pushed a version of dynspread that defines density for a given radius as "fraction of non-empty pixels that have a neighbor within the radius". This is the first time dynspread has had a consistent, accurate, and describable interpretation, rather than being an opaque metric whose threshold can only be set empirically. It also seems to work very well for points plots, enlarging the spread just enough in the various points cases so that points become visible up to the max_px as long as they do not overlap for the default threshold=0.5, and with meaningful levels of overlap controllable with values like 0.8 or 0.9. I'm now happy with it for points plots.

However, it no longer works at all for line plots, because by construction all points along the line always have an immediate neighbor, and so no spreading is ever applied. I'm still pondering whether there is a meaningful dynspread metric that (a) works using pixels alone, as it does now, yet (b) appropriately spreads line plots with low overlap while not spreading collections of many overlapping lines.

@jlstevens
Copy link
Collaborator

jlstevens commented May 26, 2021

I've always felt that a single dynamic spreading approach for multiple glyphs (e.g points versus lines) would have such tradeoffs. Imho what lines really need is a dynamic thickness (with antialiasing) instead of a convolution type approach.

At the holoviews level we could know which glyph is being rasterized and choose different code paths that way. At the datashader level it would be up to the user to choose a dynspread approach appropriate to what is being rasterized. Of course, it would be even better If you can think of a single, easy-to-explain approach that works for both points and lines!

I also think that the work in this PR to improve dynspread behavior would be good to make available, even if it only works for points.

@jbednar jbednar removed the request for review from jlstevens May 26, 2021 19:03
@jbednar jbednar modified the milestones: v0.13.1, v0.13.0 May 26, 2021
@jbednar
Copy link
Member Author

jbednar commented May 27, 2021

To-do items: Ok, after discussion with @jlstevens and @philippjfr, we agreed:

  1. For points plots, this new algorithm is a clear improvement and behaves very reasonably, and should go in. There might be other algorithms that would work as well, but this one is very simple to reason about and matches what we generally want dynspread to do, when operating under its current constraints.
  2. For line plots, it's more complicated:
    a. The new algorithm does not ever do any spreading for normal line/curve/area plots, because every point in such plots of spatially extended items always has a neighbor.
    b. In nearly every case we could think of, spread(px=1) is actually what we'd want to see for a line plot, and that has always behaved well.
    c. dynspread previously typically ended up spreading a line plot a little, nearly being equivalent to spread(px=1), whereas now line plots will no longer have spreading with dynspread. To re-enable spreading (previously only happening as a side effect of a broken algorithm), those examples will need updating.
    d. It's conceivable that we could write a neighborhood-based dynspread function for use with lines that looks at the fraction of points filled out of the neighborhood radius, and this could let a line plot use no spreading when on balance it's a dense tangle, and spreading when zoomed in. Making such an algorithm would be a research project (probably not too hard?), and normalizing it to behave similarly with respect to the threshold value as the current algorithm would be difficult, so it seems like it could be a fully separate function if we do decide we need it. I propose that we not do that work, and just see if a fixed spread value is good enough; seems more of a priority to make an antialiased line with thickness than to have a special dynspread for lines.
  3. Should we offer the old dynspread function somehow for backwards compatibility? Our conclusion is that preserving the old behavior is not worth the complexity, but we'll need some text about this issue in the release notes regardless.
    a. Cons:
    1. The old one was buggy in several ways and just did not work well.
    2. Supporting both is more complex and messy.
    3. Supporting both would e.g. require explaining to users to do something like datashader.transfer_functions.dynspread = datashader.transfer_functions.dynspread_old, which is not substantially easier than telling them to change dynspread to spread(px=1).
      b. Pros:
    4. People who were relying on the previous buggy behavior would more easily be able to return to it.
  4. The new dynspread appears to work as well as it can given the underlying assumptions, but see Smoother spreading step size #1012 for a proposal for how to eliminate the sharp jumps in spreading sizes currently required.

To-do list before merging: (Jim)

  • Update the Datashader tests that were verifying the old buggy behavior.
  • Search the datashader docs for dynspread used on line plots, and update those to use spread(px=1) instead.
  • Rename density to something more accurate. neighbor_fraction? [Skipped; seems like as good a name as any.]
  • Choose default max_px value. [The current value of 3 seems fine at the datashader level. It would have to be overridden at the plotting library level, and I personally suggest 5 at the HoloViews level, but that's not part of this PR.]
  • Write a release notes entry about dynspread updates.

To-do after merging:

  • Search related sites like holoviews for dynspread used on line plots, and update those to use spread(px=1) instead.

To-do after releasing datashader 0.13:

  • Update examples.pyviz and holoviews websites to use Datashader 0.13 so that their dynspread behavior improves.

@jbednar
Copy link
Member Author

jbednar commented Jun 5, 2021

Tests are passing on python 3.6, but failing on later versions because of holoviz/spatialpandas#68 and possibly other issues with related libraries. Merging this now, but still need to address the downstream issues.

@jbednar jbednar merged commit 1ae52b6 into master Jun 5, 2021
@maximlt maximlt deleted the dynspread branch December 25, 2021 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving dynspread
3 participants