Filled area support #734

Merged
merged 12 commits into from
May 3, 2019

@jonmmease
Collaborator

jonmmease commented Apr 6, 2019

Overview

This PR adds a new Canvas.area function to support the creation of filled area plots with Datashader.

Examples

Imports and dataset

import pandas as pd
import numpy as np
import datashader as ds
from datashader import Canvas
import datashader.transfer_functions as tf
cvs = Canvas()
df = pd.DataFrame({
    'A1': [1, 1.5, 2, 2.5, 3, 4],
    'A2': [1.6, 2.1, 2.9, 3.2, 4.2, 5],
    'B1': [10, 12, 11, 14, 13, 15],
    'B2': [5, 9, 10, 7, 8, 12],
}, dtype='float64')
df

Plot 'A1' vs 'B1' and 'B2' as lines

agg = cvs.line(df, x='A1', y=['B1', 'B2'], axis=0)
tf.spread(tf.shade(agg))

index1

Plot filled area from 'B1' to y=0.

agg = cvs.area(df, x='A1', y='B1')
tf.shade(agg)

index2

Plot filled area from 'B1' to 'B2' using the y_stack argument to Canvas.area.

agg = cvs.area(df, x='A1', y='B1', y_stack='B2')
tf.shade(agg)

index3

Multi-line with axis=0

agg = cvs.line(df, x=['A1', 'A2'], y=['B1', 'B2'], axis=0, agg=ds.count())
tf.spread(tf.shade(agg))

index8

Multi-area with axis=0

agg = cvs.area(df, x=['A1', 'A2'], y=['B1', 'B2'], axis=0, agg=ds.count())
tf.shade(agg)

index4

Multi-line with axis=0 and shared x

agg = cvs.line(df, x='A1', y=['B1', 'B2'], axis=0)
tf.spread(tf.shade(agg))

index9

Multi-area with axis=0 and shared x

agg = cvs.area(df, x='A1', y=['B1', 'B2'], agg=ds.count(), axis=0)
tf.shade(agg)

index5

Multi-line with axis=1

agg = cvs.line(df, x=['A1', 'A2'], y=['B1', 'B2'], axis=1)
tf.spread(tf.shade(agg))

index10

Multi-area with axis=1

agg = cvs.area(df, x=['A1', 'A2'], y=['B1', 'B2'], agg=ds.count(), axis=1)
tf.shade(agg)

index6

Ragged array lines

df_ragged = pd.DataFrame({
    'A1': pd.array([
        [1, 1.5], [2, 2.5, 3], [1.6, 2, 3, 4], [3.2, 4, 5]],
        dtype='Ragged[float32]'),
    'B1': pd.array([
        [10, 12], [11, 14, 13], [10, 7, 9, 10], [7, 8, 12]],
        dtype='Ragged[float32]'),
    'B2': pd.array([
        [6, 10], [9, 10, 10], [9, 5, 8, 9], [4, 5, 11]],
        dtype='Ragged[float32]'),
    'group': pd.Categorical([0, 1, 2, 1])
})

agg = cvs.line(df_ragged, x='A1', y='B1', axis=1)
tf.spread(tf.shade(agg))

index11

Ragged areas filled to zero line

agg = cvs.area(df_ragged, x='A1', y='B1', agg=ds.count(), axis=1)
tf.shade(agg)

Ragged areas filled to ragged line

agg = cvs.area(df_ragged, x='A1', y='B1', y_stack='B2', agg=ds.count(), axis=1)
tf.shade(agg)

index12

Ragged areas filled to ragged with count_cat aggregator

agg = cvs.area(df_ragged, x='A1', y='B1', y_stack='B2',
               agg=ds.count_cat('group'), axis=1)
tf.shade(agg)

index13

Documentation

I added a new area plot section to the Timeseries user guide. See the user guide for full code:

cvs = ds.Canvas(plot_height=300, plot_width=900)
agg = cvs.area(df, 'Time', 'a')
img = tf.shade(agg)
img

index

Performance

I haven't done extensive benchmarking, but in all of the cases I've tried the render time has been within a factor of two of the time to render the line by itself.

Adds function to render trapezoids with parallel edges that are aligned with the y-axis. These are the primitives that will be used to create area glyphs.
This function relies on the new AreaToZero and AreaToLine glyphs.  AreaToZero handles area plots between a line and y=0.  AreaToLine handles area plots between two lines that share the same x-axis coordinates.
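
To illustrate the trapezoid primitive described above, here is a minimal pure-Python sketch. It is not the code from this PR: the name `fill_trapezoid`, its parameters, and the list-of-lists grid are all hypothetical, and coordinates are assumed to already be in pixel units. For each pixel column between the two vertical edges, the top and the baseline are interpolated linearly and the cells between them are incremented, which mimics a count aggregation.

```python
import math


def fill_trapezoid(grid, x0, x1, y0, y1, base0=0.0, base1=0.0):
    """Increment the cells of `grid` covered by a trapezoid whose two
    parallel edges are vertical, at x0 and x1.

    Hypothetical illustration only, not the Datashader implementation.
    grid is indexed as grid[row][col]; (x0, y0) and (x1, y1) are the ends
    of the upper line, base0 and base1 the baseline at the same x values.
    """
    if x1 < x0:  # normalise so the sweep always runs left to right
        x0, x1, y0, y1, base0, base1 = x1, x0, y1, y0, base1, base0
    height, width = len(grid), len(grid[0])
    first_col = max(math.floor(x0), 0)
    last_col = min(math.floor(x1), width - 1)
    for col in range(first_col, last_col + 1):
        # Fraction of the way from the left edge to the right edge
        t = 0.0 if x1 == x0 else (col - x0) / (x1 - x0)
        t = min(max(t, 0.0), 1.0)
        top = y0 + t * (y1 - y0)
        bottom = base0 + t * (base1 - base0)
        lo, hi = sorted((top, bottom))
        first_row = max(math.floor(lo), 0)
        last_row = min(math.floor(hi), height - 1)
        for row in range(first_row, last_row + 1):
            grid[row][col] += 1
    return grid


# Fill from a flat line at y=1 down to y=0 (the AreaToZero situation)
grid = [[0] * 4 for _ in range(3)]
fill_trapezoid(grid, 0, 3, 1, 1)
for row in grid:
    print(row)
```

Passing non-zero `base0` and `base1` gives the fill-between-two-lines case, where both lines share the same x coordinates.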
@jonmmease
Collaborator Author

Seems some tornado version incompatibility may have cropped up in the tests.

looks like nbsmoke/nbconvert is raising an error:

AttributeError: module 'tornado.web' has no attribute 'asynchronous'

@jlstevens
Collaborator

@jonmmease I restarted the failing build and it seems to be passing now. Maybe some other dependency ended up pinning tornado for us? At any rate, I'll now have a look at this PR...

@jlstevens
Collaborator

jlstevens commented Apr 19, 2019

Just a quick note to say I've tried the PR and it seems to be working as expected! I've tested whether this could be used to draw axis-aligned filled boxes for a project I am working on and it behaves the way I would expect.

@jlstevens
Collaborator

jlstevens commented Apr 19, 2019

One issue I have spotted - replacing .line( with .area( everywhere in '3_Timeseries.ipynb' of the user guide, you get this error at the end:

image

I suppose the axis argument isn't yet supported by area? Otherwise it is working great!

@jbednar
Member

jbednar commented Apr 19, 2019

@jonmmease, it would be great to have the axis support as well, to make it simple to switch from line to area and back as needed.

One possible feature request: If we're showing the area between two curves a and b, would it be possible to somehow indicate the regions where a was higher than b differently from the regions where b is higher than a? E.g. the first case could aggregate positively, and the second negatively? I'm trying to think about the cases where people use area shading between curves in general, where usually it's to highlight differences between the curves, and it seems like the sign of that difference is crucially important in that case.

@jlstevens
Collaborator

image

Just so we can have a pretty plot demonstrating areas :-)

@jonmmease
Collaborator Author

Thanks for the feedback and trying this out @jlstevens

Yes, that's right: the axis argument isn't implemented. Other than time, the main reason I didn't pursue it was that I wasn't sure what it should mean. For the line case with axis=1, each row of the dataframe represents a single distinct line. For area there is the single-line case (fill to zero) and the two-line case (fill between lines), but I wasn't sure how to think about the case with a separate line per row in a dataframe. They could be treated as a bunch of instances of the single-line case, or there could be some kind of option to toggle two-line behavior where every other row would be filled between.
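
For reference, the difference between the two axis settings for lines can be sketched in plain Python. The helper `lines_for_axis` and the dict-of-lists table are hypothetical stand-ins for a DataFrame, not Datashader code; they only show which points end up grouped into the same line.

```python
def lines_for_axis(table, x_cols, y_cols, axis):
    """Return a list of lines, each a list of (x, y) points.

    `table` maps column name -> list of values. Hypothetical helper,
    for illustration only.
    """
    if axis == 0:
        # One line per (x column, y column) pair, running down the rows
        return [list(zip(table[xc], table[yc]))
                for xc, yc in zip(x_cols, y_cols)]
    if axis == 1:
        # One line per row, running across the listed columns
        n_rows = len(table[x_cols[0]])
        return [[(table[xc][i], table[yc][i])
                 for xc, yc in zip(x_cols, y_cols)]
                for i in range(n_rows)]
    raise ValueError("axis must be 0 or 1")


table = {
    'A1': [1, 1.5, 2], 'A2': [1.6, 2.1, 2.9],
    'B1': [10, 12, 11], 'B2': [5, 9, 10],
}
by_column = lines_for_axis(table, ['A1', 'A2'], ['B1', 'B2'], axis=0)
by_row = lines_for_axis(table, ['A1', 'A2'], ['B1', 'B2'], axis=1)
print(len(by_column), "lines with axis=0;", len(by_row), "lines with axis=1")
```

With axis=1 every row becomes its own short line, which is why the fill-between-lines meaning is the unclear part for area.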

Happy to talk more about it if someone has ideas for how the axis=1 API should work. My initial impression is that it could get messy and I didn't know what use cases there might be for it.

Regarding aggregation, perhaps we could define a new comparison_count aggregator that would accept two field names. The aggregator would return +1 if col1 > col2, 0 if col1 == col2, and -1 if col1 < col2. This aggregator could be used to display a change in a filled area plot when the lines cross, like @jbednar suggested, but it could also be used in other glyphs as well. Would that make sense?
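
A rough sketch of the per-sample rule such an aggregator would apply, in plain Python. This only illustrates the +1 / 0 / -1 comparison; `comparison_sign` is a hypothetical name, nothing like it exists in Datashader as part of this PR, and how the values would be accumulated per pixel is left open.

```python
def comparison_sign(col1, col2):
    """Return +1 where col1 > col2, 0 where equal, -1 where col1 < col2.

    Hypothetical helper illustrating the proposed rule only.
    """
    # In Python, True - False == 1 and False - True == -1
    return [(a > b) - (a < b) for a, b in zip(col1, col2)]


# Two made-up curves that cross at the middle sample
print(comparison_sign([1, 2, 3], [3, 2, 1]))  # [-1, 0, 1]
```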

@jbednar
Member

jbednar commented Apr 19, 2019

I would expect the axis argument support to work just as for cvs.line, i.e. only really being relevant for the case of filling from the x axis to the line. I don't think there's a meaningful extension of the between-the-lines filling for that case, and doing every other line sounds quite arbitrary, so I'd be very happy if it simply did all that cvs.line does, plus separate support for filling between two lines (the only bit of the API or behavior that wouldn't match cvs.line, and strictly being an extension/superset rather than a difference.) The goal is that any user of cvs.line should be able to switch to cvs.area to try it out without encountering any issues, and I think that's achievable here.

The proposed aggregator sounds great!

@jonmmease
Collaborator Author

> The goal is that any user of cvs.line should be able to switch to cvs.area to try it out without encountering any issues, and I think that's achievable here.

Sounds good! I'll see what makes sense in terms of the fill-between-lines API as I get a bit further in refactoring this.

@jonmmease changed the title from "Filled area support" to "[WIP] Filled area support" on May 1, 2019
@jbednar
Member

jbednar commented May 2, 2019

Looks promising; tag me and @jlstevens when it's ready for re-review!

@jonmmease
Collaborator Author

> Looks promising; tag me and @jlstevens when it's ready for re-review!

Will do! Should be soon now 🙂

@jonmmease changed the title from "[WIP] Filled area support" to "Filled area support" on May 3, 2019
@jonmmease
Collaborator Author

jonmmease commented May 3, 2019

@jbednar @jlstevens Alright, this is ready for another look! Main PR description above has been updated with the new API.

@jbednar
Member

jbednar commented May 3, 2019

Looks great! I really appreciate your careful docstrings, comments, and test cases, and I was able to use it anywhere I expected cvs.line to work, without any problems. The underlying numerical code makes my brain go numb, so I can't comment on that, but it certainly seems to behave properly.

Does the y_stack argument work for only a single subtracting line, or can it also be used in the multiple-column or multiple-row cases to provide a baseline for each such line? Either is fine, I just couldn't tell from the 3_Timeseries.ipynb notebook.

Also, were you able to look into an implementation of comparison_count? That shouldn't hold up this PR being merged; just wondering.

Unless @jlstevens can spot any problems, I'm happy to merge this!

@jonmmease
Collaborator Author

Thanks for taking a look @jbednar,

> Does the y_stack argument work for only a single subtracting line, or can it also be used in the multiple-column or multiple-row cases to provide a baseline for each such line?

Yes, y_stack can be used in every case (single, multi, ragged, axis=0, axis=1, etc.). It just needs to have the same form as y (i.e. both strings, both lists of strings with the same length, both ragged, both numpy arrays with the same length, etc.). I didn't make examples of all of them because that's another factor of 2 and I'm not sure which will actually be useful in practice 🙂
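
The "same form" rule can be sketched as a small check. This is a hypothetical helper covering only the string and list-of-strings cases (ragged columns and numpy arrays are left out), and it is not the validation code in this PR.

```python
def same_form(y, y_stack):
    """Return True when y and y_stack have matching forms.

    Hypothetical illustration: both must be column names, or both
    must be lists/tuples of column names with the same length.
    """
    if isinstance(y, str) or isinstance(y_stack, str):
        # A single column name only pairs with another single column name
        return isinstance(y, str) and isinstance(y_stack, str)
    if isinstance(y, (list, tuple)) and isinstance(y_stack, (list, tuple)):
        # Lists of column names must pair up one to one
        return len(y) == len(y_stack)
    return False


print(same_form('B1', 'B2'))                  # True
print(same_form(['B1', 'B2'], ['C1', 'C2']))  # True
print(same_form('B1', ['B2']))                # False
```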

> Also, were you able to look into an implementation of comparison_count? That shouldn't hold up this PR being merged; just wondering.

No, not yet. But I'll take a look soon.

@jbednar
Member

jbednar commented May 3, 2019

Sounds great!

@jlstevens
Collaborator

Looks good to me! My only observation here is that glyphs.py is getting rather long and at some point it might be good to split it up...

@jbednar merged commit 0910846 into master on May 3, 2019
@jbednar deleted the enh_area branch on May 3, 2019 18:26