Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PX can accept non-pandas dataframes that can .to_pandas() #3901

Merged
merged 2 commits into from
Jun 8, 2023

Conversation

nicolaskruchten
Copy link
Contributor

No description provided.

@exactlyallan
Copy link

Regarding accepting RAPIDS cuDF dataframes, this is great at reducing friction, but we lose transparency about moving off GPU to system memory that an explicit .to_pandas() shows. For single charts (most of PX's usage?) that really isnt an issue, but I'm wondering how much slower a Dash + PX dashboard would be without the tighter cuDF integration.

@nicolaskruchten
Copy link
Contributor Author

There's not really any tighter integration to be had atm... PX doesn't do any aggregation anyway, it just splits and dumps dataframe columns into JSON basically so it's coming off the GPU no matter what.

The major inefficiency in my PR is that it takes the entire df to pandas, not just the columns it needs, which would be a better approach but potentially trickier/further down into the code.

@kevinheavey
Copy link

The other drawback of relying on to_pandas is it cements Pandas as a dependency

@nicolaskruchten
Copy link
Contributor Author

Ah well right now the dependency is extremely cemented internally, which is part of what this change would try to hide from the user.

Copy link
Member

@LiamConnors LiamConnors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃

@alexcjohnson alexcjohnson merged commit 69b4d3e into master Jun 8, 2023
@alexcjohnson alexcjohnson deleted the px_to_pandas branch June 8, 2023 14:33
@nicolaskruchten
Copy link
Contributor Author

@LiamConnors itll be important to mention in the docs my comment above re "copying the whole data frame"!

@MarcoGorelli
Copy link
Contributor

Ah well right now the dependency is extremely cemented internally, which is part of what this change would try to hide from the user.

Hey @nicolaskruchten - just FYI, there is an initiative which could help with this: https://data-apis.org/blog/dataframe_standard_rfc/

Just to gauge interest, is this something you might consider leveraging in the future once it's ready?

@nicolaskruchten
Copy link
Contributor Author

definitely! PX does all sorts of manipulation under the hood which is pretty tightly-coupled to The Way Pandas Does Things™ but only out of necessity. If it turns out all of these can be faithfully ported to The Standard Way™ then I'm all for it. To be clear though: none of this is doing any data processing that can be sped up by non-Pandas backends, it's just a bunch of slicing and column rearranging work. PX does essentially no aggregation/math of any kind with Pandas, so the idea of accepting non-Pandas input is just an ergonomic one at this point to avoid the user having to do the .to_pandas() themselves.

There's another PR in flight to leverage the interchange approach on top of the to_pandas() approach: #4244

@MarcoGorelli
Copy link
Contributor

thanks @nicolaskruchten - the interchange protocol isn't necessarily zero-copy (it just tries to be when possible), so it may still be more efficient to write library-agnostic code

I'll try it out in plotly and see how far it goes

@nicolaskruchten
Copy link
Contributor Author

👍

probably the biggest win right now would be to .to_pandas() or to use the interchange to copy over less than the entire set of columns, which is what's happening right now and can be super-inefficient in the case of e.g. a very wide cudf thing where we haul the entire thing off the GPU just for two columns!

@@ -17,6 +16,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
this feature was anonymously sponsored: thank you to our sponsor!
- Add `legend.xref` and `legend.yref` to enable container-referenced positioning of legends [[#6589](https://github.com/plotly/plotly.js/pull/6589)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development.
- Add `colorbar.xref` and `colorbar.yref` to enable container-referenced positioning of colorbars [[#6593](https://github.com/plotly/plotly.js/pull/6593)], with thanks to [Gamma Technologies](https://www.gtisoft.com/) for sponsoring the related development.
- `px` methods now accept data-frame-like objects that support a `to_pandas()` method, such as polars, cudf, vaex etc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolaskruchten Looks like vaex doesn't have to_pandas method, only to_pandas_df.

Therefore, tests with vaex do not pass in #4244.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants