
support dataframe protocol (tested with Vaex) #3387


maartenbreddels
Contributor

This allows Plotly Express to accept any dataframe that supports
the dataframe interchange protocol; see:
https://data-apis.org/blog/dataframe_protocol_rfc/
https://data-apis.org/dataframe-protocol/latest/index.html
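A minimal sketch of the idea, assuming the conversion goes through pandas' `pd.api.interchange.from_dataframe` (available since pandas 1.5.0). The helper `maybe_interchange` and its `convert` parameter are hypothetical names, not the PR's actual code; `convert` is injectable only so the logic can be exercised without pandas installed.

```python
from typing import Any, Callable, Optional


def maybe_interchange(
    data_frame: Any, convert: Optional[Callable[[Any], Any]] = None
) -> Any:
    """Hypothetical helper: turn an interchange-protocol dataframe into pandas.

    Objects without a ``__dataframe__`` method are returned unchanged, so
    dicts, lists, etc. keep going through the existing code path.
    """
    if not hasattr(data_frame, "__dataframe__"):
        return data_frame
    if convert is None:
        # Imported lazily: pandas is only needed once a protocol object shows up.
        import pandas as pd

        if isinstance(data_frame, pd.DataFrame):
            # pandas >= 1.5 also implements __dataframe__; nothing to convert.
            return data_frame
        # Assumed converter; present in pandas >= 1.5.0.
        convert = pd.api.interchange.from_dataframe
    return convert(data_frame)
```

With this shape, a polars or Vaex DataFrame (anything exposing `__dataframe__`) would be materialized as a pandas DataFrame, while a plain dict of lists passes straight through untouched.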

The test includes an example with Vaex, which should work with
vaexio/vaex#1509
(not yet released).

This is only a proof of concept. I think it needs to wait until pandas implements `from_dataframe`, and if you'd like to keep this test, it would require a Vaex version with the above-mentioned PR merged and released.

Usage:
(screenshot not preserved)

Note that this does not speed up any aggregation or processing, since Plotly Express still works on a pandas DataFrame after the conversion, although reading from HDF5/Arrow/Parquet might be faster.

@maartenbreddels
Contributor Author

Credit goes to @AlenkaF for doing the heavy lifting on the Vaex side (cc @rgommers).

@nicolaskruchten
Contributor

Cool, thanks! I'll be happy to merge this once it's a bit more ready :)

@nicolaskruchten
Contributor

See also #3901 for an alternative approach

maartenbreddels marked this pull request as ready for review on September 30, 2022 09:54
@maartenbreddels
Contributor Author

Since version 1.5.0, pandas supports the protocol: https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#dataframe-interchange-protocol-implementation

@maartenbreddels
Contributor Author

Not sure if you'd like a vaex dependency for testing, but in case you're OK with it, where should that go?

@nicolaskruchten
Contributor

Thanks Maarten! There should be a requirements_optional.txt somewhere to which I can add a vaex dependency :)

Once we merge this, we can still offer #3901 as a fallback for cases where someone has an old pandas or an old vaex without interchange support but, say, a vaex that can still export itself via to_pandas, right?

@maartenbreddels
Contributor Author

Absolutely, although we have a to_pandas_df() method (rather than to_pandas).
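A sketch of the fallback discussed here, for objects that do not implement the interchange protocol (for example an older pandas or Vaex release without interchange support). The function name `pandas_fallback` and the order in which the two methods are tried are assumptions; only the method names `to_pandas` (the #3901 approach) and Vaex's `to_pandas_df` come from the comments above.

```python
from typing import Any


def pandas_fallback(obj: Any) -> Any:
    """Hypothetical fallback for objects that lack ``__dataframe__``.

    Tries a generic ``to_pandas()`` export first (the #3901 idea), then
    vaex's ``to_pandas_df()``; anything else is returned unchanged.
    """
    for method_name in ("to_pandas", "to_pandas_df"):
        method = getattr(obj, method_name, None)
        if callable(method):
            return method()
    return obj
```

In Plotly Express this would only run after a `__dataframe__` check fails, so protocol-aware libraries would still take the interchange path and the export methods serve purely as a safety net.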

@labanyamukhopadhyay

Hi @nicolaskruchten, is there an update regarding support for the dataframe interchange protocol? It would be useful for interoperability between Plotly and Modin dataframes!

@alexander-beedie

alexander-beedie commented May 1, 2023

As an FYI, this would also enable transparent polars integration/support (closing #3637), as we implemented support for the DataFrame Interchange Protocol in polars 0.16.2 (late January/early February this year) 📈

@MarcoGorelli
Contributor

MarcoGorelli commented May 21, 2023

Is this still active?

If so, I'd strongly suggest setting 2.0.2* as the minimum pandas version to try interchanging from, because there are some pretty basic bugs in earlier versions:

In [1]: df = pl.DataFrame({'a': [1,2,3]})

In [2]: pd.api.interchange.from_dataframe(df[1:])
Out[2]:
                 a
0                2
1                3
2  125822987010162

(😱 )

*not yet available, but should be out tomorrow

@anmyachev
Contributor

> If so, I'd strongly suggest setting 2.0.2* as the minimum pandas version to try interchanging from, because there's some pretty basic mistakes in earlier versions:

Hi, 2.0.2 is out.

As an option, one can change the condition so that the protocol is used for users who have already upgraded to pandas 2.0.2 or later, rather than waiting until plotly's minimum supported pandas version is 2.0.2:

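import pandas as pd
from packaging import version
if hasattr(args["data_frame"], "__dataframe__") and version.parse(pd.__version__) >= version.parse("2.0.2"):
    args["data_frame"] = pd.api.interchange.from_dataframe(args["data_frame"])  # convert via the interchange protocol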

@MarcoGorelli
Contributor

Thanks @anmyachev, yes, sorry, that's what I meant, rather than bumping the minimum pandas version for everything.

@anmyachev
Contributor

> Once we merge this, we can still offer #3901 as a fallback, for cases where someone has an old pandas or an old vaex without the interchange stuff but, say, a vaex that does still export itself to_pandas, right?

@nicolaskruchten using the interchange protocol and also keeping a fallback as in #3901 looks like the best option for now since, IIUC, the interchange protocol doesn't work for Series.

cc @MarcoGorelli

@anmyachev
Contributor

@LiamConnors @nicolaskruchten do you plan to use the interchange protocol in addition to the to_pandas method that was added in #3901?

In this case, plotly.py would also be able to accept Modin dataframes.

@nicolaskruchten
Contributor

This is probably still a good idea, yes, if someone wants to update this PR to implement this fallback :)

@anmyachev
Contributor

> This is probably a good idea still yes, if someone wants to update this PR to implement this fallback :)

Good! In that case, I'll take care of it, if no one minds :)

@anmyachev
Contributor

@nicolaskruchten I made a separate pull request that continues this work: #4244

@anmyachev
Contributor

@nicolaskruchten I guess this PR can be closed?

@alexcjohnson
Collaborator

Yes, thanks @anmyachev!
