Add support for pyarrow Table and/or polars DataFrame #3637

Closed
sa- opened this issue Mar 19, 2022 · 8 comments

Comments

@sa-

sa- commented Mar 19, 2022

Apache Arrow is slowly becoming the new standard for dataframes. Polars (https://github.com/pola-rs/polars) is a dataframe library built on top of Arrow, and it's really fast.

It would be nice if there was support for polars directly, or for pyarrow tables so that I could use plotly with it like one would with pandas.

For example, it would be nice if I could do this:

import polars as pl
import plotly.express as px

df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})

px.line(df, x="a", y="b")
# or, if you would only like to support pyarrow Tables and not polars specifically:
# px.line(df.to_arrow(), x="a", y="b")

@sa-
Author

sa- commented Mar 22, 2022

The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is long.

@DrMaphuse

Surprisingly, polars Series seem to work out of the box, as you write in your workaround. I am curious how this is possible.

@nicolaskruchten
Contributor

As far as I know, Plotly Express doesn't use any pandas functions which are significantly faster in polars or other data frames like vaex. All that PX does is column extraction, plus melt() if you pass in wide-form data. PX doesn't do any aggregations or math on the dataset. At the end of the process, some part of every data-frame row that you pass in to PX gets serialized to JSON anyway, so you can't really use PX to visualize very large datasets.

So in general you should convert your data frames to pandas ones first before passing them to Plotly Express. The most straightforward way for PX to "accept" such alternative dataframes would be for PX to detect the presence of an "export to pandas" function and call that internally.
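
The "detect an export-to-pandas function and call it" idea can be sketched with plain duck typing. This is only an illustration, not Plotly's actual code: the helper name `as_pandas_like` and the `FakeFrame` class are hypothetical, and `FakeFrame.to_pandas()` returns a dict rather than a real pandas DataFrame so that the sketch has no dependencies.

```python
def as_pandas_like(data):
    """Call data.to_pandas() if the object offers it; otherwise return data unchanged."""
    to_pandas = getattr(data, "to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    return data


class FakeFrame:
    """Hypothetical stand-in for a polars/vaex-style frame (not a real library class)."""

    def __init__(self, columns):
        self.columns = columns

    def to_pandas(self):
        # A real library would build a pandas.DataFrame here;
        # a plain dict keeps this sketch dependency-free.
        return dict(self.columns)


converted = as_pandas_like(FakeFrame({"a": [1, 2, 3]}))  # goes through to_pandas()
untouched = as_pandas_like([1, 2, 3])                    # no to_pandas(), passed through
```

The appeal of this approach is that the plotting library needs no knowledge of any particular backend, at the cost of a full conversion (and copy) on every call.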

@thomasaarholt

Just to be explicit: You can use the graph_objects API to plot polars series. What doesn't work is passing a df to e.g. px.line(df, x=..., y=...), and then referencing x and y by strings.

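from datetime import date

import polars as pl
import streamlit as st
from plotly import graph_objects as go

# Five daily dates. Older polars releases spelled this call with low=/high=/name=.
dates = pl.date_range(date(2021, 1, 1), date(2021, 1, 5), "1d", eager=True).alias("dates")
df = pl.DataFrame({"dates": dates, "values": list(range(5))})

fig = go.Figure()
# The Series are passed directly instead of referencing columns by name.
fig.add_trace(go.Scatter(x=df["dates"], y=df["values"]))
st.plotly_chart(fig)  # render inside a Streamlit app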

This works fine in streamlit, for those who are interested.

@alexander-beedie

alexander-beedie commented May 1, 2023

> Surprisingly, polars Series seem to work out of the box, as you write in your workaround. I am curious how this is possible.

I expect it's because we support the numpy __array__ protocol.
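
For readers unfamiliar with the `__array__` protocol, here is a minimal sketch of how it works. `TinySeries` is a hypothetical class standing in for a polars Series. Whether Plotly actually relies on this path for Series inputs is the expectation stated above, not something this example confirms.

```python
import numpy as np


class TinySeries:
    """Hypothetical minimal column type; stands in for a polars Series."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object to an ndarray.
        return np.array(self._values, dtype=dtype)


# Any code that calls np.asarray() on its input can consume TinySeries
# without knowing anything about the class itself.
arr = np.asarray(TinySeries([1, 4, 9]))
```

Because the conversion happens inside NumPy, a library that normalizes array-like inputs this way picks up support for new Series types without any backend-specific code.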

Note that VegaFusion/Altair recently gained polars support (and vaex/duckdb) by implementing the DataFrame Interchange Protocol; this would be a nice/generic way forward here too (rather than having to add custom/per-backend support). There does seem to be an existing PR for this; if that was merged then everything should "just work", which would be awesome.
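
As a rough sketch of what consuming the DataFrame Interchange Protocol looks like, the snippet below uses pandas' own `from_dataframe` consumer. It assumes a pandas release that ships `pandas.api.interchange` (1.5 or later). The helper name `to_pandas_via_interchange` is hypothetical, and a pandas interchange object stands in for a polars/vaex/duckdb frame so that the example needs only pandas.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


def to_pandas_via_interchange(frame):
    """Convert any object exposing __dataframe__ into a pandas DataFrame."""
    if not hasattr(frame, "__dataframe__"):
        raise TypeError("object does not implement the DataFrame Interchange Protocol")
    return from_dataframe(frame)


# Stand-in producer: the interchange object of a pandas frame. A polars,
# vaex, or duckdb frame exposing __dataframe__ would be passed the same way.
source = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, 4.0, 9.0]})
result = to_pandas_via_interchange(source.__dataframe__())
```

The consumer only ever touches `__dataframe__`, which is what makes this route generic rather than per-backend.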

@Lundez

Lundez commented Jun 9, 2023

Latest release should support this 🥳
https://github.com/plotly/plotly.py/releases/tag/v5.15.0

But it's still using Pandas under-the-hood 😢

px methods now accept data-frame-like objects that support a to_pandas() method, such as polars, cudf, vaex etc

@thomasaarholt

Nice! I guess that was the cheapest / fastest way of getting support.

@gvwilson
Contributor

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. If you'd like to submit a PR, we'd be happy to prioritize a review, and if it's a request for tech support, please post in our community forum. Thank you - @gvwilson
