BUG: Plotly functions don't accept col names as x, y parameters in Modin #5445

Closed
3 tasks done
labanyamukhopadhyay opened this issue Dec 15, 2022 · 4 comments
Labels
bug 🦗 Something isn't working Integration ➕➕ Issues with integrating Modin into other libraries P2 Minor bugs or low-priority feature requests

Comments

@labanyamukhopadhyay
Contributor

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as pd
import plotly.express as px

df = pd.DataFrame(dict(a=[1,3,2,4], b=[3,2,1,0]))
px.area(df, x='a', y='b')

Issue Description

Passing column names through the x and y parameters of any Plotly function does not work with a Modin DataFrame. Only positional column indexes are accepted, e.g. px.area(df, x=0, y=1).

Expected Behavior

[Screenshot: Screen Shot 2022-12-15 at 12 26 01 PM (image not preserved)]

Error Logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[22], line 4
      2 df = pd.DataFrame(dict(a=[1,3,2,4], b=[3,2,1,0]))
      3 #px.area(df) 
----> 4 px.area(df, x='a',y='b')

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_chart_types.py:315, in area(data_frame, x, y, line_group, color, pattern_shape, symbol, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, pattern_shape_sequence, pattern_shape_map, symbol_sequence, symbol_map, markers, orientation, groupnorm, log_x, log_y, range_x, range_y, line_shape, title, template, width, height)
    270 def area(
    271     data_frame=None,
    272     x=None,
   (...)
    308     height=None,
    309 ) -> go.Figure:
    310     """
    311     In a stacked area plot, each row of `data_frame` is represented as
    312     vertex of a polyline mark in 2D space. The area between successive
    313     polylines is filled.
    314     """
--> 315     return make_figure(
    316         args=locals(),
    317         constructor=go.Scatter,
    318         trace_patch=dict(stackgroup=1, mode="lines", groupnorm=groupnorm),
    319     )

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1990, in make_figure(args, constructor, trace_patch, layout_patch)
   1987 layout_patch = layout_patch or {}
   1988 apply_default_cascade(args)
-> 1990 args = build_dataframe(args, constructor)
   1991 if constructor in [go.Treemap, go.Sunburst, go.Icicle] and args["path"] is not None:
   1992     args = process_dataframe_hierarchy(args)

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1405, in build_dataframe(args, constructor)
   1402     args["color"] = None
   1403 # now that things have been prepped, we do the systematic rewriting of `args`
-> 1405 df_output, wide_id_vars = process_args_into_dataframe(
   1406     args, wide_mode, var_name, value_name
   1407 )
   1409 # now that `df_output` exists and `args` contains only references, we complete
   1410 # the special-case and wide-mode handling by further rewriting args and/or mutating
   1411 # df_output
   1413 count_name = _escape_col_name(df_output, "count", [var_name, value_name])

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1207, in process_args_into_dataframe(args, wide_mode, var_name, value_name)
   1205         if argument == "index":
   1206             err_msg += "\n To use the index, pass it in directly as `df.index`."
-> 1207         raise ValueError(err_msg)
   1208 elif length and len(df_input[argument]) != length:
   1209     raise ValueError(
   1210         "All arguments should have the same length. "
   1211         "The length of column argument `df[%s]` is %d, whereas the "
   (...)
   1218         )
   1219     )

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0, 1] but received: a

Installed Versions

INSTALLED VERSIONS

commit : 4114183
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.18.0+3.g4114183f
ray : 2.0.1
dask : 2022.11.1
distributed : 2022.11.1
hdk : None

pandas dependencies

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 57.4.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@labanyamukhopadhyay labanyamukhopadhyay added bug 🦗 Something isn't working Triage 🩹 Issues that need triage labels Dec 15, 2022
@anmyachev anmyachev added Integration ➕➕ Issues with integrating Modin into other libraries and removed Triage 🩹 Issues that need triage labels Dec 15, 2022
@gpaolopedrazza

I am also stuck on this. I think the problem starts here:
https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1306
because I am passing an instance of modin.pandas.DataFrame, not of pandas.DataFrame.
The fresh pd.DataFrame instance created there has columns of type pandas.RangeIndex, whereas the modin.pandas.DataFrame instance it replaces had columns of type pandas.Index. According to the pandas documentation [1], the index will default to RangeIndex if no indexing information is part of the input data and no index is provided. That could be exactly what Modin is missing, but this is just a guess of mine.
[1] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
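
The guess above can be illustrated with pandas alone. This is a minimal sketch, not the actual Plotly or Modin code path: it assumes that the Modin frame ends up being read as a bare array when a fresh pandas.DataFrame is built from it. The names `named` and `rebuilt` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# A pandas frame with named columns, standing in for the data in the reproducer.
named = pd.DataFrame(dict(a=[1, 3, 2, 4], b=[3, 2, 1, 0]))

# Assumption: the non-pandas input is consumed as a plain 2-D array,
# so the column labels are lost and pandas assigns default ones.
rebuilt = pd.DataFrame(np.asarray(named))

print(type(named.columns).__name__)    # Index
print(type(rebuilt.columns).__name__)  # RangeIndex
print(list(rebuilt.columns))           # [0, 1]
print("a" in rebuilt.columns)          # False
```

If that is what happens, only the positional labels 0 and 1 exist after the rebuild. That would match the "Expected one of [0, 1]" part of the error message and would explain why px.area(df, x=0, y=1) works.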

@mvashishtha mvashishtha added the P2 Minor bugs or low-priority feature requests label Feb 6, 2023
@modin-bot

This issue has been mentioned on Modin Discuss. There might be relevant details there:

https://discuss.modin.org/t/pandas-parallel-package-modin-does-not-work-with-plotly/344/2

@anmyachev
Collaborator

@gpaolopedrazza FYI, this reproducer works with the changes from plotly/plotly.py#4244.

@labanyamukhopadhyay thanks!

@Garra1980
Collaborator

Thanks everybody!
