BUG: Plotly functions don't accept col names as x, y parameters in Modin #5445

labanyamukhopadhyay · 2022-12-15T20:32:55Z

Modin version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as pd
import plotly.express as px

df = pd.DataFrame(dict(a=[1,3,2,4], b=[3,2,1,0]))
px.area(df, x='a', y='b')

Issue Description

Passing column names through the x and y parameters of any Plotly function does not work with a Modin DataFrame. Only positional column indexes are accepted, e.g. px.area(df, x=0, y=1).

Expected Behavior

Error Logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[22], line 4
      2 df = pd.DataFrame(dict(a=[1,3,2,4], b=[3,2,1,0]))
      3 #px.area(df) 
----> 4 px.area(df, x='a',y='b')

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_chart_types.py:315, in area(data_frame, x, y, line_group, color, pattern_shape, symbol, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, pattern_shape_sequence, pattern_shape_map, symbol_sequence, symbol_map, markers, orientation, groupnorm, log_x, log_y, range_x, range_y, line_shape, title, template, width, height)
    270 def area(
    271     data_frame=None,
    272     x=None,
   (...)
    308     height=None,
    309 ) -> go.Figure:
    310     """
    311     In a stacked area plot, each row of `data_frame` is represented as
    312     vertex of a polyline mark in 2D space. The area between successive
    313     polylines is filled.
    314     """
--> 315     return make_figure(
    316         args=locals(),
    317         constructor=go.Scatter,
    318         trace_patch=dict(stackgroup=1, mode="lines", groupnorm=groupnorm),
    319     )

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1990, in make_figure(args, constructor, trace_patch, layout_patch)
   1987 layout_patch = layout_patch or {}
   1988 apply_default_cascade(args)
-> 1990 args = build_dataframe(args, constructor)
   1991 if constructor in [go.Treemap, go.Sunburst, go.Icicle] and args["path"] is not None:
   1992     args = process_dataframe_hierarchy(args)

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1405, in build_dataframe(args, constructor)
   1402     args["color"] = None
   1403 # now that things have been prepped, we do the systematic rewriting of `args`
-> 1405 df_output, wide_id_vars = process_args_into_dataframe(
   1406     args, wide_mode, var_name, value_name
   1407 )
   1409 # now that `df_output` exists and `args` contains only references, we complete
   1410 # the special-case and wide-mode handling by further rewriting args and/or mutating
   1411 # df_output
   1413 count_name = _escape_col_name(df_output, "count", [var_name, value_name])

File ~/Desktop/ponder-integ/lib/python3.9/site-packages/plotly/express/_core.py:1207, in process_args_into_dataframe(args, wide_mode, var_name, value_name)
   1205         if argument == "index":
   1206             err_msg += "\n To use the index, pass it in directly as `df.index`."
-> 1207         raise ValueError(err_msg)
   1208 elif length and len(df_input[argument]) != length:
   1209     raise ValueError(
   1210         "All arguments should have the same length. "
   1211         "The length of column argument `df[%s]` is %d, whereas the "
   (...)
   1218         )
   1219     )

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0, 1] but received: a

Installed Versions

INSTALLED VERSIONS

commit : 4114183
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.18.0+3.g4114183f
ray : 2.0.1
dask : 2022.11.1
distributed : 2022.11.1
hdk : None

pandas dependencies

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 57.4.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None


gpaolopedrazza · 2023-02-04T17:29:38Z

I am also stuck on this. I think it starts here:
https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1306
because I am passing an instance of modin.pandas.DataFrame rather than pd.DataFrame.
The fresh pd.DataFrame instance created there has pandas.RangeIndex as the type of its columns, while the modin.pandas.DataFrame instance it replaced had pandas.Index instead. According to the pandas documentation [1], the index will default to RangeIndex if no indexing information is part of the input data and no index is provided. So that could be exactly what Modin is missing, but that is just a guess of mine.
[1] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
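
To make the guess above concrete, here is a toy model in plain Python. It uses no Plotly, pandas, or Modin code; the names rebuild_without_labels and check_column are hypothetical, and the dict-based "table" is only a stand-in for a DataFrame. It sketches how rebuilding a table from bare values would replace the labels "a" and "b" with positions 0 and 1, after which a lookup by name fails with the same message as in the error log.

```python
def rebuild_without_labels(rows):
    """Rebuild a table from bare rows; columns fall back to positions 0..n-1."""
    width = len(rows[0]) if rows else 0
    return {"columns": list(range(width)), "rows": [list(r) for r in rows]}


def check_column(table, arg_name, value):
    """Return the position of `value` among the columns, or raise ValueError."""
    if value not in table["columns"]:
        raise ValueError(
            "Value of '%s' is not the name of a column in 'data_frame'. "
            "Expected one of %s but received: %s"
            % (arg_name, table["columns"], value)
        )
    return table["columns"].index(value)


# Stand-in for the labelled frame from the reproducer.
original = {"columns": ["a", "b"], "rows": [[1, 3], [3, 2], [2, 1], [4, 0]]}

# Stand-in for the frame after it was rebuilt and lost its column labels.
rebuilt = rebuild_without_labels(original["rows"])
```

With these definitions, check_column(original, "x", "a") succeeds, check_column(rebuilt, "x", 0) succeeds, and check_column(rebuilt, "x", "a") raises the ValueError, which matches the behaviour reported in this issue.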

modin-bot · 2023-03-28T15:46:57Z

This issue has been mentioned on Modin Discuss. There might be relevant details there:

https://discuss.modin.org/t/pandas-parallel-package-modin-does-not-work-with-plotly/344/2

anmyachev · 2023-07-01T18:58:53Z

@gpaolopedrazza FYI, this reproducer works with the changes from plotly/plotly.py#4244.

@labanyamukhopadhyay thanks!
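
For anyone who cannot yet use a Plotly build that includes those changes, one possible workaround is to hand Plotly a plain pandas DataFrame instead. The sketch below is only an illustration under stated assumptions and is not part of plotly/plotly.py#4244: to_native_pandas is a hypothetical helper, and it relies on Modin's private _to_pandas() method, which may change between Modin versions.

```python
def to_native_pandas(df):
    """Return a plain pandas object if `df` exposes Modin's `_to_pandas()`.

    Anything without that method (for example an object that is already a
    pandas DataFrame) is returned unchanged.
    """
    convert = getattr(df, "_to_pandas", None)  # private Modin API (assumption)
    return convert() if callable(convert) else df


# Intended usage with the reproducer from this issue (not executed here):
#
#   import modin.pandas as pd
#   import plotly.express as px
#
#   df = pd.DataFrame(dict(a=[1, 3, 2, 4], b=[3, 2, 1, 0]))
#   px.area(to_native_pandas(df), x="a", y="b")
```

Note that the conversion materializes the whole frame in local memory, so it gives up Modin's distributed execution for the plotting step.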

Garra1980 · 2023-07-01T19:29:02Z

Thanks everybody!

labanyamukhopadhyay added bug 🦗 Something isn't working Triage 🩹 Issues that need triage labels Dec 15, 2022

anmyachev added Integration ➕➕ Issues with integrating Modin into other libraries and removed Triage 🩹 Issues that need triage labels Dec 15, 2022

mvashishtha added the P2 Minor bugs or low-priority feature requests label Feb 6, 2023

anmyachev closed this as completed Jul 1, 2023
