[SPARK-44548][PYTHON] Add support for pandas-on-Spark DataFrame assertDataFrameEqual #42158
@itholic I think you should review this.
The overall structure looks fine to me, but I left some comments on refactoring the error classes and error messages.
Let's not forget to create the related tickets and resolve these in follow-ups.
Looks pretty good. cc @HyukjinKwon to confirm, as CI has passed.
Merged to master and branch-3.5.
…tDataFrameEqual

### What changes were proposed in this pull request?
This PR adds support for pandas-on-Spark DataFrame for the testing util, `assertDataFrameEqual`

### Why are the changes needed?
The change allows users to call the same PySpark API for both Spark and pandas DataFrames.

### Does this PR introduce _any_ user-facing change?
Yes, the PR affects the user-facing util `assertDataFrameEqual`

### How was this patch tested?
Added tests to `python/pyspark/sql/tests/test_utils.py` and `python/pyspark/sql/tests/connect/test_utils.py` and existing pandas util tests.

Closes #42158 from asl3/pandas-or-pyspark-df.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 7c1ad5b)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR adds support for pandas-on-Spark DataFrame for the testing util, `assertDataFrameEqual`.

### Why are the changes needed?
The change allows users to call the same PySpark API, `assertDataFrameEqual`, for both Spark and pandas-on-Spark DataFrames. It also exposes a new user-facing API, `assertPandasOnSparkEqual`.

### Does this PR introduce _any_ user-facing change?
Yes, the PR affects the user-facing util `assertDataFrameEqual` and exposes a new user-facing API, `assertPandasOnSparkEqual`.

### How was this patch tested?
Added tests to `python/pyspark/sql/tests/test_utils.py` and `python/pyspark/sql/tests/connect/test_utils.py`, and ran the existing pandas util tests.