Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Ragged tests #1022

Merged
merged 3 commits into from
Sep 29, 2021
Merged

Fix Ragged tests #1022

merged 3 commits into from
Sep 29, 2021

Conversation

ianthomas23
Copy link
Member

There are consistently 14 test failures in CI in the various TestRagged tests in test_datatypes.py:

=========================== short test summary info ============================
FAILED datashader/tests/test_datatypes.py::TestRaggedGetitem::test_getitem_mask_raises
FAILED datashader/tests/test_datatypes.py::TestRaggedGetitem::test_getitem_ellipsis_and_slice
FAILED datashader/tests/test_datatypes.py::TestRaggedGetitem::test_take_empty
FAILED datashader/tests/test_datatypes.py::TestRaggedGroupby::test_groupby_extension_apply[scalar]
FAILED datashader/tests/test_datatypes.py::TestRaggedGroupby::test_groupby_extension_apply[list]
FAILED datashader/tests/test_datatypes.py::TestRaggedGroupby::test_groupby_extension_apply[object]
FAILED datashader/tests/test_datatypes.py::TestRaggedGroupby::test_groupby_extension_agg[False]
FAILED datashader/tests/test_datatypes.py::TestRaggedGroupby::test_groupby_extension_transform
FAILED datashader/tests/test_datatypes.py::TestRaggedMethods::test_sort_values_frame[True]
FAILED datashader/tests/test_datatypes.py::TestRaggedMethods::test_sort_values_frame[False]
FAILED datashader/tests/test_datatypes.py::TestRaggedMissing::test_fillna_limit_pad
FAILED datashader/tests/test_datatypes.py::TestRaggedMissing::test_fillna_limit_backfill
FAILED datashader/tests/test_datatypes.py::TestRaggedMissing::test_fillna_series_method[ffill]
FAILED datashader/tests/test_datatypes.py::TestRaggedMissing::test_fillna_series_method[bfill]
===== 14 failed, 763 passed, 39 skipped, 101 warnings in 372.30s (0:06:12) =====

The failures are due to a tightening-up of numpy and pandas APIs and increased extension testing in pandas.

This PR removes the test failures; there are 4 types of fixes/changes:

  1. Change wording of exceptions.
  2. Replacing use of deprecated pandas.util.testing with equivalent numpy assertion.
  3. Skipping tests for functionality that is known to not be implemented for RaggedArray, such as construction using nested sequences and indexing using ellipsis.
  4. Skipping other tests that I am not 100% sure of but seems consistent with existing skipped tests.

As expert in RaggedArray and pandas extension types could no doubt do a better job with item 4, but in the meantime I would quite like the CI to pass so that I can experiment with adding new functionality.

@ianthomas23
Copy link
Member Author

I think I've dealt with the final test failure. This PR needs approval again for the CI to run...

@philippjfr
Copy link
Member

@ianthomas23 It seems like we are endlessly chasing pandas expanding test suite for ExtensionArrays. Do you have any thoughts on how we can isolate ourselves from this? It seems kind of silly to me for Pandas to keep adding to the base extension array tests knowing that a bunch of tests they are adding won't be applicable to various subclasses of ExtensionArray other people might implement.

@ianthomas23
Copy link
Member Author

@philippjfr We could pin the version of pandas in setup.py (and maybe other places?) to an upper version that we know the CI passes with. Then contributors PRs won't fail because of pandas changes. But it would mean whenever pandas do a new release someone would have to check the tests pass locally with that new version before changing the upper version allowed in datashader. It looks like pandas release about once a month.

There has been a discussion in numpy/scipy for such upper version pinning in the last year.

@ianthomas23
Copy link
Member Author

Remaining test failures are

TypeError: 'coroutine' object is not subscriptable.

This seems to be a known problem with some combinations of jupyter_client and nbconvert (jupyter/jupyter_client#637). It looks like you have seen this recently on colorcet too: holoviz/colorcet#66.

The solution seems to be to pin the versions of one or the other. There might need to be some experimentation here to find a working combination.

How do you want to proceed? Would you be happy merging this PR and then I can try to deal with it in another PR? This will be easier than adding it to this one as I will no longer be a first-time contributor so I should be able to use github actions without any assistance.

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me; I'll merge and other fixes can be made separately. Thanks, @ianthomas23, and sorry for the delay. Personally, I'd like to get all extension array code out of datashader and into e.g. SpatialPandas, but there are still a few important bits implemented only in Datashader. :-(

datashader/datatypes.py Outdated Show resolved Hide resolved
@jbednar jbednar merged commit aea4959 into holoviz:master Sep 29, 2021
@philippjfr
Copy link
Member

Personally, I'd like to get all extension array code out of datashader and into e.g. SpatialPandas, but there are still a few important bits implemented only in Datashader. :-(

Same issues apply there, I don't think pinning pandas in setup.py is a great option, although it might be a good idea to do that in CI, because 99% of the time there's zero issue with the new pandas. It's just various new tests which don't apply to the ragged array and geometry arrays have to be skipped.

@ianthomas23
Copy link
Member Author

@philippjfr I don't have any other cunning ideas for avoiding problems with future pandas releases. Maybe we just leave it as it is but try to check CI failures more frequently.

Here at makepath we intend to use datashader a lot, so maybe we can help with future maintenance workload. I have a bunch of PR ideas lined up so that it is a bit more polished for our use, so I should start to get familiar with the code base.

@ianthomas23 ianthomas23 mentioned this pull request Oct 18, 2021
@ianthomas23 ianthomas23 deleted the fix_ragged_tests branch November 17, 2022 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants