
On-the-fly endianness conversion #125

Merged · 5 commits · Nov 12, 2021
Conversation

@aulemahal (Collaborator) commented Oct 13, 2021

Fixes #124.

#111 introduced an unforeseen constraint: machine architectures and data dtypes are now restricted to what numba supports (numba is used by sparse under the hood). I have not heard of problems with machine architectures, but issue #124 is caused by data coming from a different computer that uses big-endian byte order (numpy's ">f8" dtype).

This introduces on-the-fly byte-swapping to convert data to little-endian. The performance cost is unmeasured but should be quite low. I find it a bit ugly to perform that conversion in xESMF instead of in sparse directly, but this is the fastest solution I could think of.
EDIT: Instead of converting dtypes on the fly, I converted the weights from their sparse format to a scipy matrix, which doesn't use numba and thus avoids the problem. It also preserves the dtype, which may be desirable.
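For context, the byte-swapping approach mentioned above can be done with numpy alone; this is a minimal sketch assuming plain numpy arrays, not the actual xESMF code:

```python
import numpy as np

# Data written on a big-endian machine carries the ">f8" dtype;
# numba only accepts native byte order (little-endian on most machines).
big = np.arange(4, dtype=">f8")

# astype with a little-endian dtype reorders the bytes while
# keeping the numerical values intact.
little = big.astype(big.dtype.newbyteorder("<"))

assert little.dtype == np.dtype("<f8")
assert np.array_equal(big, little)  # values are unchanged
```

Comparisons and `astype` handle the byte reordering, so the conversion is value-preserving; only the in-memory representation changes.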

I used a function from numba directly; should I add it as an explicit dependency?

Also, is raising a warning even necessary?
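The scipy fallback mentioned in the edit can be illustrated roughly as follows; the weight values and shapes here are made up for illustration, and the real weights come from the ESMF weight file:

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical regridding weights in COO triplet form (row, col, value).
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
vals = np.array([0.5, 1.0, 0.25])

# A scipy CSR matrix applies the weights without going through numba,
# so numba's dtype restrictions do not apply on this path.
weights = sps.coo_matrix((vals, (rows, cols)), shape=(3, 4)).tocsr()

indata = np.ones(4)
outdata = weights @ indata  # each output cell is a weighted sum of input cells
```

With all-ones input, each output cell simply equals its single weight here; a real weight matrix would have several entries per row.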

@aulemahal aulemahal requested a review from huard October 13, 2021 18:29
@raphaeldussin (Contributor)

I wonder if we should have a fallback to the scipy functions too; it looks like the transition to sparse had unintended consequences.

@aulemahal (Collaborator, Author)

@raphaeldussin That is indeed a cleaner solution, I think.

Opened an issue: pydata/sparse#521.

@aulemahal (Collaborator, Author)

Does anyone have access to a machine with an architecture not supported by numba? I think my solution would solve potential issues there, but I don't know for sure.

@raphaeldussin (Contributor)

let's not rush to merge this until we figure out the right course of action

@huard (Contributor) commented Nov 2, 2021

@raphaeldussin I suggest we merge this, as it now includes a fall-back to scipy.

@raphaeldussin (Contributor)

@huard should we figure out the dask graph/performance issue first before we add more commits?

@@ -121,6 +122,15 @@ def apply_weights(weights, indata, shape_in, shape_out):
Extra dimensions are the same as `indata`.
If input data is C-ordered, output will also be C-ordered.
"""
# Limitation from numba : some big-endian dtypes are not supported.
try:
Contributor:

how costly is this test? Wouldn't it be cheaper to just test with:

if indata.dtype.byteorder == '>'

Collaborator Author:

I don't think it's very costly. My intuition is that the exception is raised before numba does any computation, because it doesn't support the given dtype at all. But indeed, in pure Python a single check is usually cheaper than a try/except.

However, I was trying to be as general as possible here. This change was triggered by numba not supporting a given byte order, but could there be other things numba doesn't support, especially on other machines? Evidently, I can't test those without access to such a computer, but I tried to be ready for other eventualities.

Contributor:

fair point, I'm ok with sacrificing a bit of performance for compatibility. try/except it is then
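The two strategies discussed above, an explicit byte-order check versus a catch-all try/except, can be sketched like this; `ensure_native` and `apply_with_fallback` are hypothetical helpers for illustration, not the actual xESMF code:

```python
import numpy as np

def ensure_native(indata):
    # Strategy 1: cheap explicit check. Big-endian dtypes report
    # byteorder '>'; convert those to the machine's native order.
    if indata.dtype.byteorder == ">":
        return indata.astype(indata.dtype.newbyteorder("="))
    return indata

def apply_with_fallback(func, indata):
    # Strategy 2: catch-all try/except. Attempt the fast path first
    # and retry with converted data on any failure, which also covers
    # unsupported dtypes beyond byte order.
    try:
        return func(indata)
    except Exception:
        return func(ensure_native(indata))

big = np.ones(3, dtype=">f8")
native = ensure_native(big)
```

The explicit check is cheaper per call, while the try/except is more general, which matches the trade-off settled on in this thread.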


@aulemahal merged commit 47d2753 into pangeo-data:master on Nov 12, 2021
Successfully merging this pull request may close: numba issues with 0.6.1

3 participants