
On-the-fly endianness conversion #125

Merged · 5 commits · Nov 12, 2021
Conversation

@aulemahal (Collaborator) commented Oct 13, 2021

Fixes #124.

#111 introduced an unforeseen constraint: machine architectures and data dtypes are now restricted to what numba supports (numba is used by sparse under the hood). I have not heard of problems with machine architectures, but issue #124 is caused by data coming from a different computer that uses big-endian byte order (numpy's ">f8" dtype).

This introduces on-the-fly byte-swapping to convert data to little-endian. The performance cost is unmeasured but should be quite low. I find it a bit ugly to perform that conversion in xESMF instead of in sparse directly, but this is the fastest solution I could think of.
EDIT: Instead of converting dtypes on the fly, I converted the weights from their sparse format to a scipy matrix, which doesn't use numba and thus avoids the problem. It also preserves the dtype, which may be desirable.
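For context, the byte-swapping approach mentioned above can be done with numpy alone; this is a minimal sketch assuming plain numpy arrays, not the actual xESMF code:

```python
import numpy as np

# Data written on a big-endian machine carries the ">f8" dtype;
# numba only accepts native byte order (little-endian on most machines).
big = np.arange(4, dtype=">f8")

# astype with a little-endian dtype reorders the bytes while
# keeping the numerical values intact.
little = big.astype(big.dtype.newbyteorder("<"))

assert little.dtype == np.dtype("<f8")
assert np.array_equal(big, little)  # values are unchanged
```

Comparisons and `astype` handle the byte reordering, so the conversion is value-preserving; only the in-memory representation changes.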

I used a function from numba directly; should I add it as an explicit dependency?

Also, is raising a warning even necessary?
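The scipy fallback mentioned in the edit can be illustrated roughly as follows; the weight values and shapes here are made up for illustration, and the real weights come from the ESMF weight file:

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical regridding weights in COO triplet form (row, col, value).
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
vals = np.array([0.5, 1.0, 0.25])

# A scipy CSR matrix applies the weights without going through numba,
# so numba's dtype restrictions do not apply on this path.
weights = sps.coo_matrix((vals, (rows, cols)), shape=(3, 4)).tocsr()

indata = np.ones(4)
outdata = weights @ indata  # each output cell is a weighted sum of input cells
```

With all-ones input, each output cell simply equals its single weight here; a real weight matrix would have several entries per row.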

@aulemahal aulemahal requested a review from huard October 13, 2021 18:29
@raphaeldussin (Contributor)

I wonder if we should have a fallback to the scipy functions too; it looks like the transition to sparse had unintended consequences.

@aulemahal (Collaborator, Author)

@raphaeldussin That is indeed a cleaner solution, I think.

Opened an issue: pydata/sparse#521.

@aulemahal (Collaborator, Author)

Does anyone have access to a machine with an architecture not supported by numba? I think my solution would solve potential issues there, but I don't know for sure.

@raphaeldussin (Contributor)

let's not rush to merge this until we figure out the right course of action

@huard (Contributor) commented Nov 2, 2021

@raphaeldussin I suggest we merge this, as it now includes a fall-back to scipy.

@raphaeldussin (Contributor)

@huard should we figure out the dask graph/performance issue first before we add more commits?

@@ -121,6 +122,15 @@ def apply_weights(weights, indata, shape_in, shape_out):
Extra dimensions are the same as `indata`.
If input data is C-ordered, output will also be C-ordered.
"""
# Limitation from numba : some big-endian dtypes are not supported.
try:
Contributor:

how costly is this test? Wouldn't it be cheaper to just test with:

if indata.dtype.byteorder == '>'

Collaborator Author:

I don't think it's very costly. My intuition is that the exception is raised before numba does any computation, because it doesn't support the given dtype at all. But indeed, in pure Python a single check is usually cheaper than a try/except.

However, I was trying to be as general as possible here. This change was triggered by numba not supporting a given byte order, but could there be other things numba doesn't support, especially on other machines? Evidently, I can't test those without access to such a computer, but I tried to be ready for other eventualities.

Contributor:

fair point, I'm ok with sacrificing a bit of performance for compatibility. try/except it is then
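The two strategies discussed above, an explicit byte-order check versus a catch-all try/except, can be sketched like this; `ensure_native` and `apply_with_fallback` are hypothetical helpers for illustration, not the actual xESMF code:

```python
import numpy as np

def ensure_native(indata):
    # Strategy 1: cheap explicit check. Big-endian dtypes report
    # byteorder '>'; convert those to the machine's native order.
    if indata.dtype.byteorder == ">":
        return indata.astype(indata.dtype.newbyteorder("="))
    return indata

def apply_with_fallback(func, indata):
    # Strategy 2: catch-all try/except. Attempt the fast path first
    # and retry with converted data on any failure, which also covers
    # unsupported dtypes beyond byte order.
    try:
        return func(indata)
    except Exception:
        return func(ensure_native(indata))

big = np.ones(3, dtype=">f8")
native = ensure_native(big)
```

The explicit check is cheaper per call, while the try/except is more general, which matches the trade-off settled on in this thread.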


@aulemahal merged commit 47d2753 into pangeo-data:master on Nov 12, 2021
Successfully merging this pull request may close: numba issues with 0.6.1

3 participants