[Tracking Issue] Enabling Testing in AArch64 #10673

Mousius · 2022-03-18T14:10:27Z

This issue tracks progress on enabling tests on AArch64.

As part of enabling more tests in the AArch64 container, a number of tests had to be skipped and need to be fixed.

Potential Schedule Issues

test_topi_conv2d_int8.py::verify_conv2d_NCHWc_int8
test_op_level5.py::test_crop_and_resize

xgboost issues

E           xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
E           Likely causes:
E             * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
E             * You are running 32-bit Python on a 64-bit OS
E           Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']

test_model.py::test_tvmc_workflow
tvmc/test_autoscheduler.py::test_tune_tasks
tvmc/test_autoscheduler.py::test_tune_tasks__tuning_records
tvmc/test_autoscheduler.py::test_tune_tasks__no_early_stopping
tvmc/test_command_line.py::test_tvmc_cl_workflow
tvmc/test_model.py::test_tvmc_workflow
tvmc/test_frontends.py::test_load_model___wrong_language__to_pytorch

Unsure

test_crt_aot.py::test_output_tensor_names


As part of this any failing tests have been marked for follow up as part of apache#10673. This depends on fixes in apache#10659, apache#10672 and apache#10674 to scope other tests correctly.


masahi · 2022-03-31T04:07:28Z

test_topi_conv2d_int8.py::verify_conv2d_NCHWc_int8 was fixed in #10839


leandron · 2022-08-23T08:03:49Z

When enabling PyTorch and ONNX, I spotted a few more instances of these libgomp-related issues, so I'm adding new tests to the list of skipped tests on AArch64 for further investigation. In the meantime, this guarantees that the other tests don't regress.

The error message looks like this:

xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
Likely causes:
  * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
  * You are running 32-bit Python on a 64-bit OS
Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']

Or another version:

def test_guess_frontend_pytorch():
        # some CI environments wont offer pytorch, so skip in case it is not present
>       pytest.importorskip("torch")

tests/python/driver/tvmc/test_frontends.py:79: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:198: in <module>
    _load_global_deps()
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:151: in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <CDLL '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so', handle 0 at 0xffff1c3e0bd0>
name = '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so'
mode = 256, handle = None, use_errno = False, use_last_error = False

    def __init__(self, name, mode=DEFAULT_MODE, handle=None,
                 use_errno=False,
                 use_last_error=False):
        self._name = name
        flags = self._func_flags_
        if use_errno:
            flags |= _FUNCFLAG_USE_ERRNO
        if use_last_error:
            flags |= _FUNCFLAG_USE_LASTERROR
        if _sys.platform.startswith("aix"):
            """When the name contains ".a(" and ends with ")",
               e.g., "libFOO.a(libFOO.so)" - this is taken to be an
               archive(member) syntax for dlopen(), and the mode is adjusted.
               Otherwise, name is presented to dlopen() as a file argument.
            """
            if name and name.endswith(")") and ".a(" in name:
                mode |= ( _os.RTLD_MEMBER | _os.RTLD_NOW )
    
        class _FuncPtr(_CFuncPtr):
            _flags_ = flags
            _restype_ = self._func_restype_
        self._FuncPtr = _FuncPtr
    
        if handle is None:
>           self._handle = _dlopen(self._name, mode)
E           OSError: /usr/local/lib/python3.7/dist-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block

/usr/lib/python3.7/ctypes/__init__.py:364: OSError
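
Both failures above come from the dynamic loader rejecting a bundled libgomp when it is opened through `ctypes.CDLL`. A minimal sketch of a probe that reproduces the load attempt outside of pytest and checks for the same message is shown below. The helper names `try_load` and `is_static_tls_error` are hypothetical and are not part of TVM, torch, or xgboost; the library paths to probe are whatever the traceback reports on the affected machine.

```python
import ctypes
import sys

# Substring taken from the error messages quoted in this issue.
STATIC_TLS_MARKER = "cannot allocate memory in static TLS block"


def is_static_tls_error(message: str) -> bool:
    """Return True if a loader error message matches the failure seen here."""
    return STATIC_TLS_MARKER in message


def try_load(lib_path: str, mode: int = ctypes.RTLD_GLOBAL):
    """Try to load lib_path the same way the traceback does.

    Returns (loaded, error_message); error_message is empty on success.
    """
    try:
        ctypes.CDLL(lib_path, mode=mode)
    except OSError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    # Usage: python probe.py /path/to/libA.so /path/to/libB.so
    for path in sys.argv[1:]:
        loaded, message = try_load(path)
        if loaded:
            print(f"{path}: loaded")
        elif is_static_tls_error(message):
            print(f"{path}: static TLS failure: {message}")
        else:
            print(f"{path}: other load failure: {message}")
```

Running the probe in a fresh interpreter with the paths from the tracebacks (for example the `libtorch_global_deps.so` path reported above) may help separate a load-order problem inside the test session from a library that fails to load on its own.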

While investigating these, I realised that no environment with torch installed is running the integration tests (see #12529), which is also a concern and should be fixed.

leandron · 2022-08-23T08:16:30Z

Just submitted #12554 with the new tests that need skipping, now that I'm testing the environments with Torch installed.
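
For readers unfamiliar with how such skips are usually expressed, here is a minimal sketch of an architecture-conditional skip. It uses only the standard library (`unittest.skipIf`) so that it runs anywhere; in a pytest suite the equivalent marker is `pytest.mark.skipif(condition, reason=...)`. The names `is_aarch64`, `skip_on_aarch64`, and the placeholder test are assumptions made for illustration and are not taken from the patches referenced in this thread.

```python
import platform
import unittest


def is_aarch64(machine=None):
    """Return True when the given (or current) machine string denotes AArch64.

    Linux reports "aarch64"; macOS on Apple silicon reports "arm64".
    """
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ("aarch64", "arm64")


# Reusable decorator: the reason string points back to this tracking issue
# so that skipped tests can be found and re-enabled later.
skip_on_aarch64 = unittest.skipIf(
    is_aarch64(),
    "Currently failing on AArch64, see #10673",
)


class TestPlaceholder(unittest.TestCase):
    @skip_on_aarch64
    def test_needs_torch(self):
        # Placeholder body; a real test would import torch here.
        self.assertTrue(True)
```

Keeping the condition in one helper means the tests listed in this issue can be re-enabled by deleting a single decorator per test once the libgomp problem is resolved.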

…12660) This patch skips test_load_model___wrong_language__to_pytorch on AArch64 due to a bug that can be reproduced when enabling Integration Tests on machines with Torch installed in TVM.

The error message seen is:

```
OSError: /usr/local/lib/python3.7/dist-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block
```

While the test needs further investigation, it is being skipped so that other tests can be enabled without regressing, which allows time for the investigation. This relates to the issue described in #10673.


Mousius added the status: help wanted and type:rfc-tracking labels Mar 18, 2022

Mousius mentioned this issue Mar 18, 2022

[CI] Enable integration tests on AArch64 #10677

Merged

tkonolige mentioned this issue Mar 21, 2022

[ARM,TOPI] Allow auto scheduler layout rewritting in dense #10699

Merged

manupak pushed a commit that referenced this issue Mar 25, 2022

[CI] Enable integration tests on AArch64 (#10677)

67da111

As part of this any failing tests have been marked for follow up as part of #10673. This depends on fixes in #10659, #10672 and #10674 to scope other tests correctly.

leandron mentioned this issue Aug 31, 2022

[Torch][AArch64] Skip test_load_model___wrong_language__to_pytorch #12660

Merged

leandron mentioned this issue Sep 8, 2022

[CI][AArch64] Mark tests to be skipped due to torch crash #12730

Merged

areusch added the needs-triage label Oct 19, 2022

hpanda-naut removed the needs-triage label Oct 25, 2022

hpanda-naut added the dev:test-infra label Dec 1, 2022
