[Tracking Issue] Enabling Testing in AArch64 #10673

Open
2 of 10 tasks
Mousius opened this issue Mar 18, 2022 · 3 comments
Labels: dev:test-infra, status: help wanted, type:rfc-tracking (RFC progress tracking. Ref: https://github.com/apache/tvm-rfcs)

Mousius (Member) commented Mar 18, 2022

This issue tracks progress on enabling tests on AArch64.

As part of enabling more tests in the AArch64 container, a number of tests had to be skipped; these need to be fixed.

See also: #10677 / #10564

Potential Schedule Issues

xgboost issues

```
E           xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
E           Likely causes:
E             * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
E             * You are running 32-bit Python on a 64-bit OS
E           Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']
```
  • test_model.py::test_tvmc_workflow
  • tvmc/test_autoscheduler.py::test_tune_tasks
  • tvmc/test_autoscheduler.py::test_tune_tasks__tuning_records
  • tvmc/test_autoscheduler.py::test_tune_tasks__no_early_stopping
  • tvmc/test_command_line.py::test_tvmc_cl_workflow
  • tvmc/test_model.py::test_tvmc_workflow
  • tvmc/test_frontends.py::test_load_model___wrong_language__to_pytorch

Unsure

  • test_crt_aot.py::test_output_tensor_names
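
The tests above are skipped only when running on AArch64. A minimal sketch of such an architecture gate, assuming the check relies on Python's standard platform module; is_aarch64 and aarch64_skip_reason are hypothetical helpers for illustration, not TVM's actual test utilities:

```python
import platform
from typing import Optional

# Hypothetical helpers for illustration; not part of TVM's test utilities.
TRACKING_ISSUE = "https://github.com/apache/tvm/issues/10673"

# platform.machine() reports "aarch64" on Linux and "arm64" on macOS.
_AARCH64_NAMES = {"aarch64", "arm64"}


def is_aarch64(machine: Optional[str] = None) -> bool:
    """Return True when the given (or current) machine is 64-bit Arm."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in _AARCH64_NAMES


def aarch64_skip_reason(test_id: str) -> str:
    """Build a skip reason that points back at the tracking issue."""
    return f"{test_id} currently fails on AArch64, see {TRACKING_ISSUE}"
```

With pytest, the condition and reason would typically be passed to a marker such as `pytest.mark.skipif(is_aarch64(), reason=aarch64_skip_reason("tvmc/test_model.py::test_tvmc_workflow"))`, so the skip disappears automatically on other architectures and the reason links back to this issue.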
Mousius added the status: help wanted and type:rfc-tracking labels Mar 18, 2022
Mousius added a commit to Mousius/tvm that referenced this issue Mar 18, 2022
As part of this, any failing tests have been marked for follow-up as part of apache#10673.

This depends on fixes in apache#10659, apache#10672 and apache#10674 to scope other tests correctly.
Mousius added a commit to Mousius/tvm that referenced this issue Mar 25, 2022
As part of this, any failing tests have been marked for follow-up as part of apache#10673.

This depends on fixes in apache#10659, apache#10672 and apache#10674 to scope other tests correctly.
manupak pushed a commit that referenced this issue Mar 25, 2022
As part of this, any failing tests have been marked for follow-up as part of #10673.

This depends on fixes in #10659, #10672 and #10674 to scope other tests correctly.
masahi (Member) commented Mar 31, 2022

test_topi_conv2d_int8.py::verify_conv2d_NCHWc_int8 was fixed in #10839

leandron (Contributor) commented Aug 23, 2022

When enabling PyTorch and ONNX, I spotted a few more instances of these libgomp-related issues, so I'm adding the affected tests to the list of tests skipped on AArch64 for further investigation. In the meantime, this guarantees that the other tests don't regress.

The error message looks like this:

```
xgboost.core.XGBoostError: XGBoost Library (libxgboost.so) could not be loaded.
Likely causes:
  * OpenMP runtime is not installed (vcomp140.dll or libgomp-1.dll for Windows, libomp.dylib for Mac OSX, libgomp.so for Linux and other UNIX-like OSes). Mac OSX users: Run `brew install libomp` to install OpenMP runtime.
  * You are running 32-bit Python on a 64-bit OS
Error message(s): ['/usr/local/lib/python3.7/dist-packages/xgboost/lib/../../xgboost.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block']
```

Or another version:

```
def test_guess_frontend_pytorch():
        # some CI environments wont offer pytorch, so skip in case it is not present
>       pytest.importorskip("torch")

tests/python/driver/tvmc/test_frontends.py:79: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:198: in <module>
    _load_global_deps()
/usr/local/lib/python3.7/dist-packages/torch/__init__.py:151: in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <CDLL '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so', handle 0 at 0xffff1c3e0bd0>
name = '/usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_global_deps.so'
mode = 256, handle = None, use_errno = False, use_last_error = False

    def __init__(self, name, mode=DEFAULT_MODE, handle=None,
                 use_errno=False,
                 use_last_error=False):
        self._name = name
        flags = self._func_flags_
        if use_errno:
            flags |= _FUNCFLAG_USE_ERRNO
        if use_last_error:
            flags |= _FUNCFLAG_USE_LASTERROR
        if _sys.platform.startswith("aix"):
            """When the name contains ".a(" and ends with ")",
               e.g., "libFOO.a(libFOO.so)" - this is taken to be an
               archive(member) syntax for dlopen(), and the mode is adjusted.
               Otherwise, name is presented to dlopen() as a file argument.
            """
            if name and name.endswith(")") and ".a(" in name:
                mode |= ( _os.RTLD_MEMBER | _os.RTLD_NOW )
    
        class _FuncPtr(_CFuncPtr):
            _flags_ = flags
            _restype_ = self._func_restype_
        self._FuncPtr = _FuncPtr
    
        if handle is None:
>           self._handle = _dlopen(self._name, mode)
E           OSError: /usr/local/lib/python3.7/dist-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block

/usr/lib/python3.7/ctypes/__init__.py:364: OSError
```
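
Both tracebacks end in the same dynamic-loader message, raised as XGBoostError by xgboost and as OSError by torch. A sketch of how a test harness could tell this failure apart from a package that is simply not installed, using only the standard library; classify_import and is_static_tls_error are hypothetical helpers, and matching on the error text is an assumption based on the two messages above:

```python
import importlib
import importlib.util
from typing import Optional

# Substring of the dynamic-loader error seen for both xgboost and torch above.
STATIC_TLS_MESSAGE = "cannot allocate memory in static TLS block"


def is_static_tls_error(exc: BaseException) -> bool:
    """True if the exception text contains the static TLS loader error."""
    return STATIC_TLS_MESSAGE in str(exc)


def classify_import(module_name: str) -> Optional[str]:
    """Return None if a top-level module imports cleanly, else a failure label.

    "missing"    - the package is not installed at all
    "static-tls" - the package is installed but a shared library failed to load
    "other"      - any other import-time failure
    """
    # find_spec locates the package without executing it, so a missing
    # package can be told apart from one that is present but fails to load.
    if importlib.util.find_spec(module_name) is None:
        return "missing"
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # xgboost raises XGBoostError, torch raises OSError
        return "static-tls" if is_static_tls_error(exc) else "other"
    return None
```

On the affected AArch64 container, `classify_import("torch")` would be expected to return "static-tls" rather than "missing". That distinction is what `pytest.importorskip("torch")` cannot make here: it only skips on ImportError, so the OSError above surfaces as a test failure instead of a skip.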

While investigating these, I realised that no environment with Torch installed is running the integration tests (see #12529), which is also a concern that should be fixed.

leandron (Contributor) commented

Just submitted #12554 with the new tests that need skipping, now that I'm testing the environments with Torch installed.

driazati pushed a commit that referenced this issue Aug 31, 2022
…12660)

This patch skips test_load_model___wrong_language__to_pytorch on
AArch64 due to a bug that can be reproduced when enabling TVM's
integration tests on machines with Torch installed.

The error message seen is:

```
OSError: /usr/local/lib/python3.7/dist-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block
```

The test still needs further investigation; in the meantime it is
marked as skipped so that other tests can be enabled without
regressing, allowing time for the investigation to be made.

This relates to the issue described in #10673.
areusch added the needs-triage label (PRs or issues that need to be investigated by maintainers to find the right assignees to address them) Oct 19, 2022
hpanda-naut removed the needs-triage label Oct 25, 2022