[BUG] YARN EGX IT build failing parquet_testing_test can't find file #10586

Closed
tgravescs opened this issue Mar 13, 2024 · 3 comments · Fixed by #10599

@tgravescs
Collaborator

Describe the bug
parquet_testing_test.py fails during pytest collection on the YARN EGX cluster (job rapids_it-EGX-Yarn, build 845):

13:03:20  ___________ ERROR collecting src/main/python/parquet_testing_test.py ___________
13:03:20  integration_tests/src/main/python/parquet_testing_test.py:162: in <module>
13:03:20      @pytest.mark.parametrize("path", gen_testing_params_for_valid_files())
13:03:20  integration_tests/src/main/python/parquet_testing_test.py:151: in gen_testing_params_for_valid_files
13:03:20      for f in locate_parquet_testing_files():
13:03:20  integration_tests/src/main/python/parquet_testing_test.py:129: in locate_parquet_testing_files
13:03:20      files += glob(p, pattern)
13:03:20  integration_tests/src/main/python/parquet_testing_test.py:88: in hdfs_glob
13:03:20      process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
13:03:20  /usr/lib/python3.6/subprocess.py:729: in __init__
13:03:20      restore_signals, start_new_session)
13:03:20  /usr/lib/python3.6/subprocess.py:1364: in _execute_child
13:03:20      raise child_exception_type(errno_num, err_msg, err_filename)
13:03:20  E   FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop'
13:03:20  - generated xml file: /....../TEST-pytest-1710352823536861083.xml -
13:03:20  =========================== short test summary info ============================
13:03:20  ERROR integration_tests/src/main/python/parquet_testing_test.py - FileNotFoun...
13:03:20  !!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
13:03:20  =============== 26677 deselected, 92 warnings, 1 error in 12.22s ===============
13:03:20  ___________ ERROR collecting src/main/python/parquet_testing_test.py ___________
13:03:20  ERROR integration_tests/src/main/python/parquet_testing_test.py - FileNotFoun...
tgravescs added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Mar 13, 2024
@yinqingh
Collaborator

yinqingh commented Mar 14, 2024

We don't actually run parquet_testing_test in this job, but hdfs_glob is still called during pytest collection. The root cause is that the integration tests run in a YARN Docker container, while hadoop is only available on the host. It seems we could mount hadoop into the container to fix it.
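For illustration, here is why deselecting the test does not avoid the error: the argument to `@pytest.mark.parametrize` is evaluated when the module is imported, and `subprocess.Popen` raises `FileNotFoundError` as soon as the executable cannot be found on `PATH`. A minimal sketch, assuming a made-up command name (`definitely-not-installed-cmd`) standing in for `hadoop`; the function name is hypothetical and not part of the test suite:

```python
import subprocess


def errno_for_missing_command(cmd_name="definitely-not-installed-cmd"):
    """Launch a command assumed NOT to be on PATH and return the errno raised.

    cmd_name is a hypothetical placeholder for 'hadoop' inside the container.
    Returns None if the command unexpectedly exists and starts.
    """
    try:
        # Same Popen arguments as the call shown in the traceback.
        subprocess.Popen([cmd_name], stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
    except FileNotFoundError as e:
        # Errno 2 (ENOENT), matching "[Errno 2] No such file or directory".
        return e.errno
    return None
```

Because the failure happens before any process starts, it surfaces at import time of the test module rather than inside a test body.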

@tgravescs
Collaborator Author

What do you mean by "hadoop"? Is this a directory in HDFS, something on the local file system, or a user?

@yinqingh
Collaborator

Sorry for the confusion. hdfs_glob relies on the hadoop command to find files in HDFS. The hadoop command is not installed in the container, which causes this error. I think we could probably mount the hadoop binary, along with the other necessary Hadoop-related confs and files, from the host to fix this error.
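As an alternative to mounting hadoop, the glob helper could check for the command before calling it, so that collection does not abort in environments without Hadoop. This is only a sketch of that different technique, not the change merged in the linked fix; `guarded_hdfs_glob` and its parameters are hypothetical names, and the `-C` (paths only) flag of `hadoop fs -ls` is assumed to be supported by the installed Hadoop version:

```python
import shutil
import subprocess


def guarded_hdfs_glob(path, pattern, cmd="hadoop"):
    """Hypothetical variant of an HDFS glob that tolerates a missing CLI.

    Returns an empty list when `cmd` is not on PATH or the listing fails,
    instead of raising FileNotFoundError during pytest collection.
    """
    if shutil.which(cmd) is None:
        # The command is not installed here (e.g. inside the container).
        return []
    # Assumption: `-ls -C` prints one matching path per line.
    result = subprocess.run([cmd, "fs", "-ls", "-C", f"{path}/{pattern}"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line]
```

The trade-off is that returning an empty list silently produces zero parametrized cases, so a job that is supposed to run these tests would need a separate check that hadoop is present.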
