Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split expensive pytest files in cases level [skip ci] #4336

Merged
merged 2 commits into from
Dec 13, 2021

Conversation

pxLi
Copy link
Collaborator

@pxLi pxLi commented Dec 9, 2021

Signed-off-by: Peixin Li pxli@nyu.edu

fix #3279

  1. enable worker cleanup
  2. dynamically set parallelism for normal tests
  3. separate gpu memory-consuming and time-consuming cases
  4. parallelize 311+ cache_test

Verified in multiple scenarios, saved around ~15 mins (30x) and 30 mins (311+) comparing to current setup

Signed-off-by: Peixin Li <pxli@nyu.edu>
revans2
revans2 previously approved these changes Dec 9, 2021
# --halt "now,fail=1": exit when the first job fail, and kill running jobs.
# we can set it to "never" and print failed ones after finish running all tests if needed
# --group: print stderr after test finished for better readability
parallel --group --halt "now,fail=1" -j2 run_test ::: ${mem_consuming_cases}

time_consuming_tests_str=$(echo ${time_consuming_tests} | xargs | sed 's/ / or /g')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have a tag for these tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be good if we could document better some place about these things

Copy link
Collaborator Author

@pxLi pxLi Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have a tag for these tests?

Yes, tag would be better here. Let me add some +TODO, and I will add some tags here if we see more cases in this category

jenkins/spark-tests.sh Show resolved Hide resolved
jenkins/spark-tests.sh Show resolved Hide resolved
if [[ $PARALLEL_TEST == "true" ]] && [ -x "$(command -v parallel)" ]; then
cache_test_cases=$(./run_pyspark_from_build.sh -k "cache_test" \
--collect-only -qq 2>/dev/null | grep -oP '(?<=::).*?(?=\[)' | uniq | shuf | xargs)
# hardcode parallelism as 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment looks wrong, using -j5 below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, fixed

Copy link
Collaborator

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, please file an issue for the followup tagging.

@pxLi
Copy link
Collaborator Author

pxLi commented Dec 13, 2021

build

@pxLi
Copy link
Collaborator Author

pxLi commented Dec 13, 2021

created #4348 to follow tagging updates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improve test Only impacts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] split expensive pytest files in cases level
3 participants