TestNG execution hangs with version 7.9 or higher, with large amount of test cases #3152

Open
1 of 7 tasks
bapszy opened this issue Jul 23, 2024 · 16 comments

Comments

@bapszy

bapszy commented Jul 23, 2024

TestNG Version

7.9 and all newer versions

Note: only the latest version is supported

Expected behavior

Tests run, reports generated, test run finished successfully.

Actual behavior

When running a large number of test cases (200+), execution doesn't stop even though the last test case has finished; the build is only terminated by the timeout.
We use Appium with TestNG for mobile testing in GitLab, on Linux runners. This happens only with the large alltest suites. We run tests in parallel, and don't use DataProvider or parameterization.
We use Gradle to execute tests with useTestNG().
Everything works with version 7.8, but not with any of the newer versions. I tried to debug on our side but could not find the cause. AfterMethod runs, but AfterSuite doesn't start; both have alwaysRun = true set. I couldn't figure out what happens in between.
Please help. Thanks.

Is the issue reproducible on runner?

  • Shell
  • Maven
  • Gradle
  • Ant
  • Eclipse
  • IntelliJ
  • NetBeans

Test case sample

Please, share the test case (as small as possible) which shows the issue
Issue happens with large amount of test cases, 200+
Our alltest configuration:
[Screenshot: image001]
(Sorry I needed to hide some parts)

Contribution guidelines

In case you plan to raise a pull request to fix this issue, please make sure you refer to our Contributing section for a detailed set of steps.

@krmahadevan
Member

@bapszy - Can you please share a sample that can be used to reproduce the problem? I ask because, given the nature of your issue, it's not going to be easy to identify the root cause without a sample.

Please feel free to create a simple project that doesn't use any library apart from TestNG, which we can use to reproduce the issue.


Hi, @bapszy.
We need more information to reproduce the issue.

Please help share a Minimal, Reproducible Example that can be used to recreate the issue.

In addition to sharing a sample, please also add the following details:

  • TestNG version being used.
  • JDK version being used.
  • How was the test run (IDE/Build Tools)?
  • What does your build file (pom.xml | build.gradle | build.gradle.kts) look like?

It would be better if you could share a sample project that can be directly used to reproduce the problem.
Reply to this issue when all information is provided, thank you.

@bapszy
Author

bapszy commented Jul 23, 2024

@krmahadevan As I mentioned, it only happens with a large number of test cases. I don't know how to create an example that doesn't take a long time to write and can still reproduce the issue. Writing 200+ fake tests seems like a lot of work. Our alltest suite runs for more than 1 hour, on a complex system of Appium, GitLab runners, and Mac test machines that run the Android emulators and iOS simulators. Any suggestions on how to do this would help.

I'm happy to answer any questions you have.

We are using Gradle 8.8
We are on Java 17

It happens when we run our big test suites in GitLab. (Locally it would take a whole day to run all tests on 1 thread.)
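
Writing 200+ fake tests by hand may not be necessary. One option is to generate them. The following is a minimal sketch, not a confirmed reproduction of the hang: the class name `FakeTestGenerator`, the package `repro`, the output directory, and the 100 ms sleep are all hypothetical choices, and it only assumes the standard TestNG annotations `@Test` and `@AfterMethod`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Generates throwaway TestNG classes so a large suite need not be written by hand. */
final class FakeTestGenerator {

    // Hypothetical package name for the generated classes.
    static final String PACKAGE = "repro";

    /** Returns file name -> Java source for `count` trivial test classes. */
    static Map<String, String> render(int count) {
        Map<String, String> sources = new LinkedHashMap<>();
        for (int i = 1; i <= count; i++) {
            String name = "FakeTest" + i;
            String body = String.join("\n",
                "package " + PACKAGE + ";",
                "",
                "import org.testng.annotations.AfterMethod;",
                "import org.testng.annotations.Test;",
                "",
                "public class " + name + " {",
                "    @Test",
                "    public void passes() throws InterruptedException {",
                "        Thread.sleep(100); // stand-in for real work",
                "    }",
                "",
                "    @AfterMethod(alwaysRun = true)",
                "    public void afterMethod() {",
                "        System.out.println(\"afterMethod " + name + "\");",
                "    }",
                "}",
                "");
            sources.put(name + ".java", body);
        }
        return sources;
    }

    /** Writes the rendered sources into `dir`, creating it if needed. */
    static void write(Path dir, int count) {
        try {
            Files.createDirectories(dir);
            for (Map.Entry<String, String> e : render(count).entrySet()) {
                Files.writeString(dir.resolve(e.getKey()), e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Output directory is an assumption; point it at your test source set.
        write(Path.of("build/generated-repro/repro"), 250);
    }
}
```

The generated classes contain no `@AfterSuite` hook. To mirror the setup described in this issue, a hand-written class with an `@AfterSuite(alwaysRun = true)` method and a suite XML with parallel execution would still have to be added.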

@krmahadevan
Member

@bapszy I hear you.

But please help me understand how I should go about debugging this problem to find out what is going wrong. You have not shared any information so far.

I would at least need to know what your code does, which is why I am asking for a sample to reproduce.

@itkhanz

itkhanz commented Jul 25, 2024

@bapszy We have a similar setup, and I at least have not encountered any such issue. Here is our environment:

  • Gradle 8.9
  • Java 17
  • TestNG 7.10.2
  • Appium Java client 9.2.3
  • Azure Pipelines (Linux machines)
  • BrowserStack remote execution

Also, I do use a TestNG suite XML file and have around 100 tests, which altogether take about 1.5-2 hours when running in non-parallel mode. However, I do not have any methods annotated with @AfterSuite.

This issue may be related to how your pipeline is configured to run tests, or to Appium client-server communication.

Please share your build.gradle so we can see how you have configured TestNG, as well as the command you use to run the tests. Without an example to reproduce, it is hard to debug.

@bapszy
Author

bapszy commented Jul 25, 2024

In our build.gradle we configured this:

test {
    // run specific suite from command line with gradlew test -Psuite1
    systemProperty "configFailurePolicy", "continue"
    systemProperty "surefire.printSummary", "false"
    useTestNG() {
        useDefaultListeners = true

        // Android
        if (project.hasProperty('AndroidAllTests')) {
            suites 'src/main/java/com/****/*****/suites/android/AllTests.xml'
        }
    }
}

And we start the test run in the GitLab yml:
gradle -i test -PAlltest

I'm currently going through the snapshots with a binary search to see if I can find the commit where the problem starts.
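
The binary search over snapshots can be sketched as follows. This is only an illustration under assumptions: the build labels are placeholders rather than real snapshot names, the `hangs` predicate stands in for "run the full suite against this build and check whether it hangs", and the search assumes that every build after the first bad one is also bad.

```java
import java.util.List;
import java.util.function.Predicate;

final class SnapshotBisect {

    /**
     * Returns the index of the first build for which `hangs` is true, assuming
     * builds are ordered oldest to newest and every build after the first
     * bad one is also bad. Returns -1 if no build hangs.
     */
    static <T> int firstBad(List<T> builds, Predicate<T> hangs) {
        int lo = 0;
        int hi = builds.size(); // invariant: the first bad index lies in [lo, hi]
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (hangs.test(builds.get(mid))) {
                hi = mid;      // mid is bad, so the first bad build is at or before mid
            } else {
                lo = mid + 1;  // mid is good, so the first bad build is after mid
            }
        }
        return lo < builds.size() ? lo : -1;
    }

    public static void main(String[] args) {
        // Placeholder labels, not real snapshot names.
        List<String> builds = List.of(
            "snapshot-1", "snapshot-2", "snapshot-3", "snapshot-4", "snapshot-5");
        // Stand-in for an actual test run against each build.
        Predicate<String> hangs = b -> b.compareTo("snapshot-4") >= 0;
        System.out.println(builds.get(firstBad(builds, hangs))); // prints snapshot-4
    }
}
```

With a suite that runs for over an hour, the benefit is that n snapshots need only about log2(n) full runs instead of n.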

@itkhanz

itkhanz commented Jul 25, 2024

> In our build.gradle configured this:
>
> test {
>     // run specific suite from command line with gradlew test -Psuite1
>     systemProperty "configFailurePolicy", "continue"
>     systemProperty "surefire.printSummary", "false"
>     useTestNG() {
>         useDefaultListeners = true
>
> //Android
> if (project.hasProperty('AndroidAllTests')) {
>     suites 'src/main/java/com/****/*****/suites/android/AllTests.xml'
> }
>
> And we start the test run in the GitLab yml: gradle -i test -PAlltest
>
> Currently I'm working on going through the snapshots with a binary search to see if I can find the commit from which the problem occurs

There is apparently nothing wrong here. Run your tests in Gradle debug mode, and enable debug logging in GitLab if there is such an option. This will help you see what is going on after the very last test has finished executing. You can also check the Appium server logs to see whether some test execution is stuck, or add logging statements to see at which test or configuration method the execution hangs.
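
Besides log statements, a list of the threads that are still alive after the last test can help narrow down a hang like this. The sketch below uses only the JDK's `Thread.getAllStackTraces()`; the class name `LiveThreadReport` is hypothetical, and where to call it from (for example a watchdog thread or the last configuration method that is known to run) is left as an assumption. The JDK's `jstack` tool gives similar information from outside the process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LiveThreadReport {

    /** Lists live non-daemon threads; any of these can keep the JVM from exiting. */
    static List<String> nonDaemonThreads() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.isDaemon() || !t.isAlive()) {
                continue; // daemon threads do not block JVM shutdown
            }
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(t.getName() + " [" + t.getState() + "] at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints one line per live non-daemon thread with its state and top frame.
        nonDaemonThreads().forEach(System.out::println);
    }
}
```

A worker thread that stays in the WAITING or TIMED_WAITING state long after the last test would be a reasonable place to start looking, though this alone does not prove where the fault lies.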

@bapszy
Author

bapszy commented Jul 29, 2024

This snapshot version is where it starts to fail:
testng-7.9.0-20231218.041814-36.jar | Mon Dec 18 04:20:08 UTC 2023

If I'm right, this is the corresponding pull request:
#3014

@krmahadevan
Member

@bapszy - That is a good find. But just the segregation of the executor logic AFAIK shouldn't cause any stalling.

The only stalling issue I am aware of was #3028, and I think I fixed that as well in 7.11.0.

@bapszy
Author

bapszy commented Jul 29, 2024

@krmahadevan I don't see version 7.11.0, only 7.10.2.

What is the correct way to enable TestNG logging? We tried log4testng.properties, but it doesn't work.

@krmahadevan
Member

7.11.0 is the upcoming version; it is not yet released. But that fix won't affect you if you are NOT using the shared thread pool executors (are you doing that somewhere?).

> What is the correct way of enabling TestNG logging? We tried the log4testng.properties but it doesn't work

TestNG now uses the standard slf4j for its logging. So you can use your favorite implementation (log4j or the JUL logger) to configure the slf4j logger and dump logs.
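
For the JUL route, a minimal sketch of turning on debug output could look like the following. It makes two assumptions that are not confirmed in this thread: that an slf4j-to-JUL binding such as `slf4j-jdk14` is on the test classpath (it maps slf4j DEBUG to JUL `FINE`), and that TestNG's loggers live under the `org.testng` namespace. The class name `JulDebugLogging` is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

final class JulDebugLogging {

    // Held in a static field so the configured logger is not garbage collected.
    // The "org.testng" namespace is an assumption.
    private static final Logger TESTNG = Logger.getLogger("org.testng");

    /** Sends FINE and above for the assumed TestNG namespace to the console. */
    static void enable() {
        TESTNG.setLevel(Level.FINE);
        TESTNG.setUseParentHandlers(false); // avoid printing INFO+ records twice
        for (Handler h : TESTNG.getHandlers()) {
            if (h instanceof ConsoleHandler) {
                return; // already configured, keep the call idempotent
            }
        }
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE); // ConsoleHandler defaults to INFO
        TESTNG.addHandler(handler);
    }

    public static void main(String[] args) {
        enable();
        Logger.getLogger("org.testng.Example").fine("debug logging is on");
    }
}
```

`enable()` would have to run before the messages of interest are logged, for example from a static initializer in a base test class.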

@bapszy
Author

bapszy commented Aug 14, 2024

Hi again,
I was on holiday and am now continuing the investigation. But the snapshots of 7.9 are missing from https://oss.sonatype.org/content/repositories/snapshots/org/testng/testng/

@bapszy
Author

bapszy commented Aug 14, 2024

@krmahadevan
I got the logs working, and with version 7.9.0 I could reach the point where it hangs.
At the end of the log I see the following line exactly every 60 seconds:
[idle-connection-reaper] DEBUG PoolingHttpClientConnectionManager:441 - Closing connections idle longer than 60000 MILLISECONDS

Is it something TestNG calls?

@krmahadevan
Member

No. That looks like some connection manager at work cleaning up stale HTTP connections.
TestNG does not have any connection managers.

@krmahadevan
Member

> Hi again,
>
> I was on holiday and now I continue the investigation of this. But the snapshots of 7.9 are missing from https://oss.sonatype.org/content/repositories/snapshots/org/testng/testng/

I am not sure what may have happened there. I personally have not paid much attention to snapshot jars because hardly anyone asks for them.

@ilya-corp

ilya-corp commented Sep 17, 2024

Hi, it looks like I have the same problem. There are 250 tests, of which approx. 15 failed and 15 were skipped.
I have tried JDK 21 and 22. The TestNG version is the latest.
I start the tests from an XML runner file in IntelliJ IDEA. Selenide 7.4.2 or 7.5.0 is also in use.

<!DOCTYPE suite SYSTEM "https://testng.org/testng-1.0.dtd" >

<suite name="lk-multithreading" parallel="none" configfailurepolicy="continue">
    <listeners>
        <listener class-name="org.testng.reporters.TestHTMLReporter" />
        <listener class-name="org.testng.reporters.EmailableReporter2" />
        <listener class-name="org.testng.reporters.FailedReporter" />
        <listener class-name="org.testng.reporters.XMLReporter" />
        <listener class-name="ru.company.core.TestOrderRandomizer" />
    </listeners>
    <test name="proxy" parallel="methods" thread-count="8">
        <groups>
            <run>
                <exclude name="not-for-production" />
                <exclude name="draft" />
                <exclude name="no-proxy" />
            </run>
        </groups>
        <packages>
            <package name="ru.company.lk.tests.functional.*">
                <exclude name="ru.company.lk.tests.functional.debug.*" />
                <exclude name="ru.company.lk.tests.functional.monitoring.StagingLiveCheckerTests.*" />
                <exclude name="ru.company.lk.tests.functional.regular.RegularTests" />
            </package>
        </packages>
    </test>

    <test name="no-proxy" parallel="methods" thread-count="2">
        <groups>
            <run>
                <include name="no-proxy" />
                <exclude name="not-for-production" />
                <exclude name="draft" />
            </run>
        </groups>
        <packages>
            <package name="ru.company.lk.tests.functional.*">
                <exclude name="ru.company.lk.tests.functional.registration.*" />
                <exclude name="ru.company.lk.tests.functional.debug.*" />
                <exclude name="ru.company.lk.tests.functional.monitoring.StagingLiveCheckerTests.*" />
                <exclude name="ru.company.lk.tests.functional.regular.RegularTests" />
            </package>
        </packages>
    </test>
</suite>



It looks like the @AfterSuite hooks do not start after all tests have completed.
But the TestNG IntelliJ IDEA plugin keeps showing the progress animation for skipped tests.
To skip a test manually I use throw new SkipException("skipped");

[Screenshot]
