Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] VerifyVersionConstantsIT is unreliable #12407

Open
peternied opened this issue Feb 21, 2024 · 9 comments
Open

[BUG] VerifyVersionConstantsIT is unreliable #12407

peternied opened this issue Feb 21, 2024 · 9 comments
Labels
bug Something isn't working Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. :test Adding or fixing a test >test-failure Test failure from CI, local build, etc.

Comments

@peternied
Copy link
Member

Describe the bug

testLuceneVersionConstant did not detect that when the Lucene version was updated [1] the test was broken

java.lang.AssertionError: 
Expected: <9.10.0>
     but: was <9.9.2>

Related component

Build

To Reproduce

  1. Update the lucene version
  2. Merge change into main
  3. Run the test org.opensearch.qa.verify_version_constants.VerifyVersionConstantsIT.testLuceneVersionConstant

Expected behavior

Test should fail when the version is upgraded, should pass all other times.

Additional Details

No response

@peternied peternied added bug Something isn't working untriaged flaky-test Random test failure that succeeds on second run labels Feb 21, 2024
@github-actions github-actions bot added the Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. label Feb 21, 2024
@peternied
Copy link
Member Author

[Triage - attendees 1 2 3 4 5]
Thanks for filing, look forward to seeing this fixed

@peternied
Copy link
Member Author

[Triage - attendees 1 2 3 4 5]
Thanks for filing - consider contributing with a pull request

@andrross
Copy link
Member

We have a number of "qa" tests that introduce cross-branch dependencies and result in temporary breakages until all branches are in sync, so to some extent this is by design. There are a number of issues discussing this problem and potential improvements:

@peternied I'm going to remove the "flaky-test" label as I think the failure here is a result of the design of the test. That's not to say it can't be improved, but I don't think it's quite the same category as other flaky tests. Let me know if you disagree.

@andrross andrross added >test-failure Test failure from CI, local build, etc. :test Adding or fixing a test and removed flaky-test Random test failure that succeeds on second run labels Feb 21, 2024
@peternied
Copy link
Member Author

@andrross I disagree - if we rev the Lucene version the test should immediately break, unless I don't understand how this test is executed.

Can you help confirm that the boundary of the changing the lucence version show have the rest request/response validation that can be done without requiring a change in another branch or as part of the backward compatibility suite? If the answer is the boundary is complex, then I'd be curious why this test should cross that boundary.

@andrross
Copy link
Member

if we rev the Lucene version the test should immediately break, unless I don't understand how this test is executed.

@peternied Sorry, I think I misunderstood this issue. It works like this: we have the Version.java file that defines a static mapping of OpenSearch version to Lucene version. What this test does is:

for each version in `BuildParams.bwcVersion.indexCompatible`:
 - Start an OpenSearch cluster for this version
 - call the `GET /` API and verify its response matches what is defined in the Version.java of the local code. 

That means if we change the Lucene version on main (aka 3.0), then the test will still pass because it will use the local code to start the 3.0 cluster. If we change the Lucene version on the 2.x branch, then this test will break on main until we update the 2.x version mapping on main. Does that make sense?

@andrross
Copy link
Member

The intent of this test, as I understand it, is to make sure the forward branches get updated when an older branch (e.g. 1.x or 2.x) update the version of Lucene that they use.

@peternied
Copy link
Member Author

If we change the Lucene version on the 2.x branch, then this test will break on main until we update the 2.x version mapping on main. Does that make sense?

Thanks for breaking down the mechanics - much appreciated.

Does it make sense that a contribution that was created before an off-branch version change was issued cannot pass CI until it merges from main - I'm having a harder time understanding that choice.

This sounds like rather than having a separate dependency (e.g. another maven package named OpenSearchVersionAllThing) using the multi-branched nature of git is being repurposed to hold this information. Does that seem like a fair statement?

@andrross
Copy link
Member

Does it make sense that a contribution that was created before an off-branch version change was issued cannot pass CI until it merges from main - I'm having a harder time understanding that choice.

This is something that happens quite a lot (see #2350). The fundamental issue is that the thing we're testing compatibility against (e.g. 2.x, the next unreleased minor) is a moving target. The CI failures tell you the code in your PR is not compatible with that next unreleased minor version. I honestly don't have any good ideas for how to detect that the failure is because of the code you introduced versus a separate change you haven't rebased.

@shwetathareja
Copy link
Member

shwetathareja commented Jul 5, 2024

https://build.ci.opensearch.org/job/gradle-check/42005/ again with assertion failure between 9.11.1 and 9.11.0
Update:
9.11.1 lucene version bump for 2.16 is merged in main as well. Test should pass after rebase from main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. :test Adding or fixing a test >test-failure Test failure from CI, local build, etc.
Projects
None yet
Development

No branches or pull requests

3 participants