[Meta] Segment Replication flaky test failures #8279

Closed
15 of 16 tasks
dreamer-89 opened this issue Jun 27, 2023 · 19 comments
Labels: bug, distributed framework, flaky-test, >test-failure, v2.10.0

Comments

@dreamer-89
Member

dreamer-89 commented Jun 27, 2023

Meta issue to track flaky test failures related to segment replication. It tracks the recent surge in flaky test failures since the remote store integration, after which all existing segment replication integration tests are also run with remote store. The report below shows the top hitters.

Linking some of the existing open issues below. We can start with the top hitters based on the numbers above.

Related: #5669

@kotwanikunal
Member

kotwanikunal commented Jul 6, 2023

kkotwani@bcd07463a8e0 ~ % ruby flaky-finder.rb --s 18400 --e 19300
Will crawl builds from 18400 to 19300
------------------
225 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication (18420,18425,18426,18467,18482,18485,18487,18489,18489,18492,18495,18496,18497,18499,18502,18510,18510,18516,18517,18517,18519,18526,18526,18531,18533,18534,18534,18537,18539,18541,18542,18543,18546,18550,18555,18563,18568,18575,18583,18595,18595,18595,18596,18599,18599,18599,18600,18606,18610,18611,18612,18614,18615,18616,18618,18619,18620,18624,18625,18627,18628,18629,18630,18631,18633,18635,18635,18648,18649,18653,18659,18660,18667,18667,18672,18680,18685,18707,18712,18723,18724,18728,18728,18732,18738,18739,18739,18746,18757,18762,18763,18766,18775,18777,18777,18781,18782,18795,18796,18800,18801,18804,18804,18805,18808,18811,18815,18817,18818,18821,18826,18830,18832,18837,18838,18841,18841,18842,18852,18856,18857,18857,18861,18867,18871,18873,18873,18876,18879,18879,18879,18888,18889,18895,18900,18919,18920,18934,18947,18947,18947,18950,18951,18953,18956,18959,18962,18968,18974,18976,18977,18979,18982,18983,18983,18989,18989,18989,18991,18993,18993,18998,18999,19001,19002,19008,19009,19013,19013,19019,19021,19025,19029,19038,19041,19047,19048,19051,19053,19054,19055,19056,19060,19061,19062,19063,19063,19065,19066,19068,19069,19069,19075,19077,19080,19084,19090,19091,19092,19092,19093,19094,19099,19101,19102,19102,19106,19107,19109,19118,19119,19127,19135,19142,19142,19147,19152,19154,19157,19157,19159,19166,19168,19183,19193) - #8439 
55 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure (18467,18541,18541,18541,18543,18583,18611,18611,18612,18612,18624,18672,18705,18705,18712,18727,18727,18728,18728,18738,18763,18763,18766,18782,18782,18817,18817,18817,18821,18823,18823,18833,18852,18873,18873,18968,18977,18981,18981,19019,19021,19052,19052,19056,19056,19062,19107,19133,19168,19171,19171,19198,19198,19238,19282) - #7703 
54 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication (18404,18435,18489,18510,18517,18526,18534,18539,18543,18577,18605,18612,18624,18624,18635,18667,18719,18728,18728,18763,18777,18782,18801,18804,18807,18826,18830,18857,18873,18888,18895,18974,18983,18993,19004,19013,19024,19029,19063,19069,19101,19102,19120,19142,19156,19157,19170,19176,19195,19202,19222,19275,19289,19290)
45 org.opensearch.search.SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString (18415,18415,18454,18454,18506,18509,18569,18626,18626,18626,18654,18696,18696,18700,18701,18730,18730,18730,18806,18806,18806,18807,18810,18810,18816,18816,18833,18833,19010,19024,19026,19026,19033,19039,19052,19070,19070,19163,19187,19195,19195,19200,19202,19266,19292) - #8059 
33 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreInSameRemoteStoreEnabledIndex (18537,18550,18555,18575,18577,18595,18596,18611,18620,18628,18630,18660,18680,18680,18705,18727,18728,18738,18739,18757,18758,18775,18801,18842,18842,18989,18991,19004,19013,19016,19061,19068,19127)
27 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness (18412,18425,18497,18516,18575,18606,18624,18627,18630,18724,18728,18804,18805,18808,18900,18950,18999,19056,19069,19080,19101,19118,19188,19233,19245,19276,19289) - #8352 
24 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPressureServiceStats (18401,18415,18415,18427,18435,18505,18506,18509,18569,18577,18577,18605,18626,18645,18645,18654,18671,18696,18696,18716,18716,18716,18727,18727) - #7592 
23 org.opensearch.cluster.routing.MovePrimaryFirstTests.testClusterGreenAfterPartialRelocation (18573,18574,18582,18587,18592,18593,18604,18697,18718,18740,18742,18743,18747,18750,18756,18835,18843,18844,18851,18855,19161,19173,19174)
23 org.opensearch.cluster.routing.MovePrimaryFirstTests.classMethod (18573,18574,18582,18587,18592,18593,18604,18697,18718,18740,18742,18743,18747,18750,18756,18835,18843,18844,18851,18855,19161,19173,19174)
22 org.opensearch.cluster.service.MasterServiceTests.classMethod (18563,18563,18568,18568,18575,18575,18672,18672,18723,18723,18738,18738,19013,19013,19013,19013,19056,19056,19056,19056,19094,19094)
13 org.opensearch.indices.replication.SegmentReplicationIT.testScrollCreatedOnReplica (18404,18409,18539,18596,18627,18629,18727,18730,18757,18834,19172,19193,19242)
12 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.classMethod (18492,18492,18667,18667,18808,18808,18818,18818,19008,19008,19038,19038)
12 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testReplicaHasDiffFilesThanPrimary (18645,18700,18700,18700,18730,18739,18782,18810,19016,19171,19171,19266)
12 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testBasicTaskResourceTracking (18467,18497,18611,18833,18857,18871,18895,19048,19094,19181,19199,19233)
11 org.opensearch.cluster.service.MasterServiceTests.testThrottlingForMultipleTaskTypes (18563,18568,18575,18672,18723,18738,19013,19013,19056,19056,19094)
10 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue (18405,18425,18542,18763,18821,18823,18951,19048,19080,19276)
10 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.classMethod (18404,18404,18616,18616,19054,19054,19119,19119,19156,19156)
10 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation (18499,18633,18645,18660,18979,19014,19021,19118,19135,19142)
9 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithRemoteTranslog (18531,18534,18542,18680,18766,18861,18861,18920,19108)
9 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithoutRemoteTranslog (18534,18542,18580,18680,18766,18861,18977,19009,19108)
8 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards (18403,18467,18489,18725,18804,18841,19109,19198)
7 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled (18420,18563,18628,18757,19008,19060,19077)
6 org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testStartReplicationListenerSuccess (18410,18981,19004,19016,19192,19290)
6 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPressureServiceStats (18492,18808,18818,19008,19038,19272)
5 org.opensearch.indices.replication.SegmentReplicationIT.testCancellation (18624,18781,18953,19127,19159)
5 org.opensearch.upgrade.DetectEsInstallationTaskTests.testTaskExecution (18697,18697,18697,18843,18843)
5 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshSuccessAfterFailureInFirstAttemptAfterSnapshotAndMetadataUpload (18404,18616,19054,19119,19156)
4 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithInvokeFlush (19251,19251,19272,19277)
4 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithoutInvokeFlush (19243,19251,19276,19294)
4 org.opensearch.search.aggregations.bucket.DoubleTermsIT.classMethod (19070,19070,19070,19070)
4 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPitCreatedOnReplica (18833,18834,18888,19092)
4 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all} (18738,18782,18989,19209)
4 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testCancellation (18467,18517,19056,19062)
4 org.opensearch.indices.replication.SegmentReplicationAllocationIT.testSingleIndexShardAllocation (18712,18871,19211,19242)
3 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocationWithSegRepFailure (18610,19061,19289)
3 org.opensearch.index.IndexServiceTests.testAsyncTranslogTrimTaskOnClosedIndex (18728,18900,19063)
3 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation (18415,18489,19062)
3 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries (19106,19108,19183)
3 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreShallowCopySnapshotWithDifferentRepo (18815,18823,18842)
2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod (19264,19264)
2 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled (18672,18757)
2 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh (18680,18680)
2 org.opensearch.http.netty4.Netty4HttpServerTransportTests.testLargeCompressedResponse (18697,18756)
2 org.opensearch.remotestore.RemoteStoreIT.testRemoteTranslogCleanup (18800,18800)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted (18841,19199)
2 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadNonexistentBlobThrowsNoSuchFileException (18871,19251)
2 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.classMethod (18974,18974)
2 org.opensearch.indices.replication.SegmentReplicationIT.testReplicationPostDeleteAndForceMerge (19133,19249)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation (19202,19205)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=update/20_doc_upsert/Doc upsert} (18401)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=indices.shrink/10_basic/Shrink index via API} (18806)
1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu (18429)
1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationDuringFetchPhase (18821)
1 org.opensearch.remotestore.CreateRemoteIndexIT.testRemoteStoreTranslogDisabledByUser (18833)
1 org.opensearch.action.admin.indices.create.CreateIndexIT.classMethod (18403)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/270_median_absolute_deviation_metric/bad arguments} (18401)
1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteBlobWithRetries (19250)
1 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.testGetCheckpointMetadataFailure (18974)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=indices.get_settings/10_basic/Get /_settings with local flag} (18401)
1 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshSuccessOnSecondAttempt (19008)
1 org.opensearch.gateway.ReplicaShardAllocatorIT.testPreferCopyWithHighestMatchingOperations (19021)
1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testReplicaHasDiffFilesThanPrimary (19062)
1 org.opensearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently (18403)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals} (19077)
1 org.opensearch.index.reindex.UpdateByQueryBasicTests.testMultipleSources (19092)
1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testCancelPrimaryAllocation (19093)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/Long test} (18401)
1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries (19118)
1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown (19264)
1 org.opensearch.remotestore.RemoteStoreBackpressureIT.testWritesRejectedDueToTimeLagBreach (19162)
1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testReplicationAfterForceMerge (19166)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testWriteRead (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testDeleteBlobs (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testMultipleSnapshotAndRollback (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testList (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testIndicesDeletedFromRepository (19184)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.classMethod (19184)
1 org.opensearch.cluster.ClusterHealthIT.testHealthOnClusterManagerFailover (19184)
1 org.opensearch.client.PitIT.testDeleteAllAndListAllPits (19195)
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary (18685)
1 org.opensearch.client.PitIT.testCreateAndDeletePit (19195)
1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testIndexReopenClose (18667)
1 org.opensearch.search.pit.DeletePitMultiNodeIT.testDeleteWhileSearch (18611)
1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu (18430)
1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testReplicationPostDeleteAndForceMerge (18763)
1 org.opensearch.remotestore.RemoteStoreIT.testRemoteSegmentCleanup (18800)

@anasalkouz
Member

I thought @Rishikesh1159 already fixed the testNodeDropWithOngoingReplication flaky test in #8441.
Why does it still show up as the most flaky test?

@Rishikesh1159
Member

@anasalkouz my PR #8441 fixing the testNodeDropWithOngoingReplication flaky test was merged 5 days ago, on 5th July. Any PR opened before the fix, or whose branch was not rebased onto the latest fix, will still see this flaky test.

@anasalkouz
Member

@kotwanikunal did you pull the testNodeDropWithOngoingReplication fix before your run, or is there something else that needs to be fixed?

@kotwanikunal
Member

The fix first appears in gradle run 19120, while the checks cover runs 18400 to 19300, so most of the testNodeDropWithOngoingReplication failures reported above predate it.
Re-running the script now should show whether the fix resolved the flaky test.
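One way to run that check is to split each test's failing build numbers at the first build that contains the fix. The Ruby sketch below assumes only the output format shown above (`COUNT TEST_NAME (BUILD,BUILD,...)` with an optional ` - #ISSUE` suffix). `parse_report_line` and `split_by_fix` are hypothetical helpers, not part of flaky-finder.rb, and the sample line is made up.

```ruby
# Hypothetical helpers; they assume only the flaky-finder.rb output format
# "COUNT TEST_NAME (BUILD,BUILD,...)" with an optional " - #ISSUE" suffix.
REPORT_LINE = /\A(\d+)\s+(.+?)\s+\(([\d,]+)\)/

# Parses one report line into [count, test_name, build_numbers], or nil
# if the line does not look like a report entry (headers, separators).
def parse_report_line(line)
  m = REPORT_LINE.match(line.strip)
  return nil unless m

  [m[1].to_i, m[2], m[3].split(",").map(&:to_i)]
end

# Splits failing build numbers into [before_fix, at_or_after_fix].
def split_by_fix(builds, fix_build)
  builds.partition { |build| build < fix_build }
end

# Made-up sample line in the report format.
line = "5 org.opensearch.example.ExampleIT.testSomething (19001,19118,19120,19127,19193)"
_count, name, builds = parse_report_line(line)
before, after = split_by_fix(builds, 19120)
puts "#{name}: #{before.size} failures before the fix, #{after.size} at or after it"
```

A non-empty "at or after" list does not by itself mean the fix is incomplete: as noted above, PRs that were not rebased onto the fix can still hit the old failure.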

@dreamer-89
Member Author

Recent runs from 19000 to 20900 (tests with 2 or fewer failures removed for brevity).
CC @mch2

(base) ➜  OpenSearch git:(Issue-8850) ✗ ruby ~/Desktop/flaky-finder.rb --s 19000 --e 20900
Will crawl builds from 19000 to 20900
------------------
116 org.opensearch.search.SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString (19010,19024,19026,19026,19033,19039,19052,19070,19070,19163,19187,19195,19195,19200,19202,19266,19292,19336,19398,19398,19406,19418,19442,19442,19463,19463,19472,19472,19474,19474,19474,19492,19492,19494,19514,19514,19518,19520,19520,19521,19521,19542,19542,19555,19562,19619,19619,19789,19789,19821,19824,19853,19853,19859,19859,19888,19903,19940,19940,19947,19961,19969,19985,19985,19985,19986,19987,19987,19993,19993,19993,19997,20008,20008,20014,20014,20023,20023,20023,20026,20037,20041,20048,20051,20095,20159,20163,20170,20170,20177,20230,20230,20324,20390,20455,20482,20482,20487,20580,20673,20673,20673,20716,20726,20774,20812,20836,20836,20838,20838,20838,20873,20891,20899,20899,20899)
88 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication (19004,19013,19024,19029,19063,19069,19101,19102,19120,19142,19156,19157,19170,19176,19195,19202,19222,19275,19289,19290,19319,19321,19335,19391,19418,19428,19437,19465,19495,19520,19521,19564,19577,19644,19654,19654,19690,19704,19716,19729,19729,19729,19775,19842,19850,19859,19901,19904,19909,19914,19922,19924,19934,19937,19940,19960,19974,19993,20002,20026,20027,20027,20035,20043,20046,20046,20056,20063,20070,20070,20076,20129,20163,20179,20213,20348,20372,20376,20377,20399,20405,20423,20461,20487,20812,20836,20872,20873)
65 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithInvokeFlush (19251,19251,19272,19277,19329,19329,19331,19331,19335,19343,19358,19372,19373,19373,19373,19568,19582,19598,19607,19626,19650,19653,19659,19670,19674,19679,19727,19752,19759,19785,19814,19830,19833,19854,19857,19875,19917,19919,19923,19960,19968,19999,20043,20078,20105,20162,20165,20222,20225,20261,20296,20312,20315,20325,20378,20386,20394,20399,20405,20408,20424,20573,20575,20596,20602)
62 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication (19001,19002,19008,19009,19013,19013,19019,19021,19025,19029,19038,19041,19047,19048,19051,19053,19054,19055,19056,19060,19061,19062,19063,19063,19065,19066,19068,19069,19069,19075,19077,19080,19084,19090,19091,19092,19092,19093,19094,19099,19101,19102,19102,19106,19107,19109,19118,19119,19127,19135,19142,19142,19147,19152,19154,19157,19157,19159,19166,19168,19183,19193)
58 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness (19056,19069,19080,19101,19118,19188,19233,19245,19276,19289,19308,19321,19343,19399,19428,19430,19475,19644,19652,19654,19681,19684,19693,19698,19730,19757,19761,19767,19866,19870,19901,19904,19960,19978,19999,20040,20080,20089,20178,20291,20322,20330,20352,20402,20529,20598,20604,20644,20676,20714,20724,20768,20787,20796,20801,20820,20827,20873)
39 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation (19014,19021,19118,19135,19142,19330,19418,19539,19713,19720,19854,19859,19887,19887,19891,19935,19941,19967,20007,20071,20085,20145,20179,20240,20330,20399,20484,20525,20525,20557,20685,20720,20731,20783,20798,20812,20813,20858,20866)
38 org.opensearch.indices.replication.SegmentReplicationIT.testScrollCreatedOnReplica (19172,19193,19242,19336,19356,19405,19441,19492,19586,19597,19626,19631,19698,19731,19742,19767,19879,19910,19952,19975,19978,20043,20162,20213,20223,20240,20273,20313,20449,20530,20647,20714,20743,20774,20779,20787,20802,20875)
34 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.classMethod (19008,19008,19038,19038,19354,19354,19369,19369,19396,19396,19449,19449,19501,19501,19653,19653,19862,19862,19875,19875,19942,19942,20046,20046,20091,20091,20230,20230,20240,20240,20298,20298,20443,20443)
34 org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testStaleCommitDeletionWithInvokeFlush (19917,19919,19942,19960,19965,19967,19968,19984,20043,20063,20070,20078,20126,20135,20165,20165,20168,20259,20261,20262,20298,20315,20325,20333,20366,20378,20392,20399,20405,20476,20540,20567,20573,20596)
33 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testBasicTaskResourceTracking (19048,19094,19181,19199,19233,19358,19368,19534,19613,19619,19671,19752,19888,19897,19919,19950,19955,19960,19997,20175,20220,20230,20231,20288,20317,20320,20392,20644,20663,20749,20778,20811,20829)
32 org.opensearch.cluster.service.MasterServiceTests.classMethod (19013,19013,19013,19013,19056,19056,19056,19056,19094,19094,19397,19397,19397,19397,19397,19397,19541,19541,19633,19633,19718,19718,19825,19825,19825,19825,20298,20298,20305,20305,20613,20613)
30 org.opensearch.remotestore.RemoteStoreIT.testStaleCommitDeletionWithoutInvokeFlush (19243,19251,19276,19294,19308,19318,19319,19325,19329,19330,19331,19341,19350,19354,19358,19358,19369,19373,19373,19389,19433,19441,19591,19623,19650,19733,19742,19785,19785,19788)
28 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure (19019,19021,19052,19052,19056,19056,19062,19107,19133,19168,19171,19171,19198,19198,19238,19282,19341,19359,19359,19388,19422,19454,19454,19459,19503,19509,19509,19968)
28 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocationWithSegRepFailure (19061,19289,19323,19343,19358,19636,19759,19764,19919,19919,19919,19935,20023,20058,20071,20089,20089,20261,20283,20399,20531,20671,20671,20675,20681,20748,20770,20787)
28 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.classMethod (19054,19054,19119,19119,19156,19156,19331,19331,19331,19331,19395,19395,19584,19584,19629,19629,19681,19681,19718,19718,19733,19733,19825,19825,19844,19844,19937,19937)
27 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPressureServiceStats (19008,19038,19272,19354,19369,19396,19449,19501,19613,19862,19875,19909,19909,19909,19964,19967,19967,20091,20298,20304,20304,20312,20364,20364,20364,20443,20443)
26 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPitCreatedOnReplica (19092,19318,19321,19504,19519,19635,19661,19737,19761,19914,19964,20008,20313,20726,20774,20774,20820,20829,20857,20857,20872,20891,20891,20899,20899,20899)
24 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue (19048,19080,19276,19366,19395,19508,19601,19730,19751,19901,19904,19924,20026,20031,20050,20118,20129,20348,20484,20545,20655,20713,20733,20740)
20 org.opensearch.indices.replication.SegmentReplicationIT.testDropPrimaryDuringReplication (19451,19477,19551,19570,19657,19670,19704,19733,19733,20094,20339,20511,20522,20522,20541,20675,20689,20758,20798,20813)
19 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled (19008,19060,19077,19313,19388,19455,19640,19759,19858,19875,20062,20149,20149,20179,20276,20476,20697,20755,20805)
18 org.opensearch.cluster.routing.MovePrimaryFirstTests.testClusterGreenAfterPartialRelocation (19161,19173,19174,20020,20100,20174,20238,20247,20264,20383,20456,20463,20486,20572,20650,20719,20735,20754)
18 org.opensearch.cluster.routing.MovePrimaryFirstTests.classMethod (19161,19173,19174,20020,20100,20174,20238,20247,20264,20383,20456,20463,20486,20572,20650,20719,20735,20754)
17 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testReplicaHasDiffFilesThanPrimary (19016,19171,19171,19266,19391,19482,19550,19725,20646,20680,20700,20817,20835,20858,20862,20862,20865)
17 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards (19109,19198,19334,19497,19517,19660,19833,19991,20052,20141,20276,20331,20405,20765,20810,20825,20872)
16 org.opensearch.cluster.service.MasterServiceTests.testThrottlingForMultipleTaskTypes (19013,19013,19056,19056,19094,19397,19397,19397,19541,19633,19718,19825,19825,20298,20305,20613)
16 org.opensearch.index.IndexSettingsTests.testDefaultSearchPipelineWithoutFeatureFlag (20320,20361,20366,20372,20376,20377,20378,20386,20392,20394,20395,20396,20397,20402,20404,20408)
14 org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testShardAlreadyReplicating (19984,20235,20469,20503,20525,20556,20580,20613,20682,20704,20714,20731,20768,20891)
13 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testCancellation (19056,19062,19395,19395,19614,19850,19978,20123,20168,20168,20367,20378,20487)
13 org.opensearch.indices.replication.SegmentReplicationIT.testPitCreatedOnReplica (19549,19619,19706,19718,19956,19958,20107,20367,20397,20743,20774,20802,20810)
13 org.opensearch.indices.replication.SegmentReplicationIT.classMethod (20367,20397,20449,20625,20743,20743,20774,20774,20779,20787,20802,20802,20810)
11 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted (19199,19517,19551,19623,19651,19935,19958,20145,20306,20872,20899)
11 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh (19783,19901,19913,19997,20118,20157,20163,20177,20240,20288,20288)
10 org.opensearch.index.IndexServiceTests.testAsyncTranslogTrimTaskOnClosedIndex (19063,19504,19578,19678,19875,20386,20423,20490,20811,20819)
9 org.opensearch.index.reindex.BulkByScrollResponseTests.testFromXContent (20020,20100,20174,20238,20247,20264,20383,20486,20572)
9 org.opensearch.search.SearchTimeoutIT.testSimpleTimeout (19539,19774,19834,20168,20352,20529,20580,20761,20858)
8 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshSuccessOnThirdAttemptAttempt (19308,19348,19436,19436,19436,19511,19511,19927)
8 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testReplicaHasDiffFilesThanPrimary (19062,20404,20443,20476,20476,20493,20493,20523)
8 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation (19202,19205,19385,19672,20487,20638,20812,20819)
7 org.opensearch.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery (19308,19373,19418,19495,19555,19664,20229)
7 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testDropPrimaryDuringReplication (19935,20063,20157,20229,20229,20229,20240)
7 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all} (19209,19463,19731,20449,20557,20644,20813)
7 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation (19062,19817,19952,20074,20590,20682,20695)
6 org.opensearch.search.ConcurrentSegmentSearchTimeoutIT.testSimpleTimeout (19536,19578,20098,20275,20635,20773)
6 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreInSameRemoteStoreEnabledIndex (19004,19013,19016,19061,19068,19127)
6 org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testStartReplicationListenerSuccess (19004,19016,19192,19290,19422,19514)
6 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshAfterCommit (19331,19331,19395,19584,19681,19718)
6 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.classMethod (19356,19356,19558,19558,19608,19608)
6 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshSuccessAfterFailureInFirstAttemptAfterSnapshotAndMetadataUpload (19054,19119,19156,19733,19825,20339)
6 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testScrollWithConcurrentIndexAndSearch (19653,19698,19942,20046,20230,20240)
6 org.opensearch.cluster.metadata.IndexGraveyardTests.testXContent (20020,20100,20174,20247,20486,20572)
6 org.opensearch.indices.replication.SegmentReplicationRelocationIT.classMethod (20525,20671,20720,20783,20787,20798)
5 org.opensearch.remotestore.SegmentReplicationWithRemoteStorePressureIT.testAddReplicaWhileWritesBlocked (20771,20796,20806,20899,20900)
5 org.opensearch.indices.replication.SegmentReplicationIT.testReplicationPostDeleteAndForceMerge (19133,19249,19555,19678,20165)
5 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testScrollCreatedOnReplica (20443,20443,20484,20522,20530)
5 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshSuccessOnSecondAttempt (19008,19330,19373,19426,19445)
5 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled (19311,19945,20618,20731,20872)
4 org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testStaleCommitDeletionWithoutInvokeFlush (20078,20122,20129,20141)
4 org.opensearch.search.aggregations.bucket.DoubleTermsIT.classMethod (19070,19070,19070,19070)
4 org.opensearch.indices.replication.SegmentReplicationAllocationIT.testSingleIndexShardAllocation (19211,19242,19974,20672)
4 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.testGetSegmentFiles (19356,19608,19888,19937)
4 org.opensearch.search.pit.DeletePitMultiNodeIT.testDeleteWhileSearch (19785,19967,20506,20874)
4 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod (19264,19264,20892,20892)
4 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries (19106,19108,19183,19736)
4 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries (19118,19494,19658,20138)
3 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteBlobWithRetries (19250,19564,19887)
3 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary (20001,20044,20095)
3 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPitCreatedOnReplica (20450,20476,20523)
3 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPrimaryStopped_ReplicaPromoted (19485,19771,20298)
3 org.opensearch.remotestore.CreateRemoteIndexClusterDefaultDocRep.testRemoteStoreTranslogDisabledByUser (19519,19942,20647)
3 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testAfterCommit (19629,19844,19937)
3 org.opensearch.client.PitIT.testDeleteAllAndListAllPits (19195,20663,20731)
3 org.opensearch.indices.replication.SegmentReplicationIT.testCancellation (19127,19159,19562)
3 org.opensearch.remotestore.CreateRemoteIndexIT.testRemoteStoreOverrideTranslogDisabledCorrectly (19368,19898,20051)
3 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testReplicationAfterForceMerge (19166,19995,20286)
3 org.opensearch.index.translog.RemoteFSTranslogTests.testMetadataFileDeletion (19430,19610,20548)
3 org.opensearch.cluster.ClusterHealthIT.testHealthOnClusterManagerFailover (19184,19924,20313)
...
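Omitting tests with two or fewer failures, as done for this report, is a threshold on each line's leading count. A minimal Ruby sketch, assuming the report format shown here; `drop_rare_failures` is a hypothetical helper, not a flaky-finder.rb option, and the sample lines are made up.

```ruby
# Hypothetical post-processing step for flaky-finder.rb output: drop report
# entries whose leading failure count is at or below a threshold.
def drop_rare_failures(lines, max_dropped: 2)
  lines.select do |line|
    count = line[/\A\d+/]
    # Lines without a leading count (headers, separators) are kept as-is.
    count.nil? || count.to_i > max_dropped
  end
end

# Made-up sample output in the same format as the report above.
report = [
  "Will crawl builds from 19000 to 20900",
  "------------------",
  "3 org.opensearch.example.ExampleIT.testA (19001,19002,19003)",
  "2 org.opensearch.example.ExampleIT.testB (19004,19005)",
  "1 org.opensearch.example.ExampleIT.testC (19006)"
]
puts drop_rare_failures(report)
```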

@sachinpkale
Member

sachinpkale commented Aug 9, 2023

Created issue for org.opensearch.remotestore.RemoteStoreIT.testRemoteTranslogRestoreWithNoDataPostRefresh: #9185

@Bukhtawar
Collaborator

Tests with failures:

  • org.opensearch.remotestore.RemoteRestoreSnapshotIT.testRestoreInSameRemoteStoreEnabledIndex
  • org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
  • org.opensearch.snapshots.CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled

https://build.ci.opensearch.org/job/gradle-check/22217/

@sachinpkale
Member

@harishbhakuni ^

@sachinpkale
Member

Created issue for org.opensearch.index.translog.RemoteFSTranslogTests.testConcurrentWriteViewsAndSnapshot - #9455

@mch2
Member

mch2 commented Aug 21, 2023

Created a separate meta issue for remote-store-only tests: #9467. We can continue to use this issue for segment replication (SR) related IT failures. cc @sachinpkale @Bukhtawar

@dreamer-89
Member Author

dreamer-89 commented Sep 13, 2023

Tried another run starting at build 24000 (~12 days back). Below are the segment replication tests that still fail occasionally on gradle check. Created tracking issues for the top 5.

CC @sachinpkale @mch2 @anasalkouz

5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDeleteOperations (24184,24787,24847,25220,25478)
5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation (24228,24358,24612,24686,25111)
5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testMultipleShards (24057,24205,24358,25310,25397)
4 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted (24139,24530,24751,24937)
4 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation (24158,25031,25325,25325)
3 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary (24087,24190,24978)
3 org.opensearch.indices.replication.SegmentReplicationIT.classMethod (24144,25260,25274)
3 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh (24453,24567,25031)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testUpdateOperations (25321,25372)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryReceivesDocsDuringReplicaRecovery (25034,25433)
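The subset above can be derived mechanically from the complete report that follows. Below is a minimal Ruby sketch, assuming the `COUNT TEST_ID (BUILD_IDS)` line shape seen in this thread. It is not the actual flaky-finder.rb, and `parse_report`, `segrep_top`, and the name-based "SegmentReplication" filter are hypothetical.

```ruby
# Minimal sketch, not the actual flaky-finder.rb: parse report lines shaped
# like "COUNT TEST_ID (BUILD_ID,BUILD_ID,...)" and pick the top segment
# replication entries. The line format is inferred from this thread.
FlakyEntry = Struct.new(:count, :test, :builds)

# Test IDs may contain spaces (e.g. YAML suite names), so match lazily up to
# the build-ID list; tolerate a missing ")" and a trailing " - #issue" note.
REPORT_LINE = /\A(\d+)\s+(.+?)\s+\((\d+(?:,\d+)*)\)?(?:\s+-\s+.*)?\s*\z/

def parse_report(text)
  text.each_line.filter_map do |line|
    m = REPORT_LINE.match(line.strip)
    next unless m # skip headers, separators and shell prompt lines
    FlakyEntry.new(m[1].to_i, m[2], m[3].split(",").map(&:to_i))
  end
end

# Heuristic (assumption): treat any test class whose name contains
# "SegmentReplication" as segment-replication related.
def segrep_top(entries, n)
  entries.each_with_index
         .select { |e, _| e.test.include?("SegmentReplication") }
         .sort_by { |e, i| [-e.count, i] } # ties keep report order
         .first(n)
         .map(&:first)
end
```

Because ties keep report order, running `segrep_top` with `n = 5` over the complete report below should select the same five tests, in the same order, as the first five lines of the list above.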

Complete flaky report

➜  ~ ruby ~/Desktop/flaky-finder.rb --s 24000 --e 25495
Will crawl builds from 24000 to 25495
------------------
20 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all} (24059,24109,24179,24231,24515,24756,24868,24923,25023,25032,25089,25121,25126,25186,25208,25214,25240,25245,25292,25332)
18 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards (24106,24106,24108,24109,24184,24369,24515,24548,24637,24867,25089,25104,25159,25225,25240,25321,25321,25456)
16 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.classMethod (24109,24109,24313,24313,24461,24461,24470,24470,24515,24515,24544,24544,24587,24587,24728,24728)
15 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled (24096,24274,24443,24568,24676,24785,24873,24999,25044,25089,25202,25227,25275,25337,25455)
12 org.opensearch.cluster.routing.MovePrimaryFirstTests.classMethod (24214,24263,24555,24833,25117,25235,25247,25276,25368,25458,25467,25471)
12 org.opensearch.cluster.routing.MovePrimaryFirstTests.testClusterGreenAfterPartialRelocation (24214,24263,24555,24833,25117,25235,25247,25276,25368,25458,25467,25471)
11 org.opensearch.remotestore.RemoteStoreStatsIT.testStatsResponseFromLocalNode (24366,24470,24477,24740,25111,25114,25132,25168,25351,25456,25456)
10 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled (24057,24231,24367,24526,24617,24804,25094,25327,25434,25469)
8 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testReplicaPromotion (24109,24305,24313,24461,24470,24544,24587,24728)
8 org.opensearch.remotestore.RemoteStoreStatsIT.testStatsOnRemoteStoreRestore (24049,24092,24266,24310,24315,24500,25372,25472)
8 org.opensearch.client.PitIT.testDeleteAllAndListAllPits (24051,24149,24254,24354,24900,25165,25265,25429)
7 org.opensearch.snapshots.CloneSnapshotIT.testCloneAfterRepoShallowSettingDisabled (24051,24307,24605,25163,25271,25370,25373)
7 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testContainerCreationAndDeletion (24459,25043,25126,25252,25261,25401,25410)
7 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue (24048,24547,24547,24568,25209,25271,25415)
7 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness (24205,24223,24241,24515,25248,25260,25373)
6 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadNonexistentBlobThrowsNoSuchFileException (24090,24224,25006,25190,25230,25285)
6 org.opensearch.search.SearchTimeoutIT.testSimpleTimeout {p0={"search.concurrent_segment_search.enabled":"false"}} (24037,24274,24693,25260,25260,25332)
6 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled (24132,24946,25379,25379,25400,25424)
6 org.opensearch.index.shard.RemoteIndexShardTests.testRepicaCleansUpOldCommitsWhenReceivingNew (24149,24200,24943,25229,25364,25473)
6 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteBlobWithRetries (24184,24184,24787,25286,25286,25449)
5 org.opensearch.remotestore.RemoteStoreStatsIT.testStatsResponseAllShards (24240,24254,24357,24581,24997)
5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDeleteOperations (24184,24787,24847,25220,25478)
5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testCancelPrimaryAllocation (24228,24358,24612,24686,25111)
5 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testMultipleShards (24057,24205,24358,25310,25397)
5 org.opensearch.index.ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting (24964,25066,25235,25247,25433)
5 org.opensearch.remotestore.RemoteStoreStatsIT.testDownloadStatsCorrectnessSinglePrimaryMultipleReplicaShards (24165,24363,24461,24491,25409)
4 org.opensearch.snapshots.CloneSnapshotIT.testShallowCloneNameAvailability (24255,24370,24480,24662)
4 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob (24337,25223,25392,25493)
4 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod (24314,24314,25349,25349)
4 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testTrackerData (24841,24955,25184,25184)
4 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryStopped_ReplicaPromoted (24139,24530,24751,24937)
4 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries (24479,25192,25192,25273)
4 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation (24158,25031,25325,25325)
4 org.opensearch.search.SearchWeightedRoutingIT.testShardRoutingWithNetworkDisruption_FailOpenEnabled (24190,24194,25043,25238)
3 org.opensearch.remotestore.RemoteStoreStatsIT.testDownloadStatsCorrectnessSinglePrimarySingleReplica (24358,24662,24968)
3 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary (24087,24190,24978)
3 org.opensearch.indices.replication.SegmentReplicationIT.classMethod (24144,25260,25274)
3 org.opensearch.index.ShardIndexingPressureIT.testShardIndexingPressureTrackingDuringBulkWrites (24315,24676,24682)
3 org.opensearch.search.pit.DeletePitMultiNodeIT.testDeleteWhileSearch (24357,24994,25293)
3 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation (24452,24674,25006)
3 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh (24453,24567,25031)
3 org.opensearch.index.ShardIndexingPressureSettingsIT.classMethod (25066,25235,25247)
3 org.opensearch.index.shard.RemoteIndexShardTests.classMethod (25433,25433,25433)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles (25043,25125)
2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown (24314,25349)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testUpdateOperations (25321,25372)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testIndicesDeletedFromRepository (24353,25043)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.classMethod (24353,24353)
2 org.opensearch.search.basic.SearchWithRandomIOExceptionsIT.classMethod (24057,25272)
2 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries (24366,25261)
2 org.opensearch.search.basic.SearchWithRandomIOExceptionsIT.testRandomDirectoryIOExceptions (24057,25272)
2 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=pit/10_basic/Delete all} (24598,25163)
2 org.opensearch.search.SearchWeightedRoutingIT.testStrictWeightedRoutingWithCustomString_FailOpenEnabled (24637,25003)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testDeleteBlobs (24771,25043)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats (24804,25181)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testWriteRead (24049,25043)
2 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testReadNonExistingPath (24961,25043)
2 org.opensearch.index.IndexServiceTests.testAsyncTranslogTrimTaskOnClosedIndex (24962,25472)
2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testPrimaryReceivesDocsDuringReplicaRecovery (25034,25433)
2 org.opensearch.search.SearchTimeoutIT.testSimpleTimeout {p0={"search.concurrent_segment_search.enabled":"true"}} (24224,25260)
2 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testAfterCommit (24250,25204)
1 org.opensearch.index.shard.RemoteIndexShardTests.testSegmentInfosAndReplicationCheckpointTuple (25433)
1 org.opensearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently (24051)
1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringFetchPhase (24480)
1 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testRefreshAfterCommit (24515)
1 org.opensearch.action.admin.indices.create.CreateIndexIT.testRestartIndexCreationAfterFullClusterRestart (24548)
1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testList (25043)
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testReplicationAfterPrimaryRefreshAndFlush (24617)
1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotAndRestore (25043)
1 org.opensearch.snapshots.CloneSnapshotIT.testCloneShallowSnapshotIndex (25442)
1 org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout (25194)
1 org.opensearch.indices.replication.SegmentReplicationAllocationIT.testAllocationWithDisruption (24223)
1 org.opensearch.remotestore.RemoteStoreStatsIT.classMethod (24310)
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication (24311)
1 org.opensearch.cluster.ClusterHealthIT.testHealthOnClusterManagerFailover (24313)
1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure (24241)
1 org.opensearch.index.reindex.DeleteByQueryBasicTests.testDeleteByQueryWithMultipleIndices (24087)
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testReplicaHasDiffFilesThanPrimary (24962)
1 org.opensearch.remotestore.RemoteStoreStatsIT.testNonZeroPrimaryStatsOnNewlyCreatedIndexWithZeroDocs (24315)
1 org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshRefresh (24061)
1 org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles (24223)
1 org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit (24037)
1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringQueryPhase (25323)
1 org.opensearch.index.reindex.UpdateByQueryBasicTests.testMultipleSources (25003)
1 org.opensearch.cluster.routing.allocation.IndexShardHotSpotTests.classMethod (24214)
1 org.opensearch.cluster.routing.allocation.IndexShardHotSpotTests.testClusterScaleInWithSkew (24214)
1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/20_terms/string profiler via global ordinals} (24370)
1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.classMethod (25375)
1 org.opensearch.action.admin.indices.create.CreateIndexIT.classMethod (24051)
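Several counts in this report include the same build ID more than once: ClusterRerouteIT.testDelayWithALargeAmountOfShards lists 24106 and 25321 twice, and every build for RemoteStoreRefreshListenerTests.classMethod appears twice. Presumably a single gradle check run reported the failure more than once (an assumption; the report does not say), so a count can overstate how many CI runs were affected. A minimal sketch, using a hypothetical `failures_vs_builds` helper, that compares the two numbers for one report line:

```ruby
# Hypothetical helper: returns [failures, distinct_builds] for one report
# line, assuming the count equals the length of the trailing "(id,id,...)"
# list, as it does for the lines in this report.
def failures_vs_builds(line)
  match = line.scan(/\((\d+(?:,\d+)*)/).last # last build-ID list on the line
  ids = match ? match.first.split(",") : []
  [ids.size, ids.uniq.size] # duplicates collapsed by uniq
end
```

For the RemoteStoreRefreshListenerTests.classMethod line it returns `[16, 8]`, i.e. 16 reported failures across 8 distinct builds.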

@anasalkouz
Member

Do we need to keep tracking them here? It seems they failed very few times, and I don't think they should be considered high priority.
I suggest closing this issue, since we have closed all the major flaky tests. We can track these low-priority issues separately.
@dreamer-89, do you see any of these flaky tests as concerning and needing immediate attention?

@dreamer-89
Member Author

Do we need to keep tracking them here? seems they failed very few times. I don't think those should be considered as high priority. I do suggest close this issue, since we close all major flaky tests. We can track these low priority issues separately. @dreamer-89 do you see any of these flaky tests are concerning and need an immediate attention?

Thanks @anasalkouz for the suggestion. Yes, failures have dropped drastically compared to the previous run, and we can track these issues separately given the lower failure rate. I did a quick scan and observed doc-count mismatch assertion trips in all cases except one, #10029, where the relocating primary shard continues to perform a round of segment replication while no longer in primary mode (this is a case of node-to-node communication). This is not problematic, because the replica should start replication with the new primary and cancel the ongoing replication with the older primary. The failures with doc-count mismatches need a deeper dive!
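The expected replica behavior described in the comment above (cancel the in-flight round with the old primary, then start over with the new one) can be sketched as a small state machine. This is a hypothetical model, not OpenSearch's segment replication code: `ReplicaTracker`, `on_checkpoint`, `on_complete`, and the string primary IDs are illustrative names, and skipping a newer checkpoint from the same primary while a round is in flight is an assumption made only for this sketch.

```ruby
# Hypothetical sketch, not OpenSearch code: a replica keeps at most one
# in-flight replication round and abandons it when a checkpoint arrives
# from a different primary (e.g. after primary relocation).
Replication = Struct.new(:primary_id, :checkpoint, :cancelled)

class ReplicaTracker
  attr_reader :ongoing, :cancelled_log

  def initialize
    @ongoing = nil        # the single in-flight round, if any
    @cancelled_log = []   # rounds abandoned because the primary changed
  end

  # Called when the replica learns of a new checkpoint published by primary_id.
  def on_checkpoint(primary_id, checkpoint)
    if @ongoing
      # Same primary: let the in-flight round finish; a newer checkpoint
      # would be picked up by a later round (assumption for this sketch).
      return :skipped if @ongoing.primary_id == primary_id

      # Different primary: the old source is no longer authoritative.
      @ongoing.cancelled = true
      @cancelled_log << @ongoing
    end
    @ongoing = Replication.new(primary_id, checkpoint, false)
    :started
  end

  # Called when the in-flight round finishes successfully.
  def on_complete
    @ongoing = nil
  end
end
```

In this model a stale round from the relocated primary is never completed once the new primary's checkpoint arrives, which is the property the comment relies on.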

As suggested, I am closing this issue and flaky tests can be tracked separately.

@dblock
Member

dblock commented Oct 2, 2023

I hit one from the list above again; opened #10303.

Projects
Status: Done

10 participants