
[BUG] Evaluate and update all generation fields in SegmentCommitInfo during primary promotion #5701

Closed
dreamer-89 opened this issue Jan 4, 2023 · 11 comments
Labels: bug (Something isn't working), distributed framework

Comments

@dreamer-89
Member

Coming from #4365 (comment), we need to take another look at the different SegmentCommitInfo generations and update them if needed. As part of this bug, identify the use of the different generation fields in SegmentCommitInfo and bump them as well where necessary.

@Poojita-Raj
Contributor

Looking into it

@mch2 mch2 assigned mch2 and Poojita-Raj and unassigned mch2 Jan 11, 2023
@mhoffm-aiven

Hey, I'm not sure if it's related, but I have been running an experiment with segment replication on an OpenSearch 2.4.1 cluster with 9 nodes:

OpenSearchException[Segment Replication failed]; nested: IllegalStateException[Shard [opensearch_segment_replication_test_index-2023-01-26][40] has local copies of segments that differ from the primary]

I started facing it after adding 6 nodes to the cluster (to have 15 in total) and restarting the OpenSearch processes on all nodes to add some static configuration. Since then, the cluster has been performing relocations and, I suppose, failing due to this error. All of this was performed under production load.

@dreamer-89
Member Author

dreamer-89 commented Jan 27, 2023

Hey, I'm not sure if it's related, but I have been running an experiment with segment replication on an OpenSearch 2.4.1 cluster with 9 nodes:

OpenSearchException[Segment Replication failed]; nested: IllegalStateException[Shard [opensearch_segment_replication_test_index-2023-01-26][40] has local copies of segments that differ from the primary]

I started facing it after adding 6 nodes to the cluster (to have 15 in total) and restarting the OpenSearch processes on all nodes to add some static configuration. Since then, the cluster has been performing relocations and, I suppose, failing due to this error. All of this was performed under production load.

Thank you @mhoffm-aiven for sharing this issue. Reports like this are really important for making the feature more robust under production workloads. So, thank you again!

To reproduce this issue, can you please share the cluster configuration, index settings, type of operations (index/update/delete), and workload size?

@mhoffm-aiven

Hey @dreamer-89,

sorry, I already deleted the cluster. From memory, though, the cluster configuration that differed from our default was

cluster.routing.allocation.cluster_concurrent_rebalance = 16
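
For reference, this is a dynamic cluster setting, so it can also be applied through the cluster settings API rather than static node configuration; a minimal sketch (the value 16 simply mirrors the configuration above):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 16
  }
}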

The indexes were created using this template:

PUT /_index_template/segment_replication_test
{
  "index_patterns": [
    "opensearch_segment_replication_test_index*"
  ],
  "template": {
    "settings": {
      "replication.type": "SEGMENT"
    }
  }
}

using these settings:

    "settings": {
        "index": {
            "refresh_interval": "5s",
            "sort.field": "timestamp",
            "sort.order": "asc",
        },
        "number_of_replicas": 1,
        "number_of_shards": 52,
     },

I noticed that the service was relocating shards constantly after adding 6 nodes and restarting the cluster by accident (I had mistakenly configured cluster.routing.allocation.cluster_concurrent_rebalance as a static config). Checking after a while (2 days) why it was not done relocating, I saw that this bug first appeared pretty close to when I messed with the cluster by adding nodes and restarting.

The workload was _bulk with only index operations, and occasional deletion of old indexes.

@nknize
Collaborator

nknize commented Feb 2, 2023

The workload was _bulk with only index operations, and occasional deletion of old indexes.

@mhoffm-aiven, were _bulk index and "occasional index deletes" happening concurrently?

Curious, what was the total number of documents?

...occasional deletion of old indexes.

Old segrep indexes? Or just old random indexes?

Also, I noticed these are time based indexes using index sorting. @dreamer-89, do we have a test for this configuration?

There is also a ShuffleForcedMergePolicy that interleaves old and new segments in a new combined segment to make searches over old and recent docs more efficient for time-based sorted indexes. I'm sure you weren't using it, but I'm making a note that we should add a segrep test using this merge policy to ensure no unexpected re-allocations.
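
For reference, a rough sketch of how such a policy could be wired into an IndexWriterConfig when writing that test; ShuffleForcedMergePolicy ships in Lucene's misc module, and the exact package/import shown here is an assumption:

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.ShuffleForcedMergePolicy; // lives in the lucene-misc module; package is an assumption
import org.apache.lucene.index.TieredMergePolicy;

public class ShuffleMergePolicyExample {
    // Build a writer config whose forced merges interleave old and new segments,
    // as described above for time-based sorted indexes.
    public static IndexWriterConfig newConfig() {
        IndexWriterConfig config = new IndexWriterConfig();
        config.setMergePolicy(new ShuffleForcedMergePolicy(new TieredMergePolicy()));
        return config;
    }
}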

@dreamer-89
Member Author

Also, I noticed these are time based indexes using index sorting. @dreamer-89, do we have a test for this configuration?

Thank you @nknize for the comment. No, there is no test for this configuration. We will add one, thanks for bringing this up.

Regarding the issue: I suspect it happens because of shard re-balancing after the addition of new nodes. Adding new nodes results in shard relocation, where segment files can conflict after primary promotion, which is a known issue. The fix for that issue recently merged into 2.x but did not make it into 2.4.1. I am trying to write an integration test that mimics this scenario to confirm.

@mhoffm-aiven

Hey @nknize,

There were _bulk index operations happening concurrently, but not against the indexes that were to be deleted (those had had no new documents for 2 days); yes, the deleted indexes were old segrep indexes.

@nknize
Collaborator

nknize commented Feb 3, 2023

@mhoffm-aiven this is helpful. Do you have any data characteristics we could use to repro a scenario w/ a similar synthetic test data set? Number of docs, number of fields? Did you change any of the default _bulk parameters?

and restarting the cluster by accident

Did you perform a full cluster or rolling restart?

@mhoffm-aiven

Hey @nknize,

Sorry for the late response, I was a bit swamped. I checked: no _bulk parameters were changed. The _bulk payloads are one megabyte in size; I don't know how many documents that is, and it varies since these are log lines. I collected them into one-megabyte payloads and then flushed by issuing the accumulated _bulk request. The mapping is basically a message text field and a couple of auxiliary keyword fields. Throughput as a whole was roughly 50-60 MB/s.
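
For anyone trying to reproduce this, an accumulated _bulk payload of that shape might look roughly like the following (the index name, field names, and values here are hypothetical, inferred from the description above):

POST /opensearch_segment_replication_test_index-2023-01-26/_bulk
{ "index": {} }
{ "timestamp": "2023-01-26T12:00:00Z", "message": "example log line", "host": "node-1", "level": "INFO" }
{ "index": {} }
{ "timestamp": "2023-01-26T12:00:01Z", "message": "another log line", "host": "node-2", "level": "WARN" }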

I did restart all OpenSearch processes (by accident) while in the process of draining and adding nodes, IIRC.

Do you need any more detail?

@mch2
Member

mch2 commented Feb 23, 2023

We've since added better logging to print the segments in conflict when this is caught again.

In the meantime, I have done some digging on these gen fields.

SegmentCommitInfo#fieldInfosGen - this file name is derived from segmentInfo.name, which is bumped with the counter. - link

SegmentCommitInfo#docValuesGen - also derived from segmentInfo.name. - link

SegmentCommitInfo#delGen - yes, the .liv file name is derived from the delGen, and this is an issue. Our InternalEngine only uses soft deletes via indexWriter#softUpdateDocument, and in my unit tests I'm not able to generate .liv files, so I will take a closer look here... The other issue is that advancing this counter does not look possible, given it is package-private in SegmentCommitInfos, nor do I think it is wise to depend on this behavior going forward.
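
For context, these per-segment generations can be read straight off the latest commit point with Lucene's public API; a minimal inspection sketch (the directory path is a placeholder):

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class GenDump {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a shard's Lucene index directory (placeholder).
        try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
            SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
            for (SegmentCommitInfo sci : infos) {
                // Print the generation fields discussed above; -1 means the segment has
                // no live-docs / field-infos / doc-values updates for that generation yet.
                System.out.printf("%s delGen=%d fieldInfosGen=%d docValuesGen=%d%n",
                    sci.info.name, sci.getDelGen(), sci.getFieldInfosGen(), sci.getDocValuesGen());
            }
        }
    }
}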

I will take a look at #6369 in an effort to better select a new primary. The original counter increase was an extra guardrail to prevent replicas that are behind from getting selected as primary and re-writing segments under the same name. We could easily identify this case when syncing replicas by considering the primary term, but that would clobber the file and break ongoing queries.

@Poojita-Raj
Contributor

The .liv files are named using the delGen values. However, in our current implementation, InternalEngine only uses soft deletes, so we would not be creating any .liv files that would have to be copied over to the replica.

Since there are no .liv files, this scenario does not present a problem in the current implementation, so I am closing this issue. We can revisit it if the implementation of deletes changes.
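
For illustration, this is roughly what the soft-delete path looks like at the Lucene level: updates go through IndexWriter#softUpdateDocument against a configured soft-deletes field, so the superseded document is marked via doc values rather than a .liv file. A minimal sketch (the field names and document content are placeholders, not the engine's actual configuration):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;

public class SoftDeleteExample {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig()
            .setSoftDeletesField("__soft_deletes"); // placeholder soft-deletes field name
        try (IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(), config)) {
            Document doc = new Document();
            doc.add(new StringField("_id", "1", Field.Store.YES));
            writer.addDocument(doc);

            // Soft-update: the previous version of _id=1 is marked deleted by writing a
            // doc-values entry into the soft-deletes field, not by generating a .liv file.
            Document newDoc = new Document();
            newDoc.add(new StringField("_id", "1", Field.Store.YES));
            writer.softUpdateDocument(new Term("_id", "1"), newDoc,
                new NumericDocValuesField("__soft_deletes", 1));
            writer.commit();
        }
    }
}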
