
[BUG] Evaluate and update all generation fields in SegmentCommitInfo during primary promotion #5701

Closed
dreamer-89 opened this issue Jan 4, 2023 · 11 comments
Labels: bug (Something isn't working), distributed framework

Comments

@dreamer-89
Member

Coming from #4365 (comment), we need to take another look at the different SegmentCommitInfo generations and update them if needed. As part of this bug, identify the use of the different generation fields in SegmentCommitInfo and bump them as well where necessary.

@Poojita-Raj
Contributor

Looking into it

@mch2 mch2 assigned mch2 and Poojita-Raj and unassigned mch2 Jan 11, 2023
@mhoffm-aiven

Hey, I'm not sure if it's related, but I have been running an experiment with segment replication on an OpenSearch 2.4.1 cluster with 9 nodes:

OpenSearchException[Segment Replication failed]; nested: IllegalStateException[Shard [opensearch_segment_replication_test_index-2023-01-26][40] has local copies of segments that differ from the primary]

I started facing it after adding 6 nodes to the cluster (to have 15 in total) and restarting the OpenSearch processes on all nodes to add some static configuration. Since then, the cluster has been performing relocations and, I suppose, failing due to this error. All of this was performed under production load.

@dreamer-89
Member Author

dreamer-89 commented Jan 27, 2023

Hey, I'm not sure if it's related, but I have been running an experiment with segment replication on an OpenSearch 2.4.1 cluster with 9 nodes:

OpenSearchException[Segment Replication failed]; nested: IllegalStateException[Shard [opensearch_segment_replication_test_index-2023-01-26][40] has local copies of segments that differ from the primary]

I started facing it after adding 6 nodes to the cluster (to have 15 in total) and restarting the OpenSearch processes on all nodes to add some static configuration. Since then, the cluster has been performing relocations and, I suppose, failing due to this error. All of this was performed under production load.

Thank you @mhoffm-aiven for sharing this issue. Reports like this are really important for making the feature more robust under production workloads. So, thank you again!

To reproduce this issue, can you please share the cluster configuration, index settings, type of operations (index/update/delete), and workload size?

@mhoffm-aiven

Hey @dreamer-89,

sorry, I already deleted the cluster. From memory, though, the cluster configuration that differed from our default was

cluster.routing.allocation.cluster_concurrent_rebalance = 16
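
For reference, this is a dynamic cluster setting, so it can also be applied through the cluster settings API rather than static node configuration; a minimal sketch (the value 16 simply mirrors the configuration above):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 16
  }
}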

The indexes were created using this template:

PUT /_index_template/segment_replication_test
{
  "index_patterns": [
    "opensearch_segment_replication_test_index*"
  ],
  "template": {
    "settings": {
      "replication.type": "SEGMENT"
    }
  }
}

using these settings:

    "settings": {
        "index": {
            "refresh_interval": "5s",
            "sort.field": "timestamp",
            "sort.order": "asc",
        },
        "number_of_replicas": 1,
        "number_of_shards": 52,
     },

I noticed that the service was relocating shards constantly after adding 6 nodes and restarting the cluster by accident (I had mistakenly configured cluster.routing.allocation.cluster_concurrent_rebalance as a static config). Checking after a while (2 days) why it was not done relocating, I saw that this bug first appeared pretty close to when I messed with the cluster by adding nodes and restarting.

The workload was _bulk with only index operations, and occasional deletion of old indexes.

@nknize
Collaborator

nknize commented Feb 2, 2023

The workload was _bulk with only index operations, and occasional deletion of old indexes.

@mhoffm-aiven, were _bulk index and "occasional index deletes" happening concurrently?

Curious, what was the total number of documents?

...occasional deletion of old indexes.

Old segrep indexes? Or just old random indexes?

Also, I noticed these are time based indexes using index sorting. @dreamer-89, do we have a test for this configuration?

There is also a ShuffleForcedMergePolicy that interleaves old and new segments in a new combined segment to make searches over old and recent docs more efficient for time-based sorted indexes. I'm sure you weren't using it, but I'm making a note that we should add a segrep test using this merge policy to ensure no unexpected re-allocations.
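
For reference, a rough sketch of how such a policy could be wired into an IndexWriterConfig when writing that test; ShuffleForcedMergePolicy ships in Lucene's misc module, and the exact package/import shown here is an assumption:

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.ShuffleForcedMergePolicy; // lives in the lucene-misc module; package is an assumption
import org.apache.lucene.index.TieredMergePolicy;

public class ShuffleMergePolicyExample {
    // Build a writer config whose forced merges interleave old and new segments,
    // as described above for time-based sorted indexes.
    public static IndexWriterConfig newConfig() {
        IndexWriterConfig config = new IndexWriterConfig();
        config.setMergePolicy(new ShuffleForcedMergePolicy(new TieredMergePolicy()));
        return config;
    }
}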

@dreamer-89
Member Author

Also, I noticed these are time based indexes using index sorting. @dreamer-89, do we have a test for this configuration?

Thank you @nknize for the comment. No, there is no test for this configuration. We will add one, thanks for bringing this up.

Regarding the issue: I suspect it happens because of shard re-balancing after the addition of new nodes. Adding new nodes results in shard relocation, where segment files can conflict after primary promotion, which is a known issue. The fix for that issue recently merged into 2.x but did not make it into 2.4.1. I am trying to write an integration test that mimics this scenario to confirm.

@mhoffm-aiven

Hey @nknize,

There were _bulk index operations happening concurrently, but not against the indexes that were to be deleted (those had had no new documents for 2 days); yes, the deleted indexes were old segrep indexes.

@nknize
Collaborator

nknize commented Feb 3, 2023

@mhoffm-aiven this is helpful. Do you have any data characteristics we could use to repro a scenario w/ a similar synthetic test data set? Number of docs, number of fields? Did you change any of the default _bulk parameters?

and restarting the cluster by accident

Did you perform a full cluster or rolling restart?

@mhoffm-aiven

Hey @nknize,

Sorry for the late response, I was a bit swamped. I checked: no _bulk parameters were changed. The _bulk payloads are one megabyte in size; I don't know how many documents that is, and it varies since these are log lines. I collected them into one-megabyte payloads and then flushed by issuing the accumulated _bulk request. The mapping is basically a message text field and a couple of auxiliary keyword fields. Throughput as a whole was roughly 50-60 MB/s.
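
For anyone trying to reproduce this, an accumulated _bulk payload of that shape might look roughly like the following (the index name, field names, and values here are hypothetical, inferred from the description above):

POST /opensearch_segment_replication_test_index-2023-01-26/_bulk
{ "index": {} }
{ "timestamp": "2023-01-26T12:00:00Z", "message": "example log line", "host": "node-1", "level": "INFO" }
{ "index": {} }
{ "timestamp": "2023-01-26T12:00:01Z", "message": "another log line", "host": "node-2", "level": "WARN" }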

I did restart all OpenSearch processes (by accident) while in the process of draining and adding nodes, IIRC.

Do you need any more detail?

@mch2
Member

mch2 commented Feb 23, 2023

We've since added better logging to print the segments in conflict when this is caught again.

In the meantime, I have done some digging on these gen fields.

SegmentCommitInfo#fieldInfosGen - this file name is derived from segmentInfo.name, which is bumped with the counter. - link

SegmentCommitInfo#docValuesGen - also derived from segmentInfo.name. - link

SegmentCommitInfo#delGen - yes, the .liv file name is derived from the delGen, and this is an issue. Our InternalEngine only uses soft deletes via indexWriter#softUpdateDocument, and in my unit tests I'm not able to generate .liv files, so I will take a closer look here... The other issue is that advancing this counter does not look possible, given it is package-private in SegmentCommitInfos, nor do I think it is wise to depend on this behavior going forward.
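
For context, these per-segment generations can be read straight off the latest commit point with Lucene's public API; a minimal inspection sketch (the directory path is a placeholder):

import java.nio.file.Paths;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class GenDump {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a shard's Lucene index directory (placeholder).
        try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
            SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
            for (SegmentCommitInfo sci : infos) {
                // Print the generation fields discussed above; -1 means the segment has
                // no live-docs / field-infos / doc-values updates for that generation yet.
                System.out.printf("%s delGen=%d fieldInfosGen=%d docValuesGen=%d%n",
                    sci.info.name, sci.getDelGen(), sci.getFieldInfosGen(), sci.getDocValuesGen());
            }
        }
    }
}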

I will take a look at #6369 in an effort to better select a new primary. The original counter increase was an extra guardrail to prevent replicas that are behind from getting selected as primary and re-writing segments under the same name. We could easily identify this case when syncing replicas by considering the primary term, but that would clobber the file and break ongoing queries.

@Poojita-Raj
Contributor

The .liv files are named using the delGen values. However, in our current implementation, InternalEngine only uses soft deletes, so we would not be creating any .liv files that would have to be copied over to the replica.

Since there are no .liv files, this scenario does not present a problem in the current implementation, so I am closing this issue. We can revisit it if the implementation of deletes changes.
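
For illustration, this is roughly what the soft-delete path looks like at the Lucene level: updates go through IndexWriter#softUpdateDocument against a configured soft-deletes field, so the superseded document is marked via doc values rather than a .liv file. A minimal sketch (the field names and document content are placeholders, not the engine's actual configuration):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;

public class SoftDeleteExample {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig()
            .setSoftDeletesField("__soft_deletes"); // placeholder soft-deletes field name
        try (IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(), config)) {
            Document doc = new Document();
            doc.add(new StringField("_id", "1", Field.Store.YES));
            writer.addDocument(doc);

            // Soft-update: the previous version of _id=1 is marked deleted by writing a
            // doc-values entry into the soft-deletes field, not by generating a .liv file.
            Document newDoc = new Document();
            newDoc.add(new StringField("_id", "1", Field.Store.YES));
            writer.softUpdateDocument(new Term("_id", "1"), newDoc,
                new NumericDocValuesField("__soft_deletes", 1));
            writer.commit();
        }
    }
}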
