Incremental snapshot not working correctly after running forcemerge. #102395
Labels

- `>bug`
- `:Distributed/Engine` (anything around managing Lucene and the Translog in an open shard)
- `Team:Distributed` (meta label for distributed team)
Elasticsearch Version
Version: 8.7.1, Build: rpm/f229ed3f893a515d590d0f39b05f68913e2d9b53/2023-04-27T04:33:42.127815583Z, JVM: 20.0.1
Installed Plugins
discovery-ec2
Java Version
bundled
OS Version
Linux 6.1.59-84.139.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Oct 24 20:57:25 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
We had an index containing many deleted documents (`docs.deleted`). To clean up the index, it was restored on a cluster where no reads or writes were happening on it. We then ran a `_forcemerge` with `only_expunge_deletes` set to true on the index to optimize it. Before the force merge operation started, SLM took one snapshot of the index to S3.

The force merge optimized the index and reduced the number of segments from 5k to 1.5k. However, when we took a fresh snapshot of the index and restored it, the restored index did not match the state after the force merge; it still reflected the state from before the force merge operation. On checking the snapshot details, we also observed that the incremental snapshot taken after the force merge finished in 800 ms, despite the fact that all underlying segment files had changed.
Steps to Reproduce

1. Set the `index.merge.policy.expunge_deletes_allowed` setting to 1.
2. Run `_forcemerge` on the index with `only_expunge_deletes` set to true.
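The steps above can be sketched as the following API calls (the index name `my-index` is a hypothetical placeholder for the affected index):

```console
PUT /my-index/_settings
{
  "index.merge.policy.expunge_deletes_allowed": 1
}

POST /my-index/_forcemerge?only_expunge_deletes=true
```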
Logs (if relevant)
No response