
ES shard allocation bug #110826

Closed
LuPan92 opened this issue Jul 12, 2024 · 6 comments
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team

Comments


LuPan92 commented Jul 12, 2024

Elasticsearch Version

Version: 7.17.18, Build: default/tar/8682172c2130b9a411b1bd5ff37c9792367de6b0/2024-02-02T12:04:59.691750271Z, JVM: 11.0.20

Installed Plugins

No response

Java Version

11.0.20

OS Version

Linux bsa5295 3.10.0-1160.108.1.el7.x86_64 #1 SMP Thu Jan 25 16:17:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When path.data on an ES data node lists more than 20 paths, all shards of the same index are allocated to a single path. This skews disk I/O when writing.

Steps to Reproduce

My test steps are as follows

  1. elasticsearch.yml
cluster.name: ISOP_1720490318878
http.port: 19399
network.host: bsa5295
node.name: bsa5295
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
node.master: true
node.data: true
path.logs: /home/worker/elasticsearch/logs
path.data: /home/sdf/elasticsearch/data,/home/sdg/elasticsearch/data,/home/sdh/elasticsearch/data,/home/sdi/elasticsearch/data,/home/sdb/elasticsearch/data,/home/sdc/elasticsearch/data,/home/sdd/elasticsearch/data,/home/sde/elasticsearch/data,/home/sdj/elasticsearch/data,/home/sdk/elasticsearch/data,/home/sdf/elasticsearch_1/data,/home/sdg/elasticsearch_1/data,/home/sdh/elasticsearch_1/data,/home/sdi/elasticsearch_1/data,/home/sdb/elasticsearch_1/data,/home/sdc/elasticsearch_1/data,/home/sdd/elasticsearch_1/data,/home/sde/elasticsearch_1/data,/home/sdj/elasticsearch_1/data,/home/sdk/elasticsearch_1/data,/home/sdf/elasticsearch_2/data,/home/sdg/elasticsearch_2/data
transport.tcp.port: 9300
gateway.expected_nodes: 1
action.auto_create_index: .watches,.triggered_watches,.watcher-history-*,.kibana*,.security,.monitoring*
discovery.seed_hosts: [bsa5295]
cluster.initial_master_nodes: [bsa5295]
thread_pool.write.queue_size: 2000
indices.recovery.max_bytes_per_sec: 200mb
cluster.routing.allocation.node_concurrent_recoveries: 10
cluster.max_shards_per_node: 5000
cluster.routing.allocation.same_shard.host: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.fault_detection.follower_check.timeout: 180s
cluster.fault_detection.follower_check.retry_count: 10
cluster.fault_detection.follower_check.interval: 10s
cluster.publish.timeout: 1800s
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 10%
xpack.ml.enabled: false
cluster.election.duration: 30s
cluster.join.timeout: 360s
node.processors: 80
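For reference, the path.data value above can be sanity-checked with a quick one-liner; it lists 22 paths, which is over the 20-path threshold described in the report (the list below is copied verbatim from the config):

```shell
# Count the entries in the path.data value from elasticsearch.yml above.
PATHS="/home/sdf/elasticsearch/data,/home/sdg/elasticsearch/data,/home/sdh/elasticsearch/data,/home/sdi/elasticsearch/data,/home/sdb/elasticsearch/data,/home/sdc/elasticsearch/data,/home/sdd/elasticsearch/data,/home/sde/elasticsearch/data,/home/sdj/elasticsearch/data,/home/sdk/elasticsearch/data,/home/sdf/elasticsearch_1/data,/home/sdg/elasticsearch_1/data,/home/sdh/elasticsearch_1/data,/home/sdi/elasticsearch_1/data,/home/sdb/elasticsearch_1/data,/home/sdc/elasticsearch_1/data,/home/sdd/elasticsearch_1/data,/home/sde/elasticsearch_1/data,/home/sdj/elasticsearch_1/data,/home/sdk/elasticsearch_1/data,/home/sdf/elasticsearch_2/data,/home/sdg/elasticsearch_2/data"
echo "$PATHS" | tr ',' '\n' | wc -l   # -> 22
```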
  2. Create index my_index1
curl -X PUT "bsa5295:19399/my_index1" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 25,
    "number_of_replicas": 0
  }
}'
  3. View the index UUID
[worker@bsa5295 ~]$ curl bsa5295:19399/_cat/indices | grep my_index1
green open  my_index1                                 fI4auV0lRtmxeYN8XrXf8g 25 0         0      0   5.5kb   5.5kb
  4. View the path corresponding to each shard
    [Screenshot (WeChat Work): per-shard data-path listing]

  5. You can see that all shards are allocated under /home/sdj/elasticsearch

  6. Expected behavior:

  • When path.data on the data node is configured with multiple paths, all shards of a single index should be distributed nearly evenly across the paths.
  • Instead, almost all shards end up on the same path, which is not what we expect: when writing to and querying the index, only a small fraction of the disks' I/O capacity can be used at any one time.
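One way to check where each shard of an index lives on disk is to search every path.data entry for the index UUID reported by _cat/indices; on 7.x, each data path stores shard data under nodes/0/indices/&lt;index-uuid&gt;/&lt;shard-id&gt; (layout assumed from the 7.x line, verify on your build). The sketch below builds a mock two-path layout under /tmp so it runs anywhere; on a real node you would point find at the real path.data directories instead:

```shell
UUID=fI4auV0lRtmxeYN8XrXf8g   # from the _cat/indices output above

# Mock layout standing in for two real path.data entries:
for p in /tmp/espaths/sdj/elasticsearch/data /tmp/espaths/sdk/elasticsearch/data; do
  mkdir -p "$p/nodes/0/indices/$UUID/0/index"
done

# On a real node, replace /tmp/espaths with the path.data directories:
find /tmp/espaths -type d -path "*/nodes/0/indices/$UUID/*" -name index | sort
```

Each line of output is one shard copy; if every line starts with the same mount point, all shards share one disk.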

Logs (if relevant)

No response
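As an illustration of how this kind of skew can arise in general (this is a mock of the failure mode, not Elasticsearch's actual path-selection code): if every new shard is placed on the path with the most free space, and the free-space figures are not adjusted between picks (new shards start out empty), then all 25 shards of the repro index land on one path, whereas a round-robin policy would spread them:

```shell
# Illustration only -- not Elasticsearch's actual path-selection algorithm.
# Mock free-space figures (GB) for ten data paths; sdk has the most.
paths="sdb:500 sdc:501 sdd:502 sde:503 sdf:504 sdg:505 sdh:506 sdi:507 sdj:508 sdk:509"

# Greedy pick: the path with the most free space wins every single time,
# because the numbers are never updated between picks.
best=$(echo "$paths" | tr ' ' '\n' | sort -t: -k2 -rn | head -1 | cut -d: -f1)
greedy=""
for i in $(seq 1 25); do greedy="$greedy $best"; done
echo "$greedy" | tr ' ' '\n' | sed '/^$/d' | sort -u | wc -l   # -> 1 distinct path

# Round-robin over the same ten paths spreads the 25 shards evenly.
names=$(echo "$paths" | tr ' ' '\n' | cut -d: -f1)
rr=$(for i in $(seq 0 24); do echo "$names" | sed -n "$(( i % 10 + 1 ))p"; done)
echo "$rr" | sort -u | wc -l                                   # -> 10 distinct paths
```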

@LuPan92 LuPan92 added >bug needs:triage Requires assignment of a team area label labels Jul 12, 2024
@mayya-sharipova mayya-sharipova added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed needs:triage Requires assignment of a team area label labels Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jul 12, 2024

mhl-b commented Jul 12, 2024

Does this answer your question?

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/important-settings.html#_multiple_data_paths

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.

Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.


LuPan92 commented Jul 13, 2024

Does this answer your question?

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/important-settings.html#_multiple_data_paths

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.
Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.

I checked the disk usage of every path in path.data; none of them has reached the high disk-usage watermark we configured. The usage per path is as follows:
[Screenshot (WeChat Work): disk usage per data path]

Additional note: when I reduce path.data to fewer than 20 paths, the problem disappears.


mhl-b commented Jul 13, 2024

When path.data on an ES data node lists more than 20 paths, all shards of the same index are allocated to a single path. This skews disk I/O when writing.

I'm not sure what the disk I/O skew is in your case; you might need to check your disk performance.
As for all shards going to the same path: that is documented, expected behaviour. See the first paragraph of the link provided above:

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.


LuPan92 commented Jul 13, 2024

Expected behavior:

  • When path.data on the data node is configured with multiple paths, all shards of a single index should be distributed nearly evenly across the paths.

  • Instead, almost all shards end up on the same path, which is not what we expect: when writing to and querying the index, only a small fraction of the disks' I/O capacity can be used at any one time.


mhl-b commented Jul 15, 2024

Thanks for your interest in Elasticsearch. We are closing this issue because the multiple-data-path feature is deprecated, and we are not going to fix this.

@mhl-b mhl-b closed this as completed Jul 15, 2024