
ES shard allocation bug #110826

Closed
LuPan92 opened this issue Jul 12, 2024 · 6 comments
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team

Comments


LuPan92 commented Jul 12, 2024

Elasticsearch Version

Version: 7.17.18, Build: default/tar/8682172c2130b9a411b1bd5ff37c9792367de6b0/2024-02-02T12:04:59.691750271Z, JVM: 11.0.20

Installed Plugins

No response

Java Version

11.0.20

OS Version

Linux bsa5295 3.10.0-1160.108.1.el7.x86_64 #1 SMP Thu Jan 25 16:17:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

When path.data on an ES data node lists more than 20 paths, all shards of the same index are allocated to a single path. This skews disk I/O when writing.

Steps to Reproduce

My test steps are as follows

  1. elasticsearch.yml
cluster.name: ISOP_1720490318878
http.port: 19399
network.host: bsa5295
node.name: bsa5295
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
node.master: true
node.data: true
path.logs: /home/worker/elasticsearch/logs
path.data: /home/sdf/elasticsearch/data,/home/sdg/elasticsearch/data,/home/sdh/elasticsearch/data,/home/sdi/elasticsearch/data,/home/sdb/elasticsearch/data,/home/sdc/elasticsearch/data,/home/sdd/elasticsearch/data,/home/sde/elasticsearch/data,/home/sdj/elasticsearch/data,/home/sdk/elasticsearch/data,/home/sdf/elasticsearch_1/data,/home/sdg/elasticsearch_1/data,/home/sdh/elasticsearch_1/data,/home/sdi/elasticsearch_1/data,/home/sdb/elasticsearch_1/data,/home/sdc/elasticsearch_1/data,/home/sdd/elasticsearch_1/data,/home/sde/elasticsearch_1/data,/home/sdj/elasticsearch_1/data,/home/sdk/elasticsearch_1/data,/home/sdf/elasticsearch_2/data,/home/sdg/elasticsearch_2/data
transport.tcp.port: 9300
gateway.expected_nodes: 1
action.auto_create_index: .watches,.triggered_watches,.watcher-history-*,.kibana*,.security,.monitoring*
discovery.seed_hosts: [bsa5295]
cluster.initial_master_nodes: [bsa5295]
thread_pool.write.queue_size: 2000
indices.recovery.max_bytes_per_sec: 200mb
cluster.routing.allocation.node_concurrent_recoveries: 10
cluster.max_shards_per_node: 5000
cluster.routing.allocation.same_shard.host: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 95%
cluster.fault_detection.follower_check.timeout: 180s
cluster.fault_detection.follower_check.retry_count: 10
cluster.fault_detection.follower_check.interval: 10s
cluster.publish.timeout: 1800s
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 10%
xpack.ml.enabled: false
cluster.election.duration: 30s
cluster.join.timeout: 360s
node.processors: 80
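For reference, the path.data value above can be sanity-checked with a quick one-liner; it lists 22 paths, which is over the 20-path threshold described in the report (the list below is copied verbatim from the config):

```shell
# Count the entries in the path.data value from elasticsearch.yml above.
PATHS="/home/sdf/elasticsearch/data,/home/sdg/elasticsearch/data,/home/sdh/elasticsearch/data,/home/sdi/elasticsearch/data,/home/sdb/elasticsearch/data,/home/sdc/elasticsearch/data,/home/sdd/elasticsearch/data,/home/sde/elasticsearch/data,/home/sdj/elasticsearch/data,/home/sdk/elasticsearch/data,/home/sdf/elasticsearch_1/data,/home/sdg/elasticsearch_1/data,/home/sdh/elasticsearch_1/data,/home/sdi/elasticsearch_1/data,/home/sdb/elasticsearch_1/data,/home/sdc/elasticsearch_1/data,/home/sdd/elasticsearch_1/data,/home/sde/elasticsearch_1/data,/home/sdj/elasticsearch_1/data,/home/sdk/elasticsearch_1/data,/home/sdf/elasticsearch_2/data,/home/sdg/elasticsearch_2/data"
echo "$PATHS" | tr ',' '\n' | wc -l   # -> 22
```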
  2. Create index my_index1
curl -X PUT "bsa5295:19399/my_index1" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 25,
    "number_of_replicas": 0
  }
}'
  3. View the index UUID
[worker@bsa5295 ~]$ curl bsa5295:19399/_cat/indices | grep my_index1
green open  my_index1                                 fI4auV0lRtmxeYN8XrXf8g 25 0         0      0   5.5kb   5.5kb
  4. View the path corresponding to each shard
    [Screenshot (WeChat Work): per-shard data-path listing]

  5. You can see that all shards are allocated under /home/sdj/elasticsearch

  6. Expected behavior:

  • When path.data on the data node is configured with multiple paths, all shards of a single index should be distributed nearly evenly across the paths.
  • Instead, almost all shards end up on the same path, which is not what we expect: when writing to and querying the index, only a small fraction of the disks' I/O capacity can be used at any one time.
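One way to check where each shard of an index lives on disk is to search every path.data entry for the index UUID reported by _cat/indices; on 7.x, each data path stores shard data under nodes/0/indices/&lt;index-uuid&gt;/&lt;shard-id&gt; (layout assumed from the 7.x line, verify on your build). The sketch below builds a mock two-path layout under /tmp so it runs anywhere; on a real node you would point find at the real path.data directories instead:

```shell
UUID=fI4auV0lRtmxeYN8XrXf8g   # from the _cat/indices output above

# Mock layout standing in for two real path.data entries:
for p in /tmp/espaths/sdj/elasticsearch/data /tmp/espaths/sdk/elasticsearch/data; do
  mkdir -p "$p/nodes/0/indices/$UUID/0/index"
done

# On a real node, replace /tmp/espaths with the path.data directories:
find /tmp/espaths -type d -path "*/nodes/0/indices/$UUID/*" -name index | sort
```

Each line of output is one shard copy; if every line starts with the same mount point, all shards share one disk.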

Logs (if relevant)

No response
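As an illustration of how this kind of skew can arise in general (this is a mock of the failure mode, not Elasticsearch's actual path-selection code): if every new shard is placed on the path with the most free space, and the free-space figures are not adjusted between picks (new shards start out empty), then all 25 shards of the repro index land on one path, whereas a round-robin policy would spread them:

```shell
# Illustration only -- not Elasticsearch's actual path-selection algorithm.
# Mock free-space figures (GB) for ten data paths; sdk has the most.
paths="sdb:500 sdc:501 sdd:502 sde:503 sdf:504 sdg:505 sdh:506 sdi:507 sdj:508 sdk:509"

# Greedy pick: the path with the most free space wins every single time,
# because the numbers are never updated between picks.
best=$(echo "$paths" | tr ' ' '\n' | sort -t: -k2 -rn | head -1 | cut -d: -f1)
greedy=""
for i in $(seq 1 25); do greedy="$greedy $best"; done
echo "$greedy" | tr ' ' '\n' | sed '/^$/d' | sort -u | wc -l   # -> 1 distinct path

# Round-robin over the same ten paths spreads the 25 shards evenly.
names=$(echo "$paths" | tr ' ' '\n' | cut -d: -f1)
rr=$(for i in $(seq 0 24); do echo "$names" | sed -n "$(( i % 10 + 1 ))p"; done)
echo "$rr" | sort -u | wc -l                                   # -> 10 distinct paths
```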

@LuPan92 LuPan92 added >bug needs:triage Requires assignment of a team area label labels Jul 12, 2024
@mayya-sharipova mayya-sharipova added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed needs:triage Requires assignment of a team area label labels Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jul 12, 2024

mhl-b commented Jul 12, 2024

Does this answer your question?

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/important-settings.html#_multiple_data_paths

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.

Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.


LuPan92 commented Jul 13, 2024

Does this answer your question?

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/important-settings.html#_multiple_data_paths

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.
Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.

I checked the disk usage of every path in path.data; none of them has reached the high disk-usage watermark we configured. The usage per path is as follows:
[Screenshot (WeChat Work): disk usage per data path]

Additional note: when I reduce path.data to fewer than 20 paths, the problem disappears.


mhl-b commented Jul 13, 2024

When path.data on an ES data node lists more than 20 paths, all shards of the same index are allocated to a single path. This skews disk I/O when writing.

I'm not sure what the disk I/O skew is in your case; you might need to check your disk performance.
As for all shards going to the same path: that is documented, expected behaviour. See the first paragraph of the link provided above:

If needed, you can specify multiple paths in path.data. Elasticsearch stores the node’s data across all provided paths but keeps each shard’s data on the same path.


LuPan92 commented Jul 13, 2024

Expected behavior:

  • When path.data on the data node is configured with multiple paths, all shards of a single index should be distributed nearly evenly across the paths.

  • Instead, almost all shards end up on the same path, which is not what we expect: when writing to and querying the index, only a small fraction of the disks' I/O capacity can be used at any one time.


mhl-b commented Jul 15, 2024

Thanks for your interest in Elasticsearch. We are closing this issue because the multiple-data-path feature is deprecated, and we are not going to fix this.

@mhl-b mhl-b closed this as completed Jul 15, 2024