[Elasticsearch] Add dimensions fields for TSDB migration #6623

Merged: 8 commits, Jun 27, 2023

@constanca-m constanca-m commented Jun 20, 2023

What does this PR do?

Set some fields as dimensions so that the data streams can be migrated to TSDB in the future.

This PR sets dimensions on all metrics data streams except CCR, Cluster Stats, Enrich, Shard and Pending Tasks. See the next section for details.

Details

To enable TSDB, some fields must be set as dimensions. The combination of the dimension fields and the timestamp must be unique; otherwise, documents will be overwritten. Check this for more information.

The same set of ECS fields was set as dimensions in every data stream: service.address, host.name, agent.id. This decision was based on #5193 (comment). None of the cloud fields are considered necessary, since the service address must be set for the integration to work. If two different integrations are used for the same service address, their documents can overwrite each other, since they will have exactly the same values.
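
To make the uniqueness requirement concrete, here is a minimal Python sketch. It is not Elasticsearch code: it only models a store in which a document's identity is its dimension values plus `@timestamp`, following the overwrite behaviour described above. The `index_docs` helper and all sample values are hypothetical.

```python
from __future__ import annotations

from typing import Any

# ECS fields that this PR description names as dimensions.
ECS_DIMENSIONS = ("service.address", "host.name", "agent.id")


def index_docs(docs: list[dict[str, Any]], dimensions: tuple[str, ...]) -> dict[tuple, dict]:
    """Model a store keyed by (dimension values..., @timestamp).

    A later document with the same key replaces the earlier one.
    """
    store: dict[tuple, dict] = {}
    for doc in docs:
        key = tuple(doc.get(d) for d in dimensions) + (doc["@timestamp"],)
        store[key] = doc
    return store


# Two node documents collected by the same agent at the same timestamp
# (hypothetical values, flattened field names).
docs = [
    {"service.address": "localhost:9200", "host.name": "h1", "agent.id": "a1",
     "elasticsearch.node.name": "node-0", "@timestamp": "2023-06-23T09:09:21.280Z"},
    {"service.address": "localhost:9200", "host.name": "h1", "agent.id": "a1",
     "elasticsearch.node.name": "node-1", "@timestamp": "2023-06-23T09:09:21.280Z"},
]

ecs_only = index_docs(docs, ECS_DIMENSIONS)
with_node = index_docs(docs, ECS_DIMENSIONS + ("elasticsearch.node.name",))
print(len(ecs_only), len(with_node))
```

With only the ECS dimensions the second document replaces the first; adding a data-stream-specific dimension such as the node name keeps both, which is why each data stream gets its own extra fields.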

  • CCR: Pending
    1. A field of type nested exists: issue [Elasticsearch][CCR] Change type nested to object #6604.
    2. A field of type text exists: issue [TSDB] TSDB enablement fails when there is a field of type text elasticsearch#96254.
  • Cluster stats: Pending
    1. A field of type text exists: issue [TSDB] TSDB enablement fails when there is a field of type text elasticsearch#96254.
  • Ingest Pipeline:
    • elasticsearch.ingest_pipeline.name_fingerprint: the name of a pipeline is of type wildcard. Since that type does not qualify as a dimension, we need to create a fingerprint of it.
    • elasticsearch.ingest_pipeline.processor.order_index: distinguishes the processors from the pipeline itself. The processors should have that field set, and the pipeline should not.
  • Index:
    • elasticsearch.index.name is unique per cluster, which should be enough.
  • Index summary:
    • From my understanding, each service.address sends only one document per timestamp, so no additional dimension fields should be needed.
  • Index recovery:
    • elasticsearch.index.name, since it is unique per cluster.
    • elasticsearch.index.recovery.id, since an index can be distributed across more than one shard.
  • Node:
    • elasticsearch.node.name is unique per cluster.
  • Node stats:
    • elasticsearch.node.name is unique per cluster.
  • ML Job:
    • elasticsearch.ml.job.id is unique.
  • Enrich: Pending
  • Shard: not rich in metrics. Will not be migrated.
  • Pending tasks: not rich in metrics. Will not be migrated.
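
The fingerprint idea for the ingest pipeline name can be sketched in Python as follows. This is an illustration only: it assumes SHA-1 plus Base64 as the hashing scheme and does not reproduce the exact bytes that the Elasticsearch fingerprint processor hashes, so its output is not expected to match the `name_fingerprint` values in real documents. The `name_fingerprint` function is a hypothetical helper.

```python
import base64
import hashlib


def name_fingerprint(pipeline_name: str) -> str:
    """Derive a short, fixed-length token from a pipeline name.

    Assumption: SHA-1 + Base64 is used purely for illustration; the real
    processor may hash a different byte sequence.
    """
    digest = hashlib.sha1(pipeline_name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


fp = name_fingerprint("metrics-elasticsearch.stack_monitoring.cluster_stats-1.7.4")
print(fp, len(fp))
```

The point of the design is that the token is deterministic and has a type that can be a dimension, while the original wildcard field stays available for searching.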

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

Relates to #6618.

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m requested a review from a team as a code owner June 20, 2023 09:38
@constanca-m constanca-m self-assigned this Jun 20, 2023
@constanca-m constanca-m added Integration:elasticsearch Elasticsearch Team:Infra Monitoring UI - DEPRECATED Label for the Infrastructure Monitoring UI team. - DEPRECATED - Use Team:obs-ux-infra_services labels Jun 20, 2023
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
elasticmachine commented Jun 20, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-06-23T13:57:25.694+0000

  • Duration: 31 min 37 sec

Test stats 🧪

  • Failed: 0
  • Passed: 60
  • Skipped: 0
  • Total: 60

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

elasticmachine commented Jun 20, 2023

🌐 Coverage report

Name           Metrics % (covered/total)   Diff
Packages       100.0% (5/5)                💚
Files          100.0% (9/9)                💚
Classes        100.0% (9/9)                💚
Methods        87.5% (98/112)              👎 -11.401
Lines          91.98% (562/611)            👎 -1.796
Conditionals   100.0% (0/0)                💚

@constanca-m
Contributor Author

/test

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@crespocarlos crespocarlos self-requested a review June 22, 2023 09:39
Contributor

@crespocarlos crespocarlos left a comment

LGTM!

Contributor

@tetianakravchenko tetianakravchenko left a comment

All data_streams are missing fields that were defined in this convention: #5193 (comment)

CCR:

  • Doesn't it need elasticsearch.cluster.id as a dimension?

enrich:

  • Doesn't it need elasticsearch.cluster.id as a dimension? node.id could be enough, but elasticsearch.cluster.id ensures uniqueness if we have multiple clusters.

Index:

  • I think elasticsearch.cluster.id is missing. The same index name could exist in multiple clusters.

The same goes for index_recovery; I think it applies to all data_streams.

ingest_pipeline:
Do you think node.id needs to be added, or is the pipeline name/id the same on all nodes?

ml:
Should node.id be added?

"node": {
  "id": "2eRkSFTXSLie_seiHf4Y1A",
  "name": "efacd89a6e88"
}

@@ -89,6 +89,7 @@
      type: keyword
    - name: cluster.name
      type: keyword
      dimension: true
Contributor

Wondering whether cluster.id would be a better candidate for the dimension?

Contributor Author

@constanca-m constanca-m Jun 23, 2023

I did not migrate this one, it is still pending (description).

Contributor

> I did not migrate this one, it is still pending (description).

But are you planning to add dimension fields in this PR for the data_streams that are blocked by the issues mentioned in the description, or do you plan to move those data_streams to another PR?

Contributor Author

They will be moved to another PR. I will remove this dimension, but I will leave the ECS ones to avoid confusion later.

Contributor Author

I removed the one for cluster stats (as I believe it is also not necessary). I am leaving the enrich dimensions though, even if it is not migrated - I will validate it again when the issue is resolved.

processors:
  - fingerprint:
      fields:
        - elasticsearch.ingest_pipeline.name
Contributor

Is type: wildcard the same case as type: object?

fields:
    - name: name
      type: wildcard
      description: Name / id of the ingest pipeline

Can you please share a sample of it?

Contributor Author

> Can you please share a sample of it?

Sorry, I don't understand. A sample of the error?

Contributor

> Sorry, I don't understand. A sample of the error?

A sample of the document: the part of the document that includes this field. The sample_event is missing for this data_stream, so I can't check it there.

Contributor Author

Here is a sample document

{
  "_index": ".ds-metrics-elasticsearch.ingest_pipeline-default-2023.06.23-000001",
  "_id": "5p2D54gBH7q8D4JF6839",
  "_version": 1,
  "_score": 0,
  "_source": {
    "agent": {
      "name": "kind-control-plane",
      "id": "a781ce37-a210-49d3-8344-6518fb35d4ac",
      "type": "metricbeat",
      "ephemeral_id": "55936ecf-0dfb-474d-9031-284992efdf8a",
      "version": "8.8.0"
    },
    "@timestamp": "2023-06-23T09:09:21.280Z",
    "elasticsearch": {
      "node": {
        "roles": [
          "data_content",
          "data_hot",
          "ingest",
          "master",
          "remote_cluster_client",
          "transform"
        ],
        "name": "instance-0000000000",
        "id": "J_W-dXFXTxuXnGCwbCb6Iw"
      },
      "cluster": {
        "name": "985f2ca8e1a74327aa2c698275330b90",
        "id": "SyM7nU1DRmKd3soposFsXg"
      },
      "ingest_pipeline": {
        "total": {
          "count": 428,
          "failed": 0,
          "time": {
            "total": {
              "ms": 0
            },
            "self": {
              "ms": 0
            }
          }
        },
        "name": "metrics-elasticsearch.stack_monitoring.cluster_stats-1.7.4",
        "name_fingerprint": "LX8WOW8tc72gcK7v5HOrWtDf6v4="
      }
    },
    "ecs": {
      "version": "8.0.0"
    },
    "data_stream": {
      "namespace": "default",
      "type": "metrics",
      "dataset": "elasticsearch.ingest_pipeline"
    },
    "service": {
      "address": "https://test-es-3.es.us-central1.gcp.cloud.es.io:9243",
      "type": "elasticsearch"
    },
    "elastic_agent": {
      "id": "a781ce37-a210-49d3-8344-6518fb35d4ac",
      "version": "8.8.0",
      "snapshot": true
    },
    "host": {
      "hostname": "kind-control-plane",
      "os": {
        "kernel": "5.15.49-linuxkit",
        "codename": "focal",
        "name": "Ubuntu",
        "type": "linux",
        "family": "debian",
        "version": "20.04.6 LTS (Focal Fossa)",
        "platform": "ubuntu"
      },
      "containerized": false,
      "ip": [
        "10.244.0.1",
        "10.244.0.1",
        "10.244.0.1",
        "172.18.0.2",
        "fc00:f853:ccd:e793::2",
        "fe80::42:acff:fe12:2",
        "172.25.0.4"
      ],
      "name": "kind-control-plane",
      "id": "e12fa0193ee24a5cae5f9665f6e7eb8c",
      "mac": [
        "02-42-AC-12-00-02",
        "02-42-AC-19-00-04",
        "22-DE-5A-26-82-AC",
        "3A-AE-FC-E1-7E-8C",
        "7E-91-38-58-97-2B"
      ],
      "architecture": "x86_64"
    },
    "metricset": {
      "period": 10000,
      "name": "ingest_pipeline"
    },
    "event": {
      "duration": 275991722,
      "agent_id_status": "verified",
      "ingested": "2023-06-23T09:09:22Z",
      "module": "elasticsearch",
      "dataset": "elasticsearch.ingest_pipeline"
    }
  },
  "fields": {
    "elastic_agent.version": [
      "8.8.0"
    ],
    "elasticsearch.ingest_pipeline.name_fingerprint": [
      "LX8WOW8tc72gcK7v5HOrWtDf6v4="
    ],
    "host.hostname": [
      "kind-control-plane"
    ],
    "host.mac": [
      "02-42-AC-12-00-02",
      "02-42-AC-19-00-04",
      "22-DE-5A-26-82-AC",
      "3A-AE-FC-E1-7E-8C",
      "7E-91-38-58-97-2B"
    ],
    "service.type": [
      "elasticsearch"
    ],
    "host.ip": [
      "10.244.0.1",
      "10.244.0.1",
      "10.244.0.1",
      "172.18.0.2",
      "fc00:f853:ccd:e793::2",
      "fe80::42:acff:fe12:2",
      "172.25.0.4"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "event.module": [
      "elasticsearch"
    ],
    "host.os.version": [
      "20.04.6 LTS (Focal Fossa)"
    ],
    "elasticsearch.ingest_pipeline.total.time.total.ms": [
      0
    ],
    "host.os.kernel": [
      "5.15.49-linuxkit"
    ],
    "host.os.name": [
      "Ubuntu"
    ],
    "agent.name": [
      "kind-control-plane"
    ],
    "host.name": [
      "kind-control-plane"
    ],
    "elastic_agent.snapshot": [
      true
    ],
    "event.agent_id_status": [
      "verified"
    ],
    "host.id": [
      "e12fa0193ee24a5cae5f9665f6e7eb8c"
    ],
    "elasticsearch.node.roles": [
      "data_content",
      "data_hot",
      "ingest",
      "master",
      "remote_cluster_client",
      "transform"
    ],
    "elasticsearch.node.id": [
      "J_W-dXFXTxuXnGCwbCb6Iw"
    ],
    "elasticsearch.cluster.name": [
      "985f2ca8e1a74327aa2c698275330b90"
    ],
    "elasticsearch.ingest_pipeline.total.failed": [
      0
    ],
    "host.os.type": [
      "linux"
    ],
    "elastic_agent.id": [
      "a781ce37-a210-49d3-8344-6518fb35d4ac"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "elasticsearch.ingest_pipeline.total.time.self.ms": [
      0
    ],
    "metricset.period": [
      10000
    ],
    "host.os.codename": [
      "focal"
    ],
    "elasticsearch.ingest_pipeline.name": [
      "metrics-elasticsearch.stack_monitoring.cluster_stats-1.7.4"
    ],
    "data_stream.type": [
      "metrics"
    ],
    "event.duration": [
      275991722
    ],
    "elasticsearch.cluster.id": [
      "SyM7nU1DRmKd3soposFsXg"
    ],
    "host.architecture": [
      "x86_64"
    ],
    "metricset.name": [
      "ingest_pipeline"
    ],
    "event.ingested": [
      "2023-06-23T09:09:22.000Z"
    ],
    "@timestamp": [
      "2023-06-23T09:09:21.280Z"
    ],
    "elasticsearch.node.name": [
      "instance-0000000000"
    ],
    "agent.id": [
      "a781ce37-a210-49d3-8344-6518fb35d4ac"
    ],
    "elasticsearch.ingest_pipeline.total.count": [
      428
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "host.os.platform": [
      "ubuntu"
    ],
    "host.containerized": [
      false
    ],
    "service.address": [
      "https://test-es-3.es.us-central1.gcp.cloud.es.io:9243"
    ],
    "data_stream.dataset": [
      "elasticsearch.ingest_pipeline"
    ],
    "agent.ephemeral_id": [
      "55936ecf-0dfb-474d-9031-284992efdf8a"
    ],
    "agent.version": [
      "8.8.0"
    ],
    "host.os.family": [
      "debian"
    ],
    "event.dataset": [
      "elasticsearch.ingest_pipeline"
    ]
  }
}
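
As a reading aid for the sample above, the following Python sketch pulls out the values of the fields that the PR description lists as dimensions for the ingest_pipeline data stream. The `get_path` helper is hypothetical, and `source` is a trimmed copy of the `_source` above that keeps only the relevant fields.

```python
from __future__ import annotations

from typing import Any

# Dimension fields the PR description lists for the ingest_pipeline data stream.
INGEST_PIPELINE_DIMENSIONS = (
    "service.address",
    "host.name",
    "agent.id",
    "elasticsearch.ingest_pipeline.name_fingerprint",
    "elasticsearch.ingest_pipeline.processor.order_index",
)


def get_path(source: dict[str, Any], dotted: str) -> Any:
    """Walk a nested _source dict along a dotted field path; None if absent."""
    node: Any = source
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Trimmed copy of the _source above.
source = {
    "@timestamp": "2023-06-23T09:09:21.280Z",
    "agent": {"id": "a781ce37-a210-49d3-8344-6518fb35d4ac"},
    "host": {"name": "kind-control-plane"},
    "service": {"address": "https://test-es-3.es.us-central1.gcp.cloud.es.io:9243"},
    "elasticsearch": {
        "ingest_pipeline": {
            "name": "metrics-elasticsearch.stack_monitoring.cluster_stats-1.7.4",
            "name_fingerprint": "LX8WOW8tc72gcK7v5HOrWtDf6v4=",
        }
    },
}

dims = {field: get_path(source, field) for field in INGEST_PIPELINE_DIMENSIONS}
for field, value in dims.items():
    print(f"{field} = {value}")
```

The processor.order_index value comes back as None here because the sample has no processor block, which matches the description's note that pipeline-level documents should not have that field set.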

Contributor

@tetianakravchenko tetianakravchenko Jun 23, 2023

Looking at this sample, "name": "metrics-elasticsearch.stack_monitoring.cluster_stats-1.7.4" seems to be a keyword.
It also seems that wildcard belongs to the keyword family: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/keyword.html#keyword

Does adding a dimension on the name field fail?

Contributor Author

Yes, the wildcard type is not valid as a dimension.
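
A check like the one below could catch this before a package is installed. It is a sketch under a stated assumption: the set of field types accepted as dimensions is taken to be keyword, ip and the integer numeric types, which may differ between Elasticsearch versions. `invalid_dimensions` and the flattened `fields` list are hypothetical and are not part of any Elastic tooling.

```python
from __future__ import annotations

# Assumption: field types accepted as dimensions; verify against the
# mapping documentation of the Elasticsearch version in use.
DIMENSION_TYPES = frozenset(
    {"keyword", "ip", "byte", "short", "integer", "long", "unsigned_long"}
)


def invalid_dimensions(fields: list[dict]) -> list[str]:
    """Return names of fields marked dimension: true whose type is not allowed."""
    return [
        f["name"]
        for f in fields
        if f.get("dimension") and f.get("type") not in DIMENSION_TYPES
    ]


# Hypothetical, flattened view of two entries from a fields.yml file.
fields = [
    {"name": "elasticsearch.ingest_pipeline.name", "type": "wildcard", "dimension": True},
    {"name": "elasticsearch.ingest_pipeline.name_fingerprint", "type": "keyword", "dimension": True},
]

bad = invalid_dimensions(fields)
print(bad)
```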

@@ -37,6 +37,7 @@
        Node ID
    - name: name
      type: keyword
      dimension: true
Contributor

Wouldn't node.id be a better candidate? The name might not be unique across multiple clusters.

Contributor Author

Each cluster only has one service.address, so the combination service.address + node.name should be unique

Contributor

@tetianakravchenko tetianakravchenko Jun 23, 2023

service.address is the host address defined in the configuration, so it could be, for example, localhost:9200 if the agent runs on the same instance as Elasticsearch. That is not unique enough.
From my understanding node.name is a hostname, isn't it? So it can be the same in multiple clusters.

Contributor

localhost:9200 is the default value.

Contributor Author

For the integration to work you need to give the service.address. This way, if you give to the same ES integration the same service.address, you will be receiving metrics from the same clusters as before. I tested with a local cluster and one on the cloud.

Contributor Author

@constanca-m constanca-m Jun 23, 2023

> From my understanding node.name is a hostname, isn't it? So it can be the same in multiple clusters.

The service.address uniquely identifies a cluster for an ES integration, and since node.name is unique per cluster, that combination is enough.

Contributor

> For the integration to work you need to give the service.address. This way, if you give to the same ES integration the same service.address, you will be receiving metrics from the same clusters as before.

Why? If I set service.address to localhost:9200, install agents on different nodes, and use the same policy for those nodes, I will get correct data.

> The service.address uniquely identifies a cluster for an ES integration, and since node.name is unique per cluster, that combination is enough.

But the same node.name can exist in two different clusters, so it is not unique.

Example: I have two different instances, es-test and es-test2, in the same GCP account. This is just for the test; a more realistic case is instances with the same name in different accounts or at different cloud providers. For the test I changed the hostname of es-test2 to es-test:
[Screenshot 2023-06-23 at 23 22 31]

service.address is the same for both nodes, and so is node.name. Since I did not change the default value, cluster.name is the same as well.

Contributor Author

I am a bit confused.

To install the integration in a policy you need to set the service.address:
[image]

This way, the service.address is unique. You cannot connect to two different clusters using the same service.address, so the service.address uniquely identifies a cluster.

> If I set service.address to localhost:9200, install agents on different nodes, and use the same policy for those nodes, I will get correct data.

So you install two different agents? The agent.id is a dimension, so there is no overlap. If the service.address for the ES is different, there is also no overlap. Otherwise there is overlap, as there should be.

> But the same node.name can exist in two different clusters, so it is not unique.

We always have a value for service.address. The node.name is unique per cluster, so service.address + node.name is unique.

I tested it by adding to the policy:

  • 1 local elastic agent
  • 1 cluster with 3 nodes
  • Another cluster with 3 nodes (this one so I could update the version)

I didn't get any overlap.
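
An overlap check of this kind can be sketched in Python as below. It is not the test that was actually run for this PR: `find_overlaps` is a hypothetical helper, and the two documents are made-up flattened examples of the scenario discussed in this thread (same service.address, same hostname and node name, different agents).

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def find_overlaps(docs: Iterable[dict[str, Any]], dimensions: tuple[str, ...]) -> dict[tuple, int]:
    """Return each (dimension values..., @timestamp) key that occurs more than once."""
    counts = Counter(
        tuple(doc.get(d) for d in dimensions) + (doc["@timestamp"],) for doc in docs
    )
    return {key: n for key, n in counts.items() if n > 1}


NODE_DIMENSIONS = ("service.address", "host.name", "agent.id", "elasticsearch.node.name")

ts = "2023-06-23T09:09:21.280Z"
# Hypothetical documents from two clusters, each reached through localhost:9200
# by its own agent, on hosts that share a hostname.
docs = [
    {"service.address": "localhost:9200", "host.name": "es-test", "agent.id": "agent-1",
     "elasticsearch.node.name": "es-test", "@timestamp": ts},
    {"service.address": "localhost:9200", "host.name": "es-test", "agent.id": "agent-2",
     "elasticsearch.node.name": "es-test", "@timestamp": ts},
]

with_agent = find_overlaps(docs, NODE_DIMENSIONS)
without_agent = find_overlaps(docs, tuple(d for d in NODE_DIMENSIONS if d != "agent.id"))
print(with_agent)
print(without_agent)
```

In this made-up case agent.id is the only dimension that keeps the two documents apart, so the result depends on each cluster being collected by a different agent.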

@@ -316,6 +316,7 @@
        Node ID
    - name: name
      type: keyword
      dimension: true
Contributor

the same as for node

@constanca-m
Contributor Author

> All data_streams are missing fields that were defined in this convention: #5193 (comment)
> CCR:
> ...
> enrich:

I did not migrate these data streams. There are issues still pending (more in the description) @tetianakravchenko

@constanca-m
Contributor Author

> Index:
>
> I think elasticsearch.cluster.id is missing. The same index name could exist in multiple clusters.

The service.address is unique, since the integration cannot connect to two different clusters with the same service.address. So service.address + index.name is unique.

@constanca-m
Contributor Author

> ingest_pipeline:
> Do you think node.id needs to be added, or is the pipeline name/id the same on all nodes?
>
> ml:
> Should node.id be added?

Both the ID of an ML job and the name of an ingest pipeline are unique per cluster, so setting them as dimensions is enough.

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@agithomas
Contributor

Any specific reason why not all of the common dimensions (8 in total) are included?

@constanca-m
Contributor Author

> Any specific reason why not all of the common dimensions (8 in total) are included?

Are you talking about the ECS dimensions? That is covered in the description: service.address is unique, so no more should be needed in that regard. @agithomas

@tetianakravchenko
Contributor

> Any specific reason why not all of the common dimensions (8 in total) are included?

I raised this in my review as well: #6623 (review)

Contributor

@agithomas agithomas left a comment

LGTM! However, my recommendation is to include all the common ECS dimensions to avoid the risk.

@constanca-m constanca-m merged commit b7bb08a into elastic:main Jun 27, 2023
@constanca-m constanca-m deleted the es-add-dimensions branch June 27, 2023 09:11
@elasticmachine

Package elasticsearch - 1.8.0 containing this change is available at https://epr.elastic.co/search?package=elasticsearch
