Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better out-of-the-box mappings for logs, metrics and synthetics #64978

Merged
merged 24 commits into from
May 4, 2021
Merged
Changes from 3 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
1100b95
Proposal: Default templates: Path match for ip and message
ruflin Nov 12, 2020
dae84d6
Update x-pack/plugin/core/src/main/resources/logs-mappings.json
ruflin Nov 12, 2020
90ca4b1
update mapping with discussion
ruflin Nov 20, 2020
35d0727
Update x-pack/plugin/core/src/main/resources/logs-mappings.json
ruflin Nov 26, 2020
ce72eef
Update x-pack/plugin/core/src/main/resources/logs-mappings.json
ruflin Nov 26, 2020
2e130a5
Put the most specific dynamic templates first.
jpountz Nov 26, 2020
7497f29
Add tests for dynamic templates.
jpountz Nov 26, 2020
ce711a7
add mappings to metrics and synthetics too
ruflin Dec 22, 2020
30e97b3
Merge branch 'master' into path-match-ip-message
ruflin Dec 22, 2020
74cb397
add index renaming
ruflin Dec 22, 2020
67bed57
Merge branch 'master' into path-match-ip-message
jpountz Apr 21, 2021
863bae7
Increment registry version.
jpountz Apr 21, 2021
5dc16be
Trim message field from metrics and synthetics templates.
jpountz Apr 21, 2021
9ea0a35
Fix tests.
jpountz Apr 21, 2021
4707606
Factor conventions that are the same for all types into a single file.
jpountz Apr 21, 2021
d6eb0af
Merge branch 'master' into path-match-ip-message
jpountz Apr 21, 2021
4c04815
Merge remote-tracking branch 'origin/master' into path-match-ip-message
jpountz Apr 22, 2021
c9249b5
Fix data_stream type.
jpountz Apr 22, 2021
db58ab9
Simplify host and observer mappings.
jpountz Apr 22, 2021
9950d8b
Fix indentation.
jpountz Apr 22, 2021
c4cb7fd
Fix test failure.
jpountz Apr 22, 2021
f7376e5
Move `message` dynamic template to the shared mapping template.
jpountz May 3, 2021
42c9f58
Remove `observer` field.
jpountz May 3, 2021
89eb06d
Merge branch 'master' into path-match-ip-message
jpountz May 4, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 15 additions & 3 deletions x-pack/plugin/core/src/main/resources/logs-mappings.json
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,21 @@
"type": "keyword"
},
"match_mapping_type": "string"
},
ruflin marked this conversation as resolved.
Show resolved Hide resolved
"match_ip": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz With the recent development in ES, I started to ask myself if these should actually all be dynamic mapping for runtime fields? It would change the PR a bit as I think in this case we should bring back the global message field but everything else should match as runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, having runtime fields would allow us to potentially also have runtime fields for things like *_time or *_date so these would work and still not be indexed. It would also allow an "easy" fix in case the detection was not correct (I assume).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic mappings are going to be a security issue. Are there other ways to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin Can you dig a bit more into why you're thinking about using runtime fields?

If you are worried of failing to index some documents, then maybe we should add ignore_malformed:true to these ip mappings?

If you would like to allow users to change their minds, then either runtime fields or concrete fields is fine since we allow runtime fields to shadow the definition of concrete fields, so users would have the leisure of configuring their field differently via a runtime field on existing indices if they wanted to regardless of the decision we would make here.

@scunningham There is no other way to do it. My understanding of the parallel thread on #68414 is that we would still allow dynamic mappings for trusted endpoints, so I believe that fine-tuning dynamic templates in these base templates still provides value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets assume we have a document with several *.ip fields inside. By default currently these are mapped to keywords which is not correct. So if someone wants to query for the *.ip fields, the result is unexpected because of the wrong type. In the ideal case, these fields would not be indexed at all because my assumption is, in most cases these are not used. And if they are used, a more specific dataset with a template will likely exist and have the mappings for it.

The above goes further: We currently index all the things as keyword. Instead we should only index a few fields and not index all the others and instead use runtime fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if they are used, a more specific dataset with a template will likely exist and have the mappings for it.

I think this holds true for our integrations, but we also need to take care of users who are onboarding new sources of data, e.g. because it's custom or because we don't have an integration for it (yet)?

My understanding is that host.ip should generally not be used, so we could make this one runtime. But the source IP field of a nginx.access dataset should be indexed in order to allow users to run cardinality aggregations on this field (how many users per day) or compare IPv4 traffic vs IPv6 traffic? In general it feels hard to me to make the assumption about the data that because it is an IP field, then generally it is not useful, even if that's the case most of the time in ECS.

What about mapping host.ip as a runtime field and keeping that template for other fields whose name ends with .ip and map them as indexed ip fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with @tsg and the Security solution leverages these IP fields in a dashboard, so we should keep them indexed, at least for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsg What we discuss here is the default. Are the queries you run in the context of installed packages? If yes, these could still have a more specific definition that makes it index. The part here is only the global default if no other template exists.

"match_mapping_type": "string",
"match": "ip",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep a single space after the colon?

"mapping": {
"type": "ip"
}
},
ruflin marked this conversation as resolved.
Show resolved Hide resolved
"match_message": {
"match_mapping_type": "string",
"match": "message",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep a single space after the colon?

"mapping": {
"type": "text",
"norms": false
}
}
}
],
Expand Down Expand Up @@ -45,9 +60,6 @@
"type": "ip"
}
}
},
"message": {
"type": "text"
}
}
}
Expand Down