Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic ECS mapping progress #5055

Closed
P1llus opened this issue Jan 19, 2023 · 27 comments
Closed

Dynamic ECS mapping progress #5055

P1llus opened this issue Jan 19, 2023 · 27 comments
Assignees
Labels
enhancement New feature or request

Comments

@P1llus
Copy link
Member

P1llus commented Jan 19, 2023

This issue is to track the progress and discussion around using the newly merged dynamic ECS template from elastic package: elastic/elastic-package#1073

Status on specific tests can be found in the issue comments below, but as an overview:

  • Custom input testing - Done, working without issues.
  • SEI Integrations - Done, working without issues
  • Endpoint - Initial testing performed, only a minor issue requires investigation.
  • APM - Initial testing performed, no apparent technical issues, but needs some discussion around implementation
  • Observability Integrations - Done, currently looking at next possible steps.
  • Cloud Defense - Initial testing done, outside of integration package, no issues yet
  • Cloud Security Posture - Initial testing done, outside of integration package, no issues yet

This was initially created for common usecases in custom input packages, that do not have any out of the box ECS mapping, and to cover beat processors like add_host_metadata that produces ECS fields that might not already be mapped: elastic/elastic-package#1018

However this later evolves into replacing the need for adding ECS fields manually in each integration package, allowing all integration packages to map ECS field types automatically from a single source, to make it all more consistent.

This feature is currently in a TESTING phase until further notice, and package developers could use the below process to test their own packages with the dynamic template:

  1. Add the required field in _dev/build/build.yml
dependencies:
  ecs:
    reference: git@8.6
    import_mappings: true
  1. Remove the ecs.yml file from each datastream in the package.
  2. Ensure that the base-fields.yml still has mappings for @timestamp and data_stream fields
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset name.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: event.module
  type: constant_keyword
  description: Event module
  value: ti_abusech
- name: event.dataset
  type: constant_keyword
  description: Event dataset
  value: ti_abusech.malware
- name: "@timestamp"
  type: date
  description: Event timestamp.
  1. Update format_version in your root manifest.yml to 2.3.0
  2. Ensure you build the latest elastic-package
  3. Either spin up the stack and test the package manually, or if you have configured system test

Related issues:

elastic/elasticsearch#85692
#4961
elastic/ecs#1869
elastic/elasticsearch#89743
elastic/elastic-package#1018
elastic/elastic-package#1073

@P1llus P1llus self-assigned this Jan 19, 2023
@elasticmachine
Copy link

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@P1llus
Copy link
Member Author

P1llus commented Jan 19, 2023

I have tested this with a few different packages like ti_abusech and checkpoint and they are working just fine, awaiting test results from a few other package teams as well.

Initially this will be added to the custom input packages as a first phase.

@P1llus
Copy link
Member Author

P1llus commented Jan 24, 2023

Currently awaiting status updates from 4-5 different teams that are currently testing, will add any updates here once the feedback is collected.

@muthu-mps
Copy link
Contributor

I have tested with two different integrations postgresql and iis integrations and they are working fine, I didn't see any issues other than postgresql log data-stream which has ECS fields infields.ymlthat needs to be removed before building the package.

@P1llus
Copy link
Member Author

P1llus commented Feb 1, 2023

Testing for cloud_defence has also been successful, awaiting some results from endpoint as well.

It is not important at this stage to test every package, but rather try to have as diverse set of tests as possible, currently seems to be working quite well!

@P1llus
Copy link
Member Author

P1llus commented Feb 1, 2023

One note that I can see is that fields do not get the ignore_above flag when it matches a dynamic template rule, because it is not set.

Before going live we should look into if this should be necessary, as fleet sets this by default, but somehow this is overriden.

@P1llus
Copy link
Member Author

P1llus commented Feb 1, 2023

During one of the tests, it was noted that since the dynamic template focuses only on ECS fields that are not of type keyword, any other field should be mapped to keyword by default.

However if the source of the data actually comes in as INT/Long, we can be used to the value being automatically converted to string. I have seen one example of this with group.id and user.id.

The best solution for this is a few different steps:

  1. Continue to monitor progress and see if this happens more often.
  2. If implementing dynamic templates for packages that do not have system tests, we should be more considerate when investigating, because system tests would point out that ECS keyword fields are of the wrong type.
  3. If more feedback returns this for the same group of fields (for example only *.id fields then it might be good to consider adding them to the dynamic template.

@felixbarny
Copy link
Member

felixbarny commented Feb 2, 2023

I'd love to get these dynamic mappings into the default mappings for logs-*-*.

@P1llus
Copy link
Member Author

P1llus commented Feb 3, 2023

I'd love to get these dynamic mappings into the default mappings for logs-*-*.

Thanks @felixbarny
It is certainly something we could look into! The reason it has not been brought up yet is because we wanted the initial functionality to be opt-in, as there are certain edge cases we might want to catch first!

Let's pick up the initial discussion sometime soon, might want to include @ruflin as well, we would first need to map out any potential pitfalls it could create.

@P1llus
Copy link
Member Author

P1llus commented Feb 3, 2023

Testing the dynamic template for APM was performed here: elastic/apm-server#10155 and the output of the initial tests have been very positive!

@ruflin
Copy link
Member

ruflin commented Feb 13, 2023

Really great to see this moving forward and also having apm on board. @felixbarny ++ on bringing this eventually to logs-*-* etc. and is also where the discussion started (elastic/elasticsearch#85692). We first "proof" the model and then make it available more broadly.

@ruflin
Copy link
Member

ruflin commented Feb 20, 2023

In https://github.com/elastic/elastic-package/pull/1093/files#diff-46325f013498dcd5e9ec6be22768014c10d9685cd852382599d3ad7cc5cc4b4c, a lot of repetition happens which makes reading and maintenance harder. Being able to specify arrays for matching types would be very helpful. Similar thing is also discussed here: elastic/elasticsearch#66364 @javanna Would be nice to get this prioritised.

This is not a blocker for this effort but I think would help the overall usability.

@ruflin
Copy link
Member

ruflin commented Feb 21, 2023

Eventually we should make this feature the default when generating a new package / data_stream, meaning it should be added by the create commands.

@P1llus
Copy link
Member Author

P1llus commented Mar 22, 2023

This issue was used to track the progress of the initial implementation. There has been more coverage over time, but the usefulness of this issue is done, so I will close this for now.

Any new initiative moving forward further will require a new issue.

@P1llus P1llus closed this as completed Mar 22, 2023
@ruflin
Copy link
Member

ruflin commented Mar 22, 2023

So excited to see this closed 🎉

@ruflin
Copy link
Member

ruflin commented Mar 28, 2023

@P1llus How can we make sure this adopted by most integrations moving forward? Can we make this the default in the generator?

@rameshelastic @lalit-satapathy @andresrc Lets make sure we adopt this in our integrations.

@ishleenk17
Copy link
Contributor

ishleenk17 commented Aug 14, 2023

@P1llus
There are certain issues which have been reported across integrations:

  • Type Conflict issue as a certain field is not present in the ecs.yml. Eg: NGINX, System.
  • A certain field not defined (Eg: service.address).

Would usage of dynamic mapping resolve the above issues?
On trying out, The conflict issues get resolved, but do all the ecs fields get mapped with dynamic mapping? service.address doesn't get added.

How is it chosen that which ecs fields are added for which Integration?

@ishleenk17
Copy link
Contributor

Also, if we include dynamic mapping, how will the TSDB mapping happen?
There are quite a lot of ecs field tagged as dimensions. Eg: Perf datastream in MYSQL

Will TSDB and dynamic mapping work together?

@P1llus
Copy link
Member Author

P1llus commented Aug 14, 2023

@ishleenk17 The dynamic templates was indeed created to resolve those type of issues, though be aware that type conflicts happens because 1 integration has the wrong field type.
Meaning that if you use a dynamic template, and the field is mapped correctly, another integration might not use it, and have the wrong field type, which would then still result in a field conflict, so it does resolve it for your specific integration, but not for all of them (yet).

We are also at a impass here, because the functionality I describe above was meant as a stepping stone, recently the dynamic template was merged into Elasticsearch as well, but I do not believe we have functionality for integrations to use that yet, the main end goal is for the implementation defined here (which is just a hardcoded template in elastic-package), to be replaced with that bundled dynamic template instead.

All ECS fields was defined at some point, but this implementation has not been updated, as we rather want to start using that bundled dynamic template instead.
The way that it works is that it defaults all fields to keywords, unless another type is specified, so none of the fields are undefined.
It uses regex/glob patterns to define all the fields that are of another type than keyword, here is the patterns: https://github.com/elastic/elastic-package/blob/main/internal/builder/_static/ecs_mappings.yaml

However we really want to start using the version bundled with ES, as its more up to date and uses new dynamic template features which makes it much cleaner: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/template-resources/src/main/resources/ecs%40mappings.json

For the TSDB question you would have to check with @ruflin :)

@ishleenk17
Copy link
Contributor

Meaning that if you use a dynamic template, and the field is mapped correctly, another integration might not use it, and have the wrong field type, which would then still result in a field conflict, so it does resolve it for your specific integration, but not for all of them (yet).

I think the end goal would be to remove ecs.yml from all integrations and use dynamic mapping. If that is done I expect all Integrations to be free of type conflict issue. (which we are seeing quite a lot lately).

implementation defined here (which is just a hardcoded template in elastic-package)

I see you have mentioned a hardcoded template in EP. So that means it will always be the same set of ecs fields for all Integrations? Or does it vary? We have observed that in some Integrations service.* fields are present, in others, they aren't . Any specific reason for this?

However we really want to start using the version bundled with ES, as its more up to date and uses new dynamic template features which makes it much cleaner:

Should we look into using dynamic mapping for resolving conflict issues? Or shall we wait for the ES feature to be enabled for Integrations? If we have to wait up, then for now we have to resort to manually handling all these issues.

For the TSDB question you would have to check with @ruflin :)

I suppose this will be an important aspect @ruflin. Be it whatever approach we take, the ES one or the current one.

cc: @lalit-satapathy @rameshelastic

@P1llus
Copy link
Member Author

P1llus commented Aug 14, 2023

Meaning that if you use a dynamic template, and the field is mapped correctly, another integration might not use it, and have the wrong field type, which would then still result in a field conflict, so it does resolve it for your specific integration, but not for all of them (yet).

I think the end goal would be to remove ecs.yml from all integrations and use dynamic mapping. If that is done I expect all Integrations to be free of type conflict issue. (which we are seeing quite a lot lately).

Absolutely, one way or the other, we should not need to define ECS fields manually in the end, and will help us making future integration work more scaleable, when the number of integrations becomes even larger.

implementation defined here (which is just a hardcoded template in elastic-package)

I see you have mentioned a hardcoded template in EP. So that means it will always be the same set of ecs fields for all Integrations? Or does it vary? We have observed that in some Integrations service.* fields are present, in others, they aren't . Any specific reason for this?

The dynamic template covers all ECS fields, for all integrations, but they are not field mappings. It will only create an actual mapping of a field once it has been used at least once, which makes it great to keep the field count down.
When you mention service.* is not present, do you mean that it has no value? Or that the field is not mapped?
From my understanding, integrations that did not have many changes to fields or pipelines over quite some time, has now been "ignored" by CI, and the issue won't really be displayed to us until we actually make a change to that specific integration.
For security integrations, we moved all our integrations over to the newest format_version, this helped us greatly to catch older bugs or field mapping conflicts.

However we really want to start using the version bundled with ES, as its more up to date and uses new dynamic template features which makes it much cleaner:

Should we look into using dynamic mapping for resolving conflict issues? Or shall we wait for the ES feature to be enabled for Integrations? If we have to wait up, then for now we have to resort to manually handling all these issues.
I think that should be up to you, if this is very metrics focused, then I would recommend waiting, as the bundled version in Elastic Package does not take dimensional fields into consideration, if its just logs then it should be fine.

In the end we should have all integrations use the new version of the template bundled with ES.

For the TSDB question you would have to check with @ruflin :)

I suppose this will be an important aspect @ruflin. Be it whatever approach we take, the ES one or the current one.

cc: @lalit-satapathy @rameshelastic

@ruflin
Copy link
Member

ruflin commented Aug 16, 2023

@ishleenk17 Do the integrations that had some conflicts use the most recent format_version and the dynamic ECS templates?

For TSDB + ECS dynamic templates, I don't see a conflict here, maybe @felixbarny does?

The ECS templates have now landed in Elasticsearch (thanks @eyalkoren ). We now need a plan on how this template can also be used in integrations. Unfortunately it will mean, this template only will become available to integrations running on newer stack versions.

@P1llus Would it make sense the template that you currently build to align with the ES one? This means all updated integrations already get the newest / most complete template?

@ishleenk17
Copy link
Contributor

@ishleenk17 Do the integrations that had some conflicts use the most recent format_version and the dynamic ECS templates?

No, most of them are not in the recent format_version. They don't use the dynamic ecs template. They are using the ecs.yml

@ruflin
Copy link
Member

ruflin commented Aug 17, 2023

They don't use the dynamic ecs template
Should we switch over?

@ishleenk17
Copy link
Contributor

They don't use the dynamic ecs template
Should we switch over?

If we plan to move to ES template for ECS in future, I am not sure if we should migrate to the current dynamic template.
Also, we need to sort out the TSDB+dynamic ECS field aspect. How will we mark some of the ecs fields as dimensions in case of dynamic mapping.

@ruflin
Copy link
Member

ruflin commented Aug 17, 2023

If we plan to move to ES template for ECS in future, I am not sure if we should migrate to the current dynamic template.

My assumption is the change to dynamic templates is only a config change so low effort. Is this assumption correct? The effort to move to ES is larger and will take longer.

we need to sort out the TSDB+dynamic ECS field aspect. How will we mark some of the ecs fields as dimensions in case of dynamic mapping.

What is the conflict here? I expect the dynamic template does not contain any dimensions. The dimension fields we specify field by field and these dimension fields win over the dynamic ones?

@felixbarny
Copy link
Member

For TSDB + ECS dynamic templates, I don't see a conflict here, maybe @felixbarny does?

No, I don't think there's a conflict. But the dynamic templates don't define any dimensions. That can be done with manual field mappings, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants