Processor for parsing Amazon Ion documents #3730

Closed
emmachase opened this issue Nov 30, 2023 · 2 comments · Fixed by #3803

@emmachase
Contributor

Is your feature request related to a problem? Please describe.
For my use case, I have nested Ion documents in my input. For example:

{
  "event": "{id:\"foo...\", status: ACTIVE, timestamp: 2023-11-30T21:05:23.383Z, amount: dollars::100.0}"
}

I would like to parse these into fields so that I can index and search them in OpenSearch.

Describe the solution you'd like
A processor for parsing Ion documents, parse_ion, similar to parse_json and csv.
The implementation would likely be very similar to parse_json. Under the hood the two could perhaps share most of their logic, each supplying its own ObjectMapper implementation along with any format-specific configuration.

Describe alternatives you've considered (Optional)
It's possible to preprocess simple, well-formatted Ion documents with regular expressions (substitute_string), converting them to JSON so that parse_json can handle them, but this is hacky, probably slow, and very prone to bugs.

I have also considered creating a new intermediary service that converts the Ion to JSON before submitting it to Data Prepper, but this adds complexity and defeats the purpose of Data Prepper in general.

Additional context
I'm willing to submit a PR for this, but I would like feedback on the idea and approach first.

@dlvenable added the "plugin - processor" label (a plugin to manipulate data in the Data Prepper pipeline) and removed the "untriaged" label on Nov 30, 2023
@dlvenable
Member

@emmachase, thank you for this suggestion. I would suggest that we make this a new processor. This has the advantage of letting the configurations diverge if necessary. Perhaps we'd add certain configurations for looser parsing of one format or the other. It would also be clearer for users, who wouldn't look for Ion processing in a JSON processor.

parse_ion:
  source: /ion-string
  destination: /data

And thank you for your interest in submitting a PR. We'd be happy to help get it merged in.

I think this could be easily accomplished by refactoring the ParseJsonProcessor class so that most of its logic moves into a common class. I'd be fine starting with ParseIonProcessor in the same Gradle project (parse-json-processor) to keep it simple. Maybe we'd split it out eventually to avoid unnecessary dependencies, but as it is, all dependencies must deploy with Data Prepper.

I would also suggest a separate class for the configuration: ParseIonProcessorConfig. We recently did something similar in our kafka-plugins project, where we decoupled the configurations for the Kafka buffer and source.
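
To make the proposed refactoring concrete, here is a minimal sketch of the shape it might take. It is not Data Prepper's actual code: every class name ending in `Sketch` is hypothetical, events are modeled as plain `Map`s with simple keys rather than Data Prepper's event type and `/`-prefixed paths, and a `Function<String, Map<String, Object>>` stands in for the format-specific ObjectMapper so the example depends only on the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical common base class: holds the logic both processors would share.
abstract class AbstractParseProcessorSketch {
    private final String source;
    private final String destination;

    protected AbstractParseProcessorSketch(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }

    // Format-specific step; each subclass plugs in its own parser here.
    protected abstract Map<String, Object> parse(String raw);

    // Shared step: read the source field, parse it, write the destination field.
    // Returns a copy; the event is left unchanged if the field is missing,
    // is not a string, or fails to parse.
    public Map<String, Object> process(Map<String, Object> event) {
        Map<String, Object> result = new HashMap<>(event);
        Object raw = event.get(source);
        if (!(raw instanceof String)) {
            return result;
        }
        try {
            result.put(destination, parse((String) raw));
        } catch (RuntimeException e) {
            // A real processor would likely log or tag the event; the sketch skips it.
        }
        return result;
    }
}

// Hypothetical configuration class, kept separate so Ion options can diverge from JSON's.
final class ParseIonConfigSketch {
    final String source;
    final String destination;

    ParseIonConfigSketch(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }
}

// Hypothetical Ion processor: only the parser differs from a JSON counterpart.
final class ParseIonProcessorSketch extends AbstractParseProcessorSketch {
    private final Function<String, Map<String, Object>> ionParser;

    ParseIonProcessorSketch(ParseIonConfigSketch config,
                            Function<String, Map<String, Object>> ionParser) {
        super(config.source, config.destination);
        this.ionParser = ionParser;
    }

    @Override
    protected Map<String, Object> parse(String raw) {
        return ionParser.apply(raw);
    }
}

class ParseProcessorDemo {
    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("ion-string", "{id:\"foo\"}");

        // Stub parser for illustration only; a real one would parse the Ion text.
        ParseIonProcessorSketch processor = new ParseIonProcessorSketch(
                new ParseIonConfigSketch("ion-string", "data"),
                raw -> Map.of("id", "foo"));

        System.out.println(processor.process(event).get("data"));
    }
}
```

A JSON processor would presumably follow the same pattern with its own config class and parser, which is what keeps the two configurations free to diverge later.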

@dlvenable
Member

@emmachase, thank you for the PR. This feature will be released in Data Prepper 2.7.0, currently scheduled for early 2024.
