
Create decompress processor to decompress gzipped keys #4118

Merged: 2 commits merged on Feb 14, 2024


@graytaylor0 (Member) commented Feb 12, 2024

Description

Adds a decompress processor to decompress specific fields in Events.

The base configuration is:

- decompress:
     keys: [ "key_one", "key_two" ]
     type: gzip

This processor is designed to be extensible to other compression types, but it currently supports only gzip.

This processor assumes that all fields to be decompressed are base64-encoded.
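A minimal sketch of the per-field decoding this implies: base64-decode the value, then gunzip the bytes and read the result as text. The class and method names are hypothetical, and treating the decompressed payload as UTF-8 text is an assumption rather than something the PR states.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not the processor's actual code.
final class GzipFieldDecoder {

    private GzipFieldDecoder() {
    }

    // Decodes a base64 field value, gunzips it, and returns the text (assumed UTF-8).
    static String decode(final String base64Value) throws IOException {
        // Step 1: undo the base64 encoding the processor expects on every configured key.
        final byte[] compressed = Base64.getDecoder().decode(base64Value);
        // Step 2: stream the raw bytes through a gzip decompressor and read them fully.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Inverse operation (gzip, then base64), handy for building test inputs.
    static String encode(final String plainText) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(plainText.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }
}
```

An invalid base64 value fails with IllegalArgumentException, while valid base64 that is not gzip data fails with an IOException (typically a ZipException); a processor would presumably count either case as a processing error rather than dropping the event.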

Issues Resolved

Resolves #4016

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dlvenable (Member) left a comment

Nice work! I have a few requests.

public class DecompressProcessor extends AbstractProcessor<Record<Event>, Record<Event>> {

private static final Logger LOG = LoggerFactory.getLogger(DecompressProcessor.class);
static final String DECOMPRESSION_PROCESSING_ERRORS = "decompressionProcessingErrors";
Review comment (Member):

Why did you add "decompression" to this name? The metric is already scoped to the processor, so this will yield:

my-pipeline.decompress.decompressionProcessingErrors

Why not just make it processingErrors?
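To make the redundancy concrete, here is a tiny sketch of the dotted naming shown above. The helper is hypothetical and only mirrors the example output; it is not how PluginMetrics is actually implemented.

```java
// Hypothetical helper: mirrors the "<pipeline>.<plugin>.<metric>" format
// from the review comment, nothing more.
final class ScopedMetricNames {

    private ScopedMetricNames() {
    }

    static String scopedName(final String pipelineName, final String pluginName, final String metricName) {
        return String.join(".", pipelineName, pluginName, metricName);
    }
}
```

With "decompressionProcessingErrors" the scoped name repeats the processor name; with "processingErrors" it reads as my-pipeline.decompress.processingErrors.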

@DataPrepperPluginConstructor
public DecompressProcessor(final PluginMetrics pluginMetrics,
final DecompressProcessorConfig decompressProcessorConfig,
final ExpressionEvaluator expressionEvaluator) {
Review comment (Member):

Please validate the expression in the constructor.
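A minimal sketch of constructor-time validation, under stated assumptions: the config exposes an optional condition string (called decompress_when here, a hypothetical name), and ExpressionValidator is a stand-in interface for the real ExpressionEvaluator, whose exact validation method should be checked against the Data Prepper API. A real processor would more likely throw a plugin-configuration exception than IllegalArgumentException.

```java
// Hypothetical stand-in for the expression evaluator's validity check.
interface ExpressionValidator {
    boolean isValidExpressionStatement(String statement);
}

// Hypothetical config holding an optional condition such as "decompress_when".
final class DecompressConfigSketch {
    private final String decompressWhen;

    DecompressConfigSketch(final String decompressWhen) {
        this.decompressWhen = decompressWhen;
    }

    String getDecompressWhen() {
        return decompressWhen;
    }
}

final class DecompressProcessorSketch {
    private final String decompressWhen;

    DecompressProcessorSketch(final DecompressConfigSketch config, final ExpressionValidator validator) {
        final String condition = config.getDecompressWhen();
        // Fail fast while the pipeline starts, rather than on the first event.
        if (condition != null && !validator.isValidExpressionStatement(condition)) {
            throw new IllegalArgumentException(
                    "decompress_when \"" + condition + "\" is not a valid expression");
        }
        this.decompressWhen = condition;
    }

    String getDecompressWhen() {
        return decompressWhen;
    }
}
```

Validating in the constructor means a malformed condition stops the pipeline at startup instead of surfacing as per-event errors.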

Review comment (Collaborator):

Isn't it better to do the validation in Config.java? I thought that was the convention we follow. I'm OK with this approach too; just checking.


import org.opensearch.dataprepper.model.codec.DecompressionEngine;

public interface IDecompressionType {
Review comment (Member):

The IInterfaceName convention comes from .NET and is not commonly used in Java. We can also improve our interfaces by naming them for what they are capable of rather than after a type. Here, this could be HasDecompressionEngine or DecompressionEngineFactory. Both names align with existing conventions in this project and in Java.
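Applying the suggested naming, a sketch could look like the following. DecompressionEngine is declared locally as a stand-in for org.opensearch.dataprepper.model.codec.DecompressionEngine; its single method is assumed to wrap a compressed stream in a decompressing one, so check the real signature before reusing this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Local stand-in for the model's DecompressionEngine (shape assumed).
interface DecompressionEngine {
    InputStream createInputStream(InputStream compressed) throws IOException;
}

// Named for what it can do rather than with an "I" prefix.
interface DecompressionEngineFactory {
    DecompressionEngine getDecompressionEngine();
}

enum DecompressionType implements DecompressionEngineFactory {
    GZIP {
        @Override
        public DecompressionEngine getDecompressionEngine() {
            // Wrap the compressed input in a gzip-decoding stream.
            return compressed -> new GZIPInputStream(compressed);
        }
    };
}
```

Adding another compression type then means adding one enum constant that returns its own engine, which fits the PR's goal of staying extensible beyond gzip.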


package org.opensearch.dataprepper.plugins.processor.decompress.encoding;

public interface IEncodingType {
Review comment (Member):

Same comment here about the I prefix in the name.

import static org.opensearch.dataprepper.plugins.processor.decompress.DecompressProcessorTest.buildRecordWithEvent;

@ExtendWith(MockitoExtension.class)
public class ITDecompressProcessorTest {
Review comment (Member):

Let's use the name DecompressProcessorIT. This works with Gradle and is how other integration tests in the project are named.


}

private String getDecompressedString(final BufferedReader bufferedReader) throws IOException {
Review comment (Member):

You can probably use commons-io to replace this code. The class org.apache.commons.io.IOUtils has a method String toString(InputStream input, Charset charset) that should work here.
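For reference, here is the commons-io call shown as a comment alongside a JDK-only equivalent using InputStream.readAllBytes (Java 9+). The wrapper class name is hypothetical. Since the existing helper takes a BufferedReader, note that commons-io also provides IOUtils.toString(Reader) for that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical utility class for illustration.
final class StreamStrings {

    private StreamStrings() {
    }

    // With commons-io on the classpath, the body could be a single call:
    //   return org.apache.commons.io.IOUtils.toString(input, StandardCharsets.UTF_8);
    // Without it, the JDK can read the whole stream and decode it directly.
    static String toUtf8String(final InputStream input) throws IOException {
        return new String(input.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```

Unlike a line-by-line BufferedReader loop, both variants preserve the original line separators exactly.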

* SPDX-License-Identifier: Apache-2.0
*/

plugins {
Review comment (Member):

You don't need this block. The base Gradle project adds it to all sub-projects. You can remove it.

import java.util.Map;
import java.util.stream.Collectors;

public enum DecompressionType implements IDecompressionType {
Review comment (Member):

Please add some unit tests for the string-to-enum parsing. See ForwardingAuthenticationTest, which does something similar for another enum. Also note that some tests use @EnumSource so that future enum values are covered automatically.
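A sketch of the option-string parsing such tests would exercise, assuming each constant carries its configuration string (the field and method names are hypothetical). In a Jackson-bound config, fromOptionValue would typically carry @JsonCreator; that annotation is omitted here to keep the sketch dependency-free.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

enum DecompressionType {
    GZIP("gzip");

    // Built once from all constants, so new constants are picked up automatically.
    private static final Map<String, DecompressionType> OPTIONS_MAP =
            Arrays.stream(values())
                    .collect(Collectors.toMap(type -> type.option, Function.identity()));

    private final String option;

    DecompressionType(final String option) {
        this.option = option;
    }

    String getOption() {
        return option;
    }

    // Returns null for an unknown option string.
    static DecompressionType fromOptionValue(final String option) {
        return OPTIONS_MAP.get(option);
    }
}
```

A round-trip test iterates over DecompressionType.values() and checks that fromOptionValue(type.getOption()) returns the same constant; with JUnit 5 that loop becomes a @ParameterizedTest with @EnumSource(DecompressionType.class), so future constants are tested without edits.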

import java.util.Map;
import java.util.stream.Collectors;

public enum EncodingType implements IEncodingType {
Review comment (Member):

Similar comment about adding tests for the name parsing.

import java.io.InputStreamReader;
import java.util.Collection;

@DataPrepperPlugin(name = "decompress", pluginType = Processor.class, pluginConfigurationType = DecompressProcessorConfig.class)
Review comment (Member):

Please open a documentation issue so that we track the need to document this.


Signed-off-by: Taylor Gray <tylgry@amazon.com>
@dlvenable (Member) left a comment

This looks great. Thanks!

@graytaylor0 graytaylor0 merged commit 3e3f302 into opensearch-project:main Feb 14, 2024
72 of 74 checks passed
Successfully merging this pull request may close these issues: Decompress processor
3 participants