Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for dynamic rule detection for pipeline config transformation #4601

Merged
merged 5 commits into from
Aug 17, 2024

Conversation

srikanthjg
Copy link
Contributor

@srikanthjg srikanthjg commented Jun 5, 2024

Description

Dynamically detects rules present in rules folder and applies a corresponding transformation based on plugin name.

Issues Resolved

Resolves #4600

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kkondaka
kkondaka previously approved these changes Jun 8, 2024

Path rulesFolderPath;

if ("jar".equals(uri.getScheme())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we expect the rules in a jar file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in runtime, we would expect everything inside a jar.

Comment on lines 71 to 76
List<Path> mockRuleFiles = Arrays.asList(
Paths.get("src/test/resources/transformation/rules/documentdb-rule1.yaml"),
Paths.get("src/test/resources/transformation/rules/documentdb-rule.yaml")
);
doReturn(mockRuleFiles).when(transformersFactory).getRuleFiles();
assertTrue(mockRuleFiles.size() > 0, "There should be at least one rule file.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You defined mockRuleFiles and assert it's not empty. Does it actually test the method?

// Get the URI of the rules folder
URI uri = null;
try {
uri = getClass().getClassLoader().getResource("rules").toURI();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a predefined directory structure to ensure that we are only getting the transform rules:

org/opensearch/dataprepper/transforms/

Thus, we'd have:

org/opensearch/dataprepper/transforms/rules/
org/opensearch/dataprepper/transforms/templates/

Copy link
Contributor Author

@srikanthjg srikanthjg Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way i understand this comment and this comment, we want to move the template and rules to be plugin specific and move this into data-prepper-plugin directory, ie, each plugin gets a rule and template but there are some limitations to this approach. If for example, there is an intent by an user for which plugin does not exist and we want to tranform that to other processors, where do we place the rules and templates in this case. Example for this is with respect to security lake.

}
} else {
// File is not inside a JAR
rulesFolderPath = Paths.get(uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this ever occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to address 2 cases:

  1. "jar" scheme: This case occurs when the code is running from within a JAR file, ie during runtime.
  2. Non-"jar" scheme: This case occurs when the code is running in a development environment where the rules directory is available in the file system and not within a jar.

public InputStream getPluginRuleFileStream(String pluginName) {
if(pluginName == null || pluginName.isEmpty()){
throw new RuntimeException("Transformation plugin not found");
public List<Path> getRuleFiles() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API contract lists out Paths for rules. Then the caller comes back and get the input stream by calling readRuleFile. You should be able to consolidate these.

Just make one method: Collection<InputStream> loadRules().

I do see that you use the file name later, I think for debugging purposes. You could make a class that holds both a name and an InputStream.

class RuleInputStream {
  private String name;
  private InputStream inputStream
}

@AllArgsConstructor
@Data
public class RuleFileEvaluation {
Boolean result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be private. And I think they can also be private.

@@ -1,3 +1,4 @@
plugin_name: "documentdb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is for testing, let's add test in the name. Maybe test-documentdb.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikanthjg do you want to make this small change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be inconsistent even within the same rule yaml. The apply_when condition for the rule yaml, we still define it as documentdb and it would looks as follows:

plugin_name: "documentdbtest"
apply_when:
  - "$..source.documentdb"

We still want use documentdb and not documentdbtest in the apply_when condition as that is what is used in the pipeline yaml; and that cannot be mocked that easily.

The plugin name in the rule yaml is only used to get the corresponding template, but it seems inconsistent if we use documentdb vs documentdbtest even within the same file for the same plugin we want to test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this work?

plugin_name: "documentdbtest"
apply_when:
  - "$..source.documentdbtest"

These files are only used for unit testing, right? So, there is no need to keep the names matching the others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will need to change like 20 files(test yaml files) in the test as all of them use documentdb. i cant just change the rule to documentdbtest and make it work. The userconfig, expected output and template depends on this. can i add this as another PR? it should it mostly replace all but i see some other issues with it.

@@ -1,2 +1,3 @@
plugin_name: "documentdb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above, please add test into the name.

break;
}
} catch (PathNotFoundException e) {
LOG.debug("Json Path not found for {}", ruleFile.getFileName().toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call .toString() here.

Map sourceOptions = new HashMap<String, Object>();
Map s3_bucket = new HashMap<>();
Map<String, Object> sourceOptions = new HashMap<>();
Map<String, Object> s3_bucket = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use correct naming conventions: s3Bucket.

sourceOptions.put("option2", null);
final PluginModel source = new PluginModel("http", sourceOptions);
Map<String, Object> sourceOptions = new HashMap<>();
Map<String, Object> s3_bucket = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about variable naming conventions.

@@ -1,3 +1,4 @@
plugin_name: "documentdb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into the mongodb plugin project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkondaka
Copy link
Collaborator

kkondaka commented Aug 5, 2024

@srikanthjg Please address David's other comments. I will approve again and we can ask Hai to review too.

@kkondaka
Copy link
Collaborator

kkondaka commented Aug 5, 2024

@srikanthjg I am not sure if we are going to have a "real" plugin for each of the plugins in the pre-transformation config file. So, while moving documentdb related rules and templates to document db directory is a good suggestion by David, it might not be possible in all cases. For now, I suggest we just commit what you implemented in this PR and if needed, we can take the approach suggested by David in a future PR

srikanthjg and others added 3 commits August 14, 2024 12:20
…tion

Signed-off-by: Srikanth Govindarajan <srikanthjg123@gmail.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this empty file. You don't need a build.gradle file at all.

Copy link
Contributor Author

@srikanthjg srikanthjg Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build.gradle is needed to this plugin as a module. When i tried without build.gradle, classloader search was not picking up the rule file from the folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a README.md to this project to explain how to use it and test with it.

} catch (FileNotFoundException e){
LOG.error("Template File Not Found for {}", PLUGIN_NAME);
} catch (FileNotFoundException e) {
LOG.error("Template File Not Found");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the additional information for tracking the problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add it back.

}
}

} catch (FileNotFoundException e) {
LOG.debug("Rule File Not Found");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an error?

Also, can you include the file name in the log?

rulesPath = Paths.get(rulesURL.toURI());
} catch (FileSystemNotFoundException e) {
// Handle the case where the file system is not accessible (e.g., in a JAR)
FileSystem fileSystem = FileSystems.newFileSystem(rulesURL.toURI(), Collections.emptyMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work? These will always be in jar files.

Copy link
Contributor Author

@srikanthjg srikanthjg Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this works. It will always hit this code when dataprepper is running because it is always in jar during execution but will differ when unit testing.

@@ -64,7 +65,7 @@ public RuleEvaluatorResult isTransformationNeeded(PipelinesDataFlowModel pipelin
.build();
}
} catch (FileNotFoundException e) {
LOG.error("Template File Not Found");
LOG.error("Template File Not Found for {}", PLUGIN_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you don't even need this catch block. You can remove it. And then go back to using a local pluginName variable.

For Example:

User Config:
```aidl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```aidl
```yaml

on the template.

Rule:
```aidl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```aidl
```yaml

```

Template:
```aidl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```aidl
```yaml

```

Output:
```aidl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```aidl
```yaml

Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
@kkondaka kkondaka merged commit 38f8079 into opensearch-project:main Aug 17, 2024
72 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dynamic Rule Detection
4 participants