Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for AWS security lake sink as a bucket selector mode in S3 sink #4846

Merged
merged 9 commits into from
Aug 20, 2024

Conversation

kkondaka
Copy link
Collaborator

Description

Add support for AWS security lake sink as a bucket selector mode in S3 sink

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ X] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [X ] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kkondaka and others added 5 commits August 19, 2024 13:09
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
…3 sink

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
}

@Override
public Map<String, String> getMetadata(int eventCount) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is related to the bucket_selector. I'd suggest that the aws_security_lake transformation just add this as additional metadata. I also think "additional metadata" is clearer since it doesn't overwrite the existing metadata.

s3:
  bucket_selector:
    aws_security_lake
  additional_metadata:
    asl_rows: ${numberOfEvents}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable how do we plan do support ${numberOfEvents}? It is not a field in the key. It is not a function. I think it would be confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had done something similar to this in either the s3 sink or opensearch sink already where you could add some custom sink-only data (either to the index name or the S3 path). If we have that, we can follow that pattern.

Another idea is to make it somewhat less flexible by specifying the name exactly.

s3:
  additional_batch_metadata:
    aws_rows: numberOfEvents

We could require that the value be numberOfEvents exactly. In the future, maybe we add more.

Yet another approach would be less flexible, but could solve it:

s3:
  number_of_events_key: asl_rows

Comment on lines 82 to 83
int locIndex = sourceLocation.indexOf("/ext/");
return sourceLocation.substring(5, locIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int locIndex = sourceLocation.indexOf("/ext/");
return sourceLocation.substring(5, locIndex);
final String bucketSeparator = "/ext/";
int locIndex = sourceLocation.indexOf(bucketSeparator);
return sourceLocation.substring(bucketSeparator.length(), locIndex);

import software.amazon.awssdk.services.securitylake.model.CustomLogSourceConfiguration;
import software.amazon.awssdk.services.securitylake.model.CreateCustomLogSourceResponse;
import software.amazon.awssdk.services.securitylake.model.CustomLogSourceCrawlerConfiguration;
//import software.amazon.awssdk.services.securitylake.model.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove

final LocalDate now = LocalDate.now();
final String eventDay = String.format("%d%02d%02d", now.getYear(), now.getMonthValue(), now.getDayOfMonth());
int locIdx = sourceLocation.indexOf("/ext/");
pathPrefix = sourceLocation.substring(locIdx+1)+"region="+region+"/accountId="+accountId+"/eventDay="+eventDay+"/";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only create this once. Use String.format() for clarity.

final String accountId=arn.split(":")[4];
final LocalDate now = LocalDate.now();
final String eventDay = String.format("%d%02d%02d", now.getYear(), now.getMonthValue(), now.getDayOfMonth());
int locIdx = sourceLocation.indexOf("/ext/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can extract this "/ext/" into a constant. You also use it below.

@JsonProperty("bucket_selector")
private PluginModel bucketSelector;

@AssertTrue(message = "Only one of bucket or bucket_selector option should be specified")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@AssertTrue(message = "Only one of bucket or bucket_selector option should be specified")
@AssertTrue(message = "You may not use both bucket and bucket_selector together in one S3 sink.")

This message should be more readable.

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good!

@@ -39,5 +43,6 @@ public int hashCode() {

public Map<String, Object> getGroupIdentifierHash() { return groupIdentifierHash; }

public Map<String, String> getMetadata(int eventCount) { return predefinedObjectMetadata != null ? Map.of(predefinedObjectMetadata.getNumberOfObjects(), Integer.toString(eventCount)) : null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structurally, I think this is not the place to do this. The metadata could become more involved and would require additional inputs. But, I think this can work for now. Maybe we can clean it up in a follow-on PR.

@Size(min = 3, max = 500, message = "bucket length should be at least 3 characters")
private String bucketName;

@JsonProperty("bucket_selector")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add @JsonPropertyDescriptions to these if possible. Can be a follow up PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do follow up PR

import com.fasterxml.jackson.annotation.JsonProperty;
public class PredefinedObjectMetadata {
@JsonProperty("number_of_objects")
private String numberOfObjects;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a String? Sounds like it would be integer with the name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call this number_of_objects_key. Will do it in the next PR

@kkondaka kkondaka merged commit 385d438 into opensearch-project:main Aug 20, 2024
47 checks passed
kkondaka added a commit to kkondaka/kk-data-prepper-f2 that referenced this pull request Aug 20, 2024
…3 sink (opensearch-project#4846)

* dplive1.yaml

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Delete .github/workflows/static.yml

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Add support for AWS security lake sink as a bucket selector mode in S3 sink

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Fixed tests

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Added javadoc for S3BucketSelector

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Added new  tests for KeyGenerator

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Added new  tests and fixed style errors

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Addressed review comments

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Fixed test build failure

Signed-off-by: Kondaka <krishkdk@amazon.com>

---------

Signed-off-by: Kondaka <krishkdk@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants