out_splunk: remove raw endpoint #9007

Merged
merged 1 commit into from
Jul 2, 2024

pmeier
Contributor

@pmeier pmeier commented Jun 25, 2024

Fixes #8927. This does not remove the ability to send raw events, i.e. using Splunk_Send_Raw On, but rather sends them to the correct endpoint.
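A minimal sketch of the endpoint selection before and after this change, assuming hypothetical helpers `endpoint_before()` and `endpoint_after()`. This is an illustration, not the actual out_splunk code. Because Splunk_Send_Raw On still sends JSON records, both modes can target the event endpoint.

```c
#include <stdbool.h>

/* Hypothetical helper (illustration only): before this PR,
 * Splunk_Send_Raw On switched the request to the raw endpoint. */
static const char *endpoint_before(bool send_raw)
{
    return send_raw ? "/services/collector/raw" : "/services/collector/event";
}

/* Hypothetical helper (illustration only): after this PR, the raw flag
 * only controls how the payload is built, not where it is sent. */
static const char *endpoint_after(bool send_raw)
{
    (void) send_raw; /* no longer affects the URI */
    return "/services/collector/event";
}
```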


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to the packaging of containers or native binaries, please confirm that it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Fixes fluent#8927. This does **not** remove the ability to send raw events,
i.e. using `Splunk_Send_Raw On`, but rather sends them to the correct endpoint.

Signed-off-by: Philip Meier <github.pmeier@posteo.de>
@edsiper
Member

edsiper commented Jun 25, 2024

Is /services/collector/event able to receive raw events?

@pmeier
Contributor Author

pmeier commented Jun 25, 2024

@edsiper Could you define what exactly you mean by "raw events"? The term has a different meaning in fluent-bit than in splunk as explained in #8927 (comment).

@edsiper
Member

edsiper commented Jul 1, 2024

I will double-check this; I cannot remember all the details of the raw endpoint and why I implemented it that way at the time (I'm asking another maintainer to take a look at this too). Thank you.

@edsiper edsiper added this to the Fluent Bit v3.1.0 milestone Jul 1, 2024
@cosmo0920
Contributor

According to the official Splunk docs, Fluent Bit needs to supply a channel, either as a URL parameter or as an X-Splunk-Request-Channel header, when sending events to the raw endpoint:

Channel
This endpoint requires a data channel GUID to differentiate data from different clients. Generate a GUID and provide it in a POST request as a custom HTTP header or as a parameter.

If a channel is not provided in the POST request, an error response is sent. Only valid GUIDs can be used. An error message is returned if GUID validation fails.

ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/RESTREF/RESTinput#services.2Fcollector.2Fraw
ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/AboutHECIDXAck#About_channels_and_sending_data
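A minimal sketch of what supplying the channel as a URL parameter could look like, assuming hypothetical helpers `is_guid()` and `raw_uri_with_channel()` (neither is part of Fluent Bit). The GUID check only verifies the 8-4-4-4-12 hexadecimal layout; Splunk's own validation may differ. The same value could instead be sent in the X-Splunk-Request-Channel header.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: checks the 8-4-4-4-12 hex layout of a GUID such as
 * 934793C0-FC91-467E-965A-7EAACEFBC4AB. */
static bool is_guid(const char *s)
{
    static const size_t groups[] = {8, 4, 4, 4, 12};
    size_t i, j;

    for (i = 0; i < 5; i++) {
        for (j = 0; j < groups[i]; j++) {
            /* also fails on a premature '\0' */
            if (!isxdigit((unsigned char) *s)) {
                return false;
            }
            s++;
        }
        /* the first four groups are followed by '-' */
        if (i < 4) {
            if (*s != '-') {
                return false;
            }
            s++;
        }
    }
    /* the last group must end the string */
    return *s == '\0';
}

/* Hypothetical helper: writes "/services/collector/raw?channel=<guid>"
 * into buf. Returns 0 on success, -1 if the GUID is invalid or buf is
 * too small for the full URI. */
static int raw_uri_with_channel(char *buf, size_t size, const char *channel)
{
    int n;

    if (!is_guid(channel)) {
        return -1;
    }
    n = snprintf(buf, size, "/services/collector/raw?channel=%s", channel);
    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    return 0;
}
```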

It seems that the raw endpoint can handle JSON logs, because the examples include sending a JSON payload.


However, Splunk's documentation may be confusing here, because without indexer acknowledgment there is no need to use channels.

Sending events to HEC with indexer acknowledgment active is similar to sending them with the setting off. There is one crucial difference: when you have indexer acknowledgment turned on, you must specify a channel when you send events.

ref: https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/AboutHECIDXAck#About_channels_and_sending_data

JSON request with timestamp

    curl "https://localhost:8088/services/collector/raw?channel=934793C0-FC91-467E-965A-7EAACEFBC4AB" \
      -H 'Authorization: Splunk 934793C0-FC91-467E-965A-7EAACEFBC4AB' \
      -d '{"message":"Hello World", "date":"Wed Aug 10 12:27:53 PDT 2016"}'

If we only send structured data, we can remove the raw endpoint from out_splunk. However, I observed that the raw endpoint, without indexer acknowledgment, can handle raw JSON events.

Also, if we remove the raw endpoint and there is no longer any need to target it, we need to remove the splunk_send_raw config map entry, which is defined here: https://github.com/fluent/fluent-bit/blob/master/plugins/out_splunk/splunk.c#L919-L925

@pmeier
Contributor Author

pmeier commented Jul 1, 2024

> If we only send structured data, we can remove the raw endpoint from out_splunk. However, I observed that the raw endpoint, without indexer acknowledgment, can handle raw JSON events.

Yeah, but if the event endpoint does what we want and we never send raw strings, there is no point in ever sending anything to the raw endpoint. Hence this PR.

> Also, if we remove the raw endpoint and there is no longer any need to target it, we need to remove the splunk_send_raw config map entry, which is defined here: master/plugins/out_splunk/splunk.c#L919-L925

That is not what we want. The "raw mode" in fluent-bit means that the record is sent as-is to Splunk without any processing (except for #8926). If it is activated, the user is responsible for bringing the record into the format required by Splunk, for example by placing a Lua filter before the output. This behavior is necessary when the configuration options provided by the out_splunk plugin are not sufficient. I'm facing such a use case and thus cannot use Splunk_Send_Raw Off.

When Splunk_Send_Raw Off is configured (the default), the whole record is nested under the event key, and other options can be configured to insert additional data into the JSON payload sent to Splunk. This is sufficient for simple use cases.
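A minimal sketch of the two modes described above, assuming the record has already been serialized to a JSON object string and using a hypothetical helper `hec_body()`. The actual plugin works on msgpack records and can add further fields through its other options, which this sketch omits.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds the HEC request body for one record that is
 * already serialized as a JSON object. The caller frees the result.
 * Returns NULL if allocation fails. */
static char *hec_body(const char *record_json, bool send_raw)
{
    const char *prefix = "{\"event\":";
    const char *suffix = "}";
    size_t len;
    char *out;

    if (send_raw) {
        /* Splunk_Send_Raw On: pass the record through unchanged; it must
         * already have the shape Splunk expects (e.g. built by a Lua filter). */
        len = strlen(record_json) + 1;
        out = malloc(len);
        if (out != NULL) {
            memcpy(out, record_json, len);
        }
        return out;
    }

    /* Splunk_Send_Raw Off (default): nest the whole record under "event". */
    len = strlen(prefix) + strlen(record_json) + strlen(suffix) + 1;
    out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    snprintf(out, len, "%s%s%s", prefix, record_json, suffix);
    return out;
}
```

With raw mode on, a record such as `{"event":"hi","index":"main"}` produced by an upstream filter is passed through untouched; with the default setting the same object would end up nested as `{"event":{"event":"hi","index":"main"}}`, which is why raw mode matters for the use case above.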

@cosmo0920
Contributor

Ah, I got it. So using the raw endpoint is currently ineffective and inappropriate in fluent-bit. This motivation is what I wanted to know; I really appreciate the explanation.

I now see that this change is reasonable. But the behavior change should be properly described in fluent-bit's documentation.

Here is out_splunk's documentation: https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/splunk.md#sending-raw-events

I also understand what you mean in this PR. Splunk_Send_Raw is used for sending events that you have already modified into the required shape yourself. In some cases, as described in the documentation, those events are intended to follow Splunk's metrics format.

Contributor

@cosmo0920 cosmo0920 left a comment

I understand the motivation for this PR. Currently, events that are not modified by the user should be treated as structured logs, and records are sometimes adapted to the Splunk metrics format. In such cases, using the raw endpoint is inappropriate. In addition, there is no need to keep the raw endpoint for now.

@pmeier
Copy link
Contributor Author

pmeier commented Jul 1, 2024

> But the behavior change should be properly described in fluent-bit's documentation.

The documentation currently doesn't say anything about the endpoint the data is being sent to. I think this is fine given that this is more of an implementation detail of fluent-bit.

As for documenting the change: Splunk_Send_Raw On has not worked since fluent-bit==1.8, when the raw endpoint was introduced. Or maybe it did work initially, since one might be able to send JSON data to that endpoint (#9007 (comment)), but it certainly doesn't work on fluent-bit==3.0.7. So I would treat this as a bug fix rather than a feature change.

@edsiper edsiper merged commit 69bf966 into fluent:master Jul 2, 2024
44 checks passed
@edsiper
Member

edsiper commented Jul 2, 2024

Thanks, everybody.

@pmeier pmeier deleted the out_splunk/remove-raw branch July 2, 2024 11:58
Successfully merging this pull request may close these issues.

splunk output with splunk_send_raw on is using the wrong endpoint
3 participants