Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[filelogreceiver]: Add ability to sort by mtime #28691

Conversation

BinaryFissionGames
Copy link
Contributor

Description:

  • Adds a new ordering_criteria.mode config option to the ordering_criteria config, which can be used to select the way ordering criteria is applied (regex or mtime)
  • Add a feature gate for mtime mode

In order to do this, filtering was abstracted out into its own Filter interface, where there are now two implementations (a "Regex" implementation and a "mtime" implementation).

An optional follow-up performance improvement may be made here, to have the finder return fs.DirEntry directly to query the mtime without making an extra call to os.Stat for each file.

Link to tracking Issue: #27812

Testing:

  • Added unit tests for new functionality

Documentation:

  • Added new mode parameter to filelogreceiver docs

@@ -133,11 +169,12 @@ func (m Matcher) MatchFiles() ([]string, error) {
if len(files) == 0 {
return files, errors.Join(fmt.Errorf("no files match the configured criteria"), errs)
}
if len(m.filterOpts) == 0 {

if m.f == nil || m.f.SkipFiltering() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need SkipFiltering? Isn't it enough to check if m.f == nil?

@@ -50,7 +50,8 @@ Tails and parses logs from files.
| `retry_on_failure.initial_interval` | `1s` | [Time](#time-parameters) to wait after the first failure before retrying. |
| `retry_on_failure.max_interval` | `30s` | Upper bound on retry backoff [interval](#time-parameters). Once this value is reached the delay between consecutive retries will remain constant at the specified value. |
| `retry_on_failure.max_elapsed_time` | `5m` | Maximum amount of [time](#time-parameters) (including retries) spent trying to send a logs batch to a downstream consumer. Once this value is reached, the data is discarded. Retrying never stops if set to `0`.
| `ordering_criteria.regex` | | Regular expression used for sorting, should contain a named capture groups that are to be used in `regex_key`. |
| `ordering_criteria.method` | `regex` | Specifies the mode to use for ordering. Can be `regex` (sort using regex and sort_by rules) or `mtime` (sort by file modified time). |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this serve basically the same purpose as ordering_criteria.sort_by.sort_type? Why not add mtime as an additional option there?

This would also allow users to compose multiple types of ordering and filtering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got a little lost looking at the current implementation where every sort type requires the regex to be set, I'll take a stab at reworking to have to be a separate sort type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New implementation for sort_type here: #28850

@BinaryFissionGames
Copy link
Contributor Author

Closing in favor of #28850

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants