Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional log level for api key read #1946

Merged
merged 12 commits into from
Oct 6, 2022
49 changes: 38 additions & 11 deletions internal/pkg/api/handleAck.go
Original file line number Diff line number Diff line change
Expand Up @@ -335,15 +335,19 @@ func (ack *AckT) handlePolicyChange(ctx context.Context, zlog zerolog.Logger, ag
}

for _, output := range agent.Outputs {
if !agent.Active {
// stop processing if agent is inactive
break
}

if output.Type != policy.OutputTypeElasticsearch {
continue
}

err := ack.updateAPIKey(ctx,
zlog,
agent.Id,
agent,
currRev, currCoord,
agent.PolicyID,
output.APIKeyID, output.PermissionsHash, output.ToRetireAPIKeyIds)
if err != nil {
return err
Expand All @@ -356,18 +360,29 @@ func (ack *AckT) handlePolicyChange(ctx context.Context, zlog zerolog.Logger, ag

func (ack *AckT) updateAPIKey(ctx context.Context,
zlog zerolog.Logger,
agentID string,
agent *model.Agent,
currRev, currCoord int64,
policyID, apiKeyID, permissionHash string,
apiKeyID, permissionHash string,
toRetireAPIKeyIDs []model.ToRetireAPIKeyIdsItems) error {

if apiKeyID != "" {
res, err := ack.bulk.APIKeyRead(ctx, apiKeyID, true)
if err != nil {
zlog.Error().
Err(err).
Str(LogAPIKeyID, apiKeyID).
Msg("Failed to read API Key roles")
if isAgentActive(ctx, zlog, ack.bulk, agent.Id) {
zlog.Error().
Err(err).
Str(LogAPIKeyID, apiKeyID).
Msg("Failed to read API Key roles")
} else {
// not an error, race when API key was invalidated before acking
zlog.Info().
Err(err).
Str(LogAPIKeyID, apiKeyID).
Msg("Failed to read invalidated API Key roles")

// update to inactive for future checks
agent.Active = false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still an error. If the ApiKey is gone, the ACK should fail. We should still output an error, but it should be an info level log message. Ie. this is an error, but it is expected.

It is also unexpected to have this call produce a side effect of changing the agent record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with @scunningham offline, approach agreed on is: returning error and propagating it to the client while decreasing log level on the server side inside handler

}
} else {
clean, removedRolesCount, err := cleanRoles(res.RoleDescriptors)
if err != nil {
Expand All @@ -393,22 +408,22 @@ func (ack *AckT) updateAPIKey(ctx context.Context,
}

body := makeUpdatePolicyBody(
policyID,
agent.PolicyID,
currRev,
currCoord,
)

err := ack.bulk.Update(
ctx,
dl.FleetAgents,
agentID,
agent.Id,
body,
bulk.WithRefresh(),
bulk.WithRetryOnConflict(3),
)

zlog.Err(err).
Str(LogPolicyID, policyID).
Str(LogPolicyID, agent.PolicyID).
Int64("policyRevision", currRev).
Int64("policyCoordinator", currCoord).
Msg("ack policy")
Expand Down Expand Up @@ -513,6 +528,18 @@ func (ack *AckT) handleUpgrade(ctx context.Context, zlog zerolog.Logger, agent *
return nil
}

func isAgentActive(ctx context.Context, zlog zerolog.Logger, bulk bulk.Bulk, agentID string) bool {
agent, err := dl.FindAgent(ctx, bulk, dl.QueryAgentByID, dl.FieldID, agentID)
scunningham marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
zlog.Warn().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why just warn? isn't it a real error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we can take it like an error

Err(err).
Msg("failed to find agent by ID")
return true
}

return agent.Active // it is a valid error in case agent is active (was not invalidated)
}

// Generate an update script that validates that the policy_id
// has not changed underneath us by an upstream process (Kibana or otherwise).
// We have a race condition where a user could have assigned a new policy to
Expand Down