Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected cron scaling issue when start / end are defined with different days #4677

Closed
rubroboletus opened this issue Jun 12, 2023 · 15 comments
Labels
bug Something isn't working good first issue Good for newcomers help wanted Looking for support from community stale All issues that are marked as stale due to inactivity

Comments

@rubroboletus
Copy link

rubroboletus commented Jun 12, 2023

Report

We wanna to scale our deployments / statefulsets from 0 replicas to bigger number (3) during working days from 6:30AM to 10:10PM. We have used following setup:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-node-chrome
  namespace: selenium-hub
spec:
  advanced:
    restoreToOriginalReplicaCount: true
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 3
  idleReplicaCount: 0
  maxReplicaCount: 3
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: selenium-node-chrome
  triggers:
  - metadata:
      desiredReplicas: "3"
      end: 10 22 * * *
      start: 30 6 * * 1-5
      timezone: Europe/Prague
    type: cron

The problem is, that during working days from Mon to Fri, pods are correctly started, but they did not stop on Friday, as expected, and keep running till Sunday 22:10.

Expected Behavior

We expect, that using this definition for cron, pods will start (scale up to 3 replicas) each working day at 6:30 and stop (scale down to 0 replicas) daily at 22:10.

Actual Behavior

During working days from Mon to Fri, pods are correctly started, but they did not stop on Friday, as expected, and keep running till Sunday 22:10.

Steps to Reproduce the Problem

  1. define any statefulset / deployment
  2. define scaledobject from our example
  3. wait till friday night

Logs from KEDA operator

example

KEDA Version

2.10.0

Kubernetes Version

1.26

Platform

Amazon Web Services

Scaler Details

cron

Anything else?

No response

@rubroboletus rubroboletus added the bug Something isn't working label Jun 12, 2023
@JorTurFer
Copy link
Member

Hi,
You are right!, I can reproduce the issue and it's because in this code:

switch {
case nextStartTime < nextEndTime && currentTime < nextStartTime:
metric := GenerateMetricInMili(metricName, float64(defaultDesiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, false, nil
case currentTime <= nextEndTime:
metric := GenerateMetricInMili(metricName, float64(s.metadata.desiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, true, nil
default:
metric := GenerateMetricInMili(metricName, float64(defaultDesiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, false, nil
}

as the next start is on Monday and next end is today, the first case is rejected, but as current time is before end time, the scaler thinks that it's in active period. TBH, I don't see an easy solution from code pov because there are multiple "end periods" without any "starting period", but I can suggest a workaround. if you set the ending cron for the same days that the starting cron, the problem will be solved:

  - metadata:
      desiredReplicas: "3"
      end: 10 22 * * 1-5
      start: 30 6 * * 1-5
      timezone: Europe/Prague

@JorTurFer
Copy link
Member

BTW, based on the ScaledObject name ("selenium-node-chrome") I guess that you are scaling out/in a selenium node. Don't you prefer to use the selenium scaler directly?

@rubroboletus
Copy link
Author

Hi, You are right!, I can reproduce the issue and it's because in this code:

switch {
case nextStartTime < nextEndTime && currentTime < nextStartTime:
metric := GenerateMetricInMili(metricName, float64(defaultDesiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, false, nil
case currentTime <= nextEndTime:
metric := GenerateMetricInMili(metricName, float64(s.metadata.desiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, true, nil
default:
metric := GenerateMetricInMili(metricName, float64(defaultDesiredReplicas))
return []external_metrics.ExternalMetricValue{metric}, false, nil
}

as the next start is on Monday and next end is today, the first case is rejected, but as current time is before end time, the scaler thinks that it's in active period. TBH, I don't see an easy solution from code pov because there are multiple "end periods" without any "starting period", but I can suggest a workaround. if you set the ending cron for the same days that the starting cron, the problem will be solved:

  - metadata:
      desiredReplicas: "3"
      end: 10 22 * * 1-5
      start: 30 6 * * 1-5
      timezone: Europe/Prague

Thank you, we are using your suggested solution, but still I see this mentioned version as buggy.
Regarding selenium scaler, it is working with Selenium 4 and because of lazy department, that is programming tests in our company, we are still on version 3 :-(

@JorTurFer JorTurFer added help wanted Looking for support from community good first issue Good for newcomers labels Jun 13, 2023
@sbdtu5498
Copy link

sbdtu5498 commented Jun 14, 2023

@JorTurFer I think this bug is persisting because of the design itself. Why we have taken start and end time when cron spec says "timezone and time" instead of "timezone, start and end" should inherently provide everything.

//current
triggers:
- type: cron
  metadata:
    # Required
    timezone: Asia/Kolkata  # The acceptable values would be a value from the IANA Time Zone Database.
    start: 30 * * * *       # Every hour on the 30th minute
    end: 45 * * * *         # Every hour on the 45th minute
    desiredReplicas: "10"
//proposed
triggers:
- type: cron
  metadata:
    # Required
    timezone: Asia/Kolkata  # The acceptable values would be a value from the IANA Time Zone Database.
    time: 30-45 * * * *       # Every hour on the 30th minute and terminate at 45th minute.
    desiredReplicas: "10"

We would be able to leverage the cron library in a better way. Also, I think there are some bugs in parsing logic as well for example
func parseCronTimeFormat(s string) string {
s = strings.ReplaceAll(s, " ", "")
s = strings.ReplaceAll(s, "*", "x")
s = strings.ReplaceAll(s, "/", "Sl")
s = strings.ReplaceAll(s, "?", "Qm")
return s
}
here in this scenario, the author has tried to incorporate some features but it probably has led to some bugs instead. If we switch to the proposed scenario we would need to incorporate less scenario testing as the corn job has defined behavior. If we stick to the current parsing scenario, we would have to tackle cases such as start = "/5 * * * *" time = "/6 * * * "
this would lead to a net close time of "
/1 * * * *" so we simply don't want the cron job to run. In any case aren't we simply increasing our codebase redundancy?

@sbdtu5498
Copy link

sbdtu5498 commented Jun 14, 2023

If you think that we can change the user side spec in maybe next or next to next release, this change might be worth considering as it would not require changing a lot of code, and the code side interface will remain the same and we would have one to one cron consistency. Also, it might not be a good first issue so I think we should remove the label.

PS - I will try to fix it when I will get some time.

@JorTurFer
Copy link
Member

We would be able to leverage the cron library in a better way. Also, I think there are some bugs in parsing logic as well for example
func parseCronTimeFormat(s string) string {
s = strings.ReplaceAll(s, " ", "")
s = strings.ReplaceAll(s, "*", "x")
s = strings.ReplaceAll(s, "/", "Sl")
s = strings.ReplaceAll(s, "?", "Qm")
return s
}

This function is used only to sanitize the expression for the hpa's metric name. The parsing relies in the underlying library:

func getCronTime(location *time.Location, spec string) (int64, error) {
c := cron.New(cron.WithLocation(location))
_, err := c.AddFunc(spec, func() { _ = fmt.Sprintf("Cron initialized for location %s", location.String()) })
if err != nil {
return 0, err
}
c.Start()
cronTime := c.Entries()[0].Next.Unix()
c.Stop()
return cronTime, nil
}

I think this bug is persisting because of the design itself.

Yeah, but I'd not invest too much time because we want to redesign it as new scaler: #3566
If you are interested and you have some time, I'd recommend to help there :)

@sbdtu5498
Copy link

sbdtu5498 commented Jun 14, 2023

This function is used only to sanitize the expression for the hpa's metric name. The parsing relies in the underlying library

Yeah that library is like the defacto standard to handle cron in go. That's why I was saying it as we would not require any sort of redundant handling, etc.

Yeah, but I'd not invest too much time because we want to redesign it as new scaler: #3566
If you are interested and you have some time, I'd recommend to help there :)

But after looking at the design proposal to change the spec. It looks really brilliant, I really like the idea. Minimal parsing can get things done. No library imports, required, etc. This is exactly what I wanted to do but with cron library, but the kube-green based interface design is like much much better, sleek and elegant. So basically two targets that we have to rewrite is cron_scaler.go and appropriate test. Also I just did a brief overview and think these files are self-inclusive for such implementation, as other things are being handled by Keda internal interfaces. If I have missed anything, let me know and feel free to assign me the issue. if everything looks good , I will try to implement the issue this week. Thanks!

@JorTurFer
Copy link
Member

We should keep the current cron scaler backward compatible, and that's why we proposed to create another one. But if you implement it as a new scaler, it's fine :)

@tomkerkhove
Copy link
Member

Can we close this issue then in favor of #3566?

@stale
Copy link

stale bot commented Aug 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Aug 21, 2023
@v-shenoy
Copy link
Contributor

@krishna-kariya PTAL.

@stale stale bot removed the stale All issues that are marked as stale due to inactivity label Aug 23, 2023
@stale
Copy link

stale bot commented Oct 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Oct 22, 2023
@zroubalik zroubalik removed the stale All issues that are marked as stale due to inactivity label Oct 24, 2023
Copy link

stale bot commented Dec 23, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Dec 23, 2023
@zroubalik zroubalik removed the stale All issues that are marked as stale due to inactivity label Dec 28, 2023
Copy link

stale bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Mar 1, 2024
Copy link

stale bot commented Mar 8, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Looking for support from community stale All issues that are marked as stale due to inactivity
Projects
Archived in project
Development

No branches or pull requests

6 participants