
[GSOC] hyperopt suggestion service logic update #2412

Open · wants to merge 21 commits into master from feat/hyperopt-suggestion-service-update

Conversation

@shashank-iitbhu (Contributor)

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2374

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y (Member)

/area gsoc

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 26, 2024
@andreyvelich (Member) left a comment

Thank you for this @shashank-iitbhu!
I left a few comments

.github/workflows/e2e-test-pytorch-mnist.yaml (outdated, resolved)
examples/v1beta1/hp-tuning/hyperopt-distribution.yaml (outdated, resolved)
NORMAL = 2;
LOG_NORMAL = 3;
DISTRIBUTION_UNKNOWN = 4;
DISTRIBUTION_UNKNOWN = 0;
Member

Please keep the same name as for parameter_type

Suggested change
DISTRIBUTION_UNKNOWN = 0;
UNKNOWN_DISTRIBUTION = 0;

Member

Suggested change
DISTRIBUTION_UNKNOWN = 0;
DISTRIBUTION_UNSPECIFIED = 0;

I would like to select the UNSPECIFIED suffix here.
Please see: https://google.aip.dev/126

Member

Makes sense. @tenzen-y, should we rename the other gRPC parameters to UNSPECIFIED?

@tenzen-y (Member) · Sep 3, 2024

Changing a released gRPC API means losing backward compatibility.
So, I would like to keep using the existing API for the released protocol buffers API. @andreyvelich WDYT?

@andreyvelich (Member) · Sep 3, 2024

Since these gRPC APIs are not exposed to end users, do you still think that we should not change the existing APIs?
It only affects users who build their own Suggestion service.

Member

Since these gRPC APIs are not exposed to end users, do you still think that we should not change the existing APIs?

Almost correct. Additionally, users who keep using removed Suggestion services, like the Chocolate Suggestion, face the same problem.

So, can we collect feedback in a dedicated issue outside of here?

Member

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback.
@shashank-iitbhu Can you please create an issue to track it?

Member

SGTM

Contributor Author

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback. @shashank-iitbhu Can you please create an issue to track it?

Sure, I will create a separate issue to track the renaming of other gRPC parameters to UNSPECIFIED.

)
elif param.type == DOUBLE:
    hyperopt_search_space[param.name] = hyperopt.hp.uniform(
    hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
Member

If the parameter is an int, why can't we support other distributions like lognormal?

@shashank-iitbhu (Contributor Author) · Sep 6, 2024

Distributions like uniform, quniform, loguniform, and normal return float values. They are designed to sample from a range of values that can take any real number (float), which might not make sense if we're looking for an integer value.
Although we can definitely add support for these distributions when the parameter is an int as well. Should we do this?

Member

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with the int parameter type?

Member

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with the int parameter type?

SGTM
Users can specify the double parameter type if they want to compute more exactly.
But documenting this restriction for the int parameter type would be better.
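
To make the rounding idea concrete, here is a minimal sketch (my own illustration, not the PR's code) of sampling a float from a continuous Hyperopt distribution and rounding it for an int parameter:

    import hyperopt
    from hyperopt.pyll.stochastic import sample

    # Continuous distributions return floats; round for INTEGER parameters.
    space = hyperopt.hp.normal("batch_size", 48, 5.33)  # mu, sigma
    value = sample(space)                               # a float, e.g. 51.7
    assignment = int(round(value))                      # rounded to int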

pkg/suggestion/v1beta1/hyperopt/base_service.py (outdated, resolved)
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from 1a7a831 to fddb763 on September 10, 2024 16:23
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

validation fix

add e2e tests for hyperopt

added e2e test to workflow

sigma calculation fixed

fix

parse new arguments to mnist.py
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from fddb763 to 282f81d on September 10, 2024 16:33
)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
    sigma = (float(param.max) - float(param.min)) / 6
@shashank-iitbhu (Contributor Author) · Sep 11, 2024

I followed this article to determine the value of sigma from min and max.
cc @tenzen-y @andreyvelich

Member

Maybe we should add this article to the comments. WDYT @tenzen-y @johnugeorge?

Member

I do not want to depend on an individual article. Instead, it would be better to add the actual mathematical description here as a comment.
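
For reference, the reasoning behind these values is the three-sigma rule: center the normal distribution on the feasible range and pick σ so that roughly 99.7% of samples land inside [min, max]. A sketch of the description such a comment could carry:

    \mu = \frac{\min + \max}{2}, \qquad \sigma = \frac{\max - \min}{6}

    P(\min \le X \le \max) = P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997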

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@shashank-iitbhu (Contributor Author)

@tenzen-y I have added two new parameters, weight_decay and dropout_rate, to the Hyperopt example and passed them to mnist.py, but I haven't used them in the Net class in the train and test functions yet. If you check the logs for this e2e test, the maximum value of the loss metric is an enormously large number. I can't figure out what I'm missing. I also tested this locally.

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Sep 22, 2024
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
feasibleSpace:
  min: "32"
  max: "64"
  distribution: "logNormal"
Contributor Author

@tenzen-y Testing the logNormal distribution using batch_size.

@shashank-iitbhu (Contributor Author)

@tenzen-y

if param.type == INTEGER:
    assignments.append(Assignment(param.name, int(vals[param.name][0])))

Here, the float values sampled from the distribution get converted to int for the INT parameter type.

shashank-iitbhu and others added 2 commits September 22, 2024 20:38
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
Comment on lines +118 to +119
log_min = math.log(float(param.min))
log_max = math.log(float(param.max))
Member

Shouldn't we use fixed values when min and max are scalars, the same as Nevergrad does?

Contributor Author

Yeah, we can, but that was an edge case for when min and max are not defined in the case of nevergrad.

    elif isinstance(param, (p.Log, p.Scalar)):
        if (param.bounds[0][0] is None) or (param.bounds[1][0] is None):
            if isinstance(param, p.Scalar) and not param.integer:
                return hp.lognormal(label=param_name, mu=0, sigma=1)

For example,

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        max: "64"
        distribution: "logNormal"

The above parameter will be sampled from this graph:
[Screenshot: plot of the log-normal density, 2024-09-22]
where μ = 3.8123 and σ = 0.3465 are calculated by putting min=32 and max=64 into our code, and E(X) represents the mean, which is 48 in our case.
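
As a sanity check (my arithmetic, based on the figures quoted above), the stated mean follows from the log-normal expectation formula:

    E(X) = e^{\mu + \sigma^{2}/2} = e^{3.8123 + 0.3465^{2}/2} = e^{3.8723} \approx 48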

Member

That makes sense.
In that case, could you address the cases where min or max is not specified, as well as nevergrad?

https://github.com/facebookresearch/nevergrad/blob/a2006e50b068fe598e0f3d7dab9c9bcf6cf97e00/nevergrad/optimization/externalbo.py#L61-L64

Member

@shashank-iitbhu This is still pending.

Contributor Author

if param.FeasibleSpace.Max == "" && param.FeasibleSpace.Min == "" {
    allErrs = append(allErrs, field.Required(parametersPath.Index(i).Child("feasibleSpace").Child("max"),
        fmt.Sprintf("feasibleSpace.max or feasibleSpace.min must be specified for parameterType: %v", param.ParameterType)))
}

The webhook validator requires feasibleSpace.max or feasibleSpace.min to be specified.

Member

But when either min or max is empty, this validation does not reject the request, right?
So, shouldn't we implement the special case in the Suggestion Service?
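
For illustration, such a special case might look like the following minimal sketch (an assumption mirroring Nevergrad's mu=0, sigma=1 fallback; lognormal_space is a hypothetical helper, not code from this PR):

    import math

    import hyperopt

    def lognormal_space(name, min_str, max_str):
        # Unbounded case: fall back to exp(N(0, 1)), as Nevergrad does.
        if not min_str or not max_str:
            return hyperopt.hp.lognormal(name, 0, 1)
        # Bounded case: derive mu and sigma from the log-scale range.
        log_min, log_max = math.log(float(min_str)), math.log(float(max_str))
        return hyperopt.hp.lognormal(
            name, (log_min + log_max) / 2, (log_max - log_min) / 6
        )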

Contributor Author

Yes, the validation webhook does not reject the request when either min or max is empty. But I created an example where:

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        distribution: "logNormal"

For this, the experiment is created, but the suggestion service does not sample any value, so the trials are not running, even though I handled this case (when either min or max is not specified) in pkg/suggestion/v1beta1/hyperopt/base_service.py.
Do we need to check the experiment_defaults.go file?
https://github.com/kubeflow/katib/blob/867c40a1b0669446c774cd6e770a5b7bbf1eb2f1/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from a1156fc to 658daaf on September 23, 2024 04:44
- "file-metrics-collector,pytorchjob-mnist"
- "median-stop-with-json-format,file-metrics-collector-with-json-format"

Member

Suggested change

min: "0.01"
max: "0.05"
step: "0.01"
distribution: "normal"
Member

You can remove the quotes:

Suggested change
distribution: "normal"
distribution: normal

feasibleSpace:
  min: "0.001"
  max: "1"
  distribution: "uniform"
Member

Suggested change
distribution: "uniform"
distribution: uniform

feasibleSpace:
  min: "32"
  max: "64"
  distribution: "logNormal"
Member

Suggested change
distribution: "logNormal"
distribution: logNormal

feasibleSpace:
  min: "0.001"
  max: "1"
  distribution: "uniform"
Member

Can you also test logUniform in this example?

Comment on lines +76 to +84
else:
    if param.type == INTEGER:
        hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
            param.name, float(param.min), float(param.max)
        )
    else:
        hyperopt_search_space[param.name] = hyperopt.hp.uniform(
            param.name, float(param.min), float(param.max)
        )
Member

You can simplify it

elif param.type == INTEGER:
    hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
        param.name, float(param.min), float(param.max)
    )
else:
    hyperopt_search_space[param.name] = hyperopt.hp.uniform(
        param.name, float(param.min), float(param.max)
    )

if param.type in [INTEGER, DOUBLE]:
    if param.distribution == api_pb2.UNIFORM or param.distribution is None:
        if param.step:
            hyperopt_search_space[param.name] = hyperopt.hp.quniform(
Member

Do we have quniformint in Hyperopt?

Contributor Author

Oh yes, missed this; we should use quniformint here for INT.
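
If quniformint turns out to be unavailable in the installed Hyperopt version, an equivalent quantized integer space can be built with scope.int over hp.quniform (a sketch under that assumption):

    import hyperopt
    from hyperopt.pyll import scope

    # Quantized uniform over [32, 64] with step 8, cast to int at sample time.
    space = scope.int(hyperopt.hp.quniform("batch_size", 32, 64, 8))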

Comment on lines +96 to +97
math.log(float(param.min)),
math.log(float(param.max)),
Member

Do we need to take the logarithm of the values before passing them to loguniform?

Contributor Author

We convert param.min and param.max to the logarithmic scale because, when optimizing, the sampled value is constrained to the interval [exp(low), exp(high)]. Refer to the hp.loguniform documentation here.

Also, here in the nevergrad implementation, the upper and lower bounds were converted to the logarithmic scale for hp.loguniform.

@andreyvelich @tenzen-y
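
A minimal sketch of why the conversion is needed: hp.loguniform(label, low, high) draws exp(uniform(low, high)), so passing log(min) and log(max) keeps samples inside [min, max]:

    import math

    import hyperopt
    from hyperopt.pyll.stochastic import sample

    # Samples are exp(U(log(0.001), log(1.0))), i.e. always in [0.001, 1.0].
    space = hyperopt.hp.loguniform("lr", math.log(0.001), math.log(1.0))
    print(sample(space))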

    math.log(float(param.max)),
)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
Member

Can you please add a comment before this line explaining why we do this?

)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
    sigma = (float(param.max) - float(param.min)) / 6
Member

Maybe we should add this article to the comments. WDYT @tenzen-y @johnugeorge?


Successfully merging this pull request may close these issues.

[GSOC] Project 8: Support various Parameter Distribution in Katib