
[GSOC] hyperopt suggestion service logic update #2412

Open · wants to merge 21 commits into master from feat/hyperopt-suggestion-service-update

Conversation

@shashank-iitbhu (Contributor)

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2374

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y (Member)

/area gsoc

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 26, 2024
@andreyvelich (Member) left a comment

Thank you for this @shashank-iitbhu!
I left a few comments

.github/workflows/e2e-test-pytorch-mnist.yaml (outdated, resolved)
examples/v1beta1/hp-tuning/hyperopt-distribution.yaml (outdated, resolved)
NORMAL = 2;
LOG_NORMAL = 3;
DISTRIBUTION_UNKNOWN = 4;
DISTRIBUTION_UNKNOWN = 0;
Member

Please keep the same name as for parameter_type

Suggested change
DISTRIBUTION_UNKNOWN = 0;
UNKNOWN_DISTRIBUTION = 0;

Member

Suggested change
DISTRIBUTION_UNKNOWN = 0;
DISTRIBUTION_UNSPECIFIED = 0;

I would like to select the UNSPECIFIED suffix here.
Please see: https://google.aip.dev/126

Member

Makes sense. @tenzen-y, should we rename the other gRPC parameters to UNSPECIFIED?

@tenzen-y (Member) · Sep 3, 2024

Changing a released gRPC API means losing backward compatibility.
So, I would like to keep using the existing API for the released protocol buffers API. @andreyvelich WDYT?

@andreyvelich (Member) · Sep 3, 2024

Since these gRPC APIs are not exposed to end users, do you still think that we should not change the existing APIs?
It only affects users who build their own Suggestion service.

Member

Since these gRPC APIs are not exposed to end users, do you still think that we should not change the existing APIs?

Almost correct. Additionally, users who keep using removed Suggestion services, like the Chocolate Suggestion, face the same problem.

So, can we collect feedback in a dedicated issue outside of here?

Member

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback.
@shashank-iitbhu Can you please create an issue to track it?

Member

SGTM

Contributor Author

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback. @shashank-iitbhu Can you please create an issue to track it?

Sure, I will create a separate issue to track the renaming of other gRPC parameters to UNSPECIFIED.

)
elif param.type == DOUBLE:
    hyperopt_search_space[param.name] = hyperopt.hp.uniform(
    hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
Member

If the parameter is an int, why can't we support other distributions like lognormal?

@shashank-iitbhu (Contributor Author) · Sep 6, 2024

Distributions like uniform, quniform, loguniform, and normal return float values. They are designed to sample from a range of values that can take any real number (float), which might not make sense if we're looking for an integer value.
Although we can definitely add support for these distributions when the parameter is an int as well. Should we do this?

Member

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with the int parameter type?

Member

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with the int parameter type?

SGTM
Users can specify the double parameter type if they want to compute more exactly.
But documenting this restriction for the int parameter type would be better.
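
To make the rounding idea concrete, here is a minimal sketch (my own illustration, not the PR's code) of sampling a float from a continuous Hyperopt distribution and rounding it for an int parameter:

    import hyperopt
    from hyperopt.pyll.stochastic import sample

    # Continuous distributions return floats; round for INTEGER parameters.
    space = hyperopt.hp.normal("batch_size", 48, 5.33)  # mu, sigma
    value = sample(space)                               # a float, e.g. 51.7
    assignment = int(round(value))                      # rounded to int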

pkg/suggestion/v1beta1/hyperopt/base_service.py (outdated, resolved)
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from 1a7a831 to fddb763 on September 10, 2024 16:23
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>

validation fix

add e2e tests for hyperopt

added e2e test to workflow

sigma calculation fixed

fix

parse new arguments to mnist.py
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from fddb763 to 282f81d on September 10, 2024 16:33
)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
    sigma = (float(param.max) - float(param.min)) / 6
@shashank-iitbhu (Contributor Author) · Sep 11, 2024

I followed this article to determine the value of sigma from min and max.
cc @tenzen-y @andreyvelich

Member

Maybe we should add this article to the comments. WDYT @tenzen-y @johnugeorge?

Member

I do not want to depend on an individual article. Instead, it would be better to add the actual mathematical description here as a comment.
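
For reference, the reasoning behind these values is the three-sigma rule: center the normal distribution on the feasible range and pick σ so that roughly 99.7% of samples land inside [min, max]. A sketch of the description such a comment could carry:

    \mu = \frac{\min + \max}{2}, \qquad \sigma = \frac{\max - \min}{6}

    P(\min \le X \le \max) = P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997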

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@shashank-iitbhu (Contributor Author)

@tenzen-y I have added two new parameters, weight_decay and dropout_rate, to the Hyperopt example and passed them to mnist.py, but I haven't used them in the Net class in the train and test functions yet. If you check the logs for this e2e test, the maximum value of the loss metric is an enormously large number. I can't figure out what I'm missing. I also tested this locally.

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Sep 22, 2024
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
feasibleSpace:
  min: "32"
  max: "64"
  distribution: "logNormal"
Contributor Author

@tenzen-y Testing the logNormal distribution using batch_size.

@shashank-iitbhu (Contributor Author)

@tenzen-y

if param.type == INTEGER:
    assignments.append(Assignment(param.name, int(vals[param.name][0])))

Here, the float values sampled from the distribution get converted to int for the INT parameter type.

shashank-iitbhu and others added 2 commits September 22, 2024 20:38
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
Comment on lines +118 to +119
log_min = math.log(float(param.min))
log_max = math.log(float(param.max))
Member

Shouldn't we use fixed values when min and max are scalars, the same as Nevergrad does?

Contributor Author

Yeah, we can, but that was an edge case for when min and max are not defined in the case of nevergrad.

    elif isinstance(param, (p.Log, p.Scalar)):
        if (param.bounds[0][0] is None) or (param.bounds[1][0] is None):
            if isinstance(param, p.Scalar) and not param.integer:
                return hp.lognormal(label=param_name, mu=0, sigma=1)

For example,

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        max: "64"
        distribution: "logNormal"

The above parameter will be sampled from this graph:
[Screenshot: plot of the log-normal density, 2024-09-22]
where μ = 3.8123 and σ = 0.3465 are calculated by putting min=32 and max=64 into our code, and E(X) represents the mean, which is 48 in our case.
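
As a sanity check (my arithmetic, based on the figures quoted above), the stated mean follows from the log-normal expectation formula:

    E(X) = e^{\mu + \sigma^{2}/2} = e^{3.8123 + 0.3465^{2}/2} = e^{3.8723} \approx 48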

Member

That makes sense.
In that case, could you address the cases where min or max is not specified, as well as nevergrad?

https://github.com/facebookresearch/nevergrad/blob/a2006e50b068fe598e0f3d7dab9c9bcf6cf97e00/nevergrad/optimization/externalbo.py#L61-L64

Member

@shashank-iitbhu This is still pending.

Contributor Author

if param.FeasibleSpace.Max == "" && param.FeasibleSpace.Min == "" {
    allErrs = append(allErrs, field.Required(parametersPath.Index(i).Child("feasibleSpace").Child("max"),
        fmt.Sprintf("feasibleSpace.max or feasibleSpace.min must be specified for parameterType: %v", param.ParameterType)))
}

The webhook validator requires feasibleSpace.max or feasibleSpace.min to be specified.

Member

But when either min or max is empty, this validation does not reject the request, right?
So, shouldn't we implement the special case in the Suggestion Service?
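
For illustration, such a special case might look like the following minimal sketch (an assumption mirroring Nevergrad's mu=0, sigma=1 fallback; lognormal_space is a hypothetical helper, not code from this PR):

    import math

    import hyperopt

    def lognormal_space(name, min_str, max_str):
        # Unbounded case: fall back to exp(N(0, 1)), as Nevergrad does.
        if not min_str or not max_str:
            return hyperopt.hp.lognormal(name, 0, 1)
        # Bounded case: derive mu and sigma from the log-scale range.
        log_min, log_max = math.log(float(min_str)), math.log(float(max_str))
        return hyperopt.hp.lognormal(
            name, (log_min + log_max) / 2, (log_max - log_min) / 6
        )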

Contributor Author

Yes, the validation webhook does not reject the request when either min or max is empty. But I created an example where:

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        distribution: "logNormal"

For this, the experiment is created, but the suggestion service does not sample any value, so the trials are not running, even though I handled this case (when either min or max is not specified) in pkg/suggestion/v1beta1/hyperopt/base_service.py.
Do we need to check the experiment_defaults.go file?
https://github.com/kubeflow/katib/blob/867c40a1b0669446c774cd6e770a5b7bbf1eb2f1/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go

Signed-off-by: Shashank Mittal <shashank.mittal.mec22@itbhu.ac.in>
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from a1156fc to 658daaf on September 23, 2024 04:44
- "file-metrics-collector,pytorchjob-mnist"
- "median-stop-with-json-format,file-metrics-collector-with-json-format"

Member

Suggested change

min: "0.01"
max: "0.05"
step: "0.01"
distribution: "normal"
Member

You can remove the quotes:

Suggested change
distribution: "normal"
distribution: normal

feasibleSpace:
  min: "0.001"
  max: "1"
  distribution: "uniform"
Member

Suggested change
distribution: "uniform"
distribution: uniform

feasibleSpace:
  min: "32"
  max: "64"
  distribution: "logNormal"
Member

Suggested change
distribution: "logNormal"
distribution: logNormal

feasibleSpace:
  min: "0.001"
  max: "1"
  distribution: "uniform"
Member

Can you also test logUniform in this example?

Comment on lines +76 to +84
else:
    if param.type == INTEGER:
        hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
            param.name, float(param.min), float(param.max)
        )
    else:
        hyperopt_search_space[param.name] = hyperopt.hp.uniform(
            param.name, float(param.min), float(param.max)
        )
Member

You can simplify it

elif param.type == INTEGER:
    hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
        param.name, float(param.min), float(param.max)
    )
else:
    hyperopt_search_space[param.name] = hyperopt.hp.uniform(
        param.name, float(param.min), float(param.max)
    )

if param.type in [INTEGER, DOUBLE]:
    if param.distribution == api_pb2.UNIFORM or param.distribution is None:
        if param.step:
            hyperopt_search_space[param.name] = hyperopt.hp.quniform(
Member

Do we have quniformint in Hyperopt?

Contributor Author

Oh yes, missed this; we should use quniformint here for INT.
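
If quniformint turns out to be unavailable in the installed Hyperopt version, an equivalent quantized integer space can be built with scope.int over hp.quniform (a sketch under that assumption):

    import hyperopt
    from hyperopt.pyll import scope

    # Quantized uniform over [32, 64] with step 8, cast to int at sample time.
    space = scope.int(hyperopt.hp.quniform("batch_size", 32, 64, 8))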

Comment on lines +96 to +97
math.log(float(param.min)),
math.log(float(param.max)),
Member

Do we need to take the logarithm of the values before passing them to loguniform?

Contributor Author

We convert param.min and param.max to the logarithmic scale because, when optimizing, the sampled value is constrained to the interval [exp(low), exp(high)]. Refer to the hp.loguniform documentation here.

Also, here in the nevergrad implementation, the upper and lower bounds were converted to the logarithmic scale for hp.loguniform.

@andreyvelich @tenzen-y
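
A minimal sketch of why the conversion is needed: hp.loguniform(label, low, high) draws exp(uniform(low, high)), so passing log(min) and log(max) keeps samples inside [min, max]:

    import math

    import hyperopt
    from hyperopt.pyll.stochastic import sample

    # Samples are exp(U(log(0.001), log(1.0))), i.e. always in [0.001, 1.0].
    space = hyperopt.hp.loguniform("lr", math.log(0.001), math.log(1.0))
    print(sample(space))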

    math.log(float(param.max)),
)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
Member

Can you please add a comment before this line explaining why we do this?

)
elif param.distribution == api_pb2.NORMAL:
    mu = (float(param.min) + float(param.max)) / 2
    sigma = (float(param.max) - float(param.min)) / 6
Member

Maybe we should add this article to the comments. WDYT @tenzen-y @johnugeorge?


Successfully merging this pull request may close these issues.

[GSOC] Project 8: Support various Parameter Distribution in Katib