fit_*_random_forest: Drop parameter max_variables? #358

m-mohr · 2022-03-24T11:14:06Z

as touched in the dev telco today: it could also be an option to leave mtry/max_variables out of the spec for now (backends will be able to pick a good enough default behavior), and just introduce it when we are sure about how we are going to handle it

Originally posted by @soxofaan in #351 (comment)

jdries · 2022-03-29T09:44:50Z

+1 for this one
Internally, we've even reached the conclusion that random forest is nice for textbook examples, but for a lot of real world use cases, new types of ML are used like boosters. Hence, investing a lot of time in getting parameters right might not make sense.

m-mohr · 2022-04-21T09:48:05Z

On the other hand, I don't see why the current definition is problematic. Back-ends can now choose which options they allow by adapting the schema/enum.

soxofaan · 2022-04-21T12:23:46Z

Back-ends can now choose which options they allow by adapting the schema/enum.

We should avoid that, I think. It's bad for the user experience if there is divergence between the process docs at openeo.org, the docs of the python client (online or in the user's IDE) and the process definitions at openeo.vito.be.
It's not only bad for the user experience, but it makes the merging at the level of aggregator/federation also a lot harder.

m-mohr · 2022-04-21T13:00:22Z

We have that in many places because it just doesn't work otherwise with the many underlying implementations. That's why we have the JSON Schemas in the first place, so that clients can actually adapt to it. Ideally, you'd implement everything, but that's not realistic (see recent issues about max_variables, ard processes, aggregate_temporal_period etc).

Just search for "enum" in openeo-processes and you'll already find over 20 processes with a customizable enum. And this doesn't even include the "options" objects in some processes.

The Python client diverges by design so I don't buy that argument.

m-mohr mentioned this issue Mar 24, 2022

Fixes for the random forest processes #351

Merged

soxofaan mentioned this issue Apr 21, 2022

Default for max_variables in fit_class_random_forest and fit_regr_random_forest #365

Open

m-mohr added the ML label Apr 21, 2022

m-mohr added this to the 2.0.0 milestone Feb 1, 2023

m-mohr modified the milestones: 2.0.0, 2.1.0 Mar 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fit_*_random_forest: Drop parameter max_variables? #358

fit_*_random_forest: Drop parameter max_variables? #358

m-mohr commented Mar 24, 2022

jdries commented Mar 29, 2022

m-mohr commented Apr 21, 2022

soxofaan commented Apr 21, 2022

m-mohr commented Apr 21, 2022

fit_*_random_forest: Drop parameter max_variables? #358

fit_*_random_forest: Drop parameter max_variables? #358

Comments

m-mohr commented Mar 24, 2022

jdries commented Mar 29, 2022

m-mohr commented Apr 21, 2022

soxofaan commented Apr 21, 2022

m-mohr commented Apr 21, 2022