Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fit_*_random_forest: Drop parameter max_variables? #358

Open
m-mohr opened this issue Mar 24, 2022 · 4 comments
Open

fit_*_random_forest: Drop parameter max_variables? #358

m-mohr opened this issue Mar 24, 2022 · 4 comments
Labels
Milestone

Comments

@m-mohr
Copy link
Member

m-mohr commented Mar 24, 2022

as touched in the dev telco today: it could also be an option to leave mtry/max_variables out of the spec for now (backends will be able to pick a good enough default behavior), and just introduce it when we are sure about how we are going to handle it

Originally posted by @soxofaan in #351 (comment)

@jdries
Copy link
Contributor

jdries commented Mar 29, 2022

+1 for this one
Internally, we've even reached the conclusion that random forest is nice for textbook examples, but for a lot of real world use cases, new types of ML are used like boosters. Hence, investing a lot of time in getting parameters right might not make sense.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 21, 2022

On the other hand, I don't see why the current definition is problematic. Back-ends can now choose which options they allow by adapting the schema/enum.

@m-mohr m-mohr added the ML label Apr 21, 2022
@soxofaan
Copy link
Member

Back-ends can now choose which options they allow by adapting the schema/enum.

We should avoid that, I think. It's bad for the user experience if there is divergence between the process docs at openeo.org, the docs of the python client (online or in the user's IDE) and the process definitions at openeo.vito.be.
It's not only bad for the user experience, but it makes the merging at the level of aggregator/federation also a lot harder.

@m-mohr
Copy link
Member Author

m-mohr commented Apr 21, 2022

We have that in many places because it just doesn't work otherwise with the many underlying implementations. That's why we have the JSON Schemas in the first place, so that clients can actually adapt to it. Ideally, you'd implement everything, but that's not realistic (see recent issues about max_variables, ard processes, aggregate_temporal_period etc).

Just search for "enum" in openeo-processes and you'll already find over 20 processes with a customizable enum. And this doesn't even include the "options" objects in some processes.

The Python client diverges by design so I don't buy that argument.

@m-mohr m-mohr added this to the 2.0.0 milestone Feb 1, 2023
@m-mohr m-mohr modified the milestones: 2.0.0, 2.1.0 Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants