default for train/validation split in fit_class_random_forest #350

soxofaan · 2022-03-16T16:47:58Z

openeo-processes/proposals/fit_class_random_forest.json

Lines 27 to 33 in e4df864

    
    "name": "training",
    "description": "The amount of training data to be used in the classification, given as a fraction. The sampling will be chosen randomly through the data object. The remaining data will be used as test data for the validation.",
    "schema": {
        "type": "number",
        "exclusiveMinimum": 0,
        "maximum": 1
    }

Maybe this has been discussed before, but can't we pick a sensible default for the training/validation split, e.g. 80% / 20%?

m-mohr · 2022-03-20T15:22:56Z

Sure, what is a sensible default?

soxofaan · 2022-03-21T08:26:23Z

Python sklearn's train_test_split seems to default to a split of 75% train / 25% test/validation.

But most articles I checked from a quick google suggest 80% train / 20% validation as a good default.
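
For illustration, here is a minimal sketch of the random split that the `training` parameter describes, with 0.8 as a possible default. The names `split_samples` and `seed` are hypothetical and not part of the openEO process definition or any back-end implementation; it only shows how the two candidate fractions divide 100 samples.

```python
import random


def split_samples(samples, training=0.8, seed=None):
    """Randomly split samples into (train, validation) lists.

    `training` is the fraction used for training, mirroring the schema
    above: greater than 0 and at most 1. The default of 0.8 is an
    assumption for this sketch, not an agreed value.
    """
    if not 0 < training <= 1:
        raise ValueError("training must be greater than 0 and at most 1")
    shuffled = list(samples)
    # A local Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training)
    # Everything after the cut is the remaining data used for validation.
    return shuffled[:cut], shuffled[cut:]


samples = range(100)

train, validation = split_samples(samples, seed=42)
print(len(train), len(validation))  # 80 20

train, validation = split_samples(samples, training=0.75, seed=42)
print(len(train), len(validation))  # 75 25
```

With `training=1` the validation list is simply empty, which the schema allows.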

m-mohr · 2022-03-23T14:04:21Z

dev telco: Remove this parameter completely

m-mohr added a commit that referenced this issue Mar 21, 2022

default for train/validation split in fit_class_random_forest #350

e231b39

m-mohr mentioned this issue Mar 21, 2022

Fixes for the random forest processes #351

Merged

m-mohr linked a pull request Mar 21, 2022 that will close this issue

Fixes for the random forest processes #351

Merged

m-mohr added the ML label Mar 21, 2022

m-mohr added this to the 1.3.0 milestone Mar 21, 2022

m-mohr closed this as completed Mar 23, 2022

m-mohr modified the milestones: 1.3.0, 2.0.0 Feb 1, 2023
