
Dried Apricot

@shaycrk released this 27 Aug 05:15

WARNING: BREAKING CHANGES!

Note that several changes in triage 5 break backwards compatibility with triage 4. If you are upgrading a project from an earlier version of triage, it is highly recommended that you first create a backup of your current database!

These breaking changes include:

  • A revision to how the model_hash is calculated means that re-running an experiment from an earlier version of triage will re-train your models and assign them new model_ids, even if the configuration hasn't changed.
  • The built_by_experiment column has been removed from triage_metadata.models in favor of tracking the specific run that built each model. The experiment_hash can still be obtained by joining to triage_metadata.triage_runs (née triage_metadata.experiment_runs). If you need the data that was in this column at the time of migration, it can be found in triage_metadata.deprecated_models_built_by_experiment, but it will not be restored to the models table on a database downgrade.
  • Changes in the structure of matrix metadata mean the matrix_hash is no longer backwards-compatible with older versions of triage (as with models, re-running an old config will result in matrices being re-created).
  • The random_seed column has been removed from triage_metadata.experiments in favor of tracking it at the run level as well. A database upgrade followed by a downgrade would lose this data (though it could be recovered from the runs table).
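
As a rough sketch of the join mentioned above for recovering an experiment_hash per model, the snippet below uses an in-memory SQLite database as a stand-in for the Postgres triage_metadata schema. The column names built_in_triage_run, run_type and run_hash, and the sample rows, are assumptions made for illustration and are not confirmed by these notes; check the actual schema in your own database before adapting the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named triage_metadata so the
# schema-qualified table names read like the Postgres ones.
conn.execute("ATTACH DATABASE ':memory:' AS triage_metadata")

# ASSUMED schema: column names are illustrative, not taken from triage itself.
conn.execute(
    "CREATE TABLE triage_metadata.triage_runs "
    "(id INTEGER PRIMARY KEY, run_type TEXT, run_hash TEXT)"
)
conn.execute(
    "CREATE TABLE triage_metadata.models "
    "(model_id INTEGER PRIMARY KEY, model_hash TEXT, built_in_triage_run INTEGER)"
)

# Hypothetical sample data: two experiment runs, three models.
conn.executemany(
    "INSERT INTO triage_metadata.triage_runs VALUES (?, ?, ?)",
    [(1, "experiment", "exp_hash_a"), (2, "experiment", "exp_hash_b")],
)
conn.executemany(
    "INSERT INTO triage_metadata.models VALUES (?, ?, ?)",
    [(10, "m1", 1), (11, "m2", 1), (12, "m3", 2)],
)

# Look up the experiment hash for each model via the run that built it.
EXPERIMENT_HASH_QUERY = """
SELECT m.model_id, r.run_hash AS experiment_hash
FROM triage_metadata.models AS m
JOIN triage_metadata.triage_runs AS r
  ON m.built_in_triage_run = r.id
WHERE r.run_type = 'experiment'
ORDER BY m.model_id
"""

rows = conn.execute(EXPERIMENT_HASH_QUERY).fetchall()
print(rows)  # [(10, 'exp_hash_a'), (11, 'exp_hash_a'), (12, 'exp_hash_b')]
```

Against Postgres, the same SELECT should carry over largely unchanged, since triage_metadata is a real schema there rather than an attached database.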

New Functionality

  • Functionality for predicting forward, either with an existing model object or by retraining a new model with the most current data given a model_group_id (#631)
  • Utility for adding predictions to models previously trained/tested with save_predictions=False (#836)
  • Provisioner for easily setting up a PostgreSQL database (via Docker) that can be used with triage (#840)
  • More flexibility in parallelization for more resource-intensive model types, like random forests (#853)

Bug Fixes

  • Ensure model-level random seeds are re-used when the config and experiment-level random seed are unchanged (#848)
  • Remove the project path from the model_hash definition: the model_id shouldn't depend on where triage is being run (#830)
  • Ensure that feature groups are sorted in matrix metadata for consistency in downstream calculations (#833)
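
The last two fixes share one idea: a hash used as an identifier should ignore details that don't affect the result, such as where triage runs or the order in which feature groups are listed. The sketch below illustrates that idea with a hypothetical stable_hash helper; it is not triage's actual implementation. The key names project_path and feature_groups, and the choice of MD5 over key-sorted JSON, are assumptions made for the example.

```python
import hashlib
import json

# Hypothetical: keys that should never influence the hash.
EXCLUDED_KEYS = {"project_path"}


def stable_hash(metadata: dict) -> str:
    """Hash metadata so that irrelevant differences don't change the result."""
    # Drop keys (like the project path) that shouldn't affect identity.
    cleaned = {k: v for k, v in metadata.items() if k not in EXCLUDED_KEYS}
    if "feature_groups" in cleaned:
        # Listing order of feature groups is irrelevant, so canonicalize it.
        cleaned["feature_groups"] = sorted(cleaned["feature_groups"])
    # sort_keys=True makes serialization independent of dict insertion order.
    serialized = json.dumps(cleaned, sort_keys=True, default=str)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


a = {
    "model_type": "sklearn.ensemble.RandomForestClassifier",
    "feature_groups": ["prefix: a", "prefix: b"],
    "project_path": "/home/alice/project",
}
b = {
    "project_path": "s3://some-bucket/project",
    "feature_groups": ["prefix: b", "prefix: a"],
    "model_type": "sklearn.ensemble.RandomForestClassifier",
}

print(stable_hash(a) == stable_hash(b))  # True
```

Because the hash inputs are canonicalized before serialization, changing *what* is canonicalized (as triage 5 does) changes the resulting hashes. That is why the breaking-changes section warns that old configs will produce new model_ids and matrix_hashes.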

Thanks To

@tweddielin, @thcrock, @ecsalomon, @kasunamare