This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Keeping track of the best model checkpoint #345

Open
miguelvr opened this issue Jan 16, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@miguelvr
Contributor

🚀 Feature

Classy Vision currently has no way to keep track of the best checkpoint as measured by a specific meter.

Motivation / Pitch

Knowing which checkpoint performs best against a validation set is a common need in ML, so it makes sense to automate this process for the user.

Alternatives

So far, the only ways I've managed to do it are analyzing the contents of the saved checkpoints for a specific meter or manually looking through the training logs. Both are very tedious.

Additional Context

I think this could be implemented as a ClassyHook that watches the values of a user-specified meter and saves the best phase according to a given strategy (max/min). The downside I see is that, to be accurate, the meter values need to be synced across all workers in the distributed case.

Results can then be logged or dumped to a file on the master worker.
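
To make the max/min strategy concrete, here is a minimal sketch of the comparison logic only. `BestMeterTracker` and its `update` method are hypothetical names for illustration and are not part of Classy Vision; wiring this into a real ClassyHook, and syncing the meter value across workers before calling `update`, is left out and would depend on the framework's actual hook interface.

```python
import math


class BestMeterTracker:
    """Hypothetical helper (not a Classy Vision API): remembers the best meter value seen."""

    def __init__(self, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.mode = mode
        # Start from the worst possible value so the first update always counts as an improvement.
        self.best_value = -math.inf if mode == "max" else math.inf
        self.best_phase = None

    def update(self, phase_idx, value):
        """Record `value` for phase `phase_idx`; return True if it is the new best."""
        if self.mode == "max":
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
            self.best_phase = phase_idx
        return improved


# Usage: feed in one (already synced) meter value per validation phase.
tracker = BestMeterTracker(mode="max")
for phase_idx, accuracy in enumerate([0.61, 0.58, 0.70]):
    if tracker.update(phase_idx, accuracy):
        print(f"new best at phase {phase_idx}: {accuracy}")
```

Strict comparisons mean a tie keeps the earlier phase; that is a design choice in this sketch, not a requirement.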

@mannatsingh
Contributor

This is something we do plan on supporting. Thanks for filing a feature request, @miguelvr!

The approach I had in mind was to just support this feature within the CheckpointHook, so that it would save the "best" checkpoint with a special filename.

@mannatsingh mannatsingh added the enhancement New feature or request label Jan 16, 2020
@miguelvr
Contributor Author

I don't mind doing a PR for that as well. I'm using the framework to train a lot of models at the moment, and this is something that I'll have to do anyway.
