-
-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add early stopping utils #1545
Add early stopping utils #1545
Conversation
Thanks for the contribution! Seems generally handy and useful. Do you think it would make more sense as a callback? |
We can always do it that way later though. For now some docs and tests would be helpful. |
Something like this might make sense too once we merge #1471 struct EarlyStopping
l::AbstractVector
patience
tolerance
end
EarlyStopping(memory, patience, tolerance) = EarlyStopping(CircularBuffer(memory), patience, tolerance)
function (es::EarlyStopping)(l, gs, d)
roi = es.l[end-patience:end]
if l within tolerance
push!(es.l, l)
else
Flux.stop()
end
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. Can you add a doc string, entry in the documentation, and some tests?
No need to limit this utility by pigeon-holing it into a callback-biased implementation. I think the current approach in the PR is best. It can easily be used in a CB like: ea = Flux.early_stopping(loss(), patience=5)
cb = () -> ea() && Flux.stop() Perhaps we can have a version of |
Implementing it as a callback doesn't take away anything from it being used outside of a callback context. We should also generally want this to be decoupled from the loss function. It would hold unnecessary references alive. Anyway this would be expected to be used within a train loop 99% of the times, so it's worth it to make that api first class. |
The callback system uses |
Anyways, let's not open an argument. Like you said, we can do the callback (or not callback) later. |
The tests should be added under Docs will include a docstring and an entry in |
I have considered implementing a callback function for early stopping, which seems more natural and integrated into the current Flux workflow. However, here are some considerations that drive me to this closure-based implementation:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I think once you figure out the increasing/decreasing metric part, then this will likely be ready to go.
Your feedback on early stopping is appreciated. I think it is just more proof that we should do the most generic thing in this context, which is to write a utility that returns true
or false
. It will make the utility more flexible to use a variety of contexts. There are several options for how we integrate a boolean utility into the rest of the callback system.
re 2. since |
It can be used either way if the user is willing to a) write boiler plate try/catch code or b) accept an ugly exception message. A boolean utility truly can be used however the user wants, because it makes no impositions on the user. |
My personal experience tells otherwise: I believe this is because Lines 106 to 114 in 8100707
Maybe this is an unexpected behavior? |
No that's expected. I think what Dhairya was saying is that if you call the callback in your outer loop (over epochs) then it will throw an exception and stop the program. That's not great IMO because it means you either accept an ugly exception, or you write some boilerplate try/catch code to make it pretty. I don't see why we would want the user to write more boilerplate when |
I am also considering @DhairyaLGandhi's suggestion. Maybe we could provide a variant of I will try implement if this sounds like a good idea to you both @DhairyaLGandhi @darsnack. |
My only concern here is calling this "early stopping," since it doesn't match the accepted understanding of the term. To @DhairyaLGandhi's point, the current approach doesn't stop anything, it just returns a boolean that can be used to stop something. If we were to move to this more generic and powerful approach, I would suggest we consider renaming the function. The other alternative is to build this utility for applying a threshold on a window of values, then make |
`true` to stop, `false` to continue
Sounds somewhat troublesome and complicated to use 😂 I think I will stick with this implemention for now, which seems more intuitive. |
Since the current design allows for more than just stopping, do we still want to consider a name change? PyTorch's But if we use the utility here as our predicate, then the name |
For the current implementation, yes it will always be a So, in our case, |
What about something like: function patience(predicate, patience)
on_trigger = let count = 0
(args...; kwargs...) -> begin
count = predicate(args...; kwargs...) ? count + 1 : 0
return count >= patience
end
end
return on_trigger
end
function plateau(f, p; distance = -, min_dist = 0, init_score = 0)
on_plateau = let best_score = init_score
(args...; kwargs...) -> begin
score = f(args...; kwargs...)
Δ = distance(best_score, score)
best_score = Δ < 0 ? best_score : score
return Δ < min_dist
end
end
return patience(on_plateau, p)
end
const early_stopping = plateau Personally, I think the patience logic is fairly simple. The hard part is the score closure. If it wasn't for the best score tracking, then I would suggest completely foregoing |
This looks great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, thanks. I noticed a couple questions that I missed before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good. I like the doc additions. I think we should be GTM after this last round of revisions.
cc @DhairyaLGandhi so he can look over things
Cool, I'll give this a once over this weekend |
src/utils.jl
Outdated
""" | ||
function patience(predicate, wait) | ||
on_trigger = let count = 0 | ||
(args...; kwargs...) -> begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on_ trigger
should be the name of the function imo. Good to have it named for stacktraces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that done with a let
?
Like
let ...
function name(...)
...
end
end
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I believe so
@queensferryme If you can address the latest comments when you get a chance, we can finish this PR up! Thanks for all your work and patience. |
Great, thanks @queensferryme! bors r+ |
1545: Add early stopping utils r=darsnack a=queensferryme This pull request introduces a utility function `early_stopping(f; min_delta=0, patience=3)` for conveniently implementing early stopping. Its motive was discussed in #227 back in 2018. Also AFAIU, early stopping is widely adopted in training large-scale neural networks nowadays as a mechanism to prevent overfitting. Therefore, I believe a built-in utility function for early stopping would be beneficial for Flux.jl users. The implementation is heavily based on [tf.keras.callbacks.EarlyStopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) and [ignite.handlers.EarlyStopping](https://github.com/pytorch/ignite/blob/master/ignite/handlers/early_stopping.py). > This is a _draft_ implementation. Documentation and tests are not added yet. As I am new to Flux.jl, I think advice and opinions from maintainers would be helpful before I proceed. **Example Usage**: ```julia # test_ea.jl using Flux function loss() v = 1 return () -> v += 1 end ea = Flux.early_stopping(loss(), patience=5) Flux.@epochs 10 begin ea() || break end ``` **Output**: ``` ➜ julia test_ea.jl Activating environment at `~/Developer/Flux.jl/Project.toml` [ Info: Epoch 1 [ Info: Epoch 2 [ Info: Epoch 3 [ Info: Epoch 4 [ Info: Epoch 5 [ Info: Epoch 6 ➜ ``` ### PR Checklist - [x] Tests are added - [ ] Entry in NEWS.md - [x] Documentation, if applicable - [ ] API changes require approval from a committer (different from the author, if applicable) Co-authored-by: Queensferry <queensferry.me@gmail.com> Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Thanks all |
@DhairyaLGandhi @darsnack Thanks for your patience and time! |
This pull request introduces a utility function
early_stopping(f; min_delta=0, patience=3)
for conveniently implementing early stopping. Its motive was discussed in #227 back in 2018. Also AFAIU, early stopping is widely adopted in training large-scale neural networks nowadays as a mechanism to prevent overfitting. Therefore, I believe a built-in utility function for early stopping would be beneficial for Flux.jl users.The implementation is heavily based on tf.keras.callbacks.EarlyStopping and ignite.handlers.EarlyStopping.
Example Usage:
Output:
PR Checklist