
Add simple Schedulers #1434

Open · wants to merge 6 commits into master

Conversation

DhairyaLGandhi
Member

Simple design for scheduling learning rate decay using the Optimiser interface
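The idea, roughly, is that a schedule is just another optimiser rule that rescales the step and mutates its own hyperparameters, so it composes with the existing rules. A minimal sketch of that shape (the `ExpDecaySchedule` name and its fields are illustrative only, not the types introduced by this PR):

```julia
using Flux

# Minimal sketch only: `ExpDecaySchedule` is a hypothetical rule, not a type from this PR.
mutable struct ExpDecaySchedule
  eta::Float64     # current learning rate
  gamma::Float64   # multiplicative decay applied each time the schedule advances
end

function Flux.Optimise.apply!(o::ExpDecaySchedule, x, Δ)
  Δ .*= o.eta
  o.eta *= o.gamma   # naive: advances on every apply! call (see the discussion below)
  return Δ
end

# Composes with other rules like any optimiser:
opt = Flux.Optimise.Optimiser(ExpDecaySchedule(0.1, 0.99), Descent(1.0))
```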

@DhairyaLGandhi changed the title from Dg/multilr to Add simple Schedulers on Dec 21, 2020
@gxyd
Contributor

gxyd commented Jan 5, 2021

Is the intention here to also add the other schedulers that PyTorch implements?


* LambdaLR
* MultiplicativeLR (this PR?)
* StepLR
* MultiStepLR
* ExponentialLR
* CosineAnnealingLR
* ReduceLROnPlateau
* CyclicLR
* OneCycleLR
* CosineAnnealingWarmRestarts

@DhairyaLGandhi
Member Author

Happy to have help with this! Having said that, the design of the optimisers seems to lend itself well to adding schedulers. Note that we would want to keep a fairly shallow type hierarchy.

We probably want to discuss which features schedulers might need that fall naturally out of this design, and what the limitations are.

@DhairyaLGandhi
Member Author

I find the method for determining the current epoch number a bit janky, which I would love to improve upon as a start.

@ToucheSir
Member

I think this should be reconciled with https://github.com/darsnack/ParameterSchedulers.jl somehow. cc @darsnack and @lorenzoh

@darsnack
Member

darsnack commented Jan 5, 2021

ParameterSchedulers.jl should have most of PyTorch's functionality already implemented. The only ones that I'd want to double check are ReduceLROnPlateau and CosineAnnealingWarmRestarts. Probably these are just syntactic sugar composing the existing schedulers.

@CarloLucibello mentioned integrating ParameterSchedulers.jl into Flux. I would have already made a PR, but I ran head first into Flux's limited optimizer interface. The main issue is that apply! is called on a per-parameter basis, so it ends up getting called many times per iteration, meaning that any scheduler would need to know a priori how many times it will be called. This is computable, but hacky imo. You can't rely on overloading update!, since that only works for implicit parameters and breaks composability (i.e. Flux.Optimise.Optimiser doesn't invoke update! on each sub-optimizer, it invokes apply!).
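Schematically, the implicit-parameter path looks something like this (a simplified sketch, not the exact Flux source), which is why any state advanced inside apply! ticks once per parameter rather than once per iteration:

```julia
using Flux: Params
using Flux.Optimise: apply!

# Simplified sketch of Flux's implicit-parameter update path (not the exact source):
# apply! runs once per parameter array, so a scheduler that advances its state
# inside apply! advances length(xs) times per call to update!.
function update!(opt, xs::Params, gs)
  for x in xs
    gs[x] === nothing && continue
    x .-= apply!(opt, x, gs[x])   # apply! called for every parameter
  end
end
```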

I am currently writing up what a future interface should look like, and I will post it soon. I'll follow up with a PR to Flux that implements it based on the discussion that follows that post.

PS: This really only presents a problem for Flux.train!, where the training loop is inaccessible to the user.

@darsnack
Member

darsnack commented Jan 5, 2021

Looking at the source of this PR, it will hit the same issue that ScheduledOptim from ParameterSchedulers.jl did. o.current will end up getting updated more than once per iteration since apply! will be called once for every parameter. You could reconcile this by passing params(m) into the constructor and doing the scheduler logic only once per iteration, but that's too hacky.
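Concretely, that workaround would look something like this (ScheduledDecay and its fields are hypothetical names for illustration, not code from this PR or from ParameterSchedulers.jl):

```julia
using Flux

# Hypothetical illustration of the workaround described above: the rule is told up
# front how many parameters it will see, and only advances the schedule after the
# last one, i.e. once per iteration.
mutable struct ScheduledDecay
  eta::Float64
  decay::Float64
  nparams::Int   # e.g. length(params(m)), supplied at construction time
  seen::Int
end

ScheduledDecay(eta, decay, ps) = ScheduledDecay(eta, decay, length(ps), 0)

function Flux.Optimise.apply!(o::ScheduledDecay, x, Δ)
  Δ .*= o.eta                 # every parameter in this iteration uses the same eta
  o.seen += 1
  if o.seen == o.nparams      # all parameters seen: end of the iteration
    o.seen = 0
    o.eta *= o.decay          # advance the schedule exactly once per iteration
  end
  return Δ
end
```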

@DhairyaLGandhi
Member Author

> will end up getting updated more than once per iteration

I'm aware of the bug, which isn't difficult to fix, but this was more for design discussion than a final implementation. You'll notice the janky name :)

@darsnack
Member

darsnack commented Jan 5, 2021

Good to hear — a design discussion is exactly what I think is needed too.

@DhairyaLGandhi
Member Author

Do post your thoughts here, @darsnack; that's kind of the motivation for the PR.

The question on composition is that we can make it safe to call apply on a number of parameters together, but it's not very intuitive. To be clear, I want to move these over to Optimisers.jl now, so we have a unified interface. It's written with that in mind.

@darsnack
Member

darsnack commented Jan 5, 2021

> To be clear, I want to move these over to Optimisers.jl now, so we have a unified interface. It's written with that in mind.

Perfect, I was thinking the exact same thing.

@ToucheSir
Member

I do think the interaction between learning rate schedule and optimizer will be very different moving from the current Flux interface to the mostly stateless one in Optimisers.jl. For example, ScheduledOptim.update_func can no longer be a purely mutating function, and I wonder if a PyTorch-esque interface is the best way to go. I know I keep mentioning it, but it's illustrative to see what a mutation-free library like Optax does here.
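For a rough idea of what that could look like, here is a sketch of a mutation-free scheduled rule in that spirit; the init/apply names and signatures are illustrative assumptions, not the actual Optimisers.jl (or Optax) API. The schedule lives in the rule, while the step count lives in the state that gets threaded through.

```julia
# Rough sketch of a mutation-free scheduled rule, loosely in the spirit of the
# Optimisers.jl / Optax style mentioned above. `init` and `apply` are
# illustrative names, not the actual Optimisers.jl API.
struct ScheduledDescent{F}
  schedule::F               # maps an iteration count to a learning rate
end

init(o::ScheduledDescent, x) = (t = 0,)   # state carries the step count, not the rule

function apply(o::ScheduledDescent, state, x, dx)
  eta = o.schedule(state.t)
  return (t = state.t + 1,), eta .* dx    # return fresh state plus the scaled update
end

# Usage: opt = ScheduledDescent(t -> 0.1 * 0.99^t)
```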

@darsnack
Member

darsnack commented Jan 8, 2021

I posted my thoughts in a discussion. I'm interested to hear how people think the interface can be improved. Personally, I'm still unsatisfied with hyperparameter access, but I can't figure out anything better.
