new: add quaterion progress bar, add trainer defaults, move callbacks… #114
Conversation
quaterion/main.py
Outdated
train_dataloader: DataLoader instance to retrieve samples during training stage
val_dataloader: Optional DataLoader instance to retrieve samples during validation stage
trainer: `pytorch_lightning.Trainer` instance to handle fitting routine internally. If not passed will be created with `Quaterion.trainer_defaults`
ckpt_path: Path/URL of the checkpoint from which training is resumed. If there is
Is it some kind of requirement or something intended for UX?
I guess it will instead cause a headache most of the time. TBH, sounds more awkward than #29.
A couple of reasons:
- We want to replace the default Lightning progress bar with our own, but we don't want to replace a user-defined one. We have no way to distinguish whether the progress bar was set by the user or by the trainer itself. Hence, we can pass our progress bar via `trainer_defaults`. If the user wants to provide a custom progress bar, it can be done by overriding `callbacks` in `trainer_defaults` or by passing a ready-to-use `pl.Trainer` instance.
- It just relieves users from some boilerplate code.
Actually, there are a lot of parameters we would like to pre-define in the trainer to serve as reasonable defaults, but the trainer is not designed to be manipulated after it is created. Additionally, we can't wrap the trainer, because it has a huge list of parameters, and supporting backward compatibility for them would be very painful.
So the proposal is to handle 90% of cases with a default trainer configuration that looks nice, while keeping the ability to override the trainer if required. The default configuration should work well enough for all the tutorial examples; if it currently doesn't, we can adjust.
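The flow being proposed could be sketched roughly as below. `StubTrainer`, `TRAINER_DEFAULTS`, and this `fit` signature are hypothetical stand-ins, not Quaterion's actual API; `StubTrainer` replaces `pytorch_lightning.Trainer` only to keep the example self-contained.

```python
# Hypothetical sketch: build a trainer from reasonable defaults only when
# the caller did not supply one, so an explicit trainer always wins.
class StubTrainer:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

TRAINER_DEFAULTS = {
    "max_epochs": 1000,        # keep Lightning's own default
    "log_every_n_steps": 50,   # adjusted later if the dataloader is short
}

def fit(trainer=None):
    """Use the user's trainer if given; otherwise create one from defaults."""
    if trainer is None:
        trainer = StubTrainer(**TRAINER_DEFAULTS)
    return trainer
```

A user who needs full control passes their own trainer and none of the defaults apply; everyone else gets the pre-configured one.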
If this idea comes from using a custom progress bar, then I would like to stick to plain old TQDM. No one can blame us for using a package that is downloaded ~10m times a week.
We set many defaults here, and even if a user wants to change one of them, they will lose all the good defaults, and they will need to continuously check what is set. Some of them are boilerplate, but some are hyperparameters. In #29, we concluded that we cannot safely predict the user's intention and that it's not our responsibility. However, this change assumes user intentions from a more radical perspective, and we take on more responsibility. It's just confusing.
The final decision is yours, but this PR should be reverted in my humble opinion.
We set many defaults here, and even if a user wants to change one of them, they will lose all the good defaults.
Can you please elaborate on that?
Oh, I was afraid of that, but maybe we can show an info/warning when the user is using the default trainer, with a link to an explanation? I would really like to have a default trainer; it looks much better in GIFs and demos.
I would also prefer writing fewer lines of code and agree that it looks much better, but I strongly believe that hyperparameters should really, really be set by users themselves.
Alternatively, we can convert this one into a discussion and log a link to that discussion in `Quaterion.fit()` before actually merging it into master.
Which hyperparameters are you referring to?
The number of epochs and early stopping are hyperparameters.
I think we can preserve the default value of `max_epochs` as it is in `pytorch_lightning.Trainer`; AFAIK it is 1000.
Imho, I don't see anything really wrong with using `EarlyStopping` by default. As far as I can judge, it is quite common to set `EarlyStopping` and monitor `validation_loss`, isn't it?
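For reference, the early-stopping rule under discussion boils down to something like the sketch below: stop once the monitored validation loss has not improved for `patience` epochs. This is illustrative logic only, not Lightning's `EarlyStopping` implementation, and the function name is made up.

```python
# Illustrative early-stopping check: stop when none of the last `patience`
# validation losses improved on the best loss seen before them.
def should_stop(val_losses, patience=3, min_delta=0.0):
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    # "No improvement" = no recent epoch beat the previous best by min_delta.
    return all(loss >= best_before - min_delta for loss in val_losses[-patience:])
```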
    num_batches = len(train_dataloader)
    if num_batches > 0:
        defaults["log_every_n_steps"] = min(50, num_batches)
except Exception:  # If dataset has no length
Maybe specify exception? (TypeError, NotImplementedError)
Thought about it, but I don't trust this list used in Lightning. The whole section is optional, so I'm fine if it fails for whatever reason.
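For comparison, the reviewer's narrower variant would look roughly like this. The function name and `fallback` parameter are illustrative, not Quaterion's API; map-style datasets implement `__len__`, while iterable-style ones may raise `TypeError` or `NotImplementedError` on `len()`.

```python
# Sketch of the guarded length lookup with the exception list narrowed,
# falling back to Lightning's usual default of 50 when length is unknown.
def default_log_every_n_steps(train_dataloader, fallback=50):
    try:
        num_batches = len(train_dataloader)
    except (TypeError, NotImplementedError):  # dataset has no length
        return fallback
    return min(fallback, num_batches) if num_batches > 0 else fallback
```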
quaterion/main.py
Outdated
        not encoder.trainable
        for encoder in trainable_model.model.encoders.values()
    ]
) and (trainable_model.configure_caches() is not None)
Is it a correct check?
We have `CacheType.NONE`
fixed
def render(self, task) -> RenderableType:
    task_speed = f"{task.speed:>.2f}" if task.speed is not None else "0.00"
    self.max_length = max(len(task_speed), self.max_length)
    task_speed = " " * (self.max_length - len(task_speed)) + task_speed
We limit the minimal width (4, as `len("0.00")`), but do not limit the maximum, right?
We allow this field to be wider than 4 symbols ("42.42").
Yes, the idea is to keep the maximal speed length. Otherwise the whole line would be shifted each time the speed changes from 9.99 to 10.01.
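The padding trick described above can be shown in isolation as below. `SpeedColumn.format_speed` is a hypothetical stand-in for the `render` method, stripped of the rich/Lightning types so it runs standalone.

```python
# Standalone sketch of the anti-jitter padding: remember the widest speed
# string seen so far and left-pad every later value to that width, so the
# rendered line does not shift when the speed crosses e.g. 9.99 -> 10.01.
class SpeedColumn:
    def __init__(self):
        self.max_length = len("0.00")  # minimal width of 4

    def format_speed(self, speed):
        text = f"{speed:>.2f}" if speed is not None else "0.00"
        self.max_length = max(len(text), self.max_length)
        return " " * (self.max_length - len(text)) + text
```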
One of my concerns regarding our custom progress bar is that once a user uses a custom progress bar or the Lightning default one, our …
quaterion/main.py
Outdated
disable_checkpoints = all(
    [
        not encoder.trainable
        for encoder in trainable_model.model.encoders.values()
    ]
) and (
    cache_config is not None and cache_config.cache_type != CacheType.NONE
)
all_encoders_frozen = all(not encoder.trainable for encoder in trainable_model.model.encoders.values())
cache_configured = cache_config is not None and cache_config.cache_type != CacheType.NONE
disable_checkpoints = all_encoders_frozen and cache_configured
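Extracted as a function with minimal stand-ins, the suggested refactor behaves like this. `CacheType`, `should_disable_checkpoints`, and the encoder objects below are simplified stand-ins for Quaterion's real classes, used only to make the check runnable in isolation.

```python
from enum import Enum
from types import SimpleNamespace

# Simplified stand-in for Quaterion's CacheType enum.
class CacheType(Enum):
    NONE = "none"
    AUTO = "auto"

def should_disable_checkpoints(encoders, cache_config):
    """Checkpoints are redundant only when every encoder is frozen AND a cache is configured."""
    all_encoders_frozen = all(not encoder.trainable for encoder in encoders)
    cache_configured = (
        cache_config is not None and cache_config.cache_type != CacheType.NONE
    )
    return all_encoders_frozen and cache_configured

# Example: frozen encoders with an active cache disable checkpointing.
frozen = [SimpleNamespace(trainable=False)]
active_cache = SimpleNamespace(cache_type=CacheType.AUTO)
```

Naming the two intermediate booleans also makes it obvious that `CacheType.NONE` must be excluded, which is exactly the bug the earlier `configure_caches() is not None` check missed.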
I accidentally merged a pull request into this branch; will amend the last two commits and make a force push.
Force-pushed from b7759a1 to 79243e2 (… #113)