Training Loss Evaluation #152

rufex2001 · 2020-10-06T16:44:58Z

This PR adds an evaluation job that simply reports the training loss. Currently, the loss can only be computed on the training split. This is part of the following PR:

#99

kge/job/train.py

rgemulla · 2020-10-07T14:45:40Z

Also, the configuration should be updated (saying that there is now this other evaluation type).

rufex2001 · 2020-10-09T11:18:44Z

@rgemulla All comments addressed in latest commit

rufex2001 · 2020-10-09T11:19:49Z

@rgemulla Please wait with merging until I have added sum_loss to the tracing

rufex2001 · 2020-10-09T12:54:19Z

@rgemulla I've added sum loss tracing to trainers and the training loss eval job. Two things:

Should we keep this as simple as it is now? Or should we also add this at the batch trace level? This would mean trainers add sum_loss to their results object.
Preparing the train job when it is created as part of an eval job (eval.py line 172) requires calling a private method from outside. I'm not sure this is what you meant; it is partly why I didn't do it this way in the first place, but I forgot to bring it up until now.
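To make the first question concrete, here is a minimal sketch of what batch-level tracing would enable, assuming each batch trace entry carries a `sum_loss` and a `size` key. The function name `summarize_epoch` and the dictionary layout are hypothetical and not taken from the code base; the point is only that epoch-level values can be derived from per-batch sums.

```python
from typing import Dict, List


def summarize_epoch(batch_traces: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-batch loss sums into epoch-level trace values.

    Assumes (hypothetically) that every batch entry has the keys
    "sum_loss" (loss summed over the batch) and "size" (examples in the batch).
    """
    sum_loss = sum(batch["sum_loss"] for batch in batch_traces)
    size = sum(batch["size"] for batch in batch_traces)
    # Guard against an empty epoch to avoid a division by zero.
    avg_loss = sum_loss / size if size > 0 else 0.0
    return {"sum_loss": sum_loss, "size": size, "avg_loss": avg_loss}


# Illustrative values only.
batches = [{"sum_loss": 6.0, "size": 3}, {"sum_loss": 2.0, "size": 1}]
epoch = summarize_epoch(batches)
print(epoch)
```

Averaging over examples instead of over batches keeps the result correct when the last batch is smaller than the others.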

rufex2001 · 2020-10-09T17:11:59Z

@rgemulla Ok, num_examples was there but called size. Renamed it to num_examples because I think that makes more sense than the notion of "epoch size". We do call it size at the batch level, which of course makes sense. If you want, I can revert this.

Also, not sure what keys to include in the trace produced by this eval job. I manually add avg_loss and num_examples from the internal training job for now. When adding everything, some things are overwritten, e.g. the event key, type key, epoch number key, etc. Also, this causes an error during the trace event, e.g. because there is already a key for job_id that comes from the already traced entry from the training job. We need to either decide which keys from the train job to include (avg cost too? avg penalty?), or which keys to exclude to prevent errors and overwritten keys. Ideally, it would be nice if all relevant keys from the train job were available to design custom metrics.
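One of the two options mentioned above, excluding conflicting keys, could look roughly like the following sketch. The key names in `RESERVED_KEYS`, the function `merge_train_trace`, and all example values are assumptions made for illustration; they are not the actual trace schema.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical set of keys that the eval job's own trace entry must keep.
RESERVED_KEYS: FrozenSet[str] = frozenset({"job_id", "event", "type", "epoch"})


def merge_train_trace(
    eval_trace: Dict[str, Any],
    train_trace: Dict[str, Any],
    reserved: FrozenSet[str] = RESERVED_KEYS,
) -> Dict[str, Any]:
    """Copy train-job trace keys into the eval trace without overwriting.

    Keys that are reserved, or that the eval trace already defines, are
    skipped, so the eval entry keeps its own identity.
    """
    merged = dict(eval_trace)  # copy so the input is left untouched
    for key, value in train_trace.items():
        if key in reserved or key in merged:
            continue
        merged[key] = value
    return merged


# Illustrative values only.
eval_trace = {"job_id": "eval-1", "event": "eval_completed", "type": "training_loss", "epoch": 5}
train_trace = {
    "job_id": "train-1",
    "event": "epoch_completed",
    "type": "negative_sampling",
    "epoch": 5,
    "avg_loss": 0.25,
    "avg_penalty": 0.01,
    "size": 1000,
}
merged = merge_train_trace(eval_trace, train_trace)
print(merged)
```

An exclusion list keeps every other train-job key available for custom metrics, whereas an inclusion list would need updating whenever the train job traces something new.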

rufex2001 · 2020-10-12T16:02:30Z

@rgemulla Tracing done as suggested.

rgemulla

Other than this, looks good to me. Please add an entry into the CHANGELOG before merging.

rgemulla · 2020-10-13T10:15:28Z

kge/job/train.py

@@ -288,9 +312,12 @@ def run_epoch(self) -> Dict[str, Any]:
            epoch=self.epoch,
            split=self.train_split,
            batches=len(self.loader),
-            size=self.num_examples,
-            lr=[group["lr"] for group in self.optimizer.param_groups],
+            num_examples=self.num_examples,


Since we use size everywhere else (e.g., for the batch size), I suggest to continue using "size" here.

rufex2001 · 2020-10-13T13:14:06Z

@rgemulla Fixed!

rgemulla · 2020-10-13T15:16:10Z

Alright, please add the CHANGELOG and merge

Added support for training_loss evaluation (currently only on training split)

3cab063

rufex2001 requested a review from rgemulla October 6, 2020 16:44

separate relation optimizer

7e5626d

rgemulla reviewed Oct 7, 2020

AdrianKs and others added 3 commits October 8, 2020 16:02

Revert "separate relation optimizer"

d539199

This reverts commit 7e5626d.

parameter specific optimizer configurations

d275419

parsing of regex does not work correctly yet

Addresses comments from PR

c1feea6

Added sum_loss tracing to trainer and trainjob eval

93db374

give option to create parameter groups and provide group specific options. parameters are grouped by regex expressions.

87c5463

rgemulla reviewed Oct 9, 2020

AdrianKs and others added 4 commits October 9, 2020 17:14

fix renaming of optimizer args

b965a65

throw error if multiple optimizer types are set

05e23ad

update CHANGELOG.md

2a37777

Removed sum_loss, added num_examples

3525c1e

AdrianKs and others added 2 commits October 12, 2020 09:20

add parameter_names to checkpoint dump

a22724e

Addresing tracing of training_loss eval job

fc7ea27

rgemulla reviewed Oct 13, 2020

Fixed tracing keys for consistency

9992621

drop redundant logging of evaluation results

15420a3

rufex2001 added 3 commits October 14, 2020 13:12

Updated changelog with training loss evaluation job

e42f524

Added support for training_loss evaluation (currently only on training split)

2976634

Addresses comments from PR

b10fb91

rufex2001 added 6 commits October 14, 2020 14:13

Added sum_loss tracing to trainer and trainjob eval

c687b0e

Removed sum_loss, added num_examples

18d821f

Addresing tracing of training_loss eval job

9a2638e

Fixed tracing keys for consistency

c04f2ca

Updated changelog with training loss evaluation job

4eb7169

Merge remote-tracking branch 'refs/remotes/origin/training_loss_evaluation' into training_loss_evaluation

242da1f

rufex2001 merged commit 54257fb into master Oct 14, 2020

rufex2001 deleted the training_loss_evaluation branch October 14, 2020 12:44
