Question for models/trainer.py#L325 ? #114

Open
zjreno opened this issue May 19, 2021 · 3 comments

Comments


zjreno commented May 19, 2021

In https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L325:
after sum(), loss.numel() must be 1, so what is the difference between (loss/loss.numel()).backward() and loss.backward()?

So I guess loss.numel() was perhaps meant to express n_docs?
Can we replace (loss/loss.numel()) with loss / normalization?
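
A minimal sketch of the observation above (not the repository code; the shapes and the BCE setup are illustrative assumptions): once the loss has been summed, loss.numel() is 1 and dividing by it is a no-op, whereas dividing by a batch-level normalization term actually rescales the gradients.

```python
import torch

# Hypothetical batch of raw sentence scores and 0/1 labels (shapes made up).
logits = torch.randn(4, 3, requires_grad=True)
labels = torch.randint(0, 2, (4, 3)).float()

# Per-element loss summed over the batch, as in the question: numel() becomes 1.
loss = torch.nn.BCEWithLogitsLoss(reduction='none')(logits, labels).sum()
print(loss.numel())  # 1 -> (loss / loss.numel()) is exactly loss

# Gradients from (loss / loss.numel()).backward() and loss.backward() are identical.
(loss / loss.numel()).backward(retain_graph=True)
g_numel = logits.grad.clone()
logits.grad = None
loss.backward(retain_graph=True)
g_plain = logits.grad.clone()
logits.grad = None
print(torch.allclose(g_numel, g_plain))  # True

# Dividing by the normalization term (e.g. number of labels in the batch)
# is what actually rescales the gradients.
normalization = labels.numel()
(loss / normalization).backward()
print(torch.allclose(logits.grad, g_plain / normalization))  # True
```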

@Anothernewcomer

Hi, I have the same problem. What's your conclusion?

@haidequanbu

Hi, I have a bug related to this statement:
Traceback (most recent call last):
File "train.py", line 340, in <module>
train(args, device_id)
File "train.py", line 272, in train
trainer.train(train_iter_fct, args.train_steps)
File "/root/code/BertSum/src/models/trainer.py", line 155, in train
self._gradient_accumulation(
File "/root/code/BertSum/src/models/trainer.py", line 326, in _gradient_accumulation
loss.div(float(normalization)).backward()
File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Does it have any relation to that statement? Or have you solved it?
Pardon my poor English!
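
As a side note, here is a minimal sketch of the debugging hint printed in the traceback (not from this thread's code): with asynchronous CUDA launches the assert surfaces at a later API call, so forcing synchronous execution makes the stack trace point at the real failing operation.

```python
# Must be set before CUDA is initialized (i.e. before any .cuda() / .to("cuda") call).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # only touch CUDA after the variable is set
# Equivalent from the shell: CUDA_LAUNCH_BLOCKING=1 python train.py ...
```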

@haidequanbu

OK, I have already solved the problem. It is caused by using BCE loss earlier: you should add a sigmoid layer before the output.
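
A hedged sketch of the fix described above (not the model code from the repository): torch.nn.BCELoss expects probabilities in [0, 1], so raw scores need a sigmoid first; on GPU, out-of-range inputs show up as exactly this kind of device-side assert. BCEWithLogitsLoss is the usual alternative that folds the sigmoid into the loss.

```python
import torch
import torch.nn as nn

scores = torch.randn(2, 5)                  # hypothetical raw sentence scores (logits)
labels = torch.randint(0, 2, (2, 5)).float()

# Wrong on GPU: BCELoss on raw logits (values outside [0, 1]) asserts inside the kernel.
# loss = nn.BCELoss()(scores, labels)

# Fix 1: apply a sigmoid before BCELoss, as suggested above.
probs = torch.sigmoid(scores)
loss1 = nn.BCELoss()(probs, labels)

# Fix 2 (equivalent and more numerically stable): BCEWithLogitsLoss on the raw scores.
loss2 = nn.BCEWithLogitsLoss()(scores, labels)

print(loss1.item(), loss2.item())           # the two losses agree up to numerics
```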
