
cuda OOM #5

Open
opentld opened this issue Mar 30, 2022 · 7 comments
opentld commented Mar 30, 2022

Platform: Windows 10 (Anaconda), RTX 2080 with 8 GB

```
python inference_davis.py --with_box_refine --binary --freeze_text_encoder --output_dir davis_dirs/resnet50 --resume ckpt/ytvos_r50.pth --backbone resnet50 --ngpu 1
```

```
Inference only supports for batch size = 1
Namespace(a2d_path='data/a2d_sentences', aux_loss=True, backbone='resnet50', backbone_pretrained=None, batch_size=1, bbox_loss_coef=5, binary=True, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_path='data/coco', controller_layers=3, dataset_file='davis', davis_path='data/ref-davis', dec_layers=4, dec_n_points=4, device='cuda', dice_loss_coef=5, dilation=False, dim_feedforward=2048, dist_url='env://', dropout=0.1, dynamic_mask_channels=8, enc_layers=4, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, focal_alpha=0.25, freeze_text_encoder=True, giou_loss_coef=2, hidden_dim=256, jhmdb_path='data/jhmdb_sentences', lr=0.0001, lr_backbone=5e-05, lr_backbone_names=['backbone.0'], lr_drop=[6, 8], lr_linear_proj_mult=1.0, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_text_encoder=1e-05, lr_text_encoder_names=['text_encoder'], mask_dim=256, mask_loss_coef=2, masks=True, max_size=640, max_skip=3, ngpu=1, nheads=8, num_feature_levels=4, num_frames=5, num_queries=5, num_workers=4, output_dir='davis_dirs/resnet50', position_embedding='sine', pre_norm=False, pretrained_weights=None, rel_coord=True, remove_difficult=False, resume='ckpt/ytvos_r50.pth', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_dice=5, set_cost_giou=2, set_cost_mask=2, split='valid', start_epoch=0, threshold=0.5, two_stage=False, use_checkpoint=False, visualize=False, weight_decay=0.0005, with_box_refine=True, world_size=1, ytvos_path='data/ref-youtube-vos')
Start inference
processor 0: 0% 0/30 [00:00<?, ?it/s]Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
number of params: 51394175
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Traceback (most recent call last):
  File "inference_davis.py", line 330, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 224, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\referformer.py", line 286, in forward
    self.transformer(srcs, text_embed, masks, poses, query_embeds)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 170, in forward
    memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 291, in forward
    output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 261, in forward
    src = self.forward_ffn(src)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 248, in forward_ffn
    src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 1.43 GiB (GPU 0; 8.00 GiB total capacity; 3.75 GiB already allocated; 691.50 MiB free; 5.43 GiB reserved in total by PyTorch)
processor 0: 0% 0/30 [00:23<?, ?it/s]
```

At a minimum, how much memory is required to run inference? Or which parameters can be modified to reduce the memory overhead?

Thanks!
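
In case it helps others hitting this: the usual levers are running under `torch.no_grad()` and splitting the video into short clips. A minimal sketch of that kind of loop; `model`, `frames`, `exp`, `target`, and the `'pred_masks'` key are placeholders, not the repo's exact API:

```python
import torch

# Hypothetical clip-wise inference loop: processing the video in short
# windows means peak activation memory scales with clip_len rather than
# with the full video length.
clip_len = 8

all_masks = []
with torch.no_grad():  # inference only: no autograd buffers are kept
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]
        outputs = model([clip], [exp], [target])       # placeholder call signature
        all_masks.append(outputs['pred_masks'].cpu())  # move results off the GPU
        torch.cuda.empty_cache()                       # release cached blocks between clips
```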


opentld commented Mar 30, 2022

I changed clip_len to 8 and it worked, but when it reached 47%, OOM appeared again :( @wjn922


```
processor 0: 47% 14/30 [06:20<06:29, 24.34s/it]Traceback (most recent call last):
  File "inference_davis.py", line 329, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 254, in sub_processor
    anno_masks[anno_masks < 0.5] = 0.0
RuntimeError: CUDA out of memory. Tried to allocate 4.59 GiB (GPU 0; 8.00 GiB total capacity; 1.67 GiB already allocated; 2.85 GiB free; 3.26 GiB reserved in total by PyTorch)
processor 0: 47% 14/30 [06:47<07:45, 29.09s/it]
```
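
Note that this second OOM is in post-processing rather than in the model: the comparison `anno_masks < 0.5` materializes a full-size temporary boolean tensor on the GPU. One possible workaround (a sketch, not something the repo does) is to threshold on the CPU:

```python
# Sketch: do the thresholding on the CPU. The boolean mask produced by
# `anno_masks < 0.5` is itself a full-size temporary, which can OOM
# when anno_masks is several GiB on an 8 GB card.
anno_masks = anno_masks.cpu()
anno_masks[anno_masks < 0.5] = 0.0
```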




opentld commented Mar 30, 2022

After checking, it was the 'goldfish' video that caused the OOM, so I now skip samples whose num_obj is greater than 3 (sketched below).
Back to the original topic: will changing clip_len to 8 reduce precision? @wjn922
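
A minimal sketch of that guard; `video_name` is a placeholder, and the threshold of 3 is just what happened to fit in 8 GB:

```python
# Workaround sketch: skip videos with many annotated objects, which
# blow past 8 GB on this card. This trades evaluation completeness
# for being able to finish the run at all.
if num_obj > 3:
    print(f'skipping {video_name}: num_obj = {num_obj}')
    continue
```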



wjn922 commented Mar 31, 2022

Hi,

We run the code on a V100 with 32 GB of memory. We find it generally needs around 24 GB, while for some videos containing many objects it can reach 32 GB.

To reduce memory, one way is to use a shorter clip, as you do. Another way is to reduce the video resolution here. But both of these solutions are likely to reduce precision.
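
As an illustration of the second option, lowering the resize targets in the validation transforms would look something like the sketch below. This assumes DETR-style transforms (a `T.RandomResize` that resizes the short side with a `max_size` cap on the long side, matching the `max_size=640` visible in the args above); the exact module and numbers are assumptions, not the repo's code:

```python
# Illustrative only: shrink the inference resolution to cut activation
# memory. The encoder's token count drops roughly quadratically with
# resolution, so even a modest reduction helps.
transform = T.Compose([
    T.RandomResize([288], max_size=512),  # e.g. down from a 640 long-side cap
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```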


opentld commented Apr 1, 2022

> It views the language as queries and directly attends to the most relevant regions in the video frames...

How is using the language as queries achieved, as the GIF on the homepage shows? @wjn922


wjn922 commented Apr 2, 2022

For the Transformer decoder, the decoder embeddings are the pooled language features, and the learnable queries serve as the positional embeddings. Please refer here.
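
In DETR-style pseudocode, that design corresponds to something like the sketch below (not the repo's exact code; `text_pooled` stands for the pooled sentence feature, and the shapes follow the args above, num_queries=5 and hidden_dim=256):

```python
import torch
import torch.nn as nn

num_queries, hidden_dim, batch = 5, 256, 1

# The learnable queries act as the decoder's positional embedding...
query_embed = nn.Embedding(num_queries, hidden_dim)
query_pos = query_embed.weight.unsqueeze(1).repeat(1, batch, 1)  # [nq, b, c]

# ...while the decoder input ("tgt") is the pooled sentence feature,
# broadcast to every query, instead of DETR's all-zero tgt.
text_pooled = torch.randn(batch, hidden_dim)              # placeholder pooled text feature
tgt = text_pooled.unsqueeze(0).repeat(num_queries, 1, 1)  # [nq, b, c]

# Each decoder layer then attends with (tgt + query_pos) as the queries,
# so the language feature is what actually probes the visual memory.
```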

@AngelTang190

Hi @wjn922 ,

What about inference_ytvos.py? Since there is no num_obj variable, is resizing the only way to solve the CUDA OOM error?

@AngelTang190

> What about inference_ytvos.py? Since there is no num_obj variable, is resizing the only way to solve the CUDA OOM error?

I tried resizing to 250, and the 48th video still gives a CUDA OOM error. The number of expressions is 2 and the length of the video is 36, which is not especially high compared to the previous videos. What causes this to happen? Below is the error output:

```
processor 0: 24% 48/202 [04:14<18:29, 7.20s/it]
Number of expressions: 2
Length of video: 36

Process Process-2:
Traceback (most recent call last):
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "inference_ytvos.py", line 207, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/referformer.py", line 321, in forward
    mask_features = self.pixel_decoder(features, text_features, pos, memory, nf=t)  # [batch_size*time, c, out_h, out_w]
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 258, in forward
    y = self.forward_features(features, text_features, pos, memory, nf)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 227, in forward_features
    cur_fpn = cross_attn(tgt=vision_features,
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 404, in forward
    return self.forward_post(tgt, memory, t, h, w,
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 337, in forward_post
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1206, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 2.34 GiB (GPU 0; 10.75 GiB total capacity; 4.96 GiB already allocated; 1.71 GiB free; 7.29 GiB reserved in total by PyTorch)
Total inference time: 255.2629 s
```
