
cuda OOM #5

Open
opentld opened this issue Mar 30, 2022 · 7 comments
opentld commented Mar 30, 2022

Platform: Windows 10 (Anaconda), RTX 2080 with 8 GB

```
python inference_davis.py --with_box_refine --binary --freeze_text_encoder --output_dir davis_dirs/resnet50 --resume ckpt/ytvos_r50.pth --backbone resnet50 --ngpu 1
```

```
Inference only supports for batch size = 1
Namespace(a2d_path='data/a2d_sentences', aux_loss=True, backbone='resnet50', backbone_pretrained=None, batch_size=1, bbox_loss_coef=5, binary=True, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_path='data/coco', controller_layers=3, dataset_file='davis', davis_path='data/ref-davis', dec_layers=4, dec_n_points=4, device='cuda', dice_loss_coef=5, dilation=False, dim_feedforward=2048, dist_url='env://', dropout=0.1, dynamic_mask_channels=8, enc_layers=4, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, focal_alpha=0.25, freeze_text_encoder=True, giou_loss_coef=2, hidden_dim=256, jhmdb_path='data/jhmdb_sentences', lr=0.0001, lr_backbone=5e-05, lr_backbone_names=['backbone.0'], lr_drop=[6, 8], lr_linear_proj_mult=1.0, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_text_encoder=1e-05, lr_text_encoder_names=['text_encoder'], mask_dim=256, mask_loss_coef=2, masks=True, max_size=640, max_skip=3, ngpu=1, nheads=8, num_feature_levels=4, num_frames=5, num_queries=5, num_workers=4, output_dir='davis_dirs/resnet50', position_embedding='sine', pre_norm=False, pretrained_weights=None, rel_coord=True, remove_difficult=False, resume='ckpt/ytvos_r50.pth', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_dice=5, set_cost_giou=2, set_cost_mask=2, split='valid', start_epoch=0, threshold=0.5, two_stage=False, use_checkpoint=False, visualize=False, weight_decay=0.0005, with_box_refine=True, world_size=1, ytvos_path='data/ref-youtube-vos')
Start inference
processor 0: 0% 0/30 [00:00<?, ?it/s]Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
number of params: 51394175
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Traceback (most recent call last):
  File "inference_davis.py", line 330, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 224, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\referformer.py", line 286, in forward
    self.transformer(srcs, text_embed, masks, poses, query_embeds)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 170, in forward
    memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 291, in forward
    output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 261, in forward
    src = self.forward_ffn(src)
  File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 248, in forward_ffn
    src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 1.43 GiB (GPU 0; 8.00 GiB total capacity; 3.75 GiB already allocated; 691.50 MiB free; 5.43 GiB reserved in total by PyTorch)
processor 0: 0% 0/30 [00:23<?, ?it/s]
```

At a minimum, how much memory is required to run inference? Or which parameters can be modified to reduce the memory overhead?

Thanks!
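
In case it helps others hitting this: the usual levers are running under `torch.no_grad()` and splitting the video into short clips. A minimal sketch of that kind of loop; `model`, `frames`, `exp`, `target`, and the `'pred_masks'` key are placeholders, not the repo's exact API:

```python
import torch

# Hypothetical clip-wise inference loop: processing the video in short
# windows means peak activation memory scales with clip_len rather than
# with the full video length.
clip_len = 8

all_masks = []
with torch.no_grad():  # inference only: no autograd buffers are kept
    for start in range(0, len(frames), clip_len):
        clip = frames[start:start + clip_len]
        outputs = model([clip], [exp], [target])       # placeholder call signature
        all_masks.append(outputs['pred_masks'].cpu())  # move results off the GPU
        torch.cuda.empty_cache()                       # release cached blocks between clips
```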


opentld commented Mar 30, 2022

I changed clip_len to 8 and it worked, but when it reached 47%, OOM appeared again :( @wjn922


```
processor 0: 47% 14/30 [06:20<06:29, 24.34s/it]Traceback (most recent call last):
  File "inference_davis.py", line 329, in <module>
    main(args)
  File "inference_davis.py", line 103, in main
    p.run()
  File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "inference_davis.py", line 254, in sub_processor
    anno_masks[anno_masks < 0.5] = 0.0
RuntimeError: CUDA out of memory. Tried to allocate 4.59 GiB (GPU 0; 8.00 GiB total capacity; 1.67 GiB already allocated; 2.85 GiB free; 3.26 GiB reserved in total by PyTorch)
processor 0: 47% 14/30 [06:47<07:45, 29.09s/it]
```
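
Note that this second OOM is in post-processing rather than in the model: the comparison `anno_masks < 0.5` materializes a full-size temporary boolean tensor on the GPU. One possible workaround (a sketch, not something the repo does) is to threshold on the CPU:

```python
# Sketch: do the thresholding on the CPU. The boolean mask produced by
# `anno_masks < 0.5` is itself a full-size temporary, which can OOM
# when anno_masks is several GiB on an 8 GB card.
anno_masks = anno_masks.cpu()
anno_masks[anno_masks < 0.5] = 0.0
```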




opentld commented Mar 30, 2022

After checking, it was the 'goldfish' video that caused the OOM, so I now skip samples whose num_obj is greater than 3 (sketched below).
Back to the original topic: will changing clip_len to 8 reduce precision? @wjn922
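
A minimal sketch of that guard; `video_name` is a placeholder, and the threshold of 3 is just what happened to fit in 8 GB:

```python
# Workaround sketch: skip videos with many annotated objects, which
# blow past 8 GB on this card. This trades evaluation completeness
# for being able to finish the run at all.
if num_obj > 3:
    print(f'skipping {video_name}: num_obj = {num_obj}')
    continue
```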



wjn922 commented Mar 31, 2022

Hi,

We run the code on a V100 with 32 GB of memory. We find it generally needs around 24 GB, while for some videos containing many objects it can reach 32 GB.

To reduce memory, one way is to use a shorter clip, as you do. Another way is to reduce the video resolution here. But both of these solutions are likely to reduce precision.
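
As an illustration of the second option, lowering the resize targets in the validation transforms would look something like the sketch below. This assumes DETR-style transforms (a `T.RandomResize` that resizes the short side with a `max_size` cap on the long side, matching the `max_size=640` visible in the args above); the exact module and numbers are assumptions, not the repo's code:

```python
# Illustrative only: shrink the inference resolution to cut activation
# memory. The encoder's token count drops roughly quadratically with
# resolution, so even a modest reduction helps.
transform = T.Compose([
    T.RandomResize([288], max_size=512),  # e.g. down from a 640 long-side cap
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```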


opentld commented Apr 1, 2022

> It views the language as queries and directly attends to the most relevant regions in the video frames...

How is using the language as queries achieved, as the GIF on the homepage shows? @wjn922


wjn922 commented Apr 2, 2022

For the Transformer decoder, the decoder embeddings are the pooled language features, and the learnable queries serve as the positional embeddings. Please refer here.
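
In DETR-style pseudocode, that design corresponds to something like the sketch below (not the repo's exact code; `text_pooled` stands for the pooled sentence feature, and the shapes follow the args above, num_queries=5 and hidden_dim=256):

```python
import torch
import torch.nn as nn

num_queries, hidden_dim, batch = 5, 256, 1

# The learnable queries act as the decoder's positional embedding...
query_embed = nn.Embedding(num_queries, hidden_dim)
query_pos = query_embed.weight.unsqueeze(1).repeat(1, batch, 1)  # [nq, b, c]

# ...while the decoder input ("tgt") is the pooled sentence feature,
# broadcast to every query, instead of DETR's all-zero tgt.
text_pooled = torch.randn(batch, hidden_dim)              # placeholder pooled text feature
tgt = text_pooled.unsqueeze(0).repeat(num_queries, 1, 1)  # [nq, b, c]

# Each decoder layer then attends with (tgt + query_pos) as the queries,
# so the language feature is what actually probes the visual memory.
```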

@AngelTang190

Hi @wjn922 ,

What about inference_ytvos.py? Since there is no num_obj variable, is resizing the only way to solve the CUDA OOM error?

@AngelTang190

> What about inference_ytvos.py? Since there is no num_obj variable, is resizing the only way to solve the CUDA OOM error?

I tried resizing to 250, and the 48th video still gives a CUDA OOM error. The number of expressions is 2 and the length of the video is 36, which is not especially high compared to the previous videos. What causes this to happen? Below is the error output:

```
processor 0: 24% 48/202 [04:14<18:29, 7.20s/it]
Number of expressions: 2
Length of video: 36

Process Process-2:
Traceback (most recent call last):
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/fyp-student/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "inference_ytvos.py", line 207, in sub_processor
    outputs = model([imgs], [exp], [target])
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/referformer.py", line 321, in forward
    mask_features = self.pixel_decoder(features, text_features, pos, memory, nf=t)  # [batch_size*time, c, out_h, out_w]
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 258, in forward
    y = self.forward_features(features, text_features, pos, memory, nf)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 227, in forward_features
    cur_fpn = cross_attn(tgt=vision_features,
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 404, in forward
    return self.forward_post(tgt, memory, t, h, w,
  File "/home/fyp-student/PycharmProjects/ReferringImageSegmentationTraining/ReferFormer/models/segmentation.py", line 337, in forward_post
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
  File "/home/fyp-student/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 1206, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 2.34 GiB (GPU 0; 10.75 GiB total capacity; 4.96 GiB already allocated; 1.71 GiB free; 7.29 GiB reserved in total by PyTorch)
Total inference time: 255.2629 s
```
