question for uniform matcher #23

Open
StephanPan opened this issue Apr 21, 2021 · 9 comments

@StephanPan

According to your code, the uniform matcher seems to compute the L1 distance between the predicted boxes/anchors and the targets across the whole batch of images, but I think it should be computed within a single image. Another question: I don't understand how the anchor indices and the predicted-box indices are fused. Why simply add the two indices? https://github.com/megvii-model/YOLOF/blob/61a8accf957dceef11ea8029f121922b5f60901e/playground/detection/coco/yolof/yolof_base/uniform_matcher.py#L77

@chensnathan
Collaborator

Hi,

For the first question, the indices are actually selected within each image; computing over the whole batch is just an implementation trick to avoid a Python loop over images.

For the second question, you can refer to the answer here.
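
To illustrate the first point, here is a minimal sketch of the idea with a hypothetical `uniform_match` helper. It is not the repo's exact `UniformMatcher` (which, among other details, takes top-k matches for the prediction costs and the anchor costs separately rather than summing them), but it shows how a single batched `cdist` call can coexist with per-image index selection:

```python
import torch

def uniform_match(pred_boxes, anchor_boxes, gt_boxes_list, k=4):
    """pred_boxes, anchor_boxes: (B, N, 4); gt_boxes_list: list of (Mi, 4)."""
    bs, num_queries = pred_boxes.shape[:2]
    tgt = torch.cat(gt_boxes_list)  # (sum(Mi), 4)

    # One batched cdist call over all images at once -- no Python loop ...
    cost_pred = torch.cdist(pred_boxes.flatten(0, 1), tgt, p=1)
    cost_anchor = torch.cdist(anchor_boxes.flatten(0, 1), tgt, p=1)
    C = (cost_pred + cost_anchor).view(bs, num_queries, -1)

    # ... but indices are still selected per image: each image only looks
    # at its own block of the cost matrix, so matches never cross images.
    sizes = [len(t) for t in gt_boxes_list]
    indices = []
    for i, c in enumerate(C.split(sizes, dim=-1)):
        # c: (B, N, Mi); row i is image i's cost against its own GT boxes
        _, topk_idx = c[i].topk(k, dim=0, largest=False)  # (k, Mi)
        indices.append(topk_idx)
    return indices
```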

@StephanPan
Author

Thanks for your answer! I also wonder whether it is suitable for lightweight models such as YOLOv4-tiny. In my experiments the results are not good: I simply changed the backbone to YOLOv4-tiny's, and the mAP is only 13.8 at an input size of 320. Could you give any suggestions?

@chensnathan
Collaborator

Hi, we did not train tiny models before, but I am happy to help you get reasonable results.

Could you provide more details about your modifications? The backbone file, pre-trained models, and config file would be helpful.

@StephanPan
Author

I simply changed the backbone following YOLOv4-tiny, and deleted the anchor of size 512 because of the limited image size. Also, the activation is replaced by LeakyReLU. The other settings are the same as cspdarknet53-dc5.
```python
# NOTE: imports are assumed, not shown in the original post. `Backbone`,
# `ShapeSpec`, `get_norm`, `LeakyReLU`, `DarkBlock`, `CrossStagePartialBlock`,
# `make_dark_layer`, and `make_cspdark_layer` come from the YOLOF repo /
# cvpods backbone modules; adjust the import paths to your setup.
import torch.nn as nn


class DarkNet(Backbone):
    """DarkNet backbone.
    Refer to the paper for more details: https://arxiv.org/pdf/1804.02767

    Args:
        depth (int): Depth of Darknet, from {53}.
        with_csp (bool): Use cross stage partial connection or not.
        out_features (List[str]): Output features.
        norm_type (str): type of normalization layer.
        res5_dilation (int): dilation for the last stage.
    """

    # Reduced to 3 stages with one block each for the YOLOv4-tiny-like variant.
    arch_settings = {
        53: (DarkBlock, (1, 1, 1))
    }

    def __init__(self,
                 depth,
                 with_csp=True,
                 out_features=["res5"],
                 norm_type="BN",
                 res5_dilation=1):
        super(DarkNet, self).__init__()
        if depth not in self.arch_settings:
            raise KeyError('invalid depth {} for darknet'.format(depth))
        self.with_csp = with_csp
        self._out_features = out_features
        self.norm_type = norm_type
        self.res5_dilation = res5_dilation

        self.block, self.stage_blocks = self.arch_settings[depth]
        self.inplanes = 64

        self._make_stem_layer()

        self.dark_layers = []
        for i, num_blocks in enumerate(self.stage_blocks):
            planes = 128 * 2 ** i
            dilation = 1
            stride = 2
            # NOTE: the original post checked `i == 4`, which never fires with
            # only 3 stages, so the DC5 dilation was silently dropped; check
            # the last stage instead.
            if i == len(self.stage_blocks) - 1 and self.res5_dilation == 2:
                dilation = self.res5_dilation
                stride = 1
            if not self.with_csp:
                layer = make_dark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            else:
                layer = make_cspdark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    norm_type=self.norm_type
                )
                layer = CrossStagePartialBlock(
                    self.inplanes,
                    planes,
                    stage_layers=layer,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            self.inplanes = planes
            layer_name = 'layer{}'.format(i + 1)
            self.add_module(layer_name, layer)
            self.dark_layers.append(layer_name)

        # freeze stage<=2
        # for p in self.conv1.parameters():
        #     p.requires_grad = False
        # for p in self.bn1.parameters():
        #     p.requires_grad = False
        # for p in self.layer1.parameters():
        #     p.requires_grad = False
        # for p in self.layer2.parameters():
        #     p.requires_grad = False

    def _make_stem_layer(self):
        # Stem: two stride-2 3x3 convs -> total stride 4 before the stages.
        self.conv1 = nn.Conv2d(
            3, 32, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn1 = get_norm(
            self.norm_type, 32, eps=1e-4, momentum=0.03
        )
        # self.act1 = Mish()
        self.act1 = LeakyReLU()

        self.conv2 = nn.Conv2d(
            32, self.inplanes, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn2 = get_norm(
            self.norm_type, self.inplanes, eps=1e-4, momentum=0.03
        )
        self.act2 = LeakyReLU()

    def forward(self, x):
        outputs = {}
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x)

        for layer_name in self.dark_layers:
            layer = getattr(self, layer_name)
            x = layer(x)
        outputs[self._out_features[-1]] = x
        return outputs

    def output_shape(self):
        # NOTE: the key must match the one written in `forward` (the last
        # entry of `out_features`); the original post hard-coded "res3"
        # while `forward` writes "res5".
        return {
            self._out_features[-1]: ShapeSpec(
                channels=512, stride=16 if self.res5_dilation == 2 else 32
            )
        }
```
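
For completeness, a hypothetical smoke test for the class above (it assumes the repo's helper modules are importable; the expected shape follows from the stem stride of 4 and three stride-2 stages):

```python
import torch

# Hypothetical smoke test; 320 / 32 = 10 with the default res5_dilation=1.
model = DarkNet(depth=53, with_csp=True, out_features=["res5"])
feats = model(torch.randn(1, 3, 320, 320))
print(feats["res5"].shape)  # expected: torch.Size([1, 512, 10, 10])
```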

@chensnathan
Collaborator

Ok, I will try it.

@StephanPan
Author

Thanks for your reply! Another question: are multi-scale training and SWA included?

@chensnathan
Collaborator

Multi-scale training is supported by Detectron2. You can refer to this repo for SWA.

The results for multi-scale training and SWA are not included in this repo; you can try them yourself.
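
For reference, a sketch of how both might be wired up, assuming a Detectron2-style config and PyTorch's generic SWA utilities (values are illustrative, not this repo's recipe):

```python
import torch
from detectron2.config import get_cfg
from torch.optim.swa_utils import AveragedModel, SWALR

# Multi-scale training: a short-edge size is sampled per iteration.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (512, 544, 576, 608, 640, 672, 704, 736, 768)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MAX_SIZE_TRAIN = 1333

# SWA via PyTorch's built-in utilities (torch >= 1.6): keep a running
# average of the weights and anneal to a constant SWA learning rate.
model = torch.nn.Linear(4, 4)  # placeholder model for the sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=1e-4)
```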

@StephanPan
Author

Thanks a lot! I find that when I change the test image size from 608 to 320, the performance drops a lot: mAP drops from 43.2 to 34.5. The degradation is most significant for small and medium objects (small-object mAP drops from 22.8 to 11.8, medium-object from 47.2 to 36.4). Compared to YOLOv4 at an input size of 320, YOLOF's small-object detection is not satisfying. Are there any suggestions to improve it?

@chensnathan
Collaborator

You may need to re-train YOLOF with small image sizes. The provided pre-trained model is trained with relatively large image sizes (from 512 to 768), which is not suitable for testing at an image size of 320.
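
In a Detectron2-style config the train and test sizes are set separately, so a small-size run might look like this (illustrative values, not a tested recipe):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (256, 288, 320)  # illustrative small-size schedule
cfg.INPUT.MIN_SIZE_TEST = 320               # match the intended test size
```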
