question for uniform matcher #23

Open
StephanPan opened this issue Apr 21, 2021 · 9 comments

@StephanPan

According to your code, the uniform matcher seems to compute the L1 distance between the predicted boxes/anchors and the targets across the whole batch of images, but I think it should be computed within a single image. Another question: I don't understand how the anchor indices and the predicted-box indices are fused. Why simply add the two indices? https://github.com/megvii-model/YOLOF/blob/61a8accf957dceef11ea8029f121922b5f60901e/playground/detection/coco/yolof/yolof_base/uniform_matcher.py#L77

@chensnathan
Collaborator

Hi,

For the first question, the indices are actually selected within each image; computing over the whole batch is just an implementation trick to avoid a Python loop over images.

For the second question, you can refer to the answer here.
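
To illustrate the first point, here is a minimal sketch of the idea with a hypothetical `uniform_match` helper. It is not the repo's exact `UniformMatcher` (which, among other details, takes top-k matches for the prediction costs and the anchor costs separately rather than summing them), but it shows how a single batched `cdist` call can coexist with per-image index selection:

```python
import torch

def uniform_match(pred_boxes, anchor_boxes, gt_boxes_list, k=4):
    """pred_boxes, anchor_boxes: (B, N, 4); gt_boxes_list: list of (Mi, 4)."""
    bs, num_queries = pred_boxes.shape[:2]
    tgt = torch.cat(gt_boxes_list)  # (sum(Mi), 4)

    # One batched cdist call over all images at once -- no Python loop ...
    cost_pred = torch.cdist(pred_boxes.flatten(0, 1), tgt, p=1)
    cost_anchor = torch.cdist(anchor_boxes.flatten(0, 1), tgt, p=1)
    C = (cost_pred + cost_anchor).view(bs, num_queries, -1)

    # ... but indices are still selected per image: each image only looks
    # at its own block of the cost matrix, so matches never cross images.
    sizes = [len(t) for t in gt_boxes_list]
    indices = []
    for i, c in enumerate(C.split(sizes, dim=-1)):
        # c: (B, N, Mi); row i is image i's cost against its own GT boxes
        _, topk_idx = c[i].topk(k, dim=0, largest=False)  # (k, Mi)
        indices.append(topk_idx)
    return indices
```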

@StephanPan
Author

Thanks for your answer! I also wonder whether it is suitable for lightweight models such as YOLOv4-tiny. In my experiments the results are not good: I simply changed the backbone to YOLOv4-tiny's, and the mAP is only 13.8 at an input size of 320. Could you give any suggestions?

@chensnathan
Collaborator

Hi, we did not train tiny models before, but I am happy to help you get reasonable results.

Could you provide more details about your modifications? The backbone file, pre-trained models, and config file would be helpful.

@StephanPan
Author

I simply changed the backbone following YOLOv4-tiny, and deleted the anchor of size 512 because of the limited image size. Also, the activation is replaced by LeakyReLU. The other settings are the same as cspdarknet53-dc5.
```python
# NOTE: imports are assumed, not shown in the original post. `Backbone`,
# `ShapeSpec`, `get_norm`, `LeakyReLU`, `DarkBlock`, `CrossStagePartialBlock`,
# `make_dark_layer`, and `make_cspdark_layer` come from the YOLOF repo /
# cvpods backbone modules; adjust the import paths to your setup.
import torch.nn as nn


class DarkNet(Backbone):
    """DarkNet backbone.
    Refer to the paper for more details: https://arxiv.org/pdf/1804.02767

    Args:
        depth (int): Depth of Darknet, from {53}.
        with_csp (bool): Use cross stage partial connection or not.
        out_features (List[str]): Output features.
        norm_type (str): type of normalization layer.
        res5_dilation (int): dilation for the last stage.
    """

    # Reduced to 3 stages with one block each for the YOLOv4-tiny-like variant.
    arch_settings = {
        53: (DarkBlock, (1, 1, 1))
    }

    def __init__(self,
                 depth,
                 with_csp=True,
                 out_features=["res5"],
                 norm_type="BN",
                 res5_dilation=1):
        super(DarkNet, self).__init__()
        if depth not in self.arch_settings:
            raise KeyError('invalid depth {} for darknet'.format(depth))
        self.with_csp = with_csp
        self._out_features = out_features
        self.norm_type = norm_type
        self.res5_dilation = res5_dilation

        self.block, self.stage_blocks = self.arch_settings[depth]
        self.inplanes = 64

        self._make_stem_layer()

        self.dark_layers = []
        for i, num_blocks in enumerate(self.stage_blocks):
            planes = 128 * 2 ** i
            dilation = 1
            stride = 2
            # NOTE: the original post checked `i == 4`, which never fires with
            # only 3 stages, so the DC5 dilation was silently dropped; check
            # the last stage instead.
            if i == len(self.stage_blocks) - 1 and self.res5_dilation == 2:
                dilation = self.res5_dilation
                stride = 1
            if not self.with_csp:
                layer = make_dark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            else:
                layer = make_cspdark_layer(
                    block=self.block,
                    inplanes=self.inplanes,
                    planes=planes,
                    num_blocks=num_blocks,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    norm_type=self.norm_type
                )
                layer = CrossStagePartialBlock(
                    self.inplanes,
                    planes,
                    stage_layers=layer,
                    is_csp_first_stage=(i == 0),
                    dilation=dilation,
                    stride=stride,
                    norm_type=self.norm_type
                )
            self.inplanes = planes
            layer_name = 'layer{}'.format(i + 1)
            self.add_module(layer_name, layer)
            self.dark_layers.append(layer_name)

        # freeze stage<=2
        # for p in self.conv1.parameters():
        #     p.requires_grad = False
        # for p in self.bn1.parameters():
        #     p.requires_grad = False
        # for p in self.layer1.parameters():
        #     p.requires_grad = False
        # for p in self.layer2.parameters():
        #     p.requires_grad = False

    def _make_stem_layer(self):
        # Stem: two stride-2 3x3 convs -> total stride 4 before the stages.
        self.conv1 = nn.Conv2d(
            3, 32, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn1 = get_norm(
            self.norm_type, 32, eps=1e-4, momentum=0.03
        )
        # self.act1 = Mish()
        self.act1 = LeakyReLU()

        self.conv2 = nn.Conv2d(
            32, self.inplanes, kernel_size=3, stride=2, padding=1, bias=False
        )
        self.bn2 = get_norm(
            self.norm_type, self.inplanes, eps=1e-4, momentum=0.03
        )
        self.act2 = LeakyReLU()

    def forward(self, x):
        outputs = {}
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x)

        for layer_name in self.dark_layers:
            layer = getattr(self, layer_name)
            x = layer(x)
        outputs[self._out_features[-1]] = x
        return outputs

    def output_shape(self):
        # NOTE: the key must match the one written in `forward` (the last
        # entry of `out_features`); the original post hard-coded "res3"
        # while `forward` writes "res5".
        return {
            self._out_features[-1]: ShapeSpec(
                channels=512, stride=16 if self.res5_dilation == 2 else 32
            )
        }
```
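
For completeness, a hypothetical smoke test for the class above (it assumes the repo's helper modules are importable; the expected shape follows from the stem stride of 4 and three stride-2 stages):

```python
import torch

# Hypothetical smoke test; 320 / 32 = 10 with the default res5_dilation=1.
model = DarkNet(depth=53, with_csp=True, out_features=["res5"])
feats = model(torch.randn(1, 3, 320, 320))
print(feats["res5"].shape)  # expected: torch.Size([1, 512, 10, 10])
```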

@chensnathan
Collaborator

Ok, I will try it.

@StephanPan
Author

Thanks for your reply! Another question: are multi-scale training and SWA included?

@chensnathan
Collaborator

Multi-scale training is supported by Detectron2. You can refer to this repo for SWA.

The results for multi-scale training and SWA are not included in this repo; you can try them yourself.
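
For reference, a sketch of how both might be wired up, assuming a Detectron2-style config and PyTorch's generic SWA utilities (values are illustrative, not this repo's recipe):

```python
import torch
from detectron2.config import get_cfg
from torch.optim.swa_utils import AveragedModel, SWALR

# Multi-scale training: a short-edge size is sampled per iteration.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (512, 544, 576, 608, 640, 672, 704, 736, 768)
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MAX_SIZE_TRAIN = 1333

# SWA via PyTorch's built-in utilities (torch >= 1.6): keep a running
# average of the weights and anneal to a constant SWA learning rate.
model = torch.nn.Linear(4, 4)  # placeholder model for the sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=1e-4)
```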

@StephanPan
Author

Thanks a lot! I find that when I change the test image size from 608 to 320, the performance drops a lot: mAP drops from 43.2 to 34.5. The degradation is most significant for small and medium objects (small-object mAP drops from 22.8 to 11.8, medium-object from 47.2 to 36.4). Compared to YOLOv4 at an input size of 320, YOLOF's small-object detection is not satisfying. Are there any suggestions to improve it?

@chensnathan
Collaborator

You may need to re-train YOLOF with small image sizes. The provided pre-trained model is trained with relatively large image sizes (from 512 to 768), which is not suitable for testing at an image size of 320.
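
In a Detectron2-style config the train and test sizes are set separately, so a small-size run might look like this (illustrative values, not a tested recipe):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TRAIN = (256, 288, 320)  # illustrative small-size schedule
cfg.INPUT.MIN_SIZE_TEST = 320               # match the intended test size
```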
