
Impressive Results when changing conf-thres and iou-thres #8669

Closed
1 task done
pourmand1376 opened this issue Jul 21, 2022 · 11 comments
Labels
question Further information is requested

Comments

@pourmand1376
Contributor

pourmand1376 commented Jul 21, 2022

Search before asking

Question

I have actually searched YOLOv5 and read these related issues:

Still, with my dataset (which consists of medical images), I get impressive results when increasing conf-thres. I am not sure whether my model is actually doing well or whether changing conf-thres is simply wrong.

Also, the PR curve is different for each conf-thres. Is this normal?
I am worried about the results because of the warning that is issued when conf-thres is greater than 0.001.

[Default] Conf-thres = 0.001 (results screenshot)
Conf-thres = 0.5 (results screenshot)
Conf-thres = 0.7 (results screenshot)
Conf-thres = 0.8 (results screenshot)
Conf-thres = 0.85 (results screenshot)
Conf-thres = 0.86 (results screenshot)
Conf-thres = 0.88 (results screenshot)
Conf-thres = 0.9 (results screenshot)

Additional

No response

@pourmand1376 pourmand1376 added the question Further information is requested label Jul 21, 2022
@Zephyr69

Zephyr69 commented Jul 21, 2022

If you mean the conf-thres in val.py, yes, it's wrong to change it. The reason is exactly what the warning says.
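
One intuition for that warning, as a minimal toy sketch (made-up numbers, not YOLOv5 code): mAP is integrated from the precision-recall curve, and raising conf-thres throws away the low-confidence detections before the curve is built, so the curve is truncated and the integral no longer measures the same thing:

import numpy as np

def toy_ap(confs, is_tp, n_labels):
    # Build a PR curve from detections sorted by confidence, then integrate it
    # (sentinels + precision envelope, in the spirit of utils/metrics.compute_ap).
    order = np.argsort(-confs)
    tpc = np.cumsum(is_tp[order])
    fpc = np.cumsum(~is_tp[order])
    recall = tpc / n_labels
    precision = tpc / (tpc + fpc)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    x = np.linspace(0, 1, 101)
    return np.trapz(np.interp(x, mrec, mpre), x)

confs = np.array([0.95, 0.90, 0.60, 0.40, 0.30])    # hypothetical confidences
is_tp = np.array([True, True, False, True, True])   # hypothetical TP/FP flags
n_labels = 5                                        # hypothetical ground-truth count

print(toy_ap(confs, is_tp, n_labels))               # full curve (conf-thres = 0.001)
keep = confs >= 0.5                                 # validating at conf-thres = 0.5
print(toy_ap(confs[keep], is_tp[keep], n_labels))   # truncated curve: AP is no longer comparable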

@pourmand1376
Contributor Author

pourmand1376 commented Jul 22, 2022

In my specific application, I only care about precision/recall.
Is that also wrong? If yes, why would that be the case?

Also, see this comment, which says that changing conf-thres would produce an inaccurate mAP, but says nothing about other metrics.

@pourmand1376
Contributor Author

pourmand1376 commented Jul 22, 2022

As far as I can see, this is the only place where conf-thres is used in val.py.

yolov5/val.py

Line 217 in 4c1784b

out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)

So this is basically filtering the results: predictions whose confidence is lower than conf-thres are simply ignored. This is fine by any standard.
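
Conceptually, the conf-thres part of that call just drops candidate boxes whose confidence is below the threshold before anything else happens. A minimal sketch of that filtering step (simplified box layout, not the actual non_max_suppression implementation):

import torch

def filter_by_conf(pred, conf_thres=0.001):
    # pred: (n, 6) tensor of [x1, y1, x2, y2, conf, cls] rows (simplified layout)
    return pred[pred[:, 4] >= conf_thres]

pred = torch.tensor([[0.0, 0.0, 10.0, 10.0, 0.92, 0.0],
                     [5.0, 5.0, 20.0, 20.0, 0.30, 1.0],
                     [8.0, 2.0, 15.0, 12.0, 0.05, 0.0]])
print(len(filter_by_conf(pred, 0.001)))  # 3 boxes survive
print(len(filter_by_conf(pred, 0.5)))    # 1 box survives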

This is where AP for each class is calculated.

yolov5/val.py

Lines 263 to 271 in 4c1784b

# Compute metrics
stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]  # to numpy
if len(stats) and stats[0].any():
    tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
    ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
    mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
    nt = np.bincount(stats[3].astype(int), minlength=nc)  # number of targets per class
else:
    nt = torch.zeros(1)

And this is the actual code for calculating average precision.

yolov5/utils/metrics.py

Lines 29 to 93 in 4c1784b

def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=(), eps=1e-16):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp: True positives (nparray, nx1 or nx10).
        conf: Objectness value from 0-1 (nparray).
        pred_cls: Predicted object classes (nparray).
        target_cls: True object classes (nparray).
        plot: Plot precision-recall curve at mAP@0.5
        save_dir: Plot save directory
    # Returns
        The average precision as computed in py-faster-rcnn.
    """

    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]

    # Find unique classes
    unique_classes, nt = np.unique(target_cls, return_counts=True)
    nc = unique_classes.shape[0]  # number of classes, number of detections

    # Create Precision-Recall curve and compute AP for each class
    px, py = np.linspace(0, 1, 1000), []  # for plotting
    ap, p, r = np.zeros((nc, tp.shape[1])), np.zeros((nc, 1000)), np.zeros((nc, 1000))
    for ci, c in enumerate(unique_classes):
        i = pred_cls == c
        n_l = nt[ci]  # number of labels
        n_p = i.sum()  # number of predictions
        if n_p == 0 or n_l == 0:
            continue

        # Accumulate FPs and TPs
        fpc = (1 - tp[i]).cumsum(0)
        tpc = tp[i].cumsum(0)

        # Recall
        recall = tpc / (n_l + eps)  # recall curve
        r[ci] = np.interp(-px, -conf[i], recall[:, 0], left=0)  # negative x, xp because xp decreases

        # Precision
        precision = tpc / (tpc + fpc)  # precision curve
        p[ci] = np.interp(-px, -conf[i], precision[:, 0], left=1)  # p at pr_score

        # AP from recall-precision curve
        for j in range(tp.shape[1]):
            ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j])
            if plot and j == 0:
                py.append(np.interp(px, mrec, mpre))  # precision at mAP@0.5

    # Compute F1 (harmonic mean of precision and recall)
    f1 = 2 * p * r / (p + r + eps)
    names = [v for k, v in names.items() if k in unique_classes]  # list: only classes that have data
    names = dict(enumerate(names))  # to dict
    if plot:
        plot_pr_curve(px, py, ap, Path(save_dir) / 'PR_curve.png', names)
        plot_mc_curve(px, f1, Path(save_dir) / 'F1_curve.png', names, ylabel='F1')
        plot_mc_curve(px, p, Path(save_dir) / 'P_curve.png', names, ylabel='Precision')
        plot_mc_curve(px, r, Path(save_dir) / 'R_curve.png', names, ylabel='Recall')

    i = smooth(f1.mean(0), 0.1).argmax()  # max F1 index
    p, r, f1 = p[:, i], r[:, i], f1[:, i]
    tp = (r * nt).round()  # true positives
    fp = (tp / (p + eps) - tp).round()  # false positives
    return tp, fp, p, r, f1, ap, unique_classes.astype(int)

which calls this function:

yolov5/utils/metrics.py

Lines 96 to 121 in 4c1784b

def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves
    # Arguments
        recall: The recall curve (list)
        precision: The precision curve (list)
    # Returns
        Average precision, precision curve, recall curve
    """

    # Append sentinel values to beginning and end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))

    # Compute the precision envelope
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))

    # Integrate area under curve
    method = 'interp'  # methods: 'continuous', 'interp'
    if method == 'interp':
        x = np.linspace(0, 1, 101)  # 101-point interp (COCO)
        ap = np.trapz(np.interp(x, mrec, mpre), x)  # integrate
    else:  # 'continuous'
        i = np.where(mrec[1:] != mrec[:-1])[0]  # points where x axis (recall) changes
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])  # area under curve

    return ap, mpre, mrec
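
As a quick sanity check, compute_ap can be called directly on a hand-made curve (toy numbers; assuming this is run from the yolov5 repo root so utils.metrics is importable):

import numpy as np
from utils.metrics import compute_ap

recall = np.array([0.2, 0.4, 0.6, 0.8])        # made-up monotone recall curve
precision = np.array([1.0, 1.0, 0.75, 0.66])   # made-up precision curve
ap, mpre, mrec = compute_ap(recall, precision)
print(f"AP = {ap:.3f}")                        # area under the interpolated precision envelope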

Just the first stage is affected by conf-thres, so after changing this hyperparameter, the results of all metrics should be OK.

Note that the ap_per_class function doesn't depend on any external thresholds. So the PR curve would not combine results for all confidence thresholds, but only for one conf-threshold. This is what I see from the code.

Am I missing something?
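
For reference, a small usage sketch of ap_per_class on made-up arrays (again assuming the yolov5 repo root; the numbers and the 'lesion' class name are hypothetical). The prediction-side arrays only contain the detections that survived NMS, while target_cls is meant to carry every ground-truth label:

import numpy as np
from utils.metrics import ap_per_class

tp = np.array([[1], [1], [0], [1]])    # one IoU column instead of ten, for brevity
conf = np.array([0.9, 0.8, 0.6, 0.4])  # confidences of the surviving detections
pred_cls = np.zeros(4)                 # all detections predicted class 0
target_cls = np.zeros(6)               # 6 ground-truth labels of class 0
tp_, fp_, p, r, f1, ap, classes = ap_per_class(tp, conf, pred_cls, target_cls, names={0: 'lesion'})
print(p, r, ap)                        # precision/recall at max-F1 and AP per class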

@GavinYang5

I have the same question.

@pourmand1376
Contributor Author

pourmand1376 commented Jul 22, 2022

I think this may be a bug. I checked, and the results for recall, precision and mAP are actually wrong!

Recall should decrease as we increase the conf-thres value.
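
A back-of-the-envelope check with toy numbers: recall = TP / number of labels, and raising conf-thres can only remove predictions, so TP (and therefore recall) can only stay the same or drop as long as the label count is held fixed:

import numpy as np

confs = np.array([0.95, 0.90, 0.70, 0.40, 0.20])   # hypothetical detection confidences
is_tp = np.array([True, True, True, True, False])  # hypothetical match flags
n_labels = 6                                       # fixed number of ground-truth boxes

for thres in (0.001, 0.5, 0.85):
    keep = confs >= thres
    recall = is_tp[keep].sum() / n_labels          # label count does NOT shrink with the threshold
    print(f"conf-thres={thres}: recall={recall:.2f}")
# prints 0.67, 0.50, 0.33 -- recall can only go down as the threshold rises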

@pourmand1376 pourmand1376 changed the title Impressive Results when changing conf-thres Impressive Results when changing conf-thres (Most probably a bug) Jul 23, 2022
@pourmand1376
Contributor Author

pourmand1376 commented Jul 23, 2022

I think the bug is here. In line 217, we filter the output with non_max_suppression. Then we enumerate over the filtered results and use the filtered targets to calculate mAP. This should not be the case; we should always use the unfiltered targets.

yolov5/val.py

Lines 214 to 247 in 1c5e92a

targets[:, 2:] *= torch.tensor((width, height, width, height), device=device)  # to pixels
lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling
t3 = time_sync()
out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)
dt[2] += time_sync() - t3

# Metrics
for si, pred in enumerate(out):
    labels = targets[targets[:, 0] == si, 1:]
    nl, npr = labels.shape[0], pred.shape[0]  # number of labels, predictions
    path, shape = Path(paths[si]), shapes[si][0]
    correct = torch.zeros(npr, niou, dtype=torch.bool, device=device)  # init
    seen += 1

    if npr == 0:
        if nl:
            stats.append((correct, *torch.zeros((2, 0), device=device), labels[:, 0]))
        continue

    # Predictions
    if single_cls:
        pred[:, 5] = 0
    predn = pred.clone()
    scale_coords(im[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred

    # Evaluate
    if nl:
        tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
        scale_coords(im[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
        labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
        correct = process_batch(predn, labelsn, iouv)
        if plots:
            confusion_matrix.process_batch(predn, labelsn)
    stats.append((correct, pred[:, 4], pred[:, 5], labels[:, 0]))  # (correct, conf, pcls, tcls)

I've also printed the shapes of the inputs passed to the ap_per_class function, like this:

def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=(), eps=1e-16):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:  True positives (nparray, nx1 or nx10).
        conf:  Objectness value from 0-1 (nparray).
        pred_cls:  Predicted object classes (nparray).
        target_cls:  True object classes (nparray).
        plot:  Plot precision-recall curve at mAP@0.5
        save_dir:  Plot save directory
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    print(f"TP Shape: {tp.shape}")
    print(f"Conf Shape : {conf.shape}")
    print(f"predicted_cls: {pred_cls.shape}")
    print(f"target_cls: {target_cls.shape}")
## using conf-thres 0.85
TP Shape: (33, 10)
Conf Shape : (33,)
predicted_cls: (33,)
target_cls: (30,)
#  -------------
# using conf-thres 0.001
TP Shape: (25239, 10)
Conf Shape : (25239,)
predicted_cls: (25239,)
target_cls: (281,)
# -------------
# using conf-thres 0.001
TP Shape: (24777, 10)
Conf Shape : (24777,)
predicted_cls: (24777,)
target_cls: (266,)
# --------------

Even when using the same conf-thres as recommended, we end up calculating the result incorrectly! target_cls should always be the same, no matter what conf-thres is, because the number of target labels per class is always the same.

This way, even setting conf-thres to 0.001 gives wrong results.

If you look at the label count for every threshold above, you can see that this is wrong. The label count shouldn't change from run to run!
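
To make the effect concrete, here is a toy sketch (hypothetical numbers, not the val.py code) of what happens if an image's ground-truth labels only enter the statistics when at least one of its predictions survives the confidence filter: target_cls shrinks as the threshold rises, so recall is computed against too few labels:

import numpy as np

# Hypothetical per-image data: (detection confidences, number of ground-truth labels)
images = [
    (np.array([0.9, 0.3]), 2),
    (np.array([0.2]), 3),   # at conf-thres 0.5 this image keeps no predictions
    (np.array([]), 4),      # an image with labels but no predictions at all
]

def count_targets(conf_thres, drop_empty_images):
    total = 0
    for confs, n_labels in images:
        if drop_empty_images and (confs >= conf_thres).sum() == 0:
            continue  # buggy behaviour: the image's labels never reach the stats
        total += n_labels
    return total

print(count_targets(0.5, drop_empty_images=True))   # 2 -> target_cls shrinks, recall looks great
print(count_targets(0.5, drop_empty_images=False))  # 9 -> correct: label count is independent of conf-thres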

I will do a PR soon.

@pourmand1376 pourmand1376 changed the title Impressive Results when changing conf-thres (Most probably a bug) Impressive Results when changing conf-thres (most probably a bug) Jul 23, 2022
@pourmand1376 pourmand1376 changed the title Impressive Results when changing conf-thres (most probably a bug) Impressive Results when changing conf-thres and iou-thres (most probably a bug) Jul 23, 2022
@pourmand1376
Contributor Author

pourmand1376 commented Jul 23, 2022

Now, with PR #8686, my results are consistent and everything is fine. You can also see that the label count is the same in each run and is equal to the actual number of labels I have.

However, my model is not doing well, so I cannot publish the results! Before this fix, I was actually happy with the results.

Here are the results after fixing the bug:

Conf-thres = 0.001 (results screenshot)
Conf-thres = 0.2 (results screenshot)
Conf-thres = 0.5 (results screenshot)
Conf-thres = 0.7 (results screenshot)
Conf-thres = 0.85 (results screenshot)
Conf-thres = 0.9 (results screenshot)

@pourmand1376
Contributor Author

pourmand1376 commented Jul 23, 2022

We cannot simply say that changing conf-thres is wrong.

See this paper, which demonstrates the importance of the confidence threshold. Interestingly, it actually runs some experiments using YOLOv5.

Confidence Score: The Forgotten Dimension of Object Detection Performance Evaluation

Page 15 of the paper: (screenshot)

@pourmand1376 pourmand1376 changed the title Impressive Results when changing conf-thres and iou-thres (most probably a bug) Impressive Results when changing conf-thres and iou-thres Jul 23, 2022
@pourmand1376
Contributor Author

Actually, this PR is not required. If you have this problem, you should just update your forked repo to the latest version; this was already fixed a little earlier.

@kaminetzky

I was having the same issues before updating my repo. Thank you for your analysis!

@glenn-jocher
Member

@kaminetzky you're welcome! It's great to hear that your issues have been addressed. Feel free to reach out if you need further assistance. Good luck with your work!
