Evaluation Metrics for Object Detection Models: AP and mAP


The formulas and figures in this part are adapted from the referenced article. The 11-point interpolation formula for computing \(AP\) is:

\[AP = \frac{1}{K} \sum_{k=1}^{K} P_{interp}(k), \qquad P_{interp}(k) = \max_{\tilde{k} \ge k} P(\tilde{k})\]

• This is the usual 11 points_Interpolated form of AP: precision is sampled at the 11 fixed thresholds \(\{0, 0.1, 0.2, \dots, 1.0\}\), which is the scheme used in PASCAL VOC 2007.
• Since only these 11 points take part in the computation, \(K = 11\), hence the name 11 points_Interpolated; \(k\) is the threshold index.
• \(P_{interp}(k)\) is the maximum precision over the samples at and beyond the \(k\)-th threshold, except that the thresholds here are restricted to \(\{0, 0.1, 0.2, \dots, 1.0\}\).

Judging from the curves, true AP < approximated AP < Interpolated AP, while the 11-points Interpolated AP can be either larger or smaller than these; when there are many data points it approaches the Interpolated AP. All of the previous formulas estimate AP as an area under the PR curve; the formula given in the PASCAL paper is blunter still: it simply averages the precision at the 11 recall thresholds. The 11-point AP formula from the PASCAL paper is:
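\[AP = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1.0\}} p_{interp}(r), \qquad p_{interp}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})\]

where \(p(\tilde{r})\) is the measured precision at recall \(\tilde{r}\); this is exactly what the use_07_metric=True branch of voc_ap below computes.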

1. Computing AP from given recall and precision values

    import numpy as np

    def voc_ap(rec, prec, use_07_metric=False):
        """ 
        ap = voc_ap(rec, prec, [use_07_metric])
        Compute VOC AP given precision and recall.
        If use_07_metric is true, uses the
        VOC 07 11 point method (default:False).
        """
        if use_07_metric:
            # 11 point metric
            ap = 0.
            for t in np.arange(0., 1.1, 0.1):
                if np.sum(rec >= t) == 0:
                    p = 0
                else:
                    p = np.max(prec[rec >= t])
                ap = ap + p / 11.
        else:
            # correct AP calculation
            # first append sentinel values at the end
            mrec = np.concatenate(([0.], rec, [1.]))
            mpre = np.concatenate(([0.], prec, [0.]))
    
            # compute the precision envelope
            for i in range(mpre.size - 1, 0, -1):
                mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    
            # to calculate area under PR curve, look for points
            # where X axis (recall) changes value
            i = np.where(mrec[1:] != mrec[:-1])[0]
    
            # and sum (\Delta recall) * prec
            ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
        return ap
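
A minimal usage sketch of voc_ap (the toy recall/precision values below are illustrative only): with the same inputs, the use_07_metric=True branch returns the VOC07 11-point average, while the default branch returns the area under the interpolated PR curve, and the two values generally differ.

    import numpy as np

    # toy PR points: recall is non-decreasing, precision generally decreases
    rec = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
    prec = np.array([1.0, 1.0, 0.8, 0.7, 0.55, 0.4])

    ap_area = voc_ap(rec, prec, use_07_metric=False)  # area under the interpolated PR curve
    ap_11pt = voc_ap(rec, prec, use_07_metric=True)   # VOC07 11-point average
    print(ap_area, ap_11pt)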
    

2. Computing AP from a detection results file and the test-set xml annotation files

    import os
    import pickle
    import xml.etree.ElementTree as ET

    import numpy as np

    def parse_rec(filename):
        """ Parse a PASCAL VOC xml file 
        Return : list, element is dict.
        """
        tree = ET.parse(filename)
        objects = []
        for obj in tree.findall('object'):
            obj_struct = {}
            obj_struct['name'] = obj.find('name').text
            obj_struct['pose'] = obj.find('pose').text
            obj_struct['truncated'] = int(obj.find('truncated').text)
            obj_struct['difficult'] = int(obj.find('difficult').text)
            bbox = obj.find('bndbox')
            obj_struct['bbox'] = [int(bbox.find('xmin').text),
                                  int(bbox.find('ymin').text),
                                  int(bbox.find('xmax').text),
                                  int(bbox.find('ymax').text)]
            objects.append(obj_struct)
    
        return objects
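
A quick look at what parse_rec returns (the file path and values below are made up for illustration): each element of the returned list describes one annotated object.

    # hypothetical VOC-style annotation file
    objs = parse_rec('VOCdevkit/VOC2007/Annotations/000001.xml')
    # objs[0] would look something like:
    # {'name': 'dog', 'pose': 'Left', 'truncated': 1, 'difficult': 0,
    #  'bbox': [48, 240, 195, 371]}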
    
    def voc_eval(detpath,
                 annopath,
                 imagesetfile,
                 classname,
                 cachedir,
                 ovthresh=0.5,
                 use_07_metric=False):
        """rec, prec, ap = voc_eval(detpath,
                                    annopath,
                                    imagesetfile,
                                    classname,
                                    [ovthresh],
                                    [use_07_metric])
        Top level function that does the PASCAL VOC evaluation.
        detpath: Path to detections result file
            detpath.format(classname) should produce the detection results file.
        annopath: Path to annotations file
            annopath.format(imagename) should be the xml annotations file.
        imagesetfile: Text file containing the list of images, one image per line.
        classname: Category name (duh)
        cachedir: Directory for caching the annotations
        [ovthresh]: Overlap threshold (default = 0.5)
        [use_07_metric]: Whether to use VOC07's 11 point AP computation
            (default False)
        """
        # assumes detections are in detpath.format(classname)
        # assumes annotations are in annopath.format(imagename)
        # assumes imagesetfile is a text file with each line an image name
        # cachedir caches the annotations in a pickle file
    
        # first load gt
        if not os.path.isdir(cachedir):
            os.mkdir(cachedir)
        cachefile = os.path.join(cachedir, '%s_annots.pkl' % imagesetfile)
        # read list of images
        with open(imagesetfile, 'r') as f:
            lines = f.readlines()
        imagenames = [x.strip() for x in lines]
    
        if not os.path.isfile(cachefile):
            # load annotations
            recs = {}
            for i, imagename in enumerate(imagenames):
                recs[imagename] = parse_rec(annopath.format(imagename))
                if i % 100 == 0:
                    print('Reading annotation for {:d}/{:d}'.format(
                        i + 1, len(imagenames)))
            # save
            print('Saving cached annotations to {:s}'.format(cachefile))
            with open(cachefile, 'wb') as f:
                pickle.dump(recs, f)
        else:
            # load
            with open(cachefile, 'rb') as f:
                try:
                    recs = pickle.load(f)
                except:
                    recs = pickle.load(f, encoding='bytes')
    
        # extract gt objects for this class
        class_recs = {}
        npos = 0
        for imagename in imagenames:
            R = [obj for obj in recs[imagename] if obj['name'] == classname]
            bbox = np.array([x['bbox'] for x in R])
        difficult = np.array([x['difficult'] for x in R]).astype(bool)
            det = [False] * len(R)
            npos = npos + sum(~difficult)
            class_recs[imagename] = {'bbox': bbox,
                                     'difficult': difficult,
                                     'det': det}
    
        # read dets
        detfile = detpath.format(classname)
        with open(detfile, 'r') as f:
            lines = f.readlines()
    
        # each line of the detection file: <image_id> <confidence> <xmin> <ymin> <xmax> <ymax>
        splitlines = [x.strip().split(' ') for x in lines]
        image_ids = [x[0] for x in splitlines]
        confidence = np.array([float(x[1]) for x in splitlines])
        BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
    
        nd = len(image_ids)
        tp = np.zeros(nd)
        fp = np.zeros(nd)
    
        if BB.shape[0] > 0:
            # sort by confidence
            sorted_ind = np.argsort(-confidence)
            sorted_scores = np.sort(-confidence)
            BB = BB[sorted_ind, :]
            image_ids = [image_ids[x] for x in sorted_ind]
    
            # go down dets and mark TPs and FPs
            for d in range(nd):
                R = class_recs[image_ids[d]]
                bb = BB[d, :].astype(float)
                ovmax = -np.inf
                BBGT = R['bbox'].astype(float)
    
                if BBGT.size > 0:
                    # compute overlaps
                    # intersection
                    ixmin = np.maximum(BBGT[:, 0], bb[0])
                    iymin = np.maximum(BBGT[:, 1], bb[1])
                    ixmax = np.minimum(BBGT[:, 2], bb[2])
                    iymax = np.minimum(BBGT[:, 3], bb[3])
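                    # the "+ 1." below follows the VOC convention that box coordinates
                    # are inclusive pixel indices, so width = xmax - xmin + 1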
                    iw = np.maximum(ixmax - ixmin + 1., 0.)
                    ih = np.maximum(iymax - iymin + 1., 0.)
                    inters = iw * ih
    
                    # union
                    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                           (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                           (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
    
                    overlaps = inters / uni
                    ovmax = np.max(overlaps)
                    jmax = np.argmax(overlaps)
    
                if ovmax > ovthresh:
                    if not R['difficult'][jmax]:
                        if not R['det'][jmax]:
                            tp[d] = 1.
                            R['det'][jmax] = 1
                        else:
                            fp[d] = 1.
                else:
                    fp[d] = 1.
    
        # compute precision recall
        fp = np.cumsum(fp)
        tp = np.cumsum(tp)
        rec = tp / float(npos)
        # avoid divide by zero in case the first detection matches a difficult
        # ground truth
        prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
        ap = voc_ap(rec, prec, use_07_metric)
    
        return rec, prec, ap
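
A usage sketch for voc_eval (all paths and the class name below are illustrative, not part of the original code). detpath and annopath are format-string templates, and the per-class detection file is expected to contain one detection per line in the order parsed above (image id, confidence, xmin, ymin, xmax, ymax).

    # illustrative VOC2007 layout -- adjust paths to your own setup
    detpath = 'results/det_test_{:s}.txt'                       # detpath.format(classname) -> detections
    annopath = 'VOCdevkit/VOC2007/Annotations/{:s}.xml'         # annopath.format(imagename) -> xml file
    imagesetfile = 'VOCdevkit/VOC2007/ImageSets/Main/test.txt'  # one image id per line

    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile,
                             classname='car', cachedir='annotations_cache',
                             ovthresh=0.5, use_07_metric=True)
    print('car AP = {:.4f}'.format(ap))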
    

2.4 mAP computation methods

Since \(mAP\) is the average of the \(AP\) values over all classes in the dataset, computing \(mAP\) first requires knowing how to compute the \(AP\) of a single class. The per-class \(AP\) is computed in broadly similar ways across datasets, falling into three main variants:

(1) For VOC2007, simply take the maximum Precision at the 11 points where \(Recall \ge 0, 0.1, 0.2, \dots, 1\); \(AP\) is then the average of these 11 Precision values, and \(mAP\) is the average of the per-class \(AP\) values. The VOC code for computing \(AP\) given earlier (which uses the interpolated method) comes from the py-faster-rcnn repository.

(2) For VOC2010 and later, for each distinct Recall value (including 0 and 1), take the maximum Precision among the points whose Recall is greater than or equal to it, then take the area under the resulting PR curve as the \(AP\); \(mAP\) is again the average of the per-class \(AP\) values.

(3) For the COCO dataset, several IoU thresholds are used (0.5 to 0.95 with a step of 0.05). At each IoU threshold there is a per-class AP, and averaging these APs over the different IoU thresholds gives the final AP for that class; a minimal sketch of both computations follows this list.
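
A minimal sketch of how the per-class AP values roll up into mAP, reusing the voc_eval function above (the class list and the detpath/annopath/imagesetfile variables are assumptions carried over from the previous sketch, not part of the original code):

    import numpy as np

    classes = ['aeroplane', 'bicycle', 'car']   # illustrative subset of the VOC classes

    # VOC-style mAP: one AP per class at IoU threshold 0.5, then average over classes
    aps = []
    for cls in classes:
        _, _, ap = voc_eval(detpath, annopath, imagesetfile, cls,
                            'annotations_cache', ovthresh=0.5, use_07_metric=True)
        aps.append(ap)
    voc_map = np.mean(aps)

    # COCO-style per-class AP: average the AP over IoU thresholds 0.50, 0.55, ..., 0.95
    iou_thresholds = np.arange(0.5, 1.0, 0.05)
    coco_style_ap_car = np.mean([voc_eval(detpath, annopath, imagesetfile, 'car',
                                          'annotations_cache', ovthresh=t)[2]
                                 for t in iou_thresholds])

Note that the second loop only mimics the IoU-averaging idea; the official COCO evaluation additionally uses 101-point interpolation and its own matching rules (pycocotools), so the numbers will not match COCO's AP exactly.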

3. Summary of object detection metrics

| Metric | Definition and interpretation |
| --- | --- |
| mAP | mean Average Precision, the average of the per-class AP values |
| AP | area under the PR curve (discussed in detail above) |
| PR curve | the Precision-Recall curve |
| Precision | \(TP / (TP + FP)\) |
| Recall | \(TP / (TP + FN)\) |
| TP | number of detection boxes with IoU > 0.5 (each Ground Truth is counted only once; the threshold here is 0.5) |
| FP | number of detection boxes with IoU <= 0.5, plus redundant boxes that detect an already-matched GT |
| FN | number of GT boxes that are not detected |
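
As a quick worked example of these definitions: if a detector outputs 10 boxes for one class, 7 of which match distinct GT boxes with IoU > 0.5 (TP = 7) and 3 of which do not (FP = 3), while the test set contains 10 GT boxes of that class (so FN = 3), then \(Precision = 7/(7+3) = 0.7\) and \(Recall = 7/(7+3) = 0.7\).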

4. References

• 目标检测评价标准-AP mAP
• 目标检测的性能评价指标
• Soft-NMS
• Recent Advances in Deep Learning for Object Detection
• A Simple and Fast Implementation of Faster R-CNN
• 分类模型评估指标——准确率、精准率、召回率、F1、ROC曲线、AUC曲线
• 一文让你彻底理解准确率,精准率,召回率,真正率,假正率,ROC/AUC