Object Detection Algorithm Leaderboard, Evaluated on the COCO Dataset
AP50
| Rank | Model | box AP | AP50 | Paper | Code | Result | Year | Tags |
|---|---|---|---|---|---|---|---|---|
| 1 | SwinV2-G (HTC++) | 63.1 | | Swin Transformer V2: Scaling Up Capacity and Resolution | Link | | 2021 | Swin-Transformer |
| 2 | Florence-CoSwin-H | 62.4 | | Florence: A New Foundation Model for Computer Vision | | | 2021 | Swin-Transformer |
| 3 | GLIP (Swin-L, multi-scale) | 61.5 | 79.5 | Grounded Language-Image Pre-training | | | 2021 | multiscale; Vision Language; Dynamic Head; BERT-Base |
| 4 | Soft Teacher + Swin-L (HTC++, multi-scale) | 61.3 | | End-to-End Semi-Supervised Object Detection with Soft Teacher | | | 2021 | multiscale; Swin-Transformer |
| 5 | DyHead (Swin-L, multi scale, self-training) | 60.6 | 78.5 | Dynamic Head: Unifying Object Detection Heads with Attentions | | | 2021 | multiscale; Swin-Transformer |
| 6 | Dual-Swin-L (HTC, multi-scale) | 60.1 | | CBNetV2: A Composite Backbone Network Architecture for Object Detection | | | 2021 | multiscale; Swin-Transformer |
| 7 | Dual-Swin-L (HTC, single-scale) | 59.4 | | CBNetV2: A Composite Backbone Network Architecture for Object Detection | | | 2021 | Swin-Transformer |
| 8 | Focal-L (DyHead, multi-scale) | 58.9 | | Focal Self-attention for Local-Global Interactions in Vision Transformers | | | 2021 | multiscale; Focal-Transformer |
| 9 | DyHead (Swin-L, multi scale) | 58.7 | 77.1 | Dynamic Head: Unifying Object Detection Heads with Attentions | | | 2021 | multiscale; Swin-Transformer |
| 10 | Swin-L (HTC++, multi scale) | 58.7 | | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | | | 2021 | multiscale; Swin-Transformer |
| 11 | Focal-L (HTC++, multi-scale) | 58.4 | | Focal Self-attention for Local-Global Interactions in Vision Transformers | | | 2021 | multiscale |
| 12 | Swin-L (HTC++, single scale) | 57.7 | | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | | | 2021 | single scale; Swin-Transformer |
| 13 | YOLOR-D6 (1280, single-scale, 34 fps) | 57.3 | 75.0 | You Only Learn One Representation: Unified Network for Multiple Tasks | | | 2021 | single scale; YOLO |
| 14 | SOLQ (Swin-L, single) | 56.5 | | SOLQ: Segmenting Objects by Learning Queries | | | 2021 | Transformer; single scale |
| 15 | YOLOR-E6 (1280, single-scale, 45 fps) | 56.4 | 74.1 | You Only Learn One Representation: Unified Network for Multiple Tasks | | | 2021 | single scale; YOLO |
| 16 | CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | 74.0 | Probabilistic two-stage detection | | | 2021 | single scale; FPN; DCN |
| 17 | QueryInst (single-scale) | 56.1 | 75.9 | Instances as Queries | | | 2021 | |
| 18 | YOLOv4-P7 with TTA | 55.8 | 73.2 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | multiscale; YOLO |
| 19 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 55.7 | 74.2 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | | | 2020 | ResNeXt; multiscale |
| 20 | YOLOR-W6 (1280, single-scale, 66 fps) | 55.5 | 73.2 | You Only Learn One Representation: Unified Network for Multiple Tasks | | | 2021 | single scale; YOLO |
| 21 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | 73.3 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | single scale; YOLO |
| 22 | CSP-p6 + Mish (multi-scale) | 55.2 | 72.9 | Mish: A Self Regularized Non-Monotonic Activation Function | | | 2019 | multiscale |
| 23 | YOLOv4-P6 with TTA | 54.9 | 72.6 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | multiscale; YOLO |
| 24 | Cascade Eff-B7 NAS-FPN (1280) | 54.8 | | Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | | | 2020 | single scale; NAS-FPN |
| 25 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 54.7 | 73.5 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | | | 2020 | ResNeXt; multiscale |
| 26 | YOLOv4-P6 CSP-P6 (single-scale, 32 fps) | 54.3 | 72.3 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | single scale; YOLO |
| 27 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.3 | | Rethinking Pre-training and Self-training | | | 2020 | single scale |
| 28 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 54.1 | 71.6 | USB: Universal-Scale Object Detection Benchmark | | | 2021 | multiscale; DCN |
| 29 | EfficientDet-D7 (single-scale) | 53.7 | 72.4 | EfficientDet: Scalable and Efficient Object Detection | | | 2019 | single scale |
| 30 | PAA (ResNeXt-152-32x8d + DCN, multi-scale) | 53.5 | 71.6 | Probabilistic Anchor Assignment with IoU Prediction for Object Detection | | | 2020 | ResNeXt; multiscale; DCN |
| 31 | LSNet (Res2Net-101 + DCN, multi-scale) | 53.5 | 71.1 | Location-Sensitive Visual Recognition with Cross-IOU Loss | | | 2021 | multiscale; DCN |
| 32 | ResNeSt-200 (multi-scale) | 53.3 | 72.0 | ResNeSt: Split-Attention Networks | | | 2020 | multiscale |
| 33 | Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) | 53.3 | 71.9 | CBNet: A Novel Composite Backbone Network Architecture for Object Detection | | | 2019 | multiscale |
| 34 | DetectoRS (ResNeXt-101-32x4d, single-scale) | 53.3 | 71.6 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | | | 2020 | ResNeXt; single scale |
| 35 | GFLV2 (Res2Net-101, DCN, multiscale) | 53.3 | 70.9 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | multiscale; DCN |
| 36 | RelationNet++ (ResNeXt-64x4d-101-DCN) | 52.7 | | RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder | | | 2020 | ResNeXt; DCN |
| 37 | YOLOv4-P5 with TTA | 52.5 | 70.3 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | multiscale; YOLO |
| 38 | Deformable DETR (ResNeXt-101+DCN) | 52.3 | 71.9 | Deformable DETR: Deformable Transformers for End-to-End Object Detection | | | 2020 | ResNeXt; DCN |
| 39 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 52.3 | 70.9 | Global Context Networks | | | 2020 | ResNeXt; DCN; GCN |
| 40 | RetinaNet (SpineNet-190, 1280x1280) | 52.1 | 71.8 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 41 | RepPoints v2 (ResNeXt-101, DCN, multi-scale) | 52.1 | 70.1 | RepPoints V2: Verification Meets Regression for Object Detection | | | 2020 | ResNeXt; multiscale; DCN |
| 42 | AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) | 51.9 | 70.4 | Attention-guided Context Feature Pyramid Network for Object Detection | | | 2020 | ResNeXt; multiscale; FPN |
| 43 | OTA (ResNeXt-101+DCN, multiscale) | 51.5 | 68.6 | OTA: Optimal Transport Assignment for Object Detection | | | 2021 | |
| 44 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 51.3 | 70.0 | USB: Universal-Scale Object Detection Benchmark | | | 2021 | single scale; DCN |
| 45 | TSD (SENet154-DCN, multi-scale) | 51.2 | 71.9 | Revisiting the Sibling Head in Object Detector | | | 2020 | multiscale; DCN |
| 46 | YOLOX-X (Modified CSP v5) | 51.2 | 69.6 | YOLOX: Exceeding YOLO Series in 2021 | | | 2021 | YOLO |
| 47 | RetinaNet (SpineNet-143, 1280x1280) | 50.7 | 70.4 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 48 | ATSS (ResNeXt-64x4d-101+DCN, multi-scale) | 50.7 | 68.9 | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | | | 2019 | ResNeXt; multiscale; DCN |
| 49 | NAS-FPN (AmoebaNet-D, learned aug) | 50.7 | | Learning Data Augmentation Strategies for Object Detection | | | 2019 | FPN |
| 50 | GFLV2 (Res2Net-101, DCN) | 50.6 | 69 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | DCN |
| 51 | aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test) | 50.2 | 70.3 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | | | 2020 | ResNeXt; multiscale; DCN |
| 52 | FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d) | 50.1 | 69.8 | Scale-Equalizing Pyramid Convolution for Object Detection | | | 2020 | ResNeXt; DCN |
| 53 | D2Det (ResNet-101-DCN, multi-scale test) | 50.1 | 69.4 | D2Det: Towards High Quality Object Detection and Instance Segmentation | | | 2020 | multiscale; DCN; ResNet |
| 54 | Dynamic R-CNN (ResNet-101-DCN, multi-scale) | 50.1 | 68.3 | Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | | | 2020 | multiscale; DCN; ResNet |
| 55 | TSD (ResNet-101-Deformable, Image Pyramid) | 49.4 | 69.6 | Revisiting the Sibling Head in Object Detector | | | 2020 | ResNet |
| 56 | RepPoints v2 (ResNeXt-101, DCN) | 49.4 | 68.9 | RepPoints V2: Verification Meets Regression for Object Detection | | | 2020 | ResNeXt; DCN |
| 57 | CPNDet (Hourglass-104, multi-scale) | 49.2 | 67.3 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | | | 2020 | multiscale |
| 58 | GFLV2 (ResNeXt-101, 32x4d, DCN) | 49 | 67.6 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | ResNeXt; DCN |
| 59 | aLRP Loss (ResNeXt-101-64x4d, DCN, single scale) | 48.9 | 69.3 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | | | 2020 | ResNeXt; single scale; DCN |
| 60 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.8 | 67.5 | USB: Universal-Scale Object Detection Benchmark | | | 2021 | single scale; DCN |
| 61 | SOLQ (ResNet101, single scale) | 48.7 | | SOLQ: Segmenting Objects by Learning Queries | | | 2021 | Transformer; single scale |
| 62 | RetinaNet (SpineNet-96, 1024x1024) | 48.6 | 68.4 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 63 | TridentNet (ResNet-101-Deformable, Image Pyramid) | 48.4 | 69.7 | Scale-Aware Trident Networks for Object Detection | | | 2019 | ResNet |
| 64 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 48.4 | 67.6 | GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | | | 2019 | ResNeXt; DCN; GCN |
| 65 | GFLV2 (ResNet-101-DCN) | 48.3 | 66.5 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | DCN; ResNet |
| 66 | GFL (X-101-32x4d-DCN, single-scale) | 48.2 | 67.4 | Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | | | 2020 | ResNeXt; single scale; DCN |
| 67 | ISTR (ResNet101-FPN-3x, single-scale) | 48.1 | | ISTR: End-to-End Instance Segmentation with Transformers | | | 2021 | |
| 68 | aLRP Loss (ResNeXt-101-64x4d, single scale) | 47.8 | 68.4 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | | | 2020 | ResNeXt; single scale |
| 69 | MatrixNet Corners (ResNet-152, multi-scale) | 47.8 | 66.2 | Matrix Nets: A New Deep Architecture for Object Detection | | | 2019 | multiscale; ResNet |
| 70 | SOLQ (ResNet50, single scale) | 47.8 | | SOLQ: Segmenting Objects by Learning Queries | | | 2021 | Transformer; single scale |
| 71 | SAPD (ResNeXt-101, single-scale) | 47.4 | 67.4 | Soft Anchor-Point Object Detection | | | 2019 | ResNeXt; single scale |
| 72 | PANet (ResNeXt-101, multi-scale) | 47.4 | 67.2 | Path Aggregation Network for Instance Segmentation | | | 2018 | ResNeXt; multiscale |
| 73 | HTC (HRNetV2p-W48) | 47.3 | 65.9 | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 74 | HTC (ResNeXt-101-FPN) | 47.1 | 63.9 | Hybrid Task Cascade for Instance Segmentation | | | 2019 | ResNeXt; FPN |
| 75 | CenterNet511 (Hourglass-104, multi-scale) | 47.0 | 64.5 | CenterNet: Keypoint Triplets for Object Detection | | | 2019 | multiscale |
| 76 | MAL (ResNeXt101, multi-scale) | 47.0 | | Multiple Anchor Learning for Visual Object Detection | | | 2019 | ResNeXt; multiscale |
| 77 | ISTR (ResNet50-FPN-3x) | 46.8 | | ISTR: End-to-End Instance Segmentation with Transformers | | | 2021 | FPN; ResNet |
| 78 | RetinaNet (SpineNet-49, 896x896) | 46.7 | 66.3 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 79 | RPDet (ResNet-101-DCN, multi-scale) | 46.5 | 67.4 | RepPoints: Point Set Representation for Object Detection | | | 2019 | multiscale; DCN; ResNet |
| 80 | HoughNet (MS) | 46.4 | 65.1 | HoughNet: Integrating near and long-range evidence for bottom-up object detection | | | 2020 | multiscale |
| 81 | PPDet (ResNeXt-101-FPN, multiscale) | 46.3 | 64.8 | Reducing Label Noise in Anchor-Free Object Detection | | | 2020 | ResNeXt; multiscale; FPN |
| 82 | GFLV2 (ResNet-101) | 46.2 | 64.3 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | ResNet |
| 83 | SNIPER (ResNet-101) | 46.1 | 67.0 | SNIPER: Efficient Multi-Scale Training | | | 2018 | ResNet |
| 84 | Mask R-CNN (HRNetV2p-W48 + cascade) | 46.1 | 64.0 | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 85 | DCNv2 (ResNet-101, multi-scale) | 46.0 | 67.9 | Deformable ConvNets v2: More Deformable, Better Results | | | 2018 | multiscale; DCN; ResNet |
| 86 | Gaussian-FCOS | 46 | | Localization Uncertainty Estimation for Anchor-Free Object Detection | | | 2020 | |
| 87 | Cascade R-CNN-FPN (ResNet-101, map-guided) | 45.9 | 64.2 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | | | 2019 | FPN; ResNet |
| 88 | MAL (ResNeXt101, single-scale) | 45.9 | | Multiple Anchor Learning for Visual Object Detection | | | 2019 | ResNeXt; single scale |
| 89 | CenterMask+VoVNetV2-99 (single-scale) | 45.8 | 64.5 | CenterMask : Real-Time Anchor-Free Instance Segmentation | | | 2019 | single scale |
| 90 | D-RFCN + SNIP (DPN-98 with flip, multi-scale) | 45.7 | 67.3 | An Analysis of Scale Invariance in Object Detection - SNIP | | | 2017 | multiscale |
| 91 | YOLOv4 (CD53) | 45.5 | 64.1 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | | | 2020 | single scale; YOLO |
| 92 | PP-YOLO (608x608) | 45.2 | 65.2 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | | | 2020 | YOLO |
| 93 | AC-FPN Cascade R-CNN (ResNet-101, single scale) | 45 | 64.4 | Attention-guided Context Feature Pyramid Network for Object Detection | | | 2019 | single scale; FPN; ResNet |
| 94 | FreeAnchor (ResNeXt-101) | 44.8 | 64.3 | FreeAnchor: Learning to Match Anchors for Visual Object Detection | | | 2019 | ResNeXt |
| 95 | FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) | 44.7 | 64.1 | FCOS: Fully Convolutional One-Stage Object Detection | | | 2019 | ResNeXt; FPN |
| 96 | CenterMask+VoVNet2-57 (single-scale) | 44.7 | 63.1 | CenterMask : Real-Time Anchor-Free Instance Segmentation | | | 2019 | single scale |
| 97 | FSAF (ResNeXt-101, multi-scale) | 44.6 | 65.2 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | | | 2019 | ResNeXt; multiscale |
| 98 | aLRP Loss (ResNeXt-101, DCN, 500 scale) | 44.6 | 65.0 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | | | 2020 | ResNeXt; DCN |
| 99 | CenterMask + X-101-32x8d (single-scale) | 44.6 | 63.4 | CenterMask : Real-Time Anchor-Free Instance Segmentation | | | 2019 | single scale |
| 100 | RetinaNet (SpineNet-49, 640x640) | 44.3 | 63.8 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 101 | YOLOF-DC5 | 44.3 | 62.9 | You Only Look One-level Feature | | | 2021 | YOLO |
| 102 | GFLV2 (ResNet-50) | 44.3 | 62.3 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | | | 2020 | ResNet |
| 103 | InterNet (ResNet-101-FPN, multi-scale) | 44.2 | 67.5 | Feature Intertwiner for Object Detection | | | 2019 | multiscale; FPN; ResNet |
| 104 | M2Det (VGG-16, multi-scale) | 44.2 | 64.6 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | | | 2018 | multiscale |
| 105 | Faster R-CNN (LIP-ResNet-101-MD w FPN) | 43.9 | 65.7 | LIP: Local Importance-based Pooling | | | 2019 | FPN |
| 106 | M2Det (ResNet-101, multi-scale) | 43.9 | 64.4 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | | | 2018 | multiscale; ResNet |
| 107 | YOLOv3 @800 + ASFF* (Darknet-53) | 43.9 | 64.1 | Learning Spatial Fusion for Single-Shot Object Detection | | | 2019 | YOLO |
| 108 | FoveaBox (ResNeXt-101) | 43.9 | 63.5 | FoveaBox: Beyond Anchor-based Object Detector | | | 2019 | ResNeXt |
| 109 | ExtremeNet (Hourglass-104, multi-scale) | 43.7 | 60.5 | Bottom-up Object Detection by Grouping Extreme and Center Points | | | 2019 | multiscale |
| 110 | YOLOv4-608 | 43.5 | 65.7 | YOLOv4: Optimal Speed and Accuracy of Object Detection | | | 2020 | single scale; YOLO |
| 111 | SNIPER (ResNet-50) | 43.5 | 65.0 | SNIPER: Efficient Multi-Scale Training | | | 2018 | ResNet |
| 112 | CenterNet (HRNetV2-W48) | 43.5 | | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 113 | D-RFCN + SNIP (ResNet-101, multi-scale) | 43.4 | 65.5 | An Analysis of Scale Invariance in Object Detection - SNIP | | | 2017 | multiscale; ResNet |
| 114 | Grid R-CNN (ResNeXt-101-FPN) | 43.2 | 63.0 | Grid R-CNN | | | 2018 | ResNeXt; FPN |
| 115 | FCOS (ResNeXt-101-64x4d-FPN) | 43.2 | 62.8 | FCOS: Fully Convolutional One-Stage Object Detection | | | 2019 | ResNeXt; FPN |
| 116 | CornerNet-Saccade (Hourglass-104, multi-scale) | 43.2 | | CornerNet-Lite: Efficient Keypoint Based Object Detection | | | 2019 | multiscale |
| 117 | Libra R-CNN (ResNeXt-101-FPN) | 43.0 | 64 | Libra R-CNN: Towards Balanced Learning for Object Detection | | | 2019 | ResNeXt; FPN |
| 118 | RPDet (ResNet-101-DCN) | 42.8 | 65.0 | RepPoints: Point Set Representation for Object Detection | | | 2019 | DCN; ResNet |
| 119 | SpineNet-49 (640, RetinaNet, single-scale) | 42.8 | 62.3 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | single scale |
| 120 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.8 | 62.1 | Cascade R-CNN: Delving into High Quality Object Detection | | | 2017 | FPN; ResNet |
| 121 | Cascade R-CNN | 42.8 | 62.1 | Cascade R-CNN: High Quality Object Detection and Instance Segmentation | | | 2019 | |
| 122 | TridentNet (ResNet-101) | 42.7 | 63.6 | Scale-Aware Trident Networks for Object Detection | | | 2019 | ResNet |
| 123 | FCOS (ResNeXt-32x8d-101-FPN) | 42.7 | 62.2 | FCOS: Fully Convolutional One-Stage Object Detection | | | 2019 | ResNeXt; FPN |
| 124 | RetinaMask (ResNeXt-101-FPN-GN) | 42.6 | 62.5 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | | | 2019 | ResNeXt; FPN |
| 125 | TAL + TAP | 42.5 | 60.3 | TOOD: Task-aligned One-stage Object Detection | | | 2021 | |
| 126 | Faster R-CNN (HRNetV2p-W48) | 42.4 | 63.6 | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 127 | HSD (ResNet101, 768x768, single-scale test) | 42.3 | 61.2 | Hierarchical Shot Detector | | | 2019 | single scale |
| 128 | CornerNet511 (Hourglass-104, multi-scale) | 42.1 | 57.8 | CornerNet: Detecting Objects as Paired Keypoints | | | 2018 | multiscale |
| 129 | FoveaBox (ResNeXt-101) | 42.1 | | FoveaBox: Beyond Anchor-based Object Detector | | | 2019 | ResNeXt |
| 130 | FCOS (HRNet-W32-5l) | 42.0 | 60.4 | FCOS: Fully Convolutional One-Stage Object Detection | | | 2019 | |
| 131 | RefineDet512+ (ResNet-101) | 41.8 | 62.9 | Single-Shot Refinement Neural Network for Object Detection | | | 2017 | ResNet |
| 132 | GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) | 41.6 | 62.8 | Gradient Harmonized Single-stage Detector | | | 2018 | FPN |
| 133 | CenterNet-DLA (DLA-34, multi-scale) | 41.6 | | Objects as Points | | | 2019 | multiscale |
| 134 | RetinaNet (SpineNet-49S, 640x640) | 41.5 | 60.5 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | | | 2019 | |
| 135 | RPDet (ResNet-101) | 41 | 62.9 | RepPoints: Point Set Representation for Object Detection | | | 2019 | ResNet |
| 136 | M2Det (VGG-16, single-scale) | 41.0 | 59.7 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | | | 2018 | single scale |
| 137 | FSAF (ResNet-101, single-scale) | 40.9 | 61.5 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | | | 2019 | single scale; ResNet |
| 138 | RetinaNet (ResNeXt-101-FPN) | 40.8 | 61.1 | Focal Loss for Dense Object Detection | | | 2017 | ResNeXt; FPN |
| 139 | Cascade R-CNN (ResNet-50-FPN+, cascade) | 40.6 | 59.9 | Cascade R-CNN: Delving into High Quality Object Detection | | | 2017 | FPN; ResNet |
| 140 | Faster R-CNN (Cascade RPN) | 40.6 | 58.9 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | | | 2019 | |
| 141 | ResNet-50-DW-DPN (Deformable Kernels) | 40.6 | | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | | | 2019 | ResNet |
| 142 | IoU-Net | 40.6 | | Acquisition of Localization Confidence for Accurate Object Detection | | | 2018 | |
| 143 | FCOS (HRNetV2p-W48) | 40.5 | 59.3 | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 144 | ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS | 40.4 | | Bounding Box Regression with Uncertainty for Accurate Object Detection | | | 2018 | FPN; ResNet |
| 145 | RDSNet (ResNet-101, RetinaNet, mask, MBRM) | 40.3 | 60.1 | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | | | 2019 | ResNet |
| 146 | ExtremeNet (Hourglass-104, single-scale) | 40.2 | 55.5 | Bottom-up Object Detection by Grouping Extreme and Center Points | | | 2019 | single scale |
| 147 | Mask R-CNN (ResNet-101-FPN, CBN) | 40.1 | 60.5 | Cross-Iteration Batch Normalization | | | 2020 | FPN; ResNet |
| 148 | Fast R-CNN (Cascade RPN) | 40.1 | 59.4 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | | | 2019 | |
| 149 | Mask R-CNN (ResNeXt-101-FPN) | 39.8 | 62.3 | Mask R-CNN | | | 2017 | ResNeXt; FPN |
| 150 | GA-Faster-RCNN | 39.8 | 59.2 | Region Proposal by Guided Anchoring | | | 2019 | |
| 151 | FPN (ResNet101 backbone) | 39.5 | | ChainerCV: a Library for Deep Learning in Computer Vision | | | 2017 | FPN; ResNet |
| 152 | RetinaMask (ResNet-50-FPN) | 39.4 | 58.6 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | | | 2019 | FPN; ResNet |
| 153 | PP-YOLO (320x320) | 39.3 | 59.3 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | | | 2020 | YOLO |
| 154 | AA-ResNet-10 + RetinaNet | 39.2 | | Attention Augmented Convolutional Networks | | | 2019 | |
| 155 | MAL (ResNet50, single-scale) | 39.2 | | Multiple Anchor Learning for Visual Object Detection | | | 2019 | single scale; ResNet |
| 156 | RetinaNet (ResNet-101-FPN) | 39.1 | 59.1 | Focal Loss for Dense Object Detection | | | 2017 | FPN; ResNet |
| 157 | Cascade R-CNN (ResNet-101-FPN+) | 38.8 | 61.1 | Cascade R-CNN: Delving into High Quality Object Detection | | | 2017 | FPN; ResNet |
| 158 | M2Det (ResNet-101, single-scale) | 38.8 | 59.4 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | | | 2018 | single scale; ResNet |
| 159 | SaccadeNet (DLA-34-DCN) | 38.5 | 55.6 | SaccadeNet: A Fast and Accurate Object Detector | | | 2020 | DCN |
| 160 | Mask R-CNN (ResNet-101-FPN) | 38.2 | 60.3 | Mask R-CNN | | | 2017 | FPN; ResNet |
| 161 | WSMA-Seg | 38.1 | | Segmentation is All You Need | | | 2019 | |
| 162 | Faster R-CNN + FPN + CGD | 37.9 | | Compact Global Descriptor for Neural Networks | | | 2019 | FPN |
| 163 | CornerNet511 (Hourglass-52, single-scale) | 37.8 | 53.7 | CornerNet: Detecting Objects as Paired Keypoints | | | 2018 | single scale |
| 164 | RefineDet512+ (VGG-16) | 37.6 | 58.7 | Single-Shot Refinement Neural Network for Object Detection | | | 2017 | |
| 165 | DeformConv-R-FCN (Aligned-Inception-ResNet) | 37.5 | 58.0 | Deformable Convolutional Networks | | | 2017 | |
| 166 | Faster R-CNN (ImageNet+300M) | 37.4 | 58 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | | | 2017 | |
| 167 | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 36.9 | | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | | | 2020 | FPN; ResNet |
| 168 | Faster R-CNN + TDM | 36.8 | | Beyond Skip Connections: Top-Down Modulation for Object Detection | | | 2016 | |
| 169 | Cascade R-CNN (ResNet-50-FPN+) | 36.5 | 59 | Cascade R-CNN: Delving into High Quality Object Detection | | | 2017 | FPN; ResNet |
| 170 | RefineDet512 (ResNet-101) | 36.4 | 57.5 | Single-Shot Refinement Neural Network for Object Detection | | | 2017 | ResNet |
| 171 | Faster R-CNN + FPN | 36.2 | | Feature Pyramid Networks for Object Detection | | | 2016 | FPN |
| 172 | Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) | 35.9 | | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | | | 2020 | FPN; ResNet |
| 173 | Faster R-CNN (box refinement, context, multi-scale testing) | 34.9 | | Deep Residual Learning for Image Recognition | | | 2015 | multiscale |
| 174 | Faster R-CNN | 34.7 | | Speed/accuracy trade-offs for modern convolutional object detectors | | | 2016 | |
| 175 | CornerNet-Squeeze | 34.4 | | CornerNet-Lite: Efficient Keypoint Based Object Detection | | | 2019 | |
| 176 | MultiPath Network | 33.2 | | A MultiPath Network for Object Detection | | | 2016 | |
| 177 | ION | 33.1 | 55.7 | Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | | | 2015 | |
| 178 | RefineDet512 (VGG-16) | 33 | 54.5 | Single-Shot Refinement Neural Network for Object Detection | | | 2017 | |
| 179 | YOLOv3 + Darknet-53 | 33.0 | | YOLOv3: An Incremental Improvement | | | 2018 | YOLO |
| 180 | SSD512 | 28.8 | 48.5 | SSD: Single Shot MultiBox Detector | | | 2015 | |
| 181 | MnasFPN (MobileNetV2) | 26.1 | | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | | | 2019 | FPN |
| 182 | ESPNetv2-512 | 26.0 | | ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | | | 2018 | |
| 183 | MnasFPN (MobileNetV3) | 25.5 | | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | | | 2019 | FPN |
| 184 | MnasFPN (MNASNet-B1) | 24.6 | | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | | | 2019 | FPN |
| 185 | MnasFPN x0.7 (MobileNetV2) | 23.8 | | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | | | 2019 | FPN |
| 186 | MobileNet-v1-SSD-300x300+CGD | 21.4 | | Compact Global Descriptor for Neural Networks | | | 2019 | |
| 187 | Fast-RCNN | 19.7 | | Fast R-CNN | | | 2015 | |
| 188 | MobileNet | 19.3 | | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | | | 2017 | |
| 189 | DAT-S (RetinaNet) | | 69.6 | Vision Transformer with Deformable Attention | | | 2022 | |
| 190 | CenterMask-VoVNet99 (multi-scale) | | 68.3 | CenterMask : Real-Time Anchor-Free Instance Segmentation | | | 2019 | multiscale |
| 191 | Mask R-CNN (HRNetV2p-W32 + cascade) | | 62.5 | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 192 | FoveaBox (ResNeXt-101) | | 61.9 | FoveaBox: Beyond Anchor-based Object Detector | | | 2019 | ResNeXt |
| 193 | VirTex Mask R-CNN (ResNet-50-FPN) | | 61.7 | VirTex: Learning Visual Representations from Textual Annotations | | | 2020 | FPN; ResNet |
| 194 | CenterMask + ResNet101 | | 61.6 | CenterMask : Real-Time Anchor-Free Instance Segmentation | | | 2019 | ResNet |
| 195 | PAFNet (ResNet50-vd) | | 59.8 | PAFNet: An Efficient Anchor-Free Object Detector Guidance | | | 2021 | ResNet |
| 196 | IoU-Net+EnergyRegression | | 58.5 | Energy-Based Models for Deep Probabilistic Regression | | | 2019 | |
| 197 | Cascade R-CNN (HRNetV2p-W48) | | | Deep High-Resolution Representation Learning for Visual Recognition | | | 2019 | |
| 198 | ISTR (ResNet50-FPN-3x, single-scale) | | | ISTR: End-to-End Instance Segmentation with Transformers | | | 2021 | |
| 199 | FoveaBox (ResNeXt-101) | | | FoveaBox: Beyond Anchor-based Object Detector | | | 2019 | ResNeXt |
| 200 | EfficientDet-D7x (single-scale) | | | EfficientDet: Scalable and Efficient Object Detection | | | 2019 | single scale |
AP75
| Rank | Model | box AP | AP75 | Paper | Code | Result | Year | Tags |
|---|---|---|---|---|---|---|---|---|
| 1 | SwinV2-G (HTC++) | 63.1 | | Swin Transformer V2: Scaling Up Capacity and Resolution | Link | | 2021 | Swin-Transformer |
| 2 | Florence-CoSwin-H | 62.4 | | Florence: A New Foundation Model for Computer Vision | | | 2021 | Swin-Transformer |
| 3 | GLIP (Swin-L, multi-scale) | 61.5 | 67.7 | Grounded Language-Image Pre-training | | | 2021 | multiscale; Vision Language; Dynamic Head; BERT-Base |
| 4 | Soft Teacher + Swin-L (HTC++, multi-scale) | 61.3 | | End-to-End Semi-Supervised Object Detection with Soft Teacher | | | 2021 | multiscale; Swin-Transformer |
| 5 | DyHead (Swin-L, multi scale, self-training) | 60.6 | 66.6 | Dynamic Head: Unifying Object Detection Heads with Attentions | | | 2021 | multiscale; Swin-Transformer |
| 6 | Dual-Swin-L (HTC, multi-scale) | 60.1 | | CBNetV2: A Composite Backbone Network Architecture for Object Detection | | | 2021 | multiscale; Swin-Transformer |
| 7 | Dual-Swin-L (HTC, single-scale) | 59.4 | | CBNetV2: A Composite Backbone Network Architecture for Object Detection | | | 2021 | Swin-Transformer |
| 8 | Focal-L (DyHead, multi-scale) | 58.9 | | Focal Self-attention for Local-Global Interactions in Vision Transformers | | | 2021 | multiscale; Focal-Transformer |
| 9 | DyHead (Swin-L, multi scale) | 58.7 | 64.5 | Dynamic Head: Unifying Object Detection Heads with Attentions | | | 2021 | multiscale; Swin-Transformer |
| 10 | Swin-L (HTC++, multi scale) | 58.7 | | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | | | 2021 | multiscale; Swin-Transformer |
| 11 | Focal-L (HTC++, multi-scale) | 58.4 | | Focal Self-attention for Local-Global Interactions in Vision Transformers | | | 2021 | multiscale |
| 12 | Swin-L (HTC++, single scale) | 57.7 | | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | | | 2021 | single scale; Swin-Transformer |
| 13 | YOLOR-D6 (1280, single-scale, 34 fps) | 57.3 | 62.7 | You Only Learn One Representation: Unified Network for Multiple Tasks | | | 2021 | single scale; YOLO |
| 14 | SOLQ (Swin-L, single) | 56.5 | | SOLQ: Segmenting Objects by Learning Queries | | | 2021 | Transformer; single scale |
| 15 | YOLOR-E6 (1280, single-scale, 45 fps) | 56.4 | 61.6 | You Only Learn One Representation: Unified Network for Multiple Tasks | | | 2021 | single scale; YOLO |
| 16 | CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | 61.6 | Probabilistic two-stage detection | | | 2021 | single scale; FPN; DCN |
| 17 | QueryInst (single-scale) | 56.1 | 61.9 | Instances as Queries | | | 2021 | |
| 18 | YOLOv4-P7 with TTA | 55.8 | 61.2 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
||
| 19 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 55.7 | 61.1 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
||
| 20 | YOLOR-W6 (1280, single-scale, 66 fps) | 55.5 | 60.6 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
||
| 21 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | 60.7 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
||
| 22 | CSP-p6 + Mish (multi-scale) | 55.2 | 60.5 | Mish: A Self Regularized Non-Monotonic Activation Function | 2019 | multiscale | ||
| 23 | YOLOv4-P6 with TTA | 54.9 | 60.2 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
||
| 24 | Cascade Eff-B7 NAS-FPN (1280) | 54.8 | Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | 2020 | single scale NAS-FPN |
|||
| 25 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 54.7 | 60.1 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
||
| 26 | YOLOv4-P6 CSP-P6 (single-scale, 32 fps) | 54.3 | 59.5 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
| 27 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.3 | Rethinking Pre-training and Self-training | 2020 | single scale | |||
| 28 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 54.1 | 59.9 | USB: Universal-Scale Object Detection Benchmark | 2021 | multiscale DCN |
| 29 | EfficientDet-D7 (single-scale) | 53.7 | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale | |||
| 30 | PAA (ResNext-152-32x8d + DCN, multi-scale) | 53.5 | 59.1 | Probabilistic Anchor Assignment with IoU Prediction for Object Detection | 2020 | ResNeXt multiscale DCN |
| 31 | LSNet (Res2Net-101+ DCN, multi-scale) | 53.5 | 59.2 | Location-Sensitive Visual Recognition with Cross-IOU Loss | 2021 | multiscale DCN |
| 32 | ResNeSt-200 (multi-scale) | 53.3 | 58.0 | ResNeSt: Split-Attention Networks | 2020 | multiscale | ||
| 33 | Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) | 53.3 | 58.5 | CBNet: A Novel Composite Backbone Network Architecture for Object Detection | 2019 | multiscale | ||
| 34 | DetectoRS (ResNeXt-101-32x4d, single-scale) | 53.3 | 58.5 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt single scale |
| 35 | GFLV2 (Res2Net-101, DCN, multiscale) | 53.3 | 59.2 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | multiscale DCN |
| 36 | RelationNet++ (ResNeXt-64x4d-101-DCN) | 52.7 | RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder | 2020 | ResNeXt DCN |
| 37 | YOLOv4-P5 with TTA | 52.5 | 58 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
| 38 | Deformable DETR (ResNeXt-101+DCN) | 52.3 | 58.1 | Deformable DETR: Deformable Transformers for End-to-End Object Detection | 2020 | ResNeXt DCN |
| 39 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 52.3 | 56.9 | Global Context Networks | 2020 | ResNeXt DCN GCN |
| 40 | RetinaNet (SpineNet-190, 1280x1280) | 52.1 | 56.5 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 41 | RepPoints v2 (ResNeXt-101, DCN, multi-scale) | 52.1 | 57.5 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt; multiscale DCN |
| 42 | AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) | 51.9 | 57 | Attention-guided Context Feature Pyramid Network for Object Detection | 2020 | ResNeXt multiscale FPN |
| 43 | OTA (ResNeXt-101+DCN, multiscale) | 51.5 | 57.1 | OTA: Optimal Transport Assignment for Object Detection | 2021 | |||
| 44 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 51.3 | 55.8 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
| 45 | TSD (SENet154-DCN, multi-scale) | 51.2 | 56.0 | Revisiting the Sibling Head in Object Detector | 2020 | multiscale DCN |
| 46 | YOLOX-X (Modified CSP v5) | 51.2 | 55.7 | YOLOX: Exceeding YOLO Series in 2021 | 2021 | YOLO | ||
| 47 | RetinaNet (SpineNet-143, 1280x1280) | 50.7 | 54.9 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 48 | ATSS (ResNeXt-64x4d-101+DCN, multi-scale) | 50.7 | 56.3 | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | 2019 | ResNeXt multiscale DCN |
| 49 | NAS-FPN (AmoebaNet-D, learned aug) | 50.7 | Learning Data Augmentation Strategies for Object Detection | 2019 | FPN | |||
| 50 | GFLV2 (Res2Net-101, DCN) | 50.6 | 55.3 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN | ||
| 51 | aLRP Loss (ResNext-101-64x4d, DCN, multiscale test) | 50.2 | 53.9 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt multiscale DCN |
| 52 | FreeAnchor + SEPC (DCN, ResNext-101-64x4d) | 50.1 | 54.3 | Scale-Equalizing Pyramid Convolution for Object Detection | 2020 | ResNeXt DCN |
| 53 | D2Det (ResNet-101-DCN, multi-scale test) | 50.1 | 54.9 | D2Det: Towards High Quality Object Detection and Instance Segmentation | 2020 | multiscale DCN ResNet |
| 54 | Dynamic R-CNN (ResNet-101-DCN, multi-scale) | 50.1 | 55.6 | Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | 2020 | multiscale DCN ResNet |
| 55 | TSD (ResNet-101-Deformable, Image Pyramid) | 49.4 | 54.4 | Revisiting the Sibling Head in Object Detector | 2020 | ResNet | ||
| 56 | RepPoints v2 (ResNeXt-101, DCN) | 49.4 | 53.4 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt DCN |
| 57 | CPNDet (Hourglass-104, multi-scale) | 49.2 | 53.7 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | 2020 | multiscale | ||
| 58 | GFLV2 (ResNeXt-101, 32x4d, DCN) | 49 | 53.5 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNeXt DCN |
| 59 | aLRP Loss (ResNext-101-64x4d, DCN, single scale) | 48.9 | 52.5 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt single scale DCN |
| 60 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.8 | 53.0 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
| 61 | SOLQ (ResNet101, single scale) | 48.7 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
| 62 | RetinaNet (SpineNet-96, 1024x1024) | 48.6 | 52.5 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 63 | TridentNet (ResNet-101-Deformable, Image Pyramid) | 48.4 | 53.5 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||
| 64 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 48.4 | 52.7 | GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | 2019 | ResNeXt DCN GCN |
| 65 | GFLV2 (ResNet-101-DCN) | 48.3 | 52.8 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN ResNet |
| 66 | GFL (X-101-32x4d-DCN, single-scale) | 48.2 | 52.6 | Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | 2020 | ResNeXt single scale DCN |
| 67 | ISTR (ResNet101-FPN-3x, single-scale) | 48.1 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | ||||
| 68 | aLRP Loss (ResNext-101-64x4d, single scale) | 47.8 | 51.1 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt single scale |
| 69 | MatrixNet Corners (ResNet-152, multi-scale) | 47.8 | 52.3 | Matrix Nets: A New Deep Architecture for Object Detection | 2019 | multiscale ResNet |
| 70 | SOLQ (ResNet50, single scale) | 47.8 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
| 71 | SAPD (ResNeXt-101, single-scale) | 47.4 | 51.1 | Soft Anchor-Point Object Detection | 2019 | ResNeXt single scale |
| 72 | PANet (ResNeXt-101, multi-scale) | 47.4 | 51.8 | Path Aggregation Network for Instance Segmentation | 2018 | ResNeXt multiscale |
| 73 | HTC (HRNetV2p-W48) | 47.3 | 51.2 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 74 | HTC (ResNeXt-101-FPN) | 47.1 | 44.7 | Hybrid Task Cascade for Instance Segmentation | 2019 | ResNeXt FPN |
| 75 | CenterNet511 (Hourglass-104, multi-scale) | 47.0 | 50.7 | CenterNet: Keypoint Triplets for Object Detection | 2019 | multiscale | ||
| 76 | MAL (ResNeXt101, multi-scale) | 47.0 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt multiscale |
| 77 | ISTR (ResNet50-FPN-3x) | 46.8 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | FPN ResNet |
| 78 | RetinaNet (SpineNet-49, 896x896) | 46.7 | 50.6 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 79 | RPDet (ResNet-101-DCN, multi-scale) | 46.5 | 50.9 | RepPoints: Point Set Representation for Object Detection | 2019 | multiscale DCN ResNet |
| 80 | HoughNet (MS) | 46.4 | 50.7 | HoughNet: Integrating near and long-range evidence for bottom-up object detection | 2020 | multiscale | ||
| 81 | PPDet (ResNeXt-101-FPN, multiscale) | 46.3 | 51.6 | Reducing Label Noise in Anchor-Free Object Detection | 2020 | ResNeXt multiscale FPN |
| 82 | GFLV2 (ResNet-101) | 46.2 | 50.5 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||
| 83 | SNIPER (ResNet-101) | 46.1 | 51.6 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||
| 84 | Mask R-CNN (HRNetV2p-W48 + cascade) | 46.1 | 50.3 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 85 | DCNv2 (ResNet-101, multi-scale) | 46.0 | 50.8 | Deformable ConvNets v2: More Deformable, Better Results | 2018 | multiscale DCN ResNet |
| 86 | Gaussian-FCOS | 46 | Localization Uncertainty Estimation for Anchor-Free Object Detection | 2020 | ||||
| 87 | Cascade R-CNN-FPN (ResNet-101, map-guided) | 45.9 | 50 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | 2019 | FPN ResNet |
| 88 | MAL (ResNeXt101, single-scale) | 45.9 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt single scale |
| 89 | CenterMask+VoVNetV2-99 (single-scale) | 45.8 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | |||
| 90 | D-RFCN + SNIP (DPN-98 with flip, multi-scale) | 45.7 | 51.1 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale | ||
| 91 | YOLOv4 (CD53) | 45.5 | 49.5 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
| 92 | PP-YOLO (608x608) | 45.2 | 49.9 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||
| 93 | AC-FPN Cascade R-CNN (ResNet-101, single scale) | 45 | 49 | Attention-guided Context Feature Pyramid Network for Object Detection | 2019 | single scale FPN ResNet |
| 94 | FreeAnchor (ResNeXt-101) | 44.8 | 48.4 | FreeAnchor: Learning to Match Anchors for Visual Object Detection | 2019 | ResNeXt | ||
| 95 | FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) | 44.7 | 48.4 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
| 96 | CenterMask+VoVNet2-57 (single-scale) | 44.7 | 48.6 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | ||
| 97 | FSAF (ResNeXt-101, multi-scale) | 44.6 | 48.6 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | ResNeXt multiscale |
| 98 | aLRP Loss (ResNext-101, DCN, 500 scale) | 44.6 | 47.5 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt DCN |
| 99 | CenterMask + X-101-32x8d (single-scale) | 44.6 | 48.4 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | ||
| 100 | RetinaNet (SpineNet-49, 640x640) | 44.3 | 47.6 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 101 | YOLOF-DC5 | 44.3 | 47.5 | You Only Look One-level Feature | 2021 | YOLO | ||
| 102 | GFLV2 (ResNet-50) | 44.3 | 48.5 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||
| 103 | InterNet (ResNet-101-FPN, multi-scale) | 44.2 | 51.1 | Feature Intertwiner for Object Detection | 2019 | multiscale FPN ResNet |
| 104 | M2Det (VGG-16, multi-scale) | 44.2 | 49.3 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale | ||
| 105 | Faster R-CNN (LIP-ResNet-101-MD w FPN) | 43.9 | 48.1 | LIP: Local Importance-based Pooling | 2019 | FPN | ||
| 106 | M2Det (ResNet-101, multi-scale) | 43.9 | 48 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale ResNet |
| 107 | YOLOv3 @800 + ASFF* (Darknet-53) | 43.9 | 49.2 | Learning Spatial Fusion for Single-Shot Object Detection | 2019 | YOLO | ||
| 108 | FoveaBox (ResNeXt-101) | 43.9 | 47.7 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | ||
| 109 | ExtremeNet (Hourglass-104, multi-scale) | 43.7 | 47.0 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | multiscale | ||
| 110 | YOLOv4-608 | 43.5 | 47.3 | YOLOv4: Optimal Speed and Accuracy of Object Detection | 2020 | single scale YOLO |
| 111 | SNIPER (ResNet-50) | 43.5 | 48.6 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||
| 112 | CenterNet (HRNetV2-W48) | 43.5 | 46.5 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 113 | D-RFCN + SNIP (ResNet-101, multi-scale) | 43.4 | 48.4 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale ResNet |
| 114 | Grid R-CNN (ResNeXt-101-FPN) | 43.2 | 46.6 | Grid R-CNN | 2018 | ResNeXt FPN |
| 115 | FCOS (ResNeXt-101-64x4d-FPN) | 43.2 | 46.6 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
| 116 | CornerNet-Saccade (Hourglass-104, multi-scale) | 43.2 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | multiscale | |||
| 117 | Libra R-CNN (ResNeXt-101-FPN) | 43.0 | 47 | Libra R-CNN: Towards Balanced Learning for Object Detection | 2019 | ResNeXt FPN |
| 118 | RPDet (ResNet-101-DCN) | 42.8 | 46.3 | RepPoints: Point Set Representation for Object Detection | 2019 | DCN ResNet |
| 119 | SpineNet-49 (640, RetinaNet, single-scale) | 42.8 | 46.1 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | single scale | ||
| 120 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.8 | 46.3 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
| 121 | Cascade R-CNN | 42.8 | 46.3 | Cascade R-CNN: High Quality Object Detection and Instance Segmentation | 2019 | |||
| 122 | TridentNet (ResNet-101) | 42.7 | 46.5 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||
| 123 | FCOS (ResNeXt-32x8d-101-FPN) | 42.7 | 46.1 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
| 124 | RetinaMask (ResNeXt-101-FPN-GN) | 42.6 | 46.0 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | ResNeXt FPN |
| 125 | TAL + TAP | 42.5 | 46.4 | TOOD: Task-aligned One-stage Object Detection | 2021 | |||
| 126 | Faster R-CNN (HRNetV2p-W48) | 42.4 | 46.4 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 127 | HSD (Rest101, 768x768, single-scale test) | 42.3 | 46.9 | Hierarchical Shot Detector | 2019 | single scale | ||
| 128 | CornerNet511 (Hourglass-104, multi-scale) | 42.1 | 45.3 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | multiscale | ||
| 129 | FoveaBox (ResNeXt-101) | 42.1 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||
| 130 | FCOS (HRNet-W32-5l) | 42.0 | 45.3 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | |||
| 131 | RefineDet512+ (ResNet-101) | 41.8 | 45.7 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||
| 132 | GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) | 41.6 | 44.2 | Gradient Harmonized Single-stage Detector | 2018 | FPN | ||
| 133 | CenterNet-DLA (DLA-34, multi-scale) | 41.6 | Objects as Points | 2019 | multiscale | |||
| 134 | RetinaNet (SpineNet-49S, 640x640) | 41.5 | 44.6 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 135 | RPDet (ResNet-101) | 41 | 44.3 | RepPoints: Point Set Representation for Object Detection | 2019 | ResNet | ||
| 136 | M2Det (VGG-16, single-scale) | 41.0 | 45 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale | ||
| 137 | FSAF (ResNet-101, single-scale) | 40.9 | 44 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | single scale ResNet |
| 138 | RetinaNet (ResNeXt-101-FPN) | 40.8 | 44.1 | Focal Loss for Dense Object Detection | 2017 | ResNeXt FPN |
| 139 | Cascade R-CNN (ResNet-50-FPN+, cascade) | 40.6 | 44 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
| 140 | Faster R-CNN (Cascade RPN) | 40.6 | 44.5 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||
| 141 | ResNet-50-DW-DPN (Deformable Kernels) | 40.6 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | 2019 | ResNet | |||
| 142 | IoU-Net | 40.6 | Acquisition of Localization Confidence for Accurate Object Detection | 2018 | ||||
| 143 | FCOS (HRNetV2p-W48) | 40.5 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||
| 144 | ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS | 40.4 | Bounding Box Regression with Uncertainty for Accurate Object Detection | 2018 | FPN ResNet |
| 145 | RDSNet (ResNet-101, RetinaNet, mask, MBRM) | 40.3 | 43 | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | 2019 | ResNet | ||
| 146 | ExtremeNet (Hourglass-104, single-scale) | 40.2 | 43.2 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | single scale | ||
| 147 | Mask R-CNN (ResNet-101-FPN, CBN) | 40.1 | 44.1 | Cross-Iteration Batch Normalization | 2020 | FPN ResNet |
| 148 | Fast R-CNN (Cascade RPN) | 40.1 | 43.8 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||
| 149 | Mask R-CNN (ResNeXt-101-FPN) | 39.8 | 43.4 | Mask R-CNN | 2017 | ResNeXt FPN |
| 150 | GA-Faster-RCNN | 39.8 | 43.5 | Region Proposal by Guided Anchoring | 2019 | |||
| 151 | FPN (ResNet101 backbone) | 39.5 | ChainerCV: a Library for Deep Learning in Computer Vision | 2017 | FPN ResNet |
| 152 | RetinaMask (ResNet-50-FPN) | 39.4 | 42.3 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | FPN ResNet |
| 153 | PP-YOLO (320x320) | 39.3 | 42.7 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||
| 154 | AA-ResNet-10 + RetinaNet | 39.2 | Attention Augmented Convolutional Networks | 2019 | ||||
| 155 | MAL (ResNet50, single-scale) | 39.2 | Multiple Anchor Learning for Visual Object Detection | 2019 | single scale ResNet |
| 156 | RetinaNet (ResNet-101-FPN) | 39.1 | 42.3 | Focal Loss for Dense Object Detection | 2017 | FPN ResNet |
| 157 | Cascade R-CNN (ResNet-101-FPN+) | 38.8 | 41.9 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
| 158 | M2Det (ResNet-101, single-scale) | 38.8 | 41.7 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale ResNet |
| 159 | SaccadeNet (DLA-34-DCN) | 38.5 | 41.4 | SaccadeNet: A Fast and Accurate Object Detector | 2020 | DCN | ||
| 160 | Mask R-CNN (ResNet-101-FPN) | 38.2 | 41.7 | Mask R-CNN | 2017 | FPN ResNet |
| 161 | WSMA-Seg | 38.1 | Segmentation is All You Need | 2019 | ||||
| 162 | Faster R-CNN + FPN + CGD | 37.9 | Compact Global Descriptor for Neural Networks | 2019 | FPN | |||
| 163 | CornerNet511 (Hourglass-52, single-scale) | 37.8 | 40.1 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | single scale | ||
| 164 | RefineDet512+ (VGG-16) | 37.6 | 40.8 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||
| 165 | DeformConv-R-FCN (Aligned-Inception-ResNet) | 37.5 | Deformable Convolutional Networks | 2017 | ||||
| 166 | Faster R-CNN (ImageNet+300M) | 37.4 | 40.1 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | 2017 | |||
| 167 | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 36.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet |
| 168 | Faster R-CNN + TDM | 36.8 | Beyond Skip Connections: Top-Down Modulation for Object Detection | 2016 | ||||
| 169 | Cascade R-CNN (ResNet-50-FPN+) | 36.5 | 39.2 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet |
| 170 | RefineDet512 (ResNet-101) | 36.4 | 39.5 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||
| 171 | Faster R-CNN + FPN | 36.2 | Feature Pyramid Networks for Object Detection | 2016 | FPN | |||
| 172 | Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) | 35.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet |
| 173 | Faster R-CNN (box refinement, context, multi-scale testing) | 34.9 | Deep Residual Learning for Image Recognition | 2015 | multiscale | |||
| 174 | Faster R-CNN | 34.7 | Speed/accuracy trade-offs for modern convolutional object detectors | 2016 | ||||
| 175 | CornerNet-Squeeze | 34.4 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | ||||
| 176 | MultiPath Network | 33.2 | A MultiPath Network for Object Detection | 2016 | ||||
| 177 | ION | 33.1 | 34.6 | Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | 2015 | |||
| 178 | RefineDet512 (VGG-16) | 33 | 35.5 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||
| 179 | YOLOv3 + Darknet-53 | 33.0 | YOLOv3: An Incremental Improvement | 2018 | YOLO | |||
| 180 | SSD512 | 28.8 | 30.3 | SSD: Single Shot MultiBox Detector | 2015 | |||
| 181 | MnasFPN (MobileNetV2) | 26.1 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 182 | ESPNetv2-512 | 26.0 | ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | 2018 | ||||
| 183 | MnasFPN (MobileNetV3) | 25.5 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 184 | MnasFPN (MNASNet-B1) | 24.6 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 185 | MnasFPN x0.7 (MobileNetV2) | 23.8 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 186 | MobileNet-v1-SSD-300x300+CGD | 21.4 | Compact Global Descriptor for Neural Networks | 2019 | ||||
| 187 | Fast-RCNN | 19.7 | Fast R-CNN | 2015 | ||||
| 188 | MobileNet | 19.3 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | 2017 | ||||
| 189 | DAT-S (RetinaNet) | 51.2 | Vision Transformer with Deformable Attention | 2022 | ||||
| 190 | CenterMask-VoVNet99 (multi-scale) | 53.2 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | multiscale | |||
| 191 | Mask R-CNN (HRNetV2p-W32 + cascade) | 48.6 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||
| 192 | FoveaBox (ResNeXt-101) | 45.2 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||
| 193 | VirTex Mask R-CNN (ResNet-50-FPN) | 44.8 | VirTex: Learning Visual Representations from Textual Annotations | 2020 | FPN; ResNet |
| 194 | Centermask + ResNet101 | 46.9 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | ResNet | |||
| 195 | PAFNet (ResNet50-vd) | 45.3 | PAFNet: An Efficient Anchor-Free Object Detector Guidance | 2021 | ResNet | |||
| 196 | IoU-Net+EnergyRegression | 41.8 | Energy-Based Models for Deep Probabilistic Regression | 2019 | ||||
| 197 | Cascade R-CNN (HRNetV2p-W48) | 48.6 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||
| 198 | ISTR (ResNet50-FPN-3x, single-scale) | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | |||||
| 199 | FoveaBox (ResNeXt-101) | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | ||||
| 200 | EfficientDet-D7x (single-scale) | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale |
APS
| Rank | Model | box AP | APS | Paper | Code | Result | Year | Tags |
|---|---|---|---|---|---|---|---|---|
| 1 | SwinV2-G (HTC++) | 63.1 | Swin Transformer V2: Scaling Up Capacity and Resolution | Link | 2021 | Swin-Transformer | ||
| 2 | Florence-CoSwin-H | 62.4 | Florence: A New Foundation Model for Computer Vision | 2021 | Swin-Transformer | |||
| 3 | GLIP (Swin-L, multi-scale) | 61.5 | 45.3 | Grounded Language-Image Pre-training | 2021 | multiscale; Vision Language; Dynamic Head; BERT-Base |
| 4 | Soft Teacher + Swin-L (HTC++, multi-scale) | 61.3 | End-to-End Semi-Supervised Object Detection with Soft Teacher | 2021 | multiscale; Swin-Transformer |
| 5 | DyHead (Swin-L, multi scale, self-training) | 60.6 | Dynamic Head: Unifying Object Detection Heads with Attentions | 2021 | multiscale; Swin-Transformer |
| 6 | Dual-Swin-L (HTC, multi-scale) | 60.1 | CBNetV2: A Composite Backbone Network Architecture for Object Detection | 2021 | multiscale Swin-Transformer |
| 7 | Dual-Swin-L (HTC, single-scale) | 59.4 | CBNetV2: A Composite Backbone Network Architecture for Object Detection | 2021 | Swin-Transformer | |||
| 8 | Focal-L (DyHead, multi-scale) | 58.9 | Focal Self-attention for Local-Global Interactions in Vision Transformers | 2021 | multiscale Focal-Transformer |
| 9 | DyHead (Swin-L, multi scale) | 58.7 | 41.7 | Dynamic Head: Unifying Object Detection Heads with Attentions | 2021 | multiscale Swin-Transformer |
| 10 | Swin-L (HTC++, multi scale) | 58.7 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 2021 | multiscale Swin-Transformer |
| 11 | Focal-L (HTC++, multi-scale) | 58.4 | Focal Self-attention for Local-Global Interactions in Vision Transformers | 2021 | multiscale | |||
| 12 | Swin-L (HTC++, single scale) | 57.7 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 2021 | single scale Swin-Transformer |
| 13 | YOLOR-D6 (1280, single-scale, 34 fps) | 57.3 | 40.4 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
| 14 | SOLQ (Swin-L, single) | 56.5 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
| 15 | YOLOR-E6 (1280, single-scale, 45 fps) | 56.4 | 39.1 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
| 16 | CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | 38.7 | Probabilistic two-stage detection | 2021 | single scale FPN DCN |
| 17 | QueryInst (single-scale) | 56.1 | 37.4 | Instances as Queries | 2021 | |||
| 18 | YOLOv4-P7 with TTA | 55.8 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
| 19 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 55.7 | 37.7 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
| 20 | YOLOR-W6 (1280, single-scale, 66 fps) | 55.5 | 37.6 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
| 21 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | 38.1 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
| 22 | CSP-p6 + Mish (multi-scale) | 55.2 | 37.6 | Mish: A Self Regularized Non-Monotonic Activation Function | 2019 | multiscale | ||
| 23 | YOLOv4-P6 with TTA | 54.9 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
| 24 | Cascade Eff-B7 NAS-FPN (1280) | 54.8 | Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | 2020 | single scale NAS-FPN |
| 25 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 54.7 | 37.4 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
| 26 | YOLOv4-P6 CSP-P6 (single-scale, 32 fps) | 54.3 | 36.6 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
| 27 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.3 | Rethinking Pre-training and Self-training | 2020 | single scale | |||
| 28 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 54.1 | 35.8 | USB: Universal-Scale Object Detection Benchmark | 2021 | multiscale DCN |
| 29 | EfficientDet-D7 (single-scale) | 53.7 | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale | |||
| 30 | PAA (ResNext-152-32x8d + DCN, multi-scale) | 53.5 | 36.0 | Probabilistic Anchor Assignment with IoU Prediction for Object Detection | 2020 | ResNeXt multiscale DCN |
| 31 | LSNet (Res2Net-101+ DCN, multi-scale) | 53.5 | 35.2 | Location-Sensitive Visual Recognition with Cross-IOU Loss | 2021 | multiscale DCN |
| 32 | ResNeSt-200 (multi-scale) | 53.3 | 35.1 | ResNeSt: Split-Attention Networks | 2020 | multiscale | ||
| 33 | Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) | 53.3 | 35.5 | CBNet: A Novel Composite Backbone Network Architecture for Object Detection | 2019 | multiscale | ||
| 34 | DetectoRS (ResNeXt-101-32x4d, single-scale) | 53.3 | 33.9 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt single scale |
| 35 | GFLV2 (Res2Net-101, DCN, multiscale) | 53.3 | 35.7 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | multiscale DCN |
| 36 | RelationNet++ (ResNeXt-64x4d-101-DCN) | 52.7 | RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder | 2020 | ResNeXt DCN |
| 37 | YOLOv4-P5 with TTA | 52.5 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
| 38 | Deformable DETR (ResNeXt-101+DCN) | 52.3 | 34.4 | Deformable DETR: Deformable Transformers for End-to-End Object Detection | 2020 | ResNeXt DCN |
| 39 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 52.3 | Global Context Networks | 2020 | ResNeXt DCN GCN |
| 40 | RetinaNet (SpineNet-190, 1280x1280) | 52.1 | 35.4 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 41 | RepPoints v2 (ResNeXt-101, DCN, multi-scale) | 52.1 | 34.5 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt; multiscale DCN |
| 42 | AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) | 51.9 | 34.2 | Attention-guided Context Feature Pyramid Network for Object Detection | 2020 | ResNeXt multiscale FPN |
| 43 | OTA (ResNeXt-101+DCN, multiscale) | 51.5 | 34.1 | OTA: Optimal Transport Assignment for Object Detection | 2021 | |||
| 44 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 51.3 | 31.7 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
| 45 | TSD (SENet154-DCN, multi-scale) | 51.2 | 33.8 | Revisiting the Sibling Head in Object Detector | 2020 | multiscale DCN |
| 46 | YOLOX-X (Modified CSP v5) | 51.2 | 31.2 | YOLOX: Exceeding YOLO Series in 2021 | 2021 | YOLO | ||
| 47 | RetinaNet (SpineNet-143, 1280x1280) | 50.7 | 33.6 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 48 | ATSS (ResNeXt-64x4d-101+DCN, multi-scale) | 50.7 | 33.2 | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | 2019 | ResNeXt multiscale DCN |
| 49 | NAS-FPN (AmoebaNet-D, learned aug) | 50.7 | 34.2 | Learning Data Augmentation Strategies for Object Detection | 2019 | FPN | ||
| 50 | GFLV2 (Res2Net-101, DCN) | 50.6 | 31.3 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN | ||
| 51 | aLRP Loss (ResNext-101-64x4d, DCN, multiscale test) | 50.2 | 32.0 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt multiscale DCN |
| 52 | FreeAnchor + SEPC (DCN, ResNext-101-64x4d) | 50.1 | 31.3 | Scale-Equalizing Pyramid Convolution for Object Detection | 2020 | ResNeXt DCN |
||
| 53 | D2Det (ResNet-101-DCN, multi-scale test) | 50.1 | 32.7 | D2Det: Towards High Quality Object Detection and Instance Segmentation | 2020 | multiscale DCN ResNet |
||
| 54 | Dynamic R-CNN (ResNet-101-DCN, multi-scale) | 50.1 | 32.8 | Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | 2020 | multiscale DCN ResNet |
||
| 55 | TSD (ResNet-101-Deformable, Image Pyramid) | 49.4 | 32.7 | Revisiting the Sibling Head in Object Detector | 2020 | ResNet | ||
| 56 | RepPoints v2 (ResNeXt-101, DCN) | 49.4 | 30.3 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt DCN |
||
| 57 | CPNDet (Hourglass-104, multi-scale) | 49.2 | 31.0 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | 2020 | multiscale | ||
| 58 | GFLV2 (ResNeXt-101, 32x4d, DCN) | 49 | 29.7 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNeXt DCN |
||
| 59 | aLRP Loss (ResNext-101-64x4d, DCN, single scale) | 48.9 | 30.8 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt single scale DCN |
||
| 60 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.8 | 30.1 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
||
| 61 | SOLQ (ResNet101, single scale) | 48.7 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
|||
| 62 | RetinaNet (SpineNet-96, 1024x1024) | 48.6 | 32 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 63 | TridentNet (ResNet-101-Deformable, Image Pyramid) | 48.4 | 31.8 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||
| 64 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 48.4 | GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | 2019 | ResNeXt DCN GCN |
|||
| 65 | GFLV2 (ResNet-101-DCN) | 48.3 | 28.8 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN ResNet |
||
| 66 | GFL (X-101-32x4d-DCN, single-scale) | 48.2 | 29.2 | Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | 2020 | ResNeXt single scale DCN |
||
| 67 | ISTR (ResNet101-FPN-3x, single-scale) | 48.1 | 28.7 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | |||
| 68 | aLRP Loss (ResNext-101-64x4d, single scale) | 47.8 | 30.2 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt single scale |
||
| 69 | MatrixNet Corners (ResNet-152, multi-scale) | 47.8 | 29.7 | Matrix Nets: A New Deep Architecture for Object Detection | 2019 | multiscale ResNet |
||
| 70 | SOLQ (ResNet50, single scale) | 47.8 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
|||
| 71 | SAPD (ResNeXt-101, single-scale) | 47.4 | 28.1 | Soft Anchor-Point Object Detection | 2019 | ResNeXt single scale |
||
| 72 | PANet (ResNeXt-101, multi-scale) | 47.4 | 30.1 | Path Aggregation Network for Instance Segmentation | 2018 | ResNeXt multiscale |
||
| 73 | HTC (HRNetV2p-W48) | 47.3 | 28.0 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 74 | HTC (ResNeXt-101-FPN) | 47.1 | 22.8 | Hybrid Task Cascade for Instance Segmentation | 2019 | ResNeXt FPN |
||
| 75 | CenterNet511 (Hourglass-104, multi-scale) | 47.0 | 28.9 | CenterNet: Keypoint Triplets for Object Detection | 2019 | multiscale | ||
| 76 | MAL (ResNeXt101, multi-scale) | 47.0 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt multiscale |
|||
| 77 | ISTR (ResNet50-FPN-3x) | 46.8 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | FPN ResNet |
|||
| 78 | RetinaNet (SpineNet-49, 896x896) | 46.7 | 29.1 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 79 | RPDet (ResNet-101-DCN, multi-scale) | 46.5 | 30.3 | RepPoints: Point Set Representation for Object Detection | 2019 | multiscale DCN ResNet |
||
| 80 | HoughNet (MS) | 46.4 | 29.1 | HoughNet: Integrating near and long-range evidence for bottom-up object detection | 2020 | multiscale | ||
| 81 | PPDet (ResNeXt-101-FPN, multiscale) | 46.3 | 31.4 | Reducing Label Noise in Anchor-Free Object Detection | 2020 | ResNeXt multiscale FPN |
||
| 82 | GFLV2 (ResNet-101) | 46.2 | 27.8 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||
| 83 | SNIPER (ResNet-101) | 46.1 | 29.6 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||
| 84 | Mask R-CNN (HRNetV2p-W48 + cascade) | 46.1 | 27.1 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 85 | DCNv2 (ResNet-101, multi-scale) | 46.0 | 27.8 | Deformable ConvNets v2: More Deformable, Better Results | 2018 | multiscale DCN ResNet |
||
| 86 | Gaussian-FCOS | 46 | Localization Uncertainty Estimation for Anchor-Free Object Detection | 2020 | ||||
| 87 | Cascade R-CNN-FPN (ResNet-101, map-guided) | 45.9 | 26.3 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | 2019 | FPN ResNet |
||
| 88 | MAL (ResNeXt101, single-scale) | 45.9 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt single scale |
|||
| 89 | CenterMask+VoVNetV2-99 (single-scale) | 45.8 | 27.8 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | ||
| 90 | D-RFCN + SNIP (DPN-98 with flip, multi-scale) | 45.7 | 29.3 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale | ||
| 91 | YOLOv4 (CD53) | 45.5 | 27 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
||
| 92 | PP-YOLO (608x608) | 45.2 | 26.3 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||
| 93 | AC-FPN Cascade R-CNN (ResNet-101, single scale) | 45 | 26.9 | Attention-guided Context Feature Pyramid Network for Object Detection | 2019 | single scale FPN ResNet |
||
| 94 | FreeAnchor (ResNeXt-101) | 44.8 | 27 | FreeAnchor: Learning to Match Anchors for Visual Object Detection | 2019 | ResNeXt | ||
| 95 | FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) | 44.7 | 27.6 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
||
| 96 | CenterMask+VoVNetV2-57 (single-scale) | 44.7 | 27.1 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | ||
| 97 | FSAF (ResNeXt-101, multi-scale) | 44.6 | 29.7 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | ResNeXt multiscale |
||
| 98 | aLRP Loss (ResNext-101, DCN, 500 scale) | 44.6 | 24.6 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt DCN |
||
| 99 | CenterMask + X-101-32x8d (single-scale) | 44.6 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | |||
| 100 | RetinaNet (SpineNet-49, 640x640) | 44.3 | 25.9 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 101 | YOLOF-DC5 | 44.3 | 24.0 | You Only Look One-level Feature | 2021 | YOLO | ||
| 102 | GFLV2 (ResNet-50) | 44.3 | 26.8 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||
| 103 | InterNet (ResNet-101-FPN, multi-scale) | 44.2 | 27.2 | Feature Intertwiner for Object Detection | 2019 | multiscale FPN ResNet |
||
| 104 | M2Det (VGG-16, multi-scale) | 44.2 | 29.2 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale | ||
| 105 | Faster R-CNN (LIP-ResNet-101-MD w FPN) | 43.9 | 25.4 | LIP: Local Importance-based Pooling | 2019 | FPN | ||
| 106 | M2Det (ResNet-101, multi-scale) | 43.9 | 29.6 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale ResNet |
||
| 107 | YOLOv3 @800 + ASFF* (Darknet-53) | 43.9 | 27.0 | Learning Spatial Fusion for Single-Shot Object Detection | 2019 | YOLO | ||
| 108 | FoveaBox (ResNeXt-101) | 43.9 | 26.8 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | ||
| 109 | ExtremeNet (Hourglass-104, multi-scale) | 43.7 | 24.1 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | multiscale | ||
| 110 | YOLOv4-608 | 43.5 | 26.7 | YOLOv4: Optimal Speed and Accuracy of Object Detection | 2020 | single scale YOLO |
||
| 111 | SNIPER (ResNet-50) | 43.5 | 26.1 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||
| 112 | CenterNet (HRNetV2-W48) | 43.5 | 22.2 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 113 | D-RFCN + SNIP (ResNet-101, multi-scale) | 43.4 | 27.2 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale ResNet |
||
| 114 | Grid R-CNN (ResNeXt-101-FPN) | 43.2 | 25.1 | Grid R-CNN | 2018 | ResNeXt FPN |
||
| 115 | FCOS (ResNeXt-101-64x4d-FPN) | 43.2 | 26.5 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
||
| 116 | CornerNet-Saccade (Hourglass-104, multi-scale) | 43.2 | 24.4 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | multiscale | ||
| 117 | Libra R-CNN (ResNeXt-101-FPN) | 43.0 | 25.3 | Libra R-CNN: Towards Balanced Learning for Object Detection | 2019 | ResNeXt FPN |
||
| 118 | RPDet (ResNet-101-DCN) | 42.8 | 24.9 | RepPoints: Point Set Representation for Object Detection | 2019 | DCN ResNet |
||
| 119 | SpineNet-49 (640, RetinaNet, single-scale) | 42.8 | 23.7 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | single scale | ||
| 120 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.8 | 23.7 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
||
| 121 | Cascade R-CNN | 42.8 | 23.7 | Cascade R-CNN: High Quality Object Detection and Instance Segmentation | 2019 | |||
| 122 | TridentNet (ResNet-101) | 42.7 | 23.9 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||
| 123 | FCOS (ResNeXt-32x8d-101-FPN) | 42.7 | 26.0 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt FPN |
||
| 124 | RetinaMask (ResNeXt-101-FPN-GN) | 42.6 | 24.8 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | ResNeXt FPN |
||
| 125 | TAL + TAP | 42.5 | TOOD: Task-aligned One-stage Object Detection | 2021 | ||||
| 126 | Faster R-CNN (HRNetV2p-W48) | 42.4 | 24.9 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 127 | HSD (Rest101, 768x768, single-scale test) | 42.3 | 22.8 | Hierarchical Shot Detector | 2019 | single scale | ||
| 128 | CornerNet511 (Hourglass-104, multi-scale) | 42.1 | 20.8 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | multiscale | ||
| 129 | FoveaBox (ResNeXt-101) | 42.1 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||
| 130 | FCOS (HRNet-W32-5l) | 42.0 | 25.4 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | |||
| 131 | RefineDet512+ (ResNet-101) | 41.8 | 25.6 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||
| 132 | GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) | 41.6 | 22.3 | Gradient Harmonized Single-stage Detector | 2018 | FPN | ||
| 133 | CenterNet-DLA (DLA-34, multi-scale) | 41.6 | 21.5 | Objects as Points | 2019 | multiscale | ||
| 134 | RetinaNet (SpineNet-49S, 640x640) | 41.5 | 23.3 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||
| 135 | RPDet (ResNet-101) | 41 | 23.6 | RepPoints: Point Set Representation for Object Detection | 2019 | ResNet | ||
| 136 | M2Det (VGG-16, single-scale) | 41.0 | 22.1 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale | ||
| 137 | FSAF (ResNet-101, single-scale) | 40.9 | 24 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | single scale ResNet |
||
| 138 | RetinaNet (ResNeXt-101-FPN) | 40.8 | 24.1 | Focal Loss for Dense Object Detection | 2017 | ResNeXt FPN |
||
| 139 | Cascade R-CNN (ResNet-50-FPN+, cascade) | 40.6 | 22.6 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
||
| 140 | Faster R-CNN (Cascade RPN) | 40.6 | 22.0 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||
| 141 | ResNet-50-DW-DPN (Deformable Kernels) | 40.6 | 24.6 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | 2019 | ResNet | ||
| 142 | IoU-Net | 40.6 | Acquisition of Localization Confidence for Accurate Object Detection | 2018 | ||||
| 143 | FCOS (HRNetV2p-W48) | 40.5 | 23.4 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||
| 144 | ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS | 40.4 | Bounding Box Regression with Uncertainty for Accurate Object Detection | 2018 | FPN ResNet |
|||
| 145 | RDSNet (ResNet-101, RetinaNet, mask, MBRM) | 40.3 | 22.1 | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | 2019 | ResNet | ||
| 146 | ExtremeNet (Hourglass-104, single-scale) | 40.2 | 20.4 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | single scale | ||
| 147 | Mask R-CNN (ResNet-101-FPN, CBN) | 40.1 | 35.8 | Cross-Iteration Batch Normalization | 2020 | FPN ResNet |
||
| 148 | Fast R-CNN (Cascade RPN) | 40.1 | 22.1 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||
| 149 | Mask R-CNN (ResNeXt-101-FPN) | 39.8 | 22.1 | Mask R-CNN | 2017 | ResNeXt FPN |
||
| 150 | GA-Faster-RCNN | 39.8 | 21.8 | Region Proposal by Guided Anchoring | 2019 | |||
| 151 | FPN (ResNet101 backbone) | 39.5 | ChainerCV: a Library for Deep Learning in Computer Vision | 2017 | FPN ResNet |
|||
| 152 | RetinaMask (ResNet-50-FPN) | 39.4 | 21.9 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | FPN ResNet |
||
| 153 | PP-YOLO (320x320) | 39.3 | 16.7 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||
| 154 | AA-ResNet-10 + RetinaNet | 39.2 | Attention Augmented Convolutional Networks | 2019 | ||||
| 155 | MAL (ResNet50, single-scale) | 39.2 | Multiple Anchor Learning for Visual Object Detection | 2019 | single scale ResNet |
|||
| 156 | RetinaNet (ResNet-101-FPN) | 39.1 | 21.8 | Focal Loss for Dense Object Detection | 2017 | FPN ResNet |
||
| 157 | Cascade R-CNN (ResNet-101-FPN+) | 38.8 | 21.3 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN ResNet |
||
| 158 | M2Det (ResNet-101, single-scale) | 38.8 | 20.5 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale ResNet |
||
| 159 | SaccadeNet (DLA-34-DCN) | 38.5 | 19.2 | SaccadeNet: A Fast and Accurate Object Detector | 2020 | DCN | ||
| 160 | Mask R-CNN (ResNet-101-FPN) | 38.2 | 20.1 | Mask R-CNN | 2017 | FPN ResNet |
||
| 161 | WSMA-Seg | 38.1 | Segmentation is All You Need | 2019 | ||||
| 162 | Faster R-CNN + FPN + CGD | 37.9 | Compact Global Descriptor for Neural Networks | 2019 | FPN | |||
| 163 | CornerNet511 (Hourglass-52, single-scale) | 37.8 | 17.0 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | single scale | ||
| 164 | RefineDet512+ (VGG-16) | 37.6 | 22.7 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||
| 165 | DeformConv-R-FCN (Aligned-Inception-ResNet) | 37.5 | 19.4 | Deformable Convolutional Networks | 2017 | |||
| 166 | Faster R-CNN (ImageNet+300M) | 37.4 | 17.5 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | 2017 | |||
| 167 | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 36.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet |
|||
| 168 | Faster R-CNN + TDM | 36.8 | Beyond Skip Connections: Top-Down Modulation for Object Detection | 2016 | ||||
| 169 | Cascade R-CNN (ResNet-50-FPN+) | 36.5 | 20.3 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet |
||
| 170 | RefineDet512 (ResNet-101) | 36.4 | 16.6 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||
| 171 | Faster R-CNN + FPN | 36.2 | Feature Pyramid Networks for Object Detection | 2016 | FPN | |||
| 172 | Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) | 35.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet |
|||
| 173 | Faster R-CNN (box refinement, context, multi-scale testing) | 34.9 | Deep Residual Learning for Image Recognition | 2015 | multiscale | |||
| 174 | Faster R-CNN | 34.7 | Speed/accuracy trade-offs for modern convolutional object detectors | 2016 | ||||
| 175 | CornerNet-Squeeze | 34.4 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | ||||
| 176 | MultiPath Network | 33.2 | A MultiPath Network for Object Detection | 2016 | ||||
| 177 | ION | 33.1 | 14.5 | Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | 2015 | |||
| 178 | RefineDet512 (VGG-16) | 33 | 16.3 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||
| 179 | YOLOv3 + Darknet-53 | 33.0 | YOLOv3: An Incremental Improvement | 2018 | YOLO | |||
| 180 | SSD512 | 28.8 | SSD: Single Shot MultiBox Detector | 2015 | ||||
| 181 | MnasFPN (MobileNetV2) | 26.1 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 182 | ESPNetv2-512 | 26.0 | ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | 2018 | ||||
| 183 | MnasFPN (MobileNetV3) | 25.5 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 184 | MnasFPN (MNASNet-B1) | 24.6 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 185 | MnasFPN x0.7 (MobileNetV2) | 23.8 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||
| 186 | MobileNet-v1-SSD-300x300+CGD | 21.4 | Compact Global Descriptor for Neural Networks | 2019 | ||||
| 187 | Fast-RCNN | 19.7 | Fast R-CNN | 2015 | ||||
| 188 | MobileNet | 19.3 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | 2017 | ||||
| 189 | DAT-S (RetinaNet) | 32.3 | Vision Transformer with Deformable Attention | 2022 | ||||
| 190 | CenterMask-VoVNet99 (multi-scale) | 32.4 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | multiscale | |||
| 191 | Mask R-CNN (HRNetV2p-W32 + cascade) | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||
| 192 | FoveaBox (ResNeXt-101) | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | ||||
| 193 | VirTex Mask R-CNN (ResNet-50-FPN) | VirTex: Learning Visual Representations from Textual Annotations | 2020 | FPN; ResNet |
||||
| 194 | CenterMask + ResNet101 | CenterMask : Real-Time Anchor-Free Instance Segmentation | 2019 | ResNet | ||||
| 195 | PAFNet (ResNet50-vd) | 22.8 | PAFNet: An Efficient Anchor-Free Object Detector Guidance | 2021 | ResNet | |||
| 196 | IoU-Net+EnergyRegression | Energy-Based Models for Deep Probabilistic Regression | 2019 | |||||
| 197 | Cascade R-CNN (HRNetV2p-W48) | 26.0 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||
| 198 | ISTR (ResNet50-FPN-3x, single-scale) | 27.8 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | ||||
| 199 | FoveaBox (ResNeXt-101) | 24.9 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||
| 200 | EfficientDet-D7x (single-scale) | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale |

| Rank | Model | box AP | AP50 | AP75 | APS | APM | APL | AP | Extra Training Data | Paper | Code | Result | Year | Tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SwinV2-G (HTC++) | 63.1 | Swin Transformer V2: Scaling Up Capacity and Resolution | Link | 2021 | Swin-Transformer | ||||||||
| 2 | Florence-CoSwin-H | 62.4 | Florence: A New Foundation Model for Computer Vision | 2021 | Swin-Transformer | |||||||||
| 3 | GLIP (Swin-L, multi-scale) | 61.5 | 79.5 | 67.7 | 45.3 | 64.9 | 75.0 | Grounded Language-Image Pre-training | 2021 | multiscale; Vision Language; Dynamic Head; BERT-Base |
||||
| 4 | Soft Teacher + Swin-L (HTC++, multi-scale) | 61.3 | End-to-End Semi-Supervised Object Detection with Soft Teacher | 2021 | multiscale; Swin-Transformer |
|||||||||
| 5 | DyHead (Swin-L, multi scale, self-training) | 60.6 | 78.5 | 66.6 | 64.0 | 74.2 | Dynamic Head: Unifying Object Detection Heads with Attentions | 2021 | multiscale; Swin-Transformer |
|||||
| 6 | Dual-Swin-L (HTC, multi-scale) | 60.1 | CBNetV2: A Composite Backbone Network Architecture for Object Detection | 2021 | multiscale Swin-Transformer |
|||||||||
| 7 | Dual-Swin-L (HTC, single-scale) | 59.4 | CBNetV2: A Composite Backbone Network Architecture for Object Detection | 2021 | Swin-Transformer | |||||||||
| 8 | Focal-L (DyHead, multi-scale) | 58.9 | Focal Self-attention for Local-Global Interactions in Vision Transformers | 2021 | multiscale Focal-Transformer |
|||||||||
| 9 | DyHead (Swin-L, multi scale) | 58.7 | 77.1 | 64.5 | 41.7 | 62.0 | 72.8 | Dynamic Head: Unifying Object Detection Heads with Attentions | 2021 | multiscale Swin-Transformer |
||||
| 10 | Swin-L (HTC++, multi scale) | 58.7 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 2021 | multiscale Swin-Transformer |
|||||||||
| 11 | Focal-L (HTC++, multi-scale) | 58.4 | Focal Self-attention for Local-Global Interactions in Vision Transformers | 2021 | multiscale | |||||||||
| 12 | Swin-L (HTC++, single scale) | 57.7 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | 2021 | single scale Swin-Transformer |
|||||||||
| 13 | YOLOR-D6 (1280, single-scale, 34 fps) | 57.3 | 75.0 | 62.7 | 40.4 | 61.2 | 69.2 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
||||
| 14 | SOLQ (Swin-L, single) | 56.5 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
|||||||||
| 15 | YOLOR-E6 (1280, single-scale, 45 fps) | 56.4 | 74.1 | 61.6 | 39.1 | 60.1 | 68.2 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
||||
| 16 | CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 56.4 | 74.0 | 61.6 | 38.7 | 59.7 | 68.6 | Probabilistic two-stage detection | 2021 | single scale FPN DCN |
||||
| 17 | QueryInst (single-scale) | 56.1 | 75.9 | 61.9 | 37.4 | 58.9 | 70.3 | Instances as Queries | 2021 | |||||
| 18 | YOLOv4-P7 with TTA | 55.8 | 73.2 | 61.2 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
|||||||
| 19 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 55.7 | 74.2 | 61.1 | 37.7 | 58.4 | 68.1 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
||||
| 20 | YOLOR-W6 (1280, single-scale, 66 fps) | 55.5 | 73.2 | 60.6 | 37.6 | 59.5 | 67.7 | You Only Learn One Representation: Unified Network for Multiple Tasks | 2021 | single scale YOLO |
||||
| 21 | YOLOv4-P7 CSP-P7 (single-scale, 16 fps) | 55.4 | 73.3 | 60.7 | 38.1 | 59.5 | 67.4 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
||||
| 22 | CSP-p6 + Mish (multi-scale) | 55.2 | 72.9 | 60.5 | 37.6 | 59.0 | 66.9 | Mish: A Self Regularized Non-Monotonic Activation Function | 2019 | multiscale | ||||
| 23 | YOLOv4-P6 with TTA | 54.9 | 72.6 | 60.2 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
|||||||
| 24 | Cascade Eff-B7 NAS-FPN (1280) | 54.8 | Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | 2020 | single scale NAS-FPN |
|||||||||
| 25 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 54.7 | 73.5 | 60.1 | 37.4 | 57.3 | 66.4 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt multiscale |
||||
| 26 | YOLOv4-P6 CSP-P6 (single-scale, 32 fps) | 54.3 | 72.3 | 59.5 | 36.6 | 58.2 | 65.5 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale YOLO |
||||
| 27 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 54.3 | Rethinking Pre-training and Self-training | 2020 | single scale | |||||||||
| 28 | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 54.1 | 71.6 | 59.9 | 35.8 | 57.2 | 67.4 | USB: Universal-Scale Object Detection Benchmark | 2021 | multiscale DCN |
||||
| 29 | EfficientDet-D7 (single-scale) | 53.7 | 72.4 | 57.0 | 66.3 | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale | ||||||
| 30 | PAA (ResNext-152-32x8d + DCN, multi-scale) | 53.5 | 71.6 | 59.1 | 36.0 | 56.3 | 66.9 | Probabilistic Anchor Assignment with IoU Prediction for Object Detection | 2020 | ResNeXt multiscale DCN |
||||
| 31 | LSNet (Res2Net-101+ DCN, multi-scale) | 53.5 | 71.1 | 59.2 | 35.2 | 56.4 | 65.8 | Location-Sensitive Visual Recognition with Cross-IOU Loss | 2021 | multiscale DCN |
||||
| 32 | ResNeSt-200 (multi-scale) | 53.3 | 72.0 | 58.0 | 35.1 | 56.2 | 66.8 | ResNeSt: Split-Attention Networks | 2020 | multiscale | ||||
| 33 | Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) | 53.3 | 71.9 | 58.5 | 35.5 | 55.8 | 66.7 | CBNet: A Novel Composite Backbone Network Architecture for Object Detection | 2019 | multiscale | ||||
| 34 | DetectoRS (ResNeXt-101-32x4d, single-scale) | 53.3 | 71.6 | 58.5 | 33.9 | 56.5 | 66.9 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | 2020 | ResNeXt single scale |
||||
| 35 | GFLV2 (Res2Net-101, DCN, multiscale) | 53.3 | 70.9 | 59.2 | 35.7 | 56.1 | 65.6 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | multiscale DCN |
||||
| 36 | RelationNet++ (ResNeXt-64x4d-101-DCN) | 52.7 | RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder | 2020 | ResNeXt DCN |
|||||||||
| 37 | YOLOv4-P5 with TTA | 52.5 | 70.3 | 58 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | multiscale YOLO |
|||||||
| 38 | Deformable DETR (ResNeXt-101+DCN) | 52.3 | 71.9 | 58.1 | 34.4 | 54.4 | 65.6 | Deformable DETR: Deformable Transformers for End-to-End Object Detection | 2020 | ResNeXt DCN |
||||
| 39 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 52.3 | 70.9 | 56.9 | Global Context Networks | 2020 | ResNeXt DCN GCN |
|||||||
| 40 | RetinaNet (SpineNet-190, 1280x1280) | 52.1 | 71.8 | 56.5 | 35.4 | 55 | 63.6 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 41 | RepPoints v2 (ResNeXt-101, DCN, multi-scale) | 52.1 | 70.1 | 57.5 | 34.5 | 54.6 | 63.6 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt; multiscale DCN |
||||
| 42 | AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) | 51.9 | 70.4 | 57 | 34.2 | 54.8 | 64.7 | Attention-guided Context Feature Pyramid Network for Object Detection | 2020 | ResNeXt multiscale FPN |
||||
| 43 | OTA (ResNeXt-101+DCN, multiscale) | 51.5 | 68.6 | 57.1 | 34.1 | 53.7 | 64.1 | OTA: Optimal Transport Assignment for Object Detection | 2021 | |||||
| 44 | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 51.3 | 70.0 | 55.8 | 31.7 | 55.3 | 64.9 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
||||
| 45 | TSD (SENet154-DCN,multi-scale) | 51.2 | 71.9 | 56.0 | 33.8 | 54.8 | 64.2 | Revisiting the Sibling Head in Object Detector | 2020 | multiscale DCN |
||||
| 46 | YOLOX-X (Modified CSP v5) | 51.2 | 69.6 | 55.7 | 31.2 | 56.1 | 66.1 | YOLOX: Exceeding YOLO Series in 2021 | 2021 | YOLO | ||||
| 47 | RetinaNet (SpineNet-143, 1280x1280) | 50.7 | 70.4 | 54.9 | 33.6 | 53.9 | 62.1 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 48 | ATSS (ResNeXt-64x4d-101+DCN, multi-scale) | 50.7 | 68.9 | 56.3 | 33.2 | 52.9 | 62.4 | Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | 2019 | ResNeXt multiscale DCN |
||||
| 49 | NAS-FPN (AmoebaNet-D, learned aug) | 50.7 | 34.2 | 55.5 | 64.5 | Learning Data Augmentation Strategies for Object Detection | 2019 | FPN | ||||||
| 50 | GFLV2 (Res2Net-101, DCN) | 50.6 | 69 | 55.3 | 31.3 | 54.3 | 63.5 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN | ||||
| 51 | aLRP Loss (ResNext-101-64x4d, DCN, multiscale test) | 50.2 | 70.3 | 53.9 | 32.0 | 53.1 | 63.0 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt multiscale DCN |
||||
| 52 | FreeAnchor + SEPC (DCN, ResNext-101-64x4d) | 50.1 | 69.8 | 54.3 | 31.3 | 53.3 | 63.7 | Scale-Equalizing Pyramid Convolution for Object Detection | 2020 | ResNeXt DCN |
||||
| 53 | D2Det (ResNet-101-DCN, multi-scale test) | 50.1 | 69.4 | 54.9 | 32.7 | 52.7 | 62.1 | D2Det: Towards High Quality Object Detection and Instance Segmentation | 2020 | multiscale DCN ResNet |
||||
| 54 | Dynamic R-CNN (ResNet-101-DCN, multi-scale) | 50.1 | 68.3 | 55.6 | 32.8 | 53.0 | 61.2 | Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | 2020 | multiscale DCN ResNet |
||||
| 55 | TSD (ResNet-101-Deformable, Image Pyramid) | 49.4 | 69.6 | 54.4 | 32.7 | 52.5 | 61.0 | Revisiting the Sibling Head in Object Detector | 2020 | ResNet | ||||
| 56 | RepPoints v2 (ResNeXt-101, DCN) | 49.4 | 68.9 | 53.4 | 30.3 | 52.1 | 62.3 | RepPoints V2: Verification Meets Regression for Object Detection | 2020 | ResNeXt DCN |
||||
| 57 | CPNDet (Hourglass-104, multi-scale) | 49.2 | 67.3 | 53.7 | 31.0 | 51.9 | 62.4 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | 2020 | multiscale | ||||
| 58 | GFLV2 (ResNeXt-101, 32x4d, DCN) | 49 | 67.6 | 53.5 | 29.7 | 52.4 | 61.4 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNeXt DCN |
||||
| 59 | aLRP Loss (ResNext-101-64x4d, DCN, single scale) | 48.9 | 69.3 | 52.5 | 30.8 | 51.5 | 62.1 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt single scale DCN |
||||
| 60 | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 48.8 | 67.5 | 53.0 | 30.1 | 52.3 | 61.1 | USB: Universal-Scale Object Detection Benchmark | 2021 | single scale DCN |
||||
| 61 | SOLQ (ResNet101, single scale) | 48.7 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer single scale |
|||||||||
| 62 | RetinaNet (SpineNet-96, 1024x1024) | 48.6 | 68.4 | 52.5 | 32 | 52.3 | 62 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 63 | TridentNet (ResNet-101-Deformable, Image Pyramid) | 48.4 | 69.7 | 53.5 | 31.8 | 51.3 | 60.3 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||||
| 64 | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 48.4 | 67.6 | 52.7 | GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | 2019 | ResNeXt DCN GCN |
|||||||
| 65 | GFLV2 (ResNet-101-DCN) | 48.3 | 66.5 | 52.8 | 28.8 | 51.9 | 60.7 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | DCN ResNet |
||||
| 66 | GFL (X-101-32x4d-DCN, single-scale) | 48.2 | 67.4 | 52.6 | 29.2 | 51.7 | 60.2 | Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | 2020 | ResNeXt single scale DCN |
||||
| 67 | ISTR (ResNet101-FPN-3x, single-scale) | 48.1 | 28.7 | 50.4 | 61.5 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | |||||||
| 68 | aLRP Loss (ResNeXt-101-64x4d, single scale) | 47.8 | 68.4 | 51.1 | 30.2 | 50.8 | 59.1 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt; single scale | ||||
| 69 | MatrixNet Corners (ResNet-152, multi-scale) | 47.8 | 66.2 | 52.3 | 29.7 | 50.4 | 60.7 | Matrix Nets: A New Deep Architecture for Object Detection | 2019 | multiscale; ResNet | ||||
| 70 | SOLQ (ResNet50, single scale) | 47.8 | SOLQ: Segmenting Objects by Learning Queries | 2021 | Transformer; single scale | |||||||||
| 71 | SAPD (ResNeXt-101, single-scale) | 47.4 | 67.4 | 51.1 | 28.1 | 50.3 | 61.5 | Soft Anchor-Point Object Detection | 2019 | ResNeXt; single scale | ||||
| 72 | PANet (ResNeXt-101, multi-scale) | 47.4 | 67.2 | 51.8 | 30.1 | 51.7 | 60.0 | Path Aggregation Network for Instance Segmentation | 2018 | ResNeXt; multiscale | ||||
| 73 | HTC (HRNetV2p-W48) | 47.3 | 65.9 | 51.2 | 28.0 | 49.7 | 59.8 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||
| 74 | HTC (ResNeXt-101-FPN) | 47.1 | 63.9 | 44.7 | 22.8 | 43.9 | 54.6 | Hybrid Task Cascade for Instance Segmentation | 2019 | ResNeXt; FPN | ||||
| 75 | CenterNet511 (Hourglass-104, multi-scale) | 47.0 | 64.5 | 50.7 | 28.9 | 49.9 | 58.9 | CenterNet: Keypoint Triplets for Object Detection | 2019 | multiscale | ||||
| 76 | MAL (ResNeXt101, multi-scale) | 47.0 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt; multiscale | |||||||||
| 77 | ISTR (ResNet50-FPN-3x) | 46.8 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | FPN; ResNet | |||||||||
| 78 | RetinaNet (SpineNet-49, 896x896) | 46.7 | 66.3 | 50.6 | 29.1 | 50.1 | 61.7 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 79 | RPDet (ResNet-101-DCN, multi-scale) | 46.5 | 67.4 | 50.9 | 30.3 | 49.7 | 57.1 | RepPoints: Point Set Representation for Object Detection | 2019 | multiscale; DCN; ResNet | ||||
| 80 | HoughNet (MS) | 46.4 | 65.1 | 50.7 | 29.1 | 48.5 | 58.1 | HoughNet: Integrating near and long-range evidence for bottom-up object detection | 2020 | multiscale | ||||
| 81 | PPDet (ResNeXt-101-FPN, multiscale) | 46.3 | 64.8 | 51.6 | 31.4 | 49.9 | 56.4 | Reducing Label Noise in Anchor-Free Object Detection | 2020 | ResNeXt; multiscale; FPN | ||||
| 82 | GFLV2 (ResNet-101) | 46.2 | 64.3 | 50.5 | 27.8 | 49.9 | 57 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||||
| 83 | SNIPER (ResNet-101) | 46.1 | 67.0 | 51.6 | 29.6 | 48.9 | 58.1 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||||
| 84 | Mask R-CNN (HRNetV2p-W48 + cascade) | 46.1 | 64.0 | 50.3 | 27.1 | 48.6 | 58.3 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||
| 85 | DCNv2 (ResNet-101, multi-scale) | 46.0 | 67.9 | 50.8 | 27.8 | 49.1 | 59.5 | Deformable ConvNets v2: More Deformable, Better Results | 2018 | multiscale; DCN; ResNet | ||||
| 86 | Gaussian-FCOS | 46 | Localization Uncertainty Estimation for Anchor-Free Object Detection | 2020 | ||||||||||
| 87 | Cascade R-CNN-FPN (ResNet-101, map-guided) | 45.9 | 64.2 | 50 | 26.3 | 49 | 58.6 | InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | 2019 | FPN; ResNet | ||||
| 88 | MAL (ResNeXt101, single-scale) | 45.9 | Multiple Anchor Learning for Visual Object Detection | 2019 | ResNeXt; single scale | |||||||||
| 89 | CenterMask+VoVNetV2-99 (single-scale) | 45.8 | 64.5 | 27.8 | 48.3 | 57.6 | CenterMask: Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | |||||
| 90 | D-RFCN + SNIP (DPN-98 with flip, multi-scale) | 45.7 | 67.3 | 51.1 | 29.3 | 48.8 | 57.1 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale | ||||
| 91 | YOLOv4 (CD53) | 45.5 | 64.1 | 49.5 | 27 | 49 | 56.7 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | 2020 | single scale; YOLO | ||||
| 92 | PP-YOLO (608x608) | 45.2 | 65.2 | 49.9 | 26.3 | 47.8 | 57.2 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||||
| 93 | AC-FPN Cascade R-CNN (ResNet-101, single scale) | 45 | 64.4 | 49 | 26.9 | 47.7 | 56.6 | Attention-guided Context Feature Pyramid Network for Object Detection | 2019 | single scale; FPN; ResNet | ||||
| 94 | FreeAnchor (ResNeXt-101) | 44.8 | 64.3 | 48.4 | 27 | 47.9 | 56 | FreeAnchor: Learning to Match Anchors for Visual Object Detection | 2019 | ResNeXt | ||||
| 95 | FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) | 44.7 | 64.1 | 48.4 | 27.6 | 47.5 | 55.6 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt; FPN | ||||
| 96 | CenterMask+VoVNet2-57 (single-scale) | 44.7 | 63.1 | 48.6 | 27.1 | 55.9 | CenterMask: Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | |||||
| 97 | FSAF (ResNeXt-101, multi-scale) | 44.6 | 65.2 | 48.6 | 29.7 | 47.1 | 54.6 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | ResNeXt; multiscale | ||||
| 98 | aLRP Loss (ResNeXt-101, DCN, 500 scale) | 44.6 | 65.0 | 47.5 | 24.6 | 48.1 | 58.3 | A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | 2020 | ResNeXt; DCN | ||||
| 99 | CenterMask + X-101-32x8d (single-scale) | 44.6 | 63.4 | 48.4 | 47.2 | CenterMask: Real-Time Anchor-Free Instance Segmentation | 2019 | single scale | ||||||
| 100 | RetinaNet (SpineNet-49, 640x640) | 44.3 | 63.8 | 47.6 | 25.9 | 47.7 | 61.1 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 101 | YOLOF-DC5 | 44.3 | 62.9 | 47.5 | 24.0 | 48.5 | 60.4 | You Only Look One-level Feature | 2021 | YOLO | ||||
| 102 | GFLV2 (ResNet-50) | 44.3 | 62.3 | 48.5 | 26.8 | 47.7 | 54.1 | Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | 2020 | ResNet | ||||
| 103 | InterNet (ResNet-101-FPN, multi-scale) | 44.2 | 67.5 | 51.1 | 27.2 | 50.3 | 57.7 | Feature Intertwiner for Object Detection | 2019 | multiscale; FPN; ResNet | ||||
| 104 | M2Det (VGG-16, multi-scale) | 44.2 | 64.6 | 49.3 | 29.2 | 47.9 | 55.1 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale | ||||
| 105 | Faster R-CNN (LIP-ResNet-101-MD w FPN) | 43.9 | 65.7 | 48.1 | 25.4 | 46.7 | 56.3 | LIP: Local Importance-based Pooling | 2019 | FPN | ||||
| 106 | M2Det (ResNet-101, multi-scale) | 43.9 | 64.4 | 48 | 29.6 | 49.6 | 54.3 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | multiscale; ResNet | ||||
| 107 | YOLOv3 @800 + ASFF* (Darknet-53) | 43.9 | 64.1 | 49.2 | 27.0 | 46.6 | 53.4 | Learning Spatial Fusion for Single-Shot Object Detection | 2019 | YOLO | ||||
| 108 | FoveaBox (ResNeXt-101) | 43.9 | 63.5 | 47.7 | 26.8 | 46.9 | 55.6 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | ||||
| 109 | ExtremeNet (Hourglass-104, multi-scale) | 43.7 | 60.5 | 47.0 | 24.1 | 46.9 | 57.6 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | multiscale | ||||
| 110 | YOLOv4-608 | 43.5 | 65.7 | 47.3 | 26.7 | 46.7 | 53.3 | YOLOv4: Optimal Speed and Accuracy of Object Detection | 2020 | single scale; YOLO | ||||
| 111 | SNIPER (ResNet-50) | 43.5 | 65.0 | 48.6 | 26.1 | 46.3 | 56.0 | SNIPER: Efficient Multi-Scale Training | 2018 | ResNet | ||||
| 112 | CenterNet (HRNetV2-W48) | 43.5 | 46.5 | 22.2 | 57.8 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||||
| 113 | D-RFCN + SNIP (ResNet-101, multi-scale) | 43.4 | 65.5 | 48.4 | 27.2 | 46.5 | 54.9 | An Analysis of Scale Invariance in Object Detection - SNIP | 2017 | multiscale; ResNet | ||||
| 114 | Grid R-CNN (ResNeXt-101-FPN) | 43.2 | 63.0 | 46.6 | 25.1 | 46.5 | 55.2 | Grid R-CNN | 2018 | ResNeXt; FPN | ||||
| 115 | FCOS (ResNeXt-101-64x4d-FPN) | 43.2 | 62.8 | 46.6 | 26.5 | 46.2 | 53.3 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt; FPN | ||||
| 116 | CornerNet-Saccade (Hourglass-104, multi-scale) | 43.2 | 24.4 | 44.6 | 57.3 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | multiscale | ||||||
| 117 | Libra R-CNN (ResNeXt-101-FPN) | 43.0 | 64 | 47 | 25.3 | 45.6 | 54.6 | Libra R-CNN: Towards Balanced Learning for Object Detection | 2019 | ResNeXt; FPN | ||||
| 118 | RPDet (ResNet-101-DCN) | 42.8 | 65.0 | 46.3 | 24.9 | 46.2 | 54.7 | RepPoints: Point Set Representation for Object Detection | 2019 | DCN; ResNet | ||||
| 119 | SpineNet-49 (640, RetinaNet, single-scale) | 42.8 | 62.3 | 46.1 | 23.7 | 45.2 | 57.3 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | single scale | ||||
| 120 | Cascade R-CNN (ResNet-101-FPN+, cascade) | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet | ||||
| 121 | Cascade R-CNN | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 | Cascade R-CNN: High Quality Object Detection and Instance Segmentation | 2019 | |||||
| 122 | TridentNet (ResNet-101) | 42.7 | 63.6 | 46.5 | 23.9 | 46.6 | 56.6 | Scale-Aware Trident Networks for Object Detection | 2019 | ResNet | ||||
| 123 | FCOS (ResNeXt-32x8d-101-FPN) | 42.7 | 62.2 | 46.1 | 26.0 | 45.6 | 52.6 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | ResNeXt; FPN | ||||
| 124 | RetinaMask (ResNeXt-101-FPN-GN) | 42.6 | 62.5 | 46.0 | 24.8 | 45.6 | 53.8 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | ResNeXt; FPN | ||||
| 125 | TAL + TAP | 42.5 | 60.3 | 46.4 | TOOD: Task-aligned One-stage Object Detection | 2021 | ||||||||
| 126 | Faster R-CNN (HRNetV2p-W48) | 42.4 | 63.6 | 46.4 | 24.9 | 44.6 | 53.0 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||
| 127 | HSD (ResNet101, 768x768, single-scale test) | 42.3 | 61.2 | 46.9 | 22.8 | 47.3 | 55.9 | Hierarchical Shot Detector | 2019 | single scale | ||||
| 128 | CornerNet511 (Hourglass-104, multi-scale) | 42.1 | 57.8 | 45.3 | 20.8 | 44.8 | 56.7 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | multiscale | ||||
| 129 | FoveaBox (ResNeXt-101) | 42.1 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||||||||
| 130 | FCOS (HRNet-W32-5l) | 42.0 | 60.4 | 45.3 | 25.4 | 45.0 | 51.0 | FCOS: Fully Convolutional One-Stage Object Detection | 2019 | |||||
| 131 | RefineDet512+ (ResNet-101) | 41.8 | 62.9 | 45.7 | 25.6 | 45.1 | 54.1 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||||
| 132 | GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) | 41.6 | 62.8 | 44.2 | 22.3 | 45.1 | 55.3 | Gradient Harmonized Single-stage Detector | 2018 | FPN | ||||
| 133 | CenterNet-DLA (DLA-34, multi-scale) | 41.6 | 21.5 | 43.9 | 56.0 | Objects as Points | 2019 | multiscale | ||||||
| 134 | RetinaNet (SpineNet-49S, 640x640) | 41.5 | 60.5 | 44.6 | 23.3 | 45 | 58 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | 2019 | |||||
| 135 | RPDet (ResNet-101) | 41 | 62.9 | 44.3 | 23.6 | 44.1 | 51.7 | RepPoints: Point Set Representation for Object Detection | 2019 | ResNet | ||||
| 136 | M2Det (VGG-16, single-scale) | 41.0 | 59.7 | 45 | 22.1 | 46.5 | 53.8 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale | ||||
| 137 | FSAF (ResNet-101, single-scale) | 40.9 | 61.5 | 44 | 24 | 44.2 | 51.3 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | 2019 | single scale; ResNet | ||||
| 138 | RetinaNet (ResNeXt-101-FPN) | 40.8 | 61.1 | 44.1 | 24.1 | 44.2 | 51.2 | Focal Loss for Dense Object Detection | 2017 | ResNeXt; FPN | ||||
| 139 | Cascade R-CNN (ResNet-50-FPN+, cascade) | 40.6 | 59.9 | 44 | 22.6 | 42.7 | 52.1 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet | ||||
| 140 | Faster R-CNN (Cascade RPN) | 40.6 | 58.9 | 44.5 | 22.0 | 42.8 | 52.6 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||||
| 141 | ResNet-50-DW-DPN (Deformable Kernels) | 40.6 | 24.6 | 43.9 | 53.3 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | 2019 | ResNet | ||||||
| 142 | IoU-Net | 40.6 | Acquisition of Localization Confidence for Accurate Object Detection | 2018 | ||||||||||
| 143 | FCOS (HRNetV2p-W48) | 40.5 | 59.3 | 23.4 | 42.6 | 51.0 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||||
| 144 | ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS | 40.4 | Bounding Box Regression with Uncertainty for Accurate Object Detection | 2018 | FPN; ResNet | |||||||||
| 145 | RDSNet (ResNet-101, RetinaNet, mask, MBRM) | 40.3 | 60.1 | 43 | 22.1 | 43.5 | 51.5 | RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | 2019 | ResNet | ||||
| 146 | ExtremeNet (Hourglass-104, single-scale) | 40.2 | 55.5 | 43.2 | 20.4 | 43.2 | 53.1 | Bottom-up Object Detection by Grouping Extreme and Center Points | 2019 | single scale | ||||
| 147 | Mask R-CNN (ResNet-101-FPN, CBN) | 40.1 | 60.5 | 44.1 | 35.8 | 57.3 | 38.5 | Cross-Iteration Batch Normalization | 2020 | FPN; ResNet | ||||
| 148 | Fast R-CNN (Cascade RPN) | 40.1 | 59.4 | 43.8 | 22.1 | 42.4 | 51.6 | Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | 2019 | |||||
| 149 | Mask R-CNN (ResNeXt-101-FPN) | 39.8 | 62.3 | 43.4 | 22.1 | 43.2 | 51.2 | Mask R-CNN | 2017 | ResNeXt; FPN | ||||
| 150 | GA-Faster-RCNN | 39.8 | 59.2 | 43.5 | 21.8 | 42.6 | 50.7 | Region Proposal by Guided Anchoring | 2019 | |||||
| 151 | FPN (ResNet101 backbone) | 39.5 | ChainerCV: a Library for Deep Learning in Computer Vision | 2017 | FPN; ResNet | |||||||||
| 152 | RetinaMask (ResNet-50-FPN) | 39.4 | 58.6 | 42.3 | 21.9 | 42.0 | 51.0 | RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | 2019 | FPN; ResNet | ||||
| 153 | PP-YOLO (320x320) | 39.3 | 59.3 | 42.7 | 16.7 | 41.4 | 57.8 | PP-YOLO: An Effective and Efficient Implementation of Object Detector | 2020 | YOLO | ||||
| 154 | AA-ResNet-10 + RetinaNet | 39.2 | Attention Augmented Convolutional Networks | 2019 | ||||||||||
| 155 | MAL (ResNet50, single-scale) | 39.2 | Multiple Anchor Learning for Visual Object Detection | 2019 | single scale; ResNet | |||||||||
| 156 | RetinaNet (ResNet-101-FPN) | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2 | Focal Loss for Dense Object Detection | 2017 | FPN; ResNet | ||||
| 157 | Cascade R-CNN (ResNet-101-FPN+) | 38.8 | 61.1 | 41.9 | 21.3 | 41.8 | 49.8 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet | ||||
| 158 | M2Det (ResNet-101, single-scale) | 38.8 | 59.4 | 41.7 | 20.5 | 43.9 | 53.4 | M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018 | single scale; ResNet | ||||
| 159 | SaccadeNet (DLA-34-DCN) | 38.5 | 55.6 | 41.4 | 19.2 | 42.1 | 50.6 | SaccadeNet: A Fast and Accurate Object Detector | 2020 | DCN | ||||
| 160 | Mask R-CNN (ResNet-101-FPN) | 38.2 | 60.3 | 41.7 | 20.1 | 41.1 | 50.2 | Mask R-CNN | 2017 | FPN; ResNet | ||||
| 161 | WSMA-Seg | 38.1 | Segmentation is All You Need | 2019 | ||||||||||
| 162 | Faster R-CNN + FPN + CGD | 37.9 | Compact Global Descriptor for Neural Networks | 2019 | FPN | |||||||||
| 163 | CornerNet511 (Hourglass-52, single-scale) | 37.8 | 53.7 | 40.1 | 17.0 | 39.0 | 50.5 | CornerNet: Detecting Objects as Paired Keypoints | 2018 | single scale | ||||
| 164 | RefineDet512+ (VGG-16) | 37.6 | 58.7 | 40.8 | 22.7 | 40.3 | 48.3 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||||
| 165 | DeformConv-R-FCN (Aligned-Inception-ResNet) | 37.5 | 58.0 | 19.4 | 40.1 | 52.5 | Deformable Convolutional Networks | 2017 | ||||||
| 166 | Faster R-CNN (ImageNet+300M) | 37.4 | 58 | 40.1 | 17.5 | 41.1 | 51.2 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | 2017 | |||||
| 167 | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 36.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet | |||||||||
| 168 | Faster R-CNN + TDM | 36.8 | Beyond Skip Connections: Top-Down Modulation for Object Detection | 2016 | ||||||||||
| 169 | Cascade R-CNN (ResNet-50-FPN+) | 36.5 | 59 | 39.2 | 20.3 | 38.8 | 46.4 | Cascade R-CNN: Delving into High Quality Object Detection | 2017 | FPN; ResNet | ||||
| 170 | RefineDet512 (ResNet-101) | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4 | Single-Shot Refinement Neural Network for Object Detection | 2017 | ResNet | ||||
| 171 | Faster R-CNN + FPN | 36.2 | Feature Pyramid Networks for Object Detection | 2016 | FPN | |||||||||
| 172 | Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) | 35.9 | torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | 2020 | FPN; ResNet | |||||||||
| 173 | Faster R-CNN (box refinement, context, multi-scale testing) | 34.9 | Deep Residual Learning for Image Recognition | 2015 | multiscale | |||||||||
| 174 | Faster R-CNN | 34.7 | Speed/accuracy trade-offs for modern convolutional object detectors | 2016 | ||||||||||
| 175 | CornerNet-Squeeze | 34.4 | CornerNet-Lite: Efficient Keypoint Based Object Detection | 2019 | ||||||||||
| 176 | MultiPath Network | 33.2 | A MultiPath Network for Object Detection | 2016 | ||||||||||
| 177 | ION | 33.1 | 55.7 | 34.6 | 14.5 | 35.2 | 47.2 | Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | 2015 | |||||
| 178 | RefineDet512 (VGG-16) | 33 | 54.5 | 35.5 | 16.3 | 36.3 | 44.3 | Single-Shot Refinement Neural Network for Object Detection | 2017 | |||||
| 179 | YOLOv3 + Darknet-53 | 33.0 | YOLOv3: An Incremental Improvement | 2018 | YOLO | |||||||||
| 180 | SSD512 | 28.8 | 48.5 | 30.3 | SSD: Single Shot MultiBox Detector | 2015 | ||||||||
| 181 | MnasFPN (MobileNetV2) | 26.1 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||||||||
| 182 | ESPNetv2-512 | 26.0 | ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | 2018 | ||||||||||
| 183 | MnasFPN (MobileNetV3) | 25.5 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||||||||
| 184 | MnasFPN (MNASNet-B1) | 24.6 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||||||||
| 185 | MnasFPN x0.7 (MobileNetV2) | 23.8 | MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices | 2019 | FPN | |||||||||
| 186 | MobileNet-v1-SSD-300x300+CGD | 21.4 | Compact Global Descriptor for Neural Networks | 2019 | ||||||||||
| 187 | Fast-RCNN | 19.7 | Fast R-CNN | 2015 | ||||||||||
| 188 | MobileNet | 19.3 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | 2017 | ||||||||||
| 189 | DAT-S (RetinaNet) | 69.6 | 51.2 | 32.3 | 51.8 | 63.4 | 47.9 | Vision Transformer with Deformable Attention | 2022 | |||||
| 190 | CenterMask-VoVNet99 (multi-scale) | 68.3 | 53.2 | 32.4 | 60.0 | CenterMask: Real-Time Anchor-Free Instance Segmentation | 2019 | multiscale | ||||||
| 191 | Mask R-CNN (HRNetV2p-W32 + cascade) | 62.5 | 48.6 | 56.3 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | ||||||||
| 192 | FoveaBox (ResNeXt-101) | 61.9 | 45.2 | 46.8 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||||||
| 193 | VirTex Mask R-CNN (ResNet-50-FPN) | 61.7 | 44.8 | VirTex: Learning Visual Representations from Textual Annotations | 2020 | FPN; ResNet | ||||||||
| 194 | CenterMask + ResNet101 | 61.6 | 46.9 | CenterMask: Real-Time Anchor-Free Instance Segmentation | 2019 | ResNet | ||||||||
| 195 | PAFNet (ResNet50-vd) | 59.8 | 45.3 | 22.8 | 45.8 | 59.2 | PAFNet: An Efficient Anchor-Free Object Detector Guidance | 2021 | ResNet | |||||
| 196 | IoU-Net+EnergyRegression | 58.5 | 41.8 | Energy-Based Models for Deep Probabilistic Regression | 2019 | |||||||||
| 197 | Cascade R-CNN (HRNetV2p-W48) | 48.6 | 26.0 | 47.3 | 56.3 | Deep High-Resolution Representation Learning for Visual Recognition | 2019 | |||||||
| 198 | ISTR (ResNet50-FPN-3x, single-scale) | 27.8 | 48.7 | 59.9 | ISTR: End-to-End Instance Segmentation with Transformers | 2021 | ||||||||
| 199 | FoveaBox (ResNeXt-101) | 24.9 | FoveaBox: Beyond Anchor-based Object Detector | 2019 | ResNeXt | |||||||||
| 200 | EfficientDet-D7x (single-scale) | 57.9 | EfficientDet: Scalable and Efficient Object Detection | 2019 | single scale |