MOT20 Evaluation Metrics Explained + The Ideas Behind the CLEAR MOT Metrics (in English)
- Tracking of multiple, partially occluded humans based on static body part detection.
CLEAR metrics
MOT20's evaluation is inspired by the paper Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics.
The paper poses two main requirements.
- The distance between an object and a hypothesis should not exceed a threshold \(T\)

The distance measure can be:
- IoU(Overlap)
- Euclidean distance(Centroid Tracking)
The threshold should be task-dependent and not pre-defined for general cases.
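A minimal sketch of the two distance measures and the threshold test, assuming boxes in `(x, y, w, h)` format (top-left corner plus width and height). The helper names `iou`, `iou_distance`, `centroid_distance` and `is_valid_match` are hypothetical. IoU is turned into a distance as `1 - IoU`, so that "smaller is closer" holds for both measures and the same threshold test applies.

```python
from typing import Callable, Tuple

# Hypothetical box format: (x, y, w, h) = top-left corner, width, height.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union (Jaccard index) of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def iou_distance(a: Box, b: Box) -> float:
    """Overlap as a distance: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(a, b)


def centroid_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the box centres (centroid tracking)."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return (dx * dx + dy * dy) ** 0.5


def is_valid_match(a: Box, b: Box, threshold: float,
                   dist: Callable[[Box, Box], float] = iou_distance) -> bool:
    """An object-hypothesis pair is a candidate only if its distance does not exceed T."""
    return dist(a, b) <= threshold


gt, hyp = (0, 0, 10, 10), (5, 0, 10, 10)
print(iou(gt, hyp))                                   # overlap 50 / union 150
print(is_valid_match(gt, hyp, 0.5))                   # 1 - 1/3 > 0.5 -> False
print(centroid_distance((0, 0, 2, 2), (3, 4, 2, 2)))  # 5.0
```

Which measure and which threshold to use stays a task-dependent choice; the `dist` parameter is only there to make that choice explicit.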
- Consistent Tracking
Construct, for every frame \(t\), a list of object-hypothesis mappings
\[ M_{t} = \{(o_{i}, h_{j})\} \]
Mismatch errors are counted only once, at the frames where a change in the object-hypothesis mapping is made, and the correspondences in the intermediate segments are considered correct. This gives a more intuitive measure of mismatch errors, especially in cases where many objects are being tracked and mismatches are frequent.
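The counting rule can be sketched as follows, assuming the per-frame correspondences have already been computed. Each frame is represented as a dict from object id to matched hypothesis id (a hypothetical format; `None` means unmatched). A mismatch is counted only when an object is matched to a hypothesis different from the one it was last matched to, so a long stretch with a stable mapping costs nothing.

```python
from typing import Dict, List, Optional


def count_mismatches(frames: List[Dict[int, Optional[int]]]) -> int:
    """Count mismatches (ID switches) over a sequence of per-frame mappings.

    frames[t] maps object id -> hypothesis id matched at frame t, or None.
    """
    last_hyp: Dict[int, int] = {}  # last known hypothesis for every object
    mismatches = 0
    for mapping in frames:
        for obj, hyp in mapping.items():
            if hyp is None:
                continue  # an unmatched frame keeps the last known mapping
            if obj in last_hyp and last_hyp[obj] != hyp:
                mismatches += 1  # the mapping changed: count it once, here
            last_hyp[obj] = hyp
    return mismatches


frames = [{1: 10}, {1: 10}, {1: None}, {1: 11}, {1: 11}, {1: 11}]
print(count_mismatches(frames))  # one change (10 -> 11), so 1
```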

This has since become the common practice in MOT.
Track re-initialization is needed after fragmentation.
If \(h_{1}, h_{2}\) (hypotheses with IDs 1 and 2) are both valid choices (with distance \(\le T\)) for object \(o_{i}\), then the pair already in \(M_{t-1}\), i.e. the hypothesis that was matched to \(o_{i}\) before, is kept.
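A per-frame matching step that honours this preference might look like the sketch below. It assumes objects and hypotheses are reduced to 2D centre points compared by Euclidean distance, and all names are hypothetical. Correspondences from the previous frame are kept first if they are still within \(T\); the remaining pairs are then matched greedily by ascending distance. That second step is a simplification: the CLEAR MOT paper solves the remaining assignment optimally with the Hungarian algorithm.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # hypothetical: object/hypothesis reduced to its centre


def match_frame(objects: Dict[int, Point], hyps: Dict[int, Point],
                prev: Dict[int, int], T: float) -> Dict[int, int]:
    """Return object id -> hypothesis id for one frame."""
    matches: Dict[int, int] = {}
    used = set()
    # Step 1: keep last frame's pairs that are still valid (distance <= T).
    for obj, hyp in prev.items():
        if obj in objects and hyp in hyps and math.dist(objects[obj], hyps[hyp]) <= T:
            matches[obj] = hyp
            used.add(hyp)
    # Step 2: match the rest greedily by ascending distance
    # (simplification; CLEAR MOT uses the Hungarian algorithm here).
    candidates = sorted(
        (math.dist(op, hp), obj, hyp)
        for obj, op in objects.items() if obj not in matches
        for hyp, hp in hyps.items() if hyp not in used
    )
    for d, obj, hyp in candidates:
        if d > T:
            break  # candidates are sorted, so no further pair is valid
        if obj not in matches and hyp not in used:
            matches[obj] = hyp
            used.add(hyp)
    return matches


objects = {7: (0.0, 0.0)}
hyps = {1: (3.0, 0.0), 2: (1.0, 0.0)}  # both within T = 5
print(match_frame(objects, hyps, prev={7: 1}, T=5.0))  # {7: 1}: previous pair kept
print(match_frame(objects, hyps, prev={}, T=5.0))      # {7: 2}: nearest wins
```

The two calls show the point of the rule: the closer hypothesis 2 only wins when there is no still-valid mapping to fall back on.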

Target/Annotations of MOT20
We define the target class of CVPR19 as all upright walking people that are reachable along the viewing ray without a physical obstacle, i.e. reflections and people behind a transparent wall or window are excluded.
Distractors: distractor, static person, reflection, person on vehicle.
That is, a method is neither penalized nor rewarded for tracking or not tracking those similar classes.
Distractor cancellation
Since a detector is likely to fire in those cases, we do not want to penalize a tracker with a set of false positives for properly following that set of detections, e.g. of a person on a bicycle.
Likewise, we do not want to penalize with false negatives a tracker that is based on motion cues and therefore does not track a sitting person.
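The cancellation can be sketched as a preprocessing step: predictions that match a distractor box are dropped before counting, and the distractor boxes themselves are removed from the ground truth, so such a prediction is neither an FP nor a TP. This is a hedged sketch assuming `(x, y, w, h)` boxes, an IoU threshold of 0.5 and greedy matching by descending IoU; the function name `drop_distractors` is hypothetical, and the official MOTChallenge evaluation code may differ in details (for instance by using an optimal rather than greedy assignment).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), hypothetical format


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def drop_distractors(gt: List[Tuple[Box, bool]], preds: List[Box],
                     thr: float = 0.5) -> Tuple[List[Box], List[Box]]:
    """gt holds (box, is_distractor) pairs. Returns (kept_gt, kept_preds)."""
    pairs = sorted(
        ((iou(g, p), gi, pi) for gi, (g, _) in enumerate(gt) for pi, p in enumerate(preds)),
        reverse=True,  # best overlaps first
    )
    used_gt, used_pred, dropped = set(), set(), set()
    for score, gi, pi in pairs:
        if score < thr:
            break
        if gi in used_gt or pi in used_pred:
            continue
        used_gt.add(gi)
        used_pred.add(pi)
        if gt[gi][1]:
            dropped.add(pi)  # follows a distractor: neither rewarded nor penalized
    kept_gt = [g for g, is_distractor in gt if not is_distractor]
    kept_preds = [p for pi, p in enumerate(preds) if pi not in dropped]
    return kept_gt, kept_preds


gt = [((0, 0, 10, 10), False), ((100, 100, 10, 10), True)]  # second box: e.g. a static person
preds = [(0, 0, 10, 10), (101, 100, 10, 10), (50, 50, 10, 10)]
print(drop_distractors(gt, preds))
# ([(0, 0, 10, 10)], [(0, 0, 10, 10), (50, 50, 10, 10)])
```

The unmatched prediction `(50, 50, 10, 10)` survives and will still count as an ordinary FP; only the one following the distractor disappears.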
Metrics in Object Detection
Confusion Matrix
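Naming convention: the first word (True/False) says whether the test result agrees with reality; the second word (Positive/Negative) is the test result itself.
- TP (true positive): predicted positive, actually positive
- FP (false positive): predicted positive, actually negative
- FN (false negative): predicted negative, actually positive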


Confusion Matrix of Object Detection
ref: Evaluating Object Detection Models: Guide to Performance Metrics
Detections are matched to targets using Intersection over Union (IoU), also referred to as the Jaccard Index.


- TP: target successfully detected
- FP: distractor wrongly detected (false detection), caused by
  - a wrong class, or
  - an inaccurate regressed box
- FN: target failed to be detected (missed detection)
Tracker-to-target assignment Prerequisites
Hereinafter, hypothesis means a proposed track.
Within a single frame, a track is simply a bounding box.
Bounding Box Matching
+ Prerequisite 1: divide the hypotheses (and the ground-truth boxes they fail to cover) into `TP`, `FP`, `FN`.
The matching threshold is IoU = 0.5 (refer to Section 4.1.2 of the paper).
This is the same as object detection in a single frame.
- TP,FP,FN
- FAF(False Alarm, i.e. FP, per Frame)
- FPPI(False Positives Per Image)
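Per-frame counting and FAF can be sketched as below, again assuming `(x, y, w, h)` boxes, a match requiring IoU of at least 0.5, and greedy one-to-one matching by descending IoU (a simplification of an optimal assignment). Each frame is a hypothetical `(gt_boxes, pred_boxes)` pair, and the function names are illustrative only. For video, FAF and FPPI coincide, since every image is a frame.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def frame_counts(gt: List[Box], preds: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one frame using greedy one-to-one IoU matching."""
    pairs = sorted(((iou(g, p), gi, pi) for gi, g in enumerate(gt)
                    for pi, p in enumerate(preds)), reverse=True)
    used_gt, used_pred = set(), set()
    for score, gi, pi in pairs:
        if score < thr:
            break  # remaining pairs overlap too little
        if gi not in used_gt and pi not in used_pred:
            used_gt.add(gi)
            used_pred.add(pi)
    tp = len(used_gt)
    return tp, len(preds) - tp, len(gt) - tp


def detection_summary(frames: List[Tuple[List[Box], List[Box]]]) -> Dict[str, float]:
    """Accumulate TP/FP/FN over all frames; FAF = FP / number of frames."""
    tp = fp = fn = 0
    for gt, preds in frames:
        t, f, n = frame_counts(gt, preds)
        tp, fp, fn = tp + t, fp + f, fn + n
    return {"TP": tp, "FP": fp, "FN": fn, "FAF": fp / len(frames)}


frames = [
    ([(0, 0, 10, 10)], [(0, 0, 10, 10), (50, 50, 5, 5)]),  # 1 TP, 1 FP
    ([(0, 0, 10, 10)], []),                                 # 1 FN
]
print(detection_summary(frames))  # {'TP': 1, 'FP': 1, 'FN': 1, 'FAF': 0.5}
```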
Tracklet Matching
+ Prerequisite 2: a true object should be recovered at most once, and one hypothesis cannot account for more than one target.
For the following, we assume that each ground truth trajectory has one unique start and one unique end point, i.e. that it is not fragmented.
In other words, when a target leaves the field of view and then reappears, it is treated as an unseen target with a new ID.
Tracklet matching is not performed greedily on single frames in isolation; the correspondences from previous frames are taken into account.

A method that finds twice as many trajectories will almost certainly produce more identity switches.
Therefore IDSW should not be considered alone to assess the overall performance, and IDSW/Recall is introduced.
- IDSW; IDSW/Recall (normalized)
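The normalization is simple arithmetic. The sketch below (hypothetical function name, with recall taken as TP / #GT) shows how two trackers with very different raw IDSW counts end up equal once their recall is accounted for.

```python
def idsw_per_recall(idsw: int, tp: int, num_gt: int) -> float:
    """IDSW normalized by recall, where recall = TP / #GT."""
    recall = tp / num_gt
    if recall == 0:
        raise ValueError("recall is zero; IDSW/Recall is undefined")
    return idsw / recall


# Tracker A recovers half of the GT boxes, tracker B recovers 90 %.
print(idsw_per_recall(idsw=100, tp=500, num_gt=1000))  # about 200
print(idsw_per_recall(idsw=180, tp=900, num_gt=1000))  # about 200 as well
```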
Tracker-to-target assignment in detail

Tracking of Multiple, Partially Occluded Humans based on Static Body Part Detection
In this part, let the red track have ID 1 and the blue track have ID 2; the target is called T.
ID Switch without fragmentation

- At frame 2, T is matched to red track 1 for the first time, so it should keep 1 as its ID throughout its appearance.
- At frame 5, T is matched to blue track 2; an ID switch occurs.
ID Switch with fragmentation

- At frame 3, fragmentation occurs due to an FP (false detection).
- At frame 5, T is matched to the blue track, changing its ID from 1 to 2; an ID switch occurs.
Error due to Propagation

- At frame 1, the matching is reasonably good.
- Along this part of the sequence, the upper track keeps the same ID 1 until frame 5.
- The lower track keeps the same ID 2 until frame 2.
- There are 5 FNs and 4 FPs (blue hypotheses), because the upper track skips the closer blue hypothesis and grabs the red one, leaving the lower track with no hypothesis to match.
There is NO fragmentation or ID switch in this part of the sequence.
- Note: a fragmentation or an ID switch is counted when the new track appears.

Note that no fragmentations are counted in frames 3 and 6 because tracking of those targets is not resumed at a later point (in this part of sequence).
Interrupted GT trajectory

Metrics
Prerequisite: concatenate all sequences, i.e. accumulate the counts over the whole benchmark as if it were one long sequence.

Drawback of MOTA

MOTA is normalized by (and therefore influenced by) \(\#\text{GT}\), the total number of ground-truth boxes.
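For reference, the usual definition (as in the MOTChallenge papers) sums the errors over all frames \(t\) and divides by the total number of ground-truth boxes:
\[ \text{MOTA} = 1 - \frac{\sum_{t} (\text{FN}_{t} + \text{FP}_{t} + \text{IDSW}_{t})}{\sum_{t} \text{GT}_{t}} \]
A minimal sketch with hypothetical argument names follows. The second call shows that MOTA is unbounded below, because the error count in the numerator is not capped by \(\#\text{GT}\).

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / #GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("#GT must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


print(mota(fn=30, fp=50, idsw=5, num_gt=100))   # about 0.15
print(mota(fn=30, fp=150, idsw=5, num_gt=100))  # about -0.85: many FPs push MOTA below 0
```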
metrics
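- MOTA: overall tracking accuracy, combining FN, FP and IDSW
- MOTP: localization precision of the matched boxes (reflects the detector's box regression)
- MT, PT, ML: Mostly Tracked, Partially Tracked, Mostly Lost
- FM: number of fragmentations
- FM/Recall: FM normalized by recall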
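The remaining metrics can be sketched from per-target bookkeeping. Assumptions, all hypothetical: MOTP is computed as the mean IoU over all TP matches (the overlap-based form, where higher is better); each GT trajectory is a list of booleans, one per frame of its lifespan, saying whether it was matched; MT/PT/ML use the 80 % / 20 % thresholds from the MOTChallenge papers (boundary handling may differ between implementations); and a fragmentation is counted each time tracking of a target resumes after an interruption, which agrees with the note above that nothing is counted if tracking is never resumed.

```python
from typing import Dict, List, Tuple


def motp(matched_ious: List[float]) -> float:
    """Mean IoU over all TP matches (overlap form: higher is better)."""
    return sum(matched_ious) / len(matched_ious) if matched_ious else 0.0


def mt_pt_ml(tracked: Dict[int, List[bool]]) -> Tuple[int, int, int]:
    """Classify GT trajectories by the fraction of their lifespan that is tracked."""
    mt = pt = ml = 0
    for flags in tracked.values():
        ratio = sum(flags) / len(flags)
        if ratio >= 0.8:
            mt += 1  # mostly tracked
        elif ratio < 0.2:
            ml += 1  # mostly lost
        else:
            pt += 1  # partially tracked
    return mt, pt, ml


def fragmentations(flags: List[bool]) -> int:
    """Count how often tracking resumes after an interruption."""
    frags, was_tracked, prev = 0, False, False
    for f in flags:
        if f and not prev and was_tracked:
            frags += 1  # counted when the track comes back, not when it breaks
        was_tracked = was_tracked or f
        prev = f
    return frags


lifespans = {
    1: [True] * 9 + [False],               # 90 % tracked -> MT
    2: [True, False, False, False, True],  # 40 % tracked -> PT, one resume
    3: [False] * 10,                       # never tracked -> ML
}
print(mt_pt_ml(lifespans))                                 # (1, 1, 1)
print(sum(fragmentations(f) for f in lifespans.values()))  # 1
print(motp([0.75, 0.5]))                                   # 0.625
```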
 
Average Rank
- For each tracker, calculate its rank under each metric.
- If there are \(N\) metrics, the rank vector has dimensionality of \(N\)
 
- Average the rank vector.
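A sketch of the averaging, assuming each tracker's scores are given as a dict per metric and that the direction of every metric (higher or lower is better) is known. Ties are not handled specially here (tied trackers simply get consecutive ranks in input order), which a real leaderboard may treat differently. The function name `average_rank` is hypothetical.

```python
from typing import Dict


def average_rank(scores: Dict[str, Dict[str, float]],
                 higher_is_better: Dict[str, bool]) -> Dict[str, float]:
    """scores[tracker][metric] -> value. Returns tracker -> mean rank (1 = best)."""
    ranks: Dict[str, list] = {t: [] for t in scores}
    for metric, hib in higher_is_better.items():
        ordered = sorted(scores, key=lambda t: scores[t][metric], reverse=hib)
        for rank, tracker in enumerate(ordered, start=1):
            ranks[tracker].append(rank)  # one entry per metric -> N-dim rank vector
    return {t: sum(r) / len(r) for t, r in ranks.items()}


scores = {
    "A": {"MOTA": 60.0, "IDSW": 100},
    "B": {"MOTA": 55.0, "IDSW": 50},
    "C": {"MOTA": 50.0, "IDSW": 200},
}
print(average_rank(scores, {"MOTA": True, "IDSW": False}))
# {'A': 1.5, 'B': 1.5, 'C': 3.0}
```

Here A wins on MOTA and B wins on IDSW, so they tie on average rank even though neither dominates the other.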
ref
- MOT20: A benchmark for multi object tracking in crowded scenes
- code: https://github.com/JonathonLuiten/TrackEval