[paper reading][CVPR 2020] Spatio-Temporal Graph for Video Captioning with Knowledge Distillation


Contents
  • 2 Related Work
    • General Video Classification
  • 3
    • 3.2 Spatio-Temporal Graph

  • CVPR 2020
  • https://openaccess.thecvf.com/content_CVPR_2020/papers/Pan_Spatio-Temporal_Graph_for_Video_Captioning_With_Knowledge_Distillation_CVPR_2020_paper.pdf
  • A spatio-temporal graph model for video captioning that exploits object interactions in space and time
  • Two-branch architecture, trained with knowledge distillation

2 Related Work

General Video Classification

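  • 3D convolutions
  • Two-stream networks: an RGB stream plus an optical-flow stream
  • Methods that model a wider temporal range
  • SlowFast: two pathways operating at multiple time scales
  • Long-term feature bank: correlates long-term context with short-term features
  • All of these operate on raw pixels; in contrast, this paper reasons about the objects within scenes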

3

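  • Two branches (an object branch and a scene branch), combined via knowledge distillation
  • Scene features: 2D features from ResNet and 3D features from I3D
  • Object features: \(N_T\) objects, where every object feature \(o_t^j\) has the same dimension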
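To make the "two-branch, distill" idea concrete, here is a minimal sketch of a distillation term between the two branches. It assumes that at each decoding step both branches output logits over the same vocabulary, that the object branch acts as the (fixed) teacher and the scene branch as the student, and that the loss is a KL divergence between their word distributions. The direction, the per-step averaging, and all function names are assumptions for illustration, not the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(object_logits, scene_logits):
    """Hypothetical per-step loss: pull the scene branch (student) toward
    the object branch's word distribution (teacher, treated as a constant)."""
    target = softmax(object_logits)
    pred = softmax(scene_logits)
    return kl_divergence(target, pred)


def sequence_distillation_loss(object_logits_seq, scene_logits_seq):
    """Hypothetical helper: average the per-step loss over a caption."""
    losses = [distillation_loss(o, s) for o, s in zip(object_logits_seq, scene_logits_seq)]
    return sum(losses) / len(losses)
```

When the two branches already agree the term is zero, and it grows as the scene branch's word distribution drifts away from the object branch's. In a real model this term would be weighted and added to the usual cross-entropy captioning loss, with gradients stopped through the teacher.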

3.2 Spatio-Temporal Graph

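  • Decompose the graph into two components: a spatial graph and a temporal graph
  • Spatial graph: edges between objects in the same frame are weighted explicitly by their normalized Intersection over Union (IoU) value
  • Temporal graph: edges between objects in adjacent frames capture object transformations, weighted by semantic similarity (cosine similarity, \(\cos\))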
  • imagine: # - % = $ x @ structure
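The two edge types above can be sketched as follows. This assumes boxes are given as (x1, y1, x2, y2), object features are plain vectors, and "normalized" means a row-wise softmax over the raw IoU or cosine scores; the paper's exact normalization may differ, and the function names are hypothetical. The spatial adjacency is \(N_t \times N_t\) for one frame, and the temporal adjacency is \(N_t \times N_{t+1}\), linking frame \(t\) to frame \(t+1\).

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two feature vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def row_softmax(matrix):
    """Assumed normalization: softmax along each row, so every row sums to 1."""
    out = []
    for row in matrix:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def spatial_adjacency(boxes):
    """Edges among objects in one frame, weighted by normalized IoU.
    IoU of a box with itself is 1, so self-loops are included."""
    raw = [[iou(a, b) for b in boxes] for a in boxes]
    return row_softmax(raw)


def temporal_adjacency(feats_t, feats_next):
    """Directed edges from objects in frame t to objects in frame t+1,
    weighted by normalized cosine similarity of their features."""
    raw = [[cosine(u, v) for v in feats_next] for u in feats_t]
    return row_softmax(raw)
```

For example, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1x1 square with a union of 7, so their raw IoU is 1/7; a box that does not overlap at all gets a raw score of 0 and therefore the smallest weight in its row. The row-wise normalization keeps each object's incoming messages on a comparable scale regardless of how many objects a frame contains.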