[paper reading][CVPR 2020] Temporal Pyramid Network for Action Recognition
Contents
- 1 Introduction
- 2 Related Work
- Video Action Recognition
- Visual Tempo Modeling in Action Recognition
- 3 Temporal Pyramid Network
- 3.2
- 3.3
- CVPR 2020
- https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Temporal_Pyramid_Network_for_Action_Recognition_CVPR_2020_paper.pdf
- visual tempo, temporal scales
- previous: sample raw videos at multiple rates, frame pyramid, multi-branch
- feature hierarchy
- improvements, plug-and-play, especially for actions with large variances in visual tempo
1 Introduction
- Visual tempo describes how fast an action goes; it largely determines the effective temporal scale (duration) needed for recognition
- inter-class difference in tempo, e.g. hand clapping vs. walking
- intra-class difference in tempo, e.g. a somersault performed at different speeds
- previous approaches: frame pyramid, multi-branch networks, multiple feature outputs combined
- the temporal receptive field grows with depth, so a single model can capture both fast and slow tempos
- feature-level aggregation
- ablation: most of the improvement comes from action classes with significant tempo variances
2 Related Work
Video Action Recognition
- 2D spatial then 1D temporal paradigm
- per-frame and optical flow (two stream)
- variants
- 2D CNNs: no temporal modeling at early stages
- 3D CNNs: non-local blocks, inflated 2D kernels, decomposed 3D convolutions...
Visual Tempo Modeling in Action Recognition
3 Temporal Pyramid Network
- TPN, single network, plug-and-play, feature level
- how to collect hierarchical features?
- "single depth": one stage sampled at multiple frame rates gives multiple tensors, but each has the same size \(C \times T \times W \times H\) (the same spatial granularity)
- "multi depth": features of multiple sizes with richer semantics; fusion needs careful treatment to ensure correct information flow
- modulation: align features (e.g. via strided convolutions) to match both shape and receptive field
- auxiliary classification heads for stronger supervision, \(\mathcal L_{total} = \mathcal L_{CE,o} + \sum_i \lambda_i \mathcal L_{CE,i}\)
- then temporal modulation: a factor \(\alpha\) flexibly controls the temporal downsampling rate of each level
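A minimal pure-Python sketch of these two pieces: combining the original loss with weighted auxiliary losses, and downsampling a frame sequence by a tempo factor \(\alpha\). The function names and the concrete numbers are illustrative assumptions, not the paper's settings.

```python
# Toy sketch (illustrative, not the paper's implementation).

def total_loss(loss_original, aux_losses, lambdas):
    """L_total = L_CE,o + sum_i lambda_i * L_CE,i"""
    assert len(aux_losses) == len(lambdas)
    return loss_original + sum(w * l for l, w in zip(aux_losses, lambdas))

def temporal_modulation(frames, alpha):
    """Downsample a frame sequence by factor alpha (strided sampling)."""
    return frames[::alpha]

loss = total_loss(1.0, aux_losses=[1.0, 0.5], lambdas=[0.5, 0.5])  # -> 1.75
fast = temporal_modulation(list(range(32)), alpha=1)  # all 32 frames kept
slow = temporal_modulation(list(range(32)), alpha=4)  # every 4th frame
```

In the real network the downsampling is done on feature tensors (e.g. with strided temporal pooling), but the effect on the time axis is the same strided selection.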
3.2
- how to aggregate? options: keep levels in isolation, bottom-up flow, top-down flow
- element-wise addition; a rate factor \(\delta\) re-samples features so shapes are compatible for the addition
- information flow:
- "Cascade": bottom-up applied after a top-down
- "Parallel": both flows run simultaneously
3.3
- ResNet-based 3D backbone
- hierarchical features from res2, res3, res4, res5, progressively downsampled
- stride convolutions, max-pooling, etc.; the whole network can be trained end-to-end
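For intuition on the hierarchy, a small sketch of the spatial resolutions those stages produce, assuming the standard ResNet stage strides (res2..res5 downsample the input by 4, 8, 16, 32); the helper name is mine, not from the paper.

```python
# Standard ResNet cumulative strides per stage (assumption: vanilla ResNet).
STAGE_STRIDES = {"res2": 4, "res3": 8, "res4": 16, "res5": 32}

def stage_sizes(input_size):
    """Spatial side length of each stage's output feature map."""
    return {name: input_size // s for name, s in STAGE_STRIDES.items()}

sizes = stage_sizes(224)
# res2 -> 56, res3 -> 28, res4 -> 14, res5 -> 7
```

These progressively smaller maps (with progressively larger receptive fields) are exactly the kind of feature hierarchy TPN aggregates.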