FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

CVPR 2021 (Oral)

The City College of New York, InterDigital

Abstract

Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, and AR/VR. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds, which have sparked new research in 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity in typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction/feature extraction, an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is accordingly proposed to alleviate the unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, leading to benefits with motions scaled in a larger range. Extensive analysis and experiments verify the motivation and significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art benchmarks of scene flow estimation.

Pipeline

We adaptively shift the attended regions when seeking abstraction from one point cloud spatially, and when fusing information across two point clouds temporally.

We propose the SA^2 layer for stable point cloud abstraction. It shifts the FPS (farthest point sampling) down-sampled points to invariant positions for defining the attended regions, regardless of how the point clouds were sampled from the scene manifold. The effectiveness of the SA^2 layer is verified both theoretically and empirically.
We propose the TA^2 layer to estimate both small- and large-scale motions. It emphasizes the regions where good matches between the point clouds are more likely to be found, regardless of the scale of the motion.
Our proposed FESTA architecture achieves state-of-the-art performance for 3D point cloud scene flow estimation on both synthetic and real-world benchmarks, significantly outperforming prior state-of-the-art scene flow estimation methods.
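
To make the instability of FPS down-sampled points concrete, the following is a minimal sketch of plain greedy farthest point sampling, applied to two random samplings of the same flat surface. It is an illustration only and not the authors' implementation; the function names `farthest_point_sampling` and `sample_plane`, the `start` parameter, and the toy planar scene are all assumptions introduced here.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen by greedy farthest point sampling.

    points: list of (x, y, z) tuples; k: number of samples; start: index of
    the first sample (a hypothetical parameter; the choice is arbitrary).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest already-chosen point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point farthest from the current sample set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        # Update nearest-chosen distances with the newly added sample.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


def sample_plane(n, seed):
    """Draw n random points from the unit square on the z = 0 plane."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), 0.0) for _ in range(n)]


# Two different samplings of the same underlying surface (toy assumption).
cloud_a = sample_plane(200, seed=0)
cloud_b = sample_plane(200, seed=1)
centers_a = [cloud_a[i] for i in farthest_point_sampling(cloud_a, 8)]
centers_b = [cloud_b[i] for i in farthest_point_sampling(cloud_b, 8)]
```

Because FPS can only return points that exist in the input, `centers_a` and `centers_b` will generally sit at different positions even though both clouds describe the same plane. This dependence on the sampling pattern is the kind of instability that the SA^2 layer is described as addressing by shifting the sampled points.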

Citation

@InProceedings{Wang_2021_CVPR,
    author    = {Wang, Haiyan and Pang, Jiahao and Lodhi, Muhammad A. and Tian, Yingli and Tian, Dong},
    title     = {FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {14173-14182}
}

Acknowledgements

The website template was borrowed from Michaël Gharbi.