We present an approach for real-time estimation of 3D hand shape and pose from a single RGB image.
To achieve real-time performance, we use an efficient Convolutional Neural Network (CNN), MobileNetV3-Small, to extract key features from an input image. The extracted features are then sent to an iterative 3D regression module that infers the camera parameters, hand shape, and joint angles used to project and articulate a 3D hand model. By combining the deep neural network with the differentiable hand model, we can train the network end-to-end with supervision from 2D and 3D annotations. Experiments on two publicly available datasets demonstrate that our approach matches the accuracy of most existing methods while running at over 110 Hz on a GPU or 75 Hz on a CPU.
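The iterative regression and projection steps described above can be illustrated with a minimal sketch. It is not the MobileHand implementation: the function names, the toy regressor, and the three-value camera vector (scale, x-translation, y-translation) are assumptions made for illustration, and the weak-perspective camera is one common choice that the abstract does not confirm. The sketch only shows the general pattern of refining an estimate by adding predicted corrections and then projecting 3D points to the image plane.

```python
from typing import Callable, List, Sequence, Tuple


def iterative_regression(
    features: Sequence[float],
    init_params: Sequence[float],
    regressor: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
    num_iters: int = 3,
) -> List[float]:
    """Refine a parameter vector by repeatedly adding a predicted correction."""
    params = list(init_params)
    for _ in range(num_iters):
        # The regressor sees the image features and the current estimate,
        # and returns a residual update rather than the parameters themselves.
        delta = regressor(features, params)
        params = [p + d for p, d in zip(params, delta)]
    return params


def project_weak_perspective(
    joints_3d: Sequence[Tuple[float, float, float]],
    scale: float,
    tx: float,
    ty: float,
) -> List[Tuple[float, float]]:
    """Project 3D points to 2D using a scale and an image-plane translation.

    Assumed camera model: depth is dropped instead of dividing by it.
    """
    return [(scale * x + tx, scale * y + ty) for x, y, _z in joints_3d]


def toy_regressor(
    features: Sequence[float], params: Sequence[float]
) -> List[float]:
    """Hypothetical stand-in for the learned regressor.

    Moves the estimate halfway towards a fixed target and ignores the features.
    """
    target = [2.0, 0.5, -0.5]
    return [0.5 * (t - p) for t, p in zip(target, params)]


# Start from a neutral camera (scale 1, no translation) and refine it.
camera = iterative_regression([0.0], [1.0, 0.0, 0.0], toy_regressor, num_iters=3)

# Project two made-up 3D joint positions with the refined camera.
points_2d = project_weak_perspective([(0.0, 0.0, 0.5), (0.1, 0.2, 0.4)], *camera)
```

In a real system the regressor would be a learned network, and the parameter vector would also contain the shape and joint-angle values that drive the hand model. Because every step here is a simple differentiable operation, the same structure lets a 2D projection loss propagate gradients back to the feature extractor.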
@inproceedings{MobileHand:2020,
title = {MobileHand: Real-time 3D Hand Shape and Pose Estimation from Color Image},
author = {Lim, Guan Ming and Jatesiktat, Prayook and Ang, Wei Tech},
booktitle = {27th International Conference on Neural Information Processing (ICONIP)},
year = {2020}
}