Recognizing Micro-Expression in Video Clip with Adaptive Key-Frame Mining

M Peng, Chongyang Wang, Y Gao, Tao Bi, T Chen, Y Shi, X-D Zhou
Journal article


As a spontaneous expression of emotion on the face, micro-expression is
receiving increasing attention from the affective computing community. While
better recognition accuracy is achieved by various deep learning (DL)
techniques, one characteristic of micro-expression has not been fully
exploited: such facial movement is transient and sparsely localized through
time. Therefore, the representation learned from a full video clip is usually
redundant. On the
other hand, methods utilizing the single apex frame require manual annotations
and sacrifice the temporal dynamics. To simultaneously localize and recognize
such fleeting facial movements, we propose a novel end-to-end deep learning
architecture, referred to as Adaptive Key-frame Mining Network (AKMNet).
Operating on the raw video clip of micro-expression, AKMNet is able to learn
discriminative spatio-temporal representation by combining spatial features of
self-learned local key frames and their global-temporal dynamics. Empirical and
theoretical evaluations show advantages of the proposed approach with improved
recognition accuracy.
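
The core idea of weighting frames by learned importance, rather than using a full clip or a single hand-annotated apex frame, can be illustrated with a minimal sketch. This is not AKMNet's actual architecture; it is a hypothetical soft key-frame pooling step, with the scoring vector `w` standing in for a learned frame scorer:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_keyframe_pooling(frame_feats, w):
    """Score each frame, turn the scores into attention weights over
    time, and pool the frames into one clip-level representation.

    frame_feats: (T, D) per-frame feature vectors
    w:           (D,)   hypothetical learned scoring vector
    """
    scores = frame_feats @ w            # (T,) per-frame importance
    weights = softmax(scores)           # attention over the T frames
    clip_feat = weights @ frame_feats   # (D,) weighted temporal pooling
    return clip_feat, weights

# Toy example: 16 frames with 8-D features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 8))
w = rng.standard_normal(8)
clip_feat, attn = soft_keyframe_pooling(frames, w)
```

Frames near the transient facial movement would receive higher scores and thus dominate the pooled representation, while redundant frames are down-weighted, which is the intuition behind mining key frames adaptively instead of annotating an apex frame by hand.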