Construction of an Emotional Image of a Person Based on the Analysis of Key Points in Consecutive Frames of a Video Sequence
Author(s):
Dmitry Dmitriyevich Averianov
Researcher, Center for Research and Development, Robert Bosch LLC
PhD student, Department of Applied Cybernetics, Faculty of Mathematics and Mechanics,
St. Petersburg State University (SPBU)
dmitryaverianov@gmail.com
Mikhail Valerievich Zheludev
Ph.D., Senior Researcher, Center for Research and Development, Robert Bosch LLC
St. Petersburg, Marshala Govorova St., 49
mikhail.zheludev@ru.bosch.com
Vladimir Ilyich Kiyaev
Ph.D., Associate Professor, Department of Astronomy, Faculty of Mathematics and Mechanics,
St. Petersburg State University (SPBU)
kiyaev@mail.ru
Abstract:
This work is devoted to the development of an algorithm for classifying human behavior, specifically for detecting whether statements presented in a video file are truthful or deceptive. The video file is analyzed within a time window in which both changes in the micro-movements of the facial muscles and speech features are examined. Facial expressions are represented mathematically as a vector containing the digital information needed to describe the state of the face, which is characterized by the positions of facial landmarks (key points of the nose, eyebrows, eyes, eyelids, etc.). This mimic vector is produced by trained non-linear models. The vector characterizing speech is formed from heuristic characteristics of the audio signal. Temporal aggregation of the two vectors for the final behavior classification is performed by a separate neural network. The paper presents accuracy and speed results for the algorithm, which show that the new approach is competitive with existing methods.
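The fusion scheme described in the abstract can be sketched roughly as follows. All feature names, dimensions, and the specific audio heuristics below are illustrative assumptions, not the authors' implementation: per-frame facial landmark coordinates are flattened into a mimic vector, simple heuristic audio features are computed for the same time window, and the two are concatenated into one feature vector per window for a downstream temporal classifier.

```python
import numpy as np

def mimic_vector(landmarks):
    """Flatten (num_points, 2) facial landmark coordinates into a 1-D vector."""
    return np.asarray(landmarks, dtype=np.float32).ravel()

def speech_vector(signal, sr):
    """Toy heuristic features for one audio window (illustrative stand-ins for
    prosodic measures such as energy, jitter, or pitch): signal energy,
    zero-crossing rate, and a naive autocorrelation-based pitch period."""
    signal = np.asarray(signal, dtype=np.float32)
    energy = float(np.mean(signal ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal))) > 0))
    # Autocorrelation for positive lags; the lag of the peak approximates
    # the dominant period of the window (crude pitch-period proxy).
    ac = np.correlate(signal, signal, mode="full")[len(signal):]
    period = (int(np.argmax(ac)) + 1) / sr if ac.size else 0.0
    return np.array([energy, zcr, period], dtype=np.float32)

def window_features(frames_landmarks, audio_window, sr):
    """One fused feature vector per time window: the mean mimic vector over
    the window's frames concatenated with the window's speech vector."""
    mimic = np.mean([mimic_vector(lm) for lm in frames_landmarks], axis=0)
    return np.concatenate([mimic, speech_vector(audio_window, sr)])
```

A sequence of such per-window vectors would then be fed to a separate temporal model (e.g. a recurrent or transformer-based classifier) for the final truthful/deceptive decision.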
Keywords
- audio analysis
- facial landmarks
- lie detector
- machine and deep learning
- speech signal
- transformers
- video analytics
- video classification
References:
- Goupil L. et al. Listeners’ perceptions of the certainty and honesty of a speaker are associated with a common prosodic signature // Nature Communications. 2021. Vol. 12, № 1. P. 861
- Teixeira J. P., Oliveira C., Lopes C. Vocal Acoustic Analysis - Jitter, Shimmer and HNR Parameters // Procedia Technology. 2013. Vol. 9. P. 1112-1122
- Burzo M. et al. Multimodal deception detection // The Handbook of Multimodal-Multisensor Interfaces, Volume 2. 2018
- Chow A., Louie J. Detecting lies via speech patterns. 2017
- Zhang X., Sugano Y., Fritz M., Bulling A. It's Written All over Your Face: Full-Face Appearance-Based Gaze Estimation // IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2017. P. 2299
- Kathi M. G., Shaik J. H. Estimating the smile by evaluating the spread of lips // Revue d'Intelligence Artificielle. 2021. Vol. 35, № 2. P. 153-158
- Zhang X., Sugano Y., Fritz M., Bulling A. Appearance-based gaze estimation in the wild // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. P. 4511
- Bazarevsky V. et al. BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs // CoRR. 2019. abs/1907.05047
- He K. et al. Deep Residual Learning for Image Recognition // CVPR 2016. 2016
- Bertasius G. et al. Is Space-Time Attention All You Need for Video Understanding? // ICML 2021. 2021
- Vaswani A. et al. Attention Is All You Need // 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. 2017
- Gong Y. et al. AST: Audio Spectrogram Transformer // Interspeech 2021. 2021
- Burkhardt F. et al. A Database of German Emotional Speech // Interspeech. 2005. P. 1517-1520
- Zhu Y. et al. TinaFace: Strong but Simple Baseline for Face Detection // arXiv preprint arXiv:2011.13183. 2020
- Tran D. et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition // CVPR 2018. 2018
- Olah C. Understanding LSTM Networks // colah.github.io. 2015
- Alammar J. Visualizing A Neural Machine Translation Model (Mechanics of Seq2Seq Models With Attention)