ISSN 1817-2172, registration El. No. FS77-39410, VAK

Differential Equations and Control Processes
(Differencialnie Uravnenia i Protsesy Upravlenia)

Construction of an Emotional Image of a Person Based on the Analysis of Key Points in Consecutive Frames of a Video Sequence

Author(s):

Dmitry Dmitriyevich Averianov

Researcher, Center for Research and Development, LLC "Robert Bosch"
PhD Student, Department of Applied Cybernetics, Faculty of Mathematics and Mechanics,
St. Petersburg State University (SPBU)

dmitryaverianov@gmail.com

Mikhail Valerievich Zheludev

Ph.D., Senior Researcher, Center for Research and Development, LLC "Robert Bosch"
Saint Petersburg, Marshal Govorov St., 49

mikhail.zheludev@ru.bosch.com

Vladimir Ilyich Kiyaev

Ph.D., Associate Professor, Department of Astronomy, Faculty of Mathematics and Mechanics,
St. Petersburg State University (SPBU)

kiyaev@mail.ru

Abstract:

The work is devoted to the development of an algorithm for classifying human behavior, specifically for detecting whether statements presented in a video file are truthful or deceptive. The video file is analyzed within a time window, in which both changes in the micro-movements of the facial muscles and speech features are examined. Facial expressions are encoded as a vector carrying the essential digital information about the state of the face, characterized by the positions of key points (nose, eyebrows, eyes, eyelids, etc.). This mimic vector is produced by trained non-linear models. The vector characterizing speech is formed from heuristic characteristics of the audio signal. The temporal aggregation of the vectors for the final behavior classification is performed by a separate neural network. The paper presents results on the accuracy and speed of the algorithm, which show that the new approach is competitive with existing methods.
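The trained models behind the mimic and speech vectors are not given on this page, but the two vector types the abstract describes can be illustrated with a minimal, hypothetical sketch: a facial vector as translation- and scale-normalized key-point coordinates, and a speech vector from simple heuristic per-frame audio features (here RMS energy and zero-crossing rate, stand-ins for the paper's actual features). All function names and parameters below are assumptions for illustration, not the authors' implementation.

```python
import math

def mimic_vector(keypoints):
    """Flatten facial key points (nose, eye corners, eyelids, ...) into a
    vector normalized for face position and size in the frame.
    keypoints: list of (x, y) pixel coordinates."""
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    # Root-mean spread of the points serves as a scale factor.
    scale = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                          for x, y in keypoints)) or 1.0
    vec = []
    for x, y in keypoints:
        vec.extend(((x - cx) / scale, (y - cy) / scale))
    return vec

def speech_vector(samples, frame=400):
    """Heuristic per-frame audio features: RMS energy and zero-crossing rate.
    samples: raw audio samples; frame: window length in samples."""
    feats = []
    for i in range(0, len(samples) - frame + 1, frame):
        w = samples[i:i + frame]
        rms = math.sqrt(sum(s * s for s in w) / frame)
        zcr = sum(1 for a, b in zip(w, w[1:]) if a * b < 0) / (frame - 1)
        feats.append((rms, zcr))
    return feats
```

Per time window, such facial and speech vectors would then be concatenated and passed to the separate aggregation network described in the abstract.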

Keywords:

References:

  1. Goupil L. et al. Listeners' perceptions of the certainty and honesty of a speaker are associated with a common prosodic signature // Nature Communications. 2021. Vol. 12, No. 1. P. 861
  2. Teixeira J. P., Oliveira C., Lopes C. Vocal Acoustic Analysis - Jitter, Shimmer and HNR Parameters // Procedia Technology. 2013. Vol. 9. P. 1112-1122
  3. Burzo M. et al. Multimodal deception detection // The Handbook of Multimodal-Multisensor Interfaces, Volume 2. 2018
  4. Chow A., Louie J. Detecting lies via speech patterns. 2017
  5. Zhang X., Sugano Y., Fritz M., Bulling A. It's Written All over Your Face: Full-Face Appearance-Based Gaze Estimation // IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2017. P. 2299
  6. Kathi M. G., Shaik J. H. Estimating the smile by evaluating the spread of lips // Revue d'Intelligence Artificielle. 2021. Vol. 35, No. 2. P. 153-158
  7. Zhang X., Sugano Y., Fritz M., Bulling A. Appearance-based gaze estimation in the wild // Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2015. P. 4511
  8. Bazarevsky V. et al. BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs // CoRR. abs/1907.05047. 2019
  9. He K. et al. Deep Residual Learning for Image Recognition // CVPR 2016. 2016
  10. Bertasius G. et al. Is Space-Time Attention All You Need for Video Understanding? // ICML 2021. 2021
  11. Vaswani A. et al. Attention Is All You Need // NIPS 2017. 2017
  12. Gong Y. et al. AST: Audio Spectrogram Transformer // Interspeech 2021. 2021
  13. Burkhardt F. et al. A Database of German Emotional Speech // Interspeech. 2005. P. 1517-1520
  14. Zhu Y. et al. TinaFace: Strong but Simple Baseline for Face Detection // arXiv preprint arXiv:2011.13183. 2020
  15. Tran D. et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition // CVPR 2018. 2018
  16. Olah C. Understanding LSTM Networks // colah.github.io. 2015
  17. Alammar J. Visualizing A Neural Machine Translation Model (Mechanics of Seq2Seq Models With Attention)
  18. Vaswani A. et al. Attention Is All You Need // 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. 2017

Full text (pdf)