Using Bellman Optimality Principle for the Generative Autoencoder Architecture for the Problems of the Attribute Data Typesetting and Semantic Description in Data Management
Author(s):
Sergey Kuznetsov
Unidata LLC
Saint-Petersburg State University
sergey.kouznetsov@gmail.com
Abstract:
This paper addresses two problems that arise in managing structured data and master data (Master Data Management): identifying the data types of attributes (typesetting) and generating semantic descriptions of attributes. A formal definition of the generalized attribute typesetting problem is given, which permits the generation of additional data types. Under special criteria on the target function, this problem admits the discrete Bellman optimality principle. A unified deep generative neural network architecture is proposed that addresses generalized attribute typesetting and semantic description generation simultaneously. The architecture is based on the generative adversarial autoencoder (AAE) and uses soft-attention and long-term memory (SCRN) mechanisms. Its effectiveness is achieved, in particular, by applying dynamic programming principles within each epoch of network training.
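To make the role of the discrete Bellman optimality principle concrete, the following is a minimal sketch under assumptions that are not taken from the paper: the attributes form a chain, each attribute receives exactly one type from a fixed candidate list, and the target function is additive (a per-attribute cost plus a cost between the types of neighbouring attributes). The names `bellman_assign`, `brute_force`, `unary`, and `pairwise`, as well as all cost values, are hypothetical. The sketch only illustrates why an additive criterion allows a backward recursion in place of exhaustive search; it is not the author's formulation.

```python
from itertools import product


def bellman_assign(unary, pairwise):
    """Minimise an additive cost over a chain of attributes (illustrative only).

    unary[i][t]    -- hypothetical cost of giving attribute i the type t
    pairwise[s][t] -- hypothetical cost of type s being followed by type t
    Returns (minimal total cost, list of chosen type indices).
    """
    n, k = len(unary), len(unary[0])
    # value[i][t]: minimal cost of attributes i..n-1, given attribute i has type t
    value = [[0] * k for _ in range(n)]
    choice = [[0] * k for _ in range(n)]
    value[n - 1] = list(unary[n - 1])
    # Backward Bellman recursion: the optimal tail does not depend on the head.
    for i in range(n - 2, -1, -1):
        for t in range(k):
            best = min(range(k), key=lambda u: pairwise[t][u] + value[i + 1][u])
            choice[i][t] = best
            value[i][t] = unary[i][t] + pairwise[t][best] + value[i + 1][best]
    # Recover the optimal assignment by following the stored choices forward.
    t = min(range(k), key=lambda u: value[0][u])
    total, path = value[0][t], [t]
    for i in range(n - 1):
        t = choice[i][t]
        path.append(t)
    return total, path


def brute_force(unary, pairwise):
    """Exhaustive search over all k**n assignments, used only as a cross-check."""
    n, k = len(unary), len(unary[0])

    def cost(p):
        return (sum(unary[i][p[i]] for i in range(n))
                + sum(pairwise[p[i]][p[i + 1]] for i in range(n - 1)))

    return min(cost(p) for p in product(range(k), repeat=n))


# Hypothetical candidate types and costs for three attributes.
TYPES = ["integer", "date", "string"]
unary = [[1, 5, 4], [4, 1, 4], [3, 5, 1]]
pairwise = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]

total, path = bellman_assign(unary, pairwise)
print(total, [TYPES[t] for t in path])
```

The recursion evaluates on the order of n·k² combinations for n attributes and k candidate types, whereas the exhaustive cross-check enumerates kⁿ assignments; this saving is only available because the assumed cost decomposes additively along the chain.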
Keywords
- AAE
- data profiling
- data types generation
- data typesetting
- deep learning
- discrete Bellman optimality principle
- GAN
- generative neural networks
- LSTM
- master data
- metadata
- neural network architecture
- SCRN
- soft attention