Multi-Agent Deep Reinforcement Learning for Policy Optimization in Sequential Data Environments with Partial Observability

Authors

  • Angela Darienzo, Computer Programmer

Keywords:

Multi-Agent Reinforcement Learning, Deep RL, Partial Observability, Policy Optimization, Sequential Decision-Making, Decentralized Control, CTDE, POMDP

Abstract

In environments characterized by high temporal complexity and incomplete information, effective policy optimization is a core challenge for multi-agent systems. This paper investigates Multi-Agent Deep Reinforcement Learning (MADRL) under partial observability, where agents must learn to act from local, noisy observations alone. We propose a policy learning framework that incorporates recurrent neural networks (RNNs) for memory-based state representation and adopts centralized training with decentralized execution (CTDE). The system is evaluated on benchmark decentralized partially observable environments, where it shows greater training stability and more reliable policy convergence than baseline algorithms. Our findings highlight the potential of causally aware memory policies and attention-driven coordination for solving complex sequential tasks from limited information.
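As a rough illustration of the architecture the abstract describes, the sketch below shows one conventional way a recurrent, CTDE-style setup is wired in PyTorch: each agent's policy is a GRU-based actor that acts from its own observation history, while a centralized critic, used only during training, scores the joint observations and actions. All class names, dimensions, and wiring choices are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of recurrent actors with a centralized critic (CTDE).
# Illustrative only: names, sizes, and structure are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Decentralized policy: acts from local, partial observations."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)        # memory over the observation history
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, h):
        x = torch.relu(self.encoder(obs))
        h = self.rnn(x, h)                           # update recurrent memory
        return self.policy_head(h), h                # action logits + next hidden state


class CentralizedCritic(nn.Module):
    """Training-time critic: sees the joint observations and actions."""

    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_actions):
        return self.net(torch.cat([joint_obs, joint_actions], dim=-1))


if __name__ == "__main__":
    n_agents, obs_dim, n_actions = 3, 10, 5
    actors = [RecurrentActor(obs_dim, n_actions) for _ in range(n_agents)]
    hidden = [torch.zeros(1, 64) for _ in range(n_agents)]
    obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]

    # Decentralized execution: each agent uses only its own observation and memory.
    actions = []
    for i, actor in enumerate(actors):
        logits, hidden[i] = actor(obs[i], hidden[i])
        actions.append(torch.softmax(logits, dim=-1))

    # Centralized training: the critic scores the joint observation-action vector.
    critic = CentralizedCritic(n_agents * obs_dim, n_agents * n_actions)
    value = critic(torch.cat(obs, dim=-1), torch.cat(actions, dim=-1))
    print(value.shape)  # torch.Size([1, 1])
```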




Published

26-03-2025

How to Cite

Angela Darienzo. (2025). Multi-Agent Deep Reinforcement Learning for Policy Optimization in Sequential Data Environments with Partial Observability. International Journal of Computer Science and Information Technology Research, 6(2), 54–62. https://ijcsitr.com/index.php/home/article/view/IJCSITR_06_02_05