REINFORCEMENT LEARNING METHODS FOR UNMANNED AERIAL VEHICLES IN MILITARY LOGISTICS TASKS
DOI: https://doi.org/10.35546/kntu2078-4481.2025.3.2.31
Keywords: reinforcement learning, unmanned aerial vehicles, military logistics, MARL, CMDP, CVaR, CTDE, routing, energy constraints, simulation-based training
Abstract
This article provides a comprehensive review of modern reinforcement learning (RL) methods and their application in military logistics involving unmanned aerial vehicles (UAVs). The topic is relevant because of the growing role of UAVs in rapid cargo delivery, reconnaissance, and support for combat units, particularly under time constraints and in high-risk conditions. The paper analyzes key peer-reviewed studies that demonstrate the efficiency of RL in route planning, multi-agent coordination (multi-agent RL, MARL), resource and energy management, and risk-sensitive decision-making based on the conditional value-at-risk (CVaR). Special attention is devoted to approaches that formalize logistics tasks as constrained Markov decision processes (CMDPs), to attention mechanisms and graph neural networks for route optimization, and to centralized training with decentralized execution (CTDE) frameworks that enable effective multi-UAV cooperation in real time. Mathematical models and formulas describing policy optimization, energy constraints, and algorithmic safety enhancements are presented. The review also examines methods for integrating RL solutions with monitoring and control systems and addresses current challenges such as sim-to-real transfer, limited onboard computational resources, and resilience to communication loss. The findings can serve as a foundation for building autonomous, adaptive logistics platforms capable of operating efficiently in dynamic and hazardous environments.
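
To illustrate how a logistics task of the kind described above can be written down as a CMDP with a risk constraint, consider the following minimal sketch. The notation is illustrative rather than taken verbatim from any single cited study: γ is the discount factor, c_energy an instantaneous energy cost, B_max the onboard energy budget, L(π) a mission-loss random variable (e.g., delivery delay or vehicle loss), and α, δ the risk level and tolerance.

\max_{\pi} \; J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, c_{\text{energy}}(s_t, a_t)\right] \le B_{\max},
\qquad \mathrm{CVaR}_{\alpha}\!\left(L(\pi)\right) \le \delta,

\text{where} \quad \mathrm{CVaR}_{\alpha}(L) = \min_{\nu \in \mathbb{R}} \left\{ \nu + \frac{1}{1-\alpha}\, \mathbb{E}\!\left[(L - \nu)^{+}\right] \right\}.

The last line is the standard Rockafellar–Uryasev representation: CVaR_α averages the worst (1−α) fraction of outcomes, which is why risk-sensitive formulations prefer it to the plain expected cost in high-risk delivery scenarios.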
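
The CTDE pattern mentioned in the abstract can likewise be made concrete with a short sketch: during training, a centralized critic evaluates the joint observations and actions of all UAVs, while at execution time each UAV runs only its own actor on local inputs. This is a minimal illustration in PyTorch under assumed toy dimensions; the names (Actor, CentralCritic, n_agents) and layer sizes are illustrative and do not reproduce the architecture of any cited paper.

# Minimal CTDE sketch: centralized critic for training,
# decentralized actors for execution. All dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 8, 2  # assumed toy setup

class Actor(nn.Module):
    """Decentralized policy: local observation -> local action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized value: joint observations + joint actions -> Q."""
    def __init__(self):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint, 128), nn.ReLU(),
            nn.Linear(128, 1))
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

actors = [Actor() for _ in range(n_agents)]
critic = CentralCritic()
opt = torch.optim.Adam(
    [p for a in actors for p in a.parameters()], lr=1e-3)

# One illustrative policy-improvement step on a random batch:
obs = torch.randn(32, n_agents, obs_dim)   # batch of joint observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(n_agents)], dim=1)
q = critic(obs.flatten(1), acts.flatten(1))  # centralized evaluation
loss = -q.mean()                             # ascend the critic's value
opt.zero_grad(); loss.backward(); opt.step()

# At deployment, each UAV needs only its own actor and local sensing:
local_action = actors[0](torch.randn(obs_dim))

The key design property shown here is that the centralized critic exists only at training time; the deployed policies remain decentralized, which is what makes CTDE tolerant of communication loss between UAVs during execution.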
References
Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. Cambridge : MIT Press, 2018. 548 p.
Abbeel P., Coates A., Ng A. Y. Autonomous helicopter aerobatics through apprenticeship learning. International Journal of Robotics Research. 2010. Vol. 29, No. 13. P. 1608–1639. DOI: 10.1177/0278364910371999
Lowe R., Wu Y., Tamar A., Harb J., Abbeel P., Mordatch I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems (NeurIPS). 2017. P. 6379–6390.
Foerster J., Farquhar G., Afouras T., Nardelli N., Whiteson S. Counterfactual multi-agent policy gradients. Proceedings of the AAAI Conference on Artificial Intelligence. 2018. Vol. 32, No. 1. P. 2974–2982.
Achiam J., Held D., Tamar A., Abbeel P. Constrained Policy Optimization. Proceedings of the 34th International Conference on Machine Learning (ICML). 2017. P. 22–31.
Cohen M. H., Belta C. Safe exploration in model-based reinforcement learning using control barrier functions. Automatica. 2023. Vol. 147. Art. 110684. DOI: 10.1016/j.automatica.2022.110684
Chow Y., Tamar A., Mannor S., Pavone M. Risk-sensitive and robust decision-making: a CVaR optimization approach. Advances in Neural Information Processing Systems (NeurIPS). 2015. P. 1522–1530.
Chen S., Mo Y., Wu X., Xiao J., Liu Q. Reinforcement Learning-Based Energy-Saving Path Planning for UAVs in Turbulent Wind. Electronics. 2024. Vol. 13, No. 16. Art. 3190. DOI: 10.3390/electronics13163190
Khalil E., Dai H., Zhang Y., Dilkina B., Song L. Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems (NeurIPS). 2017. P. 6348–6358.
Akkaya I., Andrychowicz M., Chociej M., et al. Solving Rubik’s Cube with a Robot Hand. arXiv preprint. arXiv:1910.07113. 2019.