REINFORCEMENT LEARNING METHODS FOR UNMANNED AERIAL VEHICLES IN MILITARY LOGISTICS TASKS

DOI:

https://doi.org/10.35546/kntu2078-4481.2025.3.2.31

Keywords:

reinforcement learning, unmanned aerial vehicles, military logistics, MARL, CMDP, CVaR, CTDE, routing, energy constraints, simulation-based training

Abstract

This article provides a comprehensive review of modern reinforcement learning (RL) methods and their application in military logistics involving unmanned aerial vehicles (UAVs). The topic is relevant because of the growing role of UAVs in rapid cargo delivery, reconnaissance, and support of combat units, particularly under tight time constraints and high-risk conditions. The paper analyzes key peer-reviewed studies that demonstrate the effectiveness of RL in route planning, multi-agent coordination (multi-agent RL, MARL), resource and energy management, and risk-sensitive decision-making based on Conditional Value-at-Risk (CVaR). Special attention is devoted to approaches that formalize logistics tasks as constrained Markov decision processes (CMDP), to attention mechanisms and graph neural networks for route optimization, and to centralized training with decentralized execution (CTDE) frameworks that enable effective real-time multi-UAV cooperation. Mathematical models and formulas describing policy optimization, energy constraints, and algorithmic safety enhancements are presented. The review also examines methods for integrating RL solutions with monitoring and control systems, and addresses current challenges such as sim-to-real transfer, limited onboard computational resources, and resilience to communication loss. The findings can serve as a foundation for building autonomous, adaptive logistics platforms capable of operating efficiently in dynamic and hazardous environments.
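
The CMDP formalization mentioned above can be made concrete with a generic constrained objective. The following formulation is a minimal sketch in standard notation, not the exact model of any single reviewed paper: a policy \pi maximizes expected discounted reward subject to an expected energy budget d and a CVaR bound \beta at risk level \alpha.

% Generic CMDP objective with an energy budget and a CVaR risk bound.
% All symbols (\pi, r, c, \gamma, d, \alpha, \beta) are illustrative.
\[
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.} \quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} c_{\mathrm{energy}}(s_t, a_t)\right] \le d, \\
& \mathrm{CVaR}_{\alpha}\!\left(\sum_{t=0}^{T} c_{\mathrm{risk}}(s_t, a_t)\right) \le \beta,
\end{aligned}
\]
where CVaR at level \alpha has the standard Rockafellar–Uryasev form
\[
\mathrm{CVaR}_{\alpha}(Z) = \min_{\nu \in \mathbb{R}} \left\{ \nu + \frac{1}{1-\alpha}\, \mathbb{E}\big[(Z-\nu)_{+}\big] \right\}.
\]
The CVaR term bounds the average cost over the worst (1 - \alpha) fraction of trajectories, which is what distinguishes risk-sensitive planning from risk-neutral expected-cost planning.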
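The use of attention for route optimization can likewise be sketched in a few lines. In the hypothetical Python (PyTorch) fragment below, the dimensions, tensors, and projection layers are assumptions for illustration; a real system would first embed waypoint features with a graph neural network and mask already-visited nodes:

# Minimal attention-based routing step (pointer-network style); names,
# dimensions, and random inputs are illustrative placeholders.
import torch
import torch.nn as nn

EMB = 16
node_feats = torch.randn(10, EMB)   # embeddings of 10 candidate waypoints
uav_state = torch.randn(1, EMB)     # current UAV context (position, battery, load)

q_proj, k_proj = nn.Linear(EMB, EMB), nn.Linear(EMB, EMB)
scores = q_proj(uav_state) @ k_proj(node_feats).T / EMB ** 0.5  # (1, 10) compatibilities
probs = torch.softmax(scores, dim=-1)    # stochastic policy over the next node
next_node = torch.multinomial(probs, 1)  # sampled routing action for RL training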
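Finally, the CTDE pattern separates a centralized critic, used only during training, from lightweight per-UAV actors that run on board at execution time. The PyTorch sketch below illustrates this split under assumed network sizes and a toy update step; it is not the implementation of any cited method:

# CTDE sketch: decentralized actors, one centralized critic. Class names,
# sizes, and the random batch are illustrative assumptions.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # e.g. 3 UAVs with local sensing

class LocalActor(nn.Module):
    """Runs on board each UAV; sees only its local observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Used only during training; sees all observations and all actions."""
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs.flatten(1), all_act.flatten(1)], dim=1))

actors = [LocalActor() for _ in range(N_AGENTS)]
critic = CentralCritic()
opt = torch.optim.Adam([p for a in actors for p in a.parameters()], lr=1e-3)

# One illustrative actor update on a random batch (stands in for replay data
# from a logistics simulator; critic training on TD targets is omitted).
obs = torch.randn(32, N_AGENTS, OBS_DIM)
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
actor_loss = -critic(obs, acts).mean()   # ascend the centralized value estimate
opt.zero_grad()
actor_loss.backward()
opt.step()

# At execution time each UAV runs only its own small actor on local input:
local_action = actors[0](torch.randn(OBS_DIM))

Only the compact LocalActor has to run on the UAV at deployment, which is what makes CTDE compatible with the limited onboard computational resources noted in the abstract.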

References

Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. Cambridge : MIT Press, 2018. 548 p.

Abbeel P., Coates A., Ng A. Y. Autonomous helicopter aerobatics through apprenticeship learning. International Journal of Robotics Research. 2010. Vol. 29, No. 13. P. 1608–1639. DOI: 10.1177/0278364910371999

Lowe R., Wu Y., Tamar A., Harb J., Abbeel P., Mordatch I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Advances in Neural Information Processing Systems (NeurIPS). 2017. P. 6379–6390.

Foerster J., Farquhar G., Afouras T., Nardelli N., Whiteson S. Counterfactual multi-agent policy gradients. Proceedings of the AAAI Conference on Artificial Intelligence. 2018. Vol. 32, No. 1. P. 2974–2982.

Achiam J., Held D., Tamar A., Abbeel P. Constrained Policy Optimization. Proceedings of the 34th International Conference on Machine Learning (ICML). 2017. P. 22–31.

Cohen M. H., Belta C. Safe exploration in model-based reinforcement learning using control barrier functions. Automatica. 2023. Vol. 147. Art. 110684. DOI: 10.1016/j.automatica.2022.110684

Chow Y., Tamar A., Mannor S., Pavone M. Risk-sensitive and robust decision-making: a CVaR optimization approach. Advances in Neural Information Processing Systems (NeurIPS). 2015. P. 1522–1530.

Chen S., Mo Y., Wu X., Xiao J., Liu Q. Reinforcement Learning-Based Energy-Saving Path Planning for UAVs in Turbulent Wind. Electronics. 2024. Vol. 13, No. 16. Art. 3190. DOI: 10.3390/electronics13163190

Khalil E., Dai H., Zhang Y., Dilkina B., Song L. Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems (NeurIPS). 2017. P. 6348–6358.

Akkaya I., Andrychowicz M., Chociej M., et al. Solving Rubik’s Cube with a Robot Hand. arXiv preprint. arXiv:1910.07113. 2019.

Published

2025-11-28