MODELING OF A UAV CONTROL SYSTEM USING THE REINFORCEMENT LEARNING MODEL

Authors

DOI:

https://doi.org/10.32782/mathematical-modelling/2026-9-1-10

Keywords:

UAV, drone, AI, reinforcement learning, controller, agent, reward, imitation, thrust, observation

Abstract

This article examines the most common approaches to UAV control, covering both traditional methods based on the proportional-integral-derivative (PID) control law and intelligent systems. Artificial intelligence (AI) algorithms, in particular reinforcement learning, allow the controller to adapt to changing dynamics and a changing environment. The UAV model is built around the two main parameters of the control system: Lift Amplitude and Lift Difference. Lift Amplitude is responsible for basic stability and vertical tasks, while Lift Difference provides the ability to maneuver and reach specified waypoints. Using these parameters in the model yields realistic drone behavior in two-dimensional space, which makes it possible to accurately assess the effectiveness of control algorithms. A performance evaluation (scoring) system is proposed, based on the task of navigating to random points in space that are represented as "balloons". A mathematical model and control algorithms were developed that describe the dynamics of UAV movement in two-dimensional space, including the influence of inertia, gravity and propeller thrust. Simulations were carried out using DQN, SAC, and SAC with a modified differential thrust level, and a comparative analysis of the effectiveness of these approaches in different scenarios was performed. To test the algorithms and to provide a universal platform for researching control systems, a simulation environment was created in Python using the NumPy, Matplotlib, Pygame and Stable-Baselines3 libraries. The environment supports modeling of flight stabilization, navigation and obstacle avoidance tasks, and allows algorithms to be tested in conditions close to real ones without the need for expensive equipment.
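The two-parameter control scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mass, arm length, moment of inertia, gain values, time step, sign conventions and the function names `step` and `collect_balloon` are all assumptions chosen for illustration. The sketch maps Lift Amplitude and Lift Difference to left and right propeller thrusts, integrates planar dynamics under gravity and inertia, and counts a point each time the drone reaches a randomly placed "balloon".

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the article.
GRAVITY = 9.81                      # m/s^2
MASS = 1.0                          # kg
ARM = 0.25                          # m, distance from centre of mass to each propeller
INERTIA = 0.05                      # kg*m^2
HOVER_THRUST = MASS * GRAVITY / 2   # thrust per propeller that balances gravity
AMPLITUDE_GAIN = 0.5 * HOVER_THRUST     # thrust change per unit of Lift Amplitude
DIFFERENCE_GAIN = 0.1 * HOVER_THRUST    # thrust change per unit of Lift Difference


def step(state, lift_amplitude, lift_difference, dt=1 / 60):
    """Advance the planar drone by one time step (semi-implicit Euler).

    state = [x, y, vx, vy, theta, omega], with y pointing up and theta
    measured counterclockwise from the level attitude.
    Both actions are expected in [-1, 1].
    """
    x, y, vx, vy, theta, omega = state
    amp = float(np.clip(lift_amplitude, -1.0, 1.0)) * AMPLITUDE_GAIN
    diff = float(np.clip(lift_difference, -1.0, 1.0)) * DIFFERENCE_GAIN

    # Lift Amplitude raises both propellers together; Lift Difference
    # shifts thrust from the right propeller to the left one.
    left = max(HOVER_THRUST + amp + diff, 0.0)
    right = max(HOVER_THRUST + amp - diff, 0.0)
    total = left + right

    # Thrust acts along the body "up" axis, gravity along -y.
    ax = -total * np.sin(theta) / MASS
    ay = total * np.cos(theta) / MASS - GRAVITY
    alpha = ARM * (right - left) / INERTIA   # angular acceleration

    # Update velocities first, then positions with the new velocities.
    vx, vy, omega = vx + ax * dt, vy + ay * dt, omega + alpha * dt
    x, y, theta = x + vx * dt, y + vy * dt, theta + omega * dt
    return np.array([x, y, vx, vy, theta, omega])


def collect_balloon(state, balloon, score, rng, radius=0.3, extent=5.0):
    """Return (balloon, score); respawn the balloon and add a point if reached."""
    if np.hypot(state[0] - balloon[0], state[1] - balloon[1]) < radius:
        return rng.uniform(-extent, extent, size=2), score + 1
    return balloon, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.zeros(6)
    balloon, score = rng.uniform(-5.0, 5.0, size=2), 0
    for _ in range(120):
        state = step(state, lift_amplitude=0.2, lift_difference=0.0)
        balloon, score = collect_balloon(state, balloon, score, rng)
    print("final state:", state, "score:", score)
```

To train DQN or SAC agents with Stable-Baselines3, a model of this kind would additionally need to be wrapped in an environment class with observation, action and reward definitions, which are not specified in the abstract and are therefore left out here.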

Published

2026-07-01