Application of Reinforcement Learning to the Flight Control of Unmanned Aerial Vehicles

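Basic information

Grant number:
2104294
Principal investigator:
Funding amount:
--
Host institution:
University of Bristol
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2018
Funding country:
United Kingdom
Duration:
2018 to (no data)
Project status:
Completed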

Source:
https://gtr.ukri.org/projects?ref=studentship-2104294
Keywords:
Application Reinforcement Learning Flight Control

Project abstract

Project description: Complex urban environments pose a significant challenge for the operation of Unmanned Autonomous Systems (UAS). To operate in such areas, vehicles must be able to change direction rapidly, avoid obstacles, and land in confined areas. This is especially challenging for a fixed-wing platform because of the minimum airspeed needed to prevent aircraft stall. Fixed-wing platforms offer a number of advantages over rotary-wing vehicles, such as increased flight endurance and range and greater payload capacity. As such, there is significant research into improving the agility of fixed-wing platforms so that they can operate in complex environments.

This research proposal aims to build upon previous research projects conducted by the University of Bristol Flight Lab [1] [2]. These projects used a variable-sweep, fixed-wing platform to perform a bio-inspired perched landing manoeuvre. This agile landing manoeuvre, which takes advantage of dynamic stall, enabled the UAS to land safely on a small landing site with minimal aircraft velocity, without the need for a long landing strip or arresting equipment. Such a manoeuvre is particularly applicable to challenging operational environments, such as a complex urban setting or the deck of a ship. Non-linear control strategies were evaluated to generate the necessary perching manoeuvres. The reinforcement learning process, using a Deep Q-Network (DQN), generated the trajectories with the lowest cost function value and showed the ability to generate trajectories from a range of starting conditions.

Research in the first year aimed to modernise the perching UAV learning process by integrating and evaluating state-of-the-art reinforcement learning algorithms and frameworks. Compared to the DQN algorithm used previously, modern algorithms such as Proximal Policy Optimisation (PPO) attain higher rewards and show improved stability and convergence during the learning process [3]. This research has also explored the use of continuous control outputs to increase the granularity of actuator control available to the learning agent. The next stage of this research is the transition to real-world flight testing of the perching manoeuvre using these improvements to the process. The project has also transitioned to state-of-the-art frameworks, such as OpenAI's Gym toolkit, to modernise and modularise the learning architecture. This lays the foundation for simpler, faster implementation of alternative algorithms and scenarios.

This research project will build on previous projects of the research group and incorporate state-of-the-art algorithms and techniques to develop reinforcement learning-based flight controllers that can perform a number of agile flight manoeuvres. The current flight dynamics model of a model UAV will be improved and expanded, both to improve accuracy when performing agile manoeuvres and to incorporate the lateral degrees of freedom into the current longitudinal-only model. Methods to improve the accuracy of the trained model will be evaluated and implemented, such as incorporating flight data into the offline, simulated learning process and conducting online learning on the real-world vehicle. A number of agile flight manoeuvres applicable to operating in complex environments will be selected, tested and evaluated. Candidate manoeuvres include rapid changes of direction and minimum-distance 180-degree turns, which allow the vehicle to avoid obstacles and navigate cluttered environments. A key focus of this research will be generating trained controllers and the necessary software frameworks so that they can be tested and used on real-world platforms.
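The abstract does not describe the learning environment in detail. As a rough illustration only, the sketch below shows what a Gym-style environment (a `reset`/`step` pair) with a single continuous control output could look like for a longitudinal-only perching model. Everything in it is an assumption made for illustration: the names `PerchParams` and `PerchEnv`, the point-mass dynamics, the flat-plate lift and drag coefficients, the terminal cost, and all numerical values. It does not import the Gym library; it only mirrors the classic `step` return convention of observation, reward, done flag and info dictionary.

```python
import math
from dataclasses import dataclass


@dataclass
class PerchParams:
    """Hypothetical values for illustration; none come from the project."""
    mass: float = 0.5                      # kg
    wing_area: float = 0.1                 # m^2
    rho: float = 1.225                     # kg/m^3, sea-level air density
    g: float = 9.81                        # m/s^2
    cd0: float = 0.03                      # assumed parasitic drag coefficient
    max_alpha: float = math.radians(60.0)  # action is scaled to this limit
    dt: float = 0.01                       # s, integration step
    target_x: float = 20.0                 # m, horizontal distance to perch
    target_z: float = 5.0                  # m, perch height
    max_steps: int = 2000                  # episode length limit


class PerchEnv:
    """Gym-style longitudinal point-mass sketch (not the project's model)."""

    def __init__(self, params=None):
        self.p = params or PerchParams()
        self.state = None
        self.steps = 0

    def reset(self, x=0.0, z=5.0, speed=12.0, gamma=0.0):
        # State: horizontal position, height, airspeed, flight-path angle.
        self.state = [x, z, speed, gamma]
        self.steps = 0
        return tuple(self.state)

    def step(self, action):
        p = self.p
        # Continuous action in [-1, 1], scaled to an angle-of-attack command.
        alpha = max(-1.0, min(1.0, float(action))) * p.max_alpha
        x, z, v, gamma = self.state

        # Assumed flat-plate coefficients, a simple post-stall approximation.
        cl = math.sin(2.0 * alpha)
        cd = p.cd0 + 2.0 * math.sin(alpha) ** 2
        q_s = 0.5 * p.rho * v * v * p.wing_area
        lift, drag = q_s * cl, q_s * cd

        # Point-mass equations of motion; guard the division at low airspeed.
        v_safe = max(v, 0.1)
        v_dot = -drag / p.mass - p.g * math.sin(gamma)
        gamma_dot = lift / (p.mass * v_safe) - p.g * math.cos(gamma) / v_safe

        # Forward-Euler integration.
        x += v * math.cos(gamma) * p.dt
        z += v * math.sin(gamma) * p.dt
        v = max(v + v_dot * p.dt, 0.0)
        gamma += gamma_dot * p.dt
        self.state = [x, z, v, gamma]
        self.steps += 1

        reached = x >= p.target_x
        crashed = z <= 0.0
        done = reached or crashed or self.steps >= p.max_steps

        # Reward is zero until the episode ends, then the negative of a
        # terminal cost penalising residual speed and position error.
        reward = 0.0
        if done:
            reward = -(v ** 2 + (x - p.target_x) ** 2 + (z - p.target_z) ** 2)
        return tuple(self.state), reward, done, {"reached": reached, "crashed": crashed}


# Roll out one episode with a fixed action to exercise the interface.
env = PerchEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(0.1)
print("final state:", obs, "terminal reward:", reward)
```

Because the agent only sees `reset` and `step`, a learning algorithm could in principle be swapped without touching the dynamics, which is the kind of modularity the abstract attributes to the move to Gym. A discrete-action agent such as DQN would need the action interval split into a fixed set of values, whereas a continuous-action agent can command any value in [-1, 1].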
