The StarCraft Multi-Agent Challenge (SMAC)

Tags: artificial intelligence, multi-agent reinforcement learning

Contents

The StarCraft Multi-Agent Challenge

Abstract
1 Introduction
2 Related Work
3 Multi-Agent Reinforcement Learning
    Dec-POMDPs
    Centralised training with decentralised execution
4 SMAC
    Scenarios
    State and Observations
    Action Space
    Rewards
5 PyMARL
6 Results
7 Conclusion and Future Work
Acknowledgements
References
Appendix A SMAC
    A.1 Scenarios
    A.2 Environment Setting
Appendix B Evaluation Methodology
    B.1 Evaluation Metrics
Appendix C Experimental Setup
    C.1 Architecture and Training
    C.2 Reward and Observation
Appendix D Table of Results

The StarCraft Multi-Agent Challenge

https://arxiv.org/abs/1902.04043

Abstract

        In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems.

        Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge scenarios and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.

1 Introduction 

        Deep reinforcement learning (RL) promises a scalable approach to solving arbitrary sequential decision-making problems, demanding only that a user must specify a reward function that expresses the desired behaviour. However, many real-world problems that might be tackled by RL are inherently multi-agent in nature. For example, the coordination of self-driving cars, autonomous drones, and other multi-robot systems are becoming increasingly critical. Network traffic routing, distributed sensing, energy distribution, and other logistical problems are also inherently multi-agent. As such, it is essential to develop multi-agent RL (MARL) solutions that can handle decentralisation constraints and deal with the exponentially growing joint action space of many agents.
        

        Partially observable, cooperative, multi-agent learning problems are of particular interest. Cooperative problems avoid difficulties in evaluation inherent with general-sum games (e.g., which opponents are evaluated against). Cooperative problems also map well to a large class of critical problems where a single user that manages a distributed system can specify the overall goal, e.g., minimising traffic or other inefficiencies. Most real-world problems depend on inputs from noisy or limited sensors, so partial observability must also be dealt with effectively. This often includes limitations on communication that result in a need for decentralised execution of learned policies. However, there commonly is access to additional information during training, which may be carried out in controlled conditions or simulation.
        
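        Formally, the paper (Section 3) casts these problems as decentralised partially observable Markov decision processes (Dec-POMDPs) trained with centralised training and decentralised execution (CTDE). In the standard notation, a Dec-POMDP is a tuple

            G = \langle S, U, P, r, Z, O, n, \gamma \rangle,

        where s \in S is the true state; each agent a \in \{1, \dots, n\} chooses an action u^a \in U, forming a joint action \mathbf{u} \in U^n; P(s' \mid s, \mathbf{u}) defines the state transitions; all agents share a single team reward r(s, \mathbf{u}); each agent receives only a local observation z^a \in Z according to the observation function O(s, a); and \gamma \in [0, 1) is the discount factor. Under CTDE, each agent learns a decentralised policy \pi^a(u^a \mid \tau^a) conditioned only on its own action-observation history \tau^a, while the training procedure may additionally exploit the global state and the other agents' observations.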

        A growing number of recent works Foerster et al. (2018a); Rashid et al. (2018); Sunehag et al. (2017); Lowe et al. (2017) have begun to address the problems in this space. However, there is a clear lack of standardised benchmarks for research and evaluation. Instead, researchers often propose one-off environments which can be overly simple or tuned to the proposed algorithms. In single-agent RL, standard environments such as the Arcade Learning Environment Bellemare et al. (2013), or MuJoCo for continuous control Plappert et al. (2018), have enabled great progress. In this paper, we aim to follow this successful model by offering challenging standard benchmarks for deep MARL and to facilitate more rigorous experimental methodology across the field.

        Some testbeds have emerged for other multi-agent regimes, such as Poker Heinrich & Silver (2016), Pong Tampuu et al. (2015), Keepaway Soccer Stone et al. (2005), or simple gridworld-like environments Lowe et al. (2017); Leibo et al. (2017); Yang et al. (2018); Zheng et al. (2017). Nonetheless, we identify a clear gap in challenging and standardised testbeds for the important set of domains described above.

        To fill this gap, we introduce the StarCraft Multi-Agent Challenge (SMAC). SMAC is built on the popular real-time strategy game StarCraft II and makes use of the SC2LE environment Vinyals et al. (2017). Instead of tackling the full game of StarCraft with centralised control, we focus on decentralised micromanagement challenges (Figure 1). In these challenges, each of our units is controlled by an independent, learning agent that has to act based only on local observations, while the opponent’s units are controlled by the hand-coded built-in StarCraft II AI. We offer a diverse set of scenarios that challenge algorithms to handle high-dimensional inputs and partial observability, and to learn coordinated behaviour even when restricted to fully decentralised execution.
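
        To make the decentralised control loop concrete, the sketch below shows the kind of interaction SMAC exposes to a learning algorithm. It is a minimal example that assumes the open-sourced smac Python package and its documented StarCraft2Env interface (the map name "3m" is illustrative), with random actions standing in for learned policies.

    from smac.env import StarCraft2Env
    import numpy as np

    env = StarCraft2Env(map_name="3m")          # e.g. 3 Marines per side
    env_info = env.get_env_info()
    n_agents = env_info["n_agents"]

    for episode in range(5):
        env.reset()
        terminated = False
        episode_return = 0.0
        while not terminated:
            obs = env.get_obs()                 # one local observation vector per agent
            actions = []
            for agent_id in range(n_agents):
                # A learned policy would map obs[agent_id] to an action here;
                # actions that are currently illegal for this unit are masked out via avail.
                avail = env.get_avail_agent_actions(agent_id)
                actions.append(np.random.choice(np.nonzero(avail)[0]))
            reward, terminated, info = env.step(actions)    # single shared team reward
            episode_return += reward
        print("episode", episode, "team return", episode_return)

    env.close()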

        The full games of StarCraft: BroodWar and StarCraft II have already been used as RL environments, due to the many interesting challenges inherent to the games Synnaeve et al. (2016); Vinyals et al. (2017). DeepMind’s AlphaStar DeepMind (2019) has recently shown an impressive level of play on a StarCraft II matchup using a centralised controller. In contrast, SMAC is not intended as an environment to train agents for use in full StarCraft II gameplay. Instead, by introducing strict decentralisation and local partial observability, we use the StarCraft II game engine to build a new set of rich cooperative multi-agent problems that bring unique challenges, such as the nonstationarity of learning Foerster et al. (2017), multi-agent credit assignment Foerster et al. (2018a), and the difficulty of representing the value of joint actions Rashid et al. (2018).
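
        Of these, the difficulty of representing joint-action values is the one most directly targeted by value-factorisation methods such as QMIX. As a brief summary of Rashid et al. (2018) rather than of this paper: QMIX learns per-agent utilities Q_a(\tau^a, u^a) and combines them through a state-conditioned monotonic mixing function, Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s) = f_{mix}\big(Q_1(\tau^1, u^1), \dots, Q_n(\tau^n, u^n); s\big) with \partial Q_{tot} / \partial Q_a \geq 0 for every agent a, so that each agent greedily maximising its own utility also maximises the team value and execution can remain fully decentralised.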

        To further facilitate research in this field, we also open-source PyMARL, a learning framework that can serve as a starting point for other researchers and includes implementations of several key MARL algorithms. PyMARL is modular, extensible, built on PyTorch, and serves as a template for dealing with some of the unique challenges of deep MARL in practice. We include results on our full set of SMAC environments using QMIX Rashid et al. (2018) and several baseline algorithms, and challenge the community to make progress on difficult environments in which good performance has remained out of reach so far. We also offer a set of guidelines for best practices in evaluations using our benchmark, including the reporting of standardised performance metrics, sample efficiency, and computational requirements (see Appendix B).
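
        As a rough guide to what a baseline run looks like, the command below assumes the public PyMARL repository's Sacred-style command-line interface; exact flags and default paths may differ between versions.

    # Launch QMIX on the SMAC map 2s3z (2 Stalkers and 3 Zealots per side):
    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z

    # Sacred logs and saved models are written under the repository's results/ directory by default.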

        We hope SMAC will serve as a valuable standard benchmark, enabling systematic and robust progress in deep MARL for years to come.

2 Related Work 

Much work has gone into designing environments to test and develop MARL agents. However, few of these focus on providing a qualitatively challenging environment that combines partial observability, challenging dynamics, and high-dimensional observation spaces.

Stone et al. (2005) presented Keepaway soccer, a domain built on the RoboCup soccer simulator (Kitano et al., 1997), a 2D simulation of a football environment with simplified physics, where the main task consists of keeping a ball within a pre-defined area in which agents in teams can reach, steal, and pass the ball, providing a simplified setup for studying cooperative MARL. This domain was later extended to the Half Field Offense task (Kalyanakrishnan et al., 2006; Hausknecht et al., 2016), which increases the difficulty of the problem by requiring the agents to not only keep the ball within bounds but also to score a goal. Neither task scales well in difficulty with the number of agents, as most agents need to do little coordination. There is also a lack of interesting environment dynamics beyond the simple 2D physics, as well as of informative reward signals, which reduces the value of these environments as testbeds.

Multiple gridworld-like environments have also been explored. Lowe et al. (2017) released a set of simple grid-world-like environments for multi-agent RL alongside an implementation of MADDPG, featuring a mix of competitive and cooperative tasks focused on shared communication and low-level continuous control. Leibo et al. (2017) present several mixed-cooperative Markov environments focused on testing social dilemmas; however, they did not release an implementation to further explore the tasks. Yang et al. (2018) and Zheng et al. (2017) present frameworks for creating gridworlds focused on many-agent tasks, where the number of agents ranges from the hundreds to the millions. This
