The 27th European Conference on Artificial Intelligence (ECAI 2024)
Achieving high sample efficiency is a critical research area in reinforcement learning. It becomes especially difficult in multi-agent reinforcement learning (MARL), as the size of the joint state and action space grows exponentially with the number of agents. MARL's reliance solely on exploration and trial-and-error, without incorporating prior knowledge, further exacerbates the problem of low sample efficiency. Introducing symmetry into MARL has proven to be an effective way to address this issue. However, the concept of hierarchical symmetry, which maintains symmetry across different levels of a multi-agent system (MAS), has not been explored in existing methods. This paper focuses on multi-agent cooperative tasks and proposes a method that incorporates hierarchical symmetry, termed the Hierarchical Equivariant Policy Network (HEPN), which is O(n)-equivariant. Specifically, HEPN uses clustering to perform hierarchical information extraction in the MAS and employs graph neural networks to model agent interactions. We conducted extensive experiments across a variety of multi-agent tasks. The results indicate that our method achieves faster convergence and higher converged rewards than baseline algorithms. Additionally, we deployed our algorithm on a physical multi-robot system, confirming its effectiveness in real-world environments.
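To make the abstract's ingredients concrete, the sketch below shows one common way to build an O(n)-equivariant graph message-passing layer (in the style of EGNN): scalar features are updated from pairwise distances, and vector outputs are weighted sums of relative positions. This is a minimal illustration under our own assumptions, not the authors' implementation; all names are hypothetical.

```python
import torch
import torch.nn as nn


class EquivariantMessageLayer(nn.Module):
    """EGNN-style O(n)-equivariant message-passing layer (illustrative sketch).

    Messages depend only on rotation-invariant quantities (features and squared
    distances), while the per-agent vector output is a weighted sum of relative
    positions, so rotating all positions rotates the output by the same matrix.
    """

    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 1, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
        )
        self.coord_mlp = nn.Linear(hidden_dim, 1)  # scalar weight per edge
        self.node_mlp = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, feat_dim), nn.SiLU(),
        )

    def forward(self, h: torch.Tensor, x: torch.Tensor):
        # h: (N, F) invariant agent features, x: (N, n) agent positions
        rel = x.unsqueeze(1) - x.unsqueeze(0)          # (N, N, n) relative positions
        dist2 = (rel ** 2).sum(-1, keepdim=True)       # (N, N, 1) squared distances
        hi = h.unsqueeze(1).expand(-1, h.size(0), -1)  # receiver features
        hj = h.unsqueeze(0).expand(h.size(0), -1, -1)  # sender features
        m = self.edge_mlp(torch.cat([hi, hj, dist2], dim=-1))  # invariant messages
        vec = (self.coord_mlp(m) * rel).sum(dim=1)     # (N, n) equivariant output
        h_new = self.node_mlp(torch.cat([h, m.sum(dim=1)], dim=-1))
        return h_new, vec


# Quick equivariance check: rotating the inputs rotates the vector output.
layer = EquivariantMessageLayer(feat_dim=8)
h, x = torch.randn(5, 8), torch.randn(5, 2)
R, _ = torch.linalg.qr(torch.randn(2, 2))              # random orthogonal matrix
_, v_rotated = layer(h, x @ R.T)
_, v = layer(h, x)
assert torch.allclose(v_rotated, v @ R.T, atol=1e-5)
```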
Multi-agent systems (MAS) can naturally be represented as a graph, in which similar agents are clustered into high-level nodes that together form a high-level system. Message passing in the high-level system is more efficient, leading to more reasonable actions. As shown in the left figure, symmetry exists in both the high-level and the low-level system: rotating either system causes the agents' actions to rotate accordingly. This is the hierarchical symmetry of MAS (a toy numerical check of this property is sketched below). This paper focuses on leveraging this intrinsic hierarchical symmetry to improve the sample efficiency of MARL. We propose a novel policy network, the Hierarchical Equivariant Policy Network (HEPN); its framework is shown below:
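As a rough illustration of the hierarchical symmetry described above (a minimal sketch under our own assumptions, not part of the HEPN implementation), the snippet below clusters agent positions into high-level nodes and checks that rotating all agents rotates the high-level nodes by the same matrix:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy check of hierarchical symmetry: group low-level agents into high-level
# nodes (cluster centroids) and verify that a global rotation of the agents
# induces the same rotation of the high-level nodes. (Illustrative only; the
# clustering choice is an assumption, not HEPN's actual procedure.)
rng = np.random.default_rng(0)
positions = rng.normal(size=(12, 2))                     # 12 agents in the plane
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # planar rotation

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(positions)
high_level = np.stack([positions[labels == k].mean(axis=0) for k in range(3)])

# Averaging commutes with rotation, so the high-level system inherits the
# low-level symmetry: centroids of rotated agents == rotated centroids.
rotated_high_level = np.stack(
    [(positions @ R.T)[labels == k].mean(axis=0) for k in range(3)]
)
assert np.allclose(rotated_high_level, high_level @ R.T)
```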