Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning

COLM 2025

Seoul National University
* Contributed equally to this work

Game Play between Ours and Built-in AI (Elite)

Abstract

Large Language Models (LLMs) have recently demonstrated impressive action sequence prediction capabilities but often struggle with dynamic, long-horizon tasks such as real-time strategy games. In a game such as StarCraft II (SC2), agents must manage resource constraints and adapt to evolving battlefield situations in a partially observable environment, which often overwhelms existing LLM-based approaches. To address these challenges, we propose a hierarchical multi-agent framework that employs specialized imitation learning agents under a meta-controller called the Strategic Planner (SP). Through expert demonstrations, each specialized agent learns a distinctive strategy, such as aerial support or defensive maneuvers, and produces coherent, structured multi-step action sequences. The SP then orchestrates these proposals into a single, environmentally adaptive plan that aligns local decisions with long-term strategies. We call this framework HIMA (Hierarchical Imitation Multi-Agent). We also present TextSCII-ALL, a comprehensive SC2 testbed that encompasses all race match-up combinations in SC2. Our empirical results show that HIMA outperforms the state of the art in strategic clarity, adaptability, and computational efficiency, underscoring the potential of combining specialized imitation modules with meta-level orchestration to develop more robust, general-purpose AI agents.

Illustration of the proposed Hierarchical Imitation Multi-Agent (HIMA)

Limitations in existing LLM-based approaches

Insufficient domain knowledge and frequent short-term action generation lead to inefficient and ineffective long-term planning during game play.

Overview of the proposed hierarchical imitation multi-agent (HIMA) framework

Hierarchical Imitation Multi-Agent (HIMA) Framework

tl;dr. Multiple fine-tuned imitation agents (1.5B models) trained on our collected human demonstrations generate long action proposals, which a strategic planner (e.g., GPT) orchestrates into adaptive strategies.



Imitation-learning agent using human demonstration

We use human demonstration data from professional StarCraft II replays to train each specialized agent on distinct strategic patterns through supervised fine-tuning. Each agent generates structured action sequences spanning multiple timesteps (approximately 3 minutes), accompanied by tactical rationales that explain why these actions suit the current game state. By clustering the demonstration data based on unit compositions, we create agents specialized in different strategies—such as air superiority, ground forces, or balanced approaches. The Strategic Planner meta-controller then orchestrates these diverse agent proposals into a unified, adaptive plan that responds to real-time battlefield conditions. This hierarchical approach significantly reduces decision-making frequency while maintaining strategic coherence and adaptability in complex RTS environments.
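The clustering step above can be sketched as follows. This is a minimal illustration of grouping replay demonstrations by unit composition so that each cluster can fine-tune one specialized agent; the unit list, feature vector, and tiny k-means routine are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: cluster replays by normalized unit composition so each cluster
# trains one specialized imitation agent. Unit names and k are assumptions.
from collections import Counter
import random

UNIT_TYPES = ["marine", "marauder", "medivac", "viking", "siege_tank"]

def composition_vector(replay_units):
    """Normalize unit counts into a fixed-order composition vector."""
    counts = Counter(replay_units)
    total = sum(counts.values()) or 1
    return [counts[u] / total for u in UNIT_TYPES]

def kmeans(vectors, k, iters=20, seed=0):
    """Tiny k-means over composition vectors (squared Euclidean distance)."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            clusters[i].append(v)
        # Recompute each center; keep the old one if its cluster emptied.
        centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters
```

Each resulting cluster (e.g. air-heavy vs. ground-heavy compositions) would then supply the demonstrations for supervised fine-tuning of one specialized agent.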

Dataset for structured action sequence

Environment-aware action orchestration

The Strategic Planner orchestrates action proposals from multiple specialized agents by considering real-time environmental contexts including opponent state and previous action outcomes. We employ a four-stage process: assessing current game state, resolving conflicting strategies from different agents using Nominal Group Technique, formulating a unified strategy, and planning actions through temporal Chain-of-Thought (t-CoT). The t-CoT mechanism breaks down decisions into immediate actions (addressing failed commands), short-term responses (countering upcoming threats), and long-term strategic goals (maintaining technological advancement). Environmental feedback, such as enemy attacks or failed prerequisites, triggers immediate re-evaluation and adaptation of the current plan. This orchestration ensures that agent proposals are dynamically adjusted to battlefield conditions while maintaining strategic coherence across different time horizons.
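The four-stage process above can be sketched as a small orchestration loop. The stage boundaries, field names, and the simple majority vote standing in for the Nominal Group Technique are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the four-stage orchestration described above: assess state,
# resolve conflicts, unify, then split the plan into t-CoT horizons.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Proposal:
    agent: str      # which specialized imitation agent proposed it
    strategy: str   # e.g. "air_superiority", "ground_push"
    actions: list   # multi-step action sequence (~3-minute horizon)

def assess_state(obs):
    """Stage 1: summarize the current game state from environment feedback."""
    return {"under_attack": obs.get("under_attack", False),
            "failed_actions": obs.get("failed_actions", [])}

def resolve_conflicts(proposals):
    """Stage 2: keep the strategy most agents agree on (vote as NGT stand-in)."""
    votes = Counter(p.strategy for p in proposals)
    winner, _ = votes.most_common(1)[0]
    return [p for p in proposals if p.strategy == winner]

def unify(chosen):
    """Stage 3: merge surviving proposals into one deduplicated action list."""
    merged = []
    for p in chosen:
        merged.extend(a for a in p.actions if a not in merged)
    return merged

def temporal_cot(plan, state):
    """Stage 4: split the unified plan into t-CoT time horizons."""
    return {
        "immediate": state["failed_actions"],               # re-issue failed commands
        "short_term": plan[:2] if state["under_attack"] else [],
        "long_term": plan,                                  # keep strategic goals
    }

def orchestrate(obs, proposals):
    state = assess_state(obs)
    return temporal_cot(unify(resolve_conflicts(proposals)), state)
```

Environmental feedback (failed commands, an incoming attack) enters through `obs`, so a changed observation immediately reshapes the immediate and short-term horizons on the next call.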


Results

For more experimental results, please check out the paper.

Quantitative comparison with prior art in match-ups against the built-in AI

Win rates (%) across all nine race match-ups (Lv.4–Lv.10) in our TextSCII-ALL evaluation environment, compared with state-of-the-art methods.
Head-to-head match-ups between HIMA (ours) and state-of-the-art methods.

BibTeX

@inproceedings{ahnKC25,
  author    = {Ahn, Daechul and Kim, San and Choi, Jonghyun},
  title     = {Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning},
  booktitle = {COLM},
  year      = {2025},
}