CooT: Learning to Coordinate In-Context with Coordination Transformers

Abstract

Effective coordination among unfamiliar partners remains a major challenge in multi-agent systems. Existing approaches, such as population-based methods, improve robustness through diversity but often lack mechanisms for efficient adaptation beyond the training distribution. Furthermore, gradient-based fine-tuning is often impractical for few-shot coordination as it requires a large number of interactions for meaningful improvement.

To address these limitations, we propose Coordination Transformers (CooT), a framework that leverages In-Context Learning (ICL) for real-time partner adaptation. Unlike prior ICL approaches that focus primarily on task generalization, CooT is specifically designed to generalize across diverse partner behaviors. By training on trajectories from behavior-preferring agents, CooT learns to align its actions with partner intentions purely through observation, without requiring explicit supervision or parameter updates.

We evaluate CooT on two challenging multi-agent benchmarks: Overcooked and Google Research Football. Results demonstrate that CooT consistently outperforms population-based methods, gradient-based fine-tuning, and Meta-RL baselines, achieving stable and rapid adaptation. Furthermore, human evaluations identify CooT as the preferred collaborator. Our ablations confirm its ability to remain stable under sudden partner changes, making it a reliable framework for real-world human-AI collaboration.

Method Overview

We propose Coordination Transformers (CooT), a framework designed specifically for in-context partner adaptation. CooT is trained on trajectories collected from interactions between pairs of agents whose behaviors reflect distinct underlying preferences, following the Hidden-Utility Markov Game formulation.
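
One way to picture such training data is as sequences that each draw on a single partner, so that earlier episodes in a sequence carry information about that partner's preferences. The following is a minimal sketch under that assumption, not the authors' pipeline: the `Step` encoding, the `build_training_sequence` helper, and the use of the ego agent's recorded action as the prediction target are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical step encoding: (observation, partner_action, ego_action).
Step = Tuple[int, int, int]
Episode = List[Step]


def build_training_sequence(
    episodes_by_partner: Dict[str, List[Episode]],
    partner_id: str,
    num_episodes: int,
    rng: random.Random,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Flatten several episodes played with ONE partner into a single sequence.

    Returns (inputs, targets): inputs are (observation, partner_action) pairs
    and targets are the ego actions recorded at the same steps (an assumption).
    """
    episodes = episodes_by_partner[partner_id]
    # Sample without replacement; never ask for more episodes than exist.
    chosen = rng.sample(episodes, k=min(num_episodes, len(episodes)))
    inputs: List[Tuple[int, int]] = []
    targets: List[int] = []
    for episode in chosen:
        for observation, partner_action, ego_action in episode:
            inputs.append((observation, partner_action))
            targets.append(ego_action)
    return inputs, targets


# Toy data: two partners with different preferred actions (1 vs. 2).
toy_data = {
    "partner_a": [[(0, 1, 1), (1, 1, 1)], [(0, 1, 1), (1, 1, 1)]],
    "partner_b": [[(0, 2, 2), (1, 2, 2)]],
}
inputs, targets = build_training_sequence(toy_data, "partner_a", 2, random.Random(0))
```

Keeping each sequence tied to one partner is the design choice that matters here: mixing partners within a sequence would make the earlier steps uninformative about the later ones.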

At test time, CooT coordinates with unseen partners by continually updating its context with recent episodes. This lets it adapt to the partner online, without gradient updates, and enables few-shot generalization.
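
The test-time loop can be sketched as a rolling buffer of recent episodes that is fed back to a frozen policy. This is an illustrative sketch only: `EpisodeContext`, `play_episode`, the toy dynamics, and `mimic_policy` (a trivial stand-in for the trained transformer) are hypothetical names and simplifications, not part of the CooT implementation.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical step encoding: (observation, ego_action, partner_action).
Step = Tuple[int, int, int]
Episode = List[Step]


class EpisodeContext:
    """Rolling buffer holding the most recent episodes with the current partner."""

    def __init__(self, max_episodes: int) -> None:
        # A deque with maxlen drops the oldest episode once the buffer is full.
        self._episodes: Deque[Episode] = deque(maxlen=max_episodes)

    def add_episode(self, episode: Episode) -> None:
        self._episodes.append(list(episode))

    def flatten(self) -> List[Step]:
        """Concatenate the stored episodes, oldest first."""
        return [step for episode in self._episodes for step in episode]


def mimic_policy(history: List[Step], observation: int) -> int:
    """Toy stand-in for the trained model: copy the partner's most common action."""
    if not history:
        return 0  # arbitrary default before anything has been observed
    partner_actions = [partner_action for _, _, partner_action in history]
    return max(set(partner_actions), key=partner_actions.count)


def play_episode(
    policy: Callable[[List[Step], int], int],
    context: EpisodeContext,
    partner: Callable[[int], int],
    horizon: int,
) -> Episode:
    """Roll out one episode; the policy only reads the context, no weights change."""
    episode: Episode = []
    for observation in range(horizon):  # toy dynamics: observation = step index
        history = context.flatten() + episode
        ego_action = policy(history, observation)
        episode.append((observation, ego_action, partner(observation)))
    return episode


context = EpisodeContext(max_episodes=2)
for _ in range(3):
    last_episode = play_episode(mimic_policy, context, lambda obs: 2, horizon=5)
    context.add_episode(last_episode)
```

In this toy run the buffer keeps only the two most recent episodes, and by the third episode the stand-in policy matches the partner's action from the first step onward. All adaptation comes from the growing context; nothing resembling a parameter update occurs.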

Coordination Demos

All demos feature Human-AI Coordination on the Coordination Ring Overcooked layout.
The AI Agent (Colored Hat) must coordinate with a Human Partner (Grey Hat) whose strategy is unseen during training.

CooT (Ours)

In-Context Adaptation: CooT leverages ICL to identify partner intentions from real-time cues. It adapts its policy based on the interaction history in its context, without requiring explicit partner modeling.

Behavioral Cloning (BC)

No Adaptation Mechanism: BC works in simple layouts where agents act independently. However, when real coordination is required, it cannot adjust to a partner’s behavior.

HSP and MEP

Population Bottleneck: Population-based approaches (HSP, MEP) are limited by how representative their training partners are. They often lack the flexibility needed to coordinate with real humans who deviate from the training distribution.

Performance & Scalability

Overcooked Benchmark

CooT leads in complex layouts like Coord. Ring Multi-recipe. This advantage stems from conditioning on full interaction histories to predict best-response actions for diverse, unseen partners.

Google Research Football (GRF)

CooT scales to three-player environments in GRF. It generalizes beyond two-agent scenarios and outperforms the baselines under complex football dynamics.

Analysis of Adaptation

Few-shot Adaptation

CooT adapts efficiently within the first 15 episodes. A handful of trajectories is enough to align its policy with unseen partner behaviors.

Human-AI User Study

In a double-blind study with 36 participants, CooT was selected as the most preferred partner by 50% of users. Participants noted its observable adaptive capabilities as a primary strength.
