Taku Ito, Luca Cocchi, et al.
ICML 2025
Bridging the gap between algorithmic precision and human-like risk nuance is essential for crafting multi-agent systems that learn adaptable and strategically intuitive behaviors. We introduce CPT-MADDPG, an extension of the Multi-Agent Deep Deterministic Policy Gradient algorithm that embeds Cumulative Prospect Theory (CPT) value and probability-weighting transforms into both actor and critic updates. By replacing expected-return maximization with rank-dependent Choquet integrals over gains and losses, CPT-MADDPG endows agents with tunable risk profiles, ranging from exploratory, risk-seeking behavior to conservative, loss-averse behavior, without human intervention. Across competitive pursuit (Simple Tag), cooperative coverage (Simple Spread), and strategic bidding (first-price auctions), we show that a risk-seeking CPT parameterization speeds early learning, an extremely risk-averse parameterization enforces prudence at a performance cost, transparent utility sharing preserves coordination under heterogeneity, and naive dynamic adaptation destabilizes convergence. In auction settings, learned CPT policies replicate documented overbidding phenomena, with short-term gains followed by long-term losses. Our work demonstrates a principled framework for integrating human-like risk attitudes into strategic multi-agent deployment.
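The rank-dependent Choquet integral at the heart of the abstract can be illustrated on a discrete lottery. The sketch below uses the classic Tversky-Kahneman (1992) functional forms and parameter estimates as an assumption; the paper's own value and weighting functions and its tunable parameters may differ. Gains are cumulated from the best outcome down, losses from the worst outcome up, and each outcome receives a decision weight equal to a difference of transformed cumulative probabilities.

```python
import numpy as np

def tk_value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Tversky-Kahneman value function: concave over gains, convex and
    # loss-averse over losses. Parameters are the classic 1992 median
    # estimates, assumed here for illustration, not taken from the paper.
    return np.where(x >= 0, x ** alpha, -lam * (-x) ** beta)

def tk_weight(p, gamma=0.61):
    # Inverse-S probability weighting: overweights small probabilities,
    # underweights moderate-to-large ones.
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def cpt_value(outcomes, probs, gamma_gain=0.61, gamma_loss=0.69):
    # Rank-dependent (Choquet) CPT value of a discrete lottery.
    order = np.argsort(outcomes)
    x = np.asarray(outcomes, float)[order]
    p = np.asarray(probs, float)[order]
    v = tk_value(x)
    total = 0.0
    # Losses: decision weight = w(P(X <= x_i)) - w(P(X < x_i)),
    # cumulated from the worst outcome up.
    cum = np.cumsum(p)
    for i in range(len(x)):
        if x[i] >= 0:
            continue
        lo = cum[i - 1] if i > 0 else 0.0
        total += (tk_weight(cum[i], gamma_loss) - tk_weight(lo, gamma_loss)) * v[i]
    # Gains: decision weight = w(P(X >= x_i)) - w(P(X > x_i)),
    # cumulated from the best outcome down.
    dec = np.cumsum(p[::-1])[::-1]
    for i in range(len(x)):
        if x[i] < 0:
            continue
        hi = dec[i + 1] if i + 1 < len(x) else 0.0
        total += (tk_weight(dec[i], gamma_gain) - tk_weight(hi, gamma_gain)) * v[i]
    return total
```

In a CPT-MADDPG-style agent, a transform of this shape would replace the plain expectation over sampled returns in the critic target; making the parameters (loss aversion, curvature, weighting) tunable is what yields the risk-seeking versus loss-averse profiles the abstract describes.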
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010