A Generalized Load Balancing Policy With Multi-Teacher Reinforcement Learning


IEEE Global Communications Conference (GLOBECOM)



Although reinforcement learning (RL) shows advantages in cellular network load balancing, it suffers from low generalization ability, which prevents its deployment in real-world applications. Specifically, if the network traffic pattern changes, the learned RL policy cannot adapt accordingly, resulting in degraded system performance. To address this issue, we propose a Multi-teacher MOdel BAsed Reinforcement Learning algorithm (MOBA), which leverages multi-teacher knowledge distillation to learn a generalized load balancing policy that adapts to real-world traffic pattern changes. The key idea is that different teachers represent different traffic patterns and can therefore learn distinct system models. By distilling and transferring the teachers' knowledge, the student network learns a generalized system model that covers different traffic patterns and unseen situations. Moreover, to improve the robustness of multi-teacher knowledge transfer, we learn a set of student models and use an ensemble method to jointly predict the system dynamics. Results show that, compared with state-of-the-art RL methods, MOBA improves the minimal throughput and total throughput of a cellular network by up to 28.6% and 23.2%, respectively. Results also show that MOBA improves training efficiency by up to 64%.
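The core mechanism described above can be sketched in miniature. The toy code below is an illustration only, not the paper's implementation: each "teacher" is a linear dynamics model fitted on transitions from one hypothetical traffic pattern, a set of student models is distilled from the averaged teacher predictions (with bootstrap resampling for diversity), and the student ensemble jointly predicts the next state by averaging. All names, shapes, and the linear-model choice are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # toy state dimension (assumption, not from the paper)

def fit_linear_model(states, next_states):
    # Least-squares fit of a linear dynamics model: next_state = state @ W.
    W, *_ = np.linalg.lstsq(states, next_states, rcond=None)
    return W

# Each teacher sees transitions from a different traffic pattern,
# simulated here as a different true dynamics matrix.
teachers = []
for _ in range(3):
    A_true = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))
    S = rng.standard_normal((200, dim))
    S_next = S @ A_true + 0.01 * rng.standard_normal((200, dim))
    teachers.append(fit_linear_model(S, S_next))

# Distillation: on a shared query set, the soft targets are the
# averaged teacher predictions; each student is fitted to them on a
# bootstrap resample to create ensemble diversity.
queries = rng.standard_normal((300, dim))
soft_targets = np.mean([queries @ W for W in teachers], axis=0)

students = []
for _ in range(5):
    idx = rng.integers(0, len(queries), len(queries))  # bootstrap sample
    students.append(fit_linear_model(queries[idx], soft_targets[idx]))

def ensemble_predict(state):
    # Joint dynamics prediction: average over the student ensemble.
    return np.mean([state @ W for W in students], axis=0)

test_state = rng.standard_normal((1, dim))
print(ensemble_predict(test_state).shape)  # (1, 4)
```

In the actual method the teachers and students would be neural networks trained by gradient descent, but the distill-then-ensemble structure is the same: the student ensemble approximates a generalized dynamics model covering all teacher traffic patterns.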